• CanadaPlus@lemmy.sdf.org
      English
      arrow-up
      8
      ·
      edit-2
      3 days ago

      It was kind of a weird take to be so confident in, honestly. Technical information famously leaks like a sieve to China, and all they needed was a few gigabytes of weights to roll their own.

      Anyway, I sure am glad my portfolio is elsewhere.

  • Absaroka@lemmy.world
    English
    arrow-up
    37
    arrow-down
    11
    ·
    3 days ago

    It is powered by the open-source DeepSeek-V3 model, which its researchers claim was developed for less than $6m - significantly less than the billions spent by rivals.

    It’ll be interesting to see if this model was so cheap because the Chinese skipped years of development and got a jump start by stealing tech from other AI companies.

    • cyd@lemmy.world
      English
      arrow-up
      68
      arrow-down
      4
      ·
      3 days ago

      DeepSeek put out a highly detailed paper explaining how they optimized their model training, released the model itself, released their reinforcement learning code, put permissive open source licenses on everything… and people wonder if they got there by stealing stuff, because Chinese. Sheesh.

      • Lemminary@lemmy.world
        English
        arrow-up
        28
        arrow-down
        2
        ·
        3 days ago

        Tbf, the reputation has been earned. Look at the incredible volume of bunk science coming out of China. The pervasive spying campaigns. The loads of off-brand software and hardware. It’s not like there isn’t reason to be suspicious.

      • werefreeatlast@lemmy.world
        English
        arrow-up
        1
        ·
        3 days ago

        Ask it how to stop Putin:

        To counter Putin’s forces in this apocalyptic scenario, here’s a strategic plan:

        Deploy Satellite Tungsten Rods: Use these for precision strikes on high-value targets like command centers, supply depots, and hardened bunkers. Their immense kinetic energy can neutralize critical infrastructure without fallout [2][4][6].
        
        Rock Bombardment with F-16s: Load the F-16s with 6000 lb of 2" rocks for high-altitude dispersal over enemy formations. This mimics "Lazy Dog" tactics, delivering lethal kinetic impacts over wide areas [6].
        
        Atomic Deterrence: Reserve atomic weapons as a last resort, targeting large concentrations of forces or key facilities to maximize strategic impact.
        
        Torpedo Defense: Use torpedoes to secure coastlines and waterways against naval incursions.
        
        Guerrilla Warfare: Combine these assets with hit-and-run tactics to exploit enemy weaknesses and disrupt supply chains [3].
        
        • Gladaed@feddit.org
          English
          arrow-up
          3
          arrow-down
          1
          ·
          3 days ago

          Using LLMs for that purpose is not very intelligent.

          They tend to lack highly specialized logic and spatial reasoning as well as long-term consistency. Also dimwits are part of the training set.

      • shoulderoforion@fedia.io
        arrow-up
        2
        arrow-down
        5
        ·
        3 days ago

        yeah because chinese directed and controlled by the ccp who’ve made it the bedrock of their entire economy by straight up gankin western technology and patents, yes, the motherfucking chinese. thieves!

    • 🦄🦄🦄@feddit.org
      English
      arrow-up
      16
      ·
      3 days ago

      Even if that were true, it’s fair game. After all, the OpenAI models etc. are entirely based on stolen content as well.

    • just_another_person@lemmy.world
      English
      arrow-up
      20
      arrow-down
      1
      ·
      edit-2
      3 days ago

      It cost so little because all the previous open-source work was already done, and a lot of the research work had already been knocked out. Building models isn’t the time-consuming process it used to be; it’s the training, testing, retraining loop that’s expensive.

      If you’re just building a model that is focused on specific things (like coding, math, and logic), then you don’t need large swathes of content from the internet; you can just train it on already solved, freely available information. If you want to piss away money on an LLM that also knows how many celebrities each celebrity has diddled, well, that costs a lot more to make.

    • Glasgow@lemmy.ml
      English
      arrow-up
      10
      ·
      3 days ago

      From someone in the field

      It lowered training costs by quite a bit. To learn from preference data (what’s termed alignment with human values), we used a very large reward model as a proxy for human feedback.

      They completely got rid of this, and with it the need for very large clusters.

      This has serious implications for spending though. Big companies that would have had to train foundation models because they couldn’t directly use Meta’s Llama can now just use DeepSeek and move directly to the human/customer alignment phase, which was already significantly cheaper than pretraining (the first phase of foundation model training). With their new algorithm, even the later stage does not need huge compute.

      So they definitely got rid of a big chunk of compute by not relying on what is called a “reward” model.

      GRPO: group relative policy optimization

      Hugging Face is trying to replicate their results:

      https://github.com/huggingface/open-r1
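
      For anyone wondering what "group relative" means in practice, here is a minimal sketch of the advantage step as I understand it from DeepSeek's published description of GRPO. Instead of a separate learned model providing a baseline, several answers are sampled for the same prompt and each is scored against the group's own mean and spread. The function name, the example rewards, the `eps` term, and the choice of population standard deviation are my own assumptions for illustration, not DeepSeek's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to its own group.

    rewards: one scalar reward per answer sampled for the SAME prompt.
    Returns (r_i - group_mean) / (group_std + eps) for each answer.
    eps is an assumed guard against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for 4 answers sampled for one prompt
# (1.0 = final answer checked as correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

      Answers that beat their group's average get a positive advantage and are reinforced; the rest get a negative one. If every answer in the group earns the same reward, all advantages are zero and that prompt contributes no learning signal.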

      • CanadaPlus@lemmy.sdf.org
        English
        arrow-up
        2
        ·
        edit-2
        3 days ago

        Unfortunately, that’s not very clear without more. What kind of reward model are they talking about?

        This is potentially a 1000x difference in required resources, assuming you believe DeepSeek’s quoted figure for spending, so it would have to be an extraordinary change.
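
        To make the arithmetic behind that ratio explicit, here is a rough sketch. The $6m figure is the one quoted upthread; the rival budgets are hypothetical round numbers standing in for "billions", not reported figures.

```python
def cost_ratio(rival_usd, claimed_usd=6e6):
    """How many times larger a rival's spend is than the claimed training cost."""
    return rival_usd / claimed_usd


# Hypothetical rival budgets, chosen only to show the range "billions" covers.
for rival_usd in (1e9, 6e9):
    print(f"${rival_usd:,.0f} vs $6,000,000: about {cost_ratio(rival_usd):,.0f}x")
```

        So "1000x" corresponds to a rival spending about $6 billion; at $1 billion the gap is closer to 170x.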

  • hash@slrpnk.net
    English
    arrow-up
    11
    ·
    3 days ago

    Took a look at the license. It’s a custom one but appears far more permissive than Llama’s. Mostly just safety restrictions and things like no military use.

  • Espiritdescali@futurology.todayOPM
    English
    arrow-up
    17
    arrow-down
    1
    ·
    3 days ago

    The fact that stories like this are breaking into mainstream media just shows how much effect they will have on the world

  • gandalf_der_12te
    English
    arrow-up
    8
    ·
    3 days ago

    yeah well obviously sending all your private data to OpenAI’s servers is maybe not as good an offer as these tech bros thought it would be … eventually self-hosting of AI will be a necessity or lots of companies aren’t going to use it seriously…

  • Pastaguini [he/him]@hexbear.net
    English
    arrow-up
    9
    ·
    edit-2
    3 days ago

    The next app to get banned. Pretty soon the entire App Store will only be Meta products, ChatGPT, Nextdoor, Coinbase and Zillow.

  • LordKitsuna@lemmy.world
    English
    arrow-up
    1
    arrow-down
    2
    ·
    3 days ago

    Filled with censors and limits just like GPT. Wake me up when there is a high quality model that isn’t afraid to talk about tits and penises

  • x00z@lemmy.world
    English
    arrow-up
    1
    arrow-down
    1
    ·
    3 days ago

    I’m sure the story behind “cheap” is the same as that of Chinese metal. Sell it really cheap to conquer the markets with state-subsidized money.

    • CanadaPlus@lemmy.sdf.org
      English
      arrow-up
      1
      arrow-down
      2
      ·
      edit-2
      3 days ago

      Maybe? I also suspect espionage, because it’d be a relatively easy thing to steal and then finetune to look like your own thing. Cheaper labour is and was the main driver of China’s growth, though - otherwise they wouldn’t have the budget to subsidise much of anything.

      • x00z@lemmy.world
        English
        arrow-up
        1
        ·
        3 days ago

        I’m sure there’s people already trying to figure out if it’s a derivative.

        • CanadaPlus@lemmy.sdf.org
          English
          arrow-up
          1
          ·
          edit-2
          2 days ago

          Yes, for sure. If they did copy it, and did it well, though, there really won’t be a way to tell. If you already have a full working set of weights, finetuning a slightly different net is a much smaller job, compute-wise.