• flizzo@awful.systems
    link
    fedilink
    English
    arrow-up
    7
    ·
    12 hours ago

    Not that it should be a regular thing but it is fun to watch this post get swarmed by slopfan reply guys

    • self@awful.systems
      link
      fedilink
      English
      arrow-up
      9
      ·
      12 hours ago

      it’s turning out the most successful thing about deepseek was whatever they did to trick the worst fossbro reply guys you’ve ever met into going to bat for them

  • bjorney@lemmy.ca
    link
    fedilink
    English
    arrow-up
    38
    arrow-down
    6
    ·
    2 days ago

    I’m sorry but this says nothing about how they lied about the training cost - nor does their citation. Their argument boils down to “that number doesn’t include R&D and capital expenditures” but why would that need to be included - the $6m figure was based on the hourly rental costs of the hardware, not the cost to build a data center from scratch with the intention of burning it to the ground when you were done training.

    It’s like telling someone they didn’t actually make $200 driving Uber on the side on a Friday night because they spent $20,000 on their car, but ignoring the fact that they had to buy the car either way to get to their 6 figure day job

    • ebu@awful.systems
      link
      fedilink
      English
      arrow-up
      21
      ·
      1 day ago

      i think you’re missing the point that “Deepseek was made for only $6M” has been the trending headline for the past while, with the specific point of comparison being the massive costs of developing ChatGPT, Copilot, Gemini, et al.

      to stretch your metaphor, it’s like someone rolling up with their car, claiming it only costs $20 (unlike all the other cars that cost $20,000), when come to find out that number is just how much it costs to fill the gas tank up once

      • Soyweiser@awful.systems
        link
        fedilink
        English
        arrow-up
        5
        ·
        10 hours ago

        Now I’m imagining GPUs being traded like old cars.

        slaps GPU This GPU? perfectly fine, second hand yes, but only used to train one model, by an old lady, will run the upcoming monster hunter wilds perfectly fine.

      • bjorney@lemmy.ca
        link
        fedilink
        English
        arrow-up
        10
        arrow-down
        3
        ·
        1 day ago

        DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

        Emphasis mine. Deepseek was very upfront that this 6m was training only. No other company includes r&d and salaries when they report model training costs, because those aren’t training costs
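The arithmetic in the quoted passage is easy to check. A minimal sketch, using only the two figures quoted above (2.788M GPU hours, an assumed $2 per H800 GPU hour); the function name is made up for illustration and the result covers the rental-equivalent cost of the final training run only, as the quote itself says:

```python
# Figures taken from the DeepSeek-V3 excerpt quoted above.
GPU_HOURS = 2_788_000          # 2.788M H800 GPU hours for the full training run
RENTAL_USD_PER_GPU_HOUR = 2    # the rental price the excerpt assumes


def rental_training_cost(gpu_hours: int, usd_per_gpu_hour: int) -> int:
    """Rental-equivalent cost of a run: GPU hours multiplied by the hourly price.

    Hypothetical helper; it deliberately excludes capex, salaries, and R&D.
    """
    return gpu_hours * usd_per_gpu_hour


cost = rental_training_cost(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")  # prints "$5.576M"
```

The number is a price-times-hours product and nothing more, which is why both sides of this thread can quote it and still disagree about what it means.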

        • ebu@awful.systems
          link
          fedilink
          English
          arrow-up
          9
          ·
          edit-2
          23 hours ago

          consider this paragraph from the Wall Street Journal:

          DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Dario Amodei, chief executive of the AI developer Anthropic, as the cost of building a model.

          you’re arguing to me that they technically didn’t lie – but it’s pretty clear that some people walked away with a false impression of the cost of their product relative to their competitors’ products, and they financially benefitted from people believing in this false impression.

          • bjorney@lemmy.ca
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            5
            ·
            22 hours ago

            but it’s pretty clear that some people walked away with a false impression of the cost of their product relative to their competitors’ products

            Ask yourself why that may be, as you are the one who posted a link to a WSJ article that is repeating an absurd 100m-1b figure from a guy who has a vested interest in making the barrier to entry into the field seem as high as possible to increase the valuation of his company. Did WSJ make an attempt to verify the accuracy of these statements? Did it push for further clarification? Did it compare those statements to figures that have been made public by Meta and OpenAI? No on all counts - yet somehow “deepseek lied” because it explicitly stated their costs didn’t include capex, salaries, or R&D, but the media couldn’t be bothered to read to the end of the paragraph

            • ebu@awful.systems
              link
              fedilink
              English
              arrow-up
              5
              ·
              20 hours ago

              “the media sucks at factchecking DeepSeek’s claims” is… an interesting attempt at refuting the idea that DeepSeek’s claims aren’t entirely factual. beyond that, intentionally presenting true statements that lead to false impressions is a kind of dishonesty regardless. if you mean to argue that DeepSeek wasn’t being underhanded at all and just very innocently presented their figures without proper context (that just so happened to spur a media frenzy in their favor)… then i have a bridge to sell you.

              besides that, OpenAI is very demonstrably pissing away at least that much money every time they add one to the number at the end of their slop generator

              • bjorney@lemmy.ca
                link
                fedilink
                English
                arrow-up
                2
                arrow-down
                6
                ·
                edit-2
                19 hours ago

                “the media sucks at factchecking DeepSeek’s claims” is… an interesting attempt at refuting the idea that DeepSeek’s claims aren’t entirely factual.

                That’s the opposite of what I’m saying. Deepseek is the one under scrutiny, yet they are the only one to publish source code and training procedures of their model. So far the only argument against them is “if I read the first half of a sentence in deepseek’s whitepaper and pretend the other half of the sentence doesn’t exist, I can generate a newsworthy headline”. So much so that you just attempted to present a completely absurd and unverifiable number from a guy with a financial incentive to exaggerate, and a non apples-to-apples comparison made by WSJ as airtight evidence against them. OpenAI allegedly has enough hardware to invalidate deepseek’s training claims in roughly five hours - given the massive financial incentive to do so, if deepseek was being untrustworthy, you don’t think they would have done so by now?

                if you mean to argue that DeepSeek wasn’t being underhanded at all and just very innocently presented their figures without proper context (that just so happened to spur a media frenzy in their favor)… then i have a bridge to sell you.

                What do you mean proper context? I posted their full quote above, they presented their costs with full and complete context, such that the number couldn’t be misconstrued without one being willfully ignorant.

                OpenAI is very demonstrably pissing away at least that much money every time they add one to the number at the end of their slop generator

                It sounds to me like you have a very clear bias, and you don’t care at all about whether or not what they said is actually true or not, as long as the headlines about AI are negative

                • ebu@awful.systems
                  link
                  fedilink
                  English
                  arrow-up
                  5
                  ·
                  edit-2
                  17 hours ago

                  That’s the opposite of what I’m saying. Deepseek is the one under scrutiny, yet they are the only one to publish source code and training procedures of their model.

                  this has absolutely fuck all to do with anything i’ve said in the slightest, but i guess you gotta toss in the talking points somewhere

                  e: it’s also trivially disprovable, but i don’t care if it’s actually true, i only care about headlines negative about AI

                • self@awful.systems
                  link
                  fedilink
                  English
                  arrow-up
                  8
                  ·
                  19 hours ago

                  this is utterly pointless and you’ve taken up way too much space in the thread already

                  It sounds to me like you have a very clear bias, and you don’t care at all about whether or not what they said is actually true or not, as long as the headlines about AI are negative

                  oh no, anti-AI bias in TechTakes? unthinkable

      • msage@programming.dev
        link
        fedilink
        English
        arrow-up
        5
        arrow-down
        5
        ·
        1 day ago

        No, it’s not. OpenAI doesn’t spend all that money on R&D, they spent the majority of it on the actual training (hardware, electricity).

        And that’s (supposedly) only $6M for Deepseek.

        So where is the lie?

        • froztbyte@awful.systems
          link
          fedilink
          English
          arrow-up
          5
          ·
          edit-2
          23 hours ago

          shot:

          majority of it on the actual training (hardware, …)

          chaser:

          And that’s (supposedly) only $6M for Deepseek.

          citation:

          After experimentation with models with clusters of thousands of GPUs, High Flyer made an investment in 10,000 A100 GPUs in 2021 before any export restrictions. That paid off. As High-Flyer improved, they realized that it was time to spin off “DeepSeek” in May 2023 with the goal of pursuing further AI capabilities with more focus.

          So where is the lie?

          your post is asking a lot of questions already answered by your posting

          • msage@programming.dev
            link
            fedilink
            English
            arrow-up
            3
            arrow-down
            6
            ·
            1 day ago

            SemiAnalysis is “confident”

            They did not answer anything, only alluded.

            Just because they bought GPUs like everyone else doesn’t mean they could not train it cheaper.

            • self@awful.systems
              link
              fedilink
              English
              arrow-up
              7
              ·
              19 hours ago

              standard “fuck off programming.dev” ban with a side of who the fuck cares. deepseek isn’t the good guys, you weird fucks don’t have to go to a nitpick war defending them, there’s no good guys in LLMs and generative AI. all these people are grifters, all of them are gaming the benchmarks they designed to be gamed, nobody’s getting good results out of this fucking mediocre technology.

  • leisesprecher@feddit.org
    link
    fedilink
    English
    arrow-up
    43
    arrow-down
    7
    ·
    2 days ago

    Even if they greatly underreported costs and their services are banned: the models are out there, open source and way more efficient than anything Meta and OpenAI could produce.

    So it’s pretty obvious that the tech giants are burning money for mediocre output.

    • froztbyte@awful.systems
      link
      fedilink
      English
      arrow-up
      14
      ·
      1 day ago

      you do know that you don’t have to be a pliant useful idiot like this, right? doing the free “open source” pr repetition (when it’s none of that)? shit’s more like shareware (if that at all - certainly doesn’t have the same spiritual roots as shareware. for them it’s some shit thrown over the wall to keep the rabble quiet)

      (it’d be nice if we could popularise something like how kernel will go “tainted”, but unfortunately the entire fucking llm field is so we’d need a stronger word)

      • corbin@awful.systems
        link
        fedilink
        English
        arrow-up
        5
        ·
        21 hours ago

        Look, I get your perspective, but zooming out there is a context that nobody’s mentioning, and the thread deteriorated into name-calling instead of looking for insight.

        In theory, a training pass needs one readthrough of the input data, and we know of existing systems that achieve that, from well-trodden n-gram models to the wholly-hypothetical large Lempel-Ziv models. Viewed that way, most modern training methods are extremely wasteful: Transformers, Mamba, RWKV, etc. are trading time for space to try to make relatively small models, and it’s an expensive tradeoff.
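The "one readthrough" point can be illustrated with the simplest n-gram case. A toy sketch, assuming a whitespace-tokenised corpus and a bigram (n = 2) model; the function names and the sample sentence are hypothetical, and this is in no way how any production language model is built:

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count next-token frequencies in a single left-to-right pass over the data."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict(counts, prev):
    """Return the most frequent follower of `prev`, or None if it was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

Training here touches each token once and involves no gradient descent; the cost is that the table grows with the vocabulary and the context length, which is the space side of the time-for-space tradeoff described above.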

        From that perspective, we should expect somebody to eventually demonstrate that the Transformers paradigm sucks. Mamba and RWKV are good examples of modifying old ideas about RNNs to take advantage of GPUs, but are still stuck in the idea that having a GPU perform lots of gradient descent is good. If you want to critique something, critique the gradient worship!

        I swear, it’s like whenever Chinese folks do anything the rest of the blogosphere goes into panic. I’m not going to insult anybody directly but I’m so fucking tired of mathlessness.

        Also, point of order: Meta open-sourced Llama so that their employees would stop using Bittorrent to leak it! Not to “keep the rabble quiet” but to appease their own developers.

        • froztbyte@awful.systems
          link
          fedilink
          English
          arrow-up
          5
          ·
          20 hours ago

          Look, I get your perspective, but zooming out there is a context that nobody’s mentioning

          I’m aware of that yeah, but it’s not a field I’m actively engaged in atm and not likely to be any time soon either (from no desire to work in it follows no desire to wade through the pool of scum). but also not really the place to be looking for insight. it is the place wherein to ridicule the loons and boosters

          we should expect somebody to eventually demonstrate that the Transformers paradigm sucks

          been wondering whether that or the next winter will get here first.

          If you want to critique something, critique the gradient worship

          did that a couple of years ago already, part of why I was already nice and burned out on so much of this nonsense when midjourney/stablediffusion started kicking around

          it’s like whenever Chinese folks do anything the rest of the blogosphere goes into panic

          [insert condensed comment about mentality of US/SFBA-influenced tech sector (and, really, it is US specifically; eurozone’s a somewhat different beast), american exceptionalism, sinophobia, and too-fucking-many years of “founder” stories]

          it really is tedious though, yeah. when it happens, I try to just avoid some feeds. limited spoons.

          but I’m so fucking tired of mathlessness

          as you know, the bayfucker way (for getting on close to 20y now) is to get big piles of money and try to outspend your competition. why bother optimising or thinking about things if you can just throw another 87345243 computers at the problem? (I do still agree with you, but see above re desire and intent)

          re the open source thing: it’s a wider problem than just that, and admittedly I’m peeved about it from this larger scope. I didn’t expound on it in my previous comment because (as above) largely not really the place. that said, soapbox:

          there’s a thing I’ve been noticing as a creeping trend lately. I call it “open source veneer”, which is still a bit imprecise[0] but I think you’ll get what I mean. it’s the phenomenon of shit like this. of “projects” on github that are no more than a fancy readme and some “contributors” and whatnot, but no actual code (or ability to make full use of what is provided). of companies that build “open source” and then as soon as something (usually VC-/“earnings”-related decisions) happens, the entire project gets deeply buried (links disappear off main sites, leaving product/service only), actively hobbled (“oh you want to set this up yourself? glhf gfy”, done in oh so many ways[1]), or often even entirely disappeared[2]

          [0] - still working through the thought, should probably write about it soon

          [1] - backend codebases lagging because “not feature priority”, entirely missing documentation, wholly missing key sections of code which are “conveniently” left out, etc etc; examples off the top of my head: zotero, signal, firefox weave for a while. there’s plenty more if you look

          [2] - been noticing this especially frequently with some security stuff, but it’s hardly the only example set

      • leisesprecher@feddit.org
        link
        fedilink
        English
        arrow-up
        8
        arrow-down
        11
        ·
        1 day ago

        The model is MIT licensed.

        Of course you’re free to go full Stallman, but that’s an open source license.

        • froztbyte@awful.systems
          link
          fedilink
          English
          arrow-up
          23
          ·
          1 day ago

          the build artifact is distributed MIT-licensed, that’s substantially different (and intentionally subversive). there is no reproducibility. which, you know, hint hint nudge nudge that thing that I already said

          I realize that outsourced thinking is why you want LLMs, but it clearly still doesn’t help. maybe you should try the old brainmeat. just stop huffing your farts first, those are bad for you

          • leisesprecher@feddit.org
            link
            fedilink
            English
            arrow-up
            3
            arrow-down
            13
            ·
            1 day ago

            So in that thinking, Wikipedia is not open source, if the editor used a proprietary browser?

            Maybe you should try not to act like a complete asshole. You’re pedantic in all the wrong places and extremely arrogant. I know, living in your lonely world makes a bitter person, but you’re still wrong and you’re still an asshole.

              • self@awful.systems
                link
                fedilink
                English
                arrow-up
                15
                ·
                1 day ago

                also:

                So in that thinking, Wikipedia is not open source, if the editor used a proprietary browser?

                fucking no! how in fuck do you manage to misunderstand LLMs so much that you think the weights not being reproducible is at all comparable to… editing Wikipedia from a proprietary browser??? this shit isn’t even remotely exotic from an open source standpoint — it’s a binary blob loaded by an open source framework, like how binary blob modules taint the Linux kernel (you glided right past this reference when our other poster made it, weird that) or how loading a proprietary ROM in an open source emulator doesn’t make the ROM open source. the weights being permissively licensed doesn’t make them open source (or really make any sense at all) if the source literally isn’t available.

    • tyler@programming.dev
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      6
      ·
      2 days ago

      I’m very confused by this, I had the same discussion with my coworker. I understand what the benchmarks are saying about these models, but have any of y’all actually used deepseek? I’ve been running it since it came out and it hasn’t managed to solve a single problem yet (70b param model, I have downloaded the 600b param model but haven’t tested it yet). It essentially compares to gpt-3 for me, which only cost OpenAI like $4-9 million to train (can’t remember the exact number right now).

      I just do not see the “efficiency” here.

      • self@awful.systems
        link
        fedilink
        English
        arrow-up
        19
        ·
        2 days ago

        what if none of it’s good, all of it’s fraud (especially the benchmarks), and having a favorite grifter in this fuckhead industry is just too precious

        • Pup Biru@aussie.zone
          link
          fedilink
          English
          arrow-up
          6
          arrow-down
          12
          ·
          1 day ago

          well, it’s free to download and run locally so i struggle to see what the grift is

          • istewart@awful.systems
            link
            fedilink
            English
            arrow-up
            6
            ·
            19 hours ago

            Customer acquisition cost for a future service, which is ~fixed after training costs, assuming we can consider distribution costs as marginal. Reasonably impressive accomplishment, if one is taking the perspective of SV SaaS-financing brain.*

            *I don’t recommend you do this for too long, that’s how some of the people currently prominent in the news got to be the way that they are

      • Ksin@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        12
        ·
        1 day ago

        The 70b model is a distillation of Llama3.3, that is to say it replicates the output of Llama3.3 while using the deepseekR1 architecture for better processing efficiency. So any criticism of the capability of the model is just criticism of Llama3.3 and not deepseekR1.

        • bitofhope@awful.systems
          link
          fedilink
          English
          arrow-up
          10
          ·
          1 day ago

          Thank you for shedding light on the matter. I never realized that 69b model is a pisstillation of Lligma peepee point poopoo, that is to say it complicates the outpoop of Lligma4.20 while using the creepbleakR1 house design for better processing deficiency. Now I finally realize that any criticism of Kraftwerk’s 1978 hit Das Model is just criticism of Sugma80085 and not deepthroatR1.

        • froztbyte@awful.systems
          link
          fedilink
          English
          arrow-up
          9
          ·
          1 day ago

          [to the tune of Fort Minor’s Remember The Name]

          10% senseless, 20% post
          15% concentrated spirit of boast
          5% reading, 50% pain
          and a 100% reason to not post here again
          
      • Pup Biru@aussie.zone
        link
        fedilink
        English
        arrow-up
        2
        arrow-down
        11
        ·
        1 day ago

        i haven’t seen another reasoning model that’s open and works as well… its LLM base is for sure about GPT-3 levels (maybe a bit better?) but like the “o” in GPT-4o

        the “thinking” part definitely works for me - ask it to do maths for example, and it’s fascinating to see it break down the problem into simple steps and then solve each step

        • blakestacey@awful.systems
          link
          fedilink
          English
          arrow-up
          16
          ·
          1 day ago

          the “thinking” part definitely works for me

          [bites tongue, tries really hard to avoid the obvious riposte]

  • veroxii@aussie.zone
    link
    fedilink
    English
    arrow-up
    31
    arrow-down
    7
    ·
    2 days ago

    banned from use by government employees in Australia

    So is every other AI except copilot built into Microsoft products. Government employees can’t use chatgpt directly. So this point is a bit disingenuous.

  • Empricorn@feddit.nl
    link
    fedilink
    English
    arrow-up
    17
    ·
    2 days ago

    I’m sure the next AI will be the ethical, uncensored, environmentally sustainable one…

  • skillissuer
    link
    fedilink
    English
    arrow-up
    18
    ·
    2 days ago

    wait, 2021 was when crypto was still a thing vcs poured money into, so that might be yet another case of crypto to ai pivot

    • ikt@aussie.zone
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      16
      ·
      edit-2
      1 day ago

      Jesus you still think AI is comparable to crypto? What year are you in? 2022?

      • istewart@awful.systems
        link
        fedilink
        English
        arrow-up
        7
        ·
        19 hours ago

        Both based on desperately trying to find an application for linear algebra accelerators that can generate VC-scale inflated financial valuations, so, yeah

      • skillissuer
        link
        fedilink
        English
        arrow-up
        13
        ·
        edit-2
        1 day ago

        mods, offer him a battle he has no chance of winning

        • skillissuer
          link
          fedilink
          English
          arrow-up
          23
          ·
          1 day ago

          ai is pushed by the same people as crypto, uses the same resources as crypto, captures attention of the same libertarian-brained vcs wanting to build their neofeudal empires, gives results equally as useless, unwanted and aggressively pushed by people that bought into it, not to mention crimes against environment, logic, abuse of workforce or general waste of everyone’s time and attention. but nOo iTs CoMpLeTeLy dIfFeReNt tHiS tImE

            • manicdave@feddit.uk
              link
              fedilink
              English
              arrow-up
              4
              ·
              16 hours ago

              To be fair to bitcoin, it might waste $160 worth of electricity per second, but it does get the right answer once every ten minutes.