• PorkrollPosadist [he/him, they/them]@hexbear.net
    5 months ago

    All of these uses help me, both with my disability and with just making my day easier, and it all ran on LOCAL HARDWARE. The fact is that almost any of the use cases for ML can be made small enough to run locally, but there are almost no good local ML accelerators. Something like 90% of sentences sent to Google Home were requests to set a timer. You don’t need a data center for that.

    I run LibreTranslate on matapacos.dog for inline post translation (and at home to avoid Google knowing every scrap of foreign language text I read) and it is a similar story. It runs locally (doesn’t even require a GPU) and makes no remote requests. Language models developed for specific purposes can accomplish great things for accessibility with much lower performance requirements than the gimmicky shit Silicon Valley tries to pawn off as “artificial general intelligence.”
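    To illustrate how little is involved in keeping translation local: LibreTranslate exposes a plain HTTP `/translate` endpoint, so a self-hosted instance can be queried with nothing but the standard library. This is a hedged sketch; the `localhost:5000` URL assumes a default local install, and the field names follow LibreTranslate’s documented JSON API.

```python
import json
import urllib.request

def translate_request(text, target="en", source="auto",
                      url="http://localhost:5000/translate"):
    """Build a POST request for a self-hosted LibreTranslate instance.

    The /translate endpoint and the q/source/target/format fields match
    LibreTranslate's documented API; the localhost URL is an assumption
    about a default local deployment.
    """
    payload = json.dumps({
        "q": text,
        "source": source,   # "auto" lets the server detect the language
        "target": target,
        "format": "text",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually translate (requires a running local server):
#   with urllib.request.urlopen(translate_request("Hola")) as resp:
#       print(json.load(resp)["translatedText"])
```

    Everything stays on the machine: no API key, no remote request, no third party seeing the text.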

    • KnilAdlez [none/use name]@hexbear.net
      5 months ago

      Exactly! PCs today are powerful enough to run them in decent time even without acceleration; dedicated hardware would just be more efficient, ultimately saving time and energy. I would be interested in seeing how much processing power is wasted calculating what are effectively edge cases in a model’s real workload. What percentage of GPT-4 queries could not be answered accurately by GPT-3 or a local LLaMA model? I’m willing to bet it’s less than 10%. Terawatt-hours and hundreds of gallons of water to run a model that, for 90% of users, could be run locally.
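      The split being described is essentially local-first routing: answer with the small on-device model when it is confident, and only fall back to the big remote model for the hard tail. A minimal sketch, where the models and the confidence score are hypothetical stand-ins, not any real API:

```python
def route_query(query, local_model, remote_model, threshold=0.8):
    """Local-first routing: use the small local model when its self-reported
    confidence clears the threshold, otherwise fall back to the remote model.
    Both models and the confidence score are hypothetical stand-ins."""
    answer, confidence = local_model(query)
    if confidence >= threshold:
        return answer, "local"
    return remote_model(query), "remote"

# Stub models purely for illustration
def tiny_local(q):
    # Pretend the local model is confident on simple requests like timers
    # and clock queries, and unsure about everything else.
    return ("It is 3 PM.", 0.95) if "time" in q else ("unsure", 0.2)

def big_remote(q):
    return "remote answer"

print(route_query("what time is it?", tiny_local, big_remote))
# ('It is 3 PM.', 'local')
print(route_query("summarize this legal brief", tiny_local, big_remote))
# ('remote answer', 'remote')
```

      Under the 90% figure above, only the second kind of query would ever leave the machine.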