Over the holidays, Alex Lieberman had an idea: What if he could create Spotify "Wrapped" for his text messages? Without writing a single line of code, Lieberman, a co-founder of the media outlet Morning Brew, created "iMessage Wrapped," a web app that analyzed statistical trends across nearly 1 million of his texts. One chart that he showed me compared his use of lol, haha, 😂, and lmao; he's an lol guy. Another listed people he had ghosted.
Lieberman did all of this using Claude Code, an AI tool made by the start-up Anthropic, he told me. In recent weeks, the tech world has gone wild over the bot. One executive used it to create a custom viewer for his MRI scan, while another had it analyze their DNA. The life optimizers have deployed Claude Code to collate information from disparate sources (email inboxes, text messages, calendars, to-do lists) into personalized daily briefs. Though Claude Code is technically an AI coding tool (hence its name), the bot can do all sorts of computer work: book theater tickets, process shopping returns, order DoorDash. People are using it to manage their personal finances, and to grow plants: With the right equipment, the bot can monitor soil moisture, leaf temperature, CO2, and more.
Some of these use cases likely require some preexisting technical know-how. (You can't just fire up Claude Code and expect it to grow you a tomato plant.) I don't have any professional programming experience myself, but as soon as I installed Claude Code last week, I was obsessed. Within minutes, I had created a new personal website without writing a single line of code. Later, I hooked the bot up to my email, where it summarized my unread emails, and sent messages on my behalf. For years, Silicon Valley has been promising (and critics have been fearing) powerful AI agents capable of automating many aspects of white-collar work. The progress has been underwhelming. Until now.
I'm generally very skeptical of "AI" shit. but I work at a tech company, which has recently mandated "AI agents are the future, we expect everyone to use them every day"
so I've started using Claude. partially out of self-preservation (since my company is handing out credentials, they are able to track everyone's usage, and I don't want to stick out by showing up at the very bottom of the usage metrics) and partially out of open-mindedness (I think LLMs are a pile of shit and very environmentally wasteful, but it's possible that I'm wrong and LLMs are useful but still very environmentally wasteful)
fwiw, I have a bunch of coworkers who are generally much more enthusiastic about LLMs than I am. and their consensus is that Claude Code is indeed the best of the available LLM tools. specifically they really like the new Opus 4.5 model. Opus 4.1 is total dogshit, apparently; no one uses it anymore. AFAIK Opus 4.2, 4.3, and 4.4 don't exist. version numbering is hard.
is Claude Code better than ChatGPT? yeah, sure. for one thing, it doesn't try to be a fucking all-purpose "chatbot". it isn't sycophantic in the same way. which is good, because if my job mandated me to use ChatGPT I'd quit, set fire to my work laptop, dump the ashes into the ocean, and then shoot the ocean with a gun.
I used Claude to write a one-off bash script that analyzed a big pile of JSON & YAML files. it did a pretty good job of it. I did get the overall task done more quickly, but I think a big part of that is that writing bash scripts of that level of complexity is really fucking annoying. when faced with a task where I have to do it, task avoidance kicks in and I'll procrastinate by doing something else.
importantly, the output of the script was a text file that I sent to one of my coworkers and said "here's that thing you wanted, review it and let me know if it makes sense". it wasn't mission critical at all. if they had responded that the text file was wrong, I could have told them "oh sorry, Claude totally fucked up" and poked at Claude to write a different script.
and at the same time… it still sucks. maybe these models are indeed getting "smarter", but people continue to overestimate their intelligence. it is still Dunning-Kruger As A Service.
this week we had what infosec people call an "oopsie" with some other code that Claude had written.
there was a pre-existing library that expected an authentication token to be provided as an environment variable (on its own, a fairly reasonable thing to do)
there was a web server that took HTTP requests, and the job Claude was given was to write code that would call this library in order to build a response to the request.
Claude, being very smart and very good at drawing a straight line between two points, wrote code that took the authentication token from the HTTP request header, modified the process's environment variables, then called the library
(98% of people have no idea what I just said, 2% of people have their jaws on the floor and are slowly backing away from their computer while making the sign of the cross)
for the uninitiated - a process's environment variables are global. and HTTP servers are famously pretty good at dealing with multiple requests at once. this means that user A and user B could make requests at the same time, and user A would end up seeing user B's data entirely by accident, without trying to hack or do anything malicious at all. and if user A refreshed the page they might see their own data, or they might see user C's data, entirely from luck of the draw.
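the shape of the bug is easy to reproduce. here's a minimal Python sketch (hypothetical variable names, not the actual code): two "request handlers" each stash their caller's token in an environment variable before reading it back, the way the library would. a barrier forces the overlap that a busy server produces naturally.

```python
import os
import threading

def handle_request(token, results, barrier):
    # Buggy pattern: stuff the caller's token into process-global state.
    # Environment variables are shared by ALL threads in the process.
    os.environ["AUTH_TOKEN"] = token
    barrier.wait()  # force the two "requests" to overlap
    # The "library" now reads the env var -- it may see the other user's token.
    results[token] = os.environ["AUTH_TOKEN"]

results = {}
barrier = threading.Barrier(2)
threads = [threading.Thread(target=handle_request, args=(t, results, barrier))
           for t in ("token-user-A", "token-user-B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both writes happen before either read, so both handlers see the same
# (last-written) token -- meaning at least one handler got the wrong one.
print(results)
```

user A's handler and user B's handler end up holding the same token, which is exactly the "refresh and see someone else's data" behavior described above.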
Claude, being very smart and very good at drawing a straight line between two points, wrote code that took the authentication token from the HTTP request header, modified the process's environment variables, then called the library
Brilliant, 10/10.
took the authentication token from the HTTP request header, modified the process's environment variables, then called the library
Not to defend claude or anything, but I had a junior do something extremely similar to this once. Lol
Yep, this is exactly how most people describe using an AI chat bot to write code.
It's a junior developer who can't learn.
That sounds so frustrating to me.
In all fairness, while this is a particularly bad case, the fact that it's often very difficult to safely fiddle with environment variables at runtime, but very convenient as a way to cram extra parameters into a library, has meant that a lot of human programmers who should know better have created problems like this too.
IIRC, setting the timezone for some of the POSIX time APIs on Linux has the same problem, and that's a system library. And IIRC SDL and some other graphics libraries, and some Linux 3D stuff, have used environment variables as a way to pass parameters out-of-band to libraries, which becomes a problem when programs start dicking with them at runtime. I remember reading an article from someone who had been banging into this with Linux gaming, about how various game programs and libraries would call setenv() to fiddle with them, and races associated with that were responsible for a substantial number of crashes that they'd seen. setenv() is not thread-safe or signal-safe. In general, reading environment variables in a program is fine, but messing with them in very many situations is not.

*searches*
Yeah, the first thing I see is someone talking about how its lack of thread-safety is a problem for TZ, which is the time thing that's been a pain for me a couple times in the past.
https://news.ycombinator.com/item?id=38342642
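the TZ case is easy to see for yourself. a quick Python sketch (Unix-only, since time.tzset() isn't available on Windows): changing the TZ environment variable changes what every localtime() call in the whole process returns from then on.

```python
import os
import time

# TZ is process-global state. Rendering the same instant (the Unix epoch)
# under two different TZ values gives two different wall-clock hours.
os.environ["TZ"] = "UTC"
time.tzset()
print(time.localtime(0).tm_hour)  # -> 0: the epoch is midnight UTC

os.environ["TZ"] = "America/New_York"
time.tzset()
print(time.localtime(0).tm_hour)  # -> 19: same instant, Dec 31 1969, 7 p.m. EST
```

now imagine two threads doing that concurrently, each expecting "its" timezone to stick, and you have the same race as the auth-token bug.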
Back on your issue:
Claude, being very smart and very good at drawing a straight line between two points, wrote code that took the authentication token from the HTTP request header, modified the process's environment variables, then called the library
for the uninitiated - a process's environment variables are global. and HTTP servers are famously pretty good at dealing with multiple requests at once.
Note also that a number of webservers used to fork to handle requests (and I'm sure that there are still some now that do so, though it's certainly not the highest-performance way to do things), and in that situation, this code could avoid problems.
*searches*
It sounds like Apache used to and apparently still can do this:
https://old.reddit.com/r/PHP/comments/102vqa2/why_does_apache_spew_a_new_process_for_each/
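the fork case is also easy to demonstrate. a minimal Python sketch (Unix-only, uses os.fork()): each forked child gets its own copy of the environment, so a per-request worker mutating it never leaks into the parent or into other requests.

```python
import os

os.environ["AUTH_TOKEN"] = "parent-token"

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child (the per-request worker): this mutation only touches the
    # child's own copy of the environment, not the parent's.
    os.environ["AUTH_TOKEN"] = "request-token"
    os.write(w, os.environ["AUTH_TOKEN"].encode())
    os._exit(0)

os.waitpid(pid, 0)
child_saw = os.read(r, 64).decode()
print(child_saw)                  # the child saw its own token
print(os.environ["AUTH_TOKEN"])  # parent's value is unchanged
```

which is why the env-var-per-request pattern can limp along unnoticed under a forking server and then explode the moment someone switches to threads.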
But it does highlight one of the "LLMs don't have a broad, deep understanding of the world, and that creates problems for coding" issues that people have talked about. Like, part of what someone is doing when writing software is identifying situations where behavior isn't defined and clarifying that, either by asking for requirements to be updated or by looking out-of-band to understand what's appropriate. An LLM that works by looking at what's commonly done in its training set just isn't in a good place to do that, and that's kinda a fundamental limitation.
I'm pretty sure that the general case of writing software is AI-hard, where the "AI" referred to by the term is an artificial general intelligence that incorporates a lot of knowledge about the world. That is, you can probably make an AI that writes software, but it won't be just an LLM of the "generative AI" sort of thing that we have now.
There might be ways that you could incorporate an LLM into a system that can write software. But I don't think that it's just going to be a raw "rely on an LLM taking in a human-language set of requirements and spitting out code". There are just things that that can't handle reasonably.
This is interesting, but I wonder how he verified the data it was spitting out if he doesn't know how to code?
He doesn't need to understand it. Claude understands it.
Hail, all-knowing Claude
Oh my god … they fixed Landru?
If he doesn't care or need to verify it, then it doesn't really matter.
These tools are great at creating demoable MVPs. They're terrible at creating maintainable codebases, and cannot be relied on to generate correct code. But if all you need is a demo or MVP, then it's likely you don't care, and that's often the case for personal tools that non-coders want to use.
The people using it to manage their personal finances are nuts though.
Ah yeah, I'm with you. I actually think LLMs are a useful tool for that initial push: a search query, a rough draft (or demo). But I'm not convinced they could ever move beyond that, since creating rigid, reliable structure isn't what they're designed to do.
You can't fully verify it, but Claude is somewhat chatty. It'll output its whole "thought process", which can be reviewed. I recently had Claude write some C# analyzers for me, which I don't quite know how to write from scratch. I can easily review its reasoning and correct it if it makes a mistake. It'll say something like "Oh, I need to change X or Y" and you can then tell it it's an idiot and correct it.
It's by no means perfect and it does need a good reviewer though. I've seen it just "give up" fixing a test, subsequently deleting the test entirely. If you're a good code reviewer, you can probably fairly effectively use these tools.