Also just because the code works, doesn’t mean it’s good code.
I’ve had to review code the other day which was clearly created by an LLM. Two classes needed to talk to each other in a bit of a complex way. So I would expect one class to create some kind of request data object, submit it to the other class, which then returns some kind of response data object.
What the LLM actually did was pretty shocking, it used reflection to get access from one class to the private properties with the data required inside the other class. It then just straight up stole the data and did the work itself (wrongly as well I might add). I just about fell of my chair when I saw this.
So I asked the dev, he said he didn’t fully understand what the LLM did, he wasn’t familiar with reflection. But since it seemed to work in the few tests he did and the unit tests the LLM generated passed, he thought it would be fine.
Also the unit tests were wrong, I explained to the dev that usually with humans it’s a bad idea to have the person who wrote the code also (exclusively) write the unit tests. Whenever possible have somebody else write the unit tests, so they don’t have the same assumptions and blind spots. With LLMs this is doubly true, it will just straight up lie in the unit tests. If they aren’t complete nonsense to begin with.
I swear to the gods, LLMs don’t save time or money, they just give the illusion they do. Some task of a few hours will take 20 min and everyone claps. But then another task takes twice as long and we just don’t look at that. And the quality suffers a lot, without anyone really noticing.
Great description of a problem I noticed with most LLM generated code of any decent complexity. It will look fantastic at first but you will be truly up shit creek by the time you realise it didn’t generate a paddle.
It says it will finish the code, it doesn’t say the code will work.
Also just because the code works, doesn’t mean it’s good code.
I’ve had to review code the other day which was clearly created by an LLM. Two classes needed to talk to each other in a bit of a complex way. So I would expect one class to create some kind of request data object, submit it to the other class, which then returns some kind of response data object.
What the LLM actually did was pretty shocking, it used reflection to get access from one class to the private properties with the data required inside the other class. It then just straight up stole the data and did the work itself (wrongly as well I might add). I just about fell of my chair when I saw this.
So I asked the dev, he said he didn’t fully understand what the LLM did, he wasn’t familiar with reflection. But since it seemed to work in the few tests he did and the unit tests the LLM generated passed, he thought it would be fine.
Also the unit tests were wrong, I explained to the dev that usually with humans it’s a bad idea to have the person who wrote the code also (exclusively) write the unit tests. Whenever possible have somebody else write the unit tests, so they don’t have the same assumptions and blind spots. With LLMs this is doubly true, it will just straight up lie in the unit tests. If they aren’t complete nonsense to begin with.
I swear to the gods, LLMs don’t save time or money, they just give the illusion they do. Some task of a few hours will take 20 min and everyone claps. But then another task takes twice as long and we just don’t look at that. And the quality suffers a lot, without anyone really noticing.
Great description of a problem I noticed with most LLM generated code of any decent complexity. It will look fantastic at first but you will be truly up shit creek by the time you realise it didn’t generate a paddle.
I was going to say. The code won’t compile but it will be “finished “
A couple agent iterations will compile. Definitely won’t do what you wanted though, and if it does it will be the dumbest way possible.
Yeah you can definitely bully AI into giving you some thing that will run if you yell at it long enough. I don’t have that kind of patience
Edit: typically I see it just silently dump errors to /dev/null if you complain about it not working lol
And people say that AI isn’t humanlike. That’s peak human behavior right there, having to bother someone out of procrastination mode.
The edit makes it even better, swiping things under the rug? Hell yeah!