This is the technology worth trillions of dollars huh
Blows my mind people pay money for wrong answers.
I get the sentiment behind this post, and it's almost always funny when LLMs are such dumbasses. But this is not a good argument against the technology. It's akin to a climate change denier arguing: "look! It snowed today, climate change is so dumb, huh?"
AI writes code for me. It makes dumbass mistakes that compilers automatically catch. It takes three or four rounds to correct a lot of random problems that crop up. Above all else, it’s got limited capacity - projects beyond a couple thousand lines of code have to be carefully structured and spoonfed to it - a lot like working with junior developers. However: it’s significantly faster than Googling for the information needed to write the code like I have been doing for the last 20 years, it does produce good sample code (if you give it good prompts), and it’s way less frustrating and slow to work with than a room full of junior developers.
That's not to say we fire the junior developers, just that the specializations they'll be learning will probably be very different from the ones I was learning 20 years ago, just as those were very different from the ones programmers learned 40 and 60 years ago.
Listen, we just have to boil the ocean five more times.
Then it will hallucinate slightly less.
Or more. There’s no way to be sure since it’s probabilistic.
If you want to get irate about energy usage, shut off your HVAC and open the windows.
“This is the technology worth trillions of dollars”
You can make anything fly high in the sky with enough helium, just not for long.
(Welcome to the present day Tech Stock Market)
Bubbles and crashes aren’t a bug in the financial markets, they’re a feature. There are whole legions of investors and analysts who depend on them. Also, they have been a feature of financial markets since anything resembling a financial market was invented.
Well, for anyone who knows a bit about how LLMs work, it's pretty obvious why they struggle to identify the letters in words
Well go on…
They don't look at it letter by letter but in tokens, which are generated automatically based on how often sequences occur. So while 'z' could be its own token, 'ne' or even 'the' could be treated as a single token vector. Of course, 'e' would still be a separate token when it occurs in isolation, and you could even have 'le' and 'let' as separate tokens, afaik. And each token is just a vector of numbers, like 300 or 1000 numbers, that represents that token in a vector space. So 'de' and 'e' could be completely different and dissimilar vectors.
so ‘delaware’ could look to an llm more like de-la-w-are or similar.
of course you could train it to figure out letter counts based on those tokens with a lot of training data, though that could lower performance on other tasks, and counting letters just isn't that important, i guess, compared to other stuff
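You can actually see the splits for yourself with OpenAI's tiktoken library, if you have it installed. A quick sketch; the exact pieces depend on which vocabulary you load, so don't take them as gospel:

```python
# pip install tiktoken -- shows how words get chopped into tokens.
# Exact splits depend on the vocabulary; purely illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["Delaware", "Connecticut", "the"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", pieces)

# The model only ever sees the ids, not the letters inside each piece,
# so "does this word contain a d?" isn't something it can read off its input.
```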
Good read. Thank you
Of course, when the question asks “contains the letter _” you might think an intelligent algorithm would get off its tokens and do a little letter by letter analysis. Related: ChatGPT is really bad at chess, but there are plenty of algorithms that are super-human good at it.
Con-ned-di-cut
Wouldn't that only explain errors of omission? If you ask for a letter, say D, it would miss words where that letter sits inside a bigger token together with other letters, like 'Da' or 'De', but how would it return a word where the letter D isn't even present?
We’re turfing out students by the tens on academic misconduct. They are handing in papers with references that clearly state “generated by Chat GPT”. Lazy idiots.
This is why invisible watermarking of AI-generated content is likely to be so effective. Even primitive watermarks like file metadata. It’s not hard for anyone with technical knowledge to remove, but the thing with AI-generated content is that anyone who dishonestly uses it when they are not supposed to is probably also too lazy to go through the motions of removing the watermarking.
Couldn't students just generate a paper with ChatGPT, open two windows side by side, and then type it out in a word document?
but that’s work.
Students view doing that as basically the same amount of work as writing the paper yourself
Depends on the watermarking method. Some people talk about watermarking by subtly adjusting word choice: if there are 5 synonyms, you pick the 1st one, and for the next word you pick the 3rd one. To check the watermark, you need access to the model and its probabilities to see if the text matches that pattern. The tricky part is that the model can change, and so can the probabilities, and other things I don't fully understand.
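For the curious, here's a toy sketch of that idea (roughly the "green list" scheme from the Kirchenbauer et al. watermarking paper). Everything here is made up for illustration; real schemes bias the model's token probabilities, not a hand-written word list:

```python
# Toy watermark: the previous word pseudorandomly splits the vocabulary in
# half, and a watermarking generator prefers words from the "green" half.
# A detector then just counts how often that preference shows up.
import hashlib

VOCAB = ["good", "great", "fine", "nice", "solid", "decent"]  # stand-in vocab

def green_set(prev_word: str) -> set[str]:
    # Seed a roughly 50/50 split of the vocabulary from the previous word.
    h = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16)
    return {w for i, w in enumerate(VOCAB) if (h >> i) & 1}

def green_fraction(words: list[str]) -> float:
    # ~0.5 for normal text, noticeably higher for watermarked text.
    hits = sum(w in green_set(prev) for prev, w in zip(words, words[1:]))
    return hits / max(len(words) - 1, 1)
```

And yes, this breaks as soon as the model or its probabilities change, which is exactly the tricky part mentioned above.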
if you are going to do all that, just do the research and learn something.
Aye that’s exactly the same thing that I said
Huh that actually does sound like a good use-case of LLMs. Making it easier to weed out cheaters.
Connedicut.
Close. We natives pronounce it ‘kuh ned eh kit’
So does everyone else
Hey look the markov chain showed its biggest weakness (the markov chain)!
Judging by the output, in the training data Connecticut usually follows Colorado in lists of two or more states containing Colorado. There's no other reason for this to occur, as far as I know.
Markov-chain-based LLMs (I think that's all of them?) are dice-roll systems constrained to probability maps.
Edit: just to add, because I don't want anyone crawling up my butt about the oversimplification: yes, I know, that's not how they work. But when simplified to words so simple a child could understand them, it's pretty close.
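In the spirit of that oversimplification, the dice-roll loop looks something like this toy bigram chain (mini corpus obviously made up):

```python
# Toy bigram "Markov chain": build a probability map of what follows what,
# then roll dice to walk it. Real LLMs condition on long contexts with a
# neural net, but the sampling loop is the same shape.
import random
from collections import defaultdict

corpus = "colorado connecticut delaware colorado connecticut florida".split()

follows = defaultdict(list)          # word -> words seen to follow it
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

word, out = "colorado", ["colorado"]
for _ in range(4):
    if word not in follows:
        break
    word = random.choice(follows[word])  # the dice roll
    out.append(word)
print(" ".join(out))  # "connecticut" will usually follow "colorado" here
```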
I was wondering if you’d get similar results for states with the letter R, since there’s lots of prior art mentioning these states as either “D” or “R” during elections.
Oh, I was thinking it's because people pronounce it Connedicut
Aww, cute!
Just another trillion, bro.
Just another 1.21 jigawatts of electricity, bro. If we get this new coal plant up and running, it’ll be enough.
Behold the most expensive money burner!
Yesterday I asked Claude Sonnet what was on my calendar (since they'd just sent a pop-up announcing that feature)
It listed my work meetings on Sunday, so I tried to correct it…
You’re absolutely right - I made an error! September 15th is a Sunday, not a weekend day as I implied. Let me correct that: This Week’s Remaining Schedule: Sunday, September 15
Just today, when I asked what's on my calendar, it gave me today and my meetings on the next two Thursdays. Not the meetings in between, just Thursdays.
Something is off in AI land.
Edit: I asked again: it gave me meetings for Thursdays again. Plus it might think I'm driving in F1
Also, Sunday September 15th is a Monday… I’ve seen so many meeting invites with dates and days that don’t match lately…
Yeah, it said Sunday, I asked if it was sure, then it said I’m right and went back to Sunday.
I assume the training data has the model think it’s a different year or something, but this feature is straight up not working at all for me. I don’t know if they actually tested this at all.
Sonnet seems to have gotten stupider somehow.
Opus isn’t following instructions lately either.
A few weeks ago my Pixel wished me a Happy Birthday when I woke up, and it definitely was not my birthday. Google is definitely letting a shitty LLM write code for it now, but the important thing is they’re bypassing human validation.
Stupid. Just stupid.
pixel?
have you heard ~about grapheneOS tho…~
So the Dakotas get a pass
And Idaho
Connecticut do have a D in it: mine.
ChatGPT is just as stupid.
it’s actually getting dumber.
You joke, but I bet you didn’t know that Connecticut contained a “d”
I wonder what other words contain letters we don’t know about.
The d in Connecticut is between the e and the i. They don’t connect because it was cut.
Connecticut is Jewish?
The famous ‘invisible D’ of Connecticut, my favorite SCP.
That actually sounds like a fun SCP: a word that doesn't seem to contain a letter, but when you test for that letter with an algorithm that checks only for its presence, it reports the letter is indeed there. Any attempt to check where in the word the letter is, or to list all the letters in the word, spuriously fails. Containment could be fun, probably involving amnestics and widespread societal influence. I also wonder if they could devise a letter-presence check that can be performed by hand without leaking any other information to the person performing it, reproducing the anomaly without computers.
ct -> d is a not-uncommon OCR fuckup. Maybe that's the source of its garbage data?
No, LLMs produce the most statistically likely (in their training data) token to follow a certain list of tokens (there’s nothing remotely resembling reasoning going on in there, it’s pure hard statistics, with some error and randomness thrown in), and there are probably a lot more lists where Colorado is followed by Connecticut than ones where it’s followed by Delaware, so they’re obviously going to be more likely to produce the former.
Moreover, there aren't going to be many texts spelling out the letters of state names (maybe transcripts of spelling bees?), so that information is unlikely to be in their training data, and they can't extrapolate, because that's not really something they do, and because they use words or parts of words as tokens, not letters, so they literally have no way of listing the letters of a word if that list is not in their training data (and, again, that's not something we tend to write down, and if we did, we wouldn't include a d in Connecticut even if we were reading a misprint). Same with counting how many letters a word has, and stuff like that.
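To put the "most likely token plus some randomness" part in concrete terms, a minimal sketch (all scores invented):

```python
# Turn raw scores into a probability map, then roll dice. That's the whole
# "reasoning": pick the next token in proportion to how often similar
# continuations appeared in training.
import math
import random

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

candidates = ["Connecticut", "Delaware", "California"]  # after "... Colorado,"
scores = [3.0, 1.5, 2.0]                                # invented numbers

probs = softmax(scores)
pick = random.choices(candidates, weights=probs, k=1)[0]
print([f"{c}: {p:.2f}" for c, p in zip(candidates, probs)], "->", pick)
```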
Words are full of mystery! Besides the invisible D, Connecticut has that inaudible C…
SCP-00WTFDoC (lovingly called “where’s the fucking D of Connecticut” by the foundation workers, also “what the fuck, doc?”)
People think it's safe because it's "just an invisible D", not even a dick, just the letter D, and it only manifests when someone tries to say "Connecticut" or write it down. Then, when you least expect it, everyone hears "Donnedtidut", everyone reads that thing, and a portal to that fucking place opens and drags you in.
Every American I know does pronounce it like Connedicut 🤔
Really? Everyone I know calls it kinetic-cut. But I grew up in New England.
"Kinetic" with a hard "T", like a posh Brit saying it to the queen? Everyone I've ever heard speaking US English pronounces it with a rolled "t", like "kinedic", so the alternate pronunciation still reads like it'd have a "d" sound
This phenomenon is called "T flapping" and it's common in North American English. I got into an argument with my dad, who insisted he pronounces the T's in 'butter' when his dialect, like nearly all North American dialects, pronounces the word as 'budder'.
budder is softer than t flapping. further forward with the tongue on the palate.
It’s an approximation, but the t is partially vocalized giving it a ‘d’ sound even if it’s not made exactly the same way.
i just thought we were getting technical about the linguistics. i got and use both words frequently, thought the distinction might be appreciated. the difference is so subtle we sometimes have to ask each other which one we’re referring to. i’m willing to bet it shows up more on my face than in my voice.
That's how I've always heard it pronounced on the rare occasions anybody ever mentions it. But I've never been to that part of the US, so maybe the accent's different there?
Connedicut
I was going to make a joke that if you're from Connedicut you never pronounce the first d in the word. Conne-icut
The letters that make up words are a common blind spot for AIs: since they're trained on strings of tokens (roughly words), they don't have a good concept of which letters are inside those words or what order they're in.
I find it bizarre that people find these obvious cases to prove the tech is worthless. Like saying cars are worthless because they can’t go under water.
This reaction is because conmen are claiming that current generations of LLM technology are going to remove our need for experts and scientists.
We're not demanding submersible cars, we're just laughing at the people paying top dollar for the latest electric car while planning an ocean cruise.
I’m confident that there’s going to be a great deal of broken… everything…built with AI “assistance” during the next decade.
Not bizarre at all.
The point isn’t “they can’t do word games therefore they’re useless”, it’s “if this thing is so easily tripped up on the most trivial shit that a 6-year-old can figure out, don’t be going round claiming it has PhD level expertise”, or even “don’t be feeding its unreliable bullshit to me at the top of every search result”.
A six-year-old can read and write Arabic, Chinese, Ge'ez, etc., and yet most people with PhD-level expertise probably can't, and it's probably useless to them. LLMs can do this too. You can count the number of letters in a word, but so can a program written in a few hundred bytes of assembly. It's completely pointless to make LLMs do that; it'd just make them way less efficient than they need to be while adding nothing useful.
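For scale, the letter check in question is a one-liner in Python:

```python
# The boring, reliable way to do what the LLM fumbles.
print("d" in "Connecticut".lower())       # False
print("Connecticut".lower().count("d"))   # 0
```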
LOL, it seems like every time I get into a discussion with an AI evangelical, they invariably end up asking me to accept some really poor analogy that, much like an LLM’s output, looks superficially clever at first glance but doesn’t stand up to the slightest bit of scrutiny.
it's more that the only way to get some anti-AI crusader to accept that there are some uses for it is to put it in an analogy that they have to actually process, rather than spitting out an "ai bad" kneejerk.
I'm probably far more anti-AI than average; for 95% of what it's pushed for it's completely useless, but that still leaves 5% that it's genuinely useful for, which some people refuse to accept.
I feel this. In my line of work I really don't like using them for much of anything (programming of course, like 80% of Lemmy users) because they get details wrong too often to be useful and I don't like babysitting.
But when I need a logging message, or to return an error, it's genuinely a time saver. It's good at pretty much that 5%, as you say.
But using it for art, math, problem solving, any of that kind of stuff that gets touted around by the business people? Useless, just fully fuckin useless.
It’s amazing that if you acknowledge that:
- AI has some utility, and
- the (now tiresome and sloppy) tests they're using don't negate #1,
you are now an AI evangelist. Just as importantly, #1 doesn't justify the level of investment in AI. And when that realization hits business America, a correction will happen, and the people affected won't be the well-off but the average worker. The gains are for the few, the loss for the many.
it's more that the only way to get some anti-AI crusader to accept that there are some uses for it
Name three.
I’m going to limit to LLMs as that’s the generally accepted term and there’s so many uses for AI in other fields that it’d be unfair.
- Translation. LLMs are pretty much perfect for this (quick sketch below).
- Triaging support issues. They're useless for coming up with solutions, but they're as good as humans at sending people to the correct department, without the wait.
- Finding and fixing grammar issues. Spelling can be caught by spell checkers, but grammar is more context-aware, another thing LLMs are pretty much designed for, and useful for people writing in a second language.
- Finding starting points to research deeper. LLMs have a lot of data about a lot of things, so they can be very useful for surface-level information, e.g. about areas in a city you're visiting, explaining concepts in simple terms, etc.
- Recipes. LLMs are great at saying what sounds right, so for cooking (not so much baking, but it may work) they're great at spitting out recipes, including substitutions if needed, that go together, without you needing to read through how someone's grandmother used to do xyz unrelated nonsense.

There's a bunch more, but these were the first five that sprung to mind.
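As a concrete sketch of the translation one, assuming the OpenAI Python client with an API key in OPENAI_API_KEY (the model name is just an example; any chat-tuned model works):

```python
# pip install openai -- translate a sentence with a chat model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": "Translate the user's text into English."},
        {"role": "user", "content": "Connecticut enthält übrigens kein D."},
    ],
)
print(resp.choices[0].message.content)
```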
So if the AI can't do it, then that's just proof that the AI is too smart to be able to do it? That's your argument, is it? Nah, it's just crap.
You think that just because you attached it to an analogy, it makes sense. That's not how it works. Look, I can do it too:
My car is way too technologically sophisticated to be able to fly, therefore AI doesn't need to be able to work out how many Rs are in "strawberry".
See how that made literally no sense whatsoever?
Except you’re expecting it to do everything. Your car is too “technically advanced” to walk on the sidewalk, but wait, you can do that anyway and don’t need to reinvent your legs
I don’t want to defend ai again, but it’s a technology, it can do some things and can’t do others. By now this should be obvious to everyone. Except to the people that believe everything commercials tell them.
358 instances (so far) of lawyers in Australia using AI evidence which “hallucinated”.
And this week one was finally punished.
Ok? So what you're saying is that some lawyers are idiots. I could have told you that before AI existed.
It's not the AIs that are crap, it's what they've been sold as capable of doing, and the reliability of their results, that's massively disconnected from reality.
The crap is what most of the Tech Investor class has pushed to the public about AI.
It's thus not at all surprising that many who work, or manage work, in areas where precision and correctness are essential have been deceived into thinking AI can do much of the work for them, when it turns out AI can't really do it, because of those precision and correctness requirements that it simply cannot meet.
This will hit people who are not Tech experts, such as Lawyers, hardest, but even some supposed Tech experts (such as some programmers) have been swindled in this way.
There are many great uses for AI, especially stuff other than LLMs, in areas where false positives or false negatives are no big deal, but that's not where the Make Money Fast slimy salesmen are pushing it.
I think people today, after a year of experience with AI, know its capabilities reasonably well. My mother is 73, and it's been a while since she stopped joking about the silly or wrong things AI writes to her, so people using computers at their jobs should be much more aware.
I agree that LLMs are good at some things. They're great tools for what they can do. Let's use them for those things! I mean, even programming has benefitted a lot from this, especially in education, junior-level stuff, prototyping, …
When using any product, a certain responsibility falls on the user. You can’t blame technology for what stupid users do.
How many people do you think know that AIs are "trained on tokens", and understand what that means? It's clearly not obvious to those who don't, which is roughly everyone.
You don’t have to know about tokens to see what ai can and cannot do.
Go to an art museum and somebody will say ‘my 6 year old can make this too’, in my view this is a similar fallacy.
That makes no sense. That has nothing to do with it. What are you on about?
That’s like watching tv and not knowing how it works. You still know what to get out of it.
Then why is Google using it for questions like that?
Surely it should be advanced enough to recognise its weakness with this kind of question and just not give an answer.
They are using it for every question. It’s pointless. The only reason they are doing it is to blow up their numbers.
… they're trying to stay out in front, so that some future AI search doesn't capture their market share. It's a defensive move, even if it's not working for all types of questions.
The only reason they are doing it is to blow up their numbers.
Ding ding ding.
It’s so they can have impressive metrics for shareholders.
“Our AI had n interactions this quarter! Look at that engagement!”, with no thought put into what user problems it solves.
It’s the same as web results in the Windows start menu. “Hey shareholders, Bing received n interactions through the start menu, isn’t that great? Look at that engagement!”, completely obfuscating that most of the people who clicked are probably confused elderly users who clicked on a web result without realising.
Line on chart must go up!
Yeah, but … they also can’t just do nothing and possibly miss out on something. Especially if they already invested a lot.
Well it also can’t code very well either
Removed by mod
I feel like that was supposed to be an insult but because it made literally no sense whatsoever, I really can’t tell.
No, not really, just an observation. It literally said you're a boring person. Not sure what's not to get.
Bye.
You need to get back on the dried frog pills.
Understanding the bounds of a technology makes it easier for people to gauge its utility. The only people who desire ignorance are those who profit from it.
Sure. But you can literally test almost all frontier models for free. It's not like there's some conspiracy or secret. Even my 73-year-old mother uses it and knows its general limits.
Saying “it’s worth trillions of dollars huh” isn’t really promoting that attitude.