They don’t look at it letter by letter but in tokens, which are generated automatically based on how frequently character sequences occur in the training data. So while ‘z’ could be its own token, ‘ne’ or even ‘the’ could be treated as a single token vector. Of course, ‘e’ would still be a separate token when it occurs in isolation. You could even have ‘le’ and ‘let’ as separate tokens, afaik. And each token is just a vector of numbers, like 300 or 1,000 numbers that represent that token in a vector space. So ‘de’ and ‘e’ could be completely different and dissimilar vectors.
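To make the “dissimilar vectors” point concrete, here’s a toy sketch with made-up 3-dimensional embeddings (real models use hundreds or thousands of dimensions, and every number here is invented for illustration):

```python
import numpy as np

# Toy, invented embeddings; real vectors come from a trained model.
vec_de = np.array([0.9, -0.2, 0.4])   # hypothetical vector for token 'de'
vec_e  = np.array([-0.5, 0.8, 0.1])   # hypothetical vector for token 'e'

# Cosine similarity: 1.0 means identical direction, 0 means unrelated.
cosine = vec_de @ vec_e / (np.linalg.norm(vec_de) * np.linalg.norm(vec_e))
print(cosine)  # ~ -0.6 here: 'de' and 'e' need not be similar at all
```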
So ‘delaware’ could look to an LLM more like de-la-w-are or similar.
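You can inspect a real tokenizer’s split yourself; here’s a minimal sketch using OpenAI’s open-source tiktoken library (the actual split depends on the vocabulary, so the de-la-w-are split above is just a hypothetical):

```python
import tiktoken  # pip install tiktoken

# Load the BPE vocabulary used by some OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

# Encode the word into token IDs, then decode each ID back
# to its text to see how the word was actually split.
token_ids = enc.encode("delaware")
pieces = [enc.decode([t]) for t in token_ids]
print(token_ids, pieces)  # a few multi-letter chunks, not 8 single letters
```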
Of course, you could train it to figure out letter counts from those tokens with a lot of training data, though that could lower performance on other tasks, and counting letters just isn’t that important, I guess, compared to other stuff.
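A sketch of what generating that kind of training data might look like (the question/answer format and word list here are assumptions purely for illustration):

```python
import random
import string

# Hypothetical word list; a real dataset would be far larger.
words = ["delaware", "connecticut", "mississippi", "ohio"]

def make_example() -> tuple[str, str]:
    """Produce one synthetic (prompt, answer) letter-counting pair."""
    word = random.choice(words)
    letter = random.choice(string.ascii_lowercase)
    prompt = f"How many times does '{letter}' appear in '{word}'?"
    answer = str(word.count(letter))
    return prompt, answer

for _ in range(3):
    print(make_example())
```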
Wouldn’t that only explain errors of omission? If you ask for a letter, let’s say D, it would omit words where that letter only appears inside a multi-letter token like ‘Da’ or ‘De’, but how would it return a word where the letter D isn’t even present?
Of course, when the question asks “contains the letter _”, you might think an intelligent algorithm would get off its tokens and do a little letter-by-letter analysis. Related: ChatGPT is really bad at chess, but there are plenty of algorithms that are superhuman good at it.
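That letter-by-letter analysis is trivial outside the model; a minimal sketch of the kind of deterministic check a tool-using model could call:

```python
def contains_letter(word: str, letter: str) -> bool:
    # Deterministic, character-by-character check; no tokens involved.
    return letter.lower() in word.lower()

states = ["Delaware", "Connecticut", "Ohio", "Texas"]
print([s for s in states if contains_letter(s, "d")])  # ['Delaware']
```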
Well go on…
Good read. Thank you
Con-ned-di-cut