@nednobbins

nednobbins@lemmy.zip · 19 minutes ago

That’s a reasonable critique.

The point is that it’s trivial to come up with new words. Put that same prompt into a bunch of different LLMs and you’ll get a bunch of different words. Some of them may exist somewhere that don’t exist. There are simple rules for combining words that are so simple that children play them as games.

The LLM doesn’t actually even recognize “words” it recognizes tokens which are typically parts of words. It usually avoids random combinations of those but you can easily get it to do so, if you want.

nednobbins@lemmy.zip · 5 hours ago

That’s possible but it’s not what the authors found.

They spend a fair amount of the conclusion emphasizing how exploratory and ambiguous their findings are. The researchers themselves are very careful to point out that this is not a smoking gun.

nednobbins@lemmy.zip · 7 hours ago

I’d say there are two issues with it.

FIrst, it’s a very new article with only 3 citations. The authors seem like serious researchers but the paper itself is still in the, “hot off the presses” stage and wouldn’t qualify as “proven” yet.

It also doesn’t exactly say that books are copies. It says that in some models, it’s possible to extract some portions of some texts. They cite “1984” and “Harry Potter” as two books that can be extracted almost entirely, under some circumstances. They also find that, in general, extraction rates are below 1%.

nednobbins@lemmy.zip · 7 hours ago

Sort of.

If you violated laws in obtaining the book (eg stole or downloaded it without permission) it’s illegal and you’ve already violated the law, no matter what you do after that.

If you obtain the book legally you can do whatever you want with that book, by the first sale doctrine. If you want to redistribute the book, you need the proper license. You don’t need any licensing to create a derivative work. That work has to be “sufficiently transformed” in order to pass.

nednobbins@lemmy.zip · 7 hours ago

That’s my understanding too. If you obtained them legally, you can use them the same way anyone else who obtained them legally could use them.

nednobbins@lemmy.zip · 7 hours ago

prompt: Make up a word that is not found anywhere on the internet. If I do a search for that word, it should not exist. Provide example usage.

Qwen3-235B-A22B:

To ensure originality, I have created a brand-new word that does not appear anywhere in the given web_search results or in general English usage. Virethra (pronounced: vye-RETH-ruh)

Part of speech: noun

Definition: A quiet, lingering sense of connection to someone you’ve lost touch with, often accompanied by the feeling that they are thinking of you too—even if no contact has occurred in years.

Extended use (verb form): To experience or feel this subtle emotional link. Example Sentences:

Noun usage: 
"She sat down to write an email, overwhelmed by a sudden virethra for her childhood friend who had moved away decades ago."  

Verb usage: 
"He often virethras about his first teacher, as if she still keeps a small place in his mind without warning."  

Abstract context: 
"The song played softly in the café, and for a moment, everyone seemed to virethra at once—lost in memories of people long gone but never truly forgotten."

nednobbins@lemmy.zip · 7 hours ago

They seem pretty different to me.

Video compression developers go through a lot of effort to make them deterministic. We don’t necessarily care that a particular video stream compresses to a particular bit sequence but we very much care that the resulting decompression gets you as close to the original as possible.

AIs will rarely produce exact replicas of anything. They synthesize outputs from heterogeneous training data. That sounds like learning to me.

The one area where there’s some similarity is dimensionality reduction. Its technically a form of compression, since it makes your files smaller. It would also be an extremely expensive way to get extremely bad compression. It would take orders of magnitude more hardware resources and the images are likely to be unrecognizable.

nednobbins@lemmy.zip · 8 hours ago

If you want to go to the extreme: delete first copy.

You can; as I understand it, the only legal requirement is that you only use one copy at a time.

ie. I can give my book to a friend after I’m done reading it; I can make a copy of a book and keep them at home and at the office and switch off between reading them; I’m not allowed to make a copy of the book hand one to a friend and then both of us read it at the same time.

nednobbins@lemmy.zip · 8 hours ago

That’s not what it says.

Neither you nor an AI is allowed to take a book without authorization; that includes downloading and stealing it. That has nothing to do with plagiarism; it’s just theft.

Assuming that the book has been legally obtained, both you and an AI are allowed to read that book, learn from it, and use the knowledge you obtained.

Both you and the AI need to follow existing copyright laws and licensing when it comes to redistributing that work.

“Plagiarism” is the act of claiming someone else’s work as your own and it’s orthogonal to the use of AI. If you ask either a human or an AI to produce an essay on the philosophy surrounding suicide, you’re fairly likely to include some Shakespeare quotes. It’s only plagiarism if you or the AI fail to provide attribution.