While I understand their position, I disagree with it.
Training AI on copyrighted data - let’s take music for example - is no different to a kid at home listening to Beatles songs all day and using that as inspiration while learning how to write songs or play an instrument.
You can’t copyright a style of music, a sound, or a song structure. As long as the AI isn’t just reproducing the copyrighted content “word for word”, I don’t see what the issue is.
Does the Studio Ghibli artist own that style of drawing? No, because you can’t own something like that. Others are free to draw whatever they want while replicating that style.
Exactly. I’m a data engineer, and people in this thread have no clue what they’re talking about.
If we require copyright clearance for transformative work, that would mean trillions lost in growth. It’s just something that can’t happen, no matter how much we might want it. Most people are not even aware of the implications such copyright overreach would have.
So do you target AI training explicitly? How can that even be enforced? Is my review sentiment evaluation machine illegal now? What if I RAG copyrighted content in, am I in jail now? How could this possibly ever be enforced? It’s so stupid.
This issue is dominated by tech-illiterate people who just want to be angry at corporations, but instead of doing something about it they fall for copyright propaganda.
If we don’t know how to control our emotions, they will lead us to make bad decisions. That emotion will only be temporary, but the decision will be permanent, and we’ll regret it later.
No. Same rules as everyone else.
Disclosure of training sources
If your sources are copyrighted, yes.
Unlikely. Non-payment of restitution in a civil case could end in jail via contempt of court.
The same way other copyright claims are enforced.
Literacy in technology has no effect on the law.
We’ve had many years of publishers strengthening their legal position. It’s case law, not propaganda.
Hit the nail on the head.
a) An AI is not a person. We do not WANT an AI to be regarded as equal to a person under law. That’s a terrible idea
b) How is that AI training material being generated? Did they buy copies of every copyrighted song and every movie by every artist to include in the training data? If it’s music and streamed, are they paying the artist royalties based on every “play” the AI is processing during training, the same as if a human played the song over and over again to learn a song? How about sheet music? Because if a PERSON is learning from training material, the license for sheet music and training materials is different than for a playable copy of the same work.
I’m willing to bet that the AI companies didn’t even pay for the regular copies of works much less ones licensed for use as training materials for humans, but it didn’t matter because an AI is an advanced algorithm and NOT A HUMAN.
a) No one is suggesting AI be regarded as equal to a person under law though?
b) If the music is being streamed then it’s up to the streaming company to pay the artists royalties. I have Spotify and I don’t pay the artists - Spotify does.
If the argument is “the people feeding data into the AI illegally acquired the content” then sure, argue that and prosecute them for piracy or whatever. That’s not the argument that is being made though.
In Meta’s court case it is one of the arguments.
That argument’s not going to be of any use then.
if i learn a book by heart, and then go around making money by reciting it, then that’s illegal. same thing.
On the other hand, it is not the learning in your example that is illegal, but the recital.
If you learn ten books by heart and make money writing shitty fanfics, that’s not necessarily illegal.
well yeah. And it has been proven time and again that they can, and do, regurgitate that training material out quite often
Yup. I don’t think training should be considered breaking copyright. Regurgitating though should.
There are examples of use cases besides the right now obvious one of LLMs “creating” “original” content.
One that comes to my mind is indexing books. Allowing for people to search for books based on a description.
That’s not what AI is doing though. A better analogy using your book example would be learning a book by heart, then going and writing a new book in that same style.
Is that illegal? No.
but that’s not what they’re doing when they’re spitting out open source code verbatim, with no attribution or license
They don’t do that.
except that they regularly do. It isn’t even news at this point
can you please show me some examples? Should be easy to find them based on your comment.
Some companies own some wildly absurd things. Copyright is only enforced if you have the money to do your own policing, sometimes on multiple continents.
Even if it benefits big players more, copyright still benefits small artists
They do, but the point still stands. No one “owns” what these AIs are learning. That’s what they’re doing - learning, and they’re learning from copyrighted material the same way people learn from copyrighted material. The copyright holders - mainly artists - are just super upset about it because it’s showing that what they provide can be easily learned and emulated by computers.
They’re the horse and carriage sellers when cars were invented.
You miss the part where the copyright owner did not assign them the rights to use the material for such a purpose, and yes, most copyright does cover a ton of stuff like retransmission, reproduction, public production and a bunch of other shit, which are all separate licenses. It’s not so simple as “they did what a human does”, because even the WAYS a human uses said material are limited under the terms of the copyright.
But they didn’t use it for any of those purposes. Training an AI model isn’t doing any of that. Which do you think they did specifically?
Humans can learn from any copyrighted material they want to. Copyright doesn’t, and can’t, prevent that.