I just don't see why everyone seems to not be cheering that perhaps we are not going to go back to the days where all those kids are going to be re charged. It almost feels like everyone wants to go back to labels carpet bombing students with lawsuits[0]
As someone who’s engaged in private piracy basically my entire life I’ve never even considered venturing into gray areas of licensing when procuring for my company. In fact I’ve done the opposite and rooted it out wherever I’ve found it.
It just seems obvious to me that a profit seeking venture should be held to a higher standard when it comes to infringing on the property rights of other companies and individuals, especially if they seek to enforce their own.
Those kids weren’t hypocritically enforcing their own property rights and making employees sign NDAs while downloading shit from TPB.
> I just don't see why everyone seems to not be cheering that perhaps we are not going to go back to the days where all those kids are going to be re charged. It almost feels like everyone wants to go back to labels carpet bombing students with lawsuits
It’s currently just as bad but in a different way, imho.
The ability for labels (or whoever owns the rights) to wantonly invoke automated DMCA copyright strikes and demonetization on social media channels like YouTube is borderline criminal to me.
Their lobby did a great job getting them more than they deserved (specifically with regards to the facilitation of capricious invoking of DMCA), but the abuse of the rules limits the growth of the creator economy in very unhealthy ways.
Because there is no reasonable expectation that we are not going back to those days. In fact, we are more likely to go back to those days than not.
Those students are not Zuckerberg. They will not be treated as Zuckerberg. The legal theories that apply to them don't apply to Zuckerberg and vice versa. They do not have money to mount a defense, and if they do, they will be in debt till the end of their lives.
False dichotomy. We can obviously have both. We can destroy corporations that rely on copyright to exist and then abuse that system to profit. We can also ignore college students and minor contributory copyright infringement.
The difference in scope here should be obvious.
We can similarly punish drug dealers while not punishing drug users. In fact it's already policy in large parts of the USA.
That's such a non sequitur. This isn't a weed legalisation argument, it's "Do we make IP worse for everyone because you don't like some people benefiting from fair use?"
Because the 'perhaps' there is a load-bearing word that is doing a lot of work, and it's going to come crashing down sooner or later.
Of course some kids are going to be charged for this kind of shit, it's still a rules for thee but not for me world, the 'not for me' folks are just a hell of a lot more brazen about it.
I also find it funny, I said this regarding the other thread and article[0]
'"They then copied those stolen fruits"
How are these fruits "stolen" if they still have what was allegedly stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act
And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"
And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum transformative, i.e. fair use.'
>How are these fruits "stolen" if they still have what was allegedly stolen?
If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen? And I didn't even buy a copy of your book to copy it.
> If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen?
The trouble with this analogy is that it proves too much.
Suppose you write a book, and so does someone else, but they have better marketing than you and then people in the market for that genre buy theirs instead of yours. Let's even stipulate that the existence of their book actually lowers your sales, because people who want that kind of book already bought theirs by the time they find out about yours and then some people don't have time to read or can't afford to buy both.
Notice that we haven't yet said a word about the contents of either book. They could be completely independent and they've never even heard of you or your book -- they "didn't even buy a copy of your book to copy it". All we know is that they're the same genre and the existence of theirs is costing you sales. By that logic all competition would thereby be "stealing", and that can't be right.
Which implies that you don't have a property right to the customers.
A better analogy would be that you do original research or work and produce a valuable book. Somebody else looks at your work, decides it has value, and reproduces it in a new book under their name. The new book is cheaper, or easier to find, or for whatever reason displaces your original book created through your own research and investment. Now somebody else is profiting off your creativity or work, without payment or even acknowledgement.
I'm not sure how this plays out legally, but it certainly seems unethical
So for example, when Disney sees value in public domain stories like Cinderella, Rapunzel/Tangled or Snow White, and they make movies out of them, profiting from the creativity and work of the Brothers Grimm without paying anything to their estate, or high school plays do Shakespeare, that seems unethical to you?
Would it be fair for Greece to do retroactive term extensions all the way back to Plato and then sue anyone who copies the idea of having a university or uses the Platonic solids or distributes religious texts that incorporate the dualistic theory of the soul?
Your examples, as you say, are all public domain. Are all the works we train LLMs on public domain too? Was the original book in my analogy in the public domain? What do you think about training on material that isn't yet in the public domain?
You're framing this as an ethical question, but copyright term lengths are essentially arbitrary. They're set by the government, as are the boundaries of fair use. At which point you're making a circular argument. That it's bad if it's illegal and that it should be illegal because it's bad. So what happens if someone argues the opposite? That it's not unethical if it's fair use and then it should be fair use because it's not unethical.
I'm not making a circular argument, nor one based on legality. You explicitly changed your example to use "public domain" content, and ignoring the legal specifics of that it's clear that's a separate category of content. Most people have no ethical issue with remixing or using content that has already done the rounds and delivered most of its immediate value to the creator. This is very different to your earlier examples with books, framed as two contemporary pieces of media competing with each other.
Letting companies train LLMs on the "classics" is very different to training on contemporary media where the creator still depends on it.
I like your argument, not because it is a good analogy for AI but because it is a good contrast. Copyright isn't a guarantee or magic force field blocking fair competition. It is a permeable buffer against lazy knockoffs and time-boxed so that buffer doesn't choke all future creativity.
People on this thread need to focus on what "derivative" and "fair use" mean and understand both are measured on a somewhat fuzzy spectrum, subject to interpretation.
In a perfectly fair world AIs/MLs could vacuum up all human knowledge, fair and square. (In an ideal world, they would do that adhering to polite opt-in/opt-out agreements with copyright holders. We can dream). Input isn't theft.
On output, two magic genies would stand at the gate, the Derivative Genie and the Fair Use Genie, and review anything spat out by the AI/ML. If it crossed agreed-upon thresholds, the Genies would bar the gates and issue a stern warning to prompt again (or maybe the AI/ML would auto-adjust the prompt and try again).
So, if your prompt asked for a 300-word poem about thrash metal mosh pit dancing and it spat out a poem where 85% of it matched one of the handful of available mosh pit poems in its database, the Derivative Genie would block the output and raise an alarm.
On the other hand, if you asked for a line-by-line analysis of a famous mosh pit dancing poem (by name) or maybe asked for a satirical spoof of said poem, the Fair Use Genie would overrule the Derivative Genie and give the output a pass.
That's as fair as this could get, especially if you add one more thing: an Appeals Court (maybe corporate, maybe 3rd party, maybe state run) with a Settlement Pool. If a copyright holder could prove the Genies let pass something they shouldn't, the AI/ML would fix that. If real damage is done, the creator would get a settlement from the pool.
The point is that the Input Genie is out of the bottle. Creators just look foolish trying to squeeze it back in. Better, they should focus on making the output Genies and the Appeals process as effective and fair as possible for everyone.
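For what it's worth, here is a rough sketch of what the Derivative Genie's threshold check could look like. It is a toy, not how any real system works: the function names are made up, similarity is just word trigram overlap, the 0.85 cutoff is borrowed from the mosh pit example, and the Fair Use Genie is reduced to a flag somebody else has to set.

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output, reference, n=3):
    """Fraction of the output's n-grams that also appear in the reference."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(reference, n)) / len(out)


def derivative_genie(output, corpus, threshold=0.85, fair_use=False):
    """Return True if the output may pass the gate.

    fair_use is a hypothetical flag standing in for the Fair Use Genie
    (analysis, parody, etc.); when set, it overrules the overlap check.
    """
    if fair_use:
        return True
    return all(overlap_ratio(output, ref) < threshold for ref in corpus)
```

The hard part is obviously everything this skips: deciding what counts as fair use, and catching paraphrase rather than word-for-word overlap.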
Why are you talking about this case that has nothing to do with the topic at hand? The comment you’re replying to gives a very clear and narrow analogy, and you’re talking about something else.
How is it something else? It's the same analogy. The problem with it is that the harm from the alleged theft doesn't require any use of the original material in order to happen, since that "harm" is competition rather than expropriation.
The attempt to distinguish them is through copying, but that's the part that isn't depriving anyone of anything.
The main point here is _using_ copyrighted materials to create a commercial product, that you then sell, that may be used as alternative or substitute for the original materials. You’re missing that point and talking about two independent projects competing.
Because the competition is the only source of alleged harm, but people can do that even if they don't copy anything. There isn't actually a property right to the customers. You can lose sales to someone else whether they copied anything or not.
Yes. That's not to say that something damaging wasn't done, but nothing was stolen. Stealing/theft requires deprivation of property. It's like receiving a normal nonlethal punch in the face and calling it murder. Murder requires someone dying.
> Theft [...] is the act of taking another person's property or services without that person's permission or consent with the intent to deprive the rightful owner of it. --- https://en.wikipedia.org/wiki/Stealing
My God, I can't believe chodes are still playing this "how many angels can you fit on the head of a pin" navel gazing semantic argument. Thirty years at least, it was all you saw on fin de siècle Slashdot from anyone with a six-digit UID. No one cares about your hyper literalist meaning of "theft," that's not the goddamn point. Christ, this place looks like Reddit more and more.
This isn't a court of law. We don't have to talk like lawyers. If you replaced "theft" with "copyright infringement" in the comment you had such a problem with, what meaningfully changes besides we all have about five additional brain cells?
Even the case for copyright infringement is weak. LLMs are not copying machines. We already have copying machines at a much lower price, almost zero, with perfect fidelity and much faster than generating text probabilistically. So it makes no economic sense to spend billions on training and inference to make a copier. In fact the value of LLMs is where they do not copy but apply knowledge to a new situation.
> If you replaced "theft" with "copyright infringement" in the comment you had such a problem with, what meaningfully changes besides we all have about five additional brain cells?
The obvious difference is that copyright is subject to fair use and various other limitations that personal property isn't.
Violation of these rights may be criminal without meeting the strict legal definition of theft.
This can even extend to stealing physical property.
Depending on local laws, stealing a car may not actually be theft if the defendant can prove they intended to return it before the owner got home from work, though it would certainly be considered theft in the colloquial sense of the term, and they would still be guilty of a lesser offense like civil and/or criminal conversion.
> Depending on local laws, stealing a car may not actually be theft if the defendant can prove they intended to return it before the owner got home from work
I doubt there's even one place where the law works like that.
> I doubt there's even one place where the law works like that.
In a lot of places, that's how it works. A key element of theft is the intent to permanently deprive someone of property.
This is why joyriding isn't classified as auto theft and is instead a lesser offense. It's because joyriding is an intent to temporarily deprive, while GTA is an intent to permanently deprive.
In some jurisdictions (the UK is one), there is a tort called trespass to goods, and an example of this would be "stealing" someone's property to deliver to another location for them to use there. The tort of conversion is similar: interference with someone's property right to treat it as your own (silent as to length of time).
Yeah, in the US, if someone tries to steal your car while you are in it or you are threatened by it, you can shoot them dead or something like that (IANAL). You may have a day in court, but in many situations no punishment will follow.
Theft is not the breach of any property right. It's specifically the deprivation of property without consent. Yes, I have checked the definition in my jurisdiction.
Getting punched in the face also violates rights, yet isn't murder. Murder is specifically about dying.
You forget that laws are made by people and can change at any time. Interpretations are arbitrary: Roe v. Wade today, but not tomorrow.
People seem to think what AI is today is theft. If enough people agree, it will be theft. Big companies don't like this and push the other way. Objectivity doesn't exist here. It is too wiggly.
You’re splitting hairs over a definition that isn’t relevant here (theft and copyright infringement are different things) to defend something that even you agree is bad.
It isn't splitting hairs. The damages are completely different in nature.
With theft, the entire damage is the deprivation. It could be an heirloom or some other object that may have been entrusted to you, something that can never be replaced, memorabilia of loved ones. Something that you may have needed in your possession to survive (e.g. a car to go to your job).
With a given copyright violation, the damage is that maybe[1] you made less profit than you could have. The potential for profit is not property. Profit isn't guaranteed.
[1] The loss is not certain, because there's no guarantee that the ones consuming the copyrighted content could have even afforded it.
Cool cool cool. So all the code and data you send to Anthropic and ChatGPT should be mass distributable to forward other people's arts and sciences? All your meeting notes with AI summarizers, Slack chats with bots? Might as well put your entire company and all plans for it on GitHub, MIT licensed. I'll take a peek, see if there's anything valuable to me in that. Don't worry, you can keep it all on your GitHub too. It's still yours after all. Copilot will be training on it too though, btw.
No it's not. You exposed that data to an LLM. Should have read the fine print. The laws around that don't make sense to me anymore, so therefore I own that stuff now. That's how this works, right? You do know ChatGPT etc. can read everything you write, right?
Also social media profile pics. Great way to get faces for deep fake ads. Most people are just 1 phone call away from being voice cloned. Our likeness isn't all that important either if you think about it.
Maybe Meta will clone your writing style and sign into your Meta account and message your friends telling them about this awesome new product. Meta owns the account and you uploaded data to it.
I think Anthropic has pledged to not use team and enterprise users' data for training purposes. I don't mind if they do some verification or whatever as long as it doesn't end up in the responses it gives others.
You were swiftly corrected about your misunderstanding under your original comment. Reposting it here, removing the quote farther from its context, and hoping to not be downvoted again is very weird!
I don't see how quoting the actual complaint the news was about, in both threads, amounts to me being swiftly corrected. If you were to base it on upvotes, then this one shows I'm right and you got swiftly corrected here. In both cases it was relevant, as the two threads were not yet merged and were about the same complaint. They held two positions on the front page, and I was adding to the discourse.
How are these fruits "stolen" if they still have what was allegedly stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act
And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"
And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum transformative, i.e. fair use.
I think you are confusing the idiom "stolen fruits" with an actual accusation of criminal theft. Aside from its use in this phrasing, neither "theft" nor "steal" appears anywhere else in the article.
I started Free.ai as a weekend project with the same mindset. And a month in the work hasn't stopped. So I second this. Just find a good name, it helps.
Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act
The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor then AI can't learn.
It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.
It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.
It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2. Any law restricting that would be the worst form of tyranny.
It would not be reasonable to allow machines to do that at unlimited scale without restrictions.
(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)
You're taking the metaphor much too seriously. It was only an example to illustrate that human rights don't automatically apply to machines. Let's not read too much into it.
You made a claim and used a metaphor to demonstrate that claim. I asked a very simple question about the bounds of the metaphor and thus the claim. You are dodging the questions, which means that you cannot defend the logic of your claim. Thus you have forfeited the validity of your claim, and 'human rights don't automatically apply to machines' has not been illustrated.
What's your strategy for solving problems where there are diverse viewpoints if there is no desire to convince anyone else? Rhetoric is a time-proven set of communication standards that allows us to demonstrate the validity of our positions and thus gives us a way to find agreement or at least understand what others think. Few people are completely irrational, and understanding why they think what they do, even if one does not agree with them, is important in a system where people have to co-exist with the decisions that affect everyone.
Because the alternative would be to just railroad people who don't agree, and even when it does work in one's favor the pendulum tends to swing back hard in response.
It's a relevant extension if you think the ability to learn from a work is a right people have that exempts them from the more general lockdown copyright would impose.
If you come at it from the view of copyright being a limited set of control over some areas but not others, then if copyright doesn't block human learning it shouldn't affect anything similar either, unless a specific rule is added to make those situations be handled differently.
Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
The mental calisthenics required to justify this stuff must be exhausting.
> The mental calisthenics required to justify this stuff must be exhausting.
It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.
> Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
Are you finding people that actually say this?
When it can quote something like that, it's a training error. A popular enough work gets quoted and copied by people online, and then it's not properly deduplicated. It's a very small fraction of works it can do that with, and the cleaner your data the less it happens.
I'll once again point out that Stable Diffusion launched with fewer weights than training images. It had some accidental memorizations, but there wasn't room for its core functionality to be memorization-based.
This is a perfect example of 'begging the question'. Arriving at a conclusion from a fact assumed as true without evidence. Your reductio does not actually demonstrate that copyright applies to LLMs, because you did not demonstrate how transcoding is comparable to inference, just that LLMs can reproduce some passages from copyrighted works. You could also produce passages from copyrighted works by generating enough random sequences of words, but no one is arguing that is comparable to transcoding. That the people who do not share this conclusion are engaging in motivated reasoning is based only on your assumption and has no logical backing, and is therefore begging the question.
"Learning" for LLMs is just as goofy and propagandistic a metaphor as "stealing" for copyright. I find it predictive of your position that you'll accept one dumb metaphor for something that we didn't need a metaphor for, but not the other.
Are you for stealing and against learning?
We know exactly what is happening in both cases. We can talk about that, or we can use obfuscating euphemisms that make our preferred position seem obviously true.
I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.
I think that it's absurd that we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.
I mean, I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".
>> ... we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.
I agree. However, the reverse is also likely true, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks from the point of view of production of works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.
I understand the article - the point about 'learning' is that if the model and its outputs are derivative works then the copyright belongs to the human creators of the works it was trained on.
Edit*: Or perhaps put more pseudo legally that the created works infringe on the copyrights of the original human creators.
The part I agree with is that copyright law calls out humans specifically as the potential owners of copyright. So what you suggest seems to be the only possible way out. Calling out humans could imply that when a human reads a thousand books and then writes something based on them, but which is not a substantial copy of anything explicitly read, that human owns the copyright to the text written. Whereas, if an artificial neural network does the same (hypothetically writing the same text), it would not.
The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.
The issue is that of copyright law with respect to derivative works. Machine transformations of original works do not create a new copyright for the person that directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the righthand corner or by color shifting the video.
Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.
It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."
I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.
Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.
And the specifics of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.
> Does it mean that shreds retain original copyright even if the content can't be restored?
Yup, it absolutely does. In fact, that's why you are still violating copyright law by using BitTorrent even though each of the users is only giving out a small slice or shred of the original content.
The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.
> And the specifics of autoregressive pretraining is that it is lossy compression.
That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.
> Good luck finding which copyrighted materials have made it into the final weights.
That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.
Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.
Well, it's not the first time the law has contradicted the laws of nature (for the entertainment of future generations). BitTorrent is not a relevant example, because the system is designed to restore the work in its fullness.
> in fact, pull out full NYT articles
That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.
> BitTorrent is not a relevant example, because the system is designed to restore the work in its fullness.
You can restore shredded documents with enough time and effort. And if you did that and started making photocopies, even if they are incomplete, you will run afoul of copyright law.
BitTorrent is a relevant example because it shows that shredding doesn't destroy copyright.
Remember, copyright is about the right to copy something. Simply shredding or destroying a thing isn't applicable to copyright. Nor is giving that thing away. What's applicable is when you start to actually copy the thing.
I meant idealized shredding: a destructive transformation, which is still a machine transformation (think blender instead of shredder). When you need exact knowledge of a thing to make its (imperfect) copy using some mechanism, it doesn't mean that the mechanism violates copyright.
EDIT: I don't say that neural networks can't rote learn extensive passages (it's an effect of data duplication). I'm saying that they are not designed to do that and it's possible to prevent that (as demonstrated by the latest models).
I'd assume it's still a copyright violation if you copied and distributed the shredded copy.
The way I arrive at that is imagine you add just 1 pixel of static to a video, that'd still be a copyright violation. Now imagine you slowly keep adding those random pixels. Eventually you get to the point where the whole video is just static, but at some point it wasn't.
Now, would any media company or court sue over that? Probably not. But I believe that still falls under copyright (but maybe fair use?).
The issue with neural networks is they aren't people. Even when you point your LLM at a website and say "summarize this" the output of that summation would be owned by the website itself by nature of it being a machine transformed work.
Remember, it's not just mere rote recitation which violates the law, any transformation counts as well. The fact that AI companies are preventing it doesn't really solve the problem that they are in fact transforming multiple copyrighted works into their responses.
When you point your browser at a website the browser creates a (transformed) local copy of the information that is owned by the website itself. The browser needs to do that to render the website on your screen. Is it a violation of copyright (that the website is willing to tolerate because it profits from advertisements)?
No, because your browser is dealing with the distribution of data in a way intended by the copyright holder. You also aren't redistributing the webpage after rendering. Client side modifications fall under fair use which is what keeps the likes of ad blockers and other page modifiers legal.
What would violate copyright is if you took that rendered page, turned it into a jpeg, and then hosted that jpeg from your own servers. That's the copying that would run afoul of copyright law.
LLMs seem to be so devoid of intelligence, I think it's arguable if that's learning: https://machinelearning.apple.com/research/illusion-of-think... Typically, you would imply a level of understanding when you say learning. LLMs apparently can't do that, by design.
A human is not a commercial product. Here we have a commercial product that was created by using a lot of various copyrighted and protected IP, without licensing agreements, without paying, without even citing it.
Copy/pasting at scale is how tons of software has been written for a long time, or have we all forgotten the jokes people used to make about StackOverflow?
Everybody has done a complete 180 in terms of copyright protections. Before, nobody cared about downloading music, movies, TV shows, or pirating games. Now, when the copyright law is affecting them, they are gung-ho about protecting these billion-dollar companies' copyrights.
You are attempting to invoke a strawman. So is your point that there is not a significant overlap between posters who think that AI companies should not be allowed to use pirated copyrighted material in their training corpus and posters who themselves pirated copyrighted material such as movies, music, games, etc.?
Yes, that is their point. Do you have evidence against it?
I'm sure you can find some overlap, but I bet the vast majority is caused by people making a distinction between commercial and noncommercial piracy. I don't think there's a big cohort of piracy hypocrites.
Due to the nature of the argument, of course I do not have evidence for or against it. However, I am willing to leave it at that, because I think that any rational observer will be able to look at the general mood toward copyright/piracy online (including using Limewire back in the day, pirating movies, downloading Photoshop etc.) and come to their own conclusion whether or not it's plausible that there isn't a significant overlap between the two.
The music and movie companies have power. They have the funds to bankrupt you with a small army of lawyers. You as an individual do not stand a chance against corporate lawyers. They can destroy your life over fairly minimal and non-violent offenses.
AI companies are backed by the very powerful. They can steal all they want and use the same army of lawyers to bankrupt any small rights holder. The big rights holders go to the same parties and allow it to happen.
Regardless of the actual take on copyright, both methods skullfuck the little guy without power.
People cry foul because, at least in the US, we claim to live in a free country based on equality, yet there is a very obvious caste system of the haves and the have-nots.
It erodes the legitimacy of the system. Imagine if for years you see news reports of a mother getting a judgment against her where she owes 100s of thousands because she seeded a Britney Spears song. Then you suddenly see the same laws that were leveraged to instill fear in you tossed aside when the rich and powerful say it doesn't count anymore. You're going to cry foul!
It's not a hypocrisy of position on copyright, it's bearing witness to the illegitimacy of the laws they're bound by.
It's not a 180. You can be against copyright, but as long as copyright is still being enforced on you, you can think it should be enforced on AI companies.
I'd prefer no copyright, but we live in a world where there is copyright, so it's unfair that only AI companies get to be immune.
It's not just about "billion-dollar companies' copyrights", but also about voluntary copyleft free software. If I license my code under GPL, I don't want other persons/companies to just whitewash that code through LLMs and use it in their proprietary code.
I agree with this, and I think that it is an open question whether training on copyrighted material is considered transformative. However, someone said that thumbnails of full photos are considered transformative enough to allow fair use, and LLM training is (in my opinion) clearly more transformative than converting a picture to a thumbnail. But we will see how it plays out.
Can't imagine using MTG to learn a language. But it does seem intuitive in hindsight. Back when I played in the junior super series and nationals I could recall almost every card and what it did. So I can see how that leap would be tantamount. Kudos.
Note that he's starting from N2 Japanese, which is already a high level of Japanese proficiency (although it does not test writing/speaking at all, so it's very feasible to have N2 yet be terrible at conversation). He's not exactly learning hiragana from M:TG.
The M:TG competitions are giving him a framework to practice that conversation, which believe it or not can be hard to come by in Tokyo without deliberate effort (see 'expat bubble'). The vocab/grammar on the cards is mostly incidental to all that. If he was playing online M:TG in Japanese he wouldn't be getting anywhere near the payoff.
It is more like: I love MTG, how to learn a language through this hobby?
As far as games go, tabletop RPGs are probably better than MTG because they are all about talking. But nothing beats doing what you enjoy doing, and if what you enjoy is MTG, then MTG is the best.