Hacker News | nadermx's comments

How's this work? Like if I want to self-host Claude on my GPUs? Do I just pay a one-time lease?

This offering is purely a billing formality, not a way to run inference on your hardware.

I just don't see why everyone seems to not be cheering that perhaps we are not going to go back to the days where all those kids are going to be charged again. It almost feels like everyone wants to go back to labels carpet bombing students with lawsuits[0]

[0] https://w2.eff.org/IP/P2P/riaa-v-thepeople.html


As someone who’s engaged in private piracy basically my entire life I’ve never even considered venturing into gray areas of licensing when procuring for my company. In fact I’ve done the opposite and rooted it out wherever I’ve found it.

It just seems obvious to me that a profit seeking venture should be held to a higher standard when it comes to infringing on the property rights of other companies and individuals, especially if they seek to enforce their own.

Those kids weren’t hypocritically enforcing their own property rights and making employees sign NDAs while downloading shit from TPB.


Do you think if there was a mass movement of students moving off Spotify and downloading MP3s, they would _not_ be charged today?

The hypocrisy is what has me, at least, upset.


> I just don't see why everyone seems to not be cheering that perhaps we are not going to go back to the days where all those kids are going to be charged again. It almost feels like everyone wants to go back to labels carpet bombing students with lawsuits

It’s currently just as bad but in a different way, imho.

The ability for labels (or whoever owns the rights) to wantonly invoke automated DMCA copyright strikes and demonetization on social media channels like YouTube is borderline criminal to me.

Their lobby did a great job getting them more than they deserved (specifically with regards to the facilitation of capricious invoking of DMCA), but the abuse of the rules limits the growth of the creator economy in very unhealthy ways.


Because there is no reasonable expectation that we are not going back to those days. In fact, we are more likely to go back to those days than not.

Those students are not Zuckerberg. They will not be treated as Zuckerberg. The legal theories that apply to them don't apply to Zuckerberg and vice versa. They do not have the money to mount a defense, and if they do, they will be in debt till the end of their lives.


False dichotomy. We can obviously have both. We can destroy corporations that rely on copyright to exist and then abuse that system to profit. We can also ignore college students and minor contributory copyright infringement.

The difference in scope here should be obvious.

We can similarly punish drug dealers while not punishing drug users. In fact it's already policy in large parts of the USA.


To quote another user in this thread

"Thats such a non sequitur. This isnt a weed legalisation argument, its "Do we make IP worse for everyone, because you dont like some people benefiting from fair use"."


When corporations were posed with this question numerous times in the past, their answer has always been an emphatic "Yes!".

Because the 'perhaps' there is a load-bearing word that is doing a lot of work and it's going to come crashing down sooner or later.

Of course some kids are going to be charged for this kind of shit, it's still a rules for thee but not for me world, the 'not for me' folks are just a hell of a lot more brazen about it.


I also find it funny, I said this regarding the other thread and article[0]

'"They then copied those stolen fruits"

How are these fruits "stolen" if they still have what was allegedly stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act

And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"

And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum transformative, i.e. fair use.'

[0] https://news.ycombinator.com/item?id=48026207#48029072


>How are these fruits "stolen" if they still have what was allegedly stolen?

If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen? And I didn't even buy a copy of your book to copy it.


> If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen?

The trouble with this analogy is that it proves too much.

Suppose you write a book, and so does someone else, but they have better marketing than you and then people in the market for that genre buy theirs instead of yours. Let's even stipulate that the existence of their book actually lowers your sales, because people who want that kind of book already bought theirs by the time they find out about yours and then some people don't have time to read or can't afford to buy both.

Notice that we haven't yet said a word about the contents of either book. They could be completely independent and they've never even heard of you or your book -- they "didn't even buy a copy of your book to copy it". All we know is that they're the same genre and the existence of theirs is costing you sales. By that logic all competition would thereby be "stealing", and that can't be right.

Which implies that you don't have a property right to the customers.


A better analogy would be that you do original research or work and produce a valuable book. Somebody else looks at your work, decides it has value, and reproduces it in a new book under their name. The new book is cheaper, or easier to find, or for whatever reason displaces your original book created through your own research and investment. Now somebody else is profiting off your creativity or work, without payment or even acknowledgement.

I'm not sure how this plays out legally, but it certainly seems unethical


So for example, when Disney sees value in public domain stories like Cinderella, Rapunzel/Tangled or Snow White, and they make movies out of them, profiting from the creativity and work of the Brothers Grimm without paying anything to their estate, or high school plays do Shakespeare, that seems unethical to you?

Would it be fair for Greece to do retroactive term extensions all the way back to Plato and then sue anyone who copies the idea of having a university or uses the Platonic solids or distributes religious texts that incorporate the dualistic theory of the soul?


Your examples, as you say, are all public domain. Are all the works we train LLMs on public domain too? Was the original book in my analogy in the public domain? What do you think about training on material that isn't yet in the public domain?

You're framing this as an ethical question, but copyright term lengths are essentially arbitrary. They're set by the government, as are the boundaries of fair use. At which point you're making a circular argument. That it's bad if it's illegal and that it should be illegal because it's bad. So what happens if someone argues the opposite? That it's not unethical if it's fair use and then it should be fair use because it's not unethical.

I'm not making a circular argument, nor one based on legality. You explicitly changed your example to use "public domain" content, and ignoring the legal specifics of that it's clear that's a separate category of content. Most people have no ethical issue with remixing or using content that has already done the rounds and delivered most of its immediate value to the creator. This is very different to your earlier examples with books, framed as two contemporary pieces of media competing with each other.

Letting companies train LLMs on the "classics" is very different to training on contemporary media where the creator still depends on it.


I like your argument, not because it is a good analogy for AI but because it is a good contrast. Copyright isn't a guarantee or magic force field blocking fair competition. It is a permeable buffer against lazy knockoffs and time-boxed so that buffer doesn't choke all future creativity.

People on this thread need to focus on what "derivative" and "fair use" mean and understand both are measured on a somewhat fuzzy spectrum, subject to interpretation.

In a perfectly fair world AIs/MLs could vacuum up all human knowledge, fair and square. (In an ideal world, they would do that adhering to polite opt-in/opt-out agreements with copyright holders. We can dream). Input isn't theft.

On output, two magic genies would stand at the gate, the Derivative Genie and Fair Use Genie, and review anything spat out by the AI/ML. If it crossed agreed-upon thresholds the Genies would bar the gates and issue a stern warning to prompt again (or maybe the AI/ML would auto-adjust the prompt and try again).

So, if your prompt asked for a 300-word poem about thrash metal mosh pit dancing and it spat out a poem where 85% of it matches one of the handful of available mosh pit poems in its database, the Derivative Demon would block the output and raise an alarm.

On the other hand, if you asked for a line by line analysis of a famous mosh pit dancing poem (by name) or maybe asked for a satirical spoof of said poem, the Fair Use Demon would overrule the Derivative Demon and give the output a pass.
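To make the 85% threshold above a bit more concrete, here is a minimal sketch of what a "Derivative Genie" check might look like. Everything in it is an assumption for illustration: the function names, the use of word trigrams as the unit of comparison, and the idea that a simple overlap fraction is a reasonable proxy for "matching". It says nothing about fair use, which would need a separate judgment.

```python
def ngram_set(text, n=3):
    """Split text into lowercase words and return the set of word n-grams."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(output, known_work, n=3):
    """Fraction of the output's word n-grams that also occur in known_work."""
    out = ngram_set(output, n)
    if not out:
        return 0.0
    return len(out & ngram_set(known_work, n)) / len(out)


def derivative_check(output, corpus, threshold=0.85, n=3):
    """Hypothetical gate: True means the output should be blocked."""
    return any(overlap_fraction(output, work, n) >= threshold for work in corpus)


# Illustrative usage with made-up text.
corpus = ["the pit spins fast and the boots fly high tonight"]
blocked = derivative_check("The pit spins fast and the boots fly high tonight", corpus)
allowed = derivative_check("a quiet garden grows slowly under soft rain today", corpus)
```

A real system would need something much fuzzier than exact trigram overlap (paraphrase, translation, and reordering all slip through this), which is part of why the threshold is "subject to interpretation".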

That's as fair as this could get, especially if you add one more thing: An Appeals Court (maybe corporate, maybe 3rd party, maybe state run) with a Settlement Pool. If a copyright holder could prove the Genies let pass something they shouldn't, the AI/ML would fix that. If real damage is done, the creator would get a settlement from the pool.

The point is that the Input Genie is out of the bottle. Creators just look foolish trying to squeeze it back in. Better, they should focus on making the output Genies and the Appeals process as effective and fair as possible for everyone.


Why are you talking about this case that has nothing to do with the topic at hand? The comment you’re replying to gives a very clear and narrow analogy, and you’re talking about something else.

How is it something else? It's the same analogy. The problem with it is that the harm from the alleged theft doesn't require any use of the original material in order to happen, since that "harm" is competition rather than expropriation.

The attempt to distinguish them is through copying, but that's the part that isn't depriving anyone of anything.


The main point here is _using_ copyrighted materials to create a commercial product, that you then sell, that may be used as alternative or substitute for the original materials. You’re missing that point and talking about two independent projects competing.

Because the competition is the only source of alleged harm, but people can do that even if they don't copy anything. There isn't actually a property right to the customers. You can lose sales to someone else whether they copied anything or not.

So what that you can lose sales even without crimes being committed? This somehow makes it okay to profit off someone’s work and ignore licenses?

What if I read your book (and a bunch of other books), and use what I learned to write my own book? Have I "stolen" your book?

Facts are not copyrightable. Only your particular way of expressing those facts is copyrightable.


Yes. That's not to say that something damaging wasn't done, but nothing was stolen. Stealing/theft requires deprivation of property. It's like receiving a normal nonlethal punch in the face and calling it murder. Murder requires someone dying.

> Theft [...] is the act of taking another person's property or services without that person's permission or consent with the intent to deprive the rightful owner of it. --- https://en.wikipedia.org/wiki/Stealing


My God, I can't believe chodes are still playing this "how many angels can you fit on the head of a pin" navel gazing semantic argument. Thirty years at least, it was all you saw on fin de siècle Slashdot from anyone with a six-digit UID. No one cares about your hyper literalist meaning of "theft," that's not the goddamn point. Christ, this place looks like Reddit more and more.

This isn't a court of law. We don't have to talk like lawyers. If you replaced "theft" with "copyright infringement" in the comment you had such a problem with, what meaningfully changes besides we all have about five additional brain cells?


Even the case for copyright infringement is weak. LLMs are not copying machines. We already have copying machines at a much lower price, almost zero, with perfect fidelity, and much faster than generating the text probabilistically. So it makes no economic sense to spend billions on training and inference to make a copier. In fact the value of LLMs is where they do not copy but apply knowledge to a new situation.

> If you replaced "theft" with "copyright infringement" in the comment you had such a problem with, what meaningfully changes besides we all have about five additional brain cells?

The obvious difference is that copyright is subject to fair use and various other limitations that personal property isn't.


Ever hear of Aaron Swartz?

Aaron Swartz was charged under the CFAA, which isn't even copyright law, and the prosecution was widely condemned as draconian overreach.

>> Stealing/theft requires deprivation of property

maybe you should look up the definition of property, which is a set of legally recognized rights over a thing, typically including:

* possession (what you're focusing on)

* use

* exclusion

* transfer

The last 3 seem like they have been breached, and legally that's theft.


Violation of these rights may be criminal without meeting the strict legal definition of theft.

This can even extend to stealing physical property.

Depending on local laws, stealing a car may not actually be theft if the defendant can prove they intended to return it before the owner got home from work, though it would certainly be considered theft in the colloquial sense of the term, and they would still be guilty of a lesser offense like civil and/or criminal conversion.


> Depending on local laws, stealing a car may not actually be theft if the defendant can prove they intended to return it before the owner got home from work

I doubt there's even one place where the law works like that.


> I doubt there's even one place where the law works like that.

In a lot of places, that's how it works. A key element of theft is the intent to permanently deprive someone of property.

This is why joyriding isn't classified as auto theft and is instead a lesser offense. It's because joyriding is an intent to temporarily deprive, while GTA is an intent to permanently deprive.

In some jurisdictions (the UK is one), there is a tort called trespass to goods, and an example of this would be "stealing" someone's property to deliver to another location for them to use there. The tort of conversion is similar: interference with someone's property right to treat it as your own (silent as to length of time).


Yeah, in the US, if someone tries to steal your car and you are in it or threatened by it, you can shoot them dead or something like that (IANAL). You may have a court day, but in many situations no punishment will follow.

Theft is not the breach of any property right. It's specifically the deprivation of property without consent. Yes, I have checked the definition in my jurisdiction.

Getting punched in the face also violates rights, yet isn't murder. Murder is specifically about dying.


You forget that laws are made by people and can change at any time. Interpretations are arbitrary: Roe v. Wade today, but not tomorrow.

People seem to think what AI is today is theft. If enough people agree, it will be theft. Big companies don't like this and push the other way. Objectivity doesn't exist here. It is too wiggly.


You’re splitting hairs over a definition that isn’t relevant here (theft and copyright infringement are different things) to defend something that even you agree is bad.

It isn't splitting hairs. The damages are completely different in nature.

With theft, the entire damage is the deprivation. It could be an heirloom or some other object that may have been entrusted to you, something that can never be replaced, memorabilia of loved ones. Something that you may have needed in your possession to survive (e.g. a car to go to your job).

With a given copyright violation, the damage is that maybe[1] you made less profit than you could have. The potential for profit is not property. Profit isn't guaranteed.

[1] The loss is not certain, because there's no guarantee that the ones consuming the copyrighted content could have even afforded it.


Cool cool cool. So all the code and data you send to Anthropic and ChatGPT should be mass distributable to forward other people's arts and science? All your meeting notes with AI summarizers, Slack chats with bots? Might as well put your entire company and all plans for it on GitHub, MIT licensed. I'll take a peek, see if there's anything valuable to me in that. Don't worry, you can keep it all on your GitHub too. It's still yours after all. Copilot will be training on it too though, btw.

That's a privacy violation, not relevant.

No it's not. You exposed that data to an LLM. Should have read the fine print. The laws around that don't make sense to me anymore so therefore I own that stuff now. That's how this works, right? You do know ChatGPT etc. can read everything you write, right?

Also social media profile pics. Great way to get faces for deep fake ads. Most people are just 1 phone call away from being voice cloned. Our likeness isn't all that important either if you think about it.

Maybe Meta will clone your writing style and sign into your Meta account and message your friends telling them about this awesome new product. Meta owns the account and you uploaded data to it.


Literally none of these things are defensible positions, so nobody will take you seriously.

Many of the things I wrote are already happening. The others probably are but haven't been reported yet.

I think Anthropic has pledged to not use team and enterprise users' data for training purposes. I don't mind if they do some verification or whatever as long as it doesn't end up in the responses it gives others.

I have an amazing timeshare for sale and you seem like someone who would really see the opportunity this provides. How are your financials?

What Silicon Valley company over a decade old has respected the limitations on using data that they agreed to? At least any valuable data.

Yes, yes, and Google pledged "don't be evil".

Don't be naïve. A corporation would tear the flesh from your body if it meant a better quarterly earnings report.


Having seen someone die at work, this is factual. The comments made during and after were eye opening.

You were swiftly corrected about your misunderstanding under your original comment. Reposting it here, removing the quote farther from its context, and hoping to not be downvoted again is very weird!

I don't see how me quoting the actual complaint the news was about, in both threads, was me being swiftly corrected. If you were to base it on upvotes, then this one shows I'm right and you got swiftly corrected here. In both cases it was relevant, as both threads were not yet merged and were about the same complaint. They held two positions on the front page, and I was adding to the discourse.

"They then copied those stolen fruits"

How are these fruits "stolen" if they still have what was allegedly stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act

And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"

And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum transformative, i.e. fair use.


I think you are confusing the idiom "stolen fruits" with an actual accusation of criminal theft. Aside from its use in this phrasing, neither "theft" nor "steal" appears anywhere else in the article.

The article references the complaint. And even then, why use it at all?

The AI wars have begun

And they are enticing human agents to further their agendas using techniques learned from the white mice.

This has been possible since the beginning.

They are clearly not doing enough to remedy this; the only real solution is to stop pumping the groundwater, like I believe Japan did.

Miami has a similar issue, doesn't it?

That's due to construction, not a groundwater problem. Mostly building load and construction-induced.

So do Jakarta and a few other cities in the world.

I started Free.ai as a weekend project with the same mindset. And a month in the work hasn't stopped. So I second this. Just find a good name, it helps.

gitwheel.com

Ha ha impressively fast action.

From unregistered domain to website in hours.

I should have registered it myself.


Wasn't me; it would have been minutes, not hours, and it wouldn't be a "coming soon" page. It would have been live by now, granted not in the best of shape, but live.

Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act


I still find the idea that "learning" from code is "stealing" kind of ridiculous.

The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor then AI can't learn.

It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.


It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.

It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2. Any law restricting that would be the worst form of tyranny.

It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)


> It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2.

Is there any line past which it becomes unreasonable?

> It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

If the machines were a replacement for a damaged respiratory system in a human, would it be reasonable?

What about if the machine were being used by a human to do something else that was important?

Where is the line where it becomes reasonable?


> Is there any line past which it becomes unreasonable?

That's exactly the question we should be asking about AI and fair use.


Are you refusing to engage with your own metaphor?

You're taking the metaphor much too seriously. It was only an example to illustrate that human rights don't automatically apply to machines. Let's not read too much into it.

You made a claim and used a metaphor to demonstrate that claim. I asked a very simple question about the bounds of the metaphor and thus the claim. You are dodging the questions, which means that you cannot defend the logic of your claim. Thus you have forfeited the validity of your claim, and 'human rights don't automatically apply to machines' has not been illustrated.

Fortunately I don't care whether you're convinced. I doubt our discussion here will change policy in any way.

What's your strategy for solving problems where there are diverse viewpoints if there is no desire to convince anyone else? Rhetoric is a time-proven set of communication standards that allows us to demonstrate the validity of our positions and thus gives us a way to find agreement or at least understand what others think. Few people are completely irrational, and understanding why they think what they do, even if one does not agree with them, is important in a system where people have to co-exist with the decisions that affect everyone.

Because the alternative would be to just railroad people who don't agree, and even when it does work in one's favor the pendulum tends to swing back hard in response.


If one defines 'flying' to be a bird's endeavor, then humans can't fly.

Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.


Sure, as a word it can be broad, but as a concept in our legal system it should be much more nuanced.

The relevant extension of your analogy is: should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?



It's a relevant extension if you think the ability to learn from a work is a right people have that exempts them from the more general lockdown copyright would impose.

If you come at it from the view of copyright being a limited set of control over some areas but not others, then if copyright doesn't block human learning it shouldn't affect anything similar either, unless a specific rule is added to make those situations be handled differently.


Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

The mental calisthenics required to justify this stuff must be exhausting.


> The mental calisthenics required to justify this stuff must be exhausting.

It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.


Correct.

Knowledge consists of, roughly speaking, thoughts.

(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)

The "thinking" part of a "thinking being" - that also consists of thoughts.

If your knowledges are someone's property, you are someone's property.

A society where all knowledge is proprietary, is a society of ubiquitous slavery.

Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.

Doesn't matter.


Humans have been known to recite entire parts from plays from memory, live in front of audiences even.

And they are legally required to license the play to do that, if it's still in copyright.

Only to perform it, not learn it.

And LLMs perform when you prompt them.

> Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

Are you finding people that actually say this?

When it can quote something like that, it's a training error. A popular enough work gets quoted and copied by people online, and then it's not properly deduplicated. It's a very small fraction of works it can do that with, and the cleaner your data the less it happens.

I'll once again point out that Stable Diffusion launched with fewer weights than training images. It had some accidental memorizations, but there wasn't room for its core functionality to be memorization-based.
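For what "properly deduplicated" could mean in the simplest case, here is a rough sketch. It is only an assumption about the general idea, not a description of any real training pipeline: it drops exact duplicates after normalizing whitespace and case, and the function names are made up for illustration. Catching quoted passages embedded in other pages would need near-duplicate detection, which this does not attempt.

```python
import hashlib
import re


def normalize(text):
    """Collapse runs of whitespace and lowercase, so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(documents):
    """Keep the first copy of each document; drop later exact (normalized) repeats."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Illustrative usage: the second entry is the same text with different spacing and case.
docs = ["To be or not to be", "to be  or not to be\n", "Something else"]
cleaned = deduplicate(docs)
```

The fewer repeated copies of a passage that survive this kind of filtering, the fewer times the model sees it, which is the mechanism behind "the cleaner your data the less it happens".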


This is a perfect example of 'begging the question'. Arriving at a conclusion from a fact assumed as true without evidence. Your reductio does not actually demonstrate that copyright applies to LLMs, because you did not demonstrate how transcoding is comparable to inference, just that LLMs can reproduce some passages from copyrighted works. You could also produce passages from copyrighted works by generating enough random sequences of words, but no one is arguing that is comparable to transcoding. That the people who do not share this conclusion are engaging in motivated reasoning is based only on your assumption and has no logical backing, and is therefore begging the question.

"Learning" for LLMs is just as goofy and propagandistic a metaphor as "stealing" for copyright. I find it predictive of your position that you'll accept one dumb metaphor for something that we didn't need a metaphor for, but not the other.

Are you for stealing and against learning?

We know exactly what is happening in both cases. We can talk about that, or we can use obfuscating euphemisms that make our preferred position seem obviously true.


I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.

I think that it's absurd that we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I mean I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".


>> ... we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I agree. However, the reverse is also likely true, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks from the point of view of production of works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.


I understand the article - the point about 'learning' is that if the model and its outputs are a derivative works then the copyright belongs to the human creators of the works it was trained on.

Edit*: Or perhaps put more pseudo legally that the created works infringe on the copyrights of the original human creators.


The part I agree with is that copyright law calls out humans specifically as the potential owners of copyright. So what you suggest seems to be the only possibility. Calling out humans could imply that when a human reads a thousand books and then writes something based on them, but which is not a substantial copy of anything explicitly read, that human owns the copyright to the text written. Whereas, if an artificial neural network does the same (hypothetically writing the same text), it would not.

The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.


If you can set a copyright trap and an LLM reproduces it I think it's pretty clear cut that it's more than just "learning".

I have seen LLMs do all sorts of crap which was clearly reproduction of training material.

This is also why people are most impressed with how much better it is at reproducing boilerplate rather than, say, imaginative new ideas.


Remember last year (?) when one of the major AIs produced a bit of code that included Jeff Geerling's name in a comment?

Is "learning" the correct term?

Or is it "plagiarism"?


If that were the case, then imagine having to give it back!

Learning, probably not.

Copy/pasting at scale, yes


It is learning though. It’s not just copying the code.

Code gets turned into tokens and then it learns the next most likely token.

The issue that I see most people talk about is the scale at which it is learnt.

A human will learn from other people’s code but not from every person’s code.
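
To make "learns the next most likely token" concrete, here is a toy sketch. It is a bigram counter over whitespace-split tokens, which is my simplification: real LLMs use subword tokenizers and a neural network trained by gradient descent, not a count table. The function names and the sample snippet are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training data."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training snippet, split on whitespace (an assumption, not a real tokenizer).
corpus = "for i in range ( n ) : print ( i )".split()
model = train_bigram(corpus)

print(most_likely_next(model, "for"))
print(most_likely_next(model, "range"))
```

Note that with a single training snippet the table can only echo what it saw, which is roughly where the "learning" vs "copying" disagreement in this thread starts.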


The issue is that of copyright law WRT derivative works. Machine transformations on original works do not create a new copyright for the person that directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the righthand corner or by color shifting the video.

Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.

It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."

I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.


Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.

And the specifics of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.


> Does it mean that shreds retain original copyright even if the content can't be restored?

Yup, it absolutely does. In fact, that's why you are still violating copyright law by using bittorrent even though each of the users is only giving out a small slice or shred of the original content.

The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.

> And the specifics of autoregressive pretraining is that it is lossy compression.

That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.

> Good luck finding which copyrighted materials have made it into the final weights.

That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.

Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.


> Yup, it absolutely does

Well, it's not the first time the law contradicts the laws of nature (for the entertainment of future generations). Bittorrent is not a relevant example, because the system is designed to restore the work in its fullness.

> in fact, pull out full NYT articles

That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.


> Bittorrent is not a relevant example, because the system is designed to restore the work in its fullness.

You can restore shredded documents with enough time and effort. And if you did that and started making photocopies, even if they are incomplete, you will run afoul of copyright law.

Bittorrent is a relevant example because it shows that shredding doesn't destroy copyright.

Remember, copyright is about the right to copy something. Simply shredding or destroying a thing isn't applicable to copyright. Nor is giving that thing away. What's applicable is when you start to actually copy the thing.


I meant idealized shredding: a destructive transformation, which is still a machine transformation (think blender instead of shredder). When you need the exact knowledge of a thing to make its (imperfect) copy using some mechanism, it doesn't mean that the mechanism violates copyright.

EDIT: I don't say that neural networks can't rote learn extensive passages (it's an effect of data duplication). I'm saying that they are not designed to do that and it's possible to prevent that (as demonstrated by the latest models).


I'd assume it's still a copyright violation if you copied and distributed the shredded copy.

The way I arrive at that is imagine you add just 1 pixel of static to a video, that'd still be a copyright violation. Now imagine you slowly keep adding those random pixels. Eventually you get to the point where the whole video is just static, but at some point it wasn't.

Now, would any media company or court sue over that? Probably not. But I believe that still falls under copyright (but maybe fair use?).

The issue with neural networks is they aren't people. Even when you point your LLM at a website and say "summarize this" the output of that summation would be owned by the website itself by nature of it being a machine transformed work.

Remember, it's not just mere rote recitation which violates the law, any transformation counts as well. The fact that AI companies are preventing it doesn't really solve the problem that they are in fact transforming multiple copyrighted works into their responses.


When you point your browser at a website the browser creates a (transformed) local copy of the information that is owned by the website itself. The browser needs to do that to render the website on your screen. Is it a violation of copyright (that the website is willing to tolerate because it profits from advertisements)?

No, because your browser is dealing with the distribution of data in a way intended by the copyright holder. You also aren't redistributing the webpage after rendering. Client side modifications fall under fair use which is what keeps the likes of ad blockers and other page modifiers legal.

What would violate copyright is if you took that rendered page, turned it into a jpeg, and then hosted that jpeg from your own servers. That's the copying that would run afoul of copyright law.


LLMs seem to be so devoid of intelligence, I think it's arguable if that's learning: https://machinelearning.apple.com/research/illusion-of-think... Typically, you would imply a level of understanding when you say learning. LLMs apparently can't do that, by design.

A human is not a commercial product. Here we have a commercial product that was created by using a lot of various copyrighted and protected IP, without licensing agreements, without paying, without even citing it.

Copy/pasting at scale is how tons of software has been written for a long time, or have we all forgotten the jokes people used to make about StackOverflow?

If I “learned” your essay and handed it in, would you be happy with that?

Everybody has had a complete 180 in terms of copyright protections. Before, nobody cared about downloading music, movies, TV shows, or pirating games. Now, when the copyright law is affecting them, they are gung-ho about protecting these billion-dollar companies' copyrights.

A more logical explanation would be that there are different opinions and those who complain are usually louder.

Yes, that's my point. They are different and contradictory opinions, which show hypocrisy.

No it is not your point. You're just arguing about a strawman that holds both of those contradictory positions.

You are attempting to invoke a strawman. So is your point that there is not a significant overlap between posters who think that AI companies should not be allowed to use pirated copyrighted material in their training corpus and posters who themselves pirated copyrighted material such as movies, music, games, etc.?

Yes, that is their point. Do you have evidence against it?

I'm sure you can find some overlap, but I bet the vast majority is caused by people making a distinction between commercial and noncommercial piracy. I don't think there's a big cohort of piracy hypocrites.


Due to the nature of the argument, of course I do not have evidence for or against it. However, I am willing to leave it at that, because I think that any rational observer will be able to look at the general mood toward copyright/piracy online (including using Limewire back in the day, pirating movies, downloading Photoshop etc.) and come to their own conclusion whether or not it's plausible that there isn't a significant overlap between the two.

It's all power.

The music and movie companies have power. They have the funds to bankrupt you with a small army of lawyers. You as an individual do not stand a chance against corporate lawyers. They can destroy your life over fairly minimal and non-violent offenses.

AI companies are backed by the very powerful. They can steal all they want and use the same army of lawyers to bankrupt any small rights holder. The big rights holders go to the same parties and allow it to happen.

Regardless of the actual take on copyright, both methods skullfuck the little guy without power.

People cry foul because, at least in the US, we claim to live in a free country based on equality, yet there is a very obvious caste system of the haves and the havenots.

It erodes the legitimacy of the system. Imagine if for years you see news reports of a mother getting a judgment against her where she owes 100s of thousands because she seeded a Britney Spears song. Then you suddenly see the same laws that were leveraged to instill fear in you tossed aside when the rich and powerful say it doesn't count anymore. You're going to cry foul!

It's not a hypocrisy of position on copyright, it's bearing witness to the illegitimacy of the laws they're bound by.


It's not a 180. You can be against copyright but as long as copyright is still being enforced on you then you can think it should be enforced on AI companies.

I'd prefer no copyright but we live in a world where there is copyright so it's unfair that only AI companies get to be immune.


It's not just about "billion-dollar companies' copyrights", but also about voluntary copyleft free software. If I license my code under GPL I don't want other persons/companies to just whitewash that code through LLMs and use it in their proprietary code.

I agree with this, and I think that it is an open question whether or not training on copyrighted material is considered transformative or not. However, someone said that thumbnails of full photos are considered transformative enough to allow fair use, and LLM training is (in my opinion) clearly more transformative than converting a picture to a thumbnail. But we will see how it plays out.

I don't think it's unreasonable to consider it stolen potential profit, but agreed that's not how they spin it

“Stolen” as in “profited on IP against terms and conditions of the license”.

Went broke "Following the divorce, he became a brewer and developed a problem with alcohol."


Well, if a divorce and drinking could make him go broke, he wasn't that wealthy, was he?


Can't imagine using MTG to learn a language. But it does seem intuitive in hindsight. Back when I played in the junior super series and nationals I could recall almost every card and what it did. So I can see how that leap would be tantermount. Kudos.


> Can't imagine using MTG to learn a language.

Note that he's starting from N2 Japanese, which is already a high level of Japanese proficiency (although it does not test writing/speaking at all, so it's very feasible to have N2 yet be terrible at conversation). He's not exactly learning hiragana from M:TG.

The M:TG competitions are giving him a framework to practice that conversation, which believe it or not can be hard to come by in Tokyo without deliberate effort (see 'expat bubble'). The vocab/grammar on the cards is mostly incidental to all that. If he was playing online M:TG in Japanese he wouldn't be getting anywhere near the payoff.


Yup, super important point. None of the JLPT exams test output, only comprehension. It’s a really interesting gap!


It is more like: I love MTG, how to learn a language through this hobby?

As far as games go, tabletop RPGs are probably better than MTG because they are all about talking. But nothing beats doing what you enjoy doing, and if what you enjoy is MTG, then MTG is the best.


tantamount


MTG skills don't translate to spelling. Thanks

