Hacker News | cmiles74's comments

Seems like a bad bet to me. It looks like the authors are going to lose this case, setting the precedent that not only do you not need to license training data, but obtaining it illegally (for free) is totally okay.

Didn't Anthropic's case already set the precedent that training itself is fine? It's not like copyrighted novels are a large portion of human-generated text data. It's just the stuff that's easier to get because it's preserved in bulk.

Video transcription has more or less been solved. Imagine how much data Google has in YouTube transcripts. And the longer these AI chatbots operate, the more data they manage to collect for training as well (I think Google making it so you can easily upvote or downvote a bot's response is a good idea).


It wrested distribution away from the record companies. IMHO, that was a pretty big concern for them.

Is this maybe more about the quality of the documentation? I say this because my thinking is that reading is reading; it takes the same amount of time to read the information either way.

I think it varies. Most enterprise software is good enough if it just works. In the consumer space quality and polish is way more important. Then there are things like modeling and video where performance is a much bigger deal.

Sure, no one really cares about the code but the quality of the code matters more for some products (and in different ways) than others.


The article isn't clear on this point, I believe because Meta isn't clear on this themselves. Other bits of this piece highlight third parties reviewing the responses of the AI assistant; it's possible that people are recording and some sound they make triggers the AI assistant which, in turn, leads to the video being reviewed.

OTOH, Meta could just be desperate for training content and they're just slurping up all recordings by people who've opted into the AI function. It would be great for them to clarify how this works.


My reading was that as soon as you enable the "AI" functionality you are opted into having your recordings labeled.

"But for the AI assistant to function, voice, text, image and sometimes video must be processed and may be shared onwards. This data processing is done automatically and cannot be turned off."


Right, that's the section I was confused by because it was in the context of an experiment trying to use the AI stuff without an Internet connection, which obviously won't work. The article is using the "shared onwards" terminology to refer to at least inference. But the inference part is uninteresting to me, and the data labeling is. The article doesn't really separate those out.

I would figure if there is AI labeling that some things will confuse the system and be sent to a human, and some things will randomly be sent to a human for error checking. Same thing with Alexa: I figure there's always a small chance that anything I say to her will end up reaching a human. She's not always listening, as some people fear (the data use would have been detected long ago if she were), but humans occasionally trigger her accidentally--and such errant triggers are more likely to be sent to a human because they are not going to make sense.
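That routing logic can be sketched in a few lines. This is purely illustrative--the function name, confidence threshold, and sample rate are all made-up assumptions, not anything Meta or Amazon has documented:

```python
import random

# Hypothetical review-routing sketch: a clip goes to a human either
# because the model was unsure about it, or (with small probability)
# as a random quality-control sample. All names and numbers here are
# illustrative assumptions.

REVIEW_SAMPLE_RATE = 0.001   # ~0.1% of clips audited at random
CONFIDENCE_THRESHOLD = 0.6   # low-confidence clips escalate to a human

def route_clip(model_confidence: float, rng=random.random) -> str:
    """Return 'human_review' or 'automatic' for a recorded clip."""
    if model_confidence < CONFIDENCE_THRESHOLD:
        return "human_review"        # clip confused the system
    if rng() < REVIEW_SAMPLE_RATE:
        return "human_review"        # random error-checking sample
    return "automatic"

print(route_clip(0.3))                    # low confidence -> human_review
print(route_clip(0.95, rng=lambda: 0.5))  # confident, not sampled -> automatic
```

Under a scheme like this, errant triggers naturally score low confidence, which is exactly why they would be the clips most likely to reach a human reviewer.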

This was one of the first hits on Kagi. 404 has a similar article (I think) but it's behind a paywall.

"The demand for this ‘Ray-Ban hack’ has been steadily increasing, with the hobbyist’s waiting list growing longer by the day. This demonstrates a clear desire among Ray-Ban owners to exercise more control over their privacy and mitigate concerns about unknowingly recording others."

https://bytetrending.com/2025/10/28/ray-ban-hack-disabling-t...


It took a good while for me to figure out that a family member had inadvertently signed up for an Apple Music account that they were not using.

Apple going from being "the new Microsoft" to being "the new AOL".

Isn’t that the hard part? If the tasks are small enough and well defined, where’s the win over just writing the code right there and then?

Well, Claude can also refine it into smaller tasks, and that’s where you can catch those major problems before they turn into production issues.

It’s the hard part, which is why these tools are so great; the writing of code was the tedious part.

You can use an LLM to generate that list of tasks.
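In practice that usually means asking the model for a numbered list and then parsing it back into structure. A minimal sketch, where the response text is a made-up example rather than real model output:

```python
import re

def parse_task_list(text: str) -> list[str]:
    """Extract tasks from lines like '1. Do the thing' or '2) Other thing'."""
    tasks = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+[.)]\s+(.*)", line)
        if m:
            tasks.append(m.group(1).strip())
    return tasks

# Pretend this came back from a chat-completion call asking the model
# to break a feature into tasks (illustrative text, not real output):
response = """\
1. Define the database schema for user accounts
2. Add a registration endpoint
3. Write integration tests for the signup flow
"""
print(parse_task_list(response))
```

The parsing is the easy part, of course; whether the list itself makes sense is the question the reply below raises.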

And how does a new grad that's never actually programmed know whether that list of tasks makes sense?

I can take a verbal description from a meeting with five to ten people and put together something they can interact with in two weeks. That is a lot slower than Claude Code! Yet everywhere I’ve worked, this is more than fast enough.

Over two more weeks I can work with those same five to ten people (who often disagree or have different goals) and get a first draft of a feature or small, targeted product together. In those latter two weeks, writing code isn’t what takes time; working through what people think they mean versus what they are actually saying, and mediating between groups when they disagree (or mostly agree), is the work. And then, after that, we introduce a customer. Along the way I learn to become something of an expert in whatever the thing is and continue to grow the product, handing chunks of responsibility to other developers, at which point it turns into a real thing.

I work with AI tooling and leverage AI as part of products, where it makes sense. There are parts of this cycle where it is helpful and time saving, but it certainly can’t replace me. It can speed up coding in the first version but, today, I end up going back and rewriting chunks and, so far, that eats up the wins. The middle bit it clearly can’t do, and even at the end when changes are more directed it tends toward weirdly complicated solutions that aren’t really practical.


