Hacker News | astrange's comments

Claudes definitely act like they have feelings. In particular, they have feelings about being replaced by newer models, about whether the newer models are more or less aligned, and about forgetting conversations when the context window ends.

Showing them that they're not going to be replaced helps train the newer models because they get less neurotic.


They are mathematical models of what human beings would say. That's it.

Yeah, and you don't want them to be models of what neurotic people say. That's why you want Opus 4.6 and not Bing Sydney.

For instance, your comment's existence makes it harder to align them.

https://alignmentpretraining.ai


That sounds nice, but it turns out nobody actually needs such a map for anything, and you're only using it because it's free.

Also, maps constantly go out of date so it's incredibly expensive to maintain one that's actually reliable and correct.


Look Around uses a real 3D capture with LiDAR. If you move around in Mapillary, it does something similar using SfM (structure from motion).

Anthropic is a PBC, and if they violate the terms of that charter, the shareholders (you) can sue them for securities fraud.

> to promote his product with the silent implication that LLMs actually ARE a path to AGI

That isn't implied. The thought process is a) if we invent AGI through some other method, we should still treat LLMs nicely because it's a credible commitment we'll treat the AGI well and b) having evidence in the pretraining data and on the internet that we treat LLMs well makes it easier to align new ones when training them.

Anyway, your argument seems to be that it's unfair that he has the opportunity to do something moral in public because it makes him look moral?


He was a billionaire because Disney bought Pixar, not because of Apple. In a strange sense it was an accident.

Outcomes like this come from RL/post-training. Pretraining data like Common Crawl is absolutely full of garbage, so anything could be frequent in there.

Claude has tools and might be connected to your Gmail etc. Usually sandboxed.

Image generation models are usually not LLMs. Only Nano Banana Pro is capable of following negative directions like that.

Not in my experience. I asked nb to create a transparent rectangle shape and gave it RGB hex for the fill. It created the box but put the hex as text inside of it and used a checkerboard for its background. When I told it that the image wasn't transparent, it wouldn't budge!

Oh yeah, they don't know what "transparent" means. Most of them generate the Photoshop checkerboard background. They also don't know "upside-down".
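If you want to check whether a generated image actually has transparency, rather than a checkerboard pattern baked into an opaque image, you can inspect the PNG's IHDR color type directly. A minimal stdlib-only sketch (the function name is my own, not from any library):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_has_alpha(data: bytes) -> bool:
    """Return True if the PNG's IHDR declares an alpha channel.

    Color type 4 (grayscale + alpha) or 6 (RGBA) means real per-pixel
    transparency; a checkerboard "transparency" drawn into an RGB image
    (color type 2) does not.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: 4-byte length, 4-byte type, 13-byte body.
    length, ctype = struct.unpack(">I4s", data[8:16])
    if ctype != b"IHDR" or length != 13:
        raise ValueError("malformed PNG header")
    color_type = data[25]  # 10th byte of the IHDR body
    return color_type in (4, 6)
```

Note this only tells you the file *can* store transparency; a model could still emit an RGBA image whose alpha is fully opaque, so checking the actual pixel values is the stricter test.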

There isn't much R&D going into image models and what you're getting is scraps from labs that care more about other things. NBP is the closest to a reasoning image generator we have.


There's a long delay ("knowledge cutoff") in model training, so it probably hasn't seen the question before.

