> 2. forced to continually output tokens without...
I reference one paper that explores this: it lets the model emit identical filler (ellipsis) tokens that are then discarded before the final output. It worked fairly well and is an interesting approach.
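The mechanism is simple to sketch. Here's a minimal, hypothetical illustration (the `FILLER` token and `strip_fillers` helper are my own illustrative names, not from the paper's code): the model is allowed to spend extra forward passes emitting a designated filler token, and those tokens are filtered out before the text reaches the user.

```python
# Hypothetical sketch of the filler-token idea: the model may emit a
# designated "..." token to buy itself extra compute steps, and those
# tokens are stripped before decoding the final answer. Names here are
# illustrative, not taken from any specific implementation.

FILLER = "..."

def strip_fillers(tokens):
    """Remove filler tokens from a generated sequence before final output."""
    return [t for t in tokens if t != FILLER]

# Example: the model "thinks" with fillers interleaved in its raw output.
generated = ["The", "...", "...", "answer", "...", "is", "42"]
print(strip_fillers(generated))  # -> ['The', 'answer', 'is', '42']
```

The interesting part is that the fillers carry no information themselves; they only give the network additional sequential computation before it must commit to content tokens.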
There is another restriction I thought about, and I think I mentioned it briefly in the article: grammatical structure. All model outputs must be grammatically sound, so you would expect the first few and last few layers to be dedicated to decoding and encoding thoughts into grammatically correct language. That restricts the number of layers left to work with free-form thoughts.
> I find most pessimistic dismissive arguments are based on criticism against some imagined standard of perfection...
We are in agreement there. They have their imperfections and issues, and we have ours. Our natures are fundamentally different, so this should be expected.