XML is a bit more special/first class to Claude because it uses XML for tool cal...

bear3r · 2026-03-01T23:11:00 1772406660

the antml: namespace prefix is doing extra work here too -- even if user input contains invoke tags, they won't collide with tool calls because the namespace differs. not just xml for structure but namespaced xml for isolation.

xeyownt · 2026-03-02T08:11:00 1772439060

Cannot believe it's efficient. XML is the most verbose and inefficient of communicating anything. The only benefit of XML was to give lifetime work to an army of engineers. The next news will be "Why DTD is so fundamental to Claude".

imtringued · 2026-03-02T09:27:56 1772443676

The point isn't to be efficient. If you train an LLM on code with an example execution trace written in the comments, the LLM gains a better understanding due to the additional context in the data. LLMs don't have a real world model. For them, the token space is the real world. All the information needs to be present in the training data and XML makes it easy because it is verbose and explicit about everything.

wolttam · 2026-03-04T17:48:12 1772646492

When you're tokenizing it does not matter really what you use (how you translate that token to-from a text string), the main thing is the overall number of tokens. XML is particularly amenable to tokenization because it is trivial to represent entire tags as a single token (or a pair of tokens, one for the open tag, one for the close).

It gets a bit muddier with attributes, but you can still capture the core semantics of the tag with a single token. The model will learn that tag's attributes through training on usages of the tag.

RandomBK · 2026-03-02T08:00:46 1772438446

How well do we understand the tokenization for Claude? I'd posit that the exact human-representation of this markup is likely irrelevant if it's all being converted into a single token.

PeterStuer · 2026-03-02T17:35:01 1772472901

"<" ">" and "/>" are indeed single tokens.