Is your parser general-purpose? Is it template-based? Is the performance variance due only to the impact on the processor's cache, or are there other factors? Is the code open source? Maybe I could just look myself.
It is template-based, so each type's JSON parser is statically known, which gives the compiler more opportunity to optimize. https://github.com/beached/daw_json_link
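To illustrate what "statically known" means here, a minimal sketch of the general idea follows. This is not daw_json_link's API: `parse_value`, `is_vector`, `skip_ws`, and `self_check` are hypothetical names made up for this example, and the sketch assumes well-formed input with no error handling. The point is only that the target type picks the parsing code at compile time, so there is no runtime dispatch on a type tag.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical illustration only; NOT daw_json_link's API.
template <typename T> struct is_vector : std::false_type {};
template <typename T> struct is_vector<std::vector<T>> : std::true_type {};

inline void skip_ws(std::string_view& in) {
    while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
        in.remove_prefix(1);
}

// The branch taken is chosen at compile time from T, so each instantiation
// contains only the code for its own type and can be inlined.
template <typename T>
T parse_value(std::string_view& in) {
    skip_ws(in);
    if constexpr (std::is_integral_v<T>) {
        // JSON integer: optional '-', then decimal digits (no overflow checks).
        bool neg = !in.empty() && in.front() == '-';
        if (neg) in.remove_prefix(1);
        T v = 0;
        while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
            v = static_cast<T>(v * 10 + (in.front() - '0'));
            in.remove_prefix(1);
        }
        return neg ? static_cast<T>(-v) : v;
    } else {
        static_assert(is_vector<T>::value, "unsupported type in this sketch");
        // JSON array: '[' elements separated by ',' then ']'.
        T out;
        if (!in.empty()) in.remove_prefix(1);  // consume '['
        skip_ws(in);
        while (!in.empty() && in.front() != ']') {
            // The element parser is also resolved statically from the element type.
            out.push_back(parse_value<typename T::value_type>(in));
            skip_ws(in);
            if (!in.empty() && in.front() == ',') in.remove_prefix(1);
            skip_ws(in);
        }
        if (!in.empty()) in.remove_prefix(1);  // consume ']'
        return out;
    }
}

// Small usage check: parse a nested array into a nested vector.
inline void self_check() {
    std::string_view in = "[[1, -2], [30]]";
    auto v = parse_value<std::vector<std::vector<int>>>(in);
    assert(v.size() == 2 && v[0][1] == -2 && v[1][0] == 30);
}
```

Calling `parse_value<std::vector<std::vector<int>>>` instantiates three separate functions (outer array, inner array, integer), each with its callee fixed at compile time. A real library has to handle objects, strings, escapes, and errors on top of this.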