I did some assembly optimization for an internal RTL-level simulator. We had ~1000 machines on a three-year upgrade cycle, i.e., we replaced about 333 machines per year, or roughly $333k per year at about $1k per machine. Let's say I cost the company $200k per year. Several days of my time is perhaps $2k, so I'd only need a 0.6% speedup for the work to pay for itself, not even counting the cost of powering and maintaining our machines.
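A quick sketch of the break-even arithmetic above, using the same rough figures. The function names are hypothetical, and like the original estimate it compares a one-time engineering cost against one year of hardware purchases while ignoring power and maintenance.

```python
# Break-even speedup for spending engineer time on optimization,
# using the rough figures from the comment (all approximate).

FLEET_SIZE = 1000           # machines
UPGRADE_CYCLE_YEARS = 3     # each machine replaced every three years
COST_PER_MACHINE = 1_000    # dollars; implied by ~333 machines/year ~ $333k/year
OPTIMIZATION_COST = 2_000   # dollars; "several days" of a $200k/year engineer


def annual_hardware_spend(fleet_size, cycle_years, cost_per_machine):
    """Dollars spent per year replacing machines on a fixed upgrade cycle."""
    return fleet_size / cycle_years * cost_per_machine


def break_even_speedup(opt_cost, hardware_spend):
    """Fractional speedup at which the optimization pays for itself:
    a speedup of s lets the same work run on s fewer machines' worth of
    spend, so it breaks even when s * hardware_spend == opt_cost."""
    return opt_cost / hardware_spend


if __name__ == "__main__":
    spend = annual_hardware_spend(FLEET_SIZE, UPGRADE_CYCLE_YEARS, COST_PER_MACHINE)
    s = break_even_speedup(OPTIMIZATION_COST, spend)
    print(f"hardware spend: ${spend:,.0f}/year, break-even speedup: {s:.1%}")
```

Running it prints a break-even speedup of 0.6%, matching the estimate in the text.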
When I worked on it, our simulator was an order of magnitude faster than commercially available simulators (Synopsys VCS and Cadence NC-Verilog), which cost between $1k and $10k per license per year. I worked for a tiny hardware startup; established hardware companies use a few orders of magnitude more compute power than we did, so the equation is probably at least four orders of magnitude further in favor of doing assembly optimization in a commercial simulator.