It will take some time until that question can be answered. For some it will likely matter a lot, e.g. engineering or HPC applications that rely heavily on dense solvers. However, even for non-FP-heavy workloads (say integer-heavy ones like databases) the wider SIMD with the new mask and blend instructions [1] _could_ make a significant difference. Still, in many cases a significant difference will likely only show up after careful tuning!
Also note that Intel is refining their pesky market-segmentation game: the full 2x 512-bit FMA/cycle is only available on high-end CPUs; on lower-end parts you get only 1 FMA/cycle and you still take the extra AVX-512 clock-throttle hit [2]!
I really like the fact that AMD stayed away from such devil-in-the-details feature-based market segmentation!
Wide availability of an ISA creates the economic opportunity for people to hand tune / optimize for it. I wonder if Intel is shooting themselves in the foot by creating instructions that are available only in a tiny fraction of shipping CPUs. AMD would have had a much harder time with Ryzen if AVX 512 was shipping in volume on a bulk of the chips and if a wide variety of software had been written to exploit it.
Depends on the working-set size. Sure, AVX-512 is great until either A) you thermally throttle or B) you cache miss. Keep in mind that AMD has 8 memory channels vs Intel's 6, so if you need to hit main memory (randomly or sequentially), AMD has a decent edge.
[1] https://software.intel.com/en-us/blogs/2013/avx-512-instruct... [2] https://www.servethehome.com/wp-content/uploads/2017/07/Inte...