There is, it's Druid. Intel announced the first four codenames in 2021.
> [...] first generation, based on the Xe HPG microarchitecture, codenamed Alchemist (formerly known as DG2). Intel also revealed the code names of future generations under the Arc brand: Battlemage, Celestial and Druid.
In the post about the texture unit, that ROM table for mip level address offsets seems to use quite a bit of space. Have you considered making the mip base addresses a part of the texture spec instead?
The problem with doing that is it would require significantly more space in that spec: at a minimum, one offset for each possible mip level. That data gets moved around the GPU internally quite a bit, crossing clock domains and everything else, and would require a ton of extra registers to keep track of. Putting it in a ROM is basically free. Given the choice between a pair of BRAMs and a ton of registers (with the associated timing considerations), the BRAMs win almost every time.
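To make the trade-off concrete, here is a minimal sketch of the kind of table such a ROM could hold. It assumes square power-of-two textures whose mip levels are packed back to back, largest first, with a fixed texel size. The function name and the layout are my assumptions for illustration, not the article's actual format.

```python
def mip_offsets(base_size_log2, bytes_per_texel=4):
    """Byte offset of each mip level from the start of the texture.

    Hypothetical layout: square, power-of-two texture, levels packed
    back to back starting with the largest one.
    """
    offsets = []
    offset = 0
    for level in range(base_size_log2 + 1):
        offsets.append(offset)
        side = 1 << (base_size_log2 - level)  # side length of this level in texels
        offset += side * side * bytes_per_texel
    return offsets


if __name__ == "__main__":
    # A 4x4 texture with 4-byte texels has levels of 64, 16 and 4 bytes.
    print(mip_offsets(2))
```

Under those assumptions the offsets depend only on the texture size and format, so they can be precomputed once and looked up instead of being carried in every texture descriptor.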
I don't understand why the programs are the same. The partial render store program has to write out both the color and the depth buffer, while the final render store should only write out color and throw away depth.
E.g., in their example in the link above for deferred rendering (figure 4), the multiple G-buffers won't actually need to leave the on-chip tile buffer, unless there's a partial render before the final shading shader is run.
Right, I had the article's bunny test program on my mind, which looks like it has only one pass.
In OpenGL, the driver would have to scan the following commands to see if it can discard the depth data. If it doesn't see the depth buffer get cleared, it has to be conservative and save the data. I assume mobile GPU drivers in general do make the effort to do this optimization, as the bandwidth savings are significant.
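A rough sketch of that conservative scan, assuming a made-up command stream where each command is just a string. The command names are hypothetical and are not real OpenGL or driver calls.

```python
# Hypothetical command names, for illustration only.
DEPTH_READERS = ("draw_with_depth_test", "read_depth")


def can_discard_depth(following_commands):
    """Return True only if the depth data is provably dead.

    Scans the commands that follow the render. If depth is cleared before
    anything reads it, the old contents are never needed. In every other
    case, including running out of commands, stay conservative and keep it.
    """
    for cmd in following_commands:
        if cmd == "clear_depth":
            return True
        if cmd in DEPTH_READERS:
            return False
    return False


if __name__ == "__main__":
    print(can_discard_depth(["bind_texture", "clear_depth", "draw_with_depth_test"]))
    print(can_discard_depth(["draw_with_depth_test", "clear_depth"]))
```

The important property is the default: if nothing proves the data is unused, the driver has to pay the bandwidth and store it.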
In Vulkan, the application explicitly specifies which attachments (i.e. stencil, depth, color buffer) must be persisted at the end of a render pass, and which need not be. So that maps nicely to the "final render flush program".
The quote is about Metal, though, which I'm not familiar with, but a sibling comment points out it's similar to Vulkan in this aspect.
So that leaves me wondering: did Rosenzweig happen to only try Metal apps that always use MTLStoreAction.store in passes that overflow the TVB, or is the Metal driver skipping a useful optimization, or neither? E.g. because the hardware has another control for this?
That's what I thought, too, until I saw ARM's Hot Chips 2016 slides. Page 24 shows that they write transformed positions to RAM, and later write varyings to RAM. That's for Bifrost, but it's implied Midgard is the same, except it doesn't filter out vertices from culled primitives.
That makes me wonder whether the other GPUs with position-only shading - Intel and Adreno - do the same.
As for PowerVR, I've never seen them described as position-only shaders - I think they've always done full vertex processing upfront.
Mali's slides here still show them doing two vertex shading passes, one for positions, and again for other attributes. I'm guessing "memory" here means high-performance in-unit memory like TMEM, rather than a full frame's worth of data, but I'm not sure!
Linux allocates page tables lazily, and fills them lazily. The only upfront work is to mark the virtual address range as valid and associated with the file. I'd expect mapping giant files to be fast enough to not need windowing.
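A quick way to try this out, as a sketch: map a large sparse temporary file and touch a single byte near the end. The helper name and the sizes are arbitrary choices of mine. The lazy page-table behavior is what the comment above describes for Linux, and other platforms may behave differently.

```python
import mmap
import tempfile


def map_and_peek(size_bytes, offset):
    """Map a sparse file of size_bytes and read one byte at offset."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size_bytes)  # extend the file without writing any data
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Creating the mapping does not read the file. Touching this
            # byte should only fault in pages near `offset` (assuming the
            # lazy behavior described above).
            return len(m), m[offset]


if __name__ == "__main__":
    size = 1 << 28  # 256 MiB
    print(map_and_peek(size, size - 1))
```

Timing the call with different sizes is an easy way to check whether mapping cost grows with file size on a given system.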
There are still some cases where you'd not want unlimited VM mapping, but those are getting a bit esoteric and at least the most obvious ones are in the process of getting fixed.
Only one row at a time has voltage applied to it. In one update, the image is scanned out multiple times, so it appears as if all pixels were changing simultaneously (and perhaps they do, if the electrodes have significant capacitance).
With fewer rows to update, each row gets a push more often, flipping the grains faster.
Thread migration only costs on the order of 100 microseconds, including the effect of cold caches. If you keep the AVX thread on the big core for at least 100 milliseconds at a time, you only lose ~0.2% performance.
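The arithmetic behind that estimate, as a quick sketch: a single 100 microsecond migration amortized over 100 milliseconds is 0.1%, and counting both the move to the big core and the move back gives 0.2%. The function and parameter names are mine.

```python
def migration_overhead(migration_cost_s, residency_s, migrations_per_period=1):
    """Fraction of time lost to migrations during one residency period."""
    return migrations_per_period * migration_cost_s / residency_s


if __name__ == "__main__":
    one_way = migration_overhead(100e-6, 100e-3)
    round_trip = migration_overhead(100e-6, 100e-3, migrations_per_period=2)
    print(f"one way: {one_way:.2%}, round trip: {round_trip:.2%}")
```

Longer residency on the big core shrinks the overhead proportionally, so the numbers only get better if the AVX phases last longer than 100 milliseconds.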
> [...] first generation, based on the Xe HPG microarchitecture, codenamed Alchemist (formerly known as DG2). Intel also revealed the code names of future generations under the Arc brand: Battlemage, Celestial and Druid.
https://www.intel.com/content/www/us/en/newsroom/news/introd...