
Or maybe use the best of both worlds: soldered-in ultra-fast RAM plus a large amount of DIMM RAM.

Same as you can have a large storage drive and a smaller NVMe.



> Or maybe use the best of both worlds, with soldered-in ultra fast ram

That's basically what L3 cache is on Intel and AMD's existing CPUs. You could add an L4, but at some point the number of cache levels you go through itself becomes a bottleneck, along with being a bit ridiculous.


The way I see it, you could have a Mac Pro with (let’s say) 32GB of super-fast on-package RAM and arbitrarily upgradable DIMM slots. The consequence would be that some RAM would be faster and some would be a bit slower.

They would be contiguous address ranges, not layered like caches.
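A minimal sketch of what that allocation policy could look like (the names and page counts here are made up for illustration, not any real OS API): new pages land in the fast on-package pool first and spill over to the slower DIMM pool once the fast pool is full.

```python
# Hypothetical two-tier page pool: prefer fast on-package RAM,
# fall back to the upgradable (slower) DIMM pool when it's full.

FAST_PAGES = 4   # stand-in for the 32GB on-package pool
SLOW_PAGES = 8   # stand-in for the DIMM pool

class TieredPool:
    def __init__(self, fast_pages, slow_pages):
        self.free = {"fast": fast_pages, "slow": slow_pages}

    def alloc(self):
        """Return the tier a new page lands in, preferring fast RAM."""
        for tier in ("fast", "slow"):
            if self.free[tier] > 0:
                self.free[tier] -= 1
                return tier
        raise MemoryError("both tiers exhausted")

pool = TieredPool(FAST_PAGES, SLOW_PAGES)
placements = [pool.alloc() for _ in range(6)]
print(placements)  # first four pages go fast, the rest spill to slow
```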


The non-uniform memory performance of such a solution would be a software nightmare.


Doesn't seem much different from various multichip or multisocket solutions where different parts of memory have different latencies, i.e. NUMA. Basically the OS keeps track of how busy pages are and rebalances things when heavily used pages are placed poorly.

Similarly, Optane (in DIMM form) is basically slow memory, and OSes seem to handle it fine. NUMA support seems pretty mature today and handles common use cases well.

With all that said, Apple could just add a second CPU to double the RAM and cores; that seems like a great fit for a Mac Pro.
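The rebalancing idea above can be sketched roughly like this (a toy model, not how any kernel actually implements it): count accesses per page, then periodically promote the hottest pages into the limited fast tier and leave the colder ones in the slow tier.

```python
# Toy model of access-frequency-based page placement: rank pages by
# how often they were touched, keep the hottest ones in the fast tier.

from collections import Counter

FAST_CAPACITY = 2  # how many pages fit in the fast tier

def rebalance(access_counts, fast_capacity=FAST_CAPACITY):
    """Return (fast_pages, slow_pages) given per-page access counts."""
    ranked = [page for page, _ in Counter(access_counts).most_common()]
    fast = set(ranked[:fast_capacity])
    slow = set(ranked[fast_capacity:])
    return fast, slow

# Simulated access trace: pages "a" and "c" are hot, "b" and "d" cold.
trace = ["a", "c", "a", "b", "c", "a", "d", "c"]
fast, slow = rebalance(Counter(trace))
print(fast, slow)  # hot pages end up in the fast tier
```

A real OS does this incrementally (sampling access bits and migrating a few pages at a time) rather than re-ranking everything, but the policy is the same shape.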


It doesn't seem any worse than existing NUMA systems today, where memory latency depends on what core you're running on. In contrast, the proposed system would have the same performance for on-board vs plugged DIMM regardless of which CPU is accessing it, which simplifies scheduling — from a scheduling perspective, it's all the same. I think that's easier to work with than e.g. Zen1 NUMA systems.


OSes have had this problem solved for decades; the solution is called "swap files". You could naively get any current OS working in a system with fast and slow RAM by simply creating a ramdisk on the slow memory address block and telling the OS to create a swap file there.


> OSes have had this problem solved for decades; the solution is called "swap files".

What operating systems handle NUMA memory through swapping? The only one I'm familiar with doesn't use a swapping design for NUMA systems, so I'm curious to learn more.


Swap isn't really the best approach for the kind of speed baselines and differences discussed here. There are better ideas, like putting GPU memory in the fast region first and everything else in the slow region. You know, like the Xbox Series consoles do.


Yet Apple is managing excellent performance with just an L1 + L2.


But the context of this thread is that it is being done with soldered RAM. I don't know how much that matters, just pointing out that you are taking the conversation in a circle.



