(Some) TPUs look more like those non-shared-memory systems. The chip has compute tiles with local memory, and the program has to move data between them explicitly. However, most of that heavy lifting is left to the compiler rather than the programmer.
Some TPUs are also structured around fixed dataflow (systolic arrays for matrix multiplication).
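To make "fixed dataflow" concrete, here is a toy cycle-by-cycle simulation of a systolic array doing a matrix multiply. It is only a sketch under stated assumptions: the function name `systolic_matmul` is made up for illustration, and the output-stationary layout (each cell keeps one accumulator while operands stream past) is chosen because it is easy to follow, not because any particular TPU is wired this way.

```python
def systolic_matmul(A, B):
    """Simulate a toy output-stationary systolic array computing C = A @ B.

    Hypothetical illustration: cell (i, j) owns the accumulator for C[i][j].
    Values of A enter at the left edge and move right one cell per cycle;
    values of B enter at the top edge and move down one cell per cycle.
    """
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"

    acc = [[0] * N for _ in range(M)]    # one accumulator per cell
    a_reg = [[0] * N for _ in range(M)]  # A operands, moving left to right
    b_reg = [[0] * N for _ in range(M)]  # B operands, moving top to bottom

    # The last useful product reaches the bottom-right cell at cycle K + M + N - 3.
    for t in range(K + M + N - 2):
        # Shift A right. Row i is fed with a delay of i cycles (input skew).
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0

        # Shift B down. Column j is fed with a delay of j cycles.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0

        # Every cell does one multiply-accumulate on whatever is passing through.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Note that no cell ever addresses memory: each one only talks to its left and upper neighbours, and the skewed feeding schedule guarantees that matching operands meet in the right cell at the right cycle. That schedule is the part that is fixed in hardware and planned by the compiler.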