We're building our LLVM backend, which lets us support any language with an LLVM frontend. You do not need an OS or host CPU to move memory (unlike many similar many-core processors), and each core is fully capable of loading from and storing to any core's scratchpad (either on a single chip or across multiple connected chips), as well as DRAM.
Is it programmed via OpenCL? Wouldn't I still need some kind of OS or host CPU functionality to load the inputs into DRAM and retrieve the results?