
I've done a bit of HPC work, and Fortran is very much "still a thing" there. MPI[1] is pretty much the only game in town for between-node parallelism, and MPI comes with interfaces for C, C++ and Fortran. The only other language that's even been run at petascale is Julia, and as far as I can tell that's still using Julia's `ccall()` under the hood to interact with the MPI C libraries [e.g., 2].

Certainly legacy code is part of the picture, but not always as directly as one might think. Probably the biggest factor is that Fortran compilers tend to be very good -- partially a chicken-and-egg issue (e.g., Intel puts effort into ifort because its HPC customers want to use Fortran), but I think there's also at some level a tradeoff between language convenience and ease of optimizing into machine code. To give one concrete example, until C99 added the `restrict` keyword, it fundamentally wasn't possible for the compiler to optimize C code as heavily as it could optimize Fortran in certain common situations, because of pointer aliasing.

It's probably also worth noting that modern Fortran is a long way from f77.

[1] https://en.wikipedia.org/wiki/Message_Passing_Interface

[2] https://github.com/JuliaParallel/MPI.jl



> MPI[1] is pretty much the only game in town for between-node parallelism, and MPI comes with interfaces for C, C++ and Fortran.

I believe the MPI C++ interface has been deleted from the MPI standard. It was a bit pointless, since it was essentially just the C interface with slightly different wording, and C++ can of course call C just fine. For people wanting a higher level C++ MPI interface, I believe the common choice is Boost MPI.

> The only other language that's even been run at petascale is Julia, and as far as I can tell that's still using Julia's `ccall()` under the hood to interact with the MPI C libraries [e.g., 2].

I don't think that's a bad thing, per se. No need to reinvent the wheel.

And, it's the same thing for Fortran really. The most common MPI libraries are implemented in C, with the Fortran binding being a fairly thin wrapper that calls the C implementation. The only real difference is that the Fortran binding is an official part of the MPI standard.


Hah, I hadn't heard that! My impression has been that good C++ written for HPC tends to look a lot like plain C anyways. And ccall is pretty efficient, so I'm not complaining there either.


The chicken-and-egg thing also applies to GPUs, btw. Nvidia & PGI have supported GPU computing in Fortran for ~8 years, since the early days of CUDA.


That's a good point. Hierarchical parallelism is becoming increasingly important, so having one language that can be used both within-node and between-node is very convenient, and could add to the lock-in factor.


Good point, and this is btw. exactly where Nvidia is heading. There will be a point in the future where you just program kernels and/or map/reduce functions and/or library functions and then call them to execute on a GPU cluster, passing in a configuration for network topology, node-level topology (how many GPUs, how they are connected) and chip-level topology (grid + block size).

The address space will be shared across the whole cluster, supported by an interconnect that's so fast that most researchers can just stop caring about communication / data locality (see how DGX-2 works).


> The address space will be shared across the whole cluster, supported by an interconnect that's so fast that most researchers can just stop caring about communication / data locality

There will always be people who will care because locality will always matter (thanks, physics). Improvements in technology may make it easier and cheaper to solve today's problems, but as technology improves we simply begin to tackle new, more difficult problems.

Today's chips provide more performance than whole clusters from 20 years ago and can perform yesterday's jobs on a single chip. But that doesn't mean clusters stopped being a thing.

See also The Myth of RAM, http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html


I do think there’s a paradigm shift coming. It’s a combination of the ongoing shift from latency-oriented to throughput-oriented design with the capabilities shown in new interconnects, especially nvlink/nvswitch. This allows DGX-2 to already cover a fair amount of what would otherwise have to be programmed for midsized clusters -- if it can be made to scale one more order of magnitude (i.e. ~10 DGX) I think there’s not much left that wouldn’t fit there but would fit something like Titan. There’s not that much that's so embarrassingly parallel that the communication overhead doesn’t constrain it, and if it doesn’t, you again don’t care much about data locality as it becomes trivial (e.g. a compute-intensive map function).


C++ and Fortran support on CUDA was one of the big reasons why OpenCL was left behind.

They now at least support C++14, but driver support still doesn't seem to be quite there, from what I gather reading the interwebs.


If you are looking for an alternative to MPI, you should try Coarray Fortran. It supports parallel programming for both "within a node" and "between node" communication.

Coarray Fortran is now part of the Fortran programming language, and has a very simple syntax similar to array operations.

Based on my experience, the performance depends on the compiler implementation, and I would recommend the GCC compiler over the Intel compiler.


Interesting, I haven't tried that. Apparently it uses either MPI or GASNet for the between-node communication, depending on configuration? I don't know anything about the latter, but apparently it's an LBL product [1].

[1] https://gasnet.lbl.gov/


I don't think it's because MPI is better; it's just that most of the supercomputers I have access to require the use of MPI-like constructs.


And that's often because network hardware understands MPI and is able to optimize flows between nodes at far lower latency than TCP.


That's really cool. Source?


I used to work in HPC. The Mellanox gear, specifically InfiniBand is very good.

Fun fact: if you're working at a Saudi Arabian HPC center, say KAUST, your interconnects are purely Ethernet. Mellanox is (partially?) an Israeli company, and that's not very politically comfortable with procurement.


Better than what? Not necessarily disagreeing, but I'm not sure what the alternatives even are at the same level of abstraction. I mean, there's PNNL's Global Arrays [1], but that's higher level, or Sandia Portals [2], which is lower/transport level. Perhaps there are newer/alternative options I don't know about?

[1] http://hpc.pnl.gov/globalarrays/

[2] http://www.cs.sandia.gov/Portals/portals4-libs.html


Global arrays is normally used over MPI anyhow. I guess there's SHMEM, but that's integrated with at least OpenMPI (and others, I think). CHARM++ has been used at scale, but it's semi-proprietary.



