Potential of the Julia programming language for high energy physics computing (arxiv.org)
120 points by npalli on Dec 4, 2023 | hide | past | favorite | 206 comments


Okay, I gotta mention this feeling I've had in my chest for a while.

A couple of years ago, when I was fascinated by different programming languages, I stumbled upon Julia. I don't exactly remember how, but I think I was scrolling through a list of programming languages that looked interesting so I could write "Hello, world" in each of them.

So I stumbled upon Julia, knowing nothing about it, and downloaded it. At the time the language was still very young, but I remember thinking to myself that it had a lot of potential: it promises something we actually need, being dynamic yet high performance.

I've slowly been following Julia's progress over time and damn, it has grown rapidly and gained a lot of attention! I'm happy, because it deserves it.

I've been working with MATLAB again and I'm thinking about giving Julia another shot; this is also because GNU Octave was a disaster for me.

The end.


We had our first JuliaHEP workshop this year: https://indico.cern.ch/event/1292759/


As someone who took part in the early Python courses during the Summer Schools in 2003/4, and was writing CMT build scripts as Python was slowly being adopted at CERN, it will be interesting to follow how Julia gets picked up.


Cool! Writing math-heavy code in Julia is such a pleasure. C++ is a pain, numpy is alright but Julia just makes sense. I really wish I could use it for my day-job.

Back when I was on ATLAS we mostly did things in C++. If I never have to do another matrix multiplication in C++ again it will be too soon.


C++ is pain, numpy is pain if you need a loop, numba and cython are pain if you need any more complex data structures.

Sadly, Julia is a pain too if you don't work in a REPL/notebook (which you shouldn't). Julia has the design to solve the two-language problem, but not the implementation. And it probably never will, because the Julia community refuses to see this as a problem.


I used to think that notebook-based development and package-based development were diametrically opposed, but Pluto.jl notebooks have changed my mind about this.

A Pluto.jl notebook is a human readable Julia source file. The Pluto.jl package is itself developed via Pluto.jl notebooks.

https://github.com/fonsp/Pluto.jl

Also, the VSCode Julia plugin tooling has really expanded in functionality and usability for me in the past year. The integrated debugging took some work to setup, but is fast enough to drop into a local frame.

https://code.visualstudio.com/docs/languages/julia

Julia is the first language in which I have achieved full life-cycle integration, from exploratory code to sharable package. It even runs quite well on my Android. 2023 is the first year I was able to solve a differential equation or render a 3D surface from a calculated mesh with the hardware in my pocket.


Pluto.jl does solve the state problem of notebooks, which is their biggest problem. I should probably give it another go.

How is the module-reloading story nowadays? Last time, before I gave up, the state of the art was Revise.jl, which was a world of pain.


I have no idea what kind of problems you have ;) I use Revise.jl on a daily basis and it works flawlessly. I don't have any comparable workflow in any other language I work with on a daily basis. I wish I could do this style in those as well...

Yes, you should give it a go (again)!



I'm not sure how Revise would work with a notebook though... Guess it's only useful in the REPL, which is a different way to work.


In VSCode, in an ordinary Julia file you can place the magic comments `# %%` at the start of a code cell (needs to be on its own line), and then you can send these “cells” to the REPL to execute. No notebook needed.
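For instance, a plain `.jl` file laid out with such cell markers might look like this (a hypothetical file; the `# %%` markers are ordinary comments to Julia itself, and only the VSCode extension interprets them as cell boundaries):

```julia
# %% cell 1: build a grid (send this cell to the REPL from the editor)
xs = range(0, 2pi; length = 100)

# %% cell 2: compute on it
ys = sin.(xs)

# %% cell 3: inspect the result
println("max of ys = ", maximum(ys))
```

Since the markers are inert comments, the same file also runs top to bottom as an ordinary script.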


What do you do with it? Especially that benefits from Revise?


> Pluto.jl does solve the state problem of notebooks

Yes it does, unless you do crazy stuff (which you probably should put in a Julia file and let it precompile). The order of cells and execution order does not matter (unless you do crazy stuff).


A Pluto.jl notebook is also a regular file. You need to use a package for precompilation caching though.


> It even runs quite well on my Android. 2023 is the first year I was able to solve a differential equation or render a 3D surface from a calculated mesh with the hardware in my pocket.

Is that via webassembly or... ?


No. Julia runs natively on ARM.

  julia> versioninfo()
  Julia Version 1.9.3
  Commit bed2cd540a1 (2023-08-24 14:43 UTC)
  Build Info:
    Official https://julialang.org/ release
  Platform Info:
    OS: Linux (aarch64-linux-gnu)
    CPU: 6 × Cortex-A55
    WORD_SIZE: 64
    LIBM: libopenlibm
    LLVM: libLLVM-14.0.6 (ORCJIT, cortex-a55)
  Threads: 1 on 8 virtual cores


Would you expand on how you are running Julia on Android?



I'm starting to think that the two-language problem is not really solvable at the language level. Maybe it is, but there's significant friction.

High level naturally pushes you towards abstract types, GC, not caring about allocation, do what I mean not what I said.

Low level naturally pushes you towards concrete types, deterministic mallocs, do what I said not what I mean.

e.g. do I want integers like Python or integers like C? Yes.


I feel F# is a pretty good candidate. One can get pretty down and dirty with low-level CLR stuff or unmanaged code, but then you have a very nice high-level language.


+1 on F#, one of the most underrated languages around.


It's so odd how it's been probably a decade of "F# is underrated", yet nothing makes it shine in the eyes of the market. Really strange.


It's really more a result of human sentiment than anything. F# is underrated and underappreciated, and it just doesn't have a critical mass of people developing libraries for it. The obstacles to its adoption were being associated with .NET (Windows-only at the time) and Microsoft, and being sold as a C# alternative rather than as its own thing and an alternative to other languages like Python.

An interesting thought experiment is to imagine someone releasing F# today as a new language under a different name, like Flosure or something. I bet people would lose their minds over it.


Running on the CLR and coming from Mordor are the prime reasons I won't even evaluate it. Not sure how prevalent or fair this is, but for at least N=1 it makes it not shine.


As much as I love the CLR on a technical level (as opposed to the JVM), and feel that Oracle is a much worse devil than MSFT, I completely understand the sentiment. MSFT has consistently made some very anti-community/open-source moves with its open source offerings.


Yep. Much of this is sentimental. I associate Java with Sun, which doesn't have the same aura of evil as MSFT or Oracle. Thus I hate Java mostly based on its technical merits and the thousands of hours I've wasted with it.

While this is sentimental, as in it can't be exhaustively explained by logic and known facts, I still think it's somewhat rational. Fool me once etc.

The newer generations probably don't really grasp what MSFT pulled on OSS, developers and computing in general to make it the monopoly it is. It's possible that it has changed course, but I'll probably never trust it has.


I’ve always thought Microsoft should embed it into Excel as a coequal language to VBA.

Finance programmers would go nuts!


Yeah, it would have been great for finance, and OCaml and the ML language family seem to be popular in some HFT firms.

Sadly, I think even this boat has sailed since they added Python to Excel. A truly missed opportunity.


True, and it may even attract R users, since it seems functional at its core.

MS did include lambdas as a primitive, but also decided to use Python as the new formula language.


> do I want integers like Python or integers like C? Yes.

Julia has BigInt and Cint. Maybe there could be a more Pythonic implementation that scales between machine types and BigInt. That would not be hard to implement in Julia; I just have not found a good use case.
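A quick sketch of that contrast, using only standard Julia semantics: `Int` wraps on overflow like a C integer, while `BigInt` grows without bound like a Python integer.

```julia
# Machine integers behave like C: fixed width, silent wraparound.
x = typemax(Int64)                # 9223372036854775807
println(x + 1 == typemin(Int64))  # true: overflow wraps

# BigInt behaves like Python's int: arbitrary precision.
b = big(typemax(Int64))
println(b + 1)                    # 9223372036854775808, no wraparound
```

A "Pythonic" hybrid would have to pick one of these behaviors dynamically, which is the part Julia deliberately avoids for performance.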

What I like about Julia is that I can do both the high level and low level in Julia. Abstract code becomes concrete due to late binding.

You can access Libc.malloc and Libc.free if needed. See GPUCompiler.jl

The effect system exists. If the compiler can prove no side effects, eager finalization is also possible; that is, the finalizer and deallocation will run deterministically. It's new, though, so effect analysis is mostly a manual affair at the moment.


I sincerely agree with your comments. People who don't live with code don't see this and have weird romantic ideas of what it is and what it does. Mixing levels of abstraction has (ultimately) never made much sense to me.


JavaScript is not that far from solving the two-language problem. In fact, V8/SpiderMonkey speed is very acceptable for most number crunching, and JavaScript isn't even very friendly to high performance; the JITs are miraculous. Sadly, neither JavaScript as a language nor its "ecosystem" is very suitable for data-analysis-style use.

I want integers like Python's that are as fast as integers like C's. I don't care how they are allocated or deallocated. JavaScript comes very close even without having integers at all.

Probably nobody expects to squeeze every last cycle out of a higher-level language. I'd say getting consistently within an order of magnitude of C performance can be said to solve the two-language problem. GC pauses etc. are perfectly acceptable for e.g. data analysis.


Let's see how you do those high energy physics simulations with JS...


Why would I do that?

Although I wouldn't be surprised if it were faster than ROOT with e.g. asm.js.


In some cases, yes, that will be true. But good luck having the inertia to replace ROOT in general (with anything). While many people hate ROOT (I do, and feel the pain daily), it is a highly tailored, optimized piece of software, and implementing something like it would take decades. And I doubt you would be able to handle low-level tasks like optimizing the memory footprint needed for HEP applications when running in the online world (while collecting data).

JS+asm.js's greatest advantage is being cross-platform, which is one of the least-requested features in the field; i.e., no one expects you to be able to run your analysis code on Windows.


I wasn't proposing ROOT to be reimplemented in JS. That was what the GP attributed to me.

My point is that JS is really fast for some things, at least compared to what some people's impressions seem to be. Even though it's very optimization unfriendly in its semantics. And that this hints that the two language problem is indeed solvable for analysis at least.

"Laypeople" may also think that code is optimized to the last cycle in something like HEP simulations. It's made fast enough and the optimization is nowhere near the level of e.g. graphics heavy games.

Real-time usage like high frequency large data collection will probably never happen on the "single language". But I'd guess ROOT is not used at that level either? Also at least last time I checked, ROOT is moving to Python (probably not for the hottest loops of the simulation though).

(Off-topic: C++ interpretation like done in ROOT seems like a really bad idea.)


> I wasn't proposing ROOT to be reimplemented in JS. That was what the GP attributed to me.

Sorry for assuming that. I really felt the pain just thinking about the possibility of combining two things I hate so much (JS+ROOT).

> "Laypeople" may also think that code is optimized to the last cycle in something like HEP simulations. It's made fast enough and the optimization is nowhere near the level of e.g. graphics heavy games.

I understand that in other areas there might be more sophisticated optimizations, but that does not change things much inside the HEP community. And it is not optimized only for simulations but for other things too; it is not a single-problem optimization.

> Real-time usage like high frequency large data collection will probably never happen on the "single language". But I'd guess ROOT is not used at that level either? Also at least last time I checked, ROOT is moving to Python (probably not for the hottest loops of the simulation though).

I did not mean to indicate that ROOT is being used to handle the online processing (in HEP terms). That is usually handled via optimized, compiled C++ code. My point is that you will probably never use JS or any interpreted language (or anything other than C++, to be pessimistic) for that. ROOT at the end of the day is much closer to C++ than anything else, so the learning curve isn't that steep if you come in with some C++ knowledge.

> Also at least last time I checked, ROOT is moving to Python (probably not for the hottest loops of the simulation though).

I think you mean PyROOT [1]? That is the official Python ROOT interface. It provides a set of Python bindings to the ROOT C++ libraries, allowing Python scripts to interact directly with ROOT classes and methods as if they were native Python. But that does not represent a rewrite. It makes things easier for end users doing analysis, while being efficient in terms of performance, especially for operations that are heavily optimized in ROOT.

There is also uproot [2], which is a purely Python-based reader and writer of ROOT files. It is not part of the official ROOT project and does not depend on the ROOT libraries. Instead, uproot re-implements the I/O functionality of ROOT in Python. However, it does not provide an interface to the full range of ROOT functionality. It is particularly useful for integrating ROOT data into a Python-based data analysis pipeline, where libraries like NumPy, SciPy, Matplotlib, Pandas, etc. are used.

> (Off-topic: C++ interpretation like done in ROOT seems like a really bad idea.)

I will agree with you. But to be fair, the original purpose of ROOT was interactive data analysis; over the decades a lot of things got added, many experiments had their own soft forks, and things got very messy quickly. As a result there is little momentum to fix problems and introduce improvements.

[1] https://root.cern/manual/python/

[2] https://github.com/scikit-hep/uproot5


While I was at CERN in 2003-2004, that was exactly what some people were doing on their Windows computers.

The alternative being OS X, or maybe Solaris, if one was in a building that still had a couple of pizza boxes that weren't yet on the throw away pile.


Via Pluto.jl notebooks, I have been able to explore the Julia-JavaScript interface. It's quite fun using JavaScript and SVG to interactively render the accelerated BLAS and FFTW outputs from Julia.


Ooh, interesting!

Jupyter got rid of their direct javascript interface for security reasons that I don't agree with. Does Pluto.jl have a model where trusting display code is as easy as trusting the code being run?


What's the two language problem? One language for a high level interface and one language for low level, high performance computations?


exactly


https://yuri.is/not-julia/ is a good write-up of one person's opinion on the problems of Julia. I'm much less experienced with Julia but I somewhat agree. There's too much focus on "magic" instead of correctness for me to try building serious software using Julia. An amazing language in many aspects though.


Some of the response in the Julia community: https://discourse.julialang.org/t/discussion-on-why-i-no-lon...


The maintainers seem to have a very narrow focus on what they think their target audience is and everything outside of that gets ignored.


Also, a lot of people left early on because their concerns and use cases were dismissed as "holding it wrong". Thus what's left of the Julia audience is just the one they catered to.

The big selling point from the get-go was that Julia is general purpose. It's now been over 10 years, and it's still about as general purpose as MATLAB (although on a totally different level design-wise).

A big smell from the get-go was 1-based indexing. I appreciate that it's more familiar from some branches of math and physics (and MATLAB), and it's something I could live with, but more or less all general-purpose languages index from zero. And there's a good reason for that. And it makes interoperability trickier.

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


Lol, sorry, but if you read it, Dijkstra's argument for 0-based indexing is literally that it looks prettier... I really wonder if anyone who has posted that link has read the actual argument and thought about it critically.


He gives various reasons and practical examples for it.


One can give analogous reasons (literally, his main reason is that it looks nicer, lol) and practical examples that look nicer with 1-based indexing too. It's a terrible argument that somehow got memed over way too long a time, probably because no one needs to read the text; just post the link and you're done.


Julia doesn't even have upper-exclusive ranges — `0:0` has length 1. (The length-0 sequence starting at 0 is, naturally, `0:-1`.) It fails on both counts!
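A minimal demonstration of that point, using plain `UnitRange` semantics:

```julia
# Julia ranges include both endpoints.
println(length(0:0))   # 1  (the single element 0)
println(length(0:-1))  # 0  (the empty range "starting" at 0)
println(collect(1:3))  # [1, 2, 3]: 1-based and upper-inclusive
```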


I think the Julia team is still relatively small, so it might be what they have to do to make meaningful progress?


Julia's window has probably passed. Pytorch, JAX, Numba etc are eating its lunch. Which is sad because they are all pretty hacky stuff compared to what Julia could be.


JAX is the least hacky of them and my personal favorite, but ultimately I have to agree. I'm a huge Julia stan but they simply don't have the institutional support that Meta and Google can provide.


Sadly, I think you may be right. Without the support of one of the big 3 (Meta, Google, or MSFT), it will be hard for Julia to create and sustain significant mind share. Which is really a shame, because from a technical design standpoint Julia is just brilliant.

Add Modular's Mojo to that, and I think we might have a repeat of Dart vs TypeScript...


As much as I love proper foundation and language design, people must accept a certain dose of realism, 80/20 approach.


I honestly do not understand comments like this with regard to an open-source project. The language and its libraries are quite hackable as is, with rather powerful packages. While I may not agree with every decision, I do not complain that someone is not implementing my favorite feature; I just do it myself.

I frankly would be concerned if the main contributors were not focused.


>Sadly Julia is pain too if you don't do REPL/notebook (which you shouldn't)

Can you be more precise about this? In my case, I do a lot of work around visual outputs (such as images), so notebook-based development (such as Pluto.jl) feels perfectly fine.


> Can you be more precise on this?

Usage where you don't keep the Julia process running, e.g. "$ julia stuff.jl" from a shell.


Yes, this has been discussed a lot recently in the Julia community and some efforts have been made to make it happen, there will be more about this in the 1.11 release. Standalone binaries are next.


If this is fixed, I'll probably switch from Python to Julia. As a language Julia is vastly superior to Python. The complaint was dismissed for so long (and is still dismissed e.g. here in HN discussions) I gave up hope it will be fixed, but I'd be ecstatic to be proven wrong.

Do standalone binaries here include shared libraries (with a C ABI)? That would be a dream.


The one you need is this PR, which is already merged but landed after the 1.10 feature freeze, so it has to wait till 1.11. You can test it with the nightly builds available on the julialang site: https://github.com/JuliaLang/julia/pull/51435

Unfortunately, the core devs are not too chatty about standalone binaries: because of how Julia's internals are set up, there are going to be a lot of unforeseen challenges, so rather than promising how things will be, they are waiting to see how things turn out. Since PackageCompiler.jl already has a C ABI, and one goal discussed for binaries is being easily callable from other languages and vice versa, I would bet that it will have shared libraries.


> The one you need is this which is already merged but was after the 1.10 feature freeze so it has to wait till 1.11, though you can test it with nightly builds which is available on julialang site: https://github.com/JuliaLang/julia/pull/51435

The discussion seems to be too deep in Julia internals for me to follow. Is this about startup time or defining an entry point (or both?). I haven't had problems with Julia entrypoints (yet at least).

With the nightly `./julia-f7618602d4/bin/julia -e "using DynamicalSystems"` still takes over 5 seconds. Can I somehow define a main to make this faster or precompile more efficiently?

> Since packagecompiler.jl already has C ABI and one goal discussed about binaries being easily callable from other languages and vice versa, I would bet that it will have shared libraries.

Sounds promising. Shared libraries are not a musthave for me, but could allow Julia save us from C++ in more cases.


That's for an entry point; you can search for `Base.@main` to see a little summary of it. Later it will be callable with `juliax` and `juliac`, i.e. `juliax test.jl` in a shell.

DynamicalSystems looks like a heavy project; I don't think you can do much more on your own. There are recent features in 1.10 that let you load just the portion you need (via a weak dependency), and there is PrecompileTools.jl, but these are on your side.

You can also look into https://github.com/dmolina/DaemonMode.jl for running a Julia process in the background and doing your stuff in the shell without the startup time, until standalone binaries are there.


The half-life of major Julia problems is impressively short.

Compiler latency ("time to first plot") used to be miserable but after a few releases with incremental improvements it feels mostly solved to me.

Just now, on Friday at JuliaCon Local in Eindhoven, one of the keynotes was about similar ongoing work on stand-alone binaries (including shared libraries callable like C/Fortran).


You can create a shared library today via PackageCompiler.jl together with `@cfunction`. It's just not as modular as it could be.

https://julialang.github.io/PackageCompiler.jl/stable/libs.h...
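A minimal sketch of the `@cfunction` half, using only Base Julia (the function name here is made up; in a real PackageCompiler.jl library you would export such pointers as the C-visible entry points):

```julia
# A Julia function we want to expose through a C calling convention.
mysum(a::Cdouble, b::Cdouble)::Cdouble = a + b

# @cfunction yields a C-callable function pointer.
fptr = @cfunction(mysum, Cdouble, (Cdouble, Cdouble))

# Here we exercise the pointer from Julia itself via ccall;
# C code would invoke it through the same pointer.
result = ccall(fptr, Cdouble, (Cdouble, Cdouble), 1.5, 2.5)
println(result)  # 4.0
```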


Uh man, the Eindhoven videos aren't online. What was the tl;dr?


What's the problem when working without REPL/notebook? The debugger IDE support will be fixed eventually™ (I really think it will.. eventually...). What else is there?


Startup speed.


That's getting much better since they started caching compiled code. With more software (and hardware, and synergy-between-them) improvements coming, this will soon no longer be an issue.


Julia does have binary executable generation options, but most people only find this out after a few weeks. And yes, one can also call it as a library (.so or .a) from C/C++.

Julia is the first unique language I've been excited about in decades, as it can often outperform C, C++, and numpy in several use cases. The only downside is that until the 170 MB+ lib is cached by the kernel, it gets panned by BS perf stats due to initial I/O-constrained load times on some platforms.

Good luck, and have a great day =)


> C++ is a pain

Fortran would be less painful than C++, for math-heavy code.


Julia code is actually very Fortran-like (if Fortran was designed in the 21st century).

I did some Fortran in undergrad, and while it's pleasant to work with, if you have to do anything beyond numerical calculations you are basically screwed.


Yes, but Fortran is outright hell outside mathy stuff.

C++ as a language has the features to be almost at Fortran's level for math-heavy code, but the libraries tend to fall short. E.g. Eigen kind of does it, but it will just explode with a thousand-line compile error when you least expect it. It's also missing some quite elementary stuff, like N-D arrays, and getting code to work across even minor Eigen versions is a crapshoot.

Xtensor seems to have some potential, although it was too buggy to use back when I last tried it.

However, due to many fundamental design mistakes in C++, I'm not really expecting any code, math-heavy included, to be very pleasant or productive in it.


Why would you need Fortran for anything other than math? To put it differently, what’s the problem with just using 2 languages, e.g. Fortran plus Python with f2py?


Sometimes you want to write fast code for things that aren't purely numerical (e.g. any string processing such as CSV/Arrow/JSON etc) or things that are mostly numerical but benefit from abstractions (like generic algorithms so you can run programs in arbitrary precision or autodifferentiation). Fortran is pretty good for writing 3 loops over a double precision matrix.

Aside from "why not Fortran", the answer to "why not 2 languages" is that moving your implementation to a language few of your users know creates a big barrier between the users and the developers of your code. In Python, ~90% of users don't even know the language that the packages they use are written in, which makes it a lot harder for them to become contributors. Using a single language means that as users learn how to use libraries, they are also learning how to contribute to them in the future.
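A toy illustration of the "generic algorithms" point above: a single Julia function, written once with no concrete types, runs unchanged on `Float64` and on arbitrary-precision `BigFloat` (the function name is illustrative):

```julia
# Newton's method for square roots, written generically:
# it works for any number type supporting /, +, and iteration.
function mysqrt(x; iters = 50)
    g = x / 2
    for _ in 1:iters
        g = (g + x / g) / 2  # Newton update for g^2 - x = 0
    end
    return g
end

println(mysqrt(2.0))       # ordinary Float64 arithmetic
println(mysqrt(big"2.0"))  # same code, arbitrary-precision BigFloat
```

The same mechanism is what lets autodiff types flow through untyped numerical code.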


How much of that is down to C++ and how much of it is just ROOT? ;)


Little bit of column A, little bit of column B ;-)


Julia is great for numerical simulations. Companies such as Google and Facebook should have supported Julia for machine learning, in my opinion.

Numpy is OK, but verbose and a bit of a pain.


> Companies such as Google and Facebook should have supported Julia for machine learning, in my opinion.

Given that ML is becoming so important and Julia has ergonomic handling of numerical code, why don't they explicitly support it? Is it inertia?


From my point of view, there is still too much churn around automatic-differentiation libraries.

There is Zygote.jl [1], which is used in Flux.jl [0]; however, it's more in maintenance mode. At some point Diffractor.jl [2] was hyped, but it hasn't taken off yet. And then there is Enzyme.jl [3], which people hype now.

But for me as a user, it's not clear what I should really do to make my code well differentiable for those libraries.

If you stick with torch, jax or tensorflow, everything seems to work better regarding AD.

[0]: https://github.com/FluxML/Flux.jl

[1]: https://github.com/FluxML/Zygote.jl

[2]: https://github.com/JuliaDiff/Diffractor.jl

[3]: https://github.com/EnzymeAD/Enzyme.jl


The developers of Pytorch put out an article about this a few years ago and they admitted Julia would make sense from a pure ML perspective but the weight of the Python ecosystem was too critical.

I wonder if GPT-4 could be used to efficiently start porting more stuff over to Julia. I feel like Python is really a poor fit for deep learning. Julia is much more pleasant.


The only reason PyTorch exists is that Yann LeCun couldn't convince people at Facebook to use the original Lua implementation.

So yeah, I'd be surprised if FB ever moved away from Python.


Note that Meta recently added a Julia ruleset to its buck2 prelude. An important step toward working with Julia in prod environments.

https://github.com/facebook/buck2/tree/main/prelude/julia


They have 100,000 Python, C++, and Java devs. My assumption is they don't care, i.e., inertia like you said.

The big mistake was Microsoft not making .NET cross-platform from the beginning and not pitching F# as a Python alternative.


> F# as a Python alternative

To play devil's advocate, I think when F# came out (and was still being developed), the Zeitgeist was really enamored with dynamic languages and the productivity gains envisioned from them. So F# was really too early to the dance.

Since then, things have changed quite a bit, with most major dynamic languages adding some form of gradual type system.

Right now, I think we understand better what made languages such as Python and Ruby so popular, and we are able to design better statically typed languages with better tooling around them (Rust, Zig, Julia, etc...)


Instead we got Tensorflow for ... Swift, which nobody asked for.


Apple probably asked and paid for it. Isn't there potential in accelerating LLMs and transformers on smartphones? Most of what they are good at fits the smartphone use case well.


Not at all, it was more related to well known Swift folks at Google.


You didn't look hard enough.

ML on CUDA is very much part of Julia already, and Meta seems into that lib as well. =)


Is it good? PyCUDA was so bad (as of 2018) that after 3 major bugfixes from me and no end in sight it still almost sank a major project until I gave up on it and went with native CUDA.


Anecdotally, the Julia approach certainly seems... nicer via e.g. KernelAbstractions etc. I don't know if the performance/flexibility is quite there vs. native CUDA, but going from zero to GPU kernel programming in Julia is as close to painless as I've ever seen, especially given how modular everything is (for example, you can use OffsetArrays.jl directly in GPU kernel code).


"Is it good?"

Compared to the CUDA dumpster fire at a cat food factory, it is often trivial and nearly transparent syntax for the users familiar with ML.

Really depends on the use-case =)

The conventional options are fairly well documented:

https://sciml.ai/

https://fluxml.ai/Flux.jl/stable/gpu/

https://github.com/SciML/DiffEqFlux.jl

https://github.com/alan-turing-institute/MLJ.jl


I tried using the latest stable Flux.jl with the latest stable Julia interpreter about a year ago, and most of it didn't work. Some of the simplest model usage examples led to a traceback with dozens of frames, ending in an incomprehensible internal error that I couldn't possibly understand as a beginner.

It seems to me that Julia is still moving too fast for its own good. At some point, something needs to stabilize for it to be worth adopting.


Use of Python in ML was bottom-up, not dictated by the CEO.


Numpy is not really "OK" for everyone, it is still classed as a functional compatibility layer for a fundamentally broken language paradigm.

People often conflate popularity with good design. Have a great Monday =)


Are there any good references for what a library like Numpy should look like?


Most of Erlang, Julia, and Go... As they tended to re-factor dependencies and trend toward a homogeneous distributed ecosystem.

Python is the core issue rather than Numpy specifically, as it was bodged on (via SWIG) to address real use cases... much like how 30 years of bodged-on GPU mailbox structures biased people toward ridiculous solutions that necessitated ridiculous software paradigms.

I don't think any one person has the will and resources to resolve the core problem. However, Julia will likely also initially bind to the same nonsense for a while due to compatibility needs, but users tend to refactor nonsense out of the ecosystem with better native solutions over time.

"All software is terrible, but some of it is useful" =)


I am specifically looking for Numpy-like libraries and designs, not languages that have generally well-designed libraries. For example, Elixir has Nx.

It seems to me that, even given Python's constraints as a language, a nicer wrapper than Numpy could have been developed that still called out to bindings to do the heavy work.

> functional compatibility layer for a fundamentally broken language paradigm

What did you mean by this? Is it mainly the fact that Python is a bit an undesigned language in general that calls into unmanaged languages like C/C++ and Fortran in its libraries?


"I am specifically looking for Numpy-like libraries and designs"

Why would one bottleneck a design with polyglot stacks even before a single line of code was implemented? This sounds like naive nonsense.

"What did you mean by this? ... Python"

Python was never designed to handle threads or parallelism properly, and has performance issues that Numpy tries to address through its wrapped C/C++ libraries.

Python became 30 years of spiral development, and implodes into a new implementation every so often. Depending on the use-case it may prove appropriate, but never optimal. =)


I was wondering about libraries that do Numpy-like things, not libraries implemented like Numpy.


There are many native packages and wrapper scaffold libraries like Numpy. It will depend on your problem domain, but Julia often transparently supports its broadcast operator on most core data-structure math ("using LinearAlgebra" and the like).

https://gist.github.com/AlexanderFabisch/6343090 (small feature comparison list, but ignore the incorrect opinions)

https://sciml.ai/

https://fluxml.ai/Flux.jl/stable/gpu/

https://alan-turing-institute.github.io/MLJ.jl/dev/about_mlj...


Yes, the paper in the original post has a few examples. Julia uses expressions based on the operators we use in regular arithmetic (and even in plain Python). So, just ordinary operators instead of np.matmul, np.linalg.expm, etc.


Matrix multiplication works with the @ operator, e.g. A@B. (Or with the * operator if defined as np.matrix, but nobody uses that and probably shouldn't.) For matrix exponentials you need scipy.linalg.expm, but nothing in Python prevents doing it with e.g. e**M if you want (or even e^M, though you probably shouldn't). You can even implement it yourself in a few lines.
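To make the "few lines" concrete, here's a hedged sketch: the `@` operator for matmul, plus a tiny helper class (the `_MatrixExp` name is mine, purely illustrative) so that `e ** M` computes a matrix exponential via a truncated Taylor series. Real code should call scipy.linalg.expm instead.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.eye(2)
print(A @ B)  # ordinary operator syntax for matrix multiplication

class _MatrixExp:
    """Illustrative name (not a real library class): enables `e ** M`.
    Uses a truncated Taylor series; production code should use
    scipy.linalg.expm instead."""
    def __pow__(self, M):
        M = np.asarray(M, dtype=float)
        result = np.eye(M.shape[0])
        term = np.eye(M.shape[0])
        for k in range(1, 30):   # fine for matrices with a small norm
            term = term @ M / k  # accumulates M**k / k!
            result = result + term
        return result

e = _MatrixExp()
M = np.array([[0.0, 1.0], [0.0, 0.0]])  # nilpotent, so e**M == I + M exactly
print(e ** M)
```

Whether overloading `**` like this is a good idea is exactly what the rest of this subthread argues about.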

You don't seem to know much about what you're criticizing.


Lol Good luck working in a project where people have decided to create wrappers like you suggested.


For expm? The matmul @ is a numpy standard and widely used. If you have something that needs expm so much that it needs to be an operator, good luck working in a project where people can't understand a one-line class definition.


I simply refuse to work with code written by people who have had a hundred such harebrained ideas. No luck needed, but thanks!


Why is it any more or less harebrained for the language devs to overload an operator in the core?


Julia!


Won't there be too much cultural inertia for all the people who invested their brain into numpy ?


Physics (and more generally numerical simulation) is the killer app for the Julia language. No other language makes it as easy to work with arbitrary-precision math. Whatever precision you need/want, just call setprecision(), initialize variables/constants as BigInt/BigFloat, then do all your math just like you would in C.

Just remember to put the value in quotes when initializing:

BigFloat("1.0") NOT BigFloat(1.0). Without quotes it's silently initialized as double precision (!)


I don’t see that:

    julia> BigFloat(1.0) |> typeof
    BigFloat


The problem isn't the type, it's the bits. If you convert from Float64, you'll get garbage bits once you go past the bits determined by float64:

    julia> BigFloat(1.1)
    1.100000000000000088817841970012523233890533447265625
I'd recommend using the big string macro, though, instead of constructing explicitly from a string:

    julia> big"1.1"
    1.100000000000000000000000000000000000000000000000000000000000000000000000000003
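For what it's worth, Python's standard decimal module has exactly the same float-literal footgun, which makes for a useful comparison:

```python
from decimal import Decimal

# A float literal is rounded to a binary double *before* Decimal sees it,
# so the garbage bits are baked in -- exactly the BigFloat(1.1) situation.
print(Decimal(1.1))    # 1.100000000000000088817841970012523233890533447265625
print(Decimal("1.1"))  # 1.1
```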


Oh, I see. Thank you.

(And, in Julia’s defense, this is all spelled out in the documentation, immediately accessible from the REPL. I just never read it carefully enough.)


> And, in Julia’s defense, this is all spelled out in the documentation, immediately accessible from the REPL. I just never read it carefully enough.

My concern is most physicists won't read that either and may not even notice the loss of precision until it's too late (after a paper has been published).


I wish the FEM ecosystem was a little more mature in Julia (particularly weak forms, their derivatives, and assembly). I'm confident about rewriting the other pieces of my research, but it took me a long time to be happy with FEniCS, and I'm not entirely sure if going through the same with Gridap is worthwhile. Despite being fairly happy with Python, I've used a lot of Julia before settling into this field and would be happy to use it again.


I've been doing economics research in Julia for a couple of months. The scientific community surrounding it is fairly rich, and there's a lot of very well-written packages for performance.


In academia it totally makes sense. If you have to do numerical work and don't have to integrate into some legacy code base, it is totally the way to go.


Can you call Julia routines from other languages? Otherwise you sort of have to go all in...


Yes, Julia can be called from other languages rather easily. Julia functions can be exposed and called with a C-like ABI [1], and there are also various dedicated packages for languages like Python [2], R [3], C++ [4], Rust [5], and more to call Julia code.

With PackageCompiler.jl [6] you can even make AOT-compiled standalone binaries, though these are rather large. They've shrunk a fair amount in recent releases, but there's still a lot of low-hanging fruit to make the compiled binaries smaller, plus some manual work you can do, like removing LLVM and filtering stdlibs when they're not needed.

Work is also happening on a more stable/mature system that acts like StaticCompiler.jl [7], except provided by the base language and by people who are more experienced with the compiler (i.e. not a janky prototype).

[1] https://docs.julialang.org/en/v1/manual/embedding/

[2] https://pypi.org/project/juliacall/

[3] https://www.rdocumentation.org/packages/JuliaCall/

[4] https://github.com/Clemapfel/jluna

[5] https://github.com/Taaitaaiger/jlrs

[6] https://github.com/JuliaLang/PackageCompiler.jl

[7] https://github.com/tshort/StaticCompiler.jl


> Yes, julia can be called from other languages rather easily

This seems false to me. StaticCompiler.jl [1] lists among its limitations that "GC-tracked allocations and global variables do not work with compile_executable or compile_shlib. This has some interesting consequences, including that all functions within the function you want to compile must either be inlined or return only native types (otherwise Julia would have to allocate a place to put the results, which will fail)." In practice, this means that you cannot use the base library in your own programs. PackageCompiler.jl [2] has the same limitations if I'm not mistaken. So then you have to fall back to distributing the Julia "binary" with a full Julia runtime, which is pretty heavy. There are some packages, such as PySR [3], which do this. It seems pretty usable for research, I'd say, but difficult to put in production.

There is some word going around though that there is an even better static compiler in the making, but as long as that one is not publicly available I'd say that Julia cannot easily be called from other languages.

[1]: https://github.com/tshort/StaticCompiler.jl

[2]: https://github.com/JuliaLang/PackageCompiler.jl

[3]: https://github.com/MilesCranmer/PySR


I think you misunderstood me. Nothing about calling julia from other languages requires StaticCompiler.jl, that's just an experimental proof of concept for small, standalone binary generation with some heavy limitations.

Calling julia from another language does not in general require AOT compilation, though it does help to make things more portable and self contained. If one wants, they can literally just spawn a julia process, define a function on the fly and then hook into that process and call that function from another language.

> PackageCompiler.jl [2] has the same limitations if I'm not mistaken

You are mistaken. PackageCompiler works by bundling an entire Julia system image with your program: the whole compiler, runtime, and stdlib compiled into one bundle together. If a piece of code works in the REPL, it'll work from PackageCompiler.


> If one wants, they can literally just spawn a julia process, define a function on the fly and then hook into that process and call that function from another language.

Okay but how do you distribute the program that depends on Julia to clients? You then also need to ensure that they have the right Julia available on their system. That's much heavier than dynamic binaries (which rely only on some shared libraries). My point was that it's not always "easy" to call Julia. That it's easy in HPC is a reasonable statement since you probably have full control over the system and it has loads of RAM available, but I wouldn't call it easy in general.

> You are mistaken.

I guess we're both right. I was talking about the "libraries" functionality [1].

[1]: https://julialang.github.io/PackageCompiler.jl/dev/libs.html


> Okay but how do you distribute the program that depends on Julia to clients?

That's not the question that was asked though. The question the person asked as "Can you call Julia routines from other languages?" and the answer to that is "yes, you can easily do that".

__________________________________________________

> I guess we're both right. I was talking about the "libraries" functionality

I'm confused, what functionality exactly are you saying is missing from the libraries functionality? That very example you link to shows one using the julia runtime via printing.

And yes, dynamic dispatch and dynamic code generation do work from libraries created by PackageCompiler.jl as well...


I guess the problem is that if you create a shared-library, yes, you can call it from Python, but you are now restricted to C datatypes, and don't get any of the type-conversion or type-proxying that the other bridges provide --- as far as I can see.

That said, I think the bridges are actually pretty developed compared to what exists between many pairs of high-level languages. They are clearly working for some people since you will find Python-wrapper packages for Julia code. I'm not so sure it's so important to have a single `.so` package. There are solutions which allow Python to be-in-charge and bundle Julia to various extents. There are at least:

1. https://github.com/jlapeyre/julia_project

2. Usage of Conda and https://github.com/JuliaPy/pyjuliapkg


I would quibble with your use of the word "easily", but I consider it amazing that a language with such a complex runtime manages to expose an embeddable interface at all. Julia is an incredible language.


So you can embed it in the same way you can embed e.g. python or octave. This is fine for some uses, but it is a bit invasive. How does it interact with multiple threads? That's a huge pain point in embedding python code...

edit: just found https://docs.julialang.org/en/v1/manual/embedding/#Thread-sa...

Ok, so, it's tricky, but at least you can start an interpreter per thread...


You can also run multi-threaded Julia from external threads. Julia unlike python doesn't have a GIL (or similar), and supports native multi-threading. As of 1.10 (I think, it might be 1.9), you can also add threads after startup (which is necessary for interfacing with spark or similar systems).


Yep that's right. Again though, PackageCompiler.jl does work quite well for creating .so / .dylib / etc shared libraries from julia code for more traditional library-style interaction, and is quite mature and widely used.

It produces quite big binaries, but the more involved and complicated your application is, the less that matters.


Going all in is precisely the point.

The biggest selling point of Julia is that it's as easy to write as python if you don't care about performance, and if you do care about performance it's as fast as C++ if you can just follow some simple principles. This actually works in practice. For my day job I write the slow path exactly as I would python, and that's a really fast way to write code.

If you want a slow language that can call fast code, you already have python.


Biggest selling point of Julia is that you're locked in to Julia if you use it?

For me this is the biggest point for giving up on Julia. Julia has a lot going for it in the basic design and I was really excited about it in the beginning.

But even after all these years the environment is sadly almost unusable if you want to do something saner than REPL/notebook.

The TTFP is too damn high and I don't think it will ever be fixed.


> environment is sadly almost unusable if you want to do something saner than REPL/notebook

I work in "one of those industries" where failure can easily cost millions in the space of minutes, and we use Julia almost exclusively. Unfortunately I can't give you details because of reasons, other than tell you you're very poorly informed.

> you're locked in to Julia

You're not "locked in" to Julia any more than you're locked in to Python. Arguably less, as Julia can call C natively; Python can't.

My point isn't that you can't interop, and it seems you're intentionally (?) equivocating on this point. Surely you didn't misinterpret what I wrote as "you can't interop"? My point is that you get more benefits if you don't. Like I said, if you want a slow language that can call fast routines, why not just use Python?


>I work in "one of those industries" where failure can easily cost millions in the space of minutes, and we use Julia almost exclusively. Unfortunately I can't give you details because of reasons, other than tell you you're very poorly informed.

I vaguely recall a post from an ex-Julia dev who raised some issues regarding the correctness of computations in Julia; I am not very familiar with the situation - have these concerns been addressed?

Edit: now that I recall better, I think the correctness concerns had to do with ambiguous composition of disparate components due to vague interfaces. I think this has to do with multiple dispatch in Julia, but I don't know details. I would love any info about this.


My impression is that this comes down to expectations.

You might have a function:

foo(x,y) = x + y

and multiple dispatch means you can call that with integers, or floats, or arrays, or GPU-resident arrays, or automatically-differentiating numbers, or symbolic algebra terms, or...

Just because you can, does that mean you should? It depends...


I think it was more like you have a function over an abstract type like:

foo(x::Number) = x+1

And then someone else creates a new type thats a subtype of Number. And they run foo(x) and get errors or unexpected output.

Problem is the new type they created doesn't follow the assumptions that foo expects. Throw multiple dispatch into the mix and it gets even harder.
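A rough Python analog of that failure mode (the `Modular` type here is made up purely for illustration): a function written against an informal numeric interface, plus a "number" subtype that silently violates one of its assumptions.

```python
from numbers import Number

def foo(x):
    # Implicit interface assumption: any "number" can be added to an int.
    return x + 1

class Modular(Number):
    """Hypothetical numeric type (illustration only): integers mod 5,
    whose + deliberately refuses anything that isn't another Modular."""
    def __init__(self, v):
        self.v = v % 5
    def __add__(self, other):
        if not isinstance(other, Modular):
            raise TypeError("Modular can only be added to Modular")
        return Modular(self.v + other.v)

print(foo(41))  # works for types that honor the implicit interface

try:
    foo(Modular(3))  # a "Number" subtype that breaks foo's assumption
except TypeError as exc:
    print("interface violated:", exc)
```

The dynamic is the same in both languages; Julia's multiple dispatch just multiplies the number of places where such implicit assumptions can meet a type that violates them.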


> foo(x::Number) = x+1

> Problem is the new type they created doesn't follow the assumptions that foo expects.

If you define that function and someone else (i.e. a user of your lib) defines a subtype of Number and calls your function on it, and it fails, then they haven't respected the interface of Number. There's nothing wrong with that function or type specialization in that case. It's all about defining useful interfaces and respecting them, as is much of programming.

E.g. if someone defines a subtype of Number because he has something that's kind of like a number in some respects but not others, maybe that shouldn't be an subtype of Number.

You also shouldn't define functions that have obscure conditions for being properly useable - at least if you expect your code to be useful more broadly. Nothing here is specific to Julia though.

If you have other specific examples we could discuss more.


    using Unitful: m; foo(2.0m)
With the above definition, this will give DimensionError: 2.0 m and 1.0 are not dimensionally compatible.

Probably this means you should define `foo(x::Number) = x + oneunit(x)` to respect the Number interface. But this interface isn't very strictly defined. I believe `Base.oneunit` was added after someone started writing the Unitful package -- building something useful in a legal grey area, and formalising later?


Interesting case! Lets discuss.

I don't know/use Unitful, but if that function didn't fail, it's because the guys who wrote Unitful defined a promotion rule from Int to their unit type. So... that's how their type works. Don't use it. Or better, open an issue on GitHub; they might have an explanation that's escaping us. I suspect it has to do with their mental model of what Unitful is supposed to achieve.

Let me tell you that my intuition agrees with you. As an ex-physicist, if I were designing a lib called Unitful, I wouldn't let you sum 1 meter plus 1 unitless thing.

EDIT:

Actually, I just tried running your code and I do get DimensionError

    foo(2.0m)
    ERROR: DimensionError: 2.0 m and 1.0 are not dimensionally compatible.
I'm guessing you have some seriously outdated versions of something?

EDIT2: Sorry, I misread you. You do get an error. Ok, I see your point. Maybe the Julia docs should be more explicit about what the Number interface entails. Is `+ 1` allowed? You're assuming the answer is obvious, but it's not to me. In particular, that's not generic at all. You're probably right about `+ oneunit(T)`.


> I think the correctness concerns had to do with ambiguous composition of disparate components due to vague interfaces. I think this has to do with multiple dispatch in Julia

You'd have to be more specific. Currently ambiguous dispatch throws an exception. I don't think this is a recent change either.


Yes.


Python can call C natively. Also Python programs can be used as CLI tools interacting with practically any language. Julia programs in practice can't.

But I know from experience that this debate will not go anywhere. It's the same old refusal to see the problem: "you're using it wrong".

Julia is dying and this is why it will die.


Yes, this is so true. There is a lot of wild attacking of the questioners of Julia.


HFT? :D

Which country are you in?


>The TTFP is too damn high and I don't think it will ever be fixed.

I just installed and ran on my macbook - took me 7 minutes to get it installed and run through to a first plot, most of that was installing and compiling the Plots library.

Once it's installed and available I can run it in about 1 second and then execute

    using Plots

    function test_plot()
        x = range(0, 10, length=100)
        y = sin.(x)
        plot(x, y)
    end

    @time test_plot()

which gives me a result of 0.048700s and a nice plot.

I don't think this is the defining issue with Julia now - it definitely used to be 10 years ago - but now it's more than competitive with Python in this regard.

The issue is really support and buy in for all the new cool libraries that folks make available to use.


That's not TTFP. TTFP is putting that code into a file (e.g. ttfp.jl) and running:

    $ time ./julia-1.9.4/bin/julia ttfp.jl
      0.325806 seconds (202.80 k allocations: 13.208 MiB, 84.23% compilation time)

    real    0m2.401s
    user    0m2.357s
    sys     0m0.553s

I.e. the TTFP is 2.4 seconds, not 0.33 seconds.

That said, 2.4s is huge progress (e.g. with Julia 1.6.0 it's 10 seconds for me) and acceptable for most simple plotting.

But just adding e.g. "using DynamicalSystems" brings this to 14 seconds, which is not acceptable for iterative editing. And it makes any Julia program using "DynamicalSystems" wildly impractical to use in a pipeline with other programs.


Also for the record, 1.10 rc1 brings the time down to 1.1 seconds (from 1.9 seconds in version 1.9 on my laptop). There's still more work to do, but there's been a ton of progress on this in the last year or so.


No point having this conversation. These people are still repeating memes from 5 years ago. I've met one of them in the wild, it was quite eye opening.


You are probably having a different conversation. This is my conversation:

    $ time ./julia-1.10.0-rc2/bin/julia -e "using Plots; using DynamicalSystems;"
    0m5.433s

This is a different conversation from

    julia> @time plot(x, y)
      0.000465 seconds (484 allocations: 45.992 KiB)
But I agree that the conversation is pointless. The TTFX will never be fixed for Julia to be used with sane workflows because the few still using Julia don't want to accept the problem. But the flamewars are fun anyhow and maybe they prevent the next Julia from making the same mistakes.


I think this conversation is absolutely worth having. It's great to see that 1.10 is ~3x faster here, and there's probably another ~2 to 3x improvement or so from profiling DynamicalSystems to see where the time is going. That said, for sub second startup time with lots of packages, I highly recommend using PackageCompiler to make a custom system image which brings the using time down to roughly 0.


Is shaving seconds off per-package imports a feasible way to get the startup time fast enough for script-type workflows? Before long you end up importing 10 packages, and it will be too slow even if each package takes 1s. I appreciate that the devs seem to want to do it "the right way", but that seems to have taken quite a while.

Wouldn't something like a slow-interpreter option for "glue code" be feasible? Most code doesn't need to be fast, and waiting for the "hot functions" to compile on change wouldn't be too bad. And you could still AOT-compile for maximum speed when needed. This is essentially how JavaScript JITs manage to start up so fast.

Another fine "hack" for many cases would be to keep the Julia process running (like a daemon) while making sure the state is clean each time a script is run on it.

Something like PackageCompiler is workable, at least for the short term. Long compiles are OK if they don't have to be done all the time. I indeed manage to get < 1s TTFP with DynamicalSystems with it (last time I tried a while ago I didn't manage). The UX for creating sysimages could be nicer, but doesn't seem anything that a quick script can't fix. Maybe I'll give Julia another go in my next analysis.

From what I gather there's now a new approach for caching the compilation results. I really hope it will succeed, but I'm a bit jaded from the many "Julia 1.x fixes the TTFX" news.


The next step is to enable more post-invalidation precompilation for Plots. This is something we recently set up with Symbolics.jl, since that is another package with some "invalidation by design" patterns, where certain assumptions made in the compiler (such as (x==y)::Bool) are invalidated by symbolic numbers acting differently from most numbers, and thus some over-eager optimizations have to be removed upon using.

PrecompileTools.@recompile_invalidations is a recent (v1.9-era) tool that allows forcing the compile at package-precompile time to consider the environment post-invalidation and thus re-precompile, effectively reducing the TTFX in these cases where invalidation is always going to occur by design. Plots.jl is a case like this because its recipe system effectively allows people to extend the plotting pipeline well after Plots is loaded, which is its main killer feature but effectively means that one should expect invalidations from it. I think that recompilation post-invalidation should make a strong dent in that, and I plan to see if that's the case in a few months when I get the time to dive in.

> Wouldn't something like slow interpreter option for "glue code" be feasible? Most code doesn't need to be fast, and waiting for the "hot functions" to compile on change wouldn't be too bad. And you could still AOT for maximum speed when needed. This is essentially how Javascript JITs manage to start up so fast.

That is something I've been meaning to investigate more. A lot of the Plots.jl code is internally dynamic because of its design of holding everything in a Dict{Any} and recursing through the plot recipes. In other words, it doesn't benefit from having the JIT at all, because internally everything is uninferred boxed variables. There's probably a good way to just say "please interpret the calls in this part of the pipeline" and see no runtime difference while chopping out the bulk of the compilation. It's relatively straightforward to do - it's a one-liner using JuliaInterpreter.jl's @interpret - and I think one dig that results in a win would be a nice example of a pattern we should use more often in the near future.


I did briefly look at JuliaInterpreter.jl and if something like that can be used for general purpose, the TTFX would be fixed, the complaints would stop and I'd wager Julia would gain a lot in popularity. I really hope it happens.

For now the interpreter seems to focus on debugging, and if I understand correctly, it can't be really used to e.g. speed up imports or to specify hot code to be AOT'd/JITed?


Ahh I getcha a bit. What is the usecase though - which script-type workflows? (genuine curiosity, not just asking!)


Typically modeling timeseries data and comparing it to data collected from humans (e.g. car telemetry, motion tracking or eye tracking signals). The models are stochastic dynamical systems which have to be optimized against the data. It's more or less exactly in Julia's niche.

The script typically loads the data files and runs simulations (many times over when their parameters are being optimized). This means the models have to be fast, but they are also recurrent in nature, so they can't be vectorized with numpy. So depending on the case it's usually numba, cython or C++, all of which can be quite painful.


So what you are saying is that you would like a process that was live - with everything loaded, and ready to respond to any code that you wanted it to run, as opposed to having to bring up a new Juila process for each step in the pipeline.

Can't you compile the pipeline into a single script and then run it in a single instance of whatever?


I want the opposite. To have different processes for each step. Potentially the different steps being in different languages.

Or at least the semantics of that. The steps should be independently callable, and each call should produce the same output given the same parameters, i.e. the state resets for each run. I don't care so much how this is implemented exactly, as long as it fulfills these.

The data doesn't need to stay in the process memory. It can be e.g. mmapped and is cached by the OS anyway. And deserializing tens or even hundreds of megabytes usually takes less time than Julia takes to import Plots.

This is the thing that isn't feasible with Julia due to the TTFP (each step incurs the startup latency).
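A minimal sketch of that discipline (the file names and the toy step are invented for illustration): each step is a standalone script, so a fresh process with the same inputs always yields the same outputs, regardless of which language each step is written in.

```python
import json, os, subprocess, sys, tempfile

# One pipeline step as a self-contained script: read an input file, write
# an output file, hold no state between runs. (The step itself is a toy.)
step_src = """\
import json, sys
data = json.load(open(sys.argv[1]))
json.dump({"mean": sum(data) / len(data)}, open(sys.argv[2], "w"))
"""

tmp = tempfile.mkdtemp()
script = os.path.join(tmp, "step.py")
inp = os.path.join(tmp, "in.json")
out = os.path.join(tmp, "out.json")
with open(script, "w") as f:
    f.write(step_src)
with open(inp, "w") as f:
    json.dump([1, 2, 3, 4], f)

# Every invocation is a fresh process: same parameters, same output.
subprocess.run([sys.executable, script, inp, out], check=True)
print(json.load(open(out)))
```

The per-step process startup is exactly where a multi-second interpreter/JIT warm-up becomes a tax on the whole pipeline.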


You're building pipelines for someone else. I'm using the pipeline someone else built.


I'm building pipelines for myself to use. The result is often pipelines that other people can use too. It's all just code, there's nothing magical in building pipelines.


I would love to see something you've built.

Do you have anything that you've shared publicly?


Here's an example: https://github.com/jampekka/vddfit

Codewise it's not pretty, especially the data mangling, but other people have managed to use it for their own purposes.

The model had to be implemented in C++ and wrapped for python bindings. It's a pain but sadly less painful than doing it with Julia's startup time.


Blast from the past from my days in the academia.

Ok thanks for sharing.

(I still disagree with everything you said about Julia, though. I bet you I could re-implement this faster in Julia than your C++, and in a type-generic way too, which would make it trivial for users to plug in their types, and it would be trivial to package - I'm assuming you didn't package that repo, you just expect your users to be shuffling files around. You're not contributing to improving the reproducibility situation in academia, let me tell you xD)


So the handful of _Julia users_ with their totally unreproducible REPL mess can "trivially" plug in their types? Also saying the horrible mess that is Julia's module system is "trivial to package" is delusional.

I didn't package it and the data mangling part has very little use outside this specific experimental design. Also Python's packaging is quite shitty.

The C++ code is totally standalone and can be git-pulled and compiled how you like. C++ packaging is totally abysmal.

As I said in the comment and the README, the code is a mess (although a piece of art compared to the notebook shit out there). Its purpose is that the results can be reproduced from the raw data.

I would love to make my analysis code cleaner. But there's zero incentive for that and very little time. Nobody sadly cares about the code, and most scientists can't code for shit. This is probably why Julia was designed so that it forces your code to be shit.

Sorry about the tone, but you kinda set it.


I mean, DynamicalSystems is big - 183 subpackages... it doesn't surprise me that it requires some time to pull it in and make it available to other code... what's the design choice that they should have made that would avoid this?


It can be improved though. Part of the issue is my fault and I plan to redo the DiffEq solver defaulting mechanism in order to reduce the amount that needs to precompile. The design of how to do it is already pretty clear and it's mostly about having the time to do the grunt work. I hoped to get it up last month but had a bit too much travel, but it shouldn't go beyond January. After that's up I plan to reassess some of the profiles for what the remaining pieces are and post some future plans for specifically this part of the ecosystem, but for now the ball is in my court and I need to make some improvements here.


E.g. JavaScript can have thousands of subpackages/dependencies and it starts up instantly. Scipy is huge, but importing it is practically instant. "Classic" AOT can take ages to compile, but it doesn't have to compile on every run.

JS does this by multi-phase JIT. Python by being dog slow in the execution and AOT by being dog slow in the compilation.

I think Julia's "JIT-AOT" may be a good approach (and perhaps almost necessary for Julia's dispatch) too but it seems to be very hard to make start up fast.


So I tried this after I installed Julia on my system (Fedora) and installed the Plots package. It took about a second or so, not too bad - slower than ROOT or matplotlib but not too different from Octave... but the default backend is not interactively zoomable at all as far as I can tell? Ok, I can do Pkg.add("PyPlot") to use matplotlib, but now this takes an extra ~10 seconds? It would probably have been faster to mmap a shared file, dump the data there, then fork a new python process and plot it... maybe I try the plotly backend (which doesn't need any new packages, that's nice), but now it opens in my browser? By now my .julia directory has ballooned to 2.8 GB (!!!) and I'm bored of this test and go back to making my quick plots in ROOT :).

edit: oops, maybe PythonPlot is what I should use, not PyPlot. But when I try that I see: ┌ Warning: `PythonPlot` 1.0.3 is not compatible with this version of `Plots`. The declared compatibility is 1 - 1.0.2. And it doesn't seem to be any faster.


What do you mean by TTFP? Is this like "time to first byte"? If so, what is the "P"?

In any case, if the complaint is startup time, I do agree that this has been a bit of a thorn in the side. It also interacts poorly with a (in my view) pretty immature ecosystem for running services (for instance, the grpc library doesn't seem to be widely used or heavily maintained) or interacting with it via IPC. So it's great if you can spin up a notebook or repl and leave it running all day doing your work, or if you have a long-running batch process, but I agree that I've found it a bit tricky to interact with it from the outside world in the "usual" ways.

Having said that, it does work - for instance, a system I work on spins up a long-lived julia thread from within a python service using the very nice shared library, and calls into julia functionality through that thread - I've just found it to be tricky; but IMO still very much worth it.


> What do you mean by TTFP?

Time To First Plot. It was a meme back then that you needed to wait several minutes to draw just a simple line plot after starting up a Julia REPL/notebook because of compilation times (whereas in Python you're done in seconds with matplotlib.) Seems this has been reduced in Julia 1.9 with automatic pre-compilation of packages, but plotting still isn't "instant" as you would expect in scripting languages like Python.
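For concreteness, TTFP is usually measured from a cold process start, something like the sketch below (timings vary a lot by machine, Julia version, and installed packages):

```julia
# Run as: julia --startup-file=no ttfp.jl
@time using Plots          # package load time
@time p = plot(rand(10))   # first-plot compilation -- the "TTFP" part
savefig(p, "first.png")
```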


For many complaints TTFP is not about REPL or notebook. It's about how long it takes for a new Julia process to do the plot.

TTFP is (close to) solved. 1.9.0 TTFP is under 3 seconds for me, which I find about acceptable for script-type workflow. However, time-to-first-DynamicalSystems is still around 10 seconds, which is not.


Loading new Julia processes takes long because the standard library is rather large. One big part of v1.10 has been the excision of more standard libraries out of the core image. This required some of the v1.9 and v1.10 improvements: things like `using Pkg` were practically instant only because Pkg was in the system image, so moving these out of the base system image without a major regression required all of the package-binaries infrastructure. The ecosystem is now getting ready for this change: all packages recently started requiring the versioning of standard libraries, and standard libraries are now versioned independently of Julia and can do their own releases, so they are very close to being standalone. I'm not sure of the progress on this in v1.11, but I think the intention was to make it so that things like Pkg do not load at the start of Julia. The big elephant in the room is then for BLAS to not load until `using LinearAlgebra`. The majority of the startup time was just spinning up BLAS threads the last time I checked, and with all of these other changes in place we should be ready to test this removal.
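You can see the asymmetry yourself in a fresh `julia --startup-file=no` session (an illustration as of 1.9/1.10; exact timings are machine-dependent):

```julia
@time using Pkg             # near-instant while Pkg lives in the system image
@time using LinearAlgebra   # pulls in BLAS/LAPACK and spins up BLAS threads
```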


Useful info!! This is sort of a lazy-web thing, but do you know where this work is being organized or how one might start contributing to it?


I don't think there's a central organizing place, the work is happening across several Github issues and PRs. There's the Internals subforum [1] on Discourse if you have questions or need guidance, and there's a #internals channel on the Slack too.

[1] https://discourse.julialang.org/c/dev/5


It seems instant to me - the package compile on first invocation is rather like installing with pip or conda, and then after that I see the plot in about 1/20th second.


The REPL starts up instantly and the first plot is < 1s. People seem to be repeating complaints (valid at the time) that they overheard from three years ago.


This is not the complaint. The complaint is that running "$ julia someplotting.jl" takes a long time (actually with 1.9.0 it's reasonable at < 3s, but adding other libraries makes it easily tens of seconds or even minutes).

Not everybody uses the REPL. In my, I think quite well-founded, opinion nobody should use the REPL for more than one or two lines.


For what it's worth, as someone who shares a sense that "scientific" programmers have a tendency to over-use notebooks and REPLs, I find your perspective on it to be too extreme in the other direction! It's perfectly possible to incorporate interactive tools into a good software development process.


I don't mean that running in Bash should be THE workflow. But I think it should be supported and it's used widely and lack of support for it is a major reason why Julia is not taking off. I'm not proposing taking REPL/stateful-notebook away from those who want to use it (although I'll advise them against it if asked).

I don't find my plt.plot(x,y); plt.show() in a script ideal either. Something like Pluto.jl (given it's easy to modularize and preferably can be run without a notebook environment) is most likely better for a typical data analysis workflow (and will likely lead to more reproducible results and better tracking of result provenance).


But there isn't "lack of support" for running cli scripts, there's just a few seconds of startup time (or less than that; it depends a lot on what dependencies the project has, in my experience). Like, I don't want to diminish that this amount of latency matters, but I just feel like you're being hyperbolic. I've used tools with more startup time for most of my career - the jvm, ruby when running any non-trivial rails app, essentially every ahead-of-time compiled language, etc. - it's not a good thing, and all those tools have invested massively in improving their startup times because of that, but I just really don't see how it can be considered a deal-breaker... (And for what it's worth, most non-trivial python codebases I work with also spend a ton of time starting up, though this can be fixed with investment into it.)


Especially for data analysis the latency is crucial. I run the code practically every time I change a line. It's not that different from how people run REPLs, I just keep the lines in files that I run. Would people be fine if each REPL line had tens of seconds of latency overhead?

For different cases, e.g. application development, the latency is not that critical. Although I find it quite insane that we accept so slow code iterations this day and age.


Again, I'm not trying to downplay how much it matters, I just think you're being pretty hyperbolic about it! You've said stuff like it's "impossible" and "not supported", and I think these are just way stronger than "very inconvenient" or "extremely frustrating". Maybe it's a minor point, I dunno, but like, if we were talking about a five minute compile cycle or a language that doesn't even have a way to run a script at the command line, then sure, I'd agree with what you're saying, but as it is, it just feels a bit like a tempest in a teapot to me...


Indeed it's something in the middle. For data analysis, many people are used to using RStudio and just keeping a running REPL going, and Julia works quite well for these kinds of tasks in the same mode because that effectively removes the startup time from being a concern. However, there are some folks that are used to other workflows and we could grow the audience by improving the ability to support such workflows better.


For what it's worth, my workflow is actually much closer to the parent's; I have a python batch job that spins up a julia runtime in a thread at startup and waits on a `Pkg.instantiate()` of a project with pretty substantial dependencies before it can start sending method calls into it. I think it takes like 3-5 seconds to start up. I badly want it to be a lot faster than that, but it's also fine. I'm not happy with it, but if I sat here thinking about it for a few minutes, I'm certain I could list a hundred things in our system that annoy me more than this.


That's probably something close to DaemonMode.jl?

If the problem were just running a daemon, it would be an inconvenience at worst. The problem is that changes to the code tend to trigger a recompile or can't be made at all (e.g. structs can't be redefined, if I recall correctly), and most crucially the session can "silently" get into some unexpected, confused state.
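A minimal example of the struct limitation (as of Julia 1.9/1.10; Revise.jl tracks most code changes in a running session but cannot handle struct layout changes either):

```julia
struct Point
    x::Float64
end

# Re-evaluating the definition with a different layout in the same
# session errors instead of updating it:
struct Point
    x::Float64
    y::Float64
end
# ERROR: invalid redefinition of constant Point
```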

If you don't need to change the code (and it's structured well enough not to hold a state between calls), then running in a process is no problem. But for that you can do just plain old AOT.

Aside from my personal gripes, which I have probably aired here more than enough for everybody, I think the design that more or less forces REPL/notebook style is detrimental to quality of scientific code. Scientists don't learn to modularize their code and don't learn to understand the basic control flow. I see this all the time with my colleagues and the "pedagogical problem" of notebooks/REPL is very common when I teach programming. E.g. a common problem is that students don't understand variable assignment because with REPL/notebook it doesn't necessarily follow the normal program flow.

I'm kind of amazed how little the scientific programming scene is concerned about this. From a software development perspective (I'm an ex software developer turned scientist), having clearly defined state transitions, independent modules of code, and code that does what it says are really, really fundamental things, without which you have no chance in hell of keeping anything longer than a few lines from turning into an unmanageable, buggy mess.

Maybe there's a (mis)conception that scientific code is somehow a fundamentally different beast from "general programming". But it's not. It's just a relic of having bad systems languages like C and C++ and bad special-purpose languages like MATLAB and R. That Python can be and is used for both is an existence proof that there is no such divide. But we need a better Python for both general and scientific programming.


> Aside from my personal gripes, which I have probably aired here more than enough for everybody, I think the design that more or less forces REPL/notebook style is detrimental to quality of scientific code.

So, two things here: First of all, I have greatly appreciated the discussion, so don't feel like you've over-aired your thoughts!

But also, here's this, in my view, hyperbole again, I just don't see how "more or less forces" is the right way to describe the state of choosing whether or not to use a notebook/REPL style for julia. It works perfectly fine to develop with a different style. There's a little bit of startup time, but it's just not nearly a big enough deal to say it "forces" a different development process.

I think your criticisms of notebook driven programming make sense, but are also slightly overblown; in my experience, scientists are perfectly capable of writing good modular code, despite preferring to work in notebooks most of the time. It's not not a problem, I just don't think it's been a very big problem, in my experience.

But I very much agree with your last two paragraphs. I just don't think notebooks / REPLs are a big part of the reason things have ended up this way. I think it's more of a cultural thing that gets passed down from generation to generation, and it has to do with respect; I think scientists have never taught one another that code deserves respect and professionalism. I think they have always thought of it as "some throwaway stuff that generates the data for the paper"; the paper is the thing deserving of respect, not the tools used to create it.

But I think this has been slowly changing as more and more scientists are digitally native and learn increasingly more software skills as a first class concern.


Agree that I may be hyperbolical at times and this can be detrimental to fruitful discussion. I'll be more mindful of it, thanks!

More accurate is that the REPL/notebook style is currently so much more convenient to get started and to "explore" with that people will use that and it's what all Julia examples etc teach. Pluto.jl may be a good "transition" from this though, although I'm afraid modularization is still too inconvenient with it.

The script-workflow latency is also a reason why "scripters" don't switch to Julia even though they otherwise would.

So while it's not practically impossible, as I hyperbolically put it, it is IMHO a blocker for wider Julia adoption and for a transition to better programming practices.

In my experience in both teaching and helping working scientists with their code problems I think the current situation that REPL/notebook is so much easier to get started with is at least a major reason why better practices don't get adopted.

Scientists themselves doing any programming is quite a recent thing. Usually it was "lab engineers" doing "the coding" and data mangling and the scientist doing "the analysis" in something like SPSS or Excel or copypasted R oneliners.

It's also really understandable why scientists don't make much effort to learn and enforce good programming. It's a relatively minor part of their job, they manage to scrape much of the analysis together with poor practices, and when it almost inevitably explodes as the analysis gets more complicated, someone "more technical" gets called to fix the mess (I do this all the time).

In most languages modularizing code is made needlessly hard, and I'm smelling something like this in the Julia module system (and in e.g. Python packaging). I sometimes wonder if there's some (unconscious) gatekeeping to keep "the using" and "the programming" (and "the core development") apart artificially. You actually sometimes encounter explicit statements like this (Linus Torvalds famously about C++, and in this post's discussion something like "not knowing R means you shouldn't use ML", which is extra bizarre because very, very few people actually know R).

The benefits of modularization are also not easy to foresee until you learn the hard lessons of ending up with unmanageable spaghetti, or with an analysis that is wrong in ways that may be impossible to pin down due to a lack of reproducible code. And if you don't know about the ways of avoiding/mitigating this, you're not really equipped to learn the lesson.

When this compounds with significant hurdles to do "the right thing" I find it almost inevitable that "the right thing" will not get adopted.

As a sidenote, I also find it quite odd that so little effort goes into "showing the work" in data analysis, when it's literally required when you do it in math class. I think they are almost the same thing. Maybe it stems from the computer being seen as a calculator, with "the work" done on paper, but this is mostly cultural lag from the era when computers were, in practice, calculators.


Time to first plot (this is the time you have to wait for any plot if you don't do REPL/notebook in analyses). It's a commonly used term in Julia discussions.

Nowadays also called TTFX, where X can refer to a variety of things.


Ah thanks! I'm relatively new to Julia and don't frequent discussion forums (yet?) and didn't have much luck searching for it. I figured out that it was an X in TTFX, but thought maybe, "print"?

In any case, yep, makes sense, thanks!

I guess for the typical scientific experimentation with plotting use case, why not just use the repl / notebook approach that (I think?) doesn't really suffer from this issue, as it's already up and running? My experience with startup time being problematic is when trying to "productionize" things that are not within a scientific / plotting loop. But my impression has been that it works well when doing things that are. Am I missing it?


The main problem with REPL/notebook approach is that it has an "invisible state". E.g. running the cells in different order or running the same line in REPL twice can cause different behavior. This leads to major issues in e.g. reproducibility, easily introduces bugs that even leave no trace and generally it's very much against very core software development principles. Pluto.jl reportedly solves this, gotta try it out.
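A minimal illustration of that invisible state (the same issue appears in Jupyter and in a long-lived REPL):

```julia
# Cell 1
x = 10

# Cell 2 -- re-running just this cell leaves x == 30, then 40, ...
# even though no visible code has changed:
x = x + 10
```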

My scientific/plotting loop runs from Bash (something not solved by Pluto.jl). This allows me to interoperate with a lot of software that's not in Python. I can version the code with e.g. git and the analysis pipeline tends to get refactored into libraries and CLI tools in the process. With Julia this is in practice impossible.


I definitely don't think "impossible" seems like the right word. I do buy that it might be frustratingly slow to the point of not being worthwhile.

I do wonder if Pluto.jl might solve a lot of these issues for you though. I haven't really used it much, but it seems like it works with just plain .jl files, so you can non-jankily version them (unlike .ipynb), and extract libraries and refactor in all the normal ways.

My frustrations trying to do normal-software-development-principles with julia are actually more about the convention the community has around structuring files and modules, which just hasn't clicked for me at all and is a frequent source of confusion. (The testing framework / layout / something about it also hasn't clicked for me yet either, and I wonder if it just isn't great and that's why people don't seem to write many tests...)

But there's also just so much great stuff about the language that I'm not even really mad about any of this :)


> I definitely don't think "impossible" seems like the right word. I do buy that it might be frustratingly slow to the point of not being worthwhile.

That is more accurate.

> I do wonder if Pluto.jl might solve a lot of these issues for you though. I haven't really used it much, but it seems like it works with just plain .jl files, so you can non-jankily version them (unlike .ipynb), and extract libraries and refactor in all the normal ways.

It may solve enough issues at least for some cases. I'll check it out.

> My frustrations trying to do normal-software-development-principles with julia are actually more about the convention the community has around structuring files and modules, which just hasn't clicked for me at all and is a frequent source of confusion. (The testing framework / layout / something about it also hasn't clicked for me yet either, and I wonder if it just isn't great and that's why people don't seem to write many tests...)

From what I remember from the last time, the module system was indeed quite a pain. The using/include/import distinction seems like a total mess, and that IIRC Stefan Karpinski dismissed files-as-modules solely on the grounds that Python does it that way did not really bring my hopes up.

I appreciate that the multiple dispatch design (which I find very much a pro for Julia) does bring some additional challenges for a module system. But my hunch is that the majority of this is because Julia was so strongly made by MATLABists for MATLABists (you can replace MATLAB with e.g. R or Mathematica) that basic features required for a general-purpose language were not much taken into account.

> But there's also just so much great stuff about the language that I'm not even really mad about any of this :)

This is exactly why I'm so mad about this! I would love to use all that great stuff AND follow good programming principles, but now I can't.


Yeah, maybe I'm just not jaded enough yet, but all the issues I've seen seem relatively superficial and solvable to me. It feels more like immaturity pains than fatal flaws to me. But it's not so young anymore, so maybe the immaturity is more insurmountable than it seems to me. I dunno!


Nobody is going to implement an important library in Julia that other things rely on if everything that relies on it must be reimplemented in Julia too.


Any chance of Julia supplanting Python for scientific computing?


Depends on what you mean. For me personally as a physicist, Julia supplanted Python like 7 years ago and I haven't looked back. There's a vibrant, active ecosystem of useful open source packages relevant to my needs constantly being developed. I've had zero reason to use Python in ages.

Other people may have different needs, different experiences, and different organizational or legacy constraints that makes it so that switching from python to julia is just not on the table any time soon. That's fine.

Julia and Python continue to both grow in the scientific computing space, and I think Julia is becoming more and more of a known name, and an accepted tool in the scientist's toolkit. It may not be the tool the majority of scientists use any time soon, but that's not really a problem.


I am not at the numerical programming level of most folks in this thread but have written useful numerical software in Python, Julia, Fortran, Java, C#, C++, and Matlab.

My 0.02 is that Julia is awesome... if I was starting a new numerical project from scratch, I'd pick Julia 100% unless there was a very compelling reason to use another language (e.g. to interact with a legacy system). There are some Julia libraries that are so insanely good, like SciML/DiffEq, that I use Julia just to use those libraries (because I'm not smart enough to re-implement that stuff in some other language).

That being said, Python has a sufficiently good ecosystem that it's a sane choice for scientific programming where all-out speed (or multi-node parallel computing) is not the top priority. Python with Numpy/Numba, RAPIDS and/or JAX can be very fast, and it is very easy to write concise and fast programs this way. The code quality of RAPIDS and JAX is very high too, and the documentation is solid.

I'd also offer that Chapel and Rust belong in the "HPC/super-computing capable" pantheon of programming languages.


These discussions tend to go in circles because there are several incompatible notions of what “scientific computing” is. The first arena that comes to my mind is very high performance numerical code: simulations of merging galaxies, of the atmosphere, etc. In this arena, there are exactly four languages that have been used at the most demanding level: Fortran, C, C++, and Julia. Python is not on the radar.


I think this is somewhat restrictive. A significant amount of scientific computing work is done in high-level languages like Python that dispatch to lower-level kernels (like C++, jitted Numba, etc). I wouldn't discount them as "not scientific computing". The jitted kernels really blur the line, of course, because in many cases the jitted code runs in the same environment as the higher-level language.


In some sense it's not right to claim python is used in those cases. The core libraries are all written in one of the 4 languages GP listed. Python is just the glue, or high level algorithm.


Sure, but it's a pretty useless sense because it will, among other problems, cause you to answer the question "which language should I teach my undergrads first" completely wrong.


I think trying to supplant Python is the wrong goal, or maybe the wrong frame. Supplanting an already established technology is very hard and depends on many factors that are not directly controllable by the Julia designers and community.

I think focusing on providing significant value to the dev community in a way that's not disruptive to their current workflow is a better metric of success.


If it weren’t for the one based indexing, I would use Julia for almost all my math research. Overall it is a very well designed language, and a ton of things just work.

But I can’t for the life of me understand how any language designer signed off on one based indexing.


I've never understood why this bugs people so much. I switch between languages with zero-based and one-based indexing about every hour or so. Never had a problem beyond the usual "off-by-one's" that I do anyway even without switching.

You have to keep track of which language you're using anyway to produce correct syntax...


Julia aims to also be a C-compatible low-level language with manual memory management in addition to the GC.

When building data-structures you're often using small unsigned integers that use the complete value-space, (e.g. u8s and their bitsets in tries, u16s and blocks with that size in sparse vectors).

With 1 based indexing, you can't do that anymore without first having to cast the index into the next larger integer, adding 1, and then doing the access, which is a PITA, adds instructions, and messes with prefetching, while you're already working with difficult, hard to read, and heavily optimised stuff.
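A concrete sketch of that, with a hypothetical 256-entry lookup table indexed by a byte:

```julia
table = zeros(Int, 256)
b = 0xff                  # a UInt8 using the full 0..255 value range

v = table[Int(b) + 1]     # 1-based: widen, then add 1, on every access
# Staying in 8 bits is not an option: 0xff + 0x01 wraps to 0x00.
# With 0-based indexing, `b` itself could be the index.
```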


Why can't the compiler decrement indices as needed? It makes sense to keep 0-based indexing in the final binary, but the language's design doesn't constrain that.


My issue with one based indexing is more philosophical in nature. I truly believe to the core of my being that zero is the first number, and that indexing from one is simply wrong. The evidence leading me to this conclusion is that I have yet to know of a single instance where a procedure or mathematical formula becomes simpler when indexing from one; yet there are so many that become simpler when indexing from zero. To this extent, I find it almost offensive that a language geared toward mathematicians would intentionally choose to index from one.
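The standard examples are index arithmetic: flattening a 2D index and ring-buffer wraparound both pick up a correction term under 1-based indexing (a sketch):

```julia
# Row-major flattening of (row, col) into linear storage:
flat0(i, j, ncols) = i * ncols + j         # 0-based
flat1(i, j, ncols) = (i - 1) * ncols + j   # 1-based needs the -1

# Ring-buffer successor:
next0(i, n) = (i + 1) % n                  # 0-based wraparound
next1(i, n) = (i % n) + 1                  # 1-based: mod first, then shift
```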


I think the rationale for one-based indexing was to be closer to other languages used for numerical computation, such as Fortran or Matlab.

It's been a long time since I have used any language with one-based indexing, but I don't remember it being particularly important in my day-to-day.


Don't all the languages geared towards science and math use 1-based indexing?



