
This seems so silly to me.

It’s PyTorch. If they said “the next version of PyTorch will be in Julia,” the ecosystem would shift accordingly.

They’re practically saying “this language has every feature we need and want, most of them already existing, but we’re going to continue re-inventing them in this objectively less suitable language because we clearly wish to make life harder for ourselves”



Or I read it as "We want to make life as easy for our userbase as possible, so we will put more work on ourselves to make our users' lives easier", which is an attitude I very much appreciate.


In the long run I think moving to Julia would make a lot of sense.

I have used MATLAB, R, Python and Julia extensively for doing all sorts of data related things during the last 20 years. Julia is incredibly easy to work with, very elegant and really efficient.

R and Python have always felt clumsy in some ways, and it is hard to write really performant code in them, even if you are more proficient in Python! As a seasoned Lisper and MLer, even with a lot of Python experience under my belt, Julia felt much easier to work with from the very beginning.

Furthermore, most Julia libraries are written in pure Julia which simplifies things a lot and enhances composability. While there are great libraries around, the DL landscape is a bit lacking. Flux is great, but I would not use it to build e.g. transformers as it changes too often and has too few maintainers behind it. Hence a potential migration of Torch to Julia would be fantastic.


Flux dev here. There's Transformers.jl which has prebuilt transformers built on top of Flux. While the package does change, we have been more careful about ensuring we don't break code all that often.


But Julia doesn't have a good story for anything else besides that.

You can take a Python web server process, have a request call a task that uses NumPy and OpenCV and scikit-learn, get that back, and you're done, all in the same language.

Julia's community does not seem to have aspirations beyond high-performance math code, which is great for its use, but I'm not going to learn Julia just for that when I can implement the entirety of a development pipeline in Python and have all the other niceties that come with it.


That's just not true.

https://github.com/GenieFramework/Genie.jl https://github.com/JuliaWeb/HTTP.jl https://www.youtube.com/watch?v=xsxJt4prFG4

And with upcoming improvements to binary size and structured concurrency (it already has Go-like lightweight threads) it will get even better.


A notebook doesn't make for a line-of-business web app.


And the two other links?


I don't know why I didn't see them. I read through the docs with Genie; it seems to be about where Django was 12 years ago as far as feature development goes. Enough to be very productive for some use cases, not sufficiently productive to consider it a step up from the existing tech that's out there.


Exactly.

PyTorch is not only easy, but is a joy to work with.

Among researchers, TensorFlow is rapidly losing ground to PyTorch, and, I think, will keep losing ground until it becomes a niche and only used by Googlers and some others.

https://horace.io/pytorch-vs-tensorflow/


agreed, and this has always been the driving philosophy of pytorch, and perhaps why it kind of won so much brainshare against tensorflow despite _long_ odds when torch was ported from lua.

Soumith Chintala gave a keynote talk at JuliaCon where he focused on these points: https://www.youtube.com/watch?v=6V6jk_OdH-w


were those odds really long? TensorFlow is a beastly mess, and there were some very serious breaking issues between minor version revisions. It was so bad that at a company I used to work at we joked "this company only exists because people can't install tensorflow". We also used to joke that in order to install TensorFlow, step 1 was: install the JVM.


Does keeping as much of the codebase as possible in Python (or keeping the fast parts in C++) actually make things easier for the userbase, or do they just care about having a first-class interface in Python regardless of the implementation language?


Almost certainly the latter. Python excels at this because it’s really easy to learn, so non-Python libraries provide Python functions that you can call, which may not be implemented in pure Python.
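
To make that concrete, here is a minimal sketch, assuming CPython and only the standard library. `py_sqrt` is a hypothetical helper written for comparison; `math.sqrt` is the real standard-library function, which CPython implements in C. From the call site the two look the same, which is the point: the caller only sees a Python function.

```python
import inspect
import math


def py_sqrt(x):
    """Pure-Python square root via Newton's method (hypothetical helper, for comparison only)."""
    guess = x / 2 or 1.0
    for _ in range(50):
        guess = (guess + x / guess) / 2
    return guess


# Assumption: CPython. There, math.sqrt is compiled C exposed as a Python
# callable, while py_sqrt is ordinary Python bytecode.
sqrt_is_native = inspect.isbuiltin(math.sqrt)
py_sqrt_is_native = inspect.isbuiltin(py_sqrt)

# Both are called the same way and agree on the result.
results_agree = math.isclose(math.sqrt(2.0), py_sqrt(2.0))

print(sqrt_is_native, py_sqrt_is_native, results_agree)
```

The same pattern scales up to the big numeric libraries: the user-facing API is Python, and the implementation language is a detail behind it.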


Language choices are less about language to me and more about the ecosystem of libraries. Python has generally been very strong in the ML/data science realm. I know Julia is catching up but am unsure just how much it covers. Examples of Python libraries I would consider needed as part of the data ecosystem: numpy, pytorch/tf, batch and stream processing frameworks like spark/flink/beam, workflow orchestration like kubeflow/airflow, data formats like pyarrow, etc. My company does most of our ML in Google Cloud, so GCP libraries are also quite helpful. How much of that does Julia have equivalents to? How much coverage do those equivalents have?

We’ve done some ML things outside Python before, and one general issue with most languages is that once you leave Python there’s a high risk that something will be missing. Because of that, if we use another language we prefer to keep the scope of its project small. I think the only big exception is on the data engineering side, where Java is sometimes better because a lot of batch/stream frameworks have their best coverage in Java. ML libraries are weaker there, though, so our usage of Java is mainly data pipelines.

Another issue is pytorch/tf in python are very dominant in research/projects. Often we clone relevant recent projects and try experimenting with them to see if they help. Swapping to Julia would hurt a ton in that area.

edit: Also, while I'm fond of python I'd be very open to seeing another language win. There are language design choices I dislike in python, but I like enough of the language, and the ecosystem has been too strong, for most other languages to be worth pondering. If Julia grows enough that my coworkers start asking for Julia support I'd be happy to explore it. My pet preferred language is crystal (ruby-like readability + types + good performance) but ecosystem-wise it's tiny.


One of the really nice things with Julia is some of the ecosystem needs disappear. Python needs a lot of ecosystem because none of the packages work together, and the language is slow so you have to make sure you are doing as much work as possible outside the language itself. To answer your question more specifically:

Numpy -> Array + broadcasting (both in Julia Base)

pytorch/tf -> Flux.jl (package)

batch/stream processing -> you don't need it as much, but things like OnlineStats exist. Also Base has multithreaded and distributed computing. Spark in particular is one where it lets you use a cluster of 100 computers to be as fast as 1 computer running good code.

pyarrow -> Arrow.jl (there are also really good packages for JSON, CSV, HDF5 and a bunch of others)

Let me know if you have any other questions. Always glad to answer!


Good spark support would be a good answer for batch/stream processing. I am a little scared of the definition of support, though. Apache Beam supports something like 5 runners (Flink, Spark, Dataflow, etc.) but the quality of runner support is extremely inconsistent. I’ve also noticed that even for Python, Flink sometimes has very useful operations only in Java with no wrapper. Although honestly, having data pipelines in one language and downstream users of the data in a different language works pretty well in my experience, so mixing data pipeline languages is somewhat OK.

What’s workflow orchestration choice? That’s the main one you didn’t touch. My work area is an ML training platform, and a lot of my work can be described as wrapper work on kubeflow to allow dozens of other ML engineers to manage experiments/workflows. For Python the main choices are kubeflow/airflow. Ray kind of, but Ray workflows are still quite new and missing a lot of useful features. I need some system to run hundreds of ML workflows per day (one workflow being like 5-10 tasks, some short, some long) and manage their tasks well.

Broader area also includes libraries like weights and biases, bento ml, etc (experimentation management libraries).

In theory you can have the workflow manager in one language and the workflow code in a different language. The main downside is that it makes debugging workflows locally harder (breakpoints are a little sad across most language boundaries), but it is doable, and we debated migrating to Temporal (Java workflow system) before.


> What’s workflow orchestration choice?

We’ve moved away from language-integrated orchestration entirely at my work: we use Argo Workflows on Kubernetes, so we’re just orchestrating containers and aren’t beholden to language-specific requirements anymore. You can use whatever language/tool you want, provided it packs into a container and accepts/returns what the rest of the workflow expects.
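
A rough local illustration of that contract, not Argo's API: the sketch below uses subprocesses as a stand-in for containers and assumes a made-up convention where every step reads JSON on stdin and writes JSON on stdout. The step commands (`DOUBLE`, `ADD_ONE`) and `run_pipeline` are hypothetical names for this example only. The runner never looks inside a step, so each one could be written in any language.

```python
import json
import subprocess
import sys

# Hypothetical steps: any executable that reads JSON on stdin and writes JSON
# on stdout. They happen to be Python one-liners here, but the runner below
# only depends on the stdin/stdout contract.
DOUBLE = [sys.executable, "-c",
          "import json,sys; d=json.load(sys.stdin); d['x']*=2; json.dump(d, sys.stdout)"]
ADD_ONE = [sys.executable, "-c",
           "import json,sys; d=json.load(sys.stdin); d['x']+=1; json.dump(d, sys.stdout)"]


def run_pipeline(steps, payload):
    """Feed payload through each step in order, passing JSON between processes."""
    for cmd in steps:
        completed = subprocess.run(cmd, input=json.dumps(payload),
                                   capture_output=True, text=True, check=True)
        payload = json.loads(completed.stdout)
    return payload


result = run_pipeline([DOUBLE, ADD_ONE], {"x": 20})
print(result)
```

Swapping a step for a Julia or Java program would not change the runner, as long as the replacement honors the same input/output shape.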


I don't use much in terms of orchestration, so I'm probably not the right person to ask there.

One of the really big potential benefits of Julia is that it lets you remove language barriers which is especially nice if you are doing ML research (or playing around with new model types, etc). Since the ML stack is Julia all the way down to CUDA/BLAS/julia loops, you can really easily inspect or modify everything in your stack.


Python is the middle manager of languages. It sucks at everything, but always knows a guy.


I saw a quote once that stuck with me: python is the second best language for everything, first at nothing.


What are the best languages for "executable pseudocode" or "glue language" roles?


And there's 1 guy he knows that's a 10x developer and always willing to take one for the team.

I'll leave it to HN to figure out what that means :P


Call native libraries?


Write a metric shit ton of unmaintainable, underperforming pandas code and call it an effective solution


By "the ecosystem" they mean the python ecosystem, not the PyTorch ecosystem.

PyTorch is a small part of the python ecosystem. The python ecosystem is not going to change at all if PyTorch moves to Julia.



