Hacker News

For what it's worth, my workflow is actually much closer to the parent's; I have a python batch job that spins up a julia runtime in a thread at startup and waits on a `Pkg.instantiate()` of a project with pretty substantial dependencies before it can start sending method calls into it. I think it takes like 3-5 seconds to start up. I badly want it to be a lot faster than that, but it's also fine. I'm not happy with it, but if I sat here thinking about it for a few minutes, I'm certain I could list a hundred things in our system that annoy me more than this.
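The warm-start trick described above can be sketched in Python. This is a hypothetical, minimal version (the class name `WarmWorker` and the "ready"-line protocol are my own inventions, not the commenter's actual code): the expensive worker is launched from a background thread at startup, so its initialization overlaps with the rest of the program's startup, and the caller blocks only when it actually needs the worker.

```python
import subprocess
import threading

class WarmWorker:
    """Start an expensive worker process in a background thread so its
    initialization overlaps with the rest of our own startup."""

    def __init__(self, cmd):
        self._proc = None
        self._ready_line = None
        self._thread = threading.Thread(target=self._start, args=(cmd,))
        self._thread.start()

    def _start(self, cmd):
        # Convention assumed here: the worker prints a single "ready" line
        # on stdout once its initialization is finished.
        self._proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )
        self._ready_line = self._proc.stdout.readline().strip()

    def wait_ready(self):
        # Block until the worker has finished initializing, then return
        # the line it announced itself with.
        self._thread.join()
        return self._ready_line

# For the workflow above, cmd would be something like (hypothetical entry
# point, not a real script):
#   ["julia", "--project=/path/to/project",
#    "-e", 'using Pkg; Pkg.instantiate(); println("ready"); serve_requests()']
# Any process that prints "ready" and then accepts requests on stdin fits.
```

The 3-5 second `Pkg.instantiate()` cost is paid once, concurrently with the batch job's own setup, instead of on the first method call.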


That's probably something close to DaemonMode.jl?

If the problem were just running a daemon, it would be an inconvenience at worst. The problem is that changes to the code tend to trigger a recompile, some changes can't be made at all (e.g. structs can't be redefined, if I recall correctly), and most crucially the session can "silently" get into some unexpected, confused state.

If you don't need to change the code (and it's structured well enough not to hold state between calls), then running it in a persistent process is no problem. But in that case you can just use plain old AOT compilation.

Aside from my personal gripes, which I have probably aired here more than enough for everybody, I think the design that more or less forces a REPL/notebook style is detrimental to the quality of scientific code. Scientists don't learn to modularize their code and don't learn to understand basic control flow. I see this all the time with my colleagues, and the "pedagogical problem" of notebooks/REPLs is very common when I teach programming. E.g. a common problem is that students don't understand variable assignment, because in a REPL/notebook execution doesn't necessarily follow the normal program flow.
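That pitfall can be shown with a tiny simulation of notebook cells sharing one global namespace (a hypothetical sketch of the mechanism, not any particular notebook implementation):

```python
# In a notebook, the *execution order* of cells, not their order on the
# page, determines program state. Simulate cells sharing one namespace:
namespace = {}

def run_cell(source):
    exec(source, namespace)  # every cell reads/writes the same globals

run_cell("x = 1")       # cell 1
run_cell("y = x + 1")   # cell 2: computes y from the current x
run_cell("x = 100")     # student edits and re-runs cell 1

# y is still 2: it was computed from the old x, and nothing re-ran cell 2,
# even though the page now reads "x = 100" above "y = x + 1".
print(namespace["y"])
```

The code as written on the page says one thing; the state says another. That mismatch is exactly what trips up students who haven't yet internalized assignment and control flow.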

I'm kind of amazed how little the scientific programming scene is concerned about this. From a software development perspective (I'm an ex-software-developer turned scientist), having clearly defined state transitions, independent modules of code, and code that does what it says are really, really fundamental things, without which anything longer than a few lines has no chance in hell of not turning into an unmanageable, buggy mess.

Maybe there's a (mis)conception that scientific code is somehow a fundamentally different beast from "general programming". But it's not. It's just a relic of having bad systems languages like C and C++ and bad special-purpose languages like MATLAB and R. That Python can be and is used for both is an existence proof that there is no such divide. But we need a better Python for both general and scientific programming.


> Aside from my personal gripes, which I have probably aired here more than enough for everybody, I think the design that more or less forces REPL/notebook style is detrimental to quality of scientific code.

So, two things here: First of all, I have greatly appreciated the discussion, so don't feel like you've over-aired your thoughts!

But also, here's that, in my view, hyperbole again: I just don't see how "more or less forces" is the right way to describe the choice of whether or not to use a notebook/REPL style with julia. It works perfectly fine to develop in a different style. There's a little bit of startup time, but that's just not nearly a big enough deal to say it "forces" a particular development process.

I think your criticisms of notebook-driven programming make sense, but are also slightly overblown; in my experience, scientists are perfectly capable of writing good modular code despite preferring to work in notebooks most of the time. It's not that it isn't a problem; I just don't think it's been a very big one, in my experience.

But I very much agree with your last two paragraphs. I just don't think notebooks / REPLs are a big part of the reason things have ended up this way. I think it's more of a cultural thing that gets passed down from generation to generation, and it has to do with respect; I think scientists have never taught one another that code deserves respect and professionalism. I think they have always thought of it as "some throwaway stuff that generates the data for the paper"; the paper is the thing deserving of respect, not the tools used to create it.

But I think this has been slowly changing as more and more scientists are digitally native and learn increasingly more software skills as a first class concern.


Agreed that I can be hyperbolic at times, and that this can be detrimental to fruitful discussion. I'll be more mindful of it, thanks!

More accurate is that the REPL/notebook style is currently so much more convenient for getting started and "exploring" that people will use it, and it's what all the Julia examples etc. teach. Pluto.jl may be a good "transition" away from this, though, although I'm afraid modularization is still too inconvenient with it.

The script-workflow latency is also a reason why "scripters" don't switch to Julia even though they otherwise would.

So while it's not practically impossible, as I hyperbolically put it, it is IMHO a blocker for wider Julia adoption and for a transition to better programming practices.

In my experience, both in teaching and in helping working scientists with their code problems, the fact that the REPL/notebook style is so much easier to get started with is at least a major reason why better practices don't get adopted.

Scientists themselves doing any programming is quite a recent thing. Usually it was "lab engineers" doing "the coding" and data mangling, and the scientists doing "the analysis" in something like SPSS or Excel or copy-pasted R one-liners.

It's also really understandable why scientists don't make much effort to learn and enforce good programming. It's a relatively minor part of their job, they manage to scrape much of the analysis together with poor practices, and when it almost inevitably explodes as the analysis gets more complicated, someone "more technical" gets called to fix the mess (I do this all the time).

In most languages modularizing code is made needlessly hard, and I'm smelling something like this in Julia's module system (and in e.g. Python packaging). I sometimes wonder if there's some (unconscious) gatekeeping to keep "the using" and "the programming" (and "the core development") artificially apart. You sometimes even encounter explicit statements to this effect (famously Linus Torvalds about C++, and in this post's discussion something like "not knowing R means you shouldn't use ML", which is extra bizarre because very, very few people actually know R).

The benefits of modularization are also not easy to foresee until you learn from many hard lessons of ending up with unmanageable spaghetti, or with an analysis that is wrong but where it may be impossible to pin down why, due to the lack of reproducible code. And if you don't know about the ways of avoiding or mitigating this, you're not really equipped to learn the lesson.

When this compounds with significant hurdles to do "the right thing" I find it almost inevitable that "the right thing" will not get adopted.

As a side note, I also find it quite odd that so little effort is made to "show the work" in data analysis, when it's literally required when you do the same thing in math class. I think they are almost the same activity. Maybe it stems from the computer being seen as a calculator, with "the work" done on paper, but that's mostly cultural lag from the era when computers were, in practice, calculators.



