Yeah, I mean, I think we're all basically doing this now, right? I wouldn't choose this design, but I think something similar to DeltaLake can be simplified down for tons of use cases. Manifest with CAS + buffered objects to S3, maybe compaction if you intend to do lots of reads. It's not hard to put it together.
You can achieve stupidly fast read/write operations if you do this right, with a system that is shockingly simple to reason about.
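Roughly, the core CAS loop looks like this (a toy sketch: an in-memory map stands in for S3, with a version counter playing the role of the ETag that a real conditional PutObject would check):

```rust
use std::collections::HashMap;

// Stand-in for S3: object key -> (etag, bytes). A real system would use
// PutObject with If-Match / If-None-Match for the conditional step.
struct FakeS3 {
    objects: HashMap<String, (u64, Vec<u8>)>,
}

impl FakeS3 {
    fn new() -> Self {
        Self { objects: HashMap::new() }
    }

    fn get(&self, key: &str) -> Option<(u64, Vec<u8>)> {
        self.objects.get(key).cloned()
    }

    // Conditional put: succeeds only if the current etag matches `expected`
    // (None means "object must not exist yet"). Returns the new etag.
    fn put_if_match(&mut self, key: &str, expected: Option<u64>, body: Vec<u8>) -> Result<u64, ()> {
        let current = self.objects.get(key).map(|(etag, _)| *etag);
        if current != expected {
            return Err(()); // lost the race; caller re-reads and retries
        }
        let new_etag = current.map_or(1, |e| e + 1);
        self.objects.insert(key.to_string(), (new_etag, body));
        Ok(new_etag)
    }
}

// Append a record to a manifest with a read-modify-CAS loop.
fn append_record(s3: &mut FakeS3, manifest_key: &str, record: &str) {
    loop {
        let (expected, mut body) = match s3.get(manifest_key) {
            Some((etag, body)) => (Some(etag), body),
            None => (None, Vec::new()),
        };
        body.extend_from_slice(record.as_bytes());
        body.push(b'\n');
        if s3.put_if_match(manifest_key, expected, body).is_ok() {
            return; // CAS won; the append is durable and ordered
        }
        // CAS lost: re-read the manifest and retry on top of the new version
    }
}
```

The whole trick is that correctness lives in the conditional write, not in any coordinator.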
> Step 4: queue.json with an HA brokered group commit
> The broker is stateless, so it's easy and inexpensive to move. And if we end up with more than one broker at a time? That's fine: CAS ensures correctness even with two brokers.
TBH this is the part that I think is tricky: resolving it in a way that doesn't end up with tons of clients wasting time talking to a stale broker that buffers their writes, pushes them, and then always fails the CAS. I solved this at one point with token fencing and then decided it wasn't worth it; I just use a single instance to manage all writes. I'd again point to DeltaLake for the "good" design here, which is to have multiple manifests and only serialize compaction, which also unlocks parallel writers.
The other hard part is data deletion. For the queue it looks dead simple since it's one file, but if you want to ramp up your scale and get multiple writers or manage indexes (also in S3), then deletion becomes something you have to slip into compaction. Again, I had it at one point and backed it out because it was painful.
But I have 40k writes per second working just fine for my setup, so I'm not worrying. I'd suggest others basically punt as hard as possible on this. If you need more writes, start up a separate index with its own partition for its own separate set of data, or do naive sharding.
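The naive sharding route can be as simple as hashing the key to pick an independent partition, each with its own manifest/log under its own prefix (sketch; the prefix layout here is made up):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Naive sharding: hash the record key to pick one of N independent
// partitions, each with its own manifest and log under its own S3 prefix.
// Writers for different shards never contend on the same CAS object.
fn shard_prefix(key: &str, num_shards: u64) -> String {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    let shard = h.finish() % num_shards;
    format!("logs/shard-{:04}/", shard)
}
```

Each shard then scales independently, at the cost of no global ordering across shards.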
I'm not building a queue, but a lot of things on s3 end up being queue-shaped (more like "log-shaped") because it's very easy to compose many powerful systems out of CAS + "buffer, then push". Basically, you start with "build an immutable log" using those operations, and the rest of your system becomes a matter of what you do with that log. A queue needs to support a "pop", but I am supporting other operations. Still, the architectural overlap all begins with CAS + buffer.
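"Buffer, then push" in miniature (again a sketch, with plain Vecs standing in for S3 objects): records accumulate in memory, get flushed as one immutable segment object, and only then does the segment's name get appended to the tiny manifest, which in real life is the CAS write:

```rust
// Records accumulate in a buffer, flush as one immutable segment object,
// and only become visible once the segment name lands in the manifest.
struct LogWriter {
    buffer: Vec<String>,
    flush_at: usize,
    segments: Vec<(String, Vec<String>)>, // stand-in for immutable S3 objects
    manifest: Vec<String>,                // stand-in for the CAS-updated manifest
}

impl LogWriter {
    fn new(flush_at: usize) -> Self {
        Self { buffer: Vec::new(), flush_at, segments: Vec::new(), manifest: Vec::new() }
    }

    fn append(&mut self, record: &str) {
        self.buffer.push(record.to_string());
        if self.buffer.len() >= self.flush_at {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let name = format!("segment-{:08}", self.segments.len());
        // 1. push the immutable segment object
        self.segments.push((name.clone(), std::mem::take(&mut self.buffer)));
        // 2. CAS-append its name to the manifest; readers only see it now
        self.manifest.push(name);
    }
}
```

Everything else (queues, indexes, projections) is interpretation of that log.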
That said, I suspect you can beat SQS for a number of use cases, and definitely if you want to hold onto the data long term or search over it - S3 has huge advantages there.
Performance will be extremely solid unless you need your "push -> pop" tail latency to be very tight at p90 and beyond.
This is fascinating. It sounds like you're building "cloud datastructures" based on S3+CAS. What are the benefits, in your view, of using S3 instead of, say, Dynamo or Postgres? Or reaching for NATS/rabbitmq/sqs/kafka. I'd love to hear a bit more about what you're building.
It's just trade-offs. If you have a lot of data, S3 is just the only option for storing it. You don't want to pay for petabytes of storage in Dynamo or Postgres. I also don't want to manage Postgres, even RDS - dealing with write loads that S3 handles easily, dealing with availability, etc., is all painful. S3 "just works", but you need to build some of the protocol yourself.
If you want consistently really low latency/ can't tolerate a 50ms spike, don't retain tons of data, have <10K/s writes, and need complex indexing that might change over time, Postgres is probably what you want (or some other thing). If you know how your data should be indexed ahead of time, you need to store a massive amount, you care more about throughput than a latency spike here or there, or really a bunch of other use cases probably, S3 is just an insanely powerful primitive.
Insane storage capacity also unlocks new capabilities. Immutable logs unlock "time travel", where you can ask questions like "what did the system look like at this point?" since no information is lost (unless you want to lose it, up to you).
Everything about a system like this comes down to reducing the cost of a GET. Bloom filters are your best friend, metadata is your best friend, prefetching is a reluctant friend, etc.
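The Bloom filter part, sketched (a deliberately tiny two-hash filter; real ones size the bit array and hash count from the expected item count and target false-positive rate):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A tiny Bloom filter: before paying for a GET on a segment, check whether
// the key can possibly be in it. A false positive costs one wasted GET;
// a "no" answer is always right, so those GETs are skipped entirely.
struct Bloom {
    bits: Vec<bool>,
}

impl Bloom {
    fn new(size: usize) -> Self {
        Self { bits: vec![false; size] }
    }

    // Two hash positions per key, derived by seeding the same hasher twice.
    fn indexes(&self, key: &str) -> [usize; 2] {
        let mut out = [0usize; 2];
        for seed in 0..2u64 {
            let mut h = DefaultHasher::new();
            seed.hash(&mut h);
            key.hash(&mut h);
            out[seed as usize] = (h.finish() as usize) % self.bits.len();
        }
        out
    }

    fn insert(&mut self, key: &str) {
        for i in self.indexes(key) {
            self.bits[i] = true;
        }
    }

    // false => the key is definitely absent: skip the GET
    fn might_contain(&self, key: &str) -> bool {
        self.indexes(key).iter().all(|&i| self.bits[i])
    }
}
```

Store one of these per segment in the manifest metadata and most point lookups never touch S3 at all.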
I'm not sure what I'm building. I had this idea years ago before S3 CAS was a thing and I was building a graph database on S3 with the fundamental primitive being an immutable event log (at the time using CRDTs for merge semantics, but I've abandoned that for now) and then maintaining an external index in Scylla with S3 Select for projections. Years later, I have fun poking at it sometimes and redesigning it. S3 CAS unlocked a lot of ways to completely move the system to S3.
A lot of good insights here. I am also wondering if they could simply put different jobs (unclaimed, in-progress, deleted/done) into different directories/prefixes and rely on an atomic object rename primitive [1][2][3] to solve the problem more gracefully (group commit can still be used if needed).
What you describe is very similar to how Icechunk[1] works. It works beautifully for transactional writes to "repos" containing PBs of scientific array data in object storage.
Oof, I probably misspoke there just slightly. I attempted to solve this with token fencing, I honestly don't know if it worked under failure conditions. This was also a while ago. But the idea was basically that there were two tiers - one was a ring based approach where a single file determined which writer was allocated a 'space' in the ring. Then every write was prepended with that token. Even if a node dropped/ joined and others didn't know about it (because they hadn't re-read the ring file), every write had this token.
Writes were not visible until compaction in this system. At compaction time, tokens would be checked and writes for older tokens would be rejected, so even if two nodes thought that they owned a 'place' in the ring, only writes for the higher value would be accepted. Soooomething like that. I ended up disliking this because it had undesirable failure modes like lots of stale/ wasted writes, and the code sucked.
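The fencing check at compaction time could look something like this (my reconstruction of the idea described, not the original code): every buffered write carries the (slot, epoch) token its writer held, and compaction keeps only writes tagged with the newest epoch seen per slot, so a writer on a stale view of the ring gets its writes dropped rather than corrupting the log.

```rust
use std::collections::HashMap;

// A write tagged with the ring token its writer held at write time.
#[derive(Clone)]
struct TaggedWrite {
    slot: u32,
    epoch: u64,
    payload: String,
}

// At compaction, reject writes with stale epochs: only the highest epoch
// observed for each ring slot is treated as the legitimate owner.
fn compact(writes: &[TaggedWrite]) -> Vec<&TaggedWrite> {
    let mut newest: HashMap<u32, u64> = HashMap::new();
    for w in writes {
        let e = newest.entry(w.slot).or_insert(w.epoch);
        if w.epoch > *e {
            *e = w.epoch;
        }
    }
    writes.iter().filter(|w| newest[&w.slot] == w.epoch).collect()
}
```

The downside is exactly the one mentioned: a fenced-out writer's buffered writes are pure waste.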
I've been writing rust for over a decade and I've been quite active in the community at times. I've barely even heard of the rust foundation, let alone seen suggestions to donate.
I assume this is hugely beneficial for research on intervention methods, not for treatment. I think everyone is focusing on "I'd rather know" but imagine if you could get larger populations with a diagnosis earlier on, how impactful that would be for testing an intervention?
First 2 are very legit, as a rust dev myself. I wish Rust had a larger stdlib, and every argument against it leaves me pretty unconvinced/ feeling like the arguments could just as easily lead us to believe that "HashMap" should not be included (some people believe this), or even "time" should not be, etc.
The compile times are easily the worst part though. They couple poorly with (2).
I feel fine personally without (3), and (4)/(5) don't come up for me.
> For myself, for example... if it were my repository and my team and my hiring, and I was starting from scratch... I'd be extremely careful about third party crate adoption and have an extremely minimalistic approach there.
Same. I basically stick to a minimal `cargo-vet` and avoid new crates. I review new ones. I've chosen to take on a new crate when it's something like "the author literally wrote the spec for this format", but otherwise I'm reluctant.
I can't give you examples, but my experience is that AI does very well with Rust except for cases where a library has a constantly changing API/ has had recent breaking changes. I find that AI does extremely well at "picking up" a Rust codebase, I suspect due to the type information providing context but I couldn't say.
> - It would be a big deal if Rust did have a safe dynamic linking ABI. Someone should do it. That's the main point I'm making. I don't think deflecting by saying "but C is no safer" is super interesting.
I think we all agree that it would be a huge deal.
> - So long as this problem isn't fixed, the upside of using Rust to replace a lot of the load bearing stuff in an OS is much lower than it should be to justify the effort. This point is debatable for sure, but your arguments don't address it.
As you point out, this is the debatable part, and I'm not sure I get your justification here.
This might end up being the forcing function (quoting myself from another reply in this discussion):
> It can't be that replacing 20 C/C++ shared objects with 20 Rust shared objects results in 20 copies of the Rust standard library and other dependencies that those Rust libraries pull in. But, today, that is what happens. For some situations, this is too much of a memory usage regression to be tolerable.
If memory was cheap, then maybe you could say, "who cares".
Can you even make the standard library dynamically linked in the C way??
In C, a function definition usually corresponds 1-to-1 to a function in object code. In Rust, plenty of things in the stdlib are generic functions that effectively get a separate implementation for each type you use them with.
If there's a library that defines Foo but doesn't use Vec<Foo>, and there are 3 other libraries in your program that do use that type, where should the Vec functions specialized for Foo reside? How do languages like Swift (which is notoriously dynamically-linked) solve this?
You can have an intermediate dynamic object that just exports Vec<Foo> specialized functions, and the three consumers that need it just link to that object. If the common need for Vec<Foo> is foreseeable by the dynamic object that provides Foo, it can export the Vec<Foo> functions itself.
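Monomorphization in miniature, for anyone unfamiliar with why this question is hard: `largest` below is one generic definition in source, but the compiler emits a separate machine-code copy for each concrete type it's instantiated with (one for `i32`, one for `&str` here). That's why "which shared object owns Vec<Foo>'s code?" has no obvious answer.

```rust
// One generic definition; the compiler generates a specialized copy of the
// function body per concrete T used at call sites (monomorphization).
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut best = items[0];
    for &it in &items[1..] {
        if it > best {
            best = it;
        }
    }
    best
}
```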
Your apt update would still be huge though. When the dependency changes (eg. a security update) you’d be downloading rebuilds of 20 apps. For the update of a key library, you’d be downloading your entire distribution again. Every time.
NixOS "suffers" from this. It's really not that bad if you have solid bandwidth. For me it's more than worth the trade off. With a solid connection a major upgrade is still just a couple minutes.
I think you misunderstand my point. Nix basically forces dynamic linking to be more like static linking. So changing a low level library causes ~everything to redownload.
Oh, well yeah, statically linked binaries have that downside. I guess I don't think that's a big deal, but I could maybe imagine on some devices that are heavily constrained that it could be? IDK. Compression is insanely effective.
You are forgetting the elephant in the room: if every bugfix requires a rebuild of everything downstream, then it is not only a question of constraints, it is also a question of SSD cycles - you are effectively wearing out someone's drive faster. And btrfs actually worsens this problem, because instead of one copy-on-write copy of the library you now have a copy inside every app that uses it. Now an update (or reverting one) will cost you even more writes. It is just waste for no apparent reason - less free memory, less free disk space.
"Compression is insanely effective" - and what about energy? Compression will increase CPU use. It will also make everything slower - slower than plain deduplication. Also, your reason for shipping worse tech to the user is that the user can mitigate it in other ways? This strikes me as the same logic as "we don't need to optimize our program/game, users will just buy better hardware" - just plainly pushing the cost onto the user. That is not a valid solution, just a downplaying of the argument.
If Rust and static linking were to become much more popular, Linux distros could adopt some rsync/zsync like binary diff protocol for updates instead of pulling entire packages from scratch.
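A very naive fixed-block version of that idea, as a sketch (real rsync/zsync use rolling checksums so they can also handle insertions that shift data; this only catches in-place changes): hash each block of the old package, and ship only the blocks of the new package whose hash differs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BLOCK: usize = 4; // toy block size; real tools use KB-sized blocks

// Hash each fixed-size block of a byte stream.
fn block_hashes(data: &[u8]) -> Vec<u64> {
    data.chunks(BLOCK)
        .map(|c| {
            let mut h = DefaultHasher::new();
            c.hash(&mut h);
            h.finish()
        })
        .collect()
}

// Returns (block_index, new_block_bytes) for every block that must be sent;
// unchanged blocks are reused from the copy the client already has.
fn diff(old: &[u8], new: &[u8]) -> Vec<(usize, Vec<u8>)> {
    let old_hashes = block_hashes(old);
    new.chunks(BLOCK)
        .enumerate()
        .filter(|(i, c)| {
            let mut h = DefaultHasher::new();
            c.hash(&mut h);
            old_hashes.get(*i) != Some(&h.finish())
        })
        .map(|(i, c)| (i, c.to_vec()))
        .collect()
}
```

For a statically linked binary where only one library changed, most blocks outside that library's code would hash identically, so the download shrinks accordingly.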
Static linking used to be popular, as it was the only way of linking in most computer systems, outside expensive hardware like Xerox workstations, Lisp machines, ETHZ, or what have you.
Some of the very first consumer hardware to support dynamic linking was the Amiga, with its Libraries and DataTypes.
We moved away from building full-blown OSes with static linking, with the exception of embedded deployments and firmware, for many reasons.
Even then, they would still need to rebuild massive amounts on updates. That is nice in theory, but see the number of bugs reported in Debian because upstream projects fail to rebuild as expected. "I don't have the exact micro version of this dependency I'm expecting" is one common reason, but there are many others. It's a pretty regular thing, and therefore would be burdensome to distro maintainers.
The "issue" isn't that these new tools from Ubuntu are in Rust, that's almost irrelevant. The issue is that they are not the "standard" tools.
If Ubuntu's Rust replacements aren't adopted in other distributions, or only in some of them, we get an even more fragmented Linux ecosystem. We've already seen this with sudo-rs (which really should be called something else). It's a sudo replacement, ideally a one-to-one replacement, but it's not 100% - and for how long? You can also think of the curl provided by Microsoft PowerShell, which isn't actually curl and only partially provides curl's functionality, but it squats the command name.
Ubuntu might accidentally, or deliberately, create a semi-incompatible parallel Linux environment, like Alpine, but worse.
This is a very shallow, very boring criticism. I doubt it will resonate. Modern C++ does not solve the safety issues, it has plenty of brand new footguns like string_view. Who cares if Go is better than Rust? Feel free to write Go, no one cares.
"mut and fn very annoying to read" like okay lol who cares? What should anyone take from your post other than that you aren't that into Rust?
I don't agree. For one thing, the language directly impacts things like iteration speed, runtime performance, and portability. For another, there's a trade-off between "verbose, eats context" and "implicit, hard to reason about".
IMO Rust will strike a very strong balance here for LLMs.
> Formal specifications and automated testing, will beat any language specific tooling.
I don't understand what you mean. Beat any language at what? Correctness? I don't think that's true at all, but I also don't see how that's relevant, it definitely doesn't address the fact that Rust will virtually always produce faster code than the majority of other languages.
> Hardly much different than dealing with traditional offshoring projects output.
> Any tool that can plug into MLIR and use LLVM, can potentially produce fast code.
I guess that's sort of technically true, but not even really? Like, obviously you can compile Python to C and then compile that with clang, but it doesn't make it fast. But even if that were the case, there aren't that many languages that have Rust performance so who cares? "Potentially" is sort of saying we might have a future language that's better, but of course anyone would agree.
> Also there is the alternative path to execute code via agent orchestration, just like low-code tooling works.
I don't understand how this is relevant.
> I see you never had the fortune to review code provided by cheap offshoring teams.
I just don't understand why you're bringing it up tbh I don't understand the relevance.
It doesn't need to win the benchmarks Olympics, it needs to be fast enough.
Plenty of AI based tooling is already trying out this path.
Agents execute actions that in the past would have required manually programmed applications; now tasks can be automated given a few MCP endpoints.
LLMs are already at the same output quality as lousy offshoring companies, thus having to fix a bit of it is something that unfortunately many of us are already used to with fellow humans.
I feel like maybe we're drifting here. You said this:
> Well, I am on the provocative side that as AI tooling matures current programming languages will slowly become irrelevant.
And I said I disagree because language directly impacts things like performance. And it does, massively. Like, order of magnitude differences are not hard to achieve simply by changing language.
You are now saying that things just need to be "fast enough", but I don't get how that's relevant. The point is that a different language will have different tradeoffs, and AI changes some of the calculus there, but language is still a major component of the produced artifact. If you agree that language has major implications on the produced artifact, then we agree. If you don't, then I'll just once again appeal to the massive performance gaps between different languages.
I still am not understanding the offshoring conversation.
> And I said I disagree because language directly impacts things like performance. And it does, massively. Like, order of magnitude differences are not hard to achieve simply by changing language.
Only because you focus too much on the frontend, instead of the whole compiler infrastructure, with multiple frontends for the same compilation pipeline.
> I still am not understanding the offshoring conversation.
Because you have never had to review human-written code from cheap offshoring teams; there's zero difference from LLM-generated code quality, even today.
If the offshore company provides me a Rust crate that compiles, that is already a lot of guarantee. Now that does not solve the logic issues and you still need testing.
But testing in Python is so easy to abuse as an LLM. It will create mocks upon mocks of classes and dynamically patch functions to get things going. It's hell to review.
What is a programming language used for if not the most formal specification possible? Of course it doesn't matter what language you use if you perfectly describe the behavior of the program. Of course, there's also no point in using LLMs (or outsourcing!) at that point.
TBH I don't know how to do that work. If I'm in the backend it's very easy for me. I can think about allocations, I can think about threading, concurrency, etc, so easily. In browser land I'm probably picking up some confusing framework, I don't have any of the straightforward ways to reason about performance at the language level, etc.
Maybe one day we can use wasm or whatever and I can write fast code for the frontend, but not today, and it's a bit unsurprising that others face similar issues.
Also, if I'm building a CLI, maybe I think that 1ms matters. But someone browsing my webpage one time ever? That might matter a lot less to me, you're not "browsing in a hot loop".
Yes but it's not really fair to expect me to know how to do that. Just because I know how to do it for backend code, where it's often a lot easier to see those copies, doesn't mean I'm just a negligent asshole for not doing it on the frontend. I don't know how, it's a different skillset.
The parent commenter earlier seems to be implying that it's only a matter of not caring.
> care so little about the performance of the code they ship to browsers.
> but I'm curious to hear how do you know it for backend code but not frontend code.
Because I find backend languages extremely easy to reason about for performance. It seems to me that when I write in a language like rust I can largely "grep for allocations". I find that hard to see in javascript etc. This is doubly the case because frontend code seems to be extremely framework heavy and abstract, so it makes it very hard to reason about performance just by reading the code.
That's completely relatable, and also a major point in my original argument. Using heavily abstracted frameworks will automatically cap you performance-wise. The only way out is to not use a framework, or to use one that's known to be lightweight. In backend or tooling work, like the JS compiler from OP, one tends not to use heavy frameworks in the first place.
You think about allocations: JS is a garbage collected language and allocations are "cheap", so extremely common. GC is powerful and in most JS engines quite fast, but not omniscient, and sometimes needs a hand. (Just like reasoning with any GC language.) Of course the easiest intervention to allocations is to remove allocations entirely; just because it is cheap to over-allocate, and the GC will mostly smooth out the flaws with such approaches, doesn't mean you can ignore the memory complexity of the chosen algorithms. Most browser dev tools today have allocation profilers equal to or better than their backend cousins.
You think about threading, concurrency, etc: JS is even a little easier than many backend languages because it is (almost excessively) single-threaded. A lot of concurrency issues cannot exist in current JS designs unless you add in explicit IPC channels to explicitly "named" other threads (Service Workers and Web Workers). On the flipside, JS is a little harder to reason about threading than many backend languages because it is extensively cooperatively threaded. Code has to yield to other code frequently and regularly. Shaving milliseconds off a routine yields more time to other things that need to happen (browser events, user input, etc). That starts to add up. JS encourages you to do things in short, tight "bursts" rather than long-running algorithms. Here again, most browser dev tools today have strong stack trace/flame chart profilers that equal or exceed backend cousins. Often in JS "tall" flames are fine but "wide" flames are things to avoid/try to improve. (That's a bit reversed from some backend languages where shallow is overall less overhead and long-running tasks are sometimes better amortized than lots of short ones.)
> But someone browsing my webpage one time ever? That might matter a lot less to me, you're not "browsing in a hot loop".
The heavily event-driven architecture of the browser often means that just sitting on a webpage is "browsing in a hot loop". Browsers have gotten better and better at sleeping inactive tabs and multi-threading tabs to not interfere with each other, but things are still a bit of a "tragedy of the commons" that the average performance of a website still directly and indirectly drags everyone else down. It might not matter to you that your webpage is slow because you only expect a user to visit it once, but you also aren't taking into account that is probably not the only website that user is browsing at that moment. Smart users do directly and indirectly notice when the bad performance of one webpage impacts their experiences of other web pages or crashes their browser. Depending on your business model and what the purpose of that webpage is for, that can be a bad impression that leads to things like lost sales/customers.
I don't think it's the same tbh. In Rust I can often just `rg '\.clone'` and immediately see wins. Allocations are far easier to track statically. I don't have a good sense for "seeing" allocations when I look at JS, it feels like it's unfair to expect me to have that tbh. As for profilers, yes I could see things like "this code is allocating a lot" but JS hardly feels like a language where it's smooth to then fix that, and again, frameworks are so common that I doubt I'd be in a position to do so. This is really in contrast to systems languages again where I also have profilers but fixing the problem is often trivial.
> You think about threading, concurrency, etc: JS is even a little easier than many backend languages because it is (almost excessively) single-threaded. A lot of concurrency issues cannot exist in current JS designs unless you add in explicit IPC channels to explicitly "named" other threads (Service Workers and Web Workers).
My issue isn't with being able to write concurrent code that has no bugs, my issue is having access to primitives where I have tight control over concurrency and parallelism. The primitives in JS do not provide that control and are often very heavy in and of themselves.
I think it's perhaps worth noting that I am not saying "it's impossible to write fast code for the browser", I'm saying it is not surprising that people who have developed skillsets for optimizing backend code in languages designed to be fast are not in a great position to do the same for a website.
> I don't have a good sense for "seeing" allocations when I look at JS, it feels like it's unfair to expect me to have that tbh.
I still think that's a training/familiarity problem more than a language issue? You can just as easily start with `rg '\bnew\b'` as you can `rg '\.clone'`. The `new` operator is a useful thing to start with in both C++ and C#, too. (Even though JS's `new` is technically a different operator than both C++'s and C#'s.) After that, the JSON syntax is a decent start. Something like `rg '\{\s*"'` and `rg '\['` are places to start. Curly brackets and square brackets in "data position" are useful in Python and now some of C#, too.
After that the next biggest culprits are common library things like `.filter()` and `.map()` which JS defaults to reified/eager versions for historic reasons. (There are now lazier versions, but migrating to them will take time.) That sort of library allocations knowledge is mostly just enough familiarity with standard library, a need that remains universal in any language.
> JS hardly feels like a language where it's smooth to then fix that
Again, perhaps this is just a familiarity issue, but having done plenty of both, at the end of the day I still see this process as the same: move allocations out of tight loops, use object pools if necessary, examine the O-Notation/Omega-Notation of an algorithm for its space requirements and evaluate alternatives with better mean or worst cases, etc. It mostly doesn't matter what language I'm working in the basics and fundamentals are the same. Everything is as "smooth" as you feel comfortable refactoring code or switching to alternate algorithm implementations.
> frameworks are so common that I doubt I'd be in a position to do so
Do you treat all your backend library dependencies as black boxes as well?
Even if that is the case and you want to avoid profiling your framework dependencies themselves and simply hope someone else is doing that, there's still so much in your control.
I find JS is one of the few languages where you can somewhat transparently profile even all of your dependencies. Most JS dependencies are distributed as JS source and you generally don't have missing symbol files or pre-compiled binary bricks that are inscrutable to inspection. (WASM is changing that, for the worse, but so far there are very few WASM-only frameworks and most of them have other debugging and profiling tools.)
I can choose which frameworks to use based on how their profiler results look. (I can tell you that I don't particularly like Angular and one of the reasons why is I've caught it with truly abysmal profiles more than once, where I could prove the allocations or the CPU clock time were entirely framework code and not my app's business logic.)
I've used profilers to guide building my own "frameworks" and help proven "Vanilla" approaches to other developers over frameworks in use.
> The primitives in JS do not provide that control and are often very heavy in and of themselves.
Maybe I'm missing what primitives you are looking for. async/await is about the same primitive in JS and Rust and there are very similar higher-level tools on top of them. There's no concurrency/parallelism primitives today in JS because there is no allowed concurrency or parallelism. There are task scheduling primitives somewhat unique to JS for doing things like "fan out" akin to parallelism but relying on cooperative (single) threading. Examples include `requestAnimationFrame` and `requestIdleCallback` (for "this can wait until you next need to draw a frame, including if you need to drop frames" and "this can wait until things are idle" respectively).
> I'm saying it is not surprising that people who have developed skillsets for optimizing backend code in languages designed to be fast are not in a great position to do the same for a website.
I think I'm saying that it is surprising to me that people who have developed skillsets for optimizing backend code in languages designed to be fast seem to struggle applying the same skills to a language with simpler/"slower" mechanics, but also on average much higher transparency into dependencies (fuller top-to-bottom stack traces and metrics in profiles).
To be fair, I get the impulse to want to leave it as someone else's problem. But as a full stack developer who has done performance work in at least a half dozen languages, I feel like if you can profile and performance tune Rust you should be able to profile and performance tune JS. But maybe I've seen "too much of the Matrix" and my "it's all the same" comes from a deep generalist background that is hard for a specialist to appreciate.
> I still think that's a training/familiarity problem more than a language issue?
But that's fine. Even if we say it's a familiarity problem, that's fine. I'm only saying that it's not reasonable to expect my skills in optimizing backend code to somehow transfer. Obviously many things are the same - reducing allocation, improving algorithmic performance, etc. But that looks very different when you go from the backend to the frontend because the languages can look very different.
> You can just as easily start with `rg \bnew\b` as you can `rg \.clone`.
That's not true though. In Rust you have to have a `clone` somewhere if you're allocating on the heap, or one of the pointer types via something like `Box::new`. If I pass a struct around it's either cheaply moveable (ie: Copy) or I have to `clone` it. Granted, many APIs will clone "invisibly" within them, but I can always grep to find the clone.
In Javascript, things seem to allocate by default. A new object allocates. A closure allocates. Things are very implicit, you sort of are in an "allocates by default" mode with js, it seems. In Rust I can just do `[u8; n]` or whatever if I want to, I can just do `let x = "foo"` for a static string, or `let y = 5;` etc. I don't really have to question the memory layout much.
Regardless, you can just learn those rules, of course, but you have to learn them. It seems much easier to "trip onto" an allocation, so to speak, in js.
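What I mean by "allocation is visible in the source", in miniature: heap allocations in Rust tend to announce themselves (`String::from`, `.clone()`, `Box::new`, `vec![]`), while the stack-only cases don't need any of them.

```rust
// Stack-only: no heap allocation anywhere in this function.
fn stack_only() -> u64 {
    let bytes: [u8; 4] = [1, 2, 3, 4]; // fixed-size array, lives on the stack
    let label: &'static str = "foo";   // static string, no runtime allocation
    bytes.iter().map(|&b| b as u64).sum::<u64>() + label.len() as u64
}

// The heap allocation is spelled out in the source: you can literally
// grep for `.to_string`, `String::from`, `.clone`, `Box::new`, `vec!`.
fn needs_heap(name: &str) -> String {
    name.to_string()
}
```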
> Again, perhaps this is just a familiarity issue
I largely agree, though I think that js does a lot more allocation in its natural syntax.
> Do you treat all your backend library dependencies as black boxes as well?
No, but I don't really use frameworks in backend languages much. The heaviest dependency I use is almost always the HTTP library, which is reliably quite optimized. Frameworks impose patterns on how code is structured, which, to me, makes it much harder to reason about performance. I now have to learn the details of the framework. Perhaps the only thing close to this in Rust would be tokio.
> I've used profilers to guide building my own "frameworks" and help proven "Vanilla" approaches to other developers over frameworks in use.
I suspect that this is merely an issue of my own biased experience where I have inherited codebases with javascript that are already using frameworks.
> Maybe I'm missing what primitives you are looking for. async/await is about the same primitive in JS and Rust and there are very similar higher-level tools on top of them.
I mean, stack allocation feels like a pretty obvious one, reasoning about mutability, control over locking, the ability to `join` two futures or manage their polling myself, access to operating system threads, access to atomics, access to mutexes, access to pointers, etc. These just aren't available in javascript. async/await in js is only superficially similar to Rust.
I mean, a simple example is that I recently switched to CompactString and foldhash in Rust for a significant optimization. I used Arc to avoid expensive `.clone` calls. I preallocated vectors and reused them, I moved other work to threads, etc. I feel really comfy doing this in Rust where all of this is sort of just... first class? Like, it's not "weird" rust to do any of this. I don't have to really avoid much in the language, it's not like js where I'd have to be like "Okay, I can't write {a: 5} here because it would allocate" or something. I feel like that shouldn't be too contentious? Surely one must learn how to avoid much of javascript if they want to learn how to avoid allocations.
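The Arc part of that, as a std-only sketch (CompactString and foldhash are third-party crates, so leaving them out here): an `Arc<str>` clone just bumps a reference count instead of copying the string's bytes, so a value shared across many records stops costing a deep copy per use.

```rust
use std::sync::Arc;

// Hand out n cheap handles to one shared, immutable string.
// Each Arc::clone is a refcount increment, not a byte copy.
fn share(label: &str, n: usize) -> Vec<Arc<str>> {
    let shared: Arc<str> = Arc::from(label);
    (0..n).map(|_| Arc::clone(&shared)).collect()
}
```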
> To be fair, I get the impulse to want to leave it as someone else's problem.
I just reject that framing. People focus on what they focus on. Optimizing their website is not necessarily their interest.
> I feel like if you can profile and performance tune Rust you should be able to profile and performance tune JS.
I probably could but it's definitely not going to feel like second nature to me and I suspect I'd really feel like I'm fighting the language. I mean, seriously, I'd be curious, how do you deal with the fact that you can't stack allocate? I can spawn a thread in Rust and share a pointer back to the parent stack, that just seems very hard to do in javascript if not outright impossible?
> I think I'm saying that it is surprising to me that people who have developed skillsets for optimizing backend code in languages designed to be fast seem to struggle applying the same skills to a language with simpler/"slower" mechanics
Yeah I don't really see it tbh. I mean even if you say "I can do it", that's great, but how is it surprising?
> I probably could but it's definitely not going to feel like second nature to me and I suspect I'd really feel like I'm fighting the language. I mean, seriously, I'd be curious, how do you deal with the fact that you can't stack allocate? I can spawn a thread in Rust and share a pointer back to the parent stack, that just seems very hard to do in javascript if not outright impossible?
I had alluded to it before, but this is maybe where some additional experience with other garbage collected backend languages like C# or Java could help build some "muscle memory" here.
The typical lens in a GC-based language is value types versus reference types. Value types are generally stack allocated and pass-by-value (copy-by-value; copied from stack frame to stack frame when passed). Reference types are usually heap allocated and pass-by-reference. A reference is generally a "fat pointer", with the qualification that you generally can't dereference one like a pointer without complex GC locks, because the GC reserves the right to move the objects pointed to by references (for instance due to compaction, but also due to things like promotion to another heap). References themselves generally follow the same pass-by-value rules (stack allocated and copied).
(The lines are often blurry hence "generally" and "usually": a GC language may choose to allocate particularly large value types on the heap and apply copy-on-write semantics in a way to meet the pass-by-value semantics. A GC language is also free to stack allocate small reference types that it believes won't escape a particular part of the stack. I bring up these edge cases not to suggest complexity but to remind that profile-guided optimization is often the best strategy in any language because any good compiler, even a JIT compiler, is trying to optimize what it can.)
In JS, the breakdown is generally that your value types are string, number, and boolean, and your reference types are object, array, and function. `const a = 12` is a static, stack allocated number. `const x = 'foo'` is a static, stack allocated string. It will get copied if you pass it anywhere. Though there's one more optimization here that most GC languages use (going all the way back to early Lisp) called "string interning". Strings are always treated as immutable and essentially copy-on-write. Common strings, and strings passed to a large number of stack frames, get "interned" to shared memory (sometimes the heap; sometimes even just reusing the memory of their first compiled instance in the compiled binary). But because copy-on-write is so easy to trigger, and those copies often start stack allocated, strings are still considered value types, even though with "interning" they sometimes exhibit reference-like behavior and are sort of the "border type".
One thing to look out for: `+` or `+=` where one of the sides is a string can be a huge source of allocation due to copying string bytes alone, which is easy to anticipate once you know to expect it.
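As a minimal sketch of that trade-off (function names here are my own, and modern engines partially mitigate `+=` with rope-like string representations, so profile before rewriting):

```javascript
// Building a large string with += can copy the accumulated bytes on
// each iteration; collecting parts and joining once allocates far less.
function buildWithConcat(n) {
  let s = "";
  for (let i = 0; i < n; i++) s += "x"; // each += may copy s's bytes
  return s;
}

function buildWithJoin(n) {
  const parts = [];
  for (let i = 0; i < n; i++) parts.push("x");
  return parts.join(""); // one final allocation for the result
}
```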
On the reference type side `let x = {a: 5}; let y = x`, the `{a: 5}` part is an object and does allocate to the heap (probably, modulo again things like escape detection by the JIT compiler), but `x` and `y` themselves are stack allocated references. That `let y = x` is only a reference copy.
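Spelled out as a tiny example of those copy semantics:

```javascript
// Value types copy on assignment; reference types copy only the reference.
let a = 5;
let b = a; // b gets its own copy of the number
b += 1;    // a is unaffected

let x = { a: 5 }; // the object itself is (probably) heap allocated
let y = x;        // only the stack-allocated reference is copied
y.a = 6;          // visible through x too: both reference one shared object
```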
> it's not like js where I'd have to be like "Okay, I can't write {a: 5} here because it would allocate" or something. I feel like that shouldn't be too contentious? Surely one must learn how to avoid much of javascript if they want to learn how to avoid allocations.
Generally, it's not about "avoiding" the easy language constructions because they allocate, it is balancing the trade-offs of when you want to allocate and how much.
Just like you might preallocate a vector before a tight loop, you might preallocate an array or an object, or even an object pool. (Build an array of objects, with a "free" counter, borrow them, mutate them, return them to the "free" section when done.)
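A minimal sketch of such a pool (the `Pool` class and its shape are my own illustration, not a standard API):

```javascript
// A tiny object pool: preallocate objects once, hand them out,
// and return them instead of allocating per use.
class Pool {
  constructor(size, factory) {
    this.items = Array.from({ length: size }, factory);
    this.free = size; // items[0..free) are available
  }
  borrow() {
    if (this.free === 0) throw new Error("pool exhausted");
    return this.items[--this.free];
  }
  release(item) {
    this.items[this.free++] = item;
  }
}

const pool = new Pool(4, () => ({ x: 0, y: 0 }));
const p = pool.borrow();
p.x = 10;        // mutate the borrowed object in place
pool.release(p); // no allocation happened on this round trip
```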
But some of that is trade-offs: preallocation is sometimes harder to read and reason about. On the other side, the "over-allocation" you are worried about might be caught entirely by the JIT's escape analysis and compiled out. For almost all languages it is best to let a profile or real data guide what to try to optimize (premature optimization is rarely a good idea), but for a GC language it can be crucial. Not because the GC language is more complicated or "magic" or "mysterious", but simply because a GC language is tuned for a lot of auto-optimizations that a manually managed memory language doesn't necessarily get "for free". The trade-off for references being much more opaque boxes than pointers is that a JIT compiler has more optimization options, because it can assume pointer math is off the table. More often than not, where an allocation lives is a matter between the JIT and the GC, and there are some simple optimization answers such as "the JIT stack allocated that because it doesn't escape this method". It shouldn't feel like a surprise when such things happen, when you get such benefits "for free". The JIT and GC are still maintaining the value-type or reference-type semantics at all times; those are just (intentionally) big, easy "traits" with a lot of useful middle ground and a lot of variation across implementations.
> stack allocation feels like a pretty obvious one, reasoning about mutability, access to pointers
A lot of the above should be a decent starting place for learning those tools. `let` versus `const` is maybe the one remaining JS piece not explicitly dived into here.
References are generally "pointer enough" for most work. The JS GC doesn't have a way to manually lock a reference to dereference it for pointer math today, but that doesn't mean it never will. Parts of WASM GC are applicable here, but mostly restricted to shared array buffers (blocks of bytes).
In other GC languages, C# has been exploring a space for GC-safe, stack allocated pointers to blocks of memory that support (range checked) pointer-like math, called Span&lt;T&gt; and Memory&lt;T&gt;. It's roughly equivalent to Rust's borrowed slice (`&[T]`) mechanics, but subtly different, as you would expect for something existing in a larger GC environment. As that approach has become very successful in C#, I am starting to expect variations of it in more GC languages in the next few years.
> control over locking, access to atomics, access to mutexes
For the most part JS is single threaded, stack data is copied (value types), and with only one thread touching them, reference types are effectively locked for "free". So locks aren't important for most JS work and there's not much to control.
If you start to share memory buffers from JS to a Service/Web Worker or to a WASM process you may need to do more manual locks. The big family of tools for that is the Atomics global object: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
But a lot of that is new and rare in JS today.
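A minimal sketch of the Atomics API on a SharedArrayBuffer (in a real program the buffer would be posted to a Worker; here everything stays on one thread just to show the calls):

```javascript
// SharedArrayBuffer + Atomics: the memory can be shared with a worker,
// and Atomics makes individual reads/writes indivisible across threads.
const sab = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const counter = new Int32Array(sab);

Atomics.add(counter, 0, 1);            // atomic increment of slot 0
const seen = Atomics.load(counter, 0); // atomic read
// CAS-style update: if slot 0 still holds 1, replace it with 42;
// returns the old value either way.
const prev = Atomics.compareExchange(counter, 0, 1, 42);
```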
> the ability to `join` two futures
`Promise.all` and `Promise.any` are the two most common "standard library" combinators. `Promise.all` is the most like Rust `join`.
There are also libraries with even higher-level combinators.
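For instance (`demo` is my own illustrative function):

```javascript
// Promise.all resolves when every promise resolves (closest to Rust's join);
// Promise.any resolves with the first *fulfilled* promise, skipping rejections.
async function demo() {
  const [a, b] = await Promise.all([
    Promise.resolve(1),
    new Promise((resolve) => setTimeout(() => resolve(2), 10)),
  ]);
  const first = await Promise.any([
    Promise.reject(new Error("nope")),
    Promise.resolve("winner"),
  ]);
  return { a, b, first };
}
```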
> manage their polling myself
Promises don't poll. JS lives in a browser-owned event loop. Superficially you are in a browser-provided "tokio"-like runtime at all times.
There are some "low-level" tricks you can pull, though, in that the Promise abstraction is especially thin compared to Rust Futures. The entire "trait" that async/await syntax abstracts is just the "thenable pattern" in JS. All you need to make a new non-Promise Promise-like is an object that supports `.then(callback)` (optionally with a second parameter for an error callback and/or a `.catch(callback)`). Though the Promise constructor is also powerful enough that you generally don't need to make your own thenable; just implement your logic in the closure you provide to the Promise constructor.
Similarly, on the flip side, if you need a more complex combinator than Promise.all (which is the reason some higher-level libraries exist), you just have to build the right callbacks to `.then()` and coordinate what you need.
It's generally recommended to stick with things like Promise.all, but low level tricks exist.
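A minimal sketch of the thenable pattern (`thenable` and `useThenable` are my own illustrative names):

```javascript
// await works on any "thenable": an object with a .then(resolve, reject)
// method. The runtime adopts it exactly as if it were a real Promise.
const thenable = {
  then(resolve, _reject) {
    // pretend some async work finished with the value 42
    setTimeout(() => resolve(42), 0);
  },
};

async function useThenable() {
  return (await thenable) + 1; // await invokes thenable.then(...)
}
```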
> I mean even if you say "I can do it", that's great, but how is it surprising?
I think what continues to surprise me is that it sometimes reads like a lack of curiosity for other languages and for the commonalities between languages. Any GC language is built on the same exact kind of building blocks as "lower level" languages. There is a learning curve involved in reasoning about a GC language, but I don't think it should seem like a steep one. The vocabulary has strong overlaps: value types and stack allocated; reference types and heap allocated; references and pointers. The intuitions of one often benefit the other ("this is a reference type, can I simplify what I need from it inside this loop to a value type or two to keep it stack allocated or would it make more sense to preallocate a pool of them?"). Just because you don't have access to the exact same kinds of low level tools doesn't mean that they don't exist or that you can't learn how to take what you would do with the low level tools and apply them in the higher level space. (Plus tools like C#'s Span<T> and Memory<T> work where the low level tools themselves are also starting to blur more together than ever before.)
It just takes a little bit of curiosity, I think, to ask that next question of "how does a GC language stack allocate?" and allowing that to lead you to more of the vocabulary. Hopefully, I've done an okay job in this post illustrating that.