I really don't get that. I look at pictures I've taken in a digital world and I'm moved, just as I am when I see pre-digital pictures. Perhaps older images are sometimes "more special", but that's an artifact of the distance between who I was then versus now. Why would I stop feeling an emotional attachment to photos just because I have many? I really cannot understand this at all.
That is presumably a very expensive endeavor. We already have hardware that attempts to mitigate this and while I think it's possible for the government it's certainly not trivial.
This is very stupid. No one wants this. People don't like false sincerity. Even when we know that it's someone's job to be nice, we appreciate when it feels genuine.
If you want people to genuinely be nice, give them reasons. Make them happy. Help them stay motivated. Otherwise you cheapen "please" and "thank you" even more than is already the case, and get zero value out of it, because no one will appreciate it knowing that it's forced.
A world where everyone says "please" and "thank you" isn't a better world.
I am always kind of surprised when I go to a landing page for a language and there isn't any actual code. This is one of my biggest complaints about the Rust language page: it feels crazy to me that there's no code, and I think it's just a ridiculous choice (I know this has been brought up before).
The old page had a built-in sandbox. Go used to have a more front-and-center sandbox too, but at least it's still there if you scroll down: https://go.dev/
marketing isn't concerned, it's an experiment in programming languages. the attention of someone who needs to eyeball the syntax before they understand how to read it has zero value to a project like this.
i'm really not trying to be snarky or anything, but right at the top of the om page it describes the language as concatenative and homoiconic. without searching that or asking an llm, do you know what those terms mean? or what fold is?
could be there's nothing wrong with the page and you're really just not the audience for it. hacker news has many currents, most of which don't interest me, and that's fine, i don't feel the need to weigh in on everything.
As is clearly explained on the web page, this is not a programming language for everyday tasks; it's an early-stage proof of concept that can be used to explore how computer science might be expressed in unusual ways.
Implementing fold would be something of a milestone in such a language.
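For anyone who hasn't met the term: fold (also called reduce) is the classic higher-order function that collapses a sequence into a single value by repeatedly applying a binary function to an accumulator. In a conventional language, as opposed to a concatenative one, it looks roughly like this:

```python
from functools import reduce

# fold/reduce: walk the list, combining each element into an
# accumulator with a binary function; start from an initial value.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4], 0)
print(total)  # 10
```

The reason it's a milestone in an experimental language is that it exercises higher-order functions, recursion (or iteration), and an accumulator all at once.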
> refusing to accept writes where content has changed between the read and write?
Right. You can issue a write that will only be accepted if a condition is matched, like the etag of the object matching your expectation. If it doesn't match, your object was invalidated.
This is the hardest part, because you can easily end up in a situation like the one you're describing, or with large portions of your clients talking to the server just to have their writes rejected.
Further, this system (as described) scales best when writes are colocated, since that maximizes throughput via buffering. So even just by adding a second writer you cut your throughput roughly in half, even if one of them is basically dead.
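A minimal sketch of the conditional-write pattern being described, using an in-memory stand-in for the object store (a real store, e.g. S3 with `If-Match`, performs the same etag check server-side; the class and method names here are made up):

```python
import uuid

# Toy object store with compare-and-swap writes keyed on etags.
class Store:
    def __init__(self):
        self.objects = {}  # key -> (etag, value)

    def put_if_match(self, key, value, expected_etag):
        current = self.objects.get(key)
        if current is not None and current[0] != expected_etag:
            return None  # rejected: object changed since we read it
        new_etag = uuid.uuid4().hex
        self.objects[key] = (new_etag, value)
        return new_etag

store = Store()
etag = store.put_if_match("manifest", b"v1", expected_etag=None)

# A concurrent writer lands a write first, invalidating our etag:
other_etag = store.put_if_match("manifest", b"v2", expected_etag=etag)
assert other_etag is not None

# Our write, still based on the stale etag, is rejected:
assert store.put_if_match("manifest", b"v3", expected_etag=etag) is None
```

On rejection the client has to re-read, re-apply its change, and retry, which is exactly where the "lots of clients flushing just to be rejected" failure mode comes from.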
If you split things up, you can just merge manifests on conflict, since different writers would be writing to different files and the manifest is just an index; or you can do multiple manifests plus compaction. DeltaLake does the latter: you end up with a bunch of `0000.json`, `0001.json`, and so on, and to reconstruct the full index you read all of them. You still have conflicts on allocating the next json file, but that's it, no wasted flushing, and you can merge as you please. This all gets very complex at this stage, I think; compaction becomes the "one writer only" bit, but you can serve reads and writes without compaction.
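The log-replay idea can be sketched in a few lines. This assumes a made-up manifest schema (`add`/`remove` lists), not Delta's actual format, but the mechanism is the same: the full index is the ordered replay of every numbered manifest.

```python
import json

# Each writer appends a numbered manifest file; the live file set is
# reconstructed by replaying them in commit order.
manifests = {
    "0000.json": json.dumps({"add": ["part-a.parquet"]}),
    "0001.json": json.dumps({"add": ["part-b.parquet"],
                             "remove": ["part-a.parquet"]}),
}

def reconstruct(manifests):
    live = set()
    for name in sorted(manifests):  # replay in commit order
        entry = json.loads(manifests[name])
        live |= set(entry.get("add", []))
        live -= set(entry.get("remove", []))
    return live

print(reconstruct(manifests))  # {'part-b.parquet'}
```

Compaction then just collapses the replayed result back into a single manifest so readers don't have to touch every file.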
Currently there's nothing in my headers, but the footer is straightforward. There's the schema, row group metadata, some statistics, byte offsets for each column in a group, page index, etc. It's everything you'd want if you wanted to reject a query outright or, if necessary, query extremely efficiently.
min/max stats for a column are a huge win because I pre-encode any low-cardinality strings into integers. This means I can skip entire row groups without ever touching S3, just with that footer information, and if I don't have it cached I can read it and skip decoding anything that doesn't have my data.
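The pruning step is simple once the stats are in hand. A sketch, with made-up footer fields, of skipping row groups whose min/max range can't contain the (integer-encoded) value being searched for:

```python
# Footer carries per-row-group byte offsets and min/max stats for the
# column; low-cardinality strings were already encoded as integers.
footer = [
    {"offset": 0,       "min": 3,  "max": 17},  # row group 0
    {"offset": 1 << 20, "min": 42, "max": 97},  # row group 1
]

def groups_to_read(footer, wanted):
    # Keep only groups whose [min, max] range could contain `wanted`;
    # everything else is skipped without any fetch.
    return [g["offset"] for g in footer if g["min"] <= wanted <= g["max"]]

print(groups_to_read(footer, 50))  # [1048576] -> only row group 1
```

With the footer cached, this decision happens entirely in memory before a single byte of column data is requested.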
Footers can get quite large in absolute terms - 10s-100s of KB for a very large file. But that's obviously tiny compared to a multi-GB Parquet file, and the data compresses extremely well for a second- or third-tier cache. You can store 1000s of these pre-parsed in memory no problem, and store 10s of thousands more on disk.
I've spent 0 time optimizing my footers so far. They could get smaller, I assume, but I haven't put much thought into it. In fact, I don't have to assume: I know my own custom metadata overlaps with the existing parquet stats, and I just haven't bothered to deal with it. TBH there are a bunch of layout optimizations I've yet to explore; using headers, for example, would obviously have some benefits (streaming), whereas right now I do a sort of "attempt to grab the footer from the end in chunks until we find it lol". But it doesn't come up because... caching. And there are worse things than a few spurious RANGE requests.
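The "grab the footer from the end in chunks" approach can be sketched with an in-memory file; the magic marker and layout here are invented for illustration, but the shape is the same as a sequence of RANGE requests growing backwards from the tail:

```python
import io

MAGIC = b"FTR1"  # made-up footer marker

def read_footer(f, chunk=16):
    # Read backwards from the end in fixed-size chunks until the
    # footer marker shows up, then return everything after it.
    f.seek(0, io.SEEK_END)
    size = f.tell()
    buf = b""
    while len(buf) < size:
        take = min(chunk, size - len(buf))
        f.seek(size - len(buf) - take)
        buf = f.read(take) + buf
        idx = buf.find(MAGIC)
        if idx != -1:
            return buf[idx + len(MAGIC):]
    raise ValueError("no footer found")

f = io.BytesIO(b"...data..." + MAGIC + b'{"schema": "..."}')
print(read_footer(f))  # b'{"schema": "..."}'
```

Against object storage each backward step is one RANGE request, which is exactly why a couple of spurious ones don't matter once the parsed footer is cached.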
I haven't. I'm sort of aware of it, but I guess I prefer to just have tight control over the protocol/data layout. It's not that hard and it gives me a ton of room to make niche optimizations. I doubt I'd get the same performance if I used it, but I could be wrong. Usually, the more you can push your use case into the protocol, the better.
Like most managed services it is a trade-off of control vs ease of operation. And like everything with S3 it scales to absurd levels, with 10,000 tables per table bucket.
You can, and it's actually great if you store little "headers" etc to tell you those offsets. Their design doesn't seem super amenable to it because it appears to be one file, but this is why a system that actually intends to scale would break things up. You then cache these headers and, on cache hit, you know "the thing I want is in that chunk of the file, grab it". Throw in bloom filters and now you have a query engine.
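The "headers + bloom filters = query engine" point can be sketched concretely. This is an assumption-laden toy (the header layout, hash scheme, and sizing are all made up): each cached header entry maps a byte range to a bloom filter over the keys in that chunk, so a lookup only fetches ranges that might match.

```python
import hashlib

class Bloom:
    """Tiny bloom filter: no false negatives, rare false positives."""
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.field |= 1 << p

    def might_contain(self, key):
        return all(self.field >> p & 1 for p in self._positions(key))

# Cached "header": one entry per chunk, byte offset + filter over keys.
chunks = []
for offset, keys in [(0, ["alice", "bob"]), (4096, ["carol"])]:
    b = Bloom()
    for k in keys:
        b.add(k)
    chunks.append({"offset": offset, "bloom": b})

# On lookup, issue RANGE requests only for chunks that might match:
to_fetch = [c["offset"] for c in chunks
            if c["bloom"].might_contain("carol")]
print(to_fetch)
```

Because bloom filters never produce false negatives, the chunk that actually holds the key is always fetched; false positives just cost an occasional extra read.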