Hacker News | staticassertion's comments

I really don't get that. I look at pictures I've taken in a digital world and I'm moved, just as I am when I see pre-digital pictures. Perhaps older images are sometimes "more special" but that's an artifact of the distance between who I was then vs now. Why would I stop feeling an emotional attachment to photos just because I have many? I really can not understand this at all.

That is presumably a very expensive endeavor. We already have hardware that attempts to mitigate this and while I think it's possible for the government it's certainly not trivial.

These somethings are doing so much work I can't tell if you're agreeing with them or not tbh.

This is very stupid. No one wants this. People don't like false sincerity. Even when we know that it's someone's job to be nice, we appreciate when it feels genuine.

If you want people to genuinely be nice, give them reasons. Make them happy. Help them stay motivated. Otherwise you cheapen "please" and "thank you" even more than is already the case and get zero value out of it because no one will appreciate it knowing that it's forced.

A world where everyone says "please" and "thank you" isn't a better world.


> People don't like false sincerity.

Sad thing is, probably BK reckon its average customer is dumb enough to mistake it for genuine.

Sadder thing is, probably they're correct.

And if not, well, this latest move will help make it so.


I am always kind of surprised when I go to a landing page for a language and there isn't any actual code. This is one of my biggest complaints about the Rust language page: it feels crazy to me that there's no code, and I think this is just a ridiculous choice (and I know this has been brought up before).

The old page had a built-in sandbox. Go used to have a more "Front and center" sandbox too but at least it's there if you scroll down https://go.dev/


> I am always kind of surprised when I go to a landing page for a language and there isn't any actual code.

So, you're not surprised that this Om page has an extensive section called "Examples", right? https://www.om-language.com/#language__examples__


I didn't scroll that far, and I shouldn't have to.

One time, this annoyed me so much that I made a website.

https://anaminus.github.io/langding/

om would fall under "Yes, must scroll".


Fascinating! It almost seems like the more popular a language is the less likely it is to have syntax on the landing page.

Popular languages don’t have to sell themselves anymore. No one goes to Rust's or Python's website to see if they would enjoy the syntax.

There is code. Small examples start halfway down the page, and there's one 20-line example. Not much, but it's not accurate to say there's none.

It would be helpful to see any kind of motivation for the project though. Anything at all.


On my phone that code is about 250+ lines down, probably 4-5 screens down.

It basically doesn't exist as far as marketing is concerned.


marketing isn't concerned, it's an experiment in programming languages. the attention of someone who needs to eyeball the syntax before they understand how to read it has zero value to a project like this.

So it just needs a TOC.

No, it needs a 5 line code snippet above the fold.

splashing code examples at the audience encourages superficial assessments of the language.

There is code, search for 'examples'.

It concludes by implementing a fold:

   define
   {
       [Fold]<- {
           rearrange
           {
               rearrange
               {
                   dequote
                   choose
                   quote Result
                   pair pair pair {[Fold]<-} Function Result Remainder
                   Remainder
               }
               {Result Remainder}
               dequote Function Base <-[terms] Source
           }
           {Function Base Source}
       }
   }
   {
       [Fold]<- {[literal]<-} {} {1 2 3}
   }

great example! as someone who writes a Fold function every day, this explains the power of the language very well. ;)

i'm really not trying to be snarky or anything, but right at the top of the om page it describes the language as concatenative and homoiconic. without searching that or asking an llm, do you know what those terms mean? or what fold is?

could be there's nothing wrong with the page and you're really just not the audience for it. hacker news has many currents, most of which don't interest me, and that's fine, i don't feel the need to weigh in on everything.


As is clearly explained on the web page, this is not a programming language for everyday tasks, it's an early stage proof of concept that can be used to explore how computer science might be expressed in unusual ways.

Implementing fold would be something of a milestone in such a language.


> refusing to accept writes where content has changed between the read and write?

Right. You can issue a write that will only be accepted if a condition is matched, like the etag of the object matching your expectation. If it doesn't match, your object was invalidated.
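
A minimal sketch of that compare-and-swap pattern, using a toy in-memory store rather than a real S3 client. `InMemoryStore`, `put_if_match` and `PreconditionFailed` are hypothetical names for illustration; real object stores expose the same idea through conditional request headers such as `If-Match`.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when the caller's expected ETag no longer matches the stored object."""


class InMemoryStore:
    """Toy stand-in for an object store; not a real S3 client."""

    def __init__(self):
        self._objects = {}  # key -> (etag, data)

    def get(self, key):
        etag, data = self._objects[key]
        return etag, data

    def put_if_match(self, key, data, expected_etag):
        """Write only if the current ETag equals expected_etag.

        expected_etag=None means "the object must not exist yet".
        """
        current = self._objects.get(key)
        current_etag = current[0] if current else None
        if current_etag != expected_etag:
            raise PreconditionFailed(key)
        new_etag = hashlib.sha256(data).hexdigest()
        self._objects[key] = (new_etag, data)
        return new_etag


def demo_conditional_write():
    store = InMemoryStore()
    v1 = store.put_if_match("manifest", b"v1", expected_etag=None)
    # Writers A and B both read v1. A writes first, which invalidates B's view.
    store.put_if_match("manifest", b"v2-from-A", expected_etag=v1)
    try:
        store.put_if_match("manifest", b"v2-from-B", expected_etag=v1)
        return "accepted"
    except PreconditionFailed:
        return "rejected"
```

Writer B has to re-read the object and retry on top of A's version, which is exactly the "your object was invalidated" case.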


Doesn't using TrustedTypes basically do that? I'm not really web-y, someone please correct me if I'm off.

Yup, this is basically what TrustedTypes is for!

This is the hardest part because you can easily end up in a situation like you're describing, or having large portions of clients talking to a server just to have their writes rejected.

Further, this system (as described) scales best when writes are colocated (since it maximizes throughput via buffering). So even just by having a second writer you cut your throughput in ~half if one of them is basically dead.

If you split things up you can just do "merge manifests on conflict" since different writers would be writing to different files and the manifest is just an index, or you can do multiple manifests + compaction. DeltaLake does the latter, so you end up with a bunch of `0000.json`, `0001.json` and to reconstruct the full index you read all of them. You still have conflicts on allocating the json file but that's it, no wasted flushing. And then you can merge as you please. This all gets very complex at this stage I think, compaction becomes the "one writer only" bit, but you can serve reads and writes without compaction.

https://doi.org/10.14778/3415478.3415560

Note that since this paper was published we have gotten S3 CAS.
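
A rough sketch of the "multiple manifests" idea, with the only conflict being who gets the next file number. This is an in-memory toy, not Delta Lake's actual API; `ManifestLog`, `commit`, and the `{"add": [...]}` layout are assumptions made up for illustration.

```python
import json


class ManifestLog:
    """Toy in-memory log of numbered manifest files (hypothetical)."""

    def __init__(self):
        self._files = {}  # name -> JSON text

    def _put_if_absent(self, name, body):
        """Create-only write: the single point where two writers can conflict."""
        if name in self._files:
            return False
        self._files[name] = body
        return True

    def commit(self, added_files, guess=0):
        """Append a manifest, starting at the writer's guess of the next free
        number and moving on whenever that slot is already taken."""
        n = guess
        while True:
            name = f"{n:04d}.json"
            if self._put_if_absent(name, json.dumps({"add": added_files})):
                return name
            n += 1

    def reconstruct_index(self):
        """Read every manifest in order and merge them into the full file list."""
        index = []
        for name in sorted(self._files):
            index.extend(json.loads(self._files[name])["add"])
        return index
```

Two writers that both guess slot 0 end up at `0000.json` and `0001.json`: the loser only retries the small manifest write, not the data flush. Compaction (collapsing the numbered files into one) is left out here.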

Alternatively, I guess just do what Kafka does or something like that?


S3 Select is, very sadly, deprecated. It also supported HTTP RANGE headers! But they've killed it and I'll never forgive them :)

Still, it's nbd. You can cache a billion Parquet header/footers on disk/ memory and get 90% of the performance (or better tbh).


Caching Parquet headers/footers sounds super interesting. Can you say more about how you implemented it?

Currently there's nothing in my headers, but the footer is straightforward. There's the schema, row group metadata, some statistics, byte offsets for each column in a group, page index, etc. It's everything you'd want if you wanted to reject a query outright or, if necessary, query extremely efficiently.

min/max stats for a column are huge because I pre-encode any low-cardinality strings into integers. This means I can skip entire row groups without ever touching S3, just with that footer information, and if I don't have it cached I can read it and skip decoding anything that doesn't have my data.
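
A simplified sketch of that pruning step. `RowGroupStats`, `SERVICE_CODES` and `row_groups_to_read` are hypothetical; a real footer carries far more than this, and the string-to-integer dictionary is assumed to be complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical, simplified view of one row group's footer entry."""
    byte_offset: int
    byte_length: int
    min_value: int
    max_value: int


# Hypothetical dictionary mapping low-cardinality strings to integers.
SERVICE_CODES = {"auth": 0, "billing": 1, "search": 2, "web": 3}


def row_groups_to_read(footer, wanted):
    """Return only the row groups whose [min, max] range could contain `wanted`."""
    code = SERVICE_CODES.get(wanted)
    if code is None:
        # Assumes the dictionary is complete: an unknown value cannot be in any file.
        return []
    return [rg for rg in footer if rg.min_value <= code <= rg.max_value]
```

Everything here runs against the cached footer, so a query that matches no row group never issues a request to S3 at all.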

Footers can get quite large in one sense - 10s-100s of KB for a very large file. But that's obviously tiny compared to a multi-GB Parquet file, and the data can compress extremely well for a second/ third tier cache. You can store 1000s of these pre-parsed in memory no problem, and store 10s of thousands more on disk.

I've spent 0 time optimizing my footers currently. They can get smaller than they are, I assume, but I've not put much thought into it. In fact, I don't have to assume, I know that my own custom metadata overlaps with the existing Parquet stats and I just haven't bothered to deal with it. TBH there are a bunch of layout optimizations I've yet to explore, like using headers, which would obviously have some benefits (streaming), whereas right now I do a sort of "attempt to grab the footer from the end in chunks until we find it lol". But it doesn't come up because... caching. And there are worse things than a few spurious RANGE requests.
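
A sketch of a footer fetch over range reads. This is a variation, not necessarily the chunked approach described above: it relies on a Parquet file ending with the footer, a 4-byte little-endian footer length, and the `PAR1` magic, so one speculative tail read plus at most one follow-up is enough. `read_footer`, `range_read` and `TAIL_GUESS` are hypothetical names.

```python
import struct

MAGIC = b"PAR1"
TAIL_GUESS = 64 * 1024  # hypothetical size of the first speculative read


def read_footer(range_read, file_size, tail_guess=TAIL_GUESS):
    """Return (raw footer bytes, number of range requests issued).

    range_read(start, end) must return bytes [start, end) of the object.
    """
    start = max(0, file_size - tail_guess)
    tail = range_read(start, file_size)
    requests = 1
    if len(tail) < 8 or tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    footer_start = file_size - 8 - footer_len
    if footer_start < start:
        # The guess was too small: fetch the missing prefix with one more request.
        tail = range_read(footer_start, start) + tail
        requests += 1
        start = footer_start
    return tail[footer_start - start : len(tail) - 8], requests
```

With a generous first guess the second request is the "spurious RANGE request" case, and once the footer is cached neither request happens again.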


Have you tried AWS S3 Tables, which is a managed Iceberg service?

I haven't. I'm sort of aware of it but I guess I prefer to just have tight control over the protocol/ data layout. It's not that hard and it gives me a ton of room to make niche optimizations. I doubt I'd get the same performance if I used it, but I could be wrong. Usually the more you can push your use case into the protocol the better.

Like most managed services, it is a trade-off of control vs ease of operation. And like everything with S3, it scales to absurd levels, with 10,000 tables per table bucket.

Makes sense and tbh there's a very good chance that I'd consider it if I were trying to stay more "standard" but I'd have to learn more.

Wow I didn't know that. To be fair now that S3 tables exists it is rather redundant.

You can, and it's actually great if you store little "headers" etc to tell you those offsets. Their design doesn't seem super amenable to it because it appears to be one file, but this is why a system that actually intends to scale would break things up. You then cache these headers and, on cache hit, you know "the thing I want is in that chunk of the file, grab it". Throw in bloom filters and now you have a query engine.

Works great for Parquet.
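
A toy sketch of the "cached headers plus bloom filters" idea. All names (`BloomFilter`, `build_header`, `ranges_for_key`) and the filter sizing are assumptions for illustration; a real system would tune the filter and persist the headers.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_header(chunks):
    """chunks: list of (offset, length, keys). Returns cacheable header entries."""
    header = []
    for offset, length, keys in chunks:
        bloom = BloomFilter()
        for key in keys:
            bloom.add(key)
        header.append((offset, length, bloom))
    return header


def ranges_for_key(header, key):
    """Byte ranges worth fetching for `key`; chunks the filter rules out are skipped."""
    return [(off, length) for off, length, bloom in header if bloom.might_contain(key)]
```

On a cache hit, `ranges_for_key` gives the byte ranges to request directly. A false positive only costs one wasted range read.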

