Author here. In the article I highlighted how easy it now is to translate OCaml line-by-line into other languages while preserving exact semantics. That was the cool news that deserved its own callout. But, as you noticed, I didn't spend any time talking about how translating OCaml helps solve the problem of managing types "all with slightly different variations".
Briefly: atd, Cap'n Proto, Protobuf and many others fill the role of representing the same datatype in different languages and transmitting values of that datatype across the wire and/or persisting them in storage. But I'm finding most programs need multiple variations of that core record datatype. For example, when you persist an OCaml/Java/etc record datatype to a database, you don't use the OCaml/Java/etc record datatype. You use the data types the database provides (VARCHAR, BLOB, etc). You can adopt an ORM solution, use an object database, or write a pair of type-conversion functions between OCaml/Java/etc and your database datatypes. That is one example of a "variation" that you must spend engineering time on.
I mentioned in the article that my user needed 5 different variations. For example, with the sqlite3 database they were using, the database table variation required writing a pair of datatype-conversion functions to and from the sqlite3 database. What if there were a way to write these datatype-conversion functions in one place (cough cough: OCaml with a line-by-line semantics-preserving feature) and have bindings to/from the database, the view layer, the controller, and all the other variations generated in the appropriate Java/Swift/SQL/etc languages? And what if we could deploy modifications to that datatype (ex. adding a database field) in the same uniform way we do all other minor variations to a datatype? That is the "drastic QOL (quality of life) improvement" win my users asked for.
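To make the "variation" cost concrete, here's a minimal sketch in Python (the record, field names, and table are all made up for illustration; the point carries over to OCaml/Java/etc): the hand-written conversion pair between an application record and the database's own types is exactly the code you'd like to write once and have generated per target.

```python
import sqlite3
from dataclasses import dataclass

# A hypothetical application record; the fields are illustrative.
@dataclass
class User:
    id: int
    name: str
    active: bool

# The "variation": a hand-written conversion pair between the application
# datatype and the database's types (INTEGER, TEXT, ...). sqlite3 has no
# boolean type, so `active` must be reshaped to an integer and back.
def user_to_row(u: User) -> tuple:
    return (u.id, u.name, 1 if u.active else 0)

def row_to_user(row: tuple) -> User:
    return User(id=row[0], name=row[1], active=bool(row[2]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", user_to_row(User(1, "ada", True)))
restored = row_to_user(conn.execute("SELECT id, name, active FROM users").fetchone())
print(restored)
```

Multiply this pair by every variation (view layer, controller, wire format, ...) and every field change, and the engineering cost becomes clear.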
That QOL win deserves its own article. Yes, I'll post it when done.
> And what if we could deploy modifications to that datatype (ex. adding a database field) in the same uniform way we do all other minor variations to a datatype?
If you mean propagating changes to a schema across all layers of the stack, that may not necessarily be a good idea. In my experience we want to keep the datatype definitions at different layers somewhat independent of each other, to allow them to evolve separately. For example, the database table could have a `more JSON` column to allow adding data that was not foreseen in the original table design; but the application's datatype would decode that JSON into specific fields, perhaps to show the user.
I'd consider that an extension of the "object database" approach mentioned above. As you say, the application needs to decode that JSON (and encode it, and deal with missing fields). And in any application of sufficient size you'd probably have to do this decode/encode in more than one place. That JSON encoding/decoding is a "variation" (more explicitly, a reshaping operation). The engineering cost of having variations strewn across your source code hasn't disappeared.
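A quick sketch of why the catch-all JSON column is still a variation (Python for illustration; the record and field names are hypothetical): the decode/encode pair and the missing-field handling are exactly the reshaping code under discussion.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical row shape with a catch-all `more` JSON column, as in the
# parent comment's schema. Fields not foreseen in the original table
# design land in `more` and must be decoded into typed fields.
@dataclass
class Profile:
    name: str
    nickname: Optional[str]  # lives inside the JSON blob, not a real column

def decode_row(name: str, more: str) -> Profile:
    extra = json.loads(more) if more else {}
    # Missing-field handling: absent keys become None.
    return Profile(name=name, nickname=extra.get("nickname"))

def encode_row(p: Profile) -> tuple:
    extra = {} if p.nickname is None else {"nickname": p.nickname}
    return (p.name, json.dumps(extra))

p = decode_row("ada", '{"nickname": "countess"}')
print(encode_row(p))
```

Every place that touches `more` repeats some version of this pair, which is the strewn-across-the-codebase cost.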
Using PageRank as OP suggests would create weird incentives: frequent comments or higher volume could rank a person higher than occasional comments further down the thread would. If OP is interested in the most influential commenters (people who write frequently and have a posse), then PR would be a good way to find them. But what would happen to a helpful comment from a throwaway account?
If there's more of a ranking algorithm, identifying who is most likely to have actually read the article could be neat.
What would happen is that the old accounts would accumulate "CommentRank" and the new users would always be sorted to the bottom. This is good for search, since we are looking for "the" answer, the site, a limited number of best pages for a topic etc., but it's quite bad for discussion; a bad take from Animats would forever be above a good comment from a greenhorn.
I thought about that while writing this post. It would be useful to have a decay factor for PageRank. Put simply, if someone wants to maintain their PR score, they need to actively engage with the HN community.
> if someone wants to maintain their PR score, they need to actively engage
Wouldn't that encourage people to "actively engage" even when they have nothing to say just to keep their score?
Goodhart's law comes to mind here. People who want to stay relevant on this site will take your measure as their target. There's no system that can't be gamed if you know the rules. But the more "rock solid" you try to make those rules, the more you embed intrinsic discrimination between participants. You put "Proof of Work" in the rules and you alienate newcomers while all the rest do "busywork".
Your own comments and submissions would sink like a stone because you didn't engage enough, regardless of how interesting your points are.
I think this part of the post answers your question.
> Will I ever publicly write about how HN ranks posts if I am Dang (HN moderator)? No, because Pagerank can be manipulated by people despite its reputation. In fact, Pagerank has been exploited for years. Moreover, there are financial incentives for companies to get on the first page of HN.
> Will I ever publicly write about how HN ranks posts if I am Dang (HN moderator)? No
Lack of transparency is not a solution, and certainly not one that will sell well in a place where "security by obscurity" is disdained and open source cherished. It's particularly bad when you're dealing with a high profile site filled with intelligent people who would most likely make short work of reverse engineering just enough of the algorithm to understand how it works.
You can pile epicycle upon epicycle, but there's a lot of computational efficiency in a score that can be calculated from an integer and a timestamp.
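That "integer and a timestamp" score can be sketched with the often-cited HN-style ranking formula (an old public approximation that has circulated for years, not necessarily what HN runs today): points push a story up, age pulls it down with a gravity exponent.

```python
# Often-cited approximation of a points-vs-age ranking score:
# score = (points - 1) / (age_hours + 2) ** gravity
# One integer (points) and one timestamp (to derive age) suffice.
def rank_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    return (points - 1) / (age_hours + 2) ** gravity

print(rank_score(100, 2))   # fresh popular story
print(rank_score(100, 24))  # same story a day later, much lower
```

Contrast the cost: this is O(1) per item, whereas PageRank needs iterative computation over the whole graph.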
"…would create weird incentives: frequent comments or higher volume could rank a person higher than occasional comments further down the thread would."
I know I'll be out of step with most who comment on HN, but I question the necessity of having a ranking/voting system at all. All it seems to do is act as an ego booster for commenters, and it messes up orderly discussion when people insist on posting replies to the first commenter.
As a news aggregator, HN no doubt needs some method of ranking stories to set their listing lifetimes and their importance (page ranking), but that could be accomplished by the number of clicks a story receives and/or how long a visitor/commenter stays on a story (or the time between link clicks).
I also question aspects of the voting system as currently implemented. It's not unusual for me to receive down-votes on comments to controversial stories but no comment from the down-voter. Why bother with this system if the voter hasn't the gumption to say why he/she disagrees with one's views? (One could discuss the psychology and social philosophy behind this, but here is not the place.)
Another annoying aspect of the voting system is the lack of stats. The time I take from posting to checking whether someone has replied varies wildly, from minutes to days to never. This wide variation is significant when I post to controversial stories. Days later I can check back to see if there have been any votes or replies, only to find there have been none.
At other times, I've monitored the incoming votes almost in real time and watched them oscillate around 0 or 1; that is, I've received many up and down votes but the long-term average is zero. Come back and check days later and one hasn't a clue from the stats whether anyone at all actually read one's comment, let alone that it was equally controversial to both sides of a polarized audience. HN should give users access to these long-term stats.
Not having an explicit ranking system means that things are ranked by submission time. This too incentivizes certain behavior (frequent posting). Maybe one can get around that with certain guardrails like rate limiting submissions/comments but that's yet another can of worms to open.
I agree with you mostly (and will give you an upvote). I think we've seen enough to understand that upvotes and downvotes are doled out based on how well a comment agrees or disagrees with the voter's preconceived notion of whatever the topic is.
I also question the concept of 'hiding' comments. Are my eyes so delicate that I can't see certain comments by default? It seems like just a feature that exists to bias subsequent viewers against comments.
It'd be really interesting to see a system where voters have both one upvote/downvote for the submission and the ability to upvote one comment and downvote one comment.
I believe hidden comments can be seen if you enable showdead, and if you do so you'll learn why they are hidden (they add virtually nothing to discussions).
I see you included metrics as a starting point, like how long an article takes to read. Maybe something like a word-count approach to begin with.
What I had in mind is more about understanding the content of an article and identifying who's most likely referencing that content in their post.
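A toy version of that idea (everything here is illustrative; a real system would use embeddings rather than raw word counts): score how much a comment references the article via cosine similarity over token counts.

```python
import re
from collections import Counter

def tokens(text: str) -> Counter:
    # Crude tokenizer: lowercase words only.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def overlap_score(article: str, comment: str) -> float:
    # Cosine similarity between word-count vectors.
    a, c = tokens(article), tokens(comment)
    dot = sum(a[w] * c[w] for w in c)
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in c.values()) ** 0.5)
    return dot / norm if norm else 0.0

article = "PageRank models a random surfer following links between pages."
on_topic = "The random surfer model breaks down for comment threads."
off_topic = "First! Great post."
print(overlap_score(article, on_topic), overlap_score(article, off_topic))
```

Comments that reuse the article's vocabulary score above comments that don't, which is a rough proxy for "likely read it".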
What happens on a slightly different task where domain experts have tried to create a set of topics, not all domain experts talk to each other, and so we instead need a way to merge existing topics? I continue to see benchmarks where human expertise significantly outperforms AI on common sense reasoning tasks (most recently https://arxiv.org/abs/2112.11446).
What about an approach using directed acyclic graphs and entities?
In traditional qualitative research, you'd usually have a bunch of experts get together and figure out a set of topics (or import and adapt a set of topics from similar work) before you go about classifying the bulk of your data.
It's a good point. Terrorizing an urban population with $200 drones doesn't seem like an unrealistic scenario. How would a tool like the F-35 be used to combat this situation? I think it would be fairly useless.
Yes, we aren't going to risk $150M+ planes at low altitude trying to shoot $200 drones with a gun that only carries a few seconds of ammo, or try destroying them with million dollar air to air missiles.
The F-35 is entirely unsuited to low-altitude operations or dogfighting with other fighter planes. It's only good for hiding miles away at higher altitudes and firing missiles.
He said a $200 drone, which would be something similar to the tiny, low-flying commercial offerings. You couldn't take that out with an F-35 without destroying an entire city block along with it.
This makes me think of Liar's Poker. There may not be anything special about what the traders are doing. They could just be in the right place at the right time.
Security. The alternative might be to let them make those choices then arrest them later. Or hit them with a drone strike. The filtering approach seems more compassionate.
https://atd.readthedocs.io/en/latest/atdgen.html