Hacker News | artee_49's comments

Unintended side effects are the biggest problem with AI-generated code. I can't think of a proper way to solve that.


TLDR:

Google allows you to input long paragraphs and URLs into a field called "App name", and they then send an email containing the paragraph you entered (malicious, with phishing links) to your inbox. Since this is sent by Google, it's DKIM-signed and passes DMARC, so you can simply download the entire email and resend it raw to other people, and it'll still be signed and land in their inboxes.

The other thing is that with these you cannot change the "To" header in the email (not the envelope "To", which is where the email is delivered, but what shows up in "To" when the client renders the email), so the attacker bought a domain that looks like it's Google-owned: "(rand)goog-ssl.com". When looking at emails in your inbox, make sure the "To" is always valid along with the "From".
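As a sketch of that last check, here's a minimal Python example that compares the rendered "From"/"To" domains against a trusted list; the trusted domains and the sample message are illustrative assumptions, not Google's actual values:

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical trusted-domain list for illustration only.
TRUSTED_DOMAINS = {"google.com", "accounts.google.com"}

def header_domains(raw_email: str) -> dict:
    """Return the domain of the From and To headers as a client would render them."""
    msg = message_from_string(raw_email)
    result = {}
    for header in ("From", "To"):
        _, addr = parseaddr(msg.get(header, ""))
        result[header] = addr.rsplit("@", 1)[-1].lower() if "@" in addr else ""
    return result

def looks_suspicious(raw_email: str) -> bool:
    """Flag mail whose rendered To domain is not one we trust,
    e.g. a lookalike such as 'abc-goog-ssl.com' standing in for Google."""
    return header_domains(raw_email)["To"] not in TRUSTED_DOMAINS

sample = (
    "From: Google <no-reply@accounts.google.com>\r\n"
    "To: victim@abc-goog-ssl.com\r\n"
    "Subject: Security alert\r\n"
    "\r\n"
    "Body here.\r\n"
)
print(looks_suspicious(sample))  # the To domain is a lookalike, not Google's
```

Note that this only checks the rendered headers, which is exactly the point: DKIM covers these bytes, so a replayed message carries them unchanged.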


Even for senior levels, the claim has been that AI will speed up their coding (take it over) so they can focus on higher-level decisions and abstract concepts. These contributions are not that, and based on prior predictions, productivity should have gone up.


It would be different, I'm sure, if they were making contributions to repos they had less familiarity with. In my experience, and from talking with those who use AI most effectively, it is best leveraged as a way of getting up to speed, or of creating code for a framework/project you have less familiarity with. Roughly, the effectiveness of non-AI coding vs. AI coding comes down to the user's familiarity with the codebase times the complexity of the codebase, weighed against the number of closed-loop abstractions in the tasks the coder needs to carry out.

Currently AI is like a junior engineer, and if you don't have good experience managing junior engineers, AI isn't going to help you as much.


I’ve been doing this for a while now. Junior engineers are pretty near universally terrible when measured by short term ROI. The only reason you would ever want to pay a truly junior engineer is because you can teach them.

If someone told me “you can have a free junior engineer, but they get swapped out each week for a new person”, I’d say no thanks.

I’m sure someone could figure out a way to make money in that situation, but it wouldn’t be by building anything I’d be comfortable attaching my name to or would want to use myself.


It does work that way, but IP reputation is a thing as well, so you need to keep that in mind. IPs need to be "seasoned" and "trusted", just like domains.

This is how email-as-infra works: you send from a shared pool of their IPs, they sign your emails with DKIM, and you'll have SPF set up on your own domain as well.


DKIM is not meant to block spam. It's meant to authenticate that the sender had access to the private key for the public key published on the domain the mail was sent from, implying that the sender has sufficient permission to send from that domain.

It should not be taken to imply anything else. None of this has anything to do with spam; that's reputation (and yes, having DKIM set up boosts your reputation, but it is not sufficient), which has to be "built up" by the domains sending the emails.
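To make concrete what DKIM actually asserts, here's a small illustration that parses the tag=value pairs of a DKIM-Signature header: it binds a signing domain (d=), a selector (s=), and a fixed set of signed headers (h=) to a body hash (bh=) and signature (b=). The header value below is a made-up example, not from a real message:

```python
def parse_dkim_tags(header_value: str) -> dict:
    """Parse the tag=value pairs of a DKIM-Signature header value."""
    tags = {}
    for part in header_value.replace("\r\n", "").split(";"):
        part = part.strip()
        if "=" in part:
            key, _, value = part.partition("=")
            tags[key.strip()] = value.strip()
    return tags

# Illustrative header value; the hash and signature are placeholders.
sig = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=selector1; "
       "h=from:to:subject:date; bh=Base64BodyHash=; b=Base64Signature=")
tags = parse_dkim_tags(sig)
print(tags["d"], tags["s"], tags["h"])
```

Nothing in those tags says anything about spam; verification only proves the d= domain's key signed those headers and that body.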


I think shuffle sharding is beneficial for read-only replica cases, not for write scenarios like this. You'd have to write to the primary and not to a "virtual node", right? Or am I understanding it incorrectly? I just read that article now.


I think you'll have to pay a team millions to figure that out. It is unlikely to be a static rate, but rather one decided by multiple factors like time of year, time of flight, flight distance, ticket cost, etc.


The airline has literally all of the data on this, they definitely do not have to pay a team millions.


They probably do pay millions of dollars in wages for business analysts to figure out what this rate is on their flights.


They probably just have an SSRS report that prints out in a few dozen offices automatically on some schedule.

I'm not trying to be pedantic, but this is table-stakes stuff. I know we're supposed to shy away from saying things like this, but compared to the other engineering that airlines have to do, this is easy. It costs - at most, including wages - a few tens of thousands of dollars yearly to come up with these figures. It's a fraction of the salary of one United Airlines BA.[0] This cost might go up if one of the senior developers convinces their boss that this needs to be a machine learning model, but unless they're resume-pumping it's going to be at most PCA and a regression.

This is not a team of people working for months on this one thing.

[0] https://www.glassdoor.com/job-listing/analyst-revenue-manage...
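For what it's worth, the "a regression is enough" claim is easy to sketch. The features, coefficients, and data below are entirely synthetic; a real analyst would fit historical flight records instead:

```python
import numpy as np

# Fit a no-show rate from a few flight features via ordinary least squares.
rng = np.random.default_rng(0)
n = 500

# Synthetic features: departure hour, distance (1000s of miles),
# fare (100s of USD), plus an intercept column.
X = np.column_stack([
    rng.uniform(5, 23, n),     # departure hour
    rng.uniform(0.2, 3.0, n),  # distance
    rng.uniform(0.5, 8.0, n),  # fare
    np.ones(n),                # intercept
])
true_coef = np.array([0.002, -0.01, -0.005, 0.08])  # made-up ground truth
y = X @ true_coef + rng.normal(0, 0.001, n)         # observed no-show rate

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))  # recovers roughly the made-up coefficients
```

The point isn't the model quality; it's that the whole exercise is a few lines on data the airline already has.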


I am a bit perplexed, though, as to why they have implemented fan-out in a way where each "page" blocks fetching further pages. They would not have been affected by the high tail latencies if they had not done this:

"In the case of timelines, each “page” of followers is 10,000 users large and each “page” must be fanned out before we fetch the next page. This means that our slowest writes will hold up the fetching and Fanout of the next page."

This basically means that they block on each page, process all the items on the page, and then move on to the next page. Why wouldn't you rather decouple the page fetcher from the processing of the pages?

A page-fetching activity should be able to continuously fetch further sets of followers one after another, and should not wait for each of the items in a page to be updated before continuing.

Something that comes to mind would be a fetcher component that fetches pages, stores each page in S3, and publishes the metadata (content) and the S3 location to a queue (SQS) consumed by timeline publishers, which can scale independently based on load. You can control the concurrency in this system much better, and you could also partition based on the shards with another system like Kafka, by using the shards as keys in the queue, to even "slow down" the work without having to effectively drop tweets from timelines (timelines are eventually consistent regardless).

I feel like I'm missing something and there's a valid reason to do it this way.
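The decoupling idea can be sketched with a bounded queue: a fetcher task streams pages of followers into the queue while worker tasks fan each page out, so one slow page never blocks fetching the next. Page sizes, counts, and delays below are simulated stand-ins (a real system would fetch from storage and publish to SQS as described):

```python
import asyncio

PAGE_COUNT = 5
PAGE_SIZE = 4  # stand-in for 10,000

async def fetch_pages(queue: asyncio.Queue) -> None:
    """Producer: keep fetching pages without waiting for fan-out to finish."""
    for page_num in range(PAGE_COUNT):
        page = [f"user-{page_num}-{i}" for i in range(PAGE_SIZE)]
        await queue.put(page)   # bounded queue applies backpressure
    await queue.put(None)       # sentinel: no more pages

async def fan_out(queue: asyncio.Queue, delivered: list) -> None:
    """Consumer: fan a page out to followers' timelines."""
    while True:
        page = await queue.get()
        if page is None:
            await queue.put(None)  # let sibling workers shut down too
            return
        await asyncio.sleep(0.001 * len(page))  # simulated slow writes
        delivered.extend(page)

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    delivered: list = []
    await asyncio.gather(
        fetch_pages(queue),
        fan_out(queue, delivered),
        fan_out(queue, delivered),
    )
    return delivered

delivered = asyncio.run(main())
print(len(delivered))  # every follower across all pages was fanned out
```

The bounded queue (maxsize=2 here) is what replaces the "block on each page" behavior: the fetcher only pauses when the workers are genuinely saturated, not after every page.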


I interpreted this as a batch write, e.g. "write these 10k entries and then come back". The benefit of that is way less overhead versus 10k concurrent background routines each writing individual rows to the DB. The downside is, as you've noted, that you can't "stream" new writes in as older ones finish.

There's a tradeoff here between batch size and concurrency, but perhaps they've already benchmarked it and "single-threaded" batches of 10k writes performed best.
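The batch-write reading above can be sketched as follows; `write_batch` is a hypothetical stand-in for the real DB call, and the numbers are made up:

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

calls = []
def write_batch(batch):
    calls.append(len(batch))  # record one round trip per batch

followers = list(range(25_000))
for batch in chunked(followers, 10_000):
    write_batch(batch)

print(calls)  # [10000, 10000, 5000] -> 3 round trips instead of 25,000
```

That's the overhead win: per-row writes cost one round trip each, while batches amortize it, at the price of not streaming new work in until the current batch returns.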

