
"a version of Python is downloaded over 300 million times per day."

If true, this is insane.



No doubt the vast majority of those downloads are from CI systems and build scripts executing on every commit.


Which is insane. Local caches should be ubiquitous, along with lightweight validation checks.
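A minimal sketch of the kind of lightweight validation check meant here: verify a cached artifact against a known digest before reusing it, and only fall back to the network when the local copy is missing or corrupt. The paths and digest below are hypothetical placeholders.

```python
import hashlib
from pathlib import Path

def is_cache_valid(path: Path, expected_sha256: str) -> bool:
    """Return True if the cached artifact exists and matches its known digest."""
    if not path.is_file():
        return False
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest == expected_sha256

# Hypothetical usage: only hit the network when the cache misses or fails the check.
artifact = Path("cache/cpython-3.12.tar.xz")        # assumed cache location
known_digest = "0" * 64                              # placeholder for the published digest
if not is_cache_valid(artifact, known_digest):
    pass  # download, verify, then store in the cache
```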


Yeah. It's quite a problem.

Although I think in many cases, the people doing this aren't aware it's happening. It's a Git action kicking off some Jenkins agent somewhere in a Kubernetes container on a virtualized server that was set up and forgotten about, based on a patchwork of online tutorials.


Probably everybody knows that each test run and build downloads gigabytes of data, but the downloads finish quickly, they cost little or nothing, and it's easier to keep doing that than to set up a local cache and use it everywhere (CI, local dev machines, etc.). The only time I ever saw optimization at that level was when building the base image took too long, so we saved one and rebuilt it only when the dependencies changed. I can't remember the details.
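The "rebuild only when dependencies changed" approach can be sketched by fingerprinting the dependency manifests: tag the base image with a hash of the requirements files, and rebuild only when the hash changes. This is a minimal illustration, not the commenter's actual setup; the file names are assumptions.

```python
import hashlib
from pathlib import Path

def deps_fingerprint(*manifests: Path) -> str:
    """Stable hash over dependency manifests; a changed hash means 'rebuild the base image'."""
    h = hashlib.sha256()
    for f in sorted(manifests):
        h.update(f.name.encode())   # include the file name so renames are detected
        h.update(f.read_bytes())    # include the pinned dependency contents
    return h.hexdigest()
```

A CI job would compute this, check whether an image tagged with that digest already exists in the registry, and skip the expensive build if it does.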


Try doing that in an air-gapped environment and you'll soon learn about the importance of local artifact caches and caching proxies with tight retrieval policies, or having downloads gated for review... ;-)
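In practice, gating Python downloads like that often comes down to pointing pip at an internal, review-gated index instead of PyPI. A hedged sketch of a pip.conf, assuming a hypothetical internal mirror host:

```ini
; /etc/pip.conf -- route all installs through an internal mirror
; (pypi.internal.example is a hypothetical host; substitute your own proxy)
[global]
index-url = https://pypi.internal.example/simple
```

With outbound network access blocked, anything not already vetted into the mirror simply fails to install, which is the review gate.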



