
> The early layers of the network detecting simple features like lines, curves, textures, shapes, colors, etc

This is pretty common, especially with convolutions, but it is not guaranteed. How you embed matters a lot. For example, there are transformers that use early convolutions for the embeddings, and that alone makes them just work, though stacking too many hurts performance (ViT tried pre-ResNets, which wasn't great). [0] also investigates transfer learning in medical domains and shows that CNNs depend more on statistics reuse while transformers depend more on feature reuse.

> If you've seen a paper that verifies this I'd love to see it.

[0] also discusses this, including when transfer does and doesn't work on medical domains. Basically any paper that discusses transfer learning will also discuss its limitations, but note that there is a bias toward results that work. [1] also shows some of these results, where ImageNet pretraining sometimes helps and sometimes doesn't (and references others showing the same). Note in Figures 2 and 3 how InceptionV{3,4} and MNASNet have higher performance without pretraining (Fig. 4 is a summary). So this shows part of what I was saying: it isn't always about the dataset either. You have a coupled problem that is hard to disentangle. There are also plenty of papers that try to argue LLMs are good at learning vision classification and never get past 50-60% accuracy (or worse) on ImageNet. Lots of scientific papers will also just train from scratch and not mention transfer learning because it simply didn't work for them, but you'd have to talk to the authors in person, as that isn't in their papers.

> FWIW I have not yet seen a model starting from a pre-trained COCO checkpoint that does worse than random initialization.

Additionally, ImageNet classification performance doesn't correlate 1-to-1 with how well a model works as a backbone for object detection and segmentation.

As another note, be careful with pretrained models: the vast majority of papers use test accuracy to tune hyper-parameters, so knowledge leaks into the model. I think this is mostly caused by reviewer benchmarkism (desk reject if you aren't SOTA), so bad practices become standard.
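To see why tuning on the test set leaks knowledge, here's a toy sketch in plain Python (the accuracy and noise numbers are made up for illustration). Ten hyper-parameter configs all have the *same* true accuracy; a finite test set only measures it with noise. Picking the config with the best test score and reporting that score inflates the estimate, while selecting on a validation set and measuring test once does not:

```python
import random

def selection_bias(trials: int = 200, n_configs: int = 10,
                   true_acc: float = 0.80, noise: float = 0.02):
    """Compare reporting the best *test* score (leaky) against selecting
    on validation and measuring test once (honest). All configs share the
    same true accuracy, so any gap is pure selection bias."""
    rng = random.Random(0)
    leaky_sum, honest_sum = 0.0, 0.0
    for _ in range(trials):
        # Leaky: evaluate every config on the test set, report the max.
        leaky_sum += max(true_acc + rng.gauss(0, noise)
                         for _ in range(n_configs))
        # Honest: one fresh test measurement of the chosen config.
        honest_sum += true_acc + rng.gauss(0, noise)
    return leaky_sum / trials, honest_sum / trials

leaky, honest = selection_bias()
print(f"leaky estimate:  {leaky:.3f}")   # inflated above the true 0.80
print(f"honest estimate: {honest:.3f}")  # close to the true 0.80
```

The gap grows with the number of configs you sweep, which is exactly why SOTA-chasing with big hyper-parameter searches on the test set is so misleading.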

[0] https://arxiv.org/abs/2203.01825

[1] https://arxiv.org/abs/2101.06871



Thanks! Def going to check out those papers.


Yeah, I do want to make clear that transfer learning frequently works. I think my initial comment probably comes off too strong (I'm just reeling from terrible, unproductive, reject-happy reviewers). But there is a common belief that you never need to train from scratch, and that's what I'm really trying to counter.



