Ask HN: Dataset Cleaning Tips
3 points by ratsimihah on April 7, 2019 | 1 comment
Hello! I am building a custom dataset in the style of CIFAR-10, so I have about 60k images. For now I'll probably clean the dataset manually by deleting non-relevant images, cropping, etc., in addition to automatic preprocessing such as resizing images.

Do you have any tips or best practices for the manual cleanup? Is using something like Amazon Mechanical Turk viable?

Thanks!
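
For the automatic preprocessing mentioned above (cropping and resizing), here is a minimal sketch, not a definitive pipeline. It assumes a 32x32 target size (the CIFAR-10 image size) and a center crop to a square before resizing. The function names `center_crop_box` and `preprocess` are hypothetical, and `preprocess` assumes the Pillow library is installed.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    Cropping to a square first avoids distorting the aspect ratio
    when the image is later resized to a square target.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def preprocess(src_path, dst_path, size=32):
    """Center-crop an image to a square and resize it to size x size.

    Assumption: Pillow is available. It is imported here so that
    center_crop_box stays usable without it.
    """
    from PIL import Image

    with Image.open(src_path) as img:
        img = img.convert("RGB")  # normalize grayscale/RGBA inputs to 3 channels
        img = img.crop(center_crop_box(*img.size))
        img = img.resize((size, size))
        img.save(dst_path)
```

A center crop is only a default; images whose subject is off-center would still need the manual cropping pass.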



https://arxiv.org/abs/1902.10811

There's some description in here you may find relevant.



