
Easier said than done: I have 700k requests from bots in my access.log, coming from 15k different IP addresses.

:: ~/website ‹master*› » rg '(GPTBot|ClaudeBot|Bytespider|Amazonbot)' access.log | awk '{print $1}' | sort -u | wc -l

15163



    map $http_user_agent $uatype {
        default                       'user';
        ~*(googlebot|bingbot)         'good_bot';
        ~*(nastybot|somebadscraper)   'bad_bot';
    }
You can also do something like this to rate limit by user-agent class instead of by IP address, so that all ‘bad_bots’ are limited but ‘good_bots’ are not.
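
A rough sketch of how the $uatype map could feed a rate limit, assuming it lives in the http block. The $bad_bot_key variable, the bad_bots zone name, and the rate, burst, and status values are placeholders I made up for illustration; tune them for your own traffic. It relies on nginx not counting requests whose limit key is empty, so only ‘bad_bot’ traffic is throttled.

```nginx
# http context, after the $uatype map.
# Hypothetical helper variable: empty for everyone except bad bots.
map $uatype $bad_bot_key {
    default    '';
    'bad_bot'  $uatype;
}

# Requests with an empty key are not rate limited at all.
# Zone name, size, and rate are illustrative assumptions.
limit_req_zone $bad_bot_key zone=bad_bots:10m rate=30r/m;

server {
    location / {
        # All bad bots share one bucket because the key is the constant 'bad_bot'.
        limit_req zone=bad_bots burst=10 nodelay;
        limit_req_status 429;
    }
}
```

Because the key is the same string for every bad bot, the limit applies to them collectively no matter how many IP addresses they come from. Using $http_user_agent as the mapped value instead would give each user agent string its own bucket.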

I’m not dismissing the difficulty of the problem, but there are multiple vectors that can identify these ‘bad_bots’.



