What if we start throttling them so we make them waste time? Like, we could throttle consecutive requests, so anyone hitting the server aggressively gets slowed down.
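To make that concrete, here's a minimal sketch of escalating throttling in Python. Everything in it (the window, the allowance, the delay step, keying clients by a plain string) is made up for illustration, not any particular server's API:

    import time
    from collections import defaultdict

    WINDOW = 10.0      # seconds over which we count a client's recent requests
    FREE_REQUESTS = 5  # requests per window before throttling kicks in
    DELAY_STEP = 0.5   # extra seconds of delay per request over the allowance

    _history = defaultdict(list)  # client key -> recent request timestamps

    def throttle(client_key: str) -> None:
        """Sleep longer and longer as a client keeps hammering the server."""
        now = time.monotonic()
        recent = [t for t in _history[client_key] if now - t < WINDOW]
        recent.append(now)
        _history[client_key] = recent
        excess = len(recent) - FREE_REQUESTS
        if excess > 0:
            time.sleep(excess * DELAY_STEP)  # each extra request waits longer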
The tricky bit is recognizing that the requests all come from the same source. They often use different IP addresses, and even classifying the requests at all means keeping extra state around that you wouldn't need without this anti-social behavior.
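That extra state might look something like this sketch, which fingerprints on request headers instead of IP alone. The header choice is purely illustrative, and a determined crawler can rotate those too:

    import hashlib

    def fingerprint(headers: dict[str, str]) -> str:
        """Collapse header traits into a key that may survive IP rotation."""
        traits = (
            headers.get("User-Agent", ""),
            headers.get("Accept-Language", ""),
            headers.get("Accept-Encoding", ""),
        )
        return hashlib.sha256("|".join(traits).encode()).hexdigest()[:16]

    # fingerprint -> request count: bookkeeping we wouldn't need at all
    # if clients behaved
    counts: dict[str, int] = {}

    def observe(headers: dict[str, str]) -> int:
        key = fingerprint(headers)
        counts[key] = counts.get(key, 0) + 1
        return counts[key]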
https://zadzmo.org/code/nepenthes/
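Nepenthes is a tarpit: it serves an endless maze of generated pages and drips each response out slowly so crawlers burn time on garbage. Here's a toy version of just the slow-drip part, with made-up numbers and none of Nepenthes' actual code:

    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class TarpitHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            # Drip one byte per second: each request ties the crawler up
            # for about a minute while costing us almost nothing.
            for ch in "<html><body>" + "x" * 48 + "</body></html>":
                self.wfile.write(ch.encode())
                self.wfile.flush()
                time.sleep(1)

    if __name__ == "__main__":
        HTTPServer(("", 8080), TarpitHandler).serve_forever()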
They can just interleave requests to different hosts. Honestly, someone spidering the whole Web probably should be doing that regardless.
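On the crawler side that's just a frontier of per-host queues served round-robin, roughly like this sketch (all names hypothetical). It sidesteps per-host throttling, and it's also basic politeness since no single server sees a burst:

    from collections import defaultdict, deque
    from urllib.parse import urlparse

    class Frontier:
        def __init__(self) -> None:
            self._queues: dict[str, deque[str]] = defaultdict(deque)
            self._hosts: deque[str] = deque()  # round-robin order of hosts

        def add(self, url: str) -> None:
            host = urlparse(url).netloc
            if host not in self._queues:
                self._hosts.append(host)
            self._queues[host].append(url)

        def next_url(self) -> str | None:
            """Rotate through hosts so consecutive fetches hit different servers."""
            for _ in range(len(self._hosts)):
                host = self._hosts[0]
                self._hosts.rotate(-1)  # move this host to the back of the line
                if self._queues[host]:
                    return self._queues[host].popleft()
            return None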