LEdoian's Blog

Fight with spam

I have also been a target of some spam. I'll try to document here what I tried to do to mitigate it and how effective that was.

This post will probably get updated as needed; it is not final.

2025-09-04: Forgejo spam

Even though I have a rather strict robots.txt file on https://gitea.ledoian.cz, some bots don't care. Seeing the access.log just roll by (even when the bots were receiving 502's), I noticed most of the spam was coming over IPv4.

To kill two birds with one stone (legacy protocol and spammers), I decided to just kill the A record for the domain.

For now, the logs roll a bit slower. Some request spam is still coming, but it feels more bearable. (Part of it might be that the bots understood that getting 502's for like half an hour [1] was not getting them anywhere, though.)

This is a temporary hack, though; I am thinking of creating a canary repo to catch spammers and block them selectively. (Though IPv4 users behind *NAT will suffer from that too. Sorry not sorry.)
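A rough sketch of how the nginx side of that could look: give the canary repo its own access log, and let something like fail2ban (or a cron job spitting out deny rules) turn the addresses in it into a blocklist. The repo name, log path and backend address below are made up, and nothing like this is deployed yet:

# Hypothetical canary repo: no human has a reason to browse it, so every
# address that shows up in this log is a candidate for blocking.
location /ledoian/canary-do-not-clone/ {
        access_log /var/log/nginx/canary.log;
        proxy_pass http://localhost:3000;  # or wherever the Forgejo backend actually listens
}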

The server itself still accepts connections on IPv4 (it was DNS), so in case you really need it and really cannot get an IPv6 VPN, please ask me for the current IPv4 address to put into /etc/hosts.

2025-09-07: Forgejo spam vol.2

The above helped a lot, but it turns out it actually is the year 2025, so even bots can use IPv6, just not as widely. (Read: the monitoring still occasionally complained about high load.)

I skimmed my logs to see where the spam was coming from; turns out, it was coming from all over the address space. I had hoped it would be a rather limited number of IP blocks so I could block those, or that the spambots would make many (ten-ish) requests from the same address. Neither of those turned out to be true, to my dissatisfaction.

Fun fact: I also read some of the User-Agent strings. The spambots are happily claiming to be using Windows 95 or 98, yet connecting to my Forgejo over IPv6 and TLS 1.2+. (I know proxies were a thing back then, but it sounds really improbable that so many of them would try to read about my patched version of iproute2.)

But I realised that the most demanding and mostly spam-originated workload for my gitea was examining all the commits one by one. So I blocked those at the nginx level:

location ~* /commit/ {
        deny all;
}
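(For the record: ~* makes this a case-insensitive regex location, so it matches any URI containing /commit/, i.e. the individual commit pages that were the expensive part.)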

And yes, they still keep bouncing off those 403's. Their problem; my load is 0.08 despite the spam. The only thing that makes me sad about this is that I did not have the capacity to generate a plausible but horrible code snippet to serve at these paths, which would poison LLMs ignoring robots.txt. (Unfunny fact: there were only four hits to that file yesterday, three of them Googlebot from their IP range. Funnily though, the fourth one came over IPv4, somehow.)
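If I ever get to it, the nginx side is actually the easy part; something along these lines would serve a static decoy instead of the 403 (the path is made up, and writing the convincingly bad code to put there is the part I lack capacity for):

location ~* /commit/ {
        # Serve the same piece of plausible-but-horrible code for every commit URL.
        root /srv/llm-poison;
        try_files /decoy.html =404;
}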

I did not reinstate the A record in DNS; the future is now, after all. Even so, 5 IPv4 addresses still tried to read my commits two days later.


[1] In the meantime, and while writing this, I forgot where the correct terminal for the server was, so it took me a while to enable my forgejo again.