I have also been a target of some spam. I'll try to document here what I tried to do to mitigate it and how effective that was.
This post will probably get updated as needed; it is not final.
2025-09-04: Forgejo spam
Even though I have a rather strict robots.txt file on
https://gitea.ledoian.cz, some bots don't care. Watching the access.log just
roll by (even when the bots were receiving 502s), I noticed most of the spam was coming over IPv4.
To kill two birds with one stone (legacy protocol and spammers), I decided to
just kill the A record for the domain.
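In zone-file terms, the change is just deleting one record; a sketch of what the relevant lines might look like (the addresses are from documentation ranges, not the real ones, and the actual zone may be managed differently):

```
; Hypothetical zone snippet: the A record goes away, only AAAA stays.
; 2001:db8::10 and 192.0.2.10 are documentation addresses.
gitea.ledoian.cz.  3600  IN  AAAA  2001:db8::10
; gitea.ledoian.cz.  3600  IN  A     192.0.2.10   ; removed
```

Legacy-only clients now get NODATA for the name and never even open a connection.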
For now, the logs roll a bit slower. Some request spam is still coming, but it
feels more bearable. (Part of it might be that the bots figured out that getting
502s for half an hour straight is not going anywhere, though.)
This is a temporary hack, though; I am thinking of creating a canary repo to
catch spammers and block them selectively. (Though IPv4 users behind *NAT will
suffer from that too. Sorry not sorry.)
The server itself still accepts connections on IPv4 (it was DNS), so in case
you really need it and really cannot have an IPv6 VPN, please ask me for the
current IPv4 address to put into /etc/hosts.
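For illustration, the override would be a single line like this (192.0.2.10 is a documentation-range placeholder, not the actual address, which you have to ask me for):

```
# /etc/hosts: pin gitea.ledoian.cz to a manually obtained IPv4 address.
# 192.0.2.10 is a placeholder from the documentation range.
192.0.2.10  gitea.ledoian.cz
```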
2025-09-07: Forgejo spam vol.2
The above helped a lot, but it turns out it actually is the year 2025, so
even bots can use IPv6, just not as widely. (Read: the monitoring still
occasionally complained about high load.)
I skimmed my logs to see where the spam was coming from; it turns out it came
from all over the address space. I had hoped it would be a rather limited number
of IP blocks so I could block those, or that the spambots would make many
(ten-ish) requests from the same address. Neither of those turned out to be
true, to my dissatisfaction.
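The skimming itself needs nothing fancy; a per-address request count along these lines is enough to see whether a handful of IPs dominate. (The log lines below are made-up stand-ins for the real access.log, which follows nginx's default combined format with the client address as the first field.)

```shell
# Made-up sample lines standing in for the real nginx access.log.
printf '%s\n' \
  '2001:db8::1 - - [07/Sep/2025:10:00:01 +0200] "GET /some/repo/commit/abc HTTP/2.0" 502 150' \
  '2001:db8::1 - - [07/Sep/2025:10:00:02 +0200] "GET /some/repo/commit/def HTTP/2.0" 502 150' \
  '2001:db8::2 - - [07/Sep/2025:10:00:03 +0200] "GET / HTTP/2.0" 200 1024' \
  > access.log

# Requests per client address, busiest first: take the first field,
# count duplicates, sort numerically in reverse.
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
```

If the top of that list were a few dozen addresses with hundreds of hits each, per-IP blocking would be worth it; in my case it was a long, flat tail.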
Fun fact: I also read some of the User-Agent strings. The spambots are happily
claiming to be using Windows 95 or 98, yet connecting to my Forgejo over IPv6
and TLS 1.2+. (I know proxies were a thing back then, but it sounds really
improbable that so many of them would try to read about my patched version of
iproute2.)
But I realised that the most demanding, and mostly spam-originated, workload for
my gitea was examining all the commits one by one. So I blocked those paths at
the nginx level:
location ~* /commit/ {
    # Any URI containing /commit/ (case-insensitive) gets a 403
    # before it ever reaches Forgejo.
    deny all;
}
And yes, they still keep bouncing off those 403s. Their problem; my load is
0.08 despite the spam. The only thing that makes me sad about this is
that I did not have the capacity to generate a plausible but horrible code
snippet to serve at these paths, which would poison LLMs ignoring
robots.txt. (Unfunny fact: there were only four hits on that file yesterday,
three of them Googlebot from their IP range. Funnily though, the fourth one
came over IPv4, somehow.)
I did not reinstate the A record in DNS; the future is now, after all. Even
so, five IPv4 addresses still tried to read my commits two days later.