Blocking the Semrush bots

It’s not enough to block just Semrush’s main bot, SemrushBot, in robots.txt, since they run a whole family of them.

We’ve usually allowed all bots, but with “AI” scraping and Semrush’s terrible programming, which wasted hours of my time every day for nine months, certain parties are no longer welcome to crawl.

To block Semrush but allow others, here’s what robots.txt needs to look like:
 
User-agent: *
Allow: /
 
User-agent: SemrushBot
Disallow: /
 
User-agent: SiteAuditBot
Disallow: /
 
User-agent: SemrushBot-BA
Disallow: /
 
User-agent: SemrushBot-SI
Disallow: /
 
User-agent: SemrushBot-SWA
Disallow: /
 
User-agent: SplitSignalBot
Disallow: /
 
User-agent: SemrushBot-OCOB
Disallow: /
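If you want to sanity-check rules like these, Python’s standard-library robots.txt parser simulates how a compliant crawler reads them. A quick sketch, with an abridged copy of the file pasted inline rather than fetched:

```python
from urllib.robotparser import RobotFileParser

# An abridged copy of the robots.txt rules above, inlined for testing.
rules = """\
User-agent: *
Allow: /

User-agent: SemrushBot
Disallow: /

User-agent: SiteAuditBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Ordinary crawlers fall through to the catch-all record and are allowed.
print(rp.can_fetch("Googlebot", "/"))            # True
# Semrush's bots match their own records and are disallowed everywhere.
print(rp.can_fetch("SemrushBot", "/"))           # False
print(rp.can_fetch("SiteAuditBot", "/any/page")) # False
```

Note that a compliant crawler uses the most specific matching record, not the catch-all, which is why the `User-agent: *` block at the top doesn’t let Semrush through.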

 

We have also put a rule into Cloudflare’s WAF to block any user-agent containing semrush, though, as you can see above, that doesn’t catch their SiteAuditBot or SplitSignalBot, whose names lack the string.
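For reference, a custom rule along these lines, written in Cloudflare’s rules expression language (check the field names against their documentation), would cover the oddly named bots as well:

```
(lower(http.user_agent) contains "semrush") or
(lower(http.user_agent) contains "siteauditbot") or
(lower(http.user_agent) contains "splitsignalbot")
```

Set the rule’s action to Block; the `lower()` wrapper makes the match case-insensitive.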

Semrush is honest enough to provide their list of bots at www.semrush.com/bot/.

We’ve slowed Ahrefs to a page an hour (the Crawl-delay value is in seconds). It probably should be a page a day (86,400).
 
User-agent: AhrefsBot
Crawl-delay: 3600
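The same standard-library parser can confirm the delay is being picked up, for the crawlers that honour the directive:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: AhrefsBot
Crawl-delay: 3600
""".splitlines())

print(rp.crawl_delay("AhrefsBot"))     # 3600 seconds, i.e. one page an hour
print(rp.crawl_delay("SomeOtherBot"))  # None: no record applies to it
```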

 

I’ve no real desire to help either company while they drain our resources, especially as their users are in the “SEO” field, where recent experience suggests the majority are unscrupulous.

We’ve put up a few more manual blocks at Cloudflare. We chose to do it manually rather than use Cloudflare’s blanket AI-bot block, since that caught Googlebot on the day we ran it, and, from what I can tell, not for their “AI” purposes. Cloudflare expressly says it uses “AI” to determine the blocking rules, which doesn’t give me much hope, since these days we should be using less “AI”, not more. Humans need to be involved far more.
 
Cloudflare screenshot showing that a Semrush bot was blocked
 

Even in programming, Uplevel, a coding-management software business, found that using GitHub’s “AI” programming assistant, Copilot, resulted in 41 per cent more errors being inserted into code. The one profession that “AI” adherents said would benefit from Approximate Imitation tools hasn’t.

File this “AI” nonsense with ‘Google Plus is a Facebook-killer’ and ‘Quora is a Facebook-killer’ and other claims made by the tech press over the years.
 
PS: Notice the date of the Cloudflare screenshot above? October 6. But the SemrushBot block had been in robots.txt since the end of September. It seems they ignore it, and a more blanket ban at Cloudflare is the better way to go. A check of Cloudflare’s events today (October 7), after the full suite of user-agents was put into robots.txt, shows that SemrushBot bypasses the file, thankfully to be thwarted by Cloudflare.
 
PPS: It’s now October 10 and they are still coming. Semrush claims that we need to wait ‘up to one hour or 100 requests for SemrushBot to discover changes made to your robots.txt’, but that’s not true:
 
Cloudflare screenshot showing that a Semrush bot was blocked

