Margaret670

Why is AhrefsBot requesting a page that's been removed from my website?

@Margaret670

Posted in: #WebCrawlers #Wordpress

I was reviewing the logs of my website (WordPress), and I saw a line like this:

myWebsite:80 5.10.83.28 - - [17/Jan/2014:09:05:53 +0000] "GET myUrl HTTP/1.1" 404 5716 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)"


So a bot called AhrefsBot was visiting myUrl.
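To find every such request in a full access log rather than spotting them by eye, a short filter is enough. This is a minimal sketch that assumes every log line follows the format shown above (a `vhost:port` field followed by the Apache combined log format). The function name `bot_404s` and the `/old-page/` path in the sample line are hypothetical. A referrer of `"-"` means the client sent no Referer header, so the log alone cannot tell you where the bot found the link.

```python
import re

# Hypothetical pattern for the log format shown above: a "vhost:port" prefix
# followed by the Apache/NCSA combined log format.
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def bot_404s(lines, bot_name="AhrefsBot"):
    """Yield (time, path, referrer) for every 404 served to the named bot."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Skip lines that don't parse, weren't 404s, or came from another client.
        if m and m["status"] == "404" and bot_name in m["agent"]:
            yield m["time"], m["path"], m["referrer"]


# Sample line in the same format; "/old-page/" is a hypothetical removed page.
sample = ('myWebsite:80 5.10.83.28 - - [17/Jan/2014:09:05:53 +0000] '
          '"GET /old-page/ HTTP/1.1" 404 5716 "-" '
          '"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)"')

for hit in bot_404s([sample]):
    print(hit)  # ('17/Jan/2014:09:05:53 +0000', '/old-page/', '-')
```

On a real log you would pass an open file object instead of `[sample]`, since iterating a file yields one line at a time.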

The problem is that I removed the page myUrl weeks ago.
So why am I still seeing this bot request it?

How did it find the URL myUrl, especially when I'm sure that no pages link to it? And how do I avoid these kinds of 404 errors?


 

@Angela700

There are a few possible reasons why a bot would try to visit a removed page:


The bot followed a link to that page from another website. Bots frequently omit the referrer, so it is difficult to tell whether this is the case. Given that the bot in question has "backlink checker" as part of its tagline, this seems a likely cause.
The bot had visited the page while it existed and was recrawling it based on its own database rather than fresh discovery. This is also common. When it encounters a 404, it should drop the URL from its database.
There is actually still a link somewhere on your site and you just missed it.
The bot made an error during link analysis. Most bots use various heuristics to find URLs in JavaScript and similar non-HTML content. These heuristics produce a fair number of false positives and can lead bots to crawl pages that never existed. You don't say what "myUrl" is, so it's hard to judge whether that is the case here.
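To see how such heuristics misfire, here is a deliberately naive URL extractor. It is only an illustration under a stated assumption (any quoted string starting with "/" is treated as a link); it is not the rule set AhrefsBot or any other crawler actually uses, and the JavaScript snippet and the `guess_urls` name are hypothetical.

```python
import re

# Naive heuristic: any quoted string that starts with "/" is a candidate URL.
QUOTED_PATH = re.compile(r"""["'](/[^"'\s]*)["']""")


def guess_urls(script_text):
    """Return every quoted, slash-prefixed string found in the script text."""
    return QUOTED_PATH.findall(script_text)


# Hypothetical inline script: only string fragments, no complete page URLs.
script = '''
var endpoint = "/ajax/load";
var link = "/archive/" + year + "/";
var label = "Read more";
'''

print(guess_urls(script))  # ['/ajax/load', '/archive/', '/']
```

`/archive/` is only the first half of a URL that the script completes at runtime, and `/ajax/load` is an endpoint rather than a page, yet a crawler following these guesses would request both and could log 404s for URLs that never existed as pages.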


Bot behavior usually depends on factors you can't see, so it will often not seem entirely rational to you. There is no way to completely prevent bots from triggering 404s.



 

@Megan663

There are 2 possible reasons:


Your sitemap.xml still contains this URL. Find it and remove it.
Some page on your site still links to this URL. Crawl your site with a web crawler to find the link, then delete it.
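Both checks can be scripted with the Python standard library. This is a minimal sketch that assumes your sitemap follows the sitemaps.org 0.9 schema and that your site is small enough to crawl page by page. The names `sitemap_urls`, `pages_linking_to` and the `max_pages` cap are hypothetical, not part of WordPress or any plugin. The crawler only looks at `<a href>` links, so a URL referenced from JavaScript or CSS would be missed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [(loc.text or "").strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)]


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            # Resolve relative links against the page URL and drop any #fragment.
            self.links.add(urldefrag(urljoin(self.base_url, href))[0])


def fetch(url):
    """Download a page as text (raises OSError on HTTP errors such as 404)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def pages_linking_to(start_url, target_url, fetch=fetch, max_pages=500):
    """Crawl start_url's host breadth-first; return pages that link to target_url."""
    host = urlparse(start_url).netloc
    queue, seen, hits, fetched = deque([start_url]), {start_url}, [], 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        try:
            html = fetch(page)
        except OSError:
            continue  # unreachable or missing page: skip it
        parser = LinkCollector(page)
        parser.feed(html)
        parser.close()
        if target_url in parser.links:
            hits.append(page)
        for link in parser.links:
            # Stay on the same host and don't request the removed page itself.
            if urlparse(link).netloc == host and link not in seen and link != target_url:
                seen.add(link)
                queue.append(link)
    return sorted(hits)
```

For example, `"http://example.com/myUrl" in sitemap_urls(fetch("http://example.com/sitemap.xml"))` tells you whether the sitemap still lists the page, and `pages_linking_to("http://example.com/", "http://example.com/myUrl")` lists the pages that still link to it. Here `example.com` stands in for your own domain, and the sitemap's location depends on how your sitemap is generated. The comparison is an exact string match, so pass the URL exactly as it appears in your links (trailing slash included).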


