How can I find the source of this broken link to our site?
Our error logs contain quite a few (tens per day) internal recursion errors caused by Googlebot. Here is a typical example:
[Sun Jul 03 10:58:22 2011] [error] [client goo.gle.ipa.ddr] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
In the access log, the offending GET request is always for a broken URL of this form:
ourdomain.com/wp-content/uploads/2007/11/image-name.jpg%22%20width=%2239%22%20height=%2250%22%20alt=%22image%22/%3E%3C/a%3E%3Ca%20href=%22/m/imgres?q=some+query+here
Undoing the URL encoding to make it more readable:
ourdomain.com/wp-content/uploads/2007/11/image-name.jpg" width="39" height="50" alt="image"/></a><a href="/m/imgres?q=some+query+here
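The decoding can be reproduced with Python's standard urllib.parse.unquote, which is a quick way to confirm what each %XX escape stands for. A minimal sketch using the example path above, with the domain omitted:

```python
from urllib.parse import unquote

# Encoded path from the access log (domain omitted)
raw = ("/wp-content/uploads/2007/11/image-name.jpg%22%20width=%2239%22%20"
       "height=%2250%22%20alt=%22image%22/%3E%3C/a%3E%3Ca%20href=%22/m/imgres"
       "?q=some+query+here")

# unquote() turns %XX escapes back into characters but leaves '+' alone
# (unquote_plus() would turn '+' into spaces, as in a form-encoded query)
decoded = unquote(raw)
print(decoded)

# Everything before the first double quote is the real image path
image_path = decoded.split('"', 1)[0]
print(image_path)  # → /wp-content/uploads/2007/11/image-name.jpg
```

The decoded string shows the image URL running straight into the rest of an <img> tag, which is consistent with a src or href value that was cut off at the wrong place.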
This looks like a fragment of HTML markup that displayed the image inside a hyperlink, possibly as part of some kind of search result.
We don't have any links of this form anywhere on our site, so I can only assume this is a reference from somewhere else that is either misinterpreted by Googlebot or simply broken in the page markup.
The Apache access_log records no referrer for these requests. I have tried to find the source in Google Webmaster Tools, but although the unreachable URL is listed, its source is not. I've also tried searching Google for the reference, without success. I did notice that "imgres" in the href fragment is used by Google Image Search, but the rest of the fragment isn't consistent with that.
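To check the referrer, status and user agent of every such request at once rather than line by line, here is a minimal Python sketch. It assumes Apache's standard "combined" LogFormat (an assumption about this server's configuration), and the sample log lines are made up for illustration. Apache writes "-" in the referrer field when the client sent no Referer header.

```python
import re
from collections import Counter

# Apache "combined" LogFormat (assumed):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Percent-encoded HTML debris: %22 = ", %3C = <, %3E = >
DEBRIS = re.compile(r'%22|%3C|%3E', re.IGNORECASE)


def malformed_requests(lines):
    """Yield (path, status, referer, agent) for requests with encoded markup in the path."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line
        parts = m.group("request").split()
        if len(parts) < 2:
            continue  # e.g. an unparseable request line logged as "-"
        path = parts[1]
        if DEBRIS.search(path):
            yield path, m.group("status"), m.group("referer"), m.group("agent")


def referer_counts(lines):
    """Count referrers across all malformed requests ("-" means none was sent)."""
    return Counter(referer for _, _, referer, _ in malformed_requests(lines))


# Made-up sample lines for illustration
sample = [
    '192.0.2.1 - - [03/Jul/2011:10:58:22 +0100] '
    '"GET /wp-content/uploads/2007/11/image-name.jpg%22%20width=%2239%22 HTTP/1.1" '
    '500 612 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [03/Jul/2011:10:59:01 +0100] '
    '"GET /about/ HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0"',
]

for path, status, referer, agent in malformed_requests(sample):
    print(status, referer, path)
```

On the real server you would pass an open log file instead of the sample list, e.g. referer_counts(f).most_common(10); the log's location varies by distribution and configuration.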
My questions:
Should I bother with this, or just let these recursion errors happen (tens of them per day)?
If I need to bother, any suggestions on how to find their source?
Thanks.
More posts by @Bryan171
This happens a lot. I also get hundreds of links where characters have been replaced by HTML entities.
Google Webmaster Tools will tell you where and when the link was found 99% of the time. Just look under Diagnostics -> Crawl Errors.
There you'll find a table listing URLs and, in the third column, the number of pages that link to them. Click where it says 'X pages' to see the X pages on which that link was found and when Googlebot last encountered it.
Try a Google search of the form:
link:<url of image>
That 'should' help you locate what is linking to the image.
You could run HTTrack on your website, set to download everything and follow all links within the site:
www.httrack.com/
You can then check your access log to see whether your site, although it doesn't contain the offending link itself, is actually generating the problem links. These may not be the exact URLs Google Image Search is requesting; the cause could instead be some odd recursion in your WordPress install when it returns a 404 for them.
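Once HTTrack has mirrored the site, the saved pages can also be scanned directly for link targets that contain leaked markup. A minimal sketch using Python's standard html.parser; the mirror directory passed to the hypothetical audit_mirror helper is an assumption, as is the list of characters treated as suspicious. Because HTMLParser unescapes entities in attribute values, links where a quote was written as &quot; are caught as well.

```python
from html.parser import HTMLParser
from pathlib import Path

# Characters that should not appear, raw or percent-encoded, in a clean link
# target; their presence suggests markup leaked into a URL (assumed list)
SUSPICIOUS = ('"', '<', '>', '%22', '%3c', '%3e')


class LinkAuditor(HTMLParser):
    """Collect href/src values that look like they contain leaked HTML."""

    def __init__(self):
        super().__init__()
        self.bad = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag
        for name, value in attrs:
            if name in ("href", "src") and value:
                lowered = value.lower()
                if any(s in lowered for s in SUSPICIOUS):
                    self.bad.append((tag, name, value))


def audit_html(text):
    """Return (tag, attribute, value) for every suspicious link in an HTML string."""
    parser = LinkAuditor()
    parser.feed(text)
    parser.close()
    return parser.bad


def audit_mirror(root):
    """Hypothetical helper: walk a mirror directory and print suspicious links."""
    for page in Path(root).rglob("*.html"):
        for tag, attr, value in audit_html(page.read_text(errors="replace")):
            print(f"{page}: <{tag} {attr}={value!r}>")


# Made-up sample markup: one clean link, one encoded leak, one entity leak
sample_html = ('<a href="/ok.html">ok</a>'
               '<img src="/img.jpg%22%20width=%2239%22">'
               '<a href="/x.jpg&quot; width=&quot;39">x</a>')
for hit in audit_html(sample_html):
    print(hit)
```

If the scan finds nothing, that supports the idea that the broken URLs originate off-site rather than in your own templates.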
You might also want to add some .htaccess rules to deal with the Googlebot requests. If you send a 410 (Gone) or a 301 (Moved Permanently) instead of a 404, the Google Image Search results for your site should be tidier the next time Googlebot crawls it.
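The decision such rules would encode can be prototyped in Python before translating it into mod_rewrite directives (in .htaccess, a RewriteRule with the [R=301] flag issues a permanent redirect and the [G] flag returns 410 Gone). The status_for function and the uploads-only pattern below are hypothetical; the idea is to redirect when the real image path can be recovered and to answer 410 for anything else carrying leaked markup.

```python
import re
from typing import Optional, Tuple

# A WordPress upload path ending in an image extension, followed by leaked
# markup: a literal quote or its percent-encoded form %22 (assumed pattern)
LEAKED_IMAGE = re.compile(
    r'^(?P<clean>/wp-content/uploads/.+?\.(?:jpe?g|png|gif))(?:"|%22)',
    re.IGNORECASE,
)
# Any other path carrying quotes or angle brackets, raw or percent-encoded
LEAKED_ANY = re.compile(r'["<>]|%22|%3C|%3E', re.IGNORECASE)


def status_for(path: str) -> Optional[Tuple[int, Optional[str]]]:
    """Hypothetical helper: decide how to answer a request path.

    Returns (301, clean_url) when the real image URL can be recovered,
    (410, None) for other paths with leaked markup, and None to let the
    request through to normal handling.
    """
    m = LEAKED_IMAGE.match(path)
    if m:
        return 301, m.group("clean")
    if LEAKED_ANY.search(path):
        return 410, None
    return None


print(status_for("/wp-content/uploads/2007/11/image-name.jpg%22%20width=%2239%22"))
print(status_for("/m/imgres%3Fq=x%3C"))
print(status_for("/about/"))
```

A 301 lets Google replace the broken URL with the real image, while a 410 says the URL is gone for good. Since mod_rewrite normally matches against the already-decoded path, the literal-quote branch is the one an actual .htaccess pattern would need to cover.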
© vmapp.org 2024. All rights reserved.