Why is BingBot causing so many 404 errors by removing letters from URLs?
Since the ForceRecrawl problem went away Bing has come back with some new trends.
I'm seeing many URLs missing the last letter, or missing a few letters, and some other weird guessed URLs. It also looks like I'm not the only one.
I'm not getting these URLs from any other bot, and I regularly run a link checker over my site to check for dead links, so they're not coming from my pages. I wish Bing (and maybe all bots) would at least include one referrer in the request header to let us know where they got the link from (I know they might have more than one reference, but having one is a nice start).
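For anyone who wants to check their own logs for the same thing, here is a minimal sketch that pulls out the 404s for one bot together with whatever referrer was sent. It assumes the access log is in the common Apache/nginx "combined" format; the function name `bot_404s` and the `agent_substring` parameter are made up for illustration, so adjust the pattern if your log layout differs.

```python
import re

# Assumption: "combined" log format, i.e.
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>.*?) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def bot_404s(lines, agent_substring="bingbot"):
    """Yield (path, referer) for each 404 whose user agent contains agent_substring."""
    needle = agent_substring.lower()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        if match["status"] == "404" and needle in match["agent"].lower():
            yield match["path"], match["referer"]
```

Passing an open log file (for example `bot_404s(open("access.log"))`, where the file name is a placeholder) gives the list of paths to investigate. A referrer of `-` on every hit would confirm that the bot is not saying where the link came from.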
I'm also having trouble understanding Bing's indexing strategy: they index about 25% of the number of pages that Google indexes, then suddenly throw half of them out and start building up again slowly.
Is Bing trying to alter the URLs to see whether it can reach pages by "guessing" URLs instead of harvesting them from the normal navigation mechanism? Maybe it can't parse JavaScript menus? I don't know, but they are doing something crazy!
Slightly off topic, but a nice conspiracy theory: there's another bot called "Ezooms/1.0" that's doing something similar: it adds a space after a dash it finds in URLs (I think it's always after the first dash in the URL). Comparing the patterns, I'd almost think these two bots were written by the same developer (though the mysterious Ezooms has a Gmail address in its user agent string).
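To put rough numbers on the patterns described above, something like the following could sort each 404 path into "truncated", "letters dropped", or "space after dash" by comparing it against the list of real paths on the site. This is only a sketch: `classify_404`, the labels, and the `max_missing` cutoff are invented for illustration, and the subsequence test can produce false matches on very short paths.

```python
def is_subsequence(short, full):
    """True if every character of `short` appears in `full` in the same order."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)

def classify_404(path, known_paths, max_missing=3):
    """Guess how a 404 path relates to a real one.

    Returns (label, matching_known_path), or ("unknown", None).
    `known_paths` is assumed to be a set of valid paths, e.g. from a sitemap.
    """
    # Pattern seen from Ezooms: a space (raw or percent-encoded) after a dash.
    for variant in (path.replace("- ", "-"), path.replace("-%20", "-")):
        if variant != path and variant in known_paths:
            return "space_after_dash", variant

    # Only consider real paths that are slightly longer than the requested one.
    candidates = [k for k in sorted(known_paths)
                  if 0 < len(k) - len(path) <= max_missing]

    # Pattern 1: the end of the URL is cut off.
    for known in candidates:
        if known.startswith(path):
            return "truncated", known

    # Pattern 2: a few letters are missing from the middle.
    for known in candidates:
        if is_subsequence(path, known):
            return "letters_dropped", known

    return "unknown", None
```

For example, with `known_paths = {"/blog/my-first-post"}`, the path `/blog/my-first-pos` is reported as `truncated`. Paths that come back as `unknown` are the ones worth tracing to some other source.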
More posts by @Angela700
1 Comment
Have you looked at the Bing Webmaster Tools at all?
You can sign up for them, and the process of claiming your domains is the same as for Google Webmaster Tools.
These can then give you a full list of crawl stats including links leading to 404s.
Note that if you've removed content you'll see 0 links, because the bot is requesting pages it knew about before.
Another possibility is that these links are coming from on-page scripts: I had an issue on a site where we were building a link for an advert call in JavaScript, with some of it rendered server-side. The bots would find this partial URL in the source and attempt to follow it.
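A quick way to test for that is to list every quoted, URL-looking string inside the inline scripts of a page and compare them with the 404s in the log. The sketch below uses only the Python standard library; `ScriptCollector` and `url_fragments_in_scripts` are hypothetical names, and the regular expression is a deliberately crude guess at what a crawler might treat as a link, not a description of how Bing actually parses pages.

```python
import re
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collects the text content of inline <script> blocks."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)

# Assumption: a "URL-like" string is a quoted literal starting with "/" or "http(s)://".
URL_LIKE = re.compile(r"""["']((?:https?:)?/[^"'\s]*)["']""")

def url_fragments_in_scripts(html):
    """Return quoted URL-like fragments found inside inline scripts."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    found = []
    for script in collector.scripts:
        found.extend(URL_LIKE.findall(script))
    return found
```

If a fragment such as `/ads/banner-` turns up here and also appears as a 404 in the log, the script is the likely source, and building the string differently (or moving it into an external file) should stop the requests.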