Why is Google Webmaster Tools crawling invalid URLs and showing 500 errors?
Google Webmaster Tools is reporting 12k+ 500 errors. Eeek!
None of the URLs are valid: they all contain youtube.com. First, why is Google crawling these URLs if they don't exist? I supplied a sitemap, and of course they are not in the sitemap.
I don't have a robots.txt blocking anything. I've checked for invalid redirects (none) and for unclosed tags or anything else that could throw youtube.com into a URL by accident (none).
In every 'linked from' entry, the referring URL is also a bad URL with youtube.com in it. Webmaster Tools reports no malware, and I can't check the server logs because the host won't give me access.
Really stuck!! Any ideas appreciated!
2 Comments
Google does not index a site all at once.
It starts with the highest-level pages. Then, after a few days, it crawls one level deeper, that is, the pages it found links to on the first level, and so on.
In this way Google tries to reach every page on the site, building a hierarchical tree of links so that it knows which pages link to each page.
After some time Google comes back to each indexed page and checks whether its content has changed. The recrawl interval for each page and each site depends on many factors.
So if you delete a page and update all the links to it on every other page, Google does not know that immediately; it will still try to crawl the deleted page because that crawl is already in its schedule.
There are (at least) two common reasons why strange and mangled URLs may show up as crawl errors in Webmaster Tools.
The first possibility is that someone has copied your pages (or some other pages that link to yours) and mangled the links in the process. This happens more often than you might think; see e.g. the sixth question in this Google Webmaster blog post.
The other possibility is that Googlebot itself is trying to follow what it thinks are JavaScript links and making a mess of it. You can usually tell these two cases apart by visiting the referring page (which should exist and be accessible, if Google managed to crawl it to begin with) and looking for the name of the target page in its source.
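To illustrate that check, here is a minimal sketch in Python (standard library only): fetch the referring page reported in Webmaster Tools and see whether the broken target actually appears in its HTML source. Both URLs in the snippet are hypothetical placeholders; substitute the real "linked from" URL and bad URL from your crawl error report.

```python
# Minimal sketch: does the "linked from" page really contain the broken link?
# Both values below are hypothetical placeholders from a crawl error report.
import urllib.request

referring_url = "http://example.com/some-page/"      # hypothetical "linked from" URL
broken_target = "www.youtube.com/watch?v=abc123"     # hypothetical fragment of the bad URL

with urllib.request.urlopen(referring_url, timeout=10) as response:
    html = response.read().decode("utf-8", errors="replace")

if broken_target in html:
    # The mangled link really is in the page source: likely a scraped or mangled copy.
    print("Broken link found in the referring page's HTML.")
else:
    # Not in the raw HTML: Googlebot may have constructed it from JavaScript on the page.
    print("Broken link not in the HTML source; possibly a JavaScript-derived URL.")
```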
Either way, there are basically two things you can do: either just ignore the links, or come up with some rewrite rules to try and map the broken URLs into working ones. If you can see an obvious pattern in the URLs, and are familiar with regexps, I'd recommend the latter approach — it'll clean up your crawl error list and maybe even give you a small and rather cheesy, but real, PageRank boost.
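As a rough illustration of the regexp approach, and assuming (hypothetically) that the broken URLs simply have a youtube.com fragment appended to an otherwise valid path, a pattern along these lines could recover the working URL. The same mapping could then be expressed as a server-level 301 redirect, for example with Apache's mod_rewrite or an nginx rewrite rule.

```python
# Sketch of the regexp mapping, assuming (hypothetically) that the mangled URLs
# append a youtube.com fragment to an otherwise valid path, e.g.
#   /articles/some-post/www.youtube.com/watch?v=abc123
# Adjust the pattern to match whatever the real URLs in your report look like.
import re

# Capture everything before the stray "www.youtube.com" segment.
PATTERN = re.compile(r"^(/.*?)/?(?:www\.)?youtube\.com/.*$")

def clean_url(broken_path):
    """Return the recovered path, or None if the URL doesn't fit the pattern."""
    m = PATTERN.match(broken_path)
    if not m:
        return None
    return m.group(1)

print(clean_url("/articles/some-post/www.youtube.com/watch?v=abc123"))
# -> /articles/some-post
```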
A third option, if you find that someone's been copying your content without permission, is to try and get them delisted. You could even send a complaint (and/or a formal takedown request) to their hosting provider, if you believe it justified. Of course, given that they are apparently linking back to your site, you might not necessarily find that worth the effort.