Determining which sitemap entries are not indexed by Google
We submit a sitemap to Google and can see it being indexed in Webmaster Tools. We have 5140 entries, and have broken this up into 10 child sitemaps of 500 each. This all seems to be working well.
Google, however, is not indexing all entries in three of the child sitemaps (all the others seem to be fully indexed). The number of entries indexed has remained static for the last 6 weeks.
We would now like to determine which URLs are not being indexed by Google, to see whether there is a content issue or some other cause.
Is there any way to determine which URLs are not being added to the index, other than manually going through all 500 entries using the 'site:' operator on Google?
More posts by @Sent6035632
I will reply as an answer because I can't fit everything in a comment:
You could use a few tools, but I will describe the way I am most comfortable with, PHP + cURL + DOMDocument.
First you need to generate the search queries, probably by reading the URLs from your sitemap and building one 'site:' query per URL.
$url = 'https://www.google.com/search?q=' . urlencode("site:" . $sitemap_url);
(Add more parameters to the search URL if needed.)
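To make the "reading your sitemap" step concrete, here is a minimal sketch that pulls every `<loc>` value out of a sitemap and builds one search URL per entry. The helper names `sitemapUrls` and `siteQueries` and the sample sitemap are made up for illustration; in practice you would load your own child sitemap file instead of the inline string.

```php
<?php
// Hypothetical helper: extract every <loc> value from a sitemap XML string.
function sitemapUrls(string $xml): array
{
    if ($xml === '') {
        return [];
    }
    $dom = new DOMDocument();
    if (!@$dom->loadXML($xml)) {
        return []; // not well-formed XML
    }
    $urls = [];
    foreach ($dom->getElementsByTagName('loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }
    return $urls;
}

// Hypothetical helper: build one "site:" search URL per sitemap entry,
// keyed by the page URL so results can be matched back later.
function siteQueries(array $urls): array
{
    $queries = [];
    foreach ($urls as $url) {
        $queries[$url] = 'https://www.google.com/search?q=' . urlencode('site:' . $url);
    }
    return $queries;
}

// Made-up sample data standing in for one child sitemap.
$sampleSitemap = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b?x=1</loc></url>
</urlset>
XML;

$queries = siteQueries(sitemapUrls($sampleSitemap));
foreach ($queries as $page => $searchUrl) {
    echo $page, ' => ', $searchUrl, PHP_EOL;
}
```

Each value in `$queries` can then be used as the `$url` for the cURL request.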
Then you probably want to fake your user agent and set any other cURL options you need.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Your fake useragent string here');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch); // HTML of the results page, or false on failure
$dom = new DOMDocument();
@$dom->loadHTML($output); // @ suppresses warnings about malformed HTML
Now that you have the Google search results loaded in a DOMDocument, you can parse them and check whether your URL is actually present in the results. If it is, it's indexed.
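The check itself could look something like the sketch below. It assumes result links show up as `<a href>` attributes that contain the page URL, possibly URL-encoded inside a redirect link. That is only an assumption about the results markup, which is not documented and can change. The function name `urlAppearsInResults` and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper: true if any <a href> in $html contains $targetUrl.
// This is a plain substring match, so "https://example.com/a" would also
// match "https://example.com/ab"; tighten the comparison if that matters.
function urlAppearsInResults(string $html, string $targetUrl): bool
{
    if ($html === '') {
        return false;
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings about malformed HTML
    $needle = rtrim($targetUrl, '/');
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Decode so URLs embedded in redirect-style links are also matched.
        $href = urldecode($link->getAttribute('href'));
        if (strpos($href, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Made-up sample markup standing in for a fetched results page.
$sampleHtml = '<html><body>'
    . '<a href="/url?q=https%3A%2F%2Fexample.com%2Fa&amp;sa=U">Result A</a>'
    . '<a href="https://other.example/">Other</a>'
    . '</body></html>';

var_dump(urlAppearsInResults($sampleHtml, 'https://example.com/a'));
var_dump(urlAppearsInResults($sampleHtml, 'https://example.com/missing'));
```

Running this over every entry of a child sitemap and collecting the URLs that return false gives you the list of candidates that are not indexed.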
Hope this helps