
Determining which sitemap entries are not indexed by Google

@Sent6035632

Posted in: #Sitemap

We submit a sitemap to Google and can see it being indexed in Webmaster Tools. We have 5140 entries, and have broken these up into 10 child sitemaps of 500 each. This all seems to be working well.

Google, however, is not indexing all entries in three of the child sitemaps (all the others seem to be fully indexed). The number of entries indexed has remained static for the last 6 weeks.

We would now like to determine which URLs are not being indexed by Google, to see whether there is a content issue or some other cause.

Is there any way to determine which URLs are not being added to the index, other than manually going through all 500 entries using the 'site' operator on Google?


1 Comment


@Sue5673885

I will reply as an answer because I can't fit everything in a comment:

You could use a few tools, but I will describe the way I am most comfortable with: PHP + cURL + DOMDocument.

First you need to generate the search queries, probably by reading the URLs from your sitemap and building one 'site:' query per URL.

$url = 'https://www.google.com/search?q=' . urlencode("site:".$sitemap_url);


(Add more parameters to the search URL if needed.)
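As a rough sketch of that first step, something like the following could read the sitemap and build one query per entry. The function name buildSiteQueries is made up for illustration, and the code assumes the sitemap follows the standard urlset / url / loc layout; adapt it if yours differs.

```php
<?php
// Hypothetical helper (not part of any library): turn sitemap XML into
// one Google "site:" query URL per <loc> entry.
function buildSiteQueries(string $sitemapXml): array
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($sitemapXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return []; // not parseable as XML
    }

    $queries = [];
    // Assumption: each page lives in <urlset><url><loc>...</loc></url></urlset>.
    foreach ($xml->url as $entry) {
        $pageUrl = trim((string) $entry->loc);
        if ($pageUrl === '') {
            continue;
        }
        // Key by the page URL so results can be matched back to it later.
        $queries[$pageUrl] = 'https://www.google.com/search?q='
            . urlencode('site:' . $pageUrl);
    }
    return $queries;
}
```

You would call it with the contents of one child sitemap, e.g. buildSiteQueries(file_get_contents('sitemap1.xml')), where the file name is only a placeholder, and then loop over the returned array.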

Then you will probably want to send a browser-like user agent string and set any other cURL options you need.

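$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Your fake useragent string here');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it

$output = curl_exec($ch);
if ($output === false) {
    die('cURL error: ' . curl_error($ch));
}

// The @ suppresses warnings caused by malformed HTML in the response
$dom = new DOMDocument(); @ $dom->loadHTML($output);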


Now that you have the Google results page loaded in a DOMDocument, you can parse it and check whether your URL actually appears among the result links. If it does, it's indexed.
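A minimal sketch of that check might look like this. The function name isUrlInResults is hypothetical, and the handling of links wrapped as /url?q=... is an assumption about how the results page is marked up; that markup can change, so inspect a real response before relying on it.

```php
<?php
// Hypothetical helper (not part of any library): report whether $targetUrl
// appears among the links of a results page given as raw HTML.
function isUrlInResults(string $html, string $targetUrl): bool
{
    if (trim($html) === '') {
        return false; // nothing to parse
    }

    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare without trailing slashes so /page and /page/ match.
    $wanted = rtrim($targetUrl, '/');

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $link) {
        $href = $link->getAttribute('href');

        // Assumption: result links may be wrapped as /url?q=<address>&...
        if (strpos($href, '/url?') === 0) {
            parse_str(substr($href, 5), $params);
            if (isset($params['q']) && is_string($params['q'])) {
                $href = $params['q'];
            }
        }

        if (rtrim($href, '/') === $wanted) {
            return true;
        }
    }
    return false;
}
```

Passing the $output from the cURL request together with the sitemap URL you queried gives a yes/no answer per entry; collecting the entries that return false produces the list of URLs to investigate.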

Hope this helps
