How do I get a list of all indexed links?
I am looking for a way to take every link that Google has indexed for my site and export it to a CSV file. Recently Google has indexed far more pages than my site actually has, and I want to find out where all these extra pages are coming from without paging through every search result.
3 Comments
I ended up drilling down to the problematic sub-folder by searching for site:domain.com/foo/bar/, but along the way I did come across a method for getting the search results into a spreadsheet.
Open up a Google Docs spreadsheet and use this formula:
=importXml("www.google.com/search?q=site:domain.com&num=100&start=1"; "//cite")
It will only fetch the first 100 results, but you can use it again to get the next 100; just change the start parameter:
=importXml("www.google.com/search?q=site:domain.com&num=100&start=100"; "//cite")
This will only provide up to 1000 results, as mentioned previously by DisgruntledGoat, but the formula can be changed to provide links from specific sub-directories:
=importXml("www.google.com/search?q=site:domain.com/foo/bar/&num=100&start=1";
"//cite")
Unfortunately there is no way to get a full list of every indexed page in Google. Even milo5b's solution will only get you at most 1,000 URLs.
It sounds like you have some duplicate-content issues. In Webmaster Tools, check Health > Index Status; it shows a cumulative total of pages indexed over time. If the graph makes a big leap at some point, you may be able to work out which change on your site triggered the jump.
You could also try using Bing's Webmaster Tools. They have an Index Explorer which could help you find the URLs. Search engine spiders are quite similar so if Google found those links, Bing probably did too.
I thought Bing had a way to export most of its data but I cannot find it on a cursory glance. There is an API though so you could probably use that to extract everything.
You could write a script that parses Google's SERPs (for example with PHP and cURL) and stores each link in a CSV file. Be careful to make your script behave like a human, because Google may ban your IP from search results for a few hours if you abuse this.
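A minimal sketch of that idea, assuming Google still wraps result links in "/url?q=..." redirects and still honours num=100 (both change without notice, and scraping the SERPs is against Google's terms of service, so treat this as illustrative only):

<?php
// Fetch Google SERPs for a site: query and write each result URL to a CSV.
$domain = 'domain.com'; // site to query
$csv = fopen('indexed-urls.csv', 'w');

for ($start = 0; $start < 1000; $start += 100) {
    $ch = curl_init('https://www.google.com/search?q=site:' . urlencode($domain)
        . '&num=100&start=' . $start);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // A browser-like User-Agent makes the request look less like a bot.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
    $html = curl_exec($ch);
    curl_close($ch);
    if ($html === false) {
        break;
    }

    // Pull the target URL out of each "/url?q=<target>&..." redirect link.
    preg_match_all('#/url\?q=([^&"]+)#', $html, $matches);
    if (empty($matches[1])) {
        break; // no more results, or the markup has changed
    }
    foreach (array_unique($matches[1]) as $url) {
        fputcsv($csv, array(urldecode($url)));
    }

    sleep(rand(10, 30)); // pause between pages to behave more like a human
}
fclose($csv);

The random pause between requests is the important part: a fixed, rapid-fire fetch loop is exactly the pattern that gets an IP temporarily blocked.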