How to properly remove URLs from Google's index?

@Pierce454

Posted in: #GoogleIndex #GoogleSearchConsole #Indexing #XmlSitemap

On some of our sites, we now have several thousand pages that dilute our website's keyword density. The website is an MVC site with SEO routing.

If I submit a new sitemap containing only the 2000 or so pages that we want indexed, will Google re-index the site with only those 2000 pages and drop the superfluous ones, even though navigating to the diluting pages still works?

For example, I want to keep roughly 2000 of the following:
mysite.com/some-search-term-1/some-good-keywords
www.mysite.com/some-search-term-2/some-more-good-keywords


And remove several thousand of the following that have already been indexed.
mysite.com/some-search-term-xx/some-poor-keywords
www.mysite.com/some-search-term-xx/some-poor-more-keywords


These pages are not actually "removed", as navigating to these URLs still renders a page. Even though there are potentially hundreds of thousands of pages, I only want about 2000 to be re-indexed and retained, and the others removed from the index (without having to do this manually).

Thanks.


3 Comments


 

@Fox8124981

Another way is through Google Webmaster Tools, using the option in the left menu under Optimization. I have done this before and it is very quick (quicker than updating your robots.txt).



 

@Samaraweera270

An XML Sitemap doesn't govern the content that a search engine will index. It's an aid to discovery, but whether or not a piece of content is listed in a Sitemap has nothing to do with whether or not it's indexed. Info here.

Robots.txt, as discussed above, may prevent crawling but will not prevent indexing. If a page is blocked by robots.txt but is linked to, either internally or externally, it stands a chance of appearing in the index, with a "blocked by robots.txt" notification in the result snippet.
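To make the crawl-versus-index distinction concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The Disallow rule, the "Googlebot" agent string, and the full URLs are assumptions built from the placeholder paths in the question. All the check can answer is "may this URL be fetched?"; nothing in it says whether the URL stays in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking the "poor keyword" section (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /some-search-term-xx/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/some-search-term-1/some-good-keywords",
        "http://www.mysite.com/some-search-term-xx/some-poor-keywords",
    ):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

A "blocked" result here only means the crawler will not request the page. If other pages link to it, the URL can still be listed.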

I believe the only reliable way to do what you want is noindex, applied either through the HTML robots meta tag or through the equivalent X-Robots-Tag HTTP header.
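As a rough illustration of the noindex approach, the sketch below decides per request path whether to send noindex. It is written in Python rather than for any particular MVC framework. The KEEP_INDEXED allowlist and both function names are hypothetical, and the paths are the placeholders from the question.

```python
# Hypothetical allowlist: the ~2000 paths that should remain indexed.
KEEP_INDEXED = {
    "/some-search-term-1/some-good-keywords",
    "/some-search-term-2/some-more-good-keywords",
}


def robots_headers(path, keep=KEEP_INDEXED):
    """Return extra HTTP response headers for `path` as (name, value) pairs."""
    normalized = path.rstrip("/") or "/"  # treat "/a/b/" and "/a/b" alike
    if normalized in keep:
        return []  # no directive: the page stays eligible for indexing
    return [("X-Robots-Tag", "noindex")]


def robots_meta_tag(path, keep=KEEP_INDEXED):
    """Return the equivalent tag for the page's <head>, or '' if indexable."""
    if robots_headers(path, keep):
        return '<meta name="robots" content="noindex">'
    return ""
```

Only one of the two forms is needed per page. For either to work the page has to stay crawlable: if robots.txt blocks it, the crawler never fetches the response and so never sees the noindex.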

However, I question the whole premise. I wouldn't be concerned with "diluting your website's keyword density" - it's basically meaningless. Does the content have value and purpose? If so, are visitors finding it easily, or could it be improved? If the content is poor and serves no purpose, get rid of it. Base this process on - and assess your progress with - analytics data.



 

@Cugini213

You can tell Google not to crawl specific parts of your site with robots.txt, you can use the sitemap, and you can use a few other techniques. In the end, though, if there is a way to reach the pages by following links, forum threads, or pages on somebody else's site, Google will find them.

If the good and the bad pages are related closely enough, you can use the canonical tag to point the weaker pages at the ones you want to rank. The pages have to be similar, though; otherwise you are misusing the tag.
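For reference, the canonical tag is a link element in the head of the weaker page that names the preferred URL. A minimal Python sketch that renders it follows; the function name and the https://www.mysite.com URL in the usage line are assumptions for illustration.

```python
from html import escape


def canonical_link(preferred_url):
    """Render a <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & and " cannot break the attribute.
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))


if __name__ == "__main__":
    print(canonical_link("https://www.mysite.com/some-search-term-1/some-good-keywords"))
```

The tag goes on the near-duplicate page and points at the page you want kept. It is a hint to the search engine rather than a removal instruction.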

If you have a forum, it would be much better to do some cleanup and remove old, inaccurate, or unfinished threads. Those are not useful and dilute your ranking.

If you have another kind of site, give us a description of it or a link so we can suggest a better approach.
