Will the already indexed URLs be removed if I use a robots.txt for my site?

@Kaufman445

Posted in: #GoogleSearch #Indexing #RobotsTxt #Seo

Many URLs from my site have been added to Google's index, but a lot of them are outdated: they never return a 404 error and instead take the user to my site's homepage.

I have submitted a new sitemap with my latest URLs, but the old and obsolete URLs of my site are still shown at the top of Google search results. There are hundreds of such URLs.

I know about URL removal requests and robots.txt. Submitting removal requests would take a lot of time and effort, so I would like to use robots.txt instead. If I list the old URLs in my robots.txt using a wildcard expression that matches them, will Google remove them from its index? Or will it just stop crawling them, meaning they will not be re-indexed but the already indexed old URLs will still be shown in Google search, which is not what I want? Can you please let me know what I should do?
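For context, a wildcard Disallow rule looks like the following; the paths are purely illustrative and would need to match your site's actual old URL structure (Google supports `*` and `$` in robots.txt patterns, though this only stops crawling, not indexing):

```
User-agent: *
# Block crawling of an old section (illustrative path)
Disallow: /old-pages/
# Wildcard: block any URL whose path contains "-obsolete-"
Disallow: /*-obsolete-
```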


3 Comments


@LarsenBagley505

From Google's FAQ, "If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results?" (developers.google.com/webmasters/control-crawl-index/docs/faq):

Blocking Google from crawling a page is likely to decrease that page's
ranking or cause it to drop out altogether over time. It may also
reduce the amount of detail provided to users in the text below the
search result. This is because without the page's content, the search
engine has much less information to work with.

However, robots.txt Disallow does not guarantee that a page will not
appear in results: Google may still decide, based on external
information such as incoming links, that it is relevant. If you wish
to explicitly block a page from being indexed, you should instead use
the noindex robots meta tag or X-Robots-Tag HTTP header. In this case,
you should not disallow the page in robots.txt, because the page must
be crawled in order for the tag to be seen and obeyed.
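To illustrate the noindex approach the FAQ describes, it can be applied either in the page markup or as an HTTP response header; the snippet below is a generic example, not specific to the asker's site:

```html
<!-- Option 1: in the <head> of each outdated page -->
<meta name="robots" content="noindex">

<!-- Option 2: served as an HTTP response header instead
     (configured on the server for the matching URLs):
     X-Robots-Tag: noindex -->
```

Either form works only if the page is NOT disallowed in robots.txt, since Google must crawl the page to see the directive.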


@Annie201

This is Google's official answer, from the help page "Completely remove an entire page":

If you want a page removed, you have to submit a removal request through the Google Webmaster Tools page AND use the robots.txt file to exclude the page so Google doesn't index it again.

They state in the link above that if the page exists in Google's index and you use the robots.txt file alone to exclude it, it may still be indexed by Google:


If the page still exists, use robots.txt to prevent Google from
crawling it. Even if a URL is disallowed by robots.txt we may still
index the page if we find its URL on another site. However, we won't
index the page if it's blocked in robots.txt and there is an active
URL removal request for the page.


@Debbie626

You should make sure that the outdated pages either 301 redirect to your home page or return a 404 or 410 status code.
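As a sketch of that approach, a server-side rule could map outdated paths to a 410 Gone status; the URL pattern below is hypothetical and would need to be adjusted to match the site's actual old URLs:

```python
import re

# Hypothetical pattern for the outdated URLs (adjust to your site).
OLD_URL_PATTERN = re.compile(r"^/old-pages/")

def status_for(path: str) -> int:
    """Return the HTTP status to serve: 410 Gone for outdated
    paths, 200 OK for everything else."""
    if OLD_URL_PATTERN.match(path):
        return 410  # signals the page is permanently gone
    return 200
```

A 410 is a slightly stronger signal than a 404 that the page is gone for good, which can speed up removal from the index.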

Google will eventually remove the pages from search results if you block the URLs in robots.txt, but that can take some time.

The fastest way is to use Webmaster Tools and submit removal requests for the URLs there.
