
On-demand removal - should I block off with robots.txt?

@Sue5673885

Posted in: #GoogleSearchConsole

Has anyone tried the on-demand removal tool?
developers.google.com/custom-search/docs/indexing
I created a sitemap with a bunch of old URLs, with an expiration, that now return 404. I also edited my robots.txt to disallow crawling of the old directory. When I submit my new sitemap of expired links, though, I get a warning message that the robots.txt is blocking that directory.

So I'm not sure if I need to remove it from robots.txt or leave it. I keep reading conflicting things about whether or not to edit the robots.txt.





1 Comment


 

@Ravi8258870

When I submit my new sitemap of expired links, though,


You should not use expired URLs in a sitemap. Sitemaps are only for current URLs; there is no valid syntax in a sitemap to indicate a removed URL.

From www.sitemaps.org/protocol.html:


Excluding content

The Sitemaps protocol enables you to let search engines know what
content you would like indexed. To tell search engines the content you
don't want indexed, use a robots.txt file or robots meta tag. See
robotstxt.org for more information on how to exclude content from
search engines.
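
For reference, the two exclusion mechanisms mentioned there look roughly like this (the directory and page names are just examples):

    # robots.txt - block crawling of a whole directory
    User-agent: *
    Disallow: /private/

    <!-- robots meta tag, placed in the <head> of an individual page
         you want kept out of the index -->
    <meta name="robots" content="noindex">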

I get a warning message that the robots.txt is blocking that directory.


What's happening is that Googlebot sees the URL in your sitemap and tries to access it, but gets blocked by your robots.txt.

It's a misunderstanding: you think you're asking Google to delete the URL, while Googlebot thinks you're asking it to crawl a link, and then it gets blocked by your robots.txt.
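
In other words, a combination like this (hypothetical paths) produces exactly that warning:

    # robots.txt - forbids crawling of the old directory
    User-agent: *
    Disallow: /old-directory/

    <!-- fragment of the submitted sitemap - asks Google to crawl
         a URL inside that same directory -->
    <url>
      <loc>http://www.example.com/old-directory/removed-page.html</loc>
    </url>

The sitemap tells Googlebot to fetch the URL, the robots.txt rule forbids the fetch, and Search Console reports the conflict as a warning.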


So I'm not sure if I need to remove it from robots.txt or leave it. I keep reading conflicting things about whether or not to edit the robots.txt.


The correct procedure is as follows ...


configure your web server to return a 404 response for all deleted URLs (see the sketch after this list)
configure your web server to show the visitor a helpful "404 help" page (optional but nice)
remove the deleted URLs from your robots.txt (this will allow Google to request them, get the 404 response, and correctly understand that the URLs have been deleted)
submit an updated sitemap that contains only currently existing URLs that you want Google to crawl and index.
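
A minimal sketch of that end state, assuming an nginx server and a hypothetical /old-directory/ path (other servers have equivalent settings):

    # nginx site config - one possible way to 404 the removed directory
    error_page 404 /404.html;        # optional: serve a helpful "404 help" page
    location /old-directory/ {
        return 404;                  # every URL under the removed directory now answers 404
    }

    # robots.txt - the old Disallow rule is gone, so Googlebot may request
    # the deleted URLs and see the 404 responses for itself
    User-agent: *
    Disallow:

    <!-- sitemap.xml fragment - lists only pages that still exist -->
    <url>
      <loc>http://www.example.com/current-page.html</loc>
    </url>

If the deleted files are simply gone from the server, most servers return 404 on their own and the explicit location block is unnecessary.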


As regards


Has anyone tried the on-demand removal tool?

developers.google.com/custom-search/docs/indexing

The "on-demand removal" tool only applies to "Google Custom Search". It has no effect on the general Google search public search index.

Think of "Google Custom Search" as an entirely separate private search engine just for your site. It just happens to be implemented by Google.

Note that robots.txt has no effect on "Google Custom Search".

Note that "Google Custom Search" & the general Google search public search index use separate sitemaps that do not influence each other.

You can create and manage "Custom Search" instances from within "Google Search Console" (formerly "Google Webmaster Tools").

Log into "Search Console", click "Other Resources" at the bottom of the left menu. Then click on

"Custom Search : Harness the power of Google to create a customized search experience for your own website." in the list displayed.

Assuming you are using a "Custom Search" and wish to remove URLs from its index immediately, you must prefix each URL with a "-" symbol and submit it in the "On-demand indexing using individual URLs" tool within the "Other Resources -> Custom Search" section of the "Google Search Console" website.
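
For example, with a hypothetical URL, a removal entry submitted to that tool would look like:

    -http://www.example.com/old-directory/removed-page.html

The leading "-" marks the URL for removal rather than for (re)indexing.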

If it's not that urgent, you can remove URLs from a "Custom Search" index by submitting an updated sitemap that applies only to that "Custom Search" index.


