
HTTP: How to be removed from search engines at a certain point in the future?

@Mendez628

Posted in: #Http #SearchEngines

Is there a way to tell search engines that a page they crawl should be included in the search results now, but must be removed at a certain time in the future?

I have a website where hundreds of publications are posted each day, and I want them to be crawled and searchable. However, I am legally required to remove the information after a while (with an individual date for each page).

After that date, the page will no longer be visible on my website (HTTP response 410 Gone), but it will linger in e.g. the Google cache for a while, which could cause legal issues for me. Obviously, it is not viable to issue hundreds of content removal requests to Google by hand. On the other hand, the individual pages do not get modified for some months before they have to be discarded, so Googlebot won't check in often.

From what I understand, the HTTP Expires header is a marker for minimum freshness, not for maximum lifetime, correct? I am sending Last-Modified and ETag headers, but they don't help here. Is there any way to say "cache, but only until 2011-08-15"?
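For context, a minimal sketch of the setup described above, assuming a Flask app and a hypothetical in-memory table of per-page removal dates (REMOVAL_DATES and the route name are illustrative, not from the original post):

from datetime import datetime, timezone
from flask import Flask, abort

app = Flask(__name__)

# Hypothetical per-page removal dates; in practice these would come from a database.
REMOVAL_DATES = {
    "pub-123": datetime(2011, 8, 15, tzinfo=timezone.utc),
}

@app.route("/publications/<page_id>")
def publication(page_id):
    removal = REMOVAL_DATES.get(page_id)
    if removal is None or datetime.now(timezone.utc) >= removal:
        abort(410)  # 410 Gone: the page was removed on purpose
    return f"Publication {page_id}"  # placeholder for the real page body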


2 Comments


@Mendez628

For Google there is a meta tag called unavailable_after, which does exactly what I was looking for: it tells Google to remove a certain page at a specific time in the future.

It is the only way to achieve what I was hoping to accomplish: getting the pages removed automatically, at the right time, without relying on the crawler to come back and notice the 410 Gone response, which can take weeks after the content has been removed.

Example:


<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST" />


Or with an HTTP header, for PDFs etc.:


X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST
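
Since each page has its own removal date, the tag and header would presumably be generated per page. A rough sketch of one way to do that (my addition, assuming Flask; only the tag and header formats come from the examples above):

from datetime import datetime, timezone
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/publications/<page_id>")
def publication(page_id):
    # Hypothetical per-page removal date; would normally be looked up by page_id.
    removal_date = datetime(2011, 8, 15, 15, 0, 0, tzinfo=timezone.utc)
    stamp = removal_date.strftime("%d-%b-%Y %H:%M:%S") + " GMT"
    html = ('<meta name="googlebot" content="unavailable_after: %s">' % stamp
            + "<p>publication body</p>")
    resp = make_response(html)
    # The same signal as an HTTP header, for PDFs and other non-HTML resources:
    resp.headers["X-Robots-Tag"] = "unavailable_after: " + stamp
    return resp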


Sources: googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html and www.google.com/support/webmasters/bin/answer.py?answer=79812
I could not find out whether Bing, Yahoo, and others have adopted this Google-specific tag.


@Merenda212

First of all, you don't have control over what search engines crawl and what they put in their index.

BUT Google, for instance, takes your information about the lifetime of your pages very seriously. So if you add the correct HTTP header, it will consider that information. You can also add rules to your robots.txt about which pages should no longer be crawled.

There are also the Google Webmaster Tools, where you can ask Google to remove pages from the index.

On the official Google Webmaster blog you will find very helpful information about removing URLs from the index and how to reinclude content. There they say you can remove URLs by the following (a combined sketch follows the list):


using 410 Gone,
robots.txt, or
the noindex meta tag
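
A compact sketch (my illustration, not from the blog post) of what those three signals can look like in one hypothetical Flask app:

from flask import Flask, abort

app = Flask(__name__)

@app.route("/robots.txt")
def robots():
    # robots.txt: tell crawlers to stay away from a whole section.
    body = "User-agent: *\nDisallow: /private/\n"
    return body, 200, {"Content-Type": "text/plain"}

@app.route("/expired/<page_id>")
def expired(page_id):
    # 410 Gone: the page existed but was removed on purpose.
    # (Don't also Disallow these URLs, or the crawler never sees the 410.)
    abort(410)

@app.route("/publications/<page_id>")
def noindex(page_id):
    # noindex: the page stays reachable but asks engines to drop it from results.
    return '<meta name="robots" content="noindex"><p>publication body</p>'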
