
Removing full site from Google index

@Margaret670

Posted in: #GoogleIndex #GoogleSearchConsole

I want to remove my site's content from the Google index. Google had indexed a huge number of my website's pages, about 5,000,000 earlier, but 3,025,000 pages still remain.

I have done the following, but pages are being removed very slowly.

Robots.txt:

User-agent: *
Disallow: /


.htaccess:

rewriteengine on
rewritecond %{HTTP_USER_AGENT} ^.*Googlebot/2.1.*$
rewriterule .* - [F,L]


This is the response returned to Googlebot when it tries to crawl the content:

HTTP/1.1 410 Gone
Date: Sat, 05 Jan 2013 12:39:23 GMT
Server: Apache/2.2.23 (Unix) mod_ssl/2.2.23 OpenSSL/0.9.8e-fips-rhel5 mod_fastcgi/2.4.6 mod_jk/1.2.37 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.3.19
Content-Length: 661
Connection: close
Content-Type: text/html; charset=iso-8859-1


I have also used the HTML meta tag noindex, nofollow, but it had no effect:

<meta name="googlebot" content="noindex,nofollow">


I have also submitted a website removal request, but content is being removed very slowly. In the last 35 days only a few pages have been removed. My website has also been removed from Google search results, but Google Webmaster Tools (Health -> Index Status) still shows 3,025,000 pages, and if I re-submit the site it shows the pages as already indexed. How can I speed up the removal of pages?




3 Comments


 

@Samaraweera270

.htaccess




I just spent the last minute clicking around your site with my User-Agent set to Googlebot 2.1, and I didn't hit a single 410. I'm not an expert on .htaccess, but are you sure your .htaccess rule is functioning correctly site-wide?
F should produce a 403 (Forbidden), not the 410 your Fetch as Googlebot produced.
Why only tell Google the page is forbidden, gone, or whatever? Your meta noindex would suggest you wish to instruct search engines other than Google.
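The spot check described above can be repeated with a small script. This is only a sketch: the User-Agent string is an assumed Googlebot-style value, `fetch_status` is a hypothetical helper name, and any URL you pass in is your own. It just reports which status code the server returns for a given User-Agent.

```python
import urllib.error
import urllib.request

# Assumed Googlebot-style User-Agent string, used only for this spot check.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent.

    Redirects are followed by urllib, so the code reported is the one
    from the final response.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code itself (403, 410, ...)
        # is exactly what we want to inspect.
        return error.code
```

Calling `fetch_status(page_url, GOOGLEBOT_UA)` on a handful of pages and comparing it with the result for an ordinary browser User-Agent should show whether the rule applies site-wide, and whether the server answers 403 or 410.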



Meta Noindex


You seem to be instructing specific robots to noindex, and then cancelling that out by telling all robots to index:

<meta name="googlebot" content="noindex,nofollow">
<meta name="searchbot" content="noindex,nofollow">
<meta name="baidu" content="noindex,nofollow">
<meta name="geo.country" content="IN">
<meta name="robots" content="Index, Follow">
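To spot this kind of mix across many pages, something like the following could help. It is a minimal sketch: the set of meta names treated as crawler directives and the function names are assumptions for illustration, and how a particular crawler resolves mixed tags is up to that crawler. The code only reports what the page says.

```python
from html.parser import HTMLParser

# Hypothetical set of meta names to treat as crawler directives.
ROBOT_META_NAMES = {"robots", "googlebot", "searchbot", "baidu"}


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every robots-style meta tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name not in ROBOT_META_NAMES:
            return  # e.g. geo.country is not a crawler directive
        content = attributes.get("content") or ""
        tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)


def robots_directives(html):
    """Map each robots-style meta name to its set of lowercased directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def has_mixed_index_signals(directives):
    """True if some tag says noindex while another says index."""
    says_noindex = any("noindex" in d for d in directives.values())
    says_index = any("index" in d for d in directives.values())
    return says_noindex and says_index
```

Feeding it the tags quoted above gives `noindex, nofollow` for `googlebot` and `index, follow` for `robots`, so `has_mixed_index_signals` reports a mix.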




Robots.txt


Your robots.txt file does not, in fact, contain

User-Agent: *

Disallow: /


as you say it does. It contains

User-agent: *
Disallow: /judgment_view
Disallow: /payment
Disallow: /include
Disallow: /search.php*
Disallow: /admin


That isn't so important, though, as robots.txt would only prevent crawling and wouldn't remove content from the index.
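The difference between the two robots.txt files can be checked offline with Python's standard-library parser. This is a sketch under assumptions: `can_crawl` is a hypothetical helper, `example.com` is a placeholder host, and the standard-library parser may not interpret the trailing `*` in `/search.php*` the way Googlebot does, so that rule is not exercised here.

```python
from urllib.robotparser import RobotFileParser

# The file the question claims is in place.
CLAIMED_ROBOTS_TXT = """User-agent: *
Disallow: /
"""

# The file quoted above as actually being served.
SERVED_ROBOTS_TXT = """User-agent: *
Disallow: /judgment_view
Disallow: /payment
Disallow: /include
Disallow: /search.php*
Disallow: /admin
"""


def can_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the claimed file, `can_crawl(CLAIMED_ROBOTS_TXT, "http://example.com/any-page")` is false; with the served file the same call is true, while a path under `/admin` is still blocked. Either way the result only describes crawl permission, not whether a page stays in the index.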


Solution


You haven't stated exactly what your aim is, nor is it clear from the steps you've taken on your site, but the above should serve as a starting point.



 

@Nimeshi995

Google doesn't drop pages that quickly, partly because pages are ranked, and if it dropped them quickly people would complain about losing their page rank over an unnoticed mistake. So it's a kind of grace period to sort things out.

The problem with .htaccess rules is that Google can assume the response is a mistake in the .htaccess file, so it will periodically come back and check again, and if you have many pages this becomes a time-consuming process.

Additionally, you should add the noindex meta tag below to every page, because robots.txt is not always checked on crawls. Personally I wouldn't use .htaccess at all, because Google will just keep coming back and assuming it's an error. NOINDEX is faster than robots.txt and .htaccess, but try the removal tool linked below.

<meta name="robots" content="noindex,nofollow">


Another factor is how Google treats your site in terms of ranking and how fast it considers the site to be. If you're on a VPS, increasing its speed should increase the number of pages Google crawls, because Googlebot crawls for an allotted amount of time and then leaves regardless, and you want as much juice as possible.

Best Method, Hit or Miss

The best way to remove URLs promptly is via Webmaster Tools; however, with 3 million pages that becomes unrealistic. There is, however, a site removal tool that many don't know about, and funnily enough its URL is almost the same.

CHECK
www.google.com/webmasters/tools/removals
www.google.com/webmasters/tools/url-removal?hl=en&siteUrl= (This one is the Webmaster Tools one; with the one above you can request removal of whole sites)



 

@BetL925

Would adding: <meta name="robots" content="noindex,nofollow"> to the head section of your site's pages help speed things along?


