How to control old indexed page crawling rate

@Lee4591628

Posted in: #CrawlRate #Google #Googlebot #Seo #WebCrawlers

I have a site where users can create their own profile page with a specific URL. Each of these URLs is publicly available, which means it is also available for search engines to crawl. At the moment I am getting a minimum of 1,000 new profile pages a day, and Google indexes 10-40 of them within 24 hours. That is fine.

Here is my problem:

Pages that are already indexed keep getting re-crawled by Google after some period of time. That is not really necessary, and the site now has 300,000 - 400,000 pages.

So I don't want search engines to crawl old profile pages over and over unless there is a new update; crawling the new pages is fine.
Also, I am already returning HTTP 410 (Gone) for expired profile pages.

It would be great if you could suggest how to make search engines focus on the new profile pages instead of the old ones.


2 Comments


 

@Angie530

You may want to consider adding a <changefreq> tag to your XML sitemap: www.sitemaps.org/protocol.html
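For example (the URL, date, and values below are just illustrative), a sitemap entry for a profile page might look like this; note that search engines treat <changefreq> as a hint rather than a command:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/profile/some-user</loc>
    <!-- when the profile content itself last changed -->
    <lastmod>2016-05-10</lastmod>
    <!-- hint that this page rarely changes once created -->
    <changefreq>monthly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
```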
A couple of other things to look into would be making sure the Last-Modified header matches the creation (or last modification) date of the user's account and, if it works for your application, cache-related headers: www.mobify.com/blog/beginners-guide-to-http-cache-headers/
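Here is a rough sketch of what that could look like, assuming a Flask app (your actual stack isn't stated); the in-memory PROFILES dict stands in for your real datastore:

```python
from datetime import datetime
from flask import Flask, request, make_response

app = Flask(__name__)

# Stand-in for the real datastore: username -> last-modified timestamp.
PROFILES = {"some-user": datetime(2016, 5, 10)}

@app.route("/profile/<username>")
def profile(username):
    updated_at = PROFILES.get(username)
    if updated_at is None:
        return "Not found", 404
    resp = make_response(f"<h1>Profile of {username}</h1>")
    # Last-Modified reflects when the profile itself last changed,
    # not when this response was generated.
    resp.last_modified = updated_at
    # Let caches (and crawlers that honour it) reuse the page for a day.
    resp.headers["Cache-Control"] = "public, max-age=86400"
    # Answer conditional re-crawls (If-Modified-Since) with 304 Not Modified.
    resp.make_conditional(request)
    return resp
```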



 

@Harper822

If you haven't already done so, register an account with Google Webmaster Tools and add your domain to it. Then open your domain, click the gear icon and go to Site Settings, where you can choose "Limit Google's maximum crawl rate" and pick a value you're OK with. The lowest value will probably work well for you, which may be 0.002 requests per second (a pause of 500 seconds between requests).

Another thing to try is the "Crawl-delay" directive in robots.txt, which lets you specify the waiting time between two successive requests from the same crawler; I think this time is measured in seconds. Not all search engines support it, though, and Googlebot in particular ignores it (Google's crawl rate is controlled through Webmaster Tools as above). Here's a link with more info about crawl delay:
en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive
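As an illustration (the 10-second value is arbitrary), a robots.txt at the site root using this directive might contain:

```
User-agent: *
# Ask supporting crawlers to wait 10 seconds between successive requests.
Crawl-delay: 10
```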
I also recommend continuing to issue HTTP status 410 to URLs you no longer want indexed.
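A minimal sketch of that, again assuming a Flask-style app (the in-memory EXPIRED set stands in for however you actually track expired profiles):

```python
from flask import Flask, abort

app = Flask(__name__)

# Stand-in for however expired profiles are tracked in the real application.
EXPIRED = {"old-user"}

@app.route("/profile/<username>")
def profile(username):
    if username in EXPIRED:
        # 410 Gone tells crawlers the page was removed on purpose,
        # so it is usually dropped from the index faster than a 404.
        abort(410)
    return f"<h1>Profile of {username}</h1>"
```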

If any other pages on your site link to the old pages, then you may want to add rel="nofollow" to each anchor tag that links to an old page, so Google won't keep following those links and re-crawling them.
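For example (the URL here is made up), the link markup would be:

```html
<!-- rel="nofollow" asks search engines not to follow this link;
     it is a hint, not a guarantee that the page won't be crawled -->
<a href="https://example.com/profile/old-user" rel="nofollow">Old profile</a>
```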
