Mobile app version of vmapp.org

@Turnbaugh106

Posted in: #Google #Indexing #WebCrawlers

I have a huge website with 5 million pages. Currently Google indexes about 10,000 pages per day. This is very slow, I still have lots of pages that I can't get indexed. Does anyone know what is the upper threshold for crawl speed?


4 Comments


 

@Radia820

I've found that it's possible to reach a crawl speed of 2 pages/second by improving server response time. Each page should respond as fast as possible, which may require garbage-collector tuning, database tuning, and code tuning. In my experience, if the average response time is under 50 ms, Google will crawl at about 2 pages/second.
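To put that 50 ms figure in context, here's a quick sketch (my own arithmetic, nothing Google publishes): response time alone implies a theoretical ceiling of 1/latency fetches per second per connection. The observed 2 pages/second is well below that ceiling, so latency sets the per-request cost but isn't the only limit.

```python
# Rough ceiling on crawl throughput implied by server response time.
# The 50 ms figure is the commenter's experimental number, not a Google limit.

def pages_per_day(avg_response_s: float, concurrent_fetches: int = 1) -> int:
    """Pages a crawler could fetch per day if each fetch takes
    avg_response_s seconds and it keeps concurrent_fetches in flight."""
    per_second = concurrent_fetches / avg_response_s
    return round(per_second * 86_400)  # seconds in a day

print(pages_per_day(0.050))  # 50 ms/page, one connection -> 1728000
print(pages_per_day(0.500))  # 500 ms/page -> 172800
```

Even a single-connection crawler at 50 ms/page could in theory fetch over 1.7M pages a day, so if you're seeing far less, the bottleneck is Google's pacing, not raw server speed.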

10% popularity Vote Up Vote Down


 

@Heady270

Google's crawl rate is a function of:


Pagerank -- the more reputation and inbound links your site has, the more it will be crawled. Within your site the most prominent pages (like the home page) will get crawled more often because they have higher pagerank.
How often your pages change -- pages that change frequently will get re-crawled more often than pages that don't.
How fast your server is -- rather than having a number of pages per day that Googlebot downloads, it appears that it is limited by the amount of time spent downloading pages. Making pages smaller and increasing the speed of the server can both let Googlebot crawl faster.
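Under that time-budget view, you can sanity-check the asker's numbers with some quick arithmetic (a rough sketch; the rates come from this thread, not from Google):

```python
# Back-of-envelope: how long a full crawl of a 5,000,000-page site takes
# at a steady crawl rate. Rates are figures quoted in this thread.

def days_to_crawl(total_pages: int, pages_per_day: float) -> float:
    """Days needed to fetch every page once at a steady crawl rate."""
    return total_pages / pages_per_day

SITE_PAGES = 5_000_000
print(days_to_crawl(SITE_PAGES, 10_000))      # current rate -> 500.0 days
print(days_to_crawl(SITE_PAGES, 2 * 86_400))  # 2 pages/sec  -> ~29 days
```

At the current 10,000 pages/day it would take well over a year to crawl the whole site once, which is why raising the crawl rate (or pruning pages) matters so much here.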


In addition, Googlebot has several different crawl modes.


Re-crawl mode -- it will come back and visit pages that it has visited before.
Fresh crawl mode -- it will crawl lots of new pages in a new section of a site. The higher the pagerank of the site, the more pages get crawled.
Stale pages mode -- Googlebot finds a box of old links in the basement and plows through them just for "fun". These are often pages that no longer exist or redirect elsewhere; they typically have no pagerank and get crawled in URL-length order.


The upshot of this is that the best way to get your site crawled faster is to get inbound links and increase the pagerank.


@Pope3001725

If Google is crawling your pages but they aren't appearing in search results, then crawl rate is not the issue. It sounds like your website is full of low-quality content that Google does not want in its index. Is it original content? Is it quality content? Google declining to list your pages suggests it is not.


@BetL925

The maximum crawl speed I know of is about 10 pages per second -- that's the rate at which Google crawls Stack Overflow (read this).
