Blocking vs noindex to reduce crawl requests
I have observed that Googlebot makes a lot of duplicate requests for the same URLs on my website within a week. The majority of these requests were for low-value, thin pages (little or no content, little or no SERP presence).
Therefore, I want to optimize how Google spends its crawl bandwidth on my website. Apart from a few unnecessary resources that I can block, I want to limit the bot's focus to crawling and recrawling high-value pages only.
After a lot of discussion I have three options:
1. Return 404 for the low-value pages. Not an option for me.
2. Add noindex to the low-value pages. This should (although it is not confirmed) reduce how often those pages are requested during crawling; see the sketch after this list.
3. Block the URLs via robots.txt. There is no particular pattern to the low-value pages and I would have to block 150,000+ URLs, so I cannot use wildcards in robots.txt. Robots.txt is therefore almost out of the picture.
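For reference, since option 2 is the front-runner: noindex can be set either in the page's HTML or as an HTTP response header (the header form is useful for non-HTML resources). A minimal sketch, with placeholder markup:

    <!-- in the <head> of each low-value page -->
    <meta name="robots" content="noindex">

or, sent as a response header for those URLs:

    X-Robots-Tag: noindex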
Looking at these options, the 2nd one is the most feasible. But my concern is that, as per Google's documentation, crawling and indexing are independent:
Robots.txt should be used to limit crawling.
noindex should be used to prevent indexing.
So perhaps adding noindex would not help my case. Any suggestions or alternatives?
1 Comment
Google has many crawl triggers: backlinks, PageRank, sitemaps, recrawl requests submitted through Google Webmaster Tools / Search Console, and periodic re-crawls of the same URL after a few weeks or months.
By using noindex, Google may crawl those URLs less frequently, but it will not block them permanently, because noindex pages remain crawlable and still pass PageRank when they are linked from somewhere; so crawls triggered by backlinks and PageRank will still reach those pages.
So my first advice is to link to those pages as rarely as possible.
Second, remove those pages from your sitemap or feed URLs, as in the example below.
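For instance, a trimmed sitemap that lists only the high-value URLs (the example.com paths are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://www.example.com/high-value-page-1</loc></url>
      <url><loc>https://www.example.com/high-value-page-2</loc></url>
    </urlset>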
Third, use the Last-Modified HTTP header, because when Google crawls a page it will recrawl the same URL after some time (maybe after a few weeks) to check for changes.
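As a rough sketch of that exchange (the URL and dates are illustrative): if the server also honours If-Modified-Since, it can answer an unchanged page with 304 Not Modified and no body, which keeps those recrawls cheap:

    Googlebot request:
      GET /some-page HTTP/1.1
      Host: www.example.com
      If-Modified-Since: Tue, 01 Oct 2024 10:00:00 GMT

    Server response when the page has not changed:
      HTTP/1.1 304 Not Modified
      Last-Modified: Tue, 01 Oct 2024 10:00:00 GMT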
I don't see any other solution for you. If it is possible, move your thin content into a subdirectory and block that specific directory in robots.txt.
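A minimal sketch of that robots.txt rule, assuming the thin pages were moved under a /thin/ directory (the path is only a placeholder):

    User-agent: *
    Disallow: /thin/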