
Facets: Googlebot encountered extremely large numbers of links on your site

@Pope3001725

Posted in: #Google #Googlebot #GoogleSearchConsole

We are encountering the following issue: Googlebot encountered extremely large numbers of links on your site.
The site is an ecommerce site and has around 12M pages indexed in Google.
From the examples provided, a majority of links are from facets and internal search queries.
The facets use rel="canonical" pointing to the non-faceted version and are set to 'No URLs' via URL parameters. The search pages are noindexed, set to 'No URLs' via URL parameters, and until recently were blocked via robots.txt.
Despite blocking the facets/search pages via URL parameters and using canonical and noindex, Googlebot is still crawling these pages.
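For reference, a minimal Python sketch of how these three signals can be checked for any given URL; the domain, paths, and parameter names below are placeholders, not our real site:

```python
# Hypothetical audit sketch: for a URL, report whether robots.txt allows
# Googlebot to fetch it, and whether the page carries a rel="canonical"
# link or a noindex robots meta tag. Example URLs are placeholders.
from html.parser import HTMLParser
from urllib import robotparser
from urllib.parse import urlsplit
from urllib.request import urlopen


class HeadTagScanner(HTMLParser):
    """Collects the canonical link and robots meta directive from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots_meta = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = attrs.get("content")


def audit(url, user_agent="Googlebot"):
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    allowed = rp.can_fetch(user_agent, url)

    scanner = HeadTagScanner()
    if allowed:  # only fetch what the crawler itself would be allowed to fetch
        with urlopen(url) as resp:
            scanner.feed(resp.read().decode("utf-8", errors="replace"))

    print(f"{url}\n  robots.txt allows {user_agent}: {allowed}")
    print(f"  canonical: {scanner.canonical}\n  robots meta: {scanner.robots_meta}")


if __name__ == "__main__":
    # Hypothetical faceted and internal-search URLs; substitute your own.
    audit("https://www.example.com/laptops?facet=colour:red")
    audit("https://www.example.com/search?q=red+laptop")
```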

I've heard that I should also be blocking the facets via robots.txt, but we had our search pages disallowed in robots.txt and Google did not honor it.

What other options do I have to resolve this issue?

Regarding @John Mueller's answer to a similar problem, Solving "Googlebot encountered extremely large numbers of links on your site":
He says that the message is sent out before the new URLs are crawled, meaning the robots.txt, noindex robots tags, or rel=canonical are not known at that point. Given that the facets have been blocked since 2012, Google should not be finding these unique URLs by crawling our site internally. Does this mean that people are linking externally to these faceted URLs, thus providing Google with this list of faceted URLs?


3 Comments


 

@XinRu657

I spoke with John Mueller regarding robots.txt, URL parameters, canonicals, and noindexation.
Using 'URL parameters' in GWT is a strong suggestion to Googlebot, but not absolute. Googlebot still spot-checks the URLs, so depending on the number of URLs the spot checks might be fairly visible. Also, since we had 'URL parameters' set to not crawl search queries and facets, Googlebot is limited in how often it crawls those URLs, which means it will take longer to recrawl them and drop them out of Google's index.
Regarding robots.txt: since we were blocking the search pages via robots.txt, Googlebot would not be re-crawling those URLs and so would never see the noindex. Removing the search pages from robots.txt was therefore the correct move.
Because of the number of search/facet pages, it will take some time for the URLs to be reprocessed. John Mueller's estimate is half a year to three-quarters of a year for them to be recrawled and dropped out naturally.
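A small sketch of that robots.txt interplay, using hypothetical paths rather than the real site: while /search is disallowed, a crawler that honours robots.txt never fetches the page, so it never sees the noindex tag; remove the rule and the page becomes fetchable, letting the noindex take effect.

```python
# Two robots.txt variants, parsed in memory with hypothetical contents.
from urllib import robotparser

SEARCH_URL = "https://www.example.com/search?q=red+laptop"  # hypothetical

before = robotparser.RobotFileParser()
before.parse("User-agent: *\nDisallow: /search\n".splitlines())

after = robotparser.RobotFileParser()
after.parse("User-agent: *\nDisallow:\n".splitlines())  # empty Disallow: nothing blocked

print(before.can_fetch("Googlebot", SEARCH_URL))  # False -> noindex never seen
print(after.can_fetch("Googlebot", SEARCH_URL))   # True  -> noindex can be read
```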

Solution: John Mueller's suggestion is to use Google's urgent 'Remove URL' tool, found in GWT.

Here is the video link to John Mueller's response from Google Webmaster office hours.



 

@Jessie594

This may not directly address your link count warning, but there is something you said:


The facets use rel="canonical" pointing to the non-faceted version and are set to 'No URLs' via URL parameters


Some of the canonicals are indeed using facets. Have you thought about changing your URL parameter tactic? You can "teach" Gbot how to use the facets in GWT > Crawl > URL Parameters, specifying what each querystring does: sorts, narrows, specifies, translates, or paginates. According to Mueller this will not solve the warning, but, also according to him:


if you send us 5-100x more URLs than you actually have content, that
can result in us not being able to pick up new content as quickly as
we might if we could crawl more efficiently.


So, going out on a limb, setting the URL parameters according to their purpose could make the crawl more efficient, or at the very least allow Gbot to understand what it's about to encounter. Just turning them off seems like an easy way out rather than the "right" way, especially for a major player with huge numbers of facets and query identifiers like Walmart.
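To make that concrete, a purely hypothetical sketch of the idea (the parameter names are invented, and the real settings live in the GWT > Crawl > URL Parameters UI, not in code): declaring what each parameter does lets a crawler distinguish variants that change content from ones that merely reorder or repage the same page.

```python
# Hypothetical parameter-behaviour declarations, mirroring the choices the
# URL Parameters tool offers; names and URLs are invented examples.
from urllib.parse import urlsplit, parse_qs

PARAMETER_BEHAVIOUR = {
    "sort":  "sorts",       # same content, different order
    "facet": "narrows",     # filters the product list
    "q":     "specifies",   # internal search query
    "page":  "paginates",   # splits one list across pages
    "lang":  "translates",  # same content, different language
}

def describe(url):
    """Report the declared behaviour of every parameter on a URL."""
    for name in parse_qs(urlsplit(url).query):
        print(f"{name!r}: {PARAMETER_BEHAVIOUR.get(name, 'unknown to the crawler')}")

describe("https://www.example.com/laptops?facet=colour:red&sort=price&page=2")
```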

Another thought is the canonicals themselves: I notice that your rel=next in the android tab pc category, for example, is full of querystrings such as facet=, cat_id=, etc. Maybe it's just me, but it seems counter-intuitive to define a canonical full of querystrings without the bot being able to understand the parameter logic itself.
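As a rough sketch of the alternative (the parameter names are assumed examples, not pulled from the actual site), the canonical could be built from the URL with the faceting parameters stripped:

```python
# Derive a clean canonical by dropping the faceting querystrings.
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

FACETING_PARAMS = {"facet", "cat_id", "sort"}  # assumed names, adjust per site

def clean_canonical(url):
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in FACETING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(clean_canonical("https://www.example.com/android-tab-pc?facet=colour:red&cat_id=42&page=3"))
# -> https://www.example.com/android-tab-pc?page=3
```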

Again, it's not a direct answer to your question, and I'm not an expert on your traffic, but it seems to me like it could affect things for Gbot, even if in a post-warning yet pre-understanding way.



 

@Heady270

This is not an issue that needs to be resolved. Any site that has a large number of pages gets this message. Google tells you this in case you published those URLs accidentally, especially ones with duplicate content.

As long as you mean to publish your URLs and are handling any duplicate ones appropriately, this is not a warning that you need to pay further attention to. I've worked with large sites that have had this warning for years, yet always enjoyed great rankings and lots of search engine traffic.


