Impact of blocking site search from googlebot, but listing content pages in sitemap.xml
We have a fairly large site (600k+ unique pages across a variety of URLs) along with a custom, local search interface for the site. The search interface has sorting and faceting capabilities that allow users to limit results and change the display. We are getting over 200,000 search requests a day from Googlebot alone, and have tried to address it in the following ways:
adding noindex to faceting and sorting requests
adding rel=next and rel=prev to results pages so Google can understand the pagination structure (see the sketch below)
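For reference, a rough sketch of what those two mitigations look like in the head of a faceted or sorted results page (the URLs and parameter names below are simplified placeholders, not our actual ones):

<!-- facet/sort requests: keep out of the index but let links be followed -->
<meta name="robots" content="noindex, follow">
<!-- paginated results series: hint at the previous and next pages -->
<link rel="prev" href="https://www.example.com/search?q=widgets&amp;page=2">
<link rel="next" href="https://www.example.com/search?q=widgets&amp;page=4">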
We are considering adding a sitemap.xml to ensure that content pages are found, and also attempting to add a Disallow entry in the robots.txt for the search results. Does anyone have any experience with the effects of this? Ideally, we don't want to lose rankings or results in Google, but we would also love not to suffer the impact of serving 200k searches a day to a search engine bot that just wants real content, not search results.
2 Comments
You should not allow Googlebot to crawl site search pages. In addition to putting undue stress on your server, Google doesn't want to crawl them. Here is Google's blog post about the issue: Search results in search results, by Matt Cutts, March 10, 2007.
Google now actively penalizes sites that allow their site search results to be crawled and appear in Google's SERPs. By allowing Googlebot to crawl your search result pages, you are risking all of your Google referral traffic. One favorite trick of a Google reviewer is to use your site search for a spam term such as "Viagra". When they see a crawlable page as the result (even if it says "no results for Viagra found"), they will apply a manual penalty against your site as a spam site.
You should disallow your site search in robots.txt. Just make sure that Googlebot can still crawl your content pages.
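A minimal sketch of such a robots.txt entry, assuming (hypothetically) that your search interface lives under a /search path:

User-agent: *
# keep crawlers out of internal search results (adjust the path to your actual search URLs)
Disallow: /search
# everything else, including content pages, stays crawlable by default

It is worth checking a few of your content URLs against the rule with the robots.txt tester in Webmaster Tools to make sure they aren't accidentally blocked.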
Having a sitemap that lists all your content files IS NOT ENOUGH to get all your content files indexed. Here is a very related question, The Sitemap Paradox, in which Jeff Atwood from Stack Overflow notes that pages in the sitemap which can't be crawled don't get indexed. The question is answered by Google's John Mueller, who states in no uncertain terms:
The Sitemap file isn't meant to "fix" crawlability issues. If your site can't be crawled, fix that first. We don't use Sitemap files for ranking.
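That said, the sitemap.xml you are considering is still worth having for the content pages themselves; a minimal sketch, with placeholder URLs and dates:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- list only real content pages here, never search result or facet URLs -->
  <url>
    <loc>https://www.example.com/content/example-page</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <!-- one <url> entry per content page; split into multiple sitemap files past 50,000 URLs -->
</urlset>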
I would recommend that every piece of content on your site be available within 3 or 4 clicks from the home page. That is admittedly a tough task to do well, especially because neither users nor Googlebot react well to large lists of links on your pages. If you do add links to your pages, try to make them as useful to users as possible and keep lists of links to fewer than 10. You can use dimensions from your faceted navigation to create useful links on products, such as:
Also by this manufacturer
Similar items by feature
Similar items by price
Users who bought this also bought
Honestly, Google should NEVER be crawling your site search. Not having a sitemap is the problem; with that volume of requests, you simply need one.
But it's not necessary to add noindex. Just make sure you specify in Google Webmaster Tools what the search parameter is, so Google won't hit it.