How to interpret Google policy on crawling and indexing search results pages?

@Hamm4606531

Posted in: #Google #Googlebot #Penalty

I just learned that we can be penalized for letting Google crawl and index our database search results: https://webmasters.stackexchange.com/a/55599/33777
Question: If they don't want to list Yellow Pages type results, then why do they? These sites have been around forever, and the domains haven't changed. I just did a search for specific keywords on a friend's website. He ranks #7. The first six results are all search results pages for well-known Yellow Pages type sites.

I'm a back-end developer, so this is all new to me. I reviewed the Webmaster Guidelines and saw this (emphasis mine):


Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.


That is very subjective.
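For what it's worth, the mechanics of the guideline itself are simple. A rule like the following would block crawling of a search endpoint while leaving the rest of the site crawlable (a sketch only; /search and /search.php are hypothetical paths, not our actual endpoints):

```
# robots.txt -- example only; adjust the paths to the real search endpoint
User-agent: *
Disallow: /search
Disallow: /search.php
```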


My client has a page listing the names of all companies in our database.
Each company name is a link to the search result page for that exact phrase.
That result page in turn links to their company profiles in the different publications.
The same company might have a profile in more than one publication.
The profiles may be similar, but would list different product categories depending on the publication.


This was originally set up because the client was trying to compete with Yellow Pages and the like. And we are ranking pretty well when people search for particular companies. But I don't want to get penalized.

We were linking to the search results rather than directly to the profile because one company might have multiple profiles.

However, the client now wants to segregate the different publications more. So, I can save a click for users if the company list links directly to the profile and skips the search results page.

Question: Is the list of company names and the profile for each company acceptable content for Google? Do you think we will be penalized for letting them crawl and index that?

We were just about to add a similar list of all categories in the database where each category would link to a list of the companies in that category. I think this has value for users coming from search engines. But it's subjective.

Question: Since it is dynamically generated, Google could request arbitrary words like Viagra, and we would present a "No matches found." page. That page would have a noindex meta tag. Could this still get us penalized?
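To be concrete about the noindex behavior I mean, the rendering logic would be something like this (a hypothetical sketch, not our actual code; real code would use a template engine):

```python
def results_page(matches):
    """Render a search results page, marking empty results as noindex.

    Hypothetical sketch: `matches` is a list of company names.
    """
    if matches:
        robots_tag = ""
        body = "<ul>" + "".join(f"<li>{m}</li>" for m in matches) + "</ul>"
    else:
        # "No matches found." pages carry a noindex meta tag so that
        # crawlers drop them from the index even if they request them.
        robots_tag = '<meta name="robots" content="noindex">'
        body = "<p>No matches found.</p>"
    return f"<html><head>{robots_tag}</head><body>{body}</body></html>"
```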

Note: The actual search form uses POST which I believe Google avoids. We would only generate links for exact category names that exist in the database. So we wouldn't be inviting them to crawl a search results page, but more a directory landing page. However, nothing is there to stop Google from hunting for content by manipulating the URLs.

Vent: I know some basic SEO, and I've always gone with the idea of thinking about our users first - provide them with the content they are looking for - and let Googlebot figure things out on its own. It seems counter-intuitive to me to have to tell Google to stop crawling my site. The same thing goes for nofollow on partnership links (which I also just learned about). IMO, Google should just figure out what is relevant/valuable and display that. They shouldn't penalize sites for having content that doesn't interest them.

Aside: If they don't want to crawl useless pages, why are they still requesting pages that have been sending 301, 404, or 410 for more than a year? And no, there are no inbound links to these pages.


@Cofer257

Firstly, I think Stephen's assertion that a search results page will result in a site penalty is wrong. The blog post in question simply states Google may "take action to reduce search results," which implies your search results pages will rank lower in Google's results (or be ignored) and the rest of the site will be unaffected.

However, the key to this question is the definition of a "search results page." The type of page that Google is addressing here is one where you can literally type anything into a box and search for it. The main problem is that the possible list of pages you get from that is infinite. Furthermore, many similar queries will return the exact same results.

Now, your situation is slightly different. I am assuming that any search made on your site, although submitted via POST, either redirects to a GET URL or is otherwise accessible by GET (otherwise, how would you link to your search results?).
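That handoff is usually the POST-redirect-GET pattern: the POST handler builds a GET URL from the form data and redirects to it. A minimal sketch (the /search path and q parameter are illustrative, not your actual URL scheme):

```python
from urllib.parse import urlencode

def search_redirect_location(form_data):
    """Build the GET URL a POST search handler would redirect to
    (the POST-redirect-GET pattern). `form_data` is a dict of the
    submitted fields; only "q" is used here.
    """
    return "/search?" + urlencode({"q": form_data["q"]})

# e.g. search_redirect_location({"q": "Acme Co"}) -> "/search?q=Acme+Co"
```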

You are correct that Google will not submit a POST form. Nor will it randomly generate URLs by replacing q=Company with q=Viagra. So really, the only way for Google to know it's a search URL is the pattern of the URL itself (e.g. it looks like /search.php?q=Company).

In your case, the URLs you're linking to are finite in number, based on the companies and categories in your database. So I think you should separate your generic search from your pre-defined search terms (categories). In other words, create static-looking URLs like /category/company-name that show whatever the search for "company name" shows now. You can still use the same search back end; to search engines and users, they simply look like static pages.
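Generating those static-looking URLs from your database names can be as simple as a slug function (a sketch; the /category/ prefix is an example, not a prescription):

```python
import re

def slugify(name):
    """Reduce a company or category name to a URL-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def category_url(name):
    """Static-looking URL that the same search back end serves internally."""
    return "/category/" + slugify(name)

# e.g. category_url("Acme Widgets, Inc.") -> "/category/acme-widgets-inc"
```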

I think this is what Yellow Pages type sites do. What they link to aren't strictly search results pages, but category or keyword pages.

Regarding your aside about requests for old pages: unfortunately, that's just what Google does. If a URL existed at one time, there is a chance it will exist again. Normally a page is linked to from somewhere else on the internet, so Google assumes the other site still has the link for a reason and keeps checking it.
