How do search engines behave when encountering noindex?

@Annie201

Posted in: #Noindex #Seo #WebCrawlers

I have a question about the behaviour of search engines when they encounter <meta name="robots" content="noindex">.

If I use the robots meta tag, I know that search engines will not index the contents of the page.
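For reference, a minimal sketch of where the tag sits in a page (the title and body content here are hypothetical placeholders):

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Asks compliant crawlers not to include this page in their index -->
  <meta name="robots" content="noindex">
  <title>Beta page (hypothetical)</title>
</head>
<body>…</body>
</html>
```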

What I don't know is whether it also blocks the URL from showing up in search results, or just that particular content with that URL.

I'm also wondering if Google's search engine will think it looks like cloaking and penalise accordingly.

Some background:


We are building a new version of a site which we are releasing in a phased beta rollout.
URLs will be the same between the old site and the beta. What you see depends on whether you are opted in to the beta or not.
Initially this will be a private beta, but it will then have a limited launch (e.g. 20% of public users).
Users are automatically opted in to the new site when they come to it (with the option to go back to the old site if they want).
We want the old site to continue to be indexed by Google etc., but not the new one.

support.google.com/webmasters/answer/93710?hl=en says this:


When Googlebot next crawls that page, Googlebot will see the noindex meta tag and will drop that page entirely from Google Search results, regardless of whether other sites link to it.


I've read a number of things, but nothing that categorically tells me whether the content or the URL is ignored.


2 Comments


@Alves908

What I don't know is whether it also blocks the URL from showing up in search results, or just that particular content with that URL.


If you have a noindex robots tag then the URL should not appear in the search results. (It should be noted, however, that if you block the crawling of this URL, e.g. with robots.txt, then Google won't be able to see the noindex meta tag and your page could still appear in the SERPs as a link-only result. See my answer to this question: Google is indexing pages with a "noindex" robots meta tag.)
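To make that pitfall concrete: a robots.txt rule like the following (the /beta/ path is hypothetical) would stop the crawler from fetching the page at all, so it could never read the noindex tag on it:

```
# robots.txt — counterproductive if you rely on a noindex meta tag,
# because the crawler can no longer fetch the page to read that tag
User-agent: *
Disallow: /beta/
```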


...but nothing that categorically tells me whether the content or the URL is ignored.


There is nothing stopping Google from crawling the page (as it can still follow links), but it won't be indexed. Not quite sure how you are differentiating between the URL and the content at that URL? If the URL is "ignored" in the SERPs then obviously the content on that page is also ignored. Unless that content also appears at another URL. (?)


wondering if Google's search engine will think it looks like cloaking and penalise accordingly.


If 80% of your users still see the same content that Google sees when crawling your site then I would think you'd be OK.

Obviously, if 50+% of your users are seeing the beta content and Google is not, then Google's index becomes very misleading to users searching, so it would be understandable if you were penalised in this instance.


@Ann8826881

In the Google database, the single most important table of the schema is the document table which hosts the URL and document id. All other database tables and data elements rely completely on this. While there is a unique document id (allows for a smaller index size), the URL uniquely identifies a page since there cannot be two pages on any given URL. However, there can sometimes be two URLs for any given page. But that should not be a problem if the page itself is marked noindex. Nothing to worry about here.

If you noindex a page, the URL is not listed in the document table and there is no other activity related to this page. This includes storing content.

Links are another matter.

When a link is found, the target URL can potentially show up in the SERPs for a period. If you do not want the target URL to show up in the SERPs, it is best to mark the link as nofollow and noindex. A target URL can show up in the SERPs when the link is first discovered. I argue that this should not occur, but Google argues strongly that it should. Google is just plain wrong on this one. Sorry G! URLs as a result of a dangling link should not appear in the SERPs if you have not indexed the content of that page. Period.

What happens is this. A URL is discovered and entered into the document table and given a unique document id. In the link table, typically, references are made to the link source page and target page using the document id. If Google has yet to index the target page, the reference to the target page is left blank in the link table. This is called a dangling link. Normally, when the target page is indexed, the reference to the target page is made in the link table. However, since you have marked the page noindex, the page's URL is removed from the document table and the reference to the target page is never made in the link table. However, Google does keep track of the link for the future.
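The document-table/link-table interaction described above can be sketched as a toy model. This is purely illustrative of the described process, not Google's actual implementation; all class and method names are invented:

```python
# Toy model of the "document table" and "link table" process described above.
# Illustrative only — not a real search-engine implementation.

class ToyIndex:
    def __init__(self):
        self.documents = {}   # url -> doc_id  (the "document table")
        self.links = []       # [source_doc_id, target_doc_id or None, target_url]
        self._next_id = 0

    def _doc_id(self, url):
        """Enter a URL into the document table (if new) and return its id."""
        if url not in self.documents:
            self.documents[url] = self._next_id
            self._next_id += 1
        return self.documents[url]

    def add_link(self, source_url, target_url):
        src = self._doc_id(source_url)
        self._doc_id(target_url)  # discovered URL is entered into the document table
        # ...but the link-table reference stays blank until the target is
        # actually indexed — a "dangling link".
        self.links.append([src, None, target_url])

    def index_page(self, url, noindex=False):
        if noindex:
            # noindex: drop the URL from the document table; any links
            # pointing at it remain dangling.
            self.documents.pop(url, None)
            return
        doc = self._doc_id(url)
        # Resolve dangling links that point at this URL.
        for link in self.links:
            if link[2] == url:
                link[1] = doc
```

Walking a noindexed page through this toy model shows the behaviour the answer describes: the URL appears at discovery time, then disappears from the document table once the noindex is seen, and the link reference is never filled in.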

That is in theory according to how Google originally describes the process. I rather suspect this has been modified somewhat but remains similar enough that this description is still worthwhile.

What is best is to mark any link to the new section of your site as nofollow, as well as marking the page itself as noindex. That way, none of what you fear occurs even for a brief time. Better yet, if you have the option of creating a login process for the new site, that would absolutely ensure that your pages do not get indexed. I would only do this if you already have a login for your users; otherwise, it would be more work than necessary.
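Concretely, that combination would look something like this (URL and link text are hypothetical):

```html
<!-- On old-site pages, any link into the beta section: -->
<a href="/dashboard" rel="nofollow">Try the new beta</a>

<!-- In the <head> of every beta page: -->
<meta name="robots" content="noindex, nofollow">
```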

As far as cloaking is concerned: Google does check for cloaking using networks other than their own. They will hit a page and compare it to what has been indexed. Generally, these two accesses are close together in time, to ensure that the page has not simply changed in the meantime and that sites are not penalised innocently.

What you have to be concerned with is if Google and users see different pages per any given URL. If this is the case, then you can be in trouble.

Most beta sites are handled on a sub-domain or a sub-directory, or in some manner where the URL is changed, such as an additional parameter, which would be enough. It may be that you will have to rethink the process a bit; a simple parameter change or addition should suffice. I assume also that an opt-in button could be deployed and a cookie added or updated which is checked prior to presenting the page. That may also work.
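The cookie-based opt-in check could be sketched roughly as follows. This is a minimal, framework-free sketch under stated assumptions: the cookie name and helper functions are invented, and a real application would use its web framework's request/response API instead:

```python
# Minimal sketch of the cookie-checked opt-in described above.
# Cookie name and function names are hypothetical.

def render_page(opted_in_to_beta: bool) -> str:
    if opted_in_to_beta:
        # Beta version: ask search engines not to index it.
        robots = '<meta name="robots" content="noindex, nofollow">'
        body = "Welcome to the beta!"
    else:
        # Old site: indexable as usual, so no robots meta tag is emitted.
        robots = ""
        body = "Welcome to the current site."
    return f"<html><head>{robots}</head><body>{body}</body></html>"

def handle_request(cookies: dict) -> str:
    # Check the opt-in cookie before choosing which version to serve.
    return render_page(cookies.get("beta_opt_in") == "1")
```

Note that because the same URL serves both versions, this only keeps the beta out of the index while Googlebot (which carries no opt-in cookie) continues to receive the old, indexable version.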
