URLs with 'Noindex' in robots.txt are being indexed by Google

Posted in: #GoogleSearch #Noindex #RobotsTxt #Serps

In my robots.txt file (http://www.tutorvista.com/robots.txt), I'm using Noindex: /content/... to disallow indexing:

This should mean that www.tutorvista.com/content/ and anything below this URL shouldn't be indexed. But in the image of my search results below, you can see that pages under this URL are being indexed:

Additionally, I'm using Disallow: /biology/ which means that www.tutorvista.com/biology/ and anything below this shouldn't be crawled. But in the image of my search results, you can see that pages under this URL are being crawled and indexed.

So can anyone tell me what's wrong with my robots.txt directives?


2 Comments


@Ann8826881

Note that Noindex is not part of the original robots.txt specification. Google supported it as an experimental feature (see: How does “Noindex:” in robots.txt work?), but it’s not clear if that is still the case (as they didn’t document it to begin with). But let’s assume it is.

Your robots.txt has two problems.

Empty lines

A record must not contain empty lines. Empty lines are used to separate records.

A conforming bot (which doesn’t identify as Googlebot-Image/Adsbot-Google/Mediapartners-Google) uses this record:

User-agent: *
Allow: /

So none of the following Disallow/Allow/Noindex lines apply.

Of course a bot may try to "fix" this and interpret the following lines to be part of this record (i.e., ignoring the blank lines), but the robots.txt spec doesn’t define this, so I wouldn’t count on it.
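To make the record-splitting behaviour concrete, here is a minimal sketch of a strict parser that treats every blank line as a record separator, as the original robots.txt spec describes. The function names and the SAMPLE text are made up for illustration (SAMPLE is not the actual tutorvista.com file), and real crawlers may well be more lenient than this.

```python
def split_records(robots_txt: str) -> list[dict]:
    """Split robots.txt text into records; a blank line ends a record."""
    records = []
    current = {"agents": [], "rules": []}
    for raw in robots_txt.splitlines():
        stripped = raw.strip()
        if stripped.startswith("#"):
            # Comment-only lines are discarded and are not record boundaries.
            continue
        line = stripped.split("#", 1)[0].strip()
        if not line:
            # Blank line: close the current record (if it has any content).
            if current["agents"] or current["rules"]:
                records.append(current)
            current = {"agents": [], "rules": []}
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current["agents"].append(value)
        else:
            current["rules"].append((field, value))
    if current["agents"] or current["rules"]:
        records.append(current)
    return records


def rules_for_any_bot(records: list[dict]) -> list[tuple]:
    """Return the rules of the first record addressed to 'User-agent: *'."""
    for record in records:
        if "*" in record["agents"]:
            return record["rules"]
    return []


# Hypothetical file with a blank line after "Allow: /".
SAMPLE = """\
User-agent: *
Allow: /

Disallow: /biology/
Noindex: /content/
"""

records = split_records(SAMPLE)
print(rules_for_any_bot(records))  # [('allow', '/')]
print(records[1]["agents"])        # [] (the later lines belong to no user-agent)
```

Under this strict reading, the Disallow and Noindex lines end up in a second record with no User-agent line, so no bot would apply them.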

... in Noindex values

If Noindex works like Disallow (which we don’t know for sure, as Noindex is not specified/documented, but I guess it wouldn’t make sense to specify it differently), the ... you appended to the values means that a literal ... must appear in the URLs you want to noindex.

The line

Noindex: /content/biology/...

would apply to a URL like /content/biology/.../foobar, but not to a URL like /content/biology/foobar nor /content/biology/.

So if you want every URL whose path starts with /content/biology/ to be noindexed, you would have to specify:

Noindex: /content/biology/
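As a rough illustration, assuming Noindex values are matched as plain path prefixes the way Disallow values are in the original spec (which, as said, is not documented), the difference between the two values looks like this. The helper name rule_matches is made up for this sketch.

```python
def rule_matches(rule_path: str, url_path: str) -> bool:
    """Plain prefix match: the rule applies if the URL path starts with it."""
    return url_path.startswith(rule_path)


with_dots = "/content/biology/..."   # value with the literal "..." appended
without_dots = "/content/biology/"   # corrected value

for url in ("/content/biology/.../foobar",
            "/content/biology/foobar",
            "/content/biology/"):
    print(url, rule_matches(with_dots, url), rule_matches(without_dots, url))
```

Only the first URL matches the value with the dots, while all three match the corrected value.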


@Yeniel560

"noindex" directives should not be used in your robots.txt file. Instead, a noindex meta tag should be added to any pages that you don't want indexed in Google.

A noindex tag looks like the one below, and it should be placed in the <head> section of any page you do not want indexed:

<meta name="robots" content="noindex">

More information can be found here.

In the second example, while you do have "Disallow: /biology/" in your robots.txt file, a few lines above it you also have "Allow: /biology/animations/", which is why this page is indexed in your example.
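The interaction between those two lines can be sketched as follows. This assumes the precedence rule Google documents for its own crawlers (the longest matching path wins, and Allow wins a tie) and ignores wildcards; other bots may simply take the first matching line. The function name is_allowed and the example page paths are made up for illustration.

```python
def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Pick the most specific (longest) matching rule; Allow wins ties."""
    best = None  # (path length, is_allow_rule)
    for kind, path in rules:
        if not url_path.startswith(path):
            continue
        candidate = (len(path), kind == "allow")
        if best is None or candidate > best:
            best = candidate
    # No matching rule means the URL may be crawled.
    return True if best is None else best[1]


rules = [
    ("allow", "/biology/animations/"),
    ("disallow", "/biology/"),
]

print(is_allowed(rules, "/biology/animations/cell.html"))  # True
print(is_allowed(rules, "/biology/cell.html"))             # False
```

So a page under /biology/animations/ stays crawlable even though /biology/ as a whole is disallowed.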

Hope this helps!

