How to disallow robots from the first 185 pages?

@Kevin317

Posted in: #RobotsTxt

I have a website where the first 185 pages are sample profiles for demonstration purposes:
example.com/profile/1 ... example.com/profile/185

I want to block these pages from Google because they are somewhat similar in content, and I want to avoid being penalized for duplicate content. Is there a better way to do this than listing them all out in robots.txt like so:

User-agent: *
Disallow: /profile/1
Disallow: /profile/2
Disallow: /profile/3
...


3 Comments


 

@Ann8826881

It is not possible to use robots.txt (as defined by the original specification) for your case. A line like Disallow: /profile/1 blocks all URLs whose paths start with /profile/1. So it applies to profiles 1, 10-19, and 100-185 (as intended), but also to profiles 186-199, 1000-1999, 10000, … (not intended).

Workaround: Add a character as a suffix, for example a /. Your profile URLs would then look like /profile/1/, /profile/2/, …, and you could specify Disallow: /profile/1/ etc.
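With that suffix in place, the robots.txt would still list one line per profile, but without blocking anything beyond 185:

User-agent: *
Disallow: /profile/1/
Disallow: /profile/2/
...
Disallow: /profile/185/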

That said, some robots.txt parsers support additional features that are not part of the original robots.txt specification. Since you specifically want to block the pages for Google: Google gives special meaning to the $ character:


To specify matching the end of a URL, use $


So for Google, you could write Disallow: /profile/1$. But other parsers that don't support this feature would then crawl (and potentially index) your profiles 1-185, because they treat the rule as matching URL paths that literally start with /profile/1$.
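A sketch of that Google-only variant, addressed to Googlebot so that other parsers ignore the group entirely:

User-agent: Googlebot
Disallow: /profile/1$
Disallow: /profile/2$
...
Disallow: /profile/185$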

So if you don't want to add a suffix (while still listing all 185 Disallow lines explicitly), and you don't want a Google-only solution (no suffix, but still listing every Disallow line explicitly), robots.txt is not a solution for you.

Instead, you could use:

on the HTTP level: the X-Robots-Tag HTTP header

X-Robots-Tag: noindex

on the HTML level: a meta element with the robots name

<meta name="robots" content="noindex" />
Both ways are supported by Google.
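For example, on Apache you could send that header for exactly the profiles 1-185 from .htaccess. A minimal sketch, assuming mod_setenvif and mod_headers are enabled:

# mark requests for the 185 sample profiles ...
SetEnvIf Request_URI "^/profile/([1-9][0-9]?|1[0-7][0-9]|18[0-5])$" NOINDEX_PROFILE
# ... and send the noindex header only for those
Header set X-Robots-Tag "noindex" env=NOINDEX_PROFILE

The regular expression matches the numbers 1-185 exactly, so profile 186 and above are unaffected.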



 

@Si4351233

You are creating a file to be read by a robot, so create it with a robot:

<?php ob_start(); // buffer output so header() can still be sent below ?>
User-agent: *
<?php
header("Content-Type: text/plain");
$limit = 185;

for ($i = 1; $i <= $limit; $i++)
    echo "Disallow: /profile/$i\n";
?>
# rest of robots.txt here


Or, if you are using leading zeros (for better sorting), replace the echo line with:

printf("Disallow: /profile/%03d\n", $i);


Of course, crawlers request robots.txt, not robots.php, but that's what mod_rewrite is for. In .htaccess:

RewriteEngine On
RewriteRule ^robots\.txt$ robots.php [L]
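With that rule in place, a request for example.com/robots.txt is answered by robots.php, so crawlers still see a plain-text robots.txt at the usual URL.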



 

@Welton855

You could put the robots meta tag on all of those pages: <meta name="robots" content="noindex, nofollow">
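Since the profiles are numbered, the page template can decide this per profile. A minimal sketch in PHP, assuming the current profile number is available in a (hypothetical) variable $profileId:

<?php
// Sample profiles 1-185: keep them out of search indexes
if ($profileId >= 1 && $profileId <= 185) {
    echo '<meta name="robots" content="noindex, nofollow">';
}
?>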


