Twilah146

Robots.txt disallowing URLs

Posted in: #RobotsTxt #Url

I need to disallow some URLs on my site but I am not sure how to do that. I have a site that has products and reviews. When someone makes a review, the site generates a URL automatically like this:

mysite.com/addreview_1.htm
mysite.com/addreview_2.htm
....
mysite.com/addreview_9999.htm


I need a way to disallow all of these URLs, including the ones that will be generated in the future.


@Ann8826881

The original robots.txt specification has no concept of a "full" URL. Whatever you specify as the value of Disallow is always treated as the beginning of the URL paths you want to block, so it matches every path that starts with it.

For example, see this robots.txt:

# robots.txt for example.com
User-agent: *
Disallow: /foobar.html


This will obviously block example.com/foobar.html. But it will also block:


example.com/foobar.html?foo=bar
example.com/foobar.html.zip
example.com/foobar.html.for.example
example.com/foobar.html/foo/bar
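To see this prefix rule in action, you can feed the example file to Python's standard-library urllib.robotparser, which implements the original prefix matching. This is a minimal sketch; the extra /foo.html URL is a hypothetical control case that should stay crawlable:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the example above, given to the standard-library parser.
ROBOTS_TXT = """\
# robots.txt for example.com
User-agent: *
Disallow: /foobar.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# URLs whose path starts with /foobar.html are blocked; /foo.html is not.
for url in [
    "https://example.com/foobar.html",
    "https://example.com/foobar.html?foo=bar",
    "https://example.com/foobar.html.zip",
    "https://example.com/foobar.html.for.example",
    "https://example.com/foobar.html/foo/bar",
    "https://example.com/foo.html",
]:
    status = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{status:8} {url}")
```

The five URLs from the list come out as blocked, while the hypothetical /foo.html stays allowed because its path does not start with /foobar.html.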


So, in your case you just need:

User-agent: *
Disallow: /addreview


It will block all URLs whose path begins with /addreview:


example.com/addreview
example.com/addreview.html
example.com/addreview_1.htm
example.com/addreview_9999.htm


Of course, it will also block a URL like example.com/addreviewer (assuming such a page exists), which may or may not be what you want, depending on the other URLs your site uses.

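So you need to find a path prefix that matches all the URLs you want blocked and none of the others. In your case, Disallow: /addreview_ (including the underscore) would still cover every review URL without blocking example.com/addreviewer.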
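One way to check a candidate prefix before deploying it is to run your URLs through Python's urllib.robotparser, which uses the same prefix matching. The sketch below compares /addreview with the narrower /addreview_; make_parser is a hypothetical helper, and /addreviewer is the same hypothetical page as above:

```python
from urllib.robotparser import RobotFileParser


def make_parser(prefix):
    """Build a parser for a one-rule robots.txt disallowing `prefix` for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {prefix}"])
    return parser


# Review URLs from the question, plus a hypothetical /addreviewer page that
# should remain crawlable.
REVIEW_URLS = [
    "https://example.com/addreview_1.htm",
    "https://example.com/addreview_2.htm",
    "https://example.com/addreview_9999.htm",
]
OTHER_URL = "https://example.com/addreviewer"

for prefix in ("/addreview", "/addreview_"):
    parser = make_parser(prefix)
    reviews_blocked = all(not parser.can_fetch("*", url) for url in REVIEW_URLS)
    other_crawlable = parser.can_fetch("*", OTHER_URL)
    print(f"{prefix}: reviews blocked={reviews_blocked}, "
          f"/addreviewer crawlable={other_crawlable}")
```

Both prefixes block every review URL, but only /addreview_ leaves /addreviewer crawlable.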


@Angela700

You can add a wildcard entry to robots.txt, like this:

Disallow: /addreview*


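Google and other major search engines honor wildcards, but because wildcards are an extension that was not part of the original robots.txt specification, some crawlers probably still ignore them. Also note that the trailing * adds nothing here: since Disallow values already match by prefix, Disallow: /addreview blocks the same URLs and also works with crawlers that don't support wildcards.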
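To make the wildcard semantics concrete, here is a minimal Python sketch of how a wildcard-aware crawler might match Disallow patterns, following the rules later standardized in RFC 9309 ('*' matches any sequence of characters, a trailing '$' marks the end of the path). The helper names pattern_to_regex and is_blocked and the /products path are hypothetical; this is not Google's actual implementation, and it ignores details such as percent-encoding and Allow/Disallow precedence:

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Sketch of the wildcard extension: '*' matches any run of characters, a
    trailing '$' pins the pattern to the end of the path, and everything else
    is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' (the wildcard).
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Hypothetical helper: does a Disallow pattern cover this URL path?"""
    # re.match anchors at the start only, which gives prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


PATHS = ["/addreview_1.htm", "/addreview_9999.htm", "/addreviewer", "/products"]

for path in PATHS:
    # With or without the trailing '*', the result is the same for every path.
    print(path, is_blocked("/addreview*", path), is_blocked("/addreview", path))

# A '$' anchor gives a narrower rule: the prefix, anything, then ".htm" at the end.
print(is_blocked("/addreview_*.htm$", "/addreview_9999.htm"))  # True
print(is_blocked("/addreview_*.htm$", "/addreviewer"))         # False
```

For each path the two columns agree, which illustrates why the trailing * is redundant; wildcards only add expressive power when they appear in the middle of a pattern or are combined with '$'.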

This will also only work if the URLs you want to disallow have a common element that is not found in URLs you want crawled.
