Mobile app version of vmapp.org
Sherry384

: Robots.txt: disallow URLs containing a string with a '/' at the end


Posted in: #GoogleSearchConsole #RobotsTxt #WebCrawlers

I have a website with thousands of dynamic pages. I want to use the robots.txt file to disallow certain URL patterns corresponding to pages with duplicate content.

For example, I have a page for article itemA belonging to category catA/subcatA, with the URL:

/catA/subcatA/itemA

This is the URL that I want Google to index.

The article is also visible via tagging in various other places on the website. The URLs produced via tagging look like:

/tagA1/itemA

I do NOT want Google to index this URL. However, I do want all tag listings indexed:

/tagA1

So how can I achieve this? Can I disallow URLs that include a specific string with a '/' at the end?

/tagA1/itemA - disallow

/tagA1 - allow


3 Comments


 

@Cofer257

You should not use robots.txt to block duplicate content.

The first step is to stop linking to the 'bad' URLs. Each article should have one canonical URL, so a URL like /tagA1/itemA should not exist at all. On the tag page that lists the articles, each article should link to its preferred URL, /catA/subcatA/itemA.

If for some reason that is not possible, or you have links pointing to the 'bad' URLs from elsewhere, there are two possible solutions:


1. 301 redirect the 'bad' URL to the 'good' one. This can be done via .htaccess, especially if there are clear patterns for the redirects. This is the preferred solution.
2. Use the rel="canonical" tag. Details are in Google's help files.
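As a sketch of the redirect option, assuming Apache with mod_rewrite and using the example URLs from the question (a real site would need a rule pattern or rewrite map covering every tag/item pair):

```apache
# .htaccess sketch (hypothetical single mapping; requires mod_rewrite).
RewriteEngine On
# Send the duplicate tag-scoped URL to the canonical category URL.
RewriteRule ^tagA1/itemA$ /catA/subcatA/itemA [R=301,L]
```

For the rel="canonical" option, the tag goes in the <head> of the duplicate page (example.com is a placeholder domain):

```html
<!-- In the <head> of /tagA1/itemA -->
<link rel="canonical" href="https://example.com/catA/subcatA/itemA">
```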



 

@Tiffany637

A different approach:

If you are using a CMS (WordPress, Joomla, etc.), every CMS has separate page templates for tag listings and for tag/item pages.

So you can simply use canonical URLs, or a noindex,nofollow option via meta tags.

You already mentioned you have thousands of dynamic URLs, so it is better to use a robots meta tag with noindex (and optionally nofollow) on each page, based on your requirements.

Hope it makes sense.
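A minimal sketch of the meta-tag approach: the robots meta tag goes in the <head> of each duplicate page such as /tagA1/itemA (noindex keeps the page out of the index; whether you also want nofollow depends on whether the page's links should still be crawled):

```html
<!-- In the <head> of the duplicate /tagA1/itemA page -->
<meta name="robots" content="noindex, nofollow">
```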



 

@Bryan171

User-agent: Googlebot
Disallow: /tagA1/
Allow: /tagA1


If you use this, it will disallow every URL under /tagA1/ while still allowing /tagA1 itself: Disallow: /tagA1/ only matches paths that begin with /tagA1/, so the bare tag listing page is not blocked.

Learn more about robots.txt at www.robotstxt.org/robotstxt.html
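Note that Google also supports the * wildcard in robots.txt, so if the thousands of tag pages shared a common URL prefix (a hypothetical /tag/ prefix is assumed here; the question's example URLs do not have one), a single rule could cover all of them:

```
User-agent: Googlebot
# Block item pages under any tag, but not the tag listings themselves.
Disallow: /tag/*/
```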


