Robots.txt: disallow URLs containing a string with a '/' at the end
I have a website with thousands of dynamic pages. I want to use the robots.txt file to disallow certain URL patterns corresponding to pages with duplicate content.
For example, I have a page for article itemA belonging to category catA/subcatA, with the URL:
/catA/subcatA/itemA
This is the URL that I want Google to index.
The article is also visible via tagging in various other places on the website. The URLs produced via tagging look like:
/tagA1/itemA
I do NOT want this URL indexed by Google. However, I do want all tag listings indexed:
/tagA1
So how can I achieve this? Disallow URLs that contain a specific string with a '/' at the end?
/tagA1/itemA - disallow
/tagA1 - allow
You should not use robots.txt to block duplicate content.
The first step is to stop linking to 'bad' URLs. Each article should have one canonical URL. So, for example, the URL /tagA1/itemA should not exist. On your tag page that lists the articles, they should link to the preferred URL, /catA/subcatA/itemA.
If for some reason that is not possible, or you have links pointing to the 'bad' URLs from elsewhere, there are two possible solutions:
301 redirect the 'bad' URL to the 'good' one. This could be done via .htaccess, especially if there are clear patterns for the redirects. This is the preferred solution; see the sketch after this list.
Use the "rel=canonical" tag. Details in Google help files
A different approach:
If you are using a CMS (WordPress, Joomla, etc.), every CMS has separate templates for tag listing pages and for tag/item pages.
So you can simply use canonical URLs or the noindex,nofollow option in meta tags.
You already mentioned that you have thousands of dynamic URLs, so it is better to use noindex,nofollow meta tags on each page, based on your requirements.
Hope it makes sense.
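For example, a tag/item page such as /tagA1/itemA would carry this in its <head>, while the tag listing /tagA1 simply omits it:

<meta name="robots" content="noindex,nofollow">

Note that noindex only works if the page is crawlable: if robots.txt blocks the URL, Google never fetches the page and never sees the meta tag. Some prefer noindex,follow so that links on the page still get crawled.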
User-agent: Googlebot
Disallow: /tagA1/
Allow: /tagA1
If you use this, it will disallow every page whose path continues after /tagA1/ but not /tagA1 itself. (The Allow line just makes that explicit; a URL that matches no Disallow rule is allowed by default.)
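Since there are thousands of tags, one rule per tag does not scale. If, and this is an assumption rather than something stated in the question, every tag slug begins with a literal prefix such as "tag", Google's wildcard support lets a single rule cover them all:

User-agent: Googlebot
# Assumes all tag slugs start with "tag": blocks /tagA1/itemA and
# /tagB2/itemC but not /tagA1 or /catA/subcatA/itemA.
Disallow: /tag*/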
Learn more about robots.txt at www.robotstxt.org/robotstxt.html