Robots.txt: disallow URLs containing a string with a '/' at the end
I have a website with thousands of dynamic pages. I want to use the robots.txt file to disallow certain URL patterns corresponding to pages with duplicate content.
For example, I have a page for article itemA belonging to category catA/subcatA, with the URL:
/catA/subcatA/itemA
This is the URL that I want Google to index.
The article is also visible via tagging in various other places on the website. The URLs produced via tagging look like:
/tagA1/itemA
I do NOT want this URL indexed by Google. However, I do want all tag listings indexed:
/tagA1
So how can I achieve this? Disallow URLs that contain a specific string with a '/' at the end?
/tagA1/itemA - disallow
/tagA1 - allow
You should not use robots.txt to block duplicate content.
The first step is to stop linking to 'bad' URLs. Each article should have one canonical URL. So, for example, the URL /tagA1/itemA should not exist. On your tag page that lists the articles, they should link to the preferred URL, /catA/subcatA/itemA.
If for some reason that is not possible, or you have links pointing to the 'bad' URLs from elsewhere, there are two possible solutions:
301 redirect the 'bad' URL to the 'good' one. This could be done via .htaccess, especially if there are clear patterns for the redirects. This is the preferred solution; see the sketch after this list.
Use the "rel=canonical" tag. Details in Google help files
A different approach:
If you are using a CMS (WordPress, Joomla, etc.), every CMS has separate templates for tag listing pages and for tag/item pages.
So you can simply use canonical URLs or the noindex,nofollow option in meta tags.
You already mentioned that you have thousands of dynamic URLs, so it is better to use noindex,nofollow meta tags on each page, based on your requirements.
Hope it makes sense.
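For example, a tag/item page such as /tagA1/itemA would carry this in its <head>, while the tag listing /tagA1 simply omits it:

<meta name="robots" content="noindex,nofollow">

Note that noindex only works if the page is crawlable: if robots.txt blocks the URL, Google never fetches the page and never sees the meta tag. Some prefer noindex,follow so that links on the page still get crawled.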
User-agent: Googlebot
Disallow: /tagA1/
Allow: /tagA1
If you use this, it will disallow every page whose path continues after /tagA1/ but not /tagA1 itself. (The Allow line just makes that explicit; a URL that matches no Disallow rule is allowed by default.)
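Since there are thousands of tags, one rule per tag does not scale. If, and this is an assumption rather than something stated in the question, every tag slug begins with a literal prefix such as "tag", Google's wildcard support lets a single rule cover them all:

User-agent: Googlebot
# Assumes all tag slugs start with "tag": blocks /tagA1/itemA and
# /tagB2/itemC but not /tagA1 or /catA/subcatA/itemA.
Disallow: /tag*/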
Learn more about robots.txt at www.robotstxt.org/robotstxt.html