
Complex Disallow pattern in robots.txt

@YK1175434

Posted in: #RobotsTxt

I have a URL like this:
example.com/freelance-jobs-new-york

I had a problem and many duplicated pages have been created like this:
example.com/freelance-jobs-new-york-php-php
www.example.com/freelance-jobs-new-york-php-php-php
example.com/freelance-jobs-new-york-php-php-php-php

And so on. Those pages have the same content as the main one, so to fix it I redirected every page with the php keyword two or more times in its URL to the main URL.

But I did it late, so Google may have to follow redirects for more than 20,000 pages that have already been crawled.

So I want to set up a Disallow in robots.txt to stop Googlebot from spending resources on those URLs.

So my question is: what pattern should I use to disallow pages with the keyword php two or more times in the URL?

Will Disallow: /*php*php* work as expected? I am asking because I don't want to accidentally block good URLs.


2 Comments


@Heady270

You can simply use:

Disallow: /freelance-jobs-new-york-php-php

See this Google page:
support.google.com/webmasters/answer/6062596?hl=en&ref_topic=6061961


@Heady270

Googlebot does support wildcards in robots.txt. Google announced this on its blog: googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html
Not every other crawler supports wildcards, so that syntax is not universal.
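
If you want to check a wildcard rule before deploying it, here is a rough Python sketch. It assumes Google-style matching (the rule is matched from the start of the path, * matches any run of characters, and a trailing $ anchors the end of the URL). The helper names and the sample path /php-jobs-php-developer are made up for illustration; this is not Googlebot's actual matcher.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (Google-style): '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the Disallow pattern matches the path from its start."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    rule = "/*php*php*"
    sample_paths = [
        "/freelance-jobs-new-york",              # the good URL
        "/freelance-jobs-new-york-php-php",      # duplicate
        "/freelance-jobs-new-york-php-php-php",  # duplicate
        "/php-jobs-php-developer",               # hypothetical good URL
    ]
    for path in sample_paths:
        print(path, "blocked" if is_blocked(rule, path) else "allowed")
```

Under those assumptions the rule leaves /freelance-jobs-new-york alone and blocks the duplicates, but it also blocks any legitimate URL that happens to contain php twice, such as the hypothetical last path. Run your real URL list through something like this before relying on the rule.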

However, putting URLs into robots.txt does not prevent Googlebot from indexing them; it only stops them from being crawled, which also means Googlebot would never see your redirects. Your solution of the canonical tag sounds like a much better idea to get them out of the index. 301 redirects would also work.
