Proper robots.txt block URL

@Bryan171

Posted in: #RobotsTxt #Url

I want to block this URL: /services/online.html?dir=asc&limit=12&order=price but I want to allow this one: /services/online.html.

I have added this to my robots.txt:

Allow: /services/online.html
Disallow: /*?dir=


When I check it in the robots.txt Tester, the URL is still allowed. What would be the right rules for this?





4 Comments


 

@Jessie594

There is some decent advice in the existing answers already, but to answer your original question...

You could disallow all such URL parameters like so, if you wished:

# Don't crawl URLs that contain sorting/narrowing parameters...
Disallow: *&cat=*
Disallow: *&dir=*
Disallow: *&p=*
Disallow: *&limit=*
Disallow: *&mode=*
Disallow: *&pg=*
Disallow: *&order=*
Disallow: *?cat=*
Disallow: *?dir=*
Disallow: *?p=*
Disallow: *?limit=*
Disallow: *?mode=*
Disallow: *?pg=*
Disallow: *?order=*


That said, it would be best to handle these in Google Search Console, where you can give Google specific instructions for each URL parameter that exists on your website.
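If you want to sanity-check which URLs patterns like these would catch before deploying them, here is a minimal Python sketch. It is my own rough approximation of Google-style wildcard matching (the sample URLs are made up for illustration), not an official tool:

import re

def rule_to_regex(rule):
    # Translate a robots.txt path pattern into a regex: '*' matches any
    # run of characters, a trailing '$' anchors the end, and the pattern
    # is matched against the start of the path + query string.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

disallow_rules = ["*&dir=*", "*?dir=*", "*&order=*", "*?order=*"]

sample_urls = [
    "/services/online.html",
    "/services/online.html?dir=asc&limit=12&order=price",
    "/services/online.html?limit=12&order=price",
]

for url in sample_urls:
    blocked = any(rule_to_regex(r).match(url) for r in disallow_rules)
    print("blocked" if blocked else "allowed", url)

Note that real crawlers also apply longest-match precedence between Allow: and Disallow: rules, which this snippet ignores; it only shows which URLs the Disallow: patterns would touch.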



 

@Steve110

You should know that the URL

/services/online.html?dir=asc&limit=12&order=price


serves the same content as

/services/online.html?limit=12&dir=asc&order=price


and as

/services/online.html?order=price&dir=asc&limit=12


and so on. The position of parameters in the query string means nothing to a web server, so disallowing just a query string that starts with

?dir=


doesn't solve the whole problem. So maybe you should try something like this:

Disallow: /services/online.html?


But personally I would go for canonical tags, as suggested in another answer here. It is the recommended method, and you can read about it in Google's documentation.
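To illustrate why that single Disallow: /services/online.html? rule covers every ordering of the parameters, here is a tiny Python check. It is only a rough stand-in for the prefix matching crawlers perform on the path plus query string, and the sample URLs are made up:

rule = "/services/online.html?"

sample_urls = [
    "/services/online.html",
    "/services/online.html?dir=asc&limit=12&order=price",
    "/services/online.html?limit=12&dir=asc&order=price",
    "/services/online.html?order=price&dir=asc&limit=12",
]

for url in sample_urls:
    # A rule without wildcards is matched against the start of the
    # path + query, so any query string on this page is caught.
    print("blocked" if url.startswith(rule) else "allowed", url)

Only the bare /services/online.html comes out as allowed; every parameterised variant is blocked, whatever the parameter order.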



 

@Ann8826881

Allow: /services/online.html
Disallow: /*?dir=



The most specific rule (based on the length of the path argument) wins when resolving conflicts between Allow: and Disallow: directives, regardless of the order of the directives in the file. So, for the given URL, the Allow: rule wins because it is the most specific (longest) path that matches the requested URL.

To resolve this, you can either make the Disallow: directive more specific (i.e. longer), for example:

Allow: /services/online.html
Disallow: /services/online.html?dir=


Or, simply remove the Allow: directive altogether, since allow is the default action. So, all you would seem to need is:

Disallow: /*?dir=


A URL of the form /services/online.html would be implicitly allowed since it doesn't contain a query string that starts with ?dir=.
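If it helps to see that precedence in action, below is a short Python sketch of the longest-match rule described above. It is a simplification of the documented behaviour, not Google's actual parser, and the helper names are my own:

import re

def matches(path_pattern, url):
    # '*' in a robots.txt path acts as a wildcard and a trailing '$'
    # anchors the end; the pattern is matched from the start of the URL.
    pattern = re.escape(path_pattern).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, url) is not None

def is_allowed(rules, url):
    # rules is a list of (directive, path) pairs; the matching rule with
    # the longest path wins, and no match at all means the URL is allowed.
    best = max(
        (rule for rule in rules if matches(rule[1], url)),
        key=lambda rule: len(rule[1]),
        default=("allow", ""),
    )
    return best[0] == "allow"

url = "/services/online.html?dir=asc&limit=12&order=price"

# Original rules: the Allow: path is longer, so it wins and the URL stays crawlable.
print(is_allowed([("allow", "/services/online.html"),
                  ("disallow", "/*?dir=")], url))                     # True

# More specific Disallow: now it is the longest match and the URL is blocked.
print(is_allowed([("allow", "/services/online.html"),
                  ("disallow", "/services/online.html?dir=")], url))  # False

Google also breaks ties in favour of Allow: when two matching rules are equally long, which this sketch doesn't bother with.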



 

@Welton855

It appears that you are looking to block URLs that appear with sorting parameters. The best thing to do is to add canonical tags; it is the recommended method.

As for the issue above, the robots.txt Tester in Webmaster Tools often takes time to pick up and display changes to the live robots.txt file, so please confirm that it has been updated.


