Proper robots.txt block URL
I want to block this URL: /services/online.html?dir=asc&limit=12&order=price but I want to allow this one: /services/online.html.
I have added this to my robots.txt:
Allow: /services/online.html
Disallow: /*?dir=
When I check it in the robots.txt Tester, the URL is still allowed. What would be the right rule for this?
There is some decent advice in the existing answers already, but to answer your original question...
You could disallow all such URL parameters like so, if you wished:
# Don't crawl URLs that contain sorting/narrowing parameters...
Disallow: *&cat=*
Disallow: *&dir=*
Disallow: *&p=*
Disallow: *&limit=*
Disallow: *&mode=*
Disallow: *&pg=*
Disallow: *&order=*
Disallow: *?cat=*
Disallow: *?dir=*
Disallow: *?p=*
Disallow: *?limit=*
Disallow: *?mode=*
Disallow: *?pg=*
Disallow: *?order=*
That said, it would be best to handle these in Google Search Console, where you can give Google specific instructions for each URL parameter that exists on your website.
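If you want a rough local sanity check of a list like that before trusting any online tester, note that every pattern above is wrapped in a leading and trailing *, so each one effectively blocks any URL whose path and query contain that fragment. Here is a minimal Python sketch of that idea, assuming a substring test is a good enough approximation of the wildcard rules (the helper name and sample URLs are just for illustration):

# Approximate the *fragment* Disallow patterns above with substring tests.
# This is only a local sanity check, not Google's actual matcher.
BLOCKED_FRAGMENTS = [
    "&cat=", "&dir=", "&p=", "&limit=", "&mode=", "&pg=", "&order=",
    "?cat=", "?dir=", "?p=", "?limit=", "?mode=", "?pg=", "?order=",
]

def is_blocked(path_and_query: str) -> bool:
    # True if any disallowed fragment appears anywhere in the URL.
    return any(fragment in path_and_query for fragment in BLOCKED_FRAGMENTS)

print(is_blocked("/services/online.html"))                               # False
print(is_blocked("/services/online.html?dir=asc&limit=12&order=price"))  # True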
You should be aware that the URL
/services/online.html?dir=asc&limit=12&order=price
serves the same content as
/services/online.html?limit=12&dir=asc&order=price
and as
/services/online.html?order=price&dir=asc&limit=12
and so on. The position of the parameters in the query string doesn't mean anything to a web server, so disallowing only query strings that start with
?dir=
doesn't solve all your problems. You could instead try something like this:
Disallow: /services/online.html?
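To see why the ordering matters to the pattern but not to the server, here is a quick sketch using only Python's standard library; the sample query strings are the ones from the question, and checking for the literal "?dir=" text is my own approximation of how the /*?dir= wildcard pattern matches:

from urllib.parse import parse_qs

# The three orderings from above parse to exactly the same parameters.
queries = [
    "dir=asc&limit=12&order=price",
    "limit=12&dir=asc&order=price",
    "order=price&dir=asc&limit=12",
]
parsed = [parse_qs(q) for q in queries]
print(parsed[0] == parsed[1] == parsed[2])  # True

# But only the first ordering contains the literal text "?dir=", which is
# why Disallow: /*?dir= misses the other two variants.
print([("?" + q).find("?dir=") != -1 for q in queries])  # [True, False, False]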
But personally I would go for canonical tags, as pointed out by danielwill786. It is the recommended method, and you can read about it in Google's documentation.
Allow: /services/online.html
Disallow: /*?dir=
The most specific rule (based on the length of the path argument) wins when resolving Allow: / Disallow: conflicts - regardless of the order of the directives in the file. So, for the given URL, the Allow: rule wins because it is the longer, more specific path that matches the requested URL.
To resolve this you can either make the Disallow: directive more specific (ie. longer), for example:
Allow: /services/online.html
Disallow: /services/online.html?dir=
Or, simply remove the Allow: directive altogether, since allow is the default action. So, all you would seem to need is:
Disallow: /*?dir=
A URL of the form /services/online.html would be implicitly allowed since it doesn't contain a query string that starts with ?dir=.
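If it helps to see that longest-match behaviour in action, below is a minimal Python sketch of the resolution logic, assuming the documented "longest matching rule wins, and Allow wins a tie" behaviour; the helper functions and the regex translation of the * wildcard are my own simplification, not an official parser:

import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Translate a robots.txt path pattern into an anchored regex:
    # * matches any run of characters, $ anchors the end of the URL.
    regex = "^" + re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.compile(regex)

def is_allowed(url: str, rules: list) -> bool:
    # rules is a list of (directive, pattern) pairs, e.g. ("Disallow", "/*?dir=").
    matches = [
        (len(pattern), directive)
        for directive, pattern in rules
        if pattern_to_regex(pattern).match(url)
    ]
    if not matches:
        return True  # no rule matches, so crawling is allowed by default
    # Longest pattern wins; on a tie the less restrictive rule (Allow) wins.
    matches.sort(key=lambda m: (m[0], m[1] == "Allow"))
    return matches[-1][1] == "Allow"

url = "/services/online.html?dir=asc&limit=12&order=price"

original = [("Allow", "/services/online.html"), ("Disallow", "/*?dir=")]
print(is_allowed(url, original))  # True: the 21-char Allow beats the 7-char Disallow

fixed = [("Allow", "/services/online.html"), ("Disallow", "/services/online.html?dir=")]
print(is_allowed(url, fixed))     # False: the 26-char Disallow now wins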
It appears that you are looking to block URLs that appear with sorting parameters. The best thing to do is to add canonical tags; that is the recommended method.
As for your test, the robots.txt Tester in Search Console often takes time to update and reflect the new data from the live robots.txt file. Please confirm that it has been updated.