
How do I disallow a specific query string in robots.txt?

@Bryan171

Posted in: #QueryString #RobotsTxt

I have the URL
www.example.com/shopping/books/?b=9

and the following robots.txt file:

User-agent: *
Disallow: /?b=9


But when I test this in the robots.txt tester in Google Webmaster Tools, the URL shows as allowed when it should be disallowed.

Whilst /?b=9 is fixed, /shopping/books will change with different categories and I need to block them all.

Please tell me what's wrong with my robots.txt.


4 Comments


@Sarah324

Doesn't a self altering text configuration file suggest an issue with your directories and actual ability to reach/edit that file? Not to cause panic, but... the input you entered changed....I don't think it's a text file issue.


@Harper822

I don't think there's a way to do this in robots.txt. Also, anything you list in robots.txt is visible to attackers as well, because the file is publicly accessible.

What I would suggest is to use your scripting language to detect the query string you don't want people to access. If it matches, either redirect to a relevant page that people are allowed to access or serve a page with a 410 HTTP status code.

For example, in PHP, you can use either of these to block the b=9 parameter from being accessible:

<?php
// Answer 410 Gone when the unwanted query string b=9 is requested.
if (isset($_GET['b']) && $_GET['b'] === "9") {
    header("HTTP/1.1 410 Gone", true, 410);
    echo "This page is gone.";
    exit();
}
?>

<?php
// Permanently redirect requests for b=9 to a page visitors may access.
if (isset($_GET['b']) && $_GET['b'] === "9") {
    header("Location: http://example.com/newpage", true, 301);
    echo 'This page moved <a href="http://example.com/newpage">here</a>';
    exit();
}
?>


If you are looking to specifically block just robots and not real users, then you could make the parameters accessible via POST only. Here's the HTML and PHP you can use:

HTML:

<form action="phpscript.php" method="POST">
<input type="hidden" name="b" value="9">
<input type="submit" value="special page">
</form>


PHP file named phpscript.php:

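<?php
// Reject b=9 when it arrives in the query string of a non-POST request.
// The form above sends it in the POST body, which populates $_POST instead.
if (isset($_GET['b']) && $_GET['b'] === "9" && strtoupper($_SERVER['REQUEST_METHOD']) !== "POST") {
    header("HTTP/1.1 410 Gone", true, 410);
    echo "This page is gone";
    exit();
}
?>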


The only problem with the POST method is that responses to POST requests are generally not cacheable, since POST is primarily meant for submitting user data.


@Miguel251

The answer is on the link I posted:

Disallow: /shopping/*/*?b=9


* is a wildcard that matches any sequence of characters.


@Alves908

robots.txt uses prefix matching, so a rule like Disallow: /?b=9 will block all URLs that start with /?b=9. Your URLs start with /shopp... so they are not blocked.

However, you can use a * (wildcard - 0 or more instances of any character) to represent the first part of the URL. This is an addition to the "standard", but the main search engine bots ("Google, Bing, Yahoo, and Ask") support it:

Disallow: /*/?b=9


The above should block /shopping/books/?b=9 and /<anything>/?b=9.
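To see why the original rule fails and the wildcard rule works, here is a rough PHP sketch that approximates the matching described above: rules are matched from the start of the path, * stands for zero or more characters, and a trailing $ anchors the end. The function name robots_rule_matches() is hypothetical, and this is only an approximation of how a crawler might evaluate a single rule, not any search engine's actual implementation.

```php
<?php
// Approximate robots.txt path matching with the "*" and "$" extensions.
// robots_rule_matches() is a hypothetical helper for illustration only.
function robots_rule_matches(string $rule, string $urlPath): bool
{
    // A trailing "$" anchors the rule to the end of the URL.
    $anchored = substr($rule, -1) === '$';
    if ($anchored) {
        $rule = substr($rule, 0, -1);
    }

    // Escape each literal piece, then join the pieces with ".*" for each "*".
    $pieces = array_map(
        fn(string $piece): string => preg_quote($piece, '#'),
        explode('*', $rule)
    );
    $regex = '#^' . implode('.*', $pieces) . ($anchored ? '$' : '') . '#';

    // "^" enforces prefix matching: the rule must match from the path start.
    return preg_match($regex, $urlPath) === 1;
}

$url = '/shopping/books/?b=9';
var_dump(robots_rule_matches('/?b=9', $url));    // bool(false): path starts with /shopp...
var_dump(robots_rule_matches('/*/?b=9', $url));  // bool(true): "*" absorbs "shopping/books"
```

Under the same assumptions, the rule /shopping/*/*?b=9 from the other answer also matches this URL, but it would not cover categories outside /shopping/.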

Reference: developers.google.com/webmasters/control-crawl-index/docs/robots_txt?hl=en#url-matching-based-on-path-values
