Robots.txt disallow *?s
Looking at the robots.txt file of our soon-to-be website, I want to know what prevents the site from being crawled. Is it this line? If not, what does it disallow?
Disallow: *?s=
More posts by @Welton855
3 Comments
Usually, this line keeps internal search result pages from being crawled.
The best way to prevent a site from being crawled is to protect it with a password (custom authorization).
Disallow: *?s=
Bots following the original robots.txt specification would not be allowed to crawl URLs like these:
http://example.com/*?s=
http://example.com/*?s=foo
http://example.com/*?s=/
So they interpret *, ?, and = literally (i.e., the exact character sequence *?s= has to appear at the beginning of the URL path).
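As a rough illustration of that literal reading, here is a minimal Python sketch. The function name is hypothetical, and it assumes the rule is read with a leading slash (/*?s=), as the example URLs above suggest; a real crawler may handle this differently.

```python
def is_blocked_literal(rule: str, path_and_query: str) -> bool:
    """Literal prefix matching: no character in the rule has a special meaning.

    Assumption: an empty rule blocks nothing.
    """
    if not rule:
        return False
    return path_and_query.startswith(rule)


# The asterisk must really be present in the URL for the rule to apply.
print(is_blocked_literal("/*?s=", "/*?s=foo"))  # literal "*" in the URL
print(is_blocked_literal("/*?s=", "/?s=foo"))   # ordinary search URL
```

Under this reading an ordinary search URL such as /?s=foo is not affected, because it does not contain the asterisk.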
But many bots use (their own) extensions to the robots.txt specification, where some characters are reserved, i.e., they get a specific meaning.
Google, for example, uses * for pattern matching:
To block any sequence of characters, use an asterisk (*).
That means the Googlebot is not allowed to crawl URLs like these:
http://example.com/?s=
http://example.com/?s=foo
http://example.com/foo?s=
http://example.com/foo?s=bar
http://example.com/foo/foo/foo?s=bar
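To make the wildcard reading concrete, here is a small Python sketch that treats * as "any sequence of characters" and everything else literally. It is only an approximation written for this thread, not Googlebot's actual matcher, and the function names are made up.

```python
import re


def wildcard_rule_to_regex(rule: str) -> "re.Pattern":
    """Turn a rule such as '*?s=' into a regex where '*' matches anything."""
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body)


def is_blocked_wildcard(rule: str, path_and_query: str) -> bool:
    """Check the rule against the path plus query string, anchored at the start."""
    if not rule:
        return False  # assumption: an empty rule blocks nothing
    return wildcard_rule_to_regex(rule).match(path_and_query) is not None


for url in ["/?s=", "/?s=foo", "/foo?s=", "/foo?s=bar", "/foo/foo/foo?s=bar", "/foo?t=1"]:
    print(url, is_blocked_wildcard("*?s=", url))
```

In this sketch every URL from the list above is reported as blocked, while /foo?t=1 is not, because it does not contain ?s=.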
Other bots may have other interpretations.
If you want to disallow all robots from crawling your site, simply use:
User-agent: *
Disallow: /
User-agent: * means that all robots should follow the rule that comes next, and Disallow: / prevents them from crawling any path. You can read more on robotstxt.org.
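If you want to check these two lines yourself, one option is Python's standard-library urllib.robotparser. This is just a quick sketch; the bot name and the URL are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from above, fed in as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# "SomeBot" and the URL are hypothetical; any bot and any path should be refused.
allowed = parser.can_fetch("SomeBot", "http://example.com/any/page")
print("allowed" if allowed else "blocked")
```

Keep in mind that this only tells well-behaved bots what not to fetch; it does not protect the pages themselves.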
I think your Disallow: *?s= means that robots are not allowed to crawl URIs with an s query parameter, but I'm not sure about that.