What does "Disallow: /search" mean in robots.txt?
In my blog's Google Webmaster Tools panel, I found the following code in the blocked-URLs section of my robots.txt.
User-agent: Mediapartners-Google
Disallow: /search
Allow: /
I know that Disallow will prevent a bot from crawling a webpage, but I don't understand the usage of Disallow: /search.
What is the exact meaning of Disallow: /search?
It means that the user agent Mediapartners-Google is not allowed to crawl any URL whose path starts with /search:
/search/go is blocked
/search is blocked
/ is not blocked
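A minimal sketch of how to verify this, using Python's standard urllib.robotparser; the rules and URLs mirror this answer, and example.com is just a placeholder:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Mediapartners-Google
Disallow: /search
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Any path beginning with /search is blocked; everything else is allowed.
for url in ("http://example.com/search/go",
            "http://example.com/search",
            "http://example.com/"):
    verdict = "allowed" if rp.can_fetch("Mediapartners-Google", url) else "blocked"
    print(url, "->", verdict)

Running it prints "blocked" for the first two URLs and "allowed" for the last, matching the list above.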
Other answers explain how robots.txt is processed to apply this rule, but don't address why you would want to disallow bots from crawling your search results.
One reason might be that your search results are expensive to generate. Telling bots not to crawl those pages could reduce load on your servers.
Search results pages are also not great landing pages. A search result page typically just has a list of 10 pages from your site with titles and descriptions. Users would generally be better served by going directly to the most relevant of those pages. In fact, Google has said that they don't want your site search results indexed by Google. If you don't disallow them, Google could penalize your site.
In the Disallow field you specify the beginning of the URL path for URLs that should be blocked.
So if you have Disallow: /, it blocks everything, as every URL path starts with /.
If you have Disallow: /a, it blocks all URLs whose paths begin with /a. That could be /a.html, /a/b/c/hello, or /about.
In the same sense, if you have Disallow: /search, it blocks all URLs whose paths begin with the string /search. So it would block the following URLs, for example (if the robots.txt is in example.com/):
example.com/search
example.com/search.html
example.com/searchengine
example.com/search/
example.com/search/index.html
While the following URLs would still be allowed:
example.com/foo/search
example.com/sea
Note that robots.txt doesn't know or care whether the string matches a directory, a file, or nothing at all. It only compares the characters in the URL path.
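To make that concrete, here is a minimal sketch in Python of the character-by-character prefix check described above, run against this answer's own example paths:

# A Disallow value blocks every URL whose path starts with that string.
disallow = "/search"

paths = [
    "/search",            # blocked
    "/search.html",       # blocked
    "/searchengine",      # blocked
    "/search/",           # blocked
    "/search/index.html", # blocked
    "/foo/search",        # allowed
    "/sea",               # allowed
]

for path in paths:
    verdict = "blocked" if path.startswith(disallow) else "allowed"
    print(f"{path:22} {verdict}")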
Since the OP indicated in his comments that he was only interested in the "/search" directory, my answer below addresses disallowing just a "search" directory:
The following is a directive for robots not to crawl something named "search" located in the root directory:
Disallow: /search
According to the Google Webmaster Tools help doc below, directory names should be preceded and followed by a forward slash /, as also specified in the other reference sources that follow:
Google Webmaster Tools - Block or remove pages using a robots.txt file
To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /junk-directory/
Robotstxt.org - What to put in it
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
Wikipedia - Robots exclusion standard
This example tells all robots not to enter three directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
So according to Google (as copied above), the following would disallow bots with the user-agent Mediapartners-Google from crawling the "search" directory located in the root directory, but allow all other directories to be crawled:
User-agent: Mediapartners-Google
Disallow: /search/
Allow: /
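To see the difference the trailing slash makes, here is a minimal sketch using Python's standard urllib.robotparser; blocked() is a hypothetical helper, and example.com/searchengine is borrowed from the earlier answer as a path that begins with /search but is not inside a search/ directory:

from urllib.robotparser import RobotFileParser

def blocked(disallow_value, url):
    # Build a tiny rule set and ask whether Mediapartners-Google may fetch url.
    rp = RobotFileParser()
    rp.parse([
        "User-agent: Mediapartners-Google",
        "Disallow: " + disallow_value,
        "Allow: /",
    ])
    return not rp.can_fetch("Mediapartners-Google", url)

url = "http://example.com/searchengine"
print(blocked("/search", url))   # True: the bare prefix also matches /searchengine
print(blocked("/search/", url))  # False: the directory form only matches paths under /search/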
It tells AdSense not to crawl any files in the /search directory or below (i.e. any subdirectories of /search).