Why is Disallow: /search in Blogger's robots.txt?
Can anyone tell me what this means in Blogger's "robots.txt" file? Do I need to edit anything in it? Should I remove /search from the Disallow: line?
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: css3wdesign.blogspot.com/sitemap.xml
Robots.txt is a file that search engines and other bots read to "ask you" what's OK to visit. It lets you whitelist or blacklist all bots, or specific ones, from areas of your realm. It's like a treaty, a promise: well-behaved bots keep the promise, bad ones do not.
As for search: I agree that in the past it was considered bad practice to let robots crawl search results. Nowadays, letting Google crawl search can work out well, at least in certain niches, and you don't even need search caching.
The robots.txt files across our platforms vary, but we always leave the search Disallow commented out (i.e. robots are allowed to crawl search, with the rule ready to be uncommented if needed; see the snippet at the end of this answer). There are a few reasons:
Fills in SEO gaps - sometimes you will see a search-results page rank for a category or niche you missed.
Fills in LSI - latent semantic indexing; search pages help you create organic rankings from organic queries, almost automatically.
May help RDF - this is an edge case, but letting Google crawl search may expose rich snippets faster.
Builds authority - see a search-results page dominating the organic SERPs? Turn it into a landing page to capture that PageRank.
Helps Google understand - between tab-to-search in the address bar, Site Search tracking in Google Analytics, and the query-string parameter settings in Webmaster Tools, Google will understand your search pages and help.
Look for places in Google Analytics, Google Webmaster Tools, and other Google products to set up site search tracking now and in the future.
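A sketch of what such a commented-out block might look like (in robots.txt, a line starting with # is a comment that crawlers ignore):

User-agent: *
# Disallow: /search
Allow: /

Removing the # restores the rule and blocks search result pages again.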
In addition to closetnoc's answer...
Should I remove /search from the Disallow: line?
No. It is a good idea to block bots from crawling your search results (which I assume is what this is referring to).
You don't normally want your search result pages to appear in Google's search results pages! And Google doesn't want this either: Google wants to index your actual pages and return those in the SERPs. Allowing bots to crawl your search results (which could potentially be infinite, since every query produces a new URL) could also use up a lot of unnecessary bandwidth.
However, Mediapartners-Google (Google's AdSense bot) is permitted to crawl your /search results. I believe this is necessary if you wish to serve adverts from your search results pages.
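As a quick sanity check, you can evaluate these rules with Python's standard-library urllib.robotparser (a minimal sketch; the blog URL is the one from the question, and the post path is made up):

from urllib.robotparser import RobotFileParser

# Feed the parser the rules quoted in the question
# (the Sitemap line is ignored by the parser anyway).
rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://css3wdesign.blogspot.com"
print(rp.can_fetch("*", base + "/search?q=css"))                     # False: blocked for ordinary bots
print(rp.can_fetch("*", base + "/2013/05/some-post.html"))           # True: normal pages are allowed
print(rp.can_fetch("Mediapartners-Google", base + "/search?q=css"))  # True: the AdSense bot may crawl search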
Do I need to edit anything in it?
Not unless you want/need to block some bots from crawling certain areas of your site. Note that some bots will completely ignore your robots.txt file anyway.
Robots.txt is a way of telling bots (robot agents) where they can and cannot go. It is placed in the root of your website, as the standard requires, so it can be found easily. It is really that simple.
In your example:
User-agent: Mediapartners-Google is not disallowed from anything. A Disallow: with nothing after it means allow all (no restriction).
User-agent: * is a directive that applies to all other bots: it disallows access to any URI starting with /search (example.com/search) and allows access to the rest of the site.
Sitemap: tells bots that you have a sitemap available. A sitemap is an XML (a standardized data markup language) file that lists the pages of your site. This is handy for search engines to learn your site's pages. Sitemaps are not always necessary; however, if some pages are not easily reachable by a crawler, the sitemap makes it easier for the search engine to find your pages. A minimal example follows below.
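For illustration, a bare-bones sitemap.xml might look like this (the post URL is a hypothetical example, not taken from the blog above):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page; <loc> is the only required child -->
  <url>
    <loc>http://css3wdesign.blogspot.com/2013/05/example-post.html</loc>
    <lastmod>2013-05-01</lastmod>
  </url>
</urlset>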