Can web crawlers that visit my site fake their user agent?

@Bethany197

Posted in: #Googlebot #RobotsTxt #UserAgent #WebCrawlers

I want to write a robots.txt file for my website and allow the famous bots (Google, Bing, and Yahoo) to crawl my website, but deny the rest.

I want to know: if I add User-agent: Googlebot, will fake Googlebot crawlers still be able to view my website? Is it even possible to fake a bot?


3 Comments


@Murphy175

As mentioned, user agents can be spoofed, so they are unreliable for blocking access (in Google Chrome, for example, you can open DevTools and override your user agent under Network conditions). No one with enough knowledge to spoof a major search engine's user agent is going to be deterred by robots.txt.

While it doesn't offer any protection from falsified user agents either, adding directives to your server configuration files (e.g. Apache's .htaccess) to block user agents would offer more protection than robots.txt, since the server enforces the block rather than trusting the crawler to honor a request...although I'm not sure why you'd want to do it. If you do, a sketch is below.
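A minimal sketch, assuming Apache with mod_rewrite enabled; "badbot" stands in for whatever hypothetical crawler you want to block:

RewriteEngine On
# Return 403 Forbidden to any request whose User-Agent header
# contains "badbot" ([NC] makes the match case-insensitive)
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule ^ - [F]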


@Heady270

Here is a robots.txt file that will allow Google, Bing, and Yahoo to crawl the site while disallowing all other crawling:

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: slurp
Disallow:


Some crawlers ignore robots.txt entirely and crawl whatever they feel like. Some crawlers impersonate Googlebot or another legitimate crawler. Some crawlers impersonate browser user agents such as Internet Explorer or Firefox.

There is a procedure for verifying that a visitor with a Googlebot user agent is actually a Google crawler. It involves a reverse DNS lookup on the IP address from which the crawler visited, followed by a forward lookup on the resulting hostname to confirm it resolves back to the same IP.
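A minimal sketch of that check in Python (the IP address at the end is just an illustrative input):

import socket

def is_real_googlebot(ip):
    # Step 1: reverse DNS. Genuine Googlebot IPs resolve to hostnames
    # under googlebot.com or google.com.
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not host.endswith(('.googlebot.com', '.google.com')):
        return False
    # Step 2: forward DNS. The hostname must resolve back to the
    # original IP, otherwise the PTR record could have been forged.
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

print(is_real_googlebot('66.249.66.1'))  # expect True for a genuine Googlebot address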

There is also the concept of a spider trap, which is a place on your website that users wouldn't find but crawlers would. A spider trap can be used to identify crawlers that are masquerading as browser user agents.
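One common construction (the /trap/ path here is hypothetical): disallow a URL in robots.txt, so well-behaved crawlers never fetch it:

# robots.txt: honest crawlers will skip this URL
User-agent: *
Disallow: /trap/

Then put a link to it somewhere no human will see or follow:

<a href="/trap/" style="display:none">trap</a>

Anything that requests /trap/ while claiming a browser user agent is almost certainly a misbehaving crawler, and its IP can be logged and blocked.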


@Odierno851

Whether or not crawlers honor your robots.txt is entirely an honor system. Nothing you put in that file will prevent a "fake" crawler from doing anything.

As for User-agent:, that value is completely voluntary as well. You can instruct your browser, or any other HTTP client, to send whatever value you want for that header.
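For instance, with curl (example.com is a placeholder), the -A flag sets the User-Agent header, so this request presents itself as Googlebot:

curl -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://example.com/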
