Kevin317

Only allow Google and Bing bots to crawl a site

@Kevin317

Posted in: #RobotsTxt #WebCrawlers

I am using the following robots.txt file for a site. The goal is to allow googlebot and bingbot to access the site, except for pages under /bedven/bedrijf/, and to block all other bots from crawling the site.

User-agent: googlebot
Disallow: /bedven/bedrijf/*
Crawl-delay: 10

User-agent: google
Disallow: /bedven/bedrijf/*
Crawl-delay: 10

User-agent: bingbot
Disallow: /bedven/bedrijf/*
Crawl-delay: 10

User-agent: bing
Disallow: /bedven/bedrijf/*
Crawl-delay: 10

User-agent: *
Disallow: /


Does the last record (User-agent: * followed by Disallow: /) stop all other bots from crawling every page on the site?


2 Comments


@Fox8124981

Bots, especially malicious ones, may ignore the robots.txt file. So no matter what it says, some bots may still crawl your site.


@Ann8826881

The last record (the one that starts with User-agent: *) will be followed by all polite bots that don’t identify themselves as "googlebot", "google", "bingbot" or "bing".
And yes, it means that those bots are not allowed to crawl anything.

You might want to omit the * in /bedven/bedrijf/*.
In the original robots.txt specification, * has no special meaning in a Disallow value; it’s just a character like any other. A parser that follows the original specification strictly would therefore only block URLs whose path literally begins with /bedven/bedrijf/*, including the * character.
Google deviates from the original specification here and treats * as a wildcard for "any sequence of characters", but the wildcard is not needed in this case: for Google, /bedven/bedrijf/* and /bedven/bedrijf/ mean exactly the same thing, namely block all URLs whose path begins with /bedven/bedrijf/.
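To see the literal interpretation in practice, here is a minimal sketch using Python's standard-library urllib.robotparser, which (as far as I know) does plain prefix matching and has no wildcard support. The helper name allowed and the example path /bedven/bedrijf/acme are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Rule as written in the question, with a trailing "*".
WITH_STAR = """\
User-agent: googlebot
Disallow: /bedven/bedrijf/*
"""

# Same rule with the "*" omitted.
WITHOUT_STAR = """\
User-agent: googlebot
Disallow: /bedven/bedrijf/
"""

# Hypothetical page under the directory that should be blocked.
PAGE = "/bedven/bedrijf/acme"

if __name__ == "__main__":
    print("with *   :", allowed(WITH_STAR, "googlebot", PAGE))
    print("without *:", allowed(WITHOUT_STAR, "googlebot", PAGE))
```

With a prefix-only parser like this one, the version with * does not block the page, while the version without it does. Googlebot itself would block the page in both cases.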

And finally, you could reduce your robots.txt to two records, because a record can have multiple User-agent lines:

User-agent: googlebot
User-agent: google
User-agent: bingbot
User-agent: bing
Disallow: /bedven/bedrijf/
Crawl-delay: 10

User-agent: *
Disallow: /
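If you want to sanity-check the reduced file before deploying it, a rough sketch with Python's standard-library urllib.robotparser could look like this. The bot name examplebot and the tested paths are assumptions for illustration, and this parser only approximates how real crawlers match user agents, so treat the result as a hint rather than proof of how Googlebot or Bingbot behave.

```python
from urllib.robotparser import RobotFileParser

# The two-record robots.txt from the answer above.
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: google
User-agent: bingbot
User-agent: bing
Disallow: /bedven/bedrijf/
Crawl-delay: 10

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Return a parser loaded from robots.txt text instead of a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # "examplebot" is a made-up name standing in for any other polite bot.
    for agent in ("googlebot", "bingbot", "examplebot"):
        for path in ("/", "/bedven/bedrijf/acme"):
            print(agent, path, rp.can_fetch(agent, path))
```

The named bots should be allowed everywhere except under /bedven/bedrijf/, and any other bot should be refused everywhere.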
