Robots.txt isn't preventing my site from being crawled
I'm having a problem with robots.txt. I put the robots.txt file in the website's root directory (and also in /var/www/html, to make it work across the whole server), but bots still keep crawling my websites.
This is my robots.txt:
User-agent: YandexBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot/1.2~bl
Disallow: /
Do you have any suggestions?
Note that your robots.txt is invalid (but that doesn’t necessarily mean that this is the reason for the issue you are having; bots might ignore such errors).
If a bot parses your robots.txt file strictly according to the robots.txt spec, it would see only one record, and that record would apply only to bots named "YandexBot". All other bots would be allowed to crawl everything.
The reason is that records must be separated by blank lines. So it should be:
User-agent: YandexBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot/1.2~bl
Disallow: /
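To see why the blank lines matter under a strict reading of the spec, here is a minimal Python sketch (illustrative only, not a real robots.txt parser) that splits a file into records at blank lines, the way the spec describes:

def strict_records(text):
    # Under the original robots.txt spec, records are separated by one or
    # more blank lines; each record applies only to the User-agent names
    # it lists.
    return [chunk.splitlines() for chunk in text.split("\n\n") if chunk.strip()]

no_blank_lines = (
    "User-agent: YandexBot\nDisallow: /\n"
    "User-agent: SemrushBot\nDisallow: /\n"
    "User-agent: AhrefsBot\nDisallow: /\n"
    "User-agent: SemrushBot/1.2~bl\nDisallow: /\n"
)
# The same rules with a blank line inserted between records.
with_blank_lines = no_blank_lines.replace("/\nUser-agent", "/\n\nUser-agent")

print(len(strict_records(no_blank_lines)))    # 1 record: only YandexBot is addressed
print(len(strict_records(with_blank_lines)))  # 4 records: one per bot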
If you always want the same Disallow rules for all these bots, you could use one record with multiple User-agent lines, if you prefer:
User-agent: YandexBot
User-agent: SemrushBot
User-agent: AhrefsBot
User-agent: SemrushBot/1.2~bl
Disallow: /
(You might have to use different names for some of the bots you intend to block, as @StephenOstermiller suggests in his answer.)
After you create your robots.txt file, it will take a day or more for the crawlers that honor it to fetch it.
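Crawlers only look for the file at the root of the host, so before waiting on them it is worth confirming that your server actually serves it there. A minimal Python check, with example.com standing in for your own domain:

import urllib.request

# example.com is a placeholder; substitute your own domain.
with urllib.request.urlopen("https://example.com/robots.txt") as resp:
    print(resp.status)           # expect 200
    print(resp.read().decode())  # expect the rules you wrote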
Yandex has a number of bots and has documentation about how to disallow all of them using robots.txt here: yandex.com/support/webmaster/controlling-robot/robots-txt.xml. You might want to consider changing your robots.txt to this for Yandex:
User-agent: Yandex
Disallow: /
SEMrush has two bots. Their documentation is here: www.semrush.com/bot/. You have disallowed one of them correctly, but your second rule, which includes the bot's version number, will not be effective. You should consider using these rules to disallow all SEMrush crawling:
User-agent: SemrushBot
Disallow: /

User-agent: SemrushBot-SA
Disallow: /
You are already disallowing AhrefsBot exactly according to their documentation: ahrefs.com/robot
User-agent: AhrefsBot
Disallow: /
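If you want to sanity-check the combined recommendations before deploying them, a short sketch using Python's urllib.robotparser (whose User-agent matching is case-insensitive and substring-based) can confirm which bots the rules block; example.com is again a placeholder:

from urllib import robotparser

# The combined rules recommended above, with records separated by blank lines.
RULES = """\
User-agent: Yandex
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: SemrushBot-SA
Disallow: /

User-agent: AhrefsBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

for bot in ("YandexBot", "SemrushBot", "SemrushBot-SA", "AhrefsBot", "Googlebot"):
    print(bot, "allowed:", rp.can_fetch(bot, "https://example.com/"))
# The four blocked bots print False; Googlebot, which no record addresses,
# prints True.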