Robots.txt isn't preventing my site from being crawled
I'm having a problem with robots.txt. I put the robots.txt file in the website's root directory (and also in /var/www/html, to make it work across the whole server), but bots still keep crawling my websites.
This is my robots.txt:
User-agent: YandexBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot/1.2~bl
Disallow: /
Do you have any suggestions?
Note that your robots.txt is invalid (but that doesn’t necessarily mean that this is the reason for the issue you are having; bots might ignore such errors).
If a bot parses your robots.txt file strictly according to the robots.txt spec, it would see only one record, and that record would apply only to bots named "YandexBot". All other bots would be allowed to crawl everything.
The reason is that records must be separated by blank lines. So it should be:
User-agent: YandexBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot/1.2~bl
Disallow: /
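To see why the blank lines matter under a strict reading of the spec, here is a minimal Python sketch (illustrative only, not a real robots.txt parser) that splits a file into records at blank lines, the way the spec describes:

def strict_records(text):
    # Under the original robots.txt spec, records are separated by one or
    # more blank lines; each record applies only to the User-agent names
    # it lists.
    return [chunk.splitlines() for chunk in text.split("\n\n") if chunk.strip()]

no_blank_lines = (
    "User-agent: YandexBot\nDisallow: /\n"
    "User-agent: SemrushBot\nDisallow: /\n"
    "User-agent: AhrefsBot\nDisallow: /\n"
    "User-agent: SemrushBot/1.2~bl\nDisallow: /\n"
)
# The same rules with a blank line inserted between records.
with_blank_lines = no_blank_lines.replace("/\nUser-agent", "/\n\nUser-agent")

print(len(strict_records(no_blank_lines)))    # 1 record: only YandexBot is addressed
print(len(strict_records(with_blank_lines)))  # 4 records: one per bot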
If you always want the same Disallow rules for all these bots, you could use one record with multiple User-agent lines, if you prefer:
User-agent: YandexBot
User-agent: SemrushBot
User-agent: AhrefsBot
User-agent: SemrushBot/1.2~bl
Disallow: /
(You might have to use different names for some of the bots you intend to block, as @StephenOstermiller suggests in his answer.)
After you create your robots.txt file, it will take a day or more for the crawlers that honor it to fetch it.
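Crawlers only look for the file at the root of the host, so before waiting on them it is worth confirming that your server actually serves it there. A minimal Python check, with example.com standing in for your own domain:

import urllib.request

# example.com is a placeholder; substitute your own domain.
with urllib.request.urlopen("https://example.com/robots.txt") as resp:
    print(resp.status)           # expect 200
    print(resp.read().decode())  # expect the rules you wrote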
Yandex has a number of bots and has documentation about how to disallow all of them using robots.txt here: yandex.com/support/webmaster/controlling-robot/robots-txt.xml. You might want to consider changing your robots.txt to this for Yandex:
User-agent: Yandex
Disallow: /
SEMrush has two bots. Their documentation is here: www.semrush.com/bot/. You have disallowed one of them correctly, but your second rule, which includes the bot's version number, will not be effective. You should consider using these rules to disallow all SEMrush crawling:
User-agent: SemrushBot
Disallow: /

User-agent: SemrushBot-SA
Disallow: /
You are already disallowing AhrefsBot exactly according to their documentation: ahrefs.com/robot
User-agent: AhrefsBot
Disallow: /
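If you want to sanity-check the combined recommendations before deploying them, a short sketch using Python's urllib.robotparser (whose User-agent matching is case-insensitive and substring-based) can confirm which bots the rules block; example.com is again a placeholder:

from urllib import robotparser

# The combined rules recommended above, with records separated by blank lines.
RULES = """\
User-agent: Yandex
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: SemrushBot-SA
Disallow: /

User-agent: AhrefsBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

for bot in ("YandexBot", "SemrushBot", "SemrushBot-SA", "AhrefsBot", "Googlebot"):
    print(bot, "allowed:", rp.can_fetch(bot, "https://example.com/"))
# The four blocked bots print False; Googlebot, which no record addresses,
# prints True.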