How to prevent search engine bots from crawling specific pages?

@Jessie594

Posted in: #Nofollow #RobotsTxt #WebCrawlers

Crawling of one particular page crashes my server. I've tried a robots.txt file that disallows that page and sets a crawl delay, but web crawlers are ignoring it.

Is it possible to completely block a page from being crawled under any circumstance?


1 Comment


@Ravi8258870

I will answer your question exactly as asked, in the hope that doing so persuades you to consider whether the root cause of your problem is better resolved in some other way.


Is it possible to completely block a page from being crawled under any circumstance?


Not on a typically configured web server.

The robots.txt file is just a polite request by your server; no one has to pay any attention to it. The big, respectable commercial crawling bots do obey it voluntarily, but many, many others do not.

There are two basic ways to control access to your server:

1. Allow everyone access by default and block a subset based on some combination of IP address, how they identify themselves, frequency of requests, etc. This is the typical configuration of the vast majority of public web sites. It requires you to know enough about a visitor to uniquely identify their traffic, which may not always be possible. (A sketch of this follows the list.)

2. Block everyone by default and only allow a subset based on the above criteria plus things like cookies or client certificates in the browser. Arguably this is not really a "public" web server any more. It requires you to have a way of identifying ALL valid visitors automatically, which is obviously impossible if you want any of the billions of ordinary web users to visit the site. (Also sketched below.)

So you can completely block all possible bots using option 2, but you would then need some way to grant access to the visitors that you DO want to visit the site. This might be exactly what you intend (hard to guess without more context).

If option 2 is not what you want, then you need to address the root cause of the crashing, as the commenters have also encouraged you to do.

A separate question explaining the technical details of the crash and how to prevent it would be a helpful way to proceed.
