How can I block who.is and archive.org from getting information about my website with .htaccess?

who.is is a service that provides whois information for websites, and archive.org automatically saves copies of people's websites. How can I use .htaccess to block them from accessing my website?


@Hamaas447


2 Comments


@Kevin317

WHOIS services do not crawl your website. They read your domain's registration record, so there is nothing to block in .htaccess. You can, however, ask your registrar to make your registration details private.

For the Internet Archive, you can block its bot from accessing your site with this in .htaccess:

# Return 403 Forbidden when the User-Agent contains "archive.org_bot" (any case)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} archive\.org_bot [NC]
RewriteRule ^ - [F]

This returns a 403 Forbidden response. I found the bot name online. The catch is that the rule only matches the User-Agent string you give it, so if the bot name is wrong, nothing is blocked.
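Since a wrong bot name fails silently, it can help to try the pattern against a few User-Agent strings before deploying it. Below is a minimal Python sketch, not part of the original answer. It approximates the RewriteCond with an unanchored, case-insensitive regex (the dot is escaped here so it matches only a literal "."). The function name `would_block` and the sample User-Agent strings are assumptions made up for illustration, and Python's `re` module is only a stand-in for Apache's regex engine.

```python
import re

# Pattern modelled on the RewriteCond above; re.IGNORECASE plays the role of [NC].
BLOCK_PATTERN = re.compile(r"archive\.org_bot", re.IGNORECASE)


def would_block(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return BLOCK_PATTERN.search(user_agent) is not None


# Illustrative User-Agent strings (assumptions, not captured from real traffic).
samples = [
    "Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot)",
    "ia_archiver",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0",
]

for ua in samples:
    print(would_block(ua), ua)
```

Only the first sample matches. A crawler that identifies itself under a different name would pass straight through the rule.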

Blocking crawlers is a job better suited to robots.txt, like this:

User-agent: archive.org_bot
Disallow: /
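To check that these two lines say what you intend, you can parse them with Python's standard `urllib.robotparser`. This is only a sketch: the helper name `is_allowed` and the example.com URL are placeholders, and it shows how a robots.txt-compliant parser reads the rules, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, held in a string so nothing is fetched over the network.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain.
print(is_allowed(ROBOTS_TXT, "archive.org_bot", "https://example.com/page.html"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/page.html"))        # True
```

The rules apply only to the named user agent, so other crawlers remain unaffected.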

However, the Internet Archive says that robots.txt files are meant for search engines, so its crawler may well ignore them:

A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.

Source: blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/

Your best bet is to email them and ask them not to archive your site.


@Chiappetta492

You can block archive.org from crawling your site with this in your robots.txt file:

User-agent: ia_archiver
Disallow: /

I believe putting this in your .htaccess file will also block archive.org from accessing your site:

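# Flag requests whose User-Agent starts with "ia_archiver" (case-insensitive)
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot

# Deny flagged requests; applies only to the methods listed in <Limit>.
# Order/Allow/Deny is Apache 2.2 syntax (2.4 needs mod_access_compat).
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>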
