I don't want my site to be analyzed on WooRank or builtwith.com

I don't want my site to be analyzed on WooRank or builtwith.com.

Is there any way I can do that by editing the robots.txt file or any other possible way?



2 Comments


@Becky754

There are literally thousands of sites similar to these. Most of them are scraper sites that build their pages from Alexa data and possibly other third-party data. Some, however, query your site directly. Woorank.com and builtwith.com are two of the many that access your server directly.

A cursory search turned up no indication that either site respects robots.txt. So here goes.

You can block both of these easily in your .htaccess file. I am assuming Apache with mod_rewrite enabled. I do not know how to block them on IIS or other web servers that do not use .htaccess.

RewriteEngine On
# woorank.com (103.21.244.0 - 103.21.247.255)
RewriteCond %{REMOTE_ADDR} ^103\.21\.24[4-7]\.[0-9]{1,3}$
RewriteRule .* - [F]
# builtwith.com (5.39.0.0 - 5.39.127.255)
RewriteCond %{REMOTE_ADDR} ^5\.39\.([0-9]{1,2}|1[01][0-9]|12[0-7])\.[0-9]{1,3}$
RewriteRule .* - [F]

These .htaccess rules block AS13335 (CloudFlare), IP address range 103.21.244.0 - 103.21.247.255, and AS16276 (OVH Systems), IP address range 5.39.0.0 - 5.39.127.255. These ranges belong to hosting companies rather than subscriber lines, so you should not be blocking ordinary users.
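As a sanity check on those ranges, here is a minimal Python sketch that tests whether an address falls inside them. It assumes the two ranges above are equivalent to the CIDR blocks 103.21.244.0/22 and 5.39.0.0/17; the names `BLOCKED_NETWORKS` and `is_blocked` are made up for illustration and are not part of any server configuration.

```python
import ipaddress

# The two ranges quoted above, written as CIDR blocks (an assumption
# derived from the start and end addresses given in the answer).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("103.21.244.0/22"),  # 103.21.244.0 - 103.21.247.255
    ipaddress.ip_network("5.39.0.0/17"),      # 5.39.0.0 - 5.39.127.255
]


def is_blocked(address: str) -> bool:
    """Return True if the address lies inside one of the blocked networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # Two addresses inside the ranges and two just outside them.
    for sample in ("103.21.245.10", "103.21.248.1", "5.39.127.255", "5.39.128.0"):
        print(sample, is_blocked(sample))
```

Running a few addresses from your access log through a check like this before deploying the rules can help confirm that the patterns cover what you intend and nothing more.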

If your site is already listed, it is generally too late, and it will likely remain listed. However, I have noticed that some of these sites drop an entry some time after they determine that the server is inaccessible. Please note that these sites exist for monetization and may not update a page unless a user requests it, if at all. It can therefore take years for a site to refresh its data, and until then it may not know that your site is unavailable to it. Even then, it may not care as long as it is getting search traffic.

@Goswami781

Robots.txt is a technical courtesy. It is not a legally binding standard, so search engines and other indexing services are not obliged to follow it.

Yes, big search engines like Google are designed to follow the standard; that's why you get "a description for this page is not available because of this site's robots.txt".

However, many sites don't follow it. In the worst case, malicious sites may treat robots.txt as a list of pages to crawl rather than pages to ignore.

So if you think these two sites are likely to follow your robots.txt, go for it. If not, you're going to need to record thousands of IPs for the two sites and specifically block them with htaccess or similar.
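To illustrate what "following robots.txt" means in practice, here is a small Python sketch using the standard library's `urllib.robotparser`, which mimics how a well-behaved crawler would interpret a rule. The user-agent name `ExampleAnalyzerBot` is hypothetical; I do not know the actual user-agent strings WooRank or BuiltWith send, and a crawler that ignores robots.txt is unaffected by any of this.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that shuts out one (hypothetical) analyzer bot
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleAnalyzerBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleAnalyzerBot", "https://example.com/page"))
    print(allowed("SomeOtherBot", "https://example.com/page"))
```

The check happens entirely on the crawler's side, which is why the file only works against bots that choose to run it.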

