Prevent search bots from indexing server (sub)domain name
A web application I wrote is hosted on an in-house server with the name myserver, which is under my university's domain (department.uni.edu), resulting in the server's address being myserver.department.uni.edu. When I Google myserver, the first result is that exact server hosting the web application.
I have a robots.txt file in the application's root directory with the following contents:
User-agent: *
Disallow: /
It's the actual server domain name that was indexed, and not anything in the web application.
I know that I can remove search results with Google Webmaster Tools, but how do I prevent Google from indexing a server's domain name (or address)? I believe the server is running Nginx on Ubuntu 14.10 (I am not the person in charge of the server, just coding the web application).
The goal is to prevent the server from being indexed by search engines such as Google, Bing, and Yahoo: in other words, to block all known search engine crawlers.
Perhaps a solution is to block all crawlers at the subdomain's root (myserver.department.uni.edu) using an Nginx user-agent map plus a location block, such as:
# The map block goes in the http context; it sets $limit_bots to 1
# whenever the User-Agent header matches one of the patterns below.
map $http_user_agent $limit_bots {
    default 0;
    ~*(google|bing|yandex|msnbot) 1;
    ~*(AltaVista|Googlebot|Slurp|BlackWidow|Bot|ChinaClaw|Custo|DISCo|Download|Demon|eCatch|EirGrabber|EmailSiphon|EmailWolf|SuperHTTP|Surfbot|WebWhacker) 1;
    ~*(Express|WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView) 1;
    ~*(HTTrack|Stripper|Sucker|Indy|InterGET|Ninja|JetCar|Spider|larbin|LeechFTP|Downloader|tool|Navroad|NearSite|NetAnts|tAkeOut|WWWOFFLE) 1;
    ~*(NetSpider|Vampire|NetZIP|Octopus|Offline|PageGrabber|Foto|pavuk|pcBrowser|RealDownload|ReGet|SiteSnagger|SmartDownload|SuperBot|WebSpider) 1;
    ~*(Teleport|VoidEYE|Collector|WebAuto|WebCopier|WebFetch|WebGo|WebLeacher|WebReaper|WebSauger|eXtractor|Quester|WebStripper|WebZIP|Wget|Widow|Zeus) 1;
    ~*(Twengabot|htmlparser|libwww|Python|perl|urllib|scan|Curl|email|PycURL|Pyth|PyQ|WebCollector|WebCopy|webcraw) 1;
}

# Inside the server block for myserver.department.uni.edu:
location / {
    # Return 403 Forbidden to any matched crawler.
    if ($limit_bots = 1) {
        return 403;
    }
}
(borrowed from GD Hussle)
but would this be sufficient, or would something more sophisticated be necessary?
With robots.txt you can control crawling, not indexing. If a search engine is not allowed to crawl a document on your host, it might still index its URL, e.g. if it found the link on an external site.
You can control indexing with the robots meta element or the X-Robots-Tag HTTP header.
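For example, here is a minimal sketch of how that header could be sent site-wide from Nginx, assuming you or the server administrator can edit the server block for myserver.department.uni.edu (the listen and server_name lines below are placeholders for whatever is already configured):

server {
    listen 80;
    server_name myserver.department.uni.edu;

    # Ask compliant search engines not to index any page on this host
    # and not to follow links found on it.
    add_header X-Robots-Tag "noindex, nofollow";

    location / {
        # existing application configuration goes here
    }
}

If you cannot touch the server configuration, the per-page equivalent is a robots meta element in each page's head, e.g. <meta name="robots" content="noindex, nofollow">. Note that by default Nginx only attaches add_header headers to 2xx and 3xx responses.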
You have to decide whether you want to allow search engines to crawl but not index, or to index but not crawl, because if you disallow crawling in robots.txt, search engines can't reach your documents and will never see that you don't want them indexed.
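Concretely, if you take the noindex route, the robots.txt on the subdomain would need to permit crawling so that crawlers can actually fetch the pages and see the header or meta element, e.g.:

User-agent: *
Disallow:

An empty Disallow allows everything to be crawled; the Disallow: / file shown in the question, by contrast, keeps crawlers away from the content entirely but does not, on its own, stop the bare URL from being indexed.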