Welton855

How do I disallow robots from accessing links on a certain port?


Posted in: #RobotsTxt

I have a private web server set up on a non-conventional port (say 6677). Anyone can reach all of the site's URLs through port 6677. Links on mysite:6677 are also indexed by Google, which is not desirable: I need to keep port 6677 discreet, because it is a kind of workaround behind the front proxy, which runs on port 80.

So, how can I disallow access to 6677 in robots.txt?


@Ogunnowo487

If you want to disallow crawler access to example.com:6677 using robots.txt, you simply have to host your robots.txt file on that port, i.e.:
example.com:6677/robots.txt

The specification does not allow you to specify a port in the robots.txt file itself. Any paths it lists apply to the same protocol, host and port number through which the file was accessed.
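To make that lookup rule concrete, here is a minimal PHP sketch that derives the robots.txt location a crawler would check for a given page URL. The helper robots_txt_url() is hypothetical (not part of any library), and it assumes absolute URLs. It keeps any explicit port, so a page on port 6677 maps to a different robots.txt than a page on port 80.

```php
<?php
// Hypothetical helper: returns the robots.txt URL that governs a page.
// robots.txt always lives at /robots.txt on the same scheme, host and
// port as the page, so a port-6677 URL is covered only by the
// robots.txt served on port 6677.
function robots_txt_url(string $pageUrl): string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $pageUrl");
    }

    $url = $parts['scheme'] . '://' . $parts['host'];
    // An explicit port identifies a separate site for robots.txt purposes.
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    return $url . '/robots.txt';
}

echo robots_txt_url('http://example.com:6677/some/page?x=1'), "\n";
echo robots_txt_url('http://example.com/some/page'), "\n";
```

The first call yields http://example.com:6677/robots.txt and the second http://example.com/robots.txt, which is why a Disallow rule in the port-80 file has no effect on port 6677.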

But, as mentioned in this answer, disallowing in robots.txt does not necessarily prevent the URL from being indexed; it prevents it from being crawled.



However, as noted in the comments, it seems that the same site is accessible from both port 80 and port 6677, but only port 6677 should be blocked from crawlers.

Since both ports serve the same site, they would share a common robots.txt file, so both would be blocked, unless you conditionally return a different robots.txt file depending on which port was used to access the site. This could perhaps be done using .htaccess and an internal rewrite. However, I don't think robots.txt is what you need, since you would still run the risk of these URLs being indexed.
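The conditional robots.txt idea could look roughly like the sketch below. It rests on several assumptions: the file name robots.php, the PRIVATE_PORT constant and the robots_txt_for_port() helper are hypothetical; /robots.txt is assumed to be internally rewritten to robots.php (for example with RewriteRule ^robots\.txt$ robots.php [L] in .htaccess); and $_SERVER['SERVER_PORT'] is assumed to report the port the request arrived on. Like any robots.txt rule, it only stops crawling, not indexing.

```php
<?php
// robots.php (hypothetical): assumed to be served for /robots.txt via an
// internal rewrite. Returns a "block everything" robots.txt only when the
// request arrived on the private port.

const PRIVATE_PORT = 6677; // the non-standard port from the question

function robots_txt_for_port(int $port): string
{
    if ($port === PRIVATE_PORT) {
        // Disallow crawling of every path on the private port.
        return "User-agent: *\nDisallow: /\n";
    }
    // An empty Disallow value allows everything on the public port.
    return "User-agent: *\nDisallow:\n";
}

// Assumes SERVER_PORT is the port the client connected to; falls back
// to 80 when it is missing (for example when run from the command line).
$port = (int) ($_SERVER['SERVER_PORT'] ?? 80);
header('Content-Type: text/plain; charset=utf-8');
echo robots_txt_for_port($port);
```

If the front proxy forwards port-80 traffic to 6677 internally, SERVER_PORT would be 6677 for both, so the check would need to use something else that distinguishes the two routes.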

In my opinion, you need to check the port in your server-side script and send the appropriate META tag or HTTP response header back to the client. Crawlers only see that tag or header if they are allowed to fetch the page, so don't also disallow port 6677 in robots.txt. In PHP, you could do this with something like the following near the top of your script:

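<?php
// Ask crawlers not to index anything served on port 6677.
// header() must be called before any output is sent to the client.
if ((int) ($_SERVER['SERVER_PORT'] ?? 0) === 6677) {
    header('X-Robots-Tag: noindex');
}
?>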
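The META tag variant can be sketched the same way. The helper robots_meta_tag() is a hypothetical name, and the sketch again assumes $_SERVER['SERVER_PORT'] holds the port the client used. The tag only takes effect if it ends up inside the page's head element.

```php
<?php
// Hypothetical helper: returns a robots META tag for requests on the
// private port, or an empty string otherwise.
function robots_meta_tag(int $port, int $privatePort = 6677): string
{
    if ($port === $privatePort) {
        return '<meta name="robots" content="noindex">' . "\n";
    }
    return '';
}

// Assumes SERVER_PORT is the port the client connected to; falls back
// to 80 when it is missing (for example when run from the command line).
$port = (int) ($_SERVER['SERVER_PORT'] ?? 80);

// Emit the tag inside <head>, before any visible content.
echo "<head>\n";
echo robots_meta_tag($port);
echo "<title>Example page</title>\n";
echo "</head>\n";
```

The header approach is the more general of the two: it also covers non-HTML responses such as PDFs or images, which cannot carry a META tag.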
