
Robots.txt: do I need to disallow a page which is not linked anywhere?

@Caterina187

Posted in: #RobotsTxt

There are some pages on my website that I want the user to be able to visit only if I give him/her the URL.

If I disallow the individual pages in robots.txt, they will be visible to anybody who looks at that file.

My question is: if I don't link them from anywhere, or at least from any indexed page, would they still be reached by crawlers in some way?


3 Comments


 

@Angela700

In addition to the comments above, I would recommend .htaccess authentication as a minimum too. That way you can give individuals a username/password combination for the duration of their entitlement to see the page(s).

If there is anything with privacy issues then you need to consider a proper login control script.

An unprotected page (no matter how well hidden you think it might be) will make it into the wild.
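As a rough illustration of what .htaccess authentication does on the wire: the browser sends an HTTP Basic `Authorization` header and the server compares it with the stored credentials. The sketch below is a minimal Python version of that check, not Apache's implementation; the function name `check_basic_auth` and the sample credentials are assumptions made up for this example.

```python
import base64
import hmac


def check_basic_auth(header_value, expected_user, expected_password):
    """Return True only if an HTTP Basic Authorization header matches.

    header_value is the raw header, e.g. "Basic YWxpY2U6czNjcmV0".
    (Hypothetical helper for illustration only.)
    """
    prefix = "Basic "
    if not header_value or not header_value.startswith(prefix):
        return False
    try:
        decoded = base64.b64decode(header_value[len(prefix):], validate=True)
        decoded = decoded.decode("utf-8")
    except ValueError:
        # Covers malformed base64 and invalid UTF-8.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pw_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pw_ok
```

A request without the header, or with the wrong password, is rejected, so knowing the URL alone is no longer enough to read the page. Basic credentials are only base64-encoded, so this should be served over HTTPS.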



 

@Alves908

You don't want the page to appear in the SERPs at all...

Don't disallow in robots.txt. Add a noindex meta tag (or X-Robots-Tag HTTP header) to your pages instead.
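For illustration, here is a minimal Python sketch of the two forms of the directive: the `X-Robots-Tag` response header and the `robots` meta tag inside the page. The names `NOINDEX_HEADERS` and `render_private_page` are hypothetical and only serve this example; how the header is actually attached depends on your server or framework.

```python
import html

# Header form: send this with the HTTP response for the private page.
NOINDEX_HEADERS = {"X-Robots-Tag": "noindex"}


def render_private_page(title, body_text):
    """Build a small HTML page carrying the meta-tag form of noindex.

    (Hypothetical helper; title and body_text are escaped as plain text.)
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta name="robots" content="noindex">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>{html.escape(body_text)}</body>\n"
        "</html>\n"
    )
```

Either form is enough on its own; the header form is also usable for non-HTML resources such as PDFs.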

As j0k suggests, your pages could be found somehow: stats reports, directory listings, etc.

Disallowing a page in robots.txt prevents it from being crawled, but it could still be indexed and could appear as a URL-only link in the SERPs.

A noindex meta tag prevents the page from appearing at all in the SERPs - but Google must be able to crawl the page in order to see the noindex meta tag - so it cannot be disallowed in robots.txt!
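The conflict described above can be sketched with Python's standard `urllib.robotparser`, which applies robots.txt rules the way a well-behaved crawler would. The robots.txt content, the `example.com` URLs, and the helper name `compliant_crawler_may_fetch` are assumptions for this example; real search engines may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the "secret" page.
# Note that anyone can read this file and learn the path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/offer.html
"""


def compliant_crawler_may_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The disallowed page is never fetched, so a noindex tag on it is never seen.
blocked = compliant_crawler_may_fetch(
    ROBOTS_TXT, "https://example.com/private/offer.html"
)
# Pages that are not disallowed can be fetched, so their noindex tag is read.
allowed = compliant_crawler_may_fetch(
    ROBOTS_TXT, "https://example.com/index.html"
)
```

Here `blocked` is `False` and `allowed` is `True`: the crawler can only honour a noindex directive on pages it is permitted to fetch.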

If there is anything on the page that must not be publicly available then the pages must be behind some kind of authentication.



 

@LarsenBagley505

Well, I think there are good crawlers that read robots.txt and follow its directives, and others that don't follow them.

And how do you plan to give out this URL? By email, Facebook or Twitter? All of these services crawl the information you send. Gmail parses the email you receive in order to provide ads. So your URL will be crawled somehow.

Some people use the Google Toolbar (or a similar toolbar from another search engine). It has an option (checked by default, if I remember correctly) that lets the toolbar send every URL you visit to Google. This is another way for Google to see the hidden web. So even if you tell the person not to share the URL, he/she will share it implicitly (thanks to the toolbar).

I think we can find many other possibilities.

So you might add it to robots.txt, but also provide extra meta tags such as noindex, nofollow, etc.

edit:

w3d's suggestion about robots.txt seems good to me. So don't add the page to robots.txt, and provide the proper meta tag instead.


