
Can Google crawl a URL (say, a static HTML file) that has no links pointing to it?

@Steve110

Posted in: #Googlebot #WebCrawlers

Possible Duplicate:
Is it possible for web crawlers to see static pages without following a link to them?




I have some URLs (some PDFs and static HTML files) on my website that I want only a few people to know about. These URLs don't have any links pointing to them from my website or any other source.

So my question is this:
Can Google crawl a URL (say, a static HTML file) that has no incoming link?




3 Comments


 

@Radia820

Yes, Google will find it somehow!


They monitor people's browser and search history via Google accounts, toolbars, social networks and the like, then use that data to augment and prioritize their crawler.

So if a user visits your page while logged into a Google account with search history tracking enabled, Google may find out about your page. You also can't control what users post to social media sites and the like.

You can stop Googlebot from crawling it, though: robots.txt, a simple text file that sits in the web root directory of your server, will stop Googlebot in its tracks.

House your non-Google pages in a single directory and exclude it like this:

User-agent: *
Disallow: /your-directory-name/
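
To sanity-check rules like these before uploading them, something along the following lines might help. It is only a sketch using Python's standard urllib.robotparser; the host example.com, the file names, and the helper is_crawl_allowed are made up for illustration, and it only shows how a compliant crawler would read the rules, not what Google will actually do.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer; "/your-directory-name/" is a placeholder path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /your-directory-name/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and the file names are hypothetical.
    for path in ("/your-directory-name/secret.pdf", "/index.html"):
        url = "https://example.com" + path
        print(path, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

The first path falls under the Disallow rule and the second does not, so a well-behaved crawler should skip only the first.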


As @Matteo and @Zaph note, this isn't real protection and won't stop determined users from finding your content. I use .htpasswd to block areas of my sites, in conjunction with Coffee Cup Website Access Manager, which outputs hashed htpasswd files, uploads them to your site, and provides multi-user management.



 

@Sarah324

Usually no, but you have to be really sure that the URL does not appear anywhere on the web, including in your sitemap if you publish one. You also have to be careful with web server access statistics if you make them public.
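
One way to check the sitemap part is to scan it for anything under the private directory. This is a rough sketch, assuming a standard sitemaps.org XML sitemap; the function name leaked_urls, the example.com URLs, and the /private/ prefix are all invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def leaked_urls(sitemap_xml: str, private_prefix: str) -> List[str]:
    """Return sitemap URLs whose path starts with the private prefix."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if urlparse(url).path.startswith(private_prefix):
            found.append(url)
    return found


# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/index.html</loc></url>
  <url><loc>https://example.com/private/report.pdf</loc></url>
</urlset>
"""

if __name__ == "__main__":
    print(leaked_urls(SAMPLE_SITEMAP, "/private/"))
```

Any URL it reports is one the sitemap is already advertising to crawlers.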

In addition, you can always use the robots.txt file to tell Google not to crawl the URLs.

But this is just security through obscurity. If you really want to protect them, use a proper mechanism (authentication/authorization).
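
As a rough idea of what the authentication route involves, here is a sketch of checking an HTTP Basic Authorization header in Python. The credentials and the function name is_authorized are placeholders, not anything from a real setup; in practice the web server's built-in authentication (served over HTTPS) would normally do this job for you.

```python
import base64
import hmac
from typing import Optional

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "reader"
EXPECTED_PASSWORD = "change-me"


def is_authorized(authorization_header: Optional[str]) -> bool:
    """Return True if the header carries the expected Basic credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons avoid leaking how much of a value matched.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok
```

A request without a valid header would then get a 401 response instead of the file, whether or not anyone has guessed the URL.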



 

@Eichhorn148

As long as there is no incoming link or anything else that can point Google towards the file (sitemap, open directory listing, etc.), I believe it shouldn't be indexed. Alternatively, you could put the files in a folder and block it through the robots.txt file.


