Can Google crawl a URL (say, a static HTML file) that has no links pointing to it?
Possible Duplicate:
Is it possible for web crawlers to see static pages without following a link to them?
I have some URLs (some PDFs and static HTML files) on my website that I want only a few people to know about. These URLs don't have any links pointing to them from my website or any other source.
So my question is this:
Can Google crawl a URL (say, a static HTML file) that has no incoming links?
More posts by @Steve110
3 Comments
Yes, Google can find it!
They monitor people's browser and search history via Google Accounts, toolbars, social networks and the like, then use that data to augment and prioritize their crawler.
So if a user visits your page while logged into a Google account with its search history tracking enabled, Google may find out about your page. You also can't control what users post to social media sites and the like.
You can keep Googlebot from crawling it, though. robots.txt, a simple text file that sits in the web root directory of your server, tells well-behaved crawlers such as Googlebot which paths to stay out of.
House your non-Google pages in a single directory and exclude it like this:
User-agent: *
Disallow: /your-directory-name/
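If you want to check that a rule like this does what you expect before relying on it, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs and the is_allowed helper are made up for illustration, and the result only tells you how a crawler that honors robots.txt should behave, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above; the directory name is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /your-directory-name/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rules.
    for url in (
        "https://example.com/your-directory-name/secret.pdf",
        "https://example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that robots.txt is itself publicly readable, so listing a directory there also tells curious visitors where to look.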
As @Matteo and @Zaph note, this isn't real protection and won't stop determined users from finding your content. I use .htpasswd to block areas on my sites, in conjunction with Coffee Cup Website Access Manager, which outputs hashed htpasswd files, uploads them to your site, and provides multi-user management.
Usually no, but you have to be really sure that the URL does not appear anywhere on the web, including in your sitemap if you publish one. You also have to be careful with web server access statistics if you make them public.
In addition, you can always use the robots.txt file to tell Google not to crawl the URLs.
But this is just security through obscurity. If you really want to protect the files, use a proper mechanism (authentication/authorization).
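To make the authentication point concrete, here is a rough sketch of what checking HTTP Basic credentials involves, written with only the Python standard library. The check_basic_auth function and the sample username and password are hypothetical, and comparing against a plain-text password is a simplification for illustration: a real setup would normally let the web server handle this and store hashed passwords.

```python
import base64
import hmac


def check_basic_auth(header, username: str, password: str) -> bool:
    """Return True if an HTTP Authorization header carries the expected
    Basic credentials. Illustrative only: real deployments should compare
    against a stored password hash, not a plain-text password."""
    prefix = "Basic "
    if not header or not header.startswith(prefix):
        return False
    try:
        decoded = base64.b64decode(header[len(prefix):], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and invalid UTF-8.
        return False
    user, sep, supplied = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode("utf-8"), username.encode("utf-8"))
    pass_ok = hmac.compare_digest(supplied.encode("utf-8"), password.encode("utf-8"))
    return user_ok and pass_ok


if __name__ == "__main__":
    # Hypothetical credentials for demonstration.
    token = base64.b64encode(b"alice:s3cret").decode("ascii")
    print(check_basic_auth("Basic " + token, "alice", "s3cret"))
    print(check_basic_auth(None, "alice", "s3cret"))
```

Unlike an unlinked URL, a check like this still holds up after the address leaks, provided it is served over HTTPS so the credentials are not sent in the clear.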
As long as there is no incoming link or anything else that can point Google towards the file (a sitemap, an open directory listing, etc.), I believe it shouldn't be indexed. Alternatively, you could put the files in a folder and block it through the robots.txt file.