
Is it possible for web crawlers to see static pages without following a link to them?

@Ann8826881

Posted in: #WebCrawlers

If I create a static page on a domain (http://www.domain.com/page.html), can a crawler still see it if there aren't any links to it anywhere on the site?




2 Comments


 

@Si4351233

Another way the page may be discovered is through links on that page that point to other sites.

When a visitor follows one of those links, the URL of your page appears in the other site's referrer logs. Many webmasters like to browse through those logs to see who is linking to their pages and what is being said about them.

Some sites expose those logs without any access restriction, so crawlers can reach them too.

To keep the page really secret, don't let it link out to external sites.
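
As a rough illustration of how a page's URL ends up in someone else's logs, here is a minimal sketch that pulls referrer URLs out of access-log lines. It assumes the external site writes logs in the common "combined" format; the sample entries, IP addresses, and the `extract_referrers` helper are hypothetical.

```python
import re

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def extract_referrers(log_lines):
    """Return the set of non-empty Referer URLs found in the log lines."""
    referrers = set()
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        # "-" is what servers log when the request carried no Referer header.
        if match and match.group("referer") not in ("", "-"):
            referrers.add(match.group("referer"))
    return referrers


# Hypothetical entries from the external site's access log.
sample_log = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /article HTTP/1.1" 200 2326 '
    '"http://www.domain.com/page.html" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(extract_referrers(sample_log))
```

If a report built from such logs is published on the open web, the "secret" URL becomes an ordinary piece of crawlable text.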



 

@Kevin317

Can they see it? Yes. Can they find it? Not without help.

Web crawlers typically find pages to crawl by following links to them from other pages. Some crawlers (e.g. search engine crawlers) will also crawl pages listed in special XML files (sitemaps). So if there is no link to the page on your website or on any other website, and it is not listed in such a file, that page will not be crawled. Note that the link does not have to be clickable: Google will also find the page if its URL appears as plain text on another page.
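
The discovery process described above can be sketched roughly as follows. This is a toy model and not how any particular search engine works: the `SITE` dictionary stands in for real HTTP fetches, and `LinkExtractor`, `discover`, and the page contents are hypothetical names and data chosen for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def discover(pages, seeds):
    """Breadth-first link following over an in-memory dict of URL -> HTML."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # URL is known but there is no content to parse
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical site: page.html exists but nothing links to it.
SITE = {
    "http://www.domain.com/": '<a href="/about.html">About</a>',
    "http://www.domain.com/about.html": '<a href="/">Home</a>',
    "http://www.domain.com/page.html": "<p>Nobody links here.</p>",
}

found = discover(SITE, ["http://www.domain.com/"])
print("http://www.domain.com/page.html" in found)  # False

# Listing the URL in a sitemap is equivalent to handing it over as a seed.
sitemap_urls = ["http://www.domain.com/page.html"]
found_with_sitemap = discover(SITE, ["http://www.domain.com/"] + sitemap_urls)
print("http://www.domain.com/page.html" in found_with_sitemap)  # True
```

The orphan page is perfectly fetchable in both runs; the only difference is whether the crawler ever learns its URL.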

However, once a page has been found and crawled, it may be crawled again even if every link to it is later removed. This is because crawled pages are indexed (i.e. added to the crawler's list of pages to revisit), so the crawler knows to come back later and check for changes. If you want to prevent this from happening, you can do any of the following:

Most effective


Remove the page from the Internet
Change the URL of that page (essentially removing the page and adding a new one)
Place it behind a login


Less effective


Block that page using a robots.txt file (which may be ignored)
Try to filter out bad bots by IP (which can change with every visit) or user-agent (which may be spoofed)
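
To see why robots.txt sits in the "less effective" group, here is a small sketch of the check a well-behaved crawler performs before fetching a URL, using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT`, the `ExampleBot` user-agent, and the `polite_crawler_may_fetch` helper are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://www.domain.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /page.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_fetch(url, user_agent="ExampleBot"):
    """Return True if the parsed robots.txt rules allow this fetch."""
    return parser.can_fetch(user_agent, url)


print(polite_crawler_may_fetch("http://www.domain.com/page.html"))   # False
print(polite_crawler_may_fetch("http://www.domain.com/index.html"))  # True
```

The check runs entirely on the crawler's side. A bot that skips it can still request the page, and the robots.txt file itself publicly lists the path you wanted hidden.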


