
How is a crawler seeing unlinked directories / files?

@Sarah324

Posted in: #WebCrawlers

I'm running a crawler on my website to test for broken links and such.

It starts from a URL like domain.com.
One curious thing is that it shows directories that have no internal links pointing to them. For example, the directory /example_dir/ shows up in the crawl tree, but I can't find any internal link to that directory within the pages.

How could this be happening and is there a way to prevent it?


2 Comments


@Bryan171

My guess is that Jon is right: you must have a link somewhere. It might not be visible on the page, but the spider is finding it.

Don't forget that code like this can happen: <a href="/my_dir/"></a>. Although it's invisible to the user, the spider will still follow it.
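
If you want to hunt for anchors like that yourself, here is a rough sketch using only Python's standard library. The class and function names (EmptyAnchorFinder, find_empty_anchors) are made up for illustration, and it assumes you already have a page's HTML as a string. It treats an anchor as "empty" when it contains no non-whitespace text and no image, which is a simplification (it won't catch links hidden with CSS).

```python
from html.parser import HTMLParser


class EmptyAnchorFinder(HTMLParser):
    """Collect href values of <a> tags that have no visible content.

    Hypothetical helper: "visible" here just means non-whitespace
    text or an <img> inside the anchor.
    """

    def __init__(self):
        super().__init__()
        self.empty_hrefs = []
        self._href = None          # href of the <a> currently open, if any
        self._has_content = False  # whether that <a> contained text or an image

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._has_content = False
        elif tag == "img" and self._href is not None:
            self._has_content = True

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._has_content = True

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href is not None and not self._has_content:
                self.empty_hrefs.append(self._href)
            self._href = None


def find_empty_anchors(html):
    """Return the hrefs of all anchors in `html` that render as blank."""
    finder = EmptyAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_hrefs
```

Running find_empty_anchors over each page's source should list any links a visitor can't see but a spider will still follow.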


@Sarah324

What tool are you using to crawl your site?

Crawlers typically find new pages by following links, so the odds are you have a link pointing to those directories. It may not be intentional; for example, a dynamic link could be pulling up bad data without throwing an error. If you aren't using Xenu's Link Sleuth, I recommend it, as it will tell you which pages had the links that led it to crawl those directories.
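
If your crawler doesn't report referring pages, the same idea can be roughed out in a few lines of Python. This is only a sketch: it assumes you have already fetched your pages into a dict mapping each page URL to its HTML, and the names LinkCollector and find_referrers are invented for this example. It resolves every href against the page it appears on and reports the pages whose links land under the directory in question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_referrers(pages, target_prefix):
    """Return (page_url, href) pairs whose link resolves under target_prefix.

    pages: dict mapping a page URL to that page's HTML (assumed to be
           fetched already; this sketch does no network access).
    target_prefix: URL path such as "/example_dir/".
    """
    referrers = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links against the page they appear on.
            resolved = urljoin(page_url, href)
            if urlparse(resolved).path.startswith(target_prefix):
                referrers.append((page_url, href))
    return referrers
```

Called as find_referrers(pages, "/example_dir/"), it returns the pages that link to the directory along with the href as written in the source, which makes relative links like ../example_dir/ easier to spot.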
