Could a custom crawler find unlisted web pages?

@Cody1181609

Posted in: #WebCrawlers

Example:


A website has no sitemap.xml, no robots.txt, and no index of those pages.
The pages are not blocked; bots and humans can access them, but they would need the URLs.
URL format: example.com/ofehdjtd/some-name.html
So every page has a random string like that in its URL.


So these pages are unlisted, like videos on YouTube: you can view them only if you have the URL.

Would a custom crawler be able to find these kinds of pages? By blind guessing?

I'm asking this because some guy wants to sell me a custom spiderbot that he claims can find these pages.


2 Comments


 

@Speyer207

Unlinked URLs are like passwords

An unlinked URL with a random string is subject to the same principle as cracking a password. Since the random string has to be valid in a URL, it is essentially a "writable" password: letters and digits, with no symbols beyond those a URL supports. A random writable password looks like this: PBrEP3.

I covered cracking times in a blog post I wrote a while back about stopping WordPress brute-force password attacks.


Crack Time

The crack time varies with the length of the string and the number of attempts possible per second. A standard home computer could most likely attempt 1 million guesses per second, which gives the following average crack times (on average, half the keyspace must be searched):


6 characters zXrdR4


15.6 billion combinations would take around 2 hours on a home computer.

7 characters zXrdR4p


781 billion combinations would take around 5 days on a home computer.

8 characters zXrdR4p7


39.1 trillion combinations would take around 8 months on a home computer.

9 characters zXrdR4p78


1.95 quadrillion combinations would take around 31 years on a home computer.



The crack-time examples above are taken from Pro Webmasters moderator Stephen Ostermiller's website, from his page about password types and strengths.
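As a sketch of the arithmetic above: the combination counts in the table correspond to an alphabet of roughly 50 characters (15.6 billion ≈ 50^6), and the times assume searching half the keyspace on average. Both figures below are assumptions derived from that table, not exact properties of any real site.

```python
# Average time to brute-force a random URL slug, assuming (as the
# table above implies) a 50-character alphabet, 1 million guesses
# per second, and that on average half the keyspace is searched.
ALPHABET_SIZE = 50            # derived from 15.6 billion ~= 50**6
GUESSES_PER_SECOND = 1_000_000

def average_crack_seconds(length):
    """Expected seconds to hit one valid slug of the given length."""
    combinations = ALPHABET_SIZE ** length
    return combinations / 2 / GUESSES_PER_SECOND

for length in (6, 7, 8, 9):
    secs = average_crack_seconds(length)
    print(f"{length} chars: {ALPHABET_SIZE ** length:.3g} combinations, "
          f"~{secs / 86400:.1f} days on average")
```

Running this reproduces the table: about 2 hours for 6 characters, 5 days for 7, 8 months for 8, and 31 years for 9.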



Talking about websites, not computer passwords

However, since you're talking about a website, the number of attempts per second is limited by the quality of the server and the connection attached to it. On shared hosting, for example, you would be lucky to get 100 attempts per second, so it would take a crawler many years to find even one page on standard hosting.

Not even Google gets a million requests a second. Google, for example, handles around 40,000 search queries per second; granted, people visit more than they search, but even if we tripled that figure it is still far short of 1 million.
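Under the same half-keyspace assumption, the server-limited case can be sketched like this (the 100 requests per second figure for shared hosting is the guess from above, not a measured number):

```python
# Same brute-force arithmetic, but capped at what a web server will
# actually serve: ~100 requests/second is assumed for shared hosting.
ALPHABET_SIZE = 50
REQUESTS_PER_SECOND = 100
SECONDS_PER_YEAR = 365.25 * 24 * 3600

for length in (6, 9):
    years = (ALPHABET_SIZE ** length / 2
             / REQUESTS_PER_SECOND / SECONDS_PER_YEAR)
    print(f"{length}-char slug at {REQUESTS_PER_SECOND} req/s: "
          f"~{years:,.0f} years on average")
```

Even a 6-character slug works out to a couple of years at that rate, and a 9-character slug to hundreds of thousands of years.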

Summary: use 9 characters or more... then, in two decades, make it longer :)



 

@Smith883

One would have to assume it uses brute-force attempts to guess the random URLs. I suppose the answer is that, given enough time, yes, it would find pages without links to them.

However, if the spider attempted this on my server, it wouldn't last long before my firewall blocked its IP address. Other servers are likely not only to block the IP address but to add it to blacklists.
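As an illustration of that blocking idea (a minimal sketch, not the actual firewall described; the window and threshold are illustrative assumptions), a server could count 404s per client IP in a sliding window and block IPs that guess too fast:

```python
# Minimal sketch: block an IP after too many 404s in a short window,
# which is exactly the traffic pattern a URL-guessing bot produces.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # illustrative assumption
MAX_404S = 30         # illustrative assumption

not_found_log = defaultdict(deque)   # ip -> timestamps of recent 404s
blocked = set()

def record_404(ip, now=None):
    """Record a 404 for this IP; return True if the IP is now blocked."""
    if now is None:
        now = time.time()
    hits = not_found_log[ip]
    hits.append(now)
    # drop timestamps that have fallen outside the sliding window
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    if len(hits) > MAX_404S:
        blocked.add(ip)
    return ip in blocked
```

In practice this is what tools like fail2ban or a web application firewall do for you, with the added step of actually dropping the traffic.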

I'm not sure of the legal implications of doing something like that; I suppose it depends on whose site it is and in what jurisdiction, but it might not be viewed favourably.

I imagine that if you used a bot like that, it not only wouldn't work as expected but could well backfire on you.

Go ahead and give it a go...report back. Hmmm...do they have internet in prison where you are?
