Could a custom crawler find unlisted web pages?
Example:
A website has no sitemap.xml, no robots.txt, and no index of those pages.
The pages are not blocked; bots and humans can access them, but they would need the URLs.
URL format: example.com/ofehdjtd/some-name.html
So every page has a random string in its URL. These pages are unlisted, like videos on YouTube: you can only view one if you have the URL.
Would a custom crawler be able to find these kinds of pages by blind guessing?
I'm asking because someone wants to sell me a custom spiderbot that he claims can find these pages.
Unlinked URLs are like passwords
An unlinked URL with a random string is subject to the same principles as cracking a password. Since the random string has to comply with URL formatting, it would most likely be a "writable" password: no symbols other than those a URL supports. A random writable password looks like this: PBrEP3.
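For illustration, here is a minimal sketch of how a site might generate such a slug; the alphabet and the `random_slug` helper are my own naming, not anything the site in question necessarily uses:

```python
import secrets
import string

# Letters and digits only: a "writable" alphabet that is always URL-safe.
ALPHABET = string.ascii_letters + string.digits

def random_slug(length: int = 9) -> str:
    """Return a cryptographically random, URL-safe string."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_slug())  # e.g. 'zXrdR4p78'
```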
I covered a little about cracking times in a previous blog post I wrote a while back on stopping WordPress brute-force password hacks.
Crack Time
The crack time varies with the length of the string and the number of attempts possible per second. A standard home computer could most likely attempt 1 million guesses a second, which would mean:
6 characters (zXrdR4): 15.6 billion combinations, around 2 hours on a home computer.
7 characters (zXrdR4p): 781 billion combinations, around 5 days on a home computer.
8 characters (zXrdR4p7): 39.1 trillion combinations, around 8 months on a home computer.
9 characters (zXrdR4p78): 1.95 quadrillion combinations, around 31 years on a home computer.
The crack-time examples above are taken from Pro Webmaster MOD Stephen Ostermiller's website, from his page on password types and strengths.
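If you want to sanity-check those figures, the arithmetic is simple enough to script. A minimal sketch, assuming a 50-character writable alphabet (which is what makes 50^6 come out to 15.6 billion) and that on average the string is found after searching half the keyspace:

```python
# Reproduce the crack-time table above: keyspace = charset ** length,
# average time = half the keyspace divided by the guess rate.
CHARSET_SIZE = 50              # assumed "writable" alphabet size
GUESSES_PER_SECOND = 1_000_000

for length in range(6, 10):
    keyspace = CHARSET_SIZE ** length
    avg_seconds = keyspace / 2 / GUESSES_PER_SECOND
    print(f"{length} chars: {keyspace:.3g} combinations, "
          f"~{avg_seconds / 86_400:,.1f} days on average")
```

That prints roughly 0.1 days (about 2 hours), 4.5 days, 226 days (about 8 months), and 11,300 days (about 31 years), matching the list above.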
Talking about websites, not computer passwords
However, since you're talking about a website, the number of attempts per second is limited by the quality of the server and the connection attached to it. On shared hosting, for example, you would be lucky to get 100 attempts a second, so it would take a crawler many years to find even one page on standard hosting.
Not even Google gets a million hits a second: Google, for example, handles around 40,000 search queries per second. Granted, people visit as well as search, but even if we tripled that figure it's still far off 1 million.
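Re-running the same arithmetic at a web-realistic rate makes the point concrete; the 100 requests per second is the shared-hosting assumption from above, not a benchmark:

```python
# Same keyspace arithmetic, capped at what a shared host might tolerate.
CHARSET_SIZE = 50
REQUESTS_PER_SECOND = 100      # assumed shared-hosting ceiling
SECONDS_PER_YEAR = 31_557_600

for length in (6, 7, 8, 9):
    avg_years = CHARSET_SIZE ** length / 2 / REQUESTS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{length} chars: ~{avg_years:,.0f} years on average")
```

Even the 6-character case averages about 2.5 years of nonstop hammering, and 9 characters is into the hundreds of thousands of years.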
Summary... use 9 characters or more... then in two decades bump it up :)
One would have to assume the spiderbot is using brute-force attempts to guess the random URLs. I suppose the answer is that, given enough time, yes, it will find pages with no links to them.
However, if the spider tried this on my server it wouldn't last long before my firewall blocked its IP address. Other servers are likely not only to block the IP address but to add it to blacklists.
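To see why the bot gets caught so quickly: guessing URLs produces an unbroken stream of 404s, which is trivially detectable. Here is a minimal sketch of that kind of rule; the names and the threshold are illustrative assumptions, not a real firewall API:

```python
from collections import Counter

MAX_404S = 20  # illustrative threshold

not_found_hits = Counter()
blocked_ips = set()

def record_404(client_ip: str) -> None:
    """Count a 404 response; block the IP once it trips the threshold."""
    not_found_hits[client_ip] += 1
    if not_found_hits[client_ip] > MAX_404S:
        blocked_ips.add(client_ip)  # in practice, hand off to iptables or fail2ban

def is_blocked(client_ip: str) -> bool:
    return client_ip in blocked_ips
```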
I'm not sure of the legal implications of doing something like that; I suppose it depends on whose site it is and in what jurisdiction, but it might not be viewed favourably.
I imagine that if you used a bot like that, it not only won't work as expected but could well backfire on you.
Go ahead and give it a go... report back. Hmmm... do they have internet in prison where you are?