Could a custom crawler find unlisted web pages?
Example:
A website has no sitemap.xml, no robots.txt, and no index of those pages.
The pages are not blocked; bots and humans can access them, but they would need the URLs.
URL format: example.com/ofehdjtd/some-name.html
So every page has a random string in its URL. These pages are unlisted, like videos on YouTube: you can only view one if you have the URL.
Would a custom crawler be able to find these kinds of pages by blind guessing?
I'm asking because someone wants to sell me a custom spiderbot that he claims can find these pages.
Unlinked URLs are like passwords
An unlinked URL with a random string is subject to the same principles as cracking a password. Since the random string has to comply with URL formatting, it would most likely be a "writable" password: no symbols other than those a URL supports. A random writable password looks like this: PBrEP3.
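For illustration, here is a minimal sketch of how a site might generate such a slug; the alphabet and the `random_slug` helper are my own naming, not anything the site in question necessarily uses:

```python
import secrets
import string

# Letters and digits only: a "writable" alphabet that is always URL-safe.
ALPHABET = string.ascii_letters + string.digits

def random_slug(length: int = 9) -> str:
    """Return a cryptographically random, URL-safe string."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_slug())  # e.g. 'zXrdR4p78'
```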
I covered a little about cracking times in a previous blog post I wrote a while back on stopping WordPress brute-force password hacks.
Crack Time
The crack time varies with the length of the string and the number of attempts possible per second. A standard home computer could most likely attempt 1 million guesses a second, which would mean:
6 characters (zXrdR4): 15.6 billion combinations, around 2 hours on a home computer.
7 characters (zXrdR4p): 781 billion combinations, around 5 days on a home computer.
8 characters (zXrdR4p7): 39.1 trillion combinations, around 8 months on a home computer.
9 characters (zXrdR4p78): 1.95 quadrillion combinations, around 31 years on a home computer.
The crack-time examples above are taken from Pro Webmaster MOD Stephen Ostermiller's website, from his page on password types and strengths.
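If you want to sanity-check those figures, the arithmetic is simple enough to script. A minimal sketch, assuming a 50-character writable alphabet (which is what makes 50^6 come out to 15.6 billion) and that on average the string is found after searching half the keyspace:

```python
# Reproduce the crack-time table above: keyspace = charset ** length,
# average time = half the keyspace divided by the guess rate.
CHARSET_SIZE = 50              # assumed "writable" alphabet size
GUESSES_PER_SECOND = 1_000_000

for length in range(6, 10):
    keyspace = CHARSET_SIZE ** length
    avg_seconds = keyspace / 2 / GUESSES_PER_SECOND
    print(f"{length} chars: {keyspace:.3g} combinations, "
          f"~{avg_seconds / 86_400:,.1f} days on average")
```

That prints roughly 0.1 days (about 2 hours), 4.5 days, 226 days (about 8 months), and 11,300 days (about 31 years), matching the list above.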
Talking about websites, not computer passwords
However, since you're talking about a website, the number of attempts per second is limited by the quality of the server and the connection attached to it. On shared hosting, for example, you would be lucky to get 100 attempts a second, so it would take a crawler many years to find even one page on standard hosting.
Not even Google gets a million hits a second: Google, for example, handles around 40,000 search queries per second. Granted, people visit as well as search, but even if we tripled that figure it's still far off 1 million.
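Re-running the same arithmetic at a web-realistic rate makes the point concrete; the 100 requests per second is the shared-hosting assumption from above, not a benchmark:

```python
# Same keyspace arithmetic, capped at what a shared host might tolerate.
CHARSET_SIZE = 50
REQUESTS_PER_SECOND = 100      # assumed shared-hosting ceiling
SECONDS_PER_YEAR = 31_557_600

for length in (6, 7, 8, 9):
    avg_years = CHARSET_SIZE ** length / 2 / REQUESTS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{length} chars: ~{avg_years:,.0f} years on average")
```

Even the 6-character case averages about 2.5 years of nonstop hammering, and 9 characters is into the hundreds of thousands of years.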
Summary... use 9 characters or more... then in two decades bump it up :)
One would have to assume the spiderbot is using brute-force attempts to guess the random URLs. I suppose the answer is that, given enough time, yes, it will find pages with no links to them.
However, if the spider tried this on my server it wouldn't last long before my firewall blocked its IP address. Other servers are likely not only to block the IP address but to add it to blacklists.
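To see why the bot gets caught so quickly: guessing URLs produces an unbroken stream of 404s, which is trivially detectable. Here is a minimal sketch of that kind of rule; the names and the threshold are illustrative assumptions, not a real firewall API:

```python
from collections import Counter

MAX_404S = 20  # illustrative threshold

not_found_hits = Counter()
blocked_ips = set()

def record_404(client_ip: str) -> None:
    """Count a 404 response; block the IP once it trips the threshold."""
    not_found_hits[client_ip] += 1
    if not_found_hits[client_ip] > MAX_404S:
        blocked_ips.add(client_ip)  # in practice, hand off to iptables or fail2ban

def is_blocked(client_ip: str) -> bool:
    return client_ip in blocked_ips
```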
I'm not sure of the legal implications of doing something like that; I suppose it depends on whose site it is and in what jurisdiction, but it might not be viewed favourably.
I imagine that if you used a bot like that, it not only won't work as expected but could well backfire on you.
Go ahead and give it a go... report back. Hmmm... do they have internet in prison where you are?