How did Google manage to crawl my 403 pages?

@Yeniel560

Posted in: #Googlebot #Security #WebCrawlers

I had a couple of private files in a directory in my school folder. You could see that the files existed by going to myschool.edu/myusername/myfolder, but trying to access the files themselves via myschool.edu/myusername/myfolder/myfile.html returned a 403 error.

And yet Google somehow managed to grab the contents of those private files and store them in its cache! How is this possible? [I've since removed those files, so I'm just curious how Google managed to do this.]


1 Comment


@Kaufman445

The most probable reason is that the pages didn't actually return a 403 status code.

You can check that using the Web Developer Toolbar in Firefox or Chrome. The tool is located under "Information" -> "View Response Headers".
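
If you don't have the toolbar installed, PHP's built-in get_headers() can do the same check. A minimal sketch (the URL below is just a placeholder for one of your files):

<?php
// Print the HTTP status line for a URL using get_headers().
// By default this performs a GET request and returns every header
// in the response, including any redirect chain.
$url = 'http://myschool.edu/myusername/myfolder/myfile.html'; // placeholder

$headers = get_headers($url);
if ($headers === false) {
    die("Request failed\n");
}

// The first element is the status line, e.g. "HTTP/1.1 403 Forbidden".
echo $headers[0] . "\n";

If the status line says 200 instead of 403, Google was simply served the content like any other client.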

Also, the way I create my error pages is:

1. I create a dummy error page. Let's say 403.php.
2. I create the actual error page. For example error403.php.
3. On the dummy error page, I put the following code: <?php header("Location: /error403.php", TRUE, 301); ?>
4. In my .htaccess, I put the following:

Options -Indexes
ErrorDocument 403 /403.php


This sets up all the redirects properly and makes sure I still get some juice from my error pages.
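
Put together, a minimal sketch of the dummy 403.php (assuming error403.php lives in the web root, as above):

<?php
// 403.php -- dummy error document referenced by ErrorDocument in .htaccess.
// Apache runs this script whenever it denies a request with 403; the
// script immediately 301-redirects the visitor to the real error page.
header("Location: /error403.php", TRUE, 301);
exit; // stop here so nothing else is sent after the redirect header

Note that with this setup the client never sees a raw 403: it gets the 301 and then a 200 for error403.php, which is what lets the error page pass link juice.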

This setup can actually be extended in an extremely cool way if your website has a search engine that uses GET requests.
