Looking for explanation of Apache behavior on manipulating headers

Posted in: #403Forbidden #Apache #Htaccess #HttpHeaders

I tried to get an answer on ServerFault, but nobody there knew; all the real gurus are here.

The background of the issue: Googlebot creates non-existent URLs and tries to crawl them. For some URLs Apache returns 404 (correctly), for others 403 (wrong). I can't match the URLs where Apache returns 403 with a regex, so I can't rewrite them properly to force a 404.

I created the following workaround to force a 404 instead of a 403.

I added this to .htaccess:

ErrorDocument 403 /404.php
ErrorDocument 404 /404.php

So the same file handles both cases.

Then, to force the correct status code, I added <?php http_response_code(404); ?> at the very beginning of 404.php.
This way Googlebot gets a 404 even where Apache would answer with 403.

The question is: could somebody explain in detail how this workaround actually works? How am I able to manipulate the header this way? I always thought Apache decides which response code to serve before it looks into .htaccess...

@Heady270

@Alves908

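"how this workaround actually works"

When Apache settles on a 403 or 404, it internally redirects the request to the configured ErrorDocument, here /404.php. PHP therefore runs later in the request, so most of the time your PHP code can simply override any headers (including the status code) that Apache has already set. That's pretty much it.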

(Aside: Sending 403s through your 404 handler in this way obviously makes it harder to trigger a real 403 from your Apache config/.htaccess, if you should need to.)

"most of the time"

However, if there is a serious error (things are not working normally) then the server might respond with a 500 Internal Server Error - this is something that you may not be able to trap in your own code.

Also, by default (AllowEncodedSlashes Off), Apache returns a system-generated 404 for requests that contain an encoded slash (%2F) in the URL path. This is also something that you cannot override without changing that setting.
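For reference, the setting behind this behaviour is the core AllowEncodedSlashes directive. A minimal sketch, assuming you can edit the main server or virtual host config (the directive is not permitted in .htaccess) and using example.com purely as a placeholder host name:

```apache
<VirtualHost *:80>
    # Placeholder host name, not taken from the question
    ServerName example.com

    # Default is Off: a path containing %2F is rejected with a 404
    # before any ErrorDocument script gets a chance to change it.
    # NoDecode accepts such URLs and leaves the %2F undecoded.
    AllowEncodedSlashes NoDecode
</VirtualHost>
```

With this in place, a request containing %2F goes through normal processing, so a missing resource would end up in your usual 404 handling instead of the built-in response.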

There are other situations where Apache will take over (mod_security etc.), but otherwise, if things are running normally, you should be able to manipulate all of the response headers.

"I always thought Apache decides which response code to serve before it looks into .htaccess..."

It does, but directives in .htaccess can override this (provided there are no restrictions in the server config preventing it).

"Googlebot creates non-existent URLs and tries to crawl them."

A lot of people see this behaviour. However, I don't think Googlebot is "creating" these URLs out of nowhere. It is more likely that these URLs are being found somewhere. (Or it's not actually a real Googlebot.)

"For some URLs Apache returns 404 (correctly), for others 403 (wrong). I can't match the URLs where Apache returns 403 with a regex, so I can't rewrite them properly to force a 404."

Apache will trigger a 403 when the request maps to a directory that doesn't contain an index document (mod_dir finds no DirectoryIndex file) and server-generated directory indexes (mod_autoindex) are forbidden, hence the "403 Forbidden" response. mod_dir will also try to "fix" these URLs by appending a trailing slash (if omitted), so you will not be able to match the URL unless you include the trailing slash in your pattern (mod_dir fires early).

So, this does sound like it might be a mod_dir issue. However, we'd need to see the URLs in question (and probably ask more questions about the server config / .htaccess files) to check this out.

Unless there is something else going on, you should still be able to trap/rewrite these URLs. Changing all 403s to 404s is not a particularly desirable workaround.
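If the 403s do turn out to come from directories without an index document, which is only an assumption until the actual URLs are known, one way to trap them without converting every other 403 might look like this in .htaccess. The index file names are assumptions as well; substitute whatever your DirectoryIndex lists.

```apache
RewriteEngine On

# The request maps to an existing directory...
RewriteCond %{REQUEST_FILENAME} -d
# ...that contains none of the (assumed) index documents
RewriteCond %{REQUEST_FILENAME}/index.php !-f
RewriteCond %{REQUEST_FILENAME}/index.html !-f
# Answer with 404 instead of letting the directory produce a 403
RewriteRule ^ - [R=404,L]
```

Because the conditions test the filesystem and not the URL text, no URL pattern is needed, and 403s raised for other reasons are left alone.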
