How to make whole site headers send code 404?

@Annie201

Posted in: #Indexing #Modx

I have a website for testing. I don't want this site to be indexed by search engines.

Right now the pages on the site return 200 OK in their headers. How can I make the whole site send a 404 code in its headers while still working?

The site is built on ModX.


5 Comments


 

@Vandalay111

In that case I'd rather add .htpasswd authentication to the root of the site. Simply add the following lines to your .htaccess file, then create an empty .htpasswd file and use an htpasswd generator to produce the correct user:password string to paste into it. Pick a simple username and password that you can remember and share with your colleagues; it's only there to keep bots out, so it doesn't have to be complicated.

.htaccess:

AuthType Basic
AuthName "My Protected Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
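
If you have shell access, Apache's own htpasswd utility can create that file for you; otherwise, a minimal one-off PHP sketch like the one below prints a user:password line you can paste into .htpasswd. It assumes Apache 2.4 or newer (which accepts bcrypt hashes), and the username and password are placeholders.

<?php
// One-off helper: prints a line to paste into .htpasswd.
// Assumes Apache 2.4+, which accepts bcrypt ($2y$) hashes.
$user = 'testuser';          // placeholder username
$pass = 'choose-a-password'; // placeholder password
echo $user . ':' . password_hash($pass, PASSWORD_BCRYPT) . PHP_EOL;
?>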



 

@Nickens628

Using 404 error codes across an entire site is a terrible practice, especially where search engines are concerned: a 404 tells them the pages are missing and that you presumably intend to fix them, yet the pages you plan to test aren't missing at all, so converting 200 statuses into 404 statuses makes no sense.

The way you should tackle the problem depends on the level of security you want.

If you want only certain computers to be able to test your site, then you can modify the server configuration files to allow only certain IP addresses access to your site. That way, search engines can never access it.
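
For example, on Apache 2.4 a rough sketch of that in the site's .htaccess (or virtual host configuration) could look like the following; the addresses are placeholders, and it assumes AllowOverride permits these directives:

# Allow only these addresses; everyone else, including crawlers, is refused.
Require ip 203.0.113.10 198.51.100.7

On Apache 2.2 the older Order/Deny/Allow directives do the same job.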

The next most secure method is to include this line between <head> and </head> in your scripts or files that produce the HTML output:

<meta name="ROBOTS" content="NOINDEX">


You can also use the methods provided in the other answers. The one caveat with robots.txt is that anyone trying to probe the system can read that file, so whatever you do, please DO NOT add comments to the robots.txt file, as they can give attackers a better shot at tampering with your system.



 

@LarsenBagley505

From the previous answers I understood that it is bad practice. Thank you, people.

But if you need to do exactly what I was asking about, just add the following to the top of index.php, before any other code:

<?php
header('HTTP/1.1 404 Not Found');
?>


So every page of the site will return a 404 status but will stay alive.
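
On PHP 5.4 or newer, http_response_code() does the same thing without hard-coding the protocol version:

<?php
// Equivalent on PHP 5.4+; PHP fills in the protocol line itself.
http_response_code(404);
?>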



 

@Lee4591628

If this is a test site that shouldn't be indexed at all, there are a couple of steps you can take that tell search engines not to index your site more effectively than returning 404 headers.

robots.txt

Include a robots.txt file at the site's root containing:

User-agent: *
Disallow: /


X-Robots-Tag

Add the following to your .htaccess to send an X-Robots-Tag header for all resources:

Header set X-Robots-Tag "noindex,nofollow"
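
Note that the Header directive needs mod_headers to be enabled. If you cannot use .htaccess, a sketch of the same header sent from PHP (for example near the top of index.php) would be:

<?php
// Same header sent from PHP instead of .htaccess;
// must run before any output is sent.
header('X-Robots-Tag: noindex, nofollow');
?>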


Password Protect

You could also require a password to access the website, as another answer here describes; the .htaccess portion would be:

AuthType Basic
AuthName "Password Protected Area"
AuthUserFile /path/to/.htpasswd
Require valid-user


IP Lock

Finally, you could restrict access to the site by IP address. Only you or authorised IPs would be able to view the website, and search engines would be locked out entirely.



 

@Candy875

This is an XY problem. You want to prevent indexing on your site and you know that 404s are not indexed, so you want to prevent indexing 'using' 404s. This is the wrong way to go.

There are many proper ways to prevent indexing such as using robots.txt, meta tags or authentication.


