Stop Google from crawling proxied pages but still allow the proxy itself to be found via search engines
I have a proxy at rahul2001.com/proxy/
A site can be 'proxied' like this: rahul2001.com/proxy/proxy.php/http://site-to-be-proxied.example.com/
The problem is that Google seems to be crawling pages through my proxy :/
I do NOT link to proxied versions of Yahoo Food, Google Careers, or Google Search Hindi Help, but these still turn up in the search results...
PROBLEMS:
- I DO NOT want to block my website/proxy from search engines entirely; I still want the proxy service itself to be findable in search.
- I DO NOT want Google to use up my bandwidth by crawling useless sites.
- I DO NOT want to use a captcha, since a few of my apps use this proxy.
- I DO NOT want Google to spoil the search results for my website in this manner.
What do I do??
ALSO, why is Google entering random URLs into the form??
EDIT
After adding the meta tag, I get an error :(
proxy.php (first few lines):
<head><meta name="robots" content="noindex, nofollow" />
</head>
<?php
/*
miniProxy - A simple PHP web proxy. <https://github.com/joshdick/miniProxy>
Written and maintained by Joshua Dick <http://joshdick.net>.
miniProxy is licensed under the GNU GPL v3 <http://www.gnu.org/licenses/gpl.html>.
*/
Error:
Warning: Cannot modify header information - headers already sent by (output started at /home/rahulcom/public_html/proxy/proxy.php:3) in /home/rahulcom/public_html/proxy/proxy.php
HELL BREAKS LOOSE - GOOGLE IS CRAWLING THE INTERNET USING MY PROXY!!
www.google.co.in/search?q=site:rahul2001.com+proxy
More posts by @Alves908
2 Comments
OK, so you do not want to block the proxy from search engines, but you don't want the proxied results to show up in search engines? Sorry, I don't get it :) Is it that you want the original site to rank above the proxied copies? I'm afraid Google decides which pages are more important.
Also be careful with duplicate content. The same information should only be on one URL. For duplicates you should use canonical links (http://speckyboy.com/2012/07/16/what-a-canonical-link-is-and-how-to-use-it-properly/) to tell Google which of the pages is the original one. That is probably the one that will show up on Google.
There is no need for two URLs with the same information to show up on Google.
EDIT
From the comments, I was led to believe that you want /proxy/ itself to show up but not anything under it, like /proxy/subpage. In that case, use a robots.txt like this:
User-agent: *
Disallow: /proxy/
Allow: /proxy/$
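To sanity-check how these rules would apply, here is a minimal Python sketch of the longest-match precedence that Google documents for robots.txt (the most specific matching rule wins, and Allow wins a tie). The names pattern_to_regex, is_allowed and RULES are made up for this illustration, and other crawlers may interpret `$` or rule order differently, so treat it as an approximation rather than a definitive checker.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs. The longest matching
    pattern wins; on equal length, 'allow' wins. No match means allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

# The rules from the robots.txt above (hypothetical in-memory form).
RULES = [("disallow", "/proxy/"), ("allow", "/proxy/$")]

for path in ("/proxy/", "/proxy/proxy.php/http://site-to-be-proxied.example.com/", "/"):
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions the proxy landing page at /proxy/ stays crawlable because `/proxy/$` is the longer match, while everything deeper, including proxy.php URLs, falls under the Disallow rule.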