
Remove subdomains from Google index and stop indexing them

@Alves908

Posted in: #DuplicateContent #GoogleIndex #GoogleSearchConsole #RobotsTxt #Subdomain

I am running static content through a CDN; I am using the subdomains cdn1 through cdn5 for that.

I am loading only images, CSS and JS files this way, but apparently Google has indexed some pages on the subdomains, and they now appear in the Google index as duplicates of my "normal" pages.

The thing is that the CDN is set up so that files appear on the subdomains without any extra uploading; the subdomains are mirror copies of the content on the main site. I can't upload files to the subdomains themselves: I can only upload to the main site, and changing www to cdn1 in the address bar shows the same content through the CDN as on my site.

I have 2 questions:


How do I remove the subdomains from the Google index in GWT if it only allows me to enter what goes after `http://domain.com/`?
How do I stop bots from indexing the pages on the subdomains when I can't upload a separate robots.txt file or a Google verification file to them to prove my ownership in GWT?


Is there something else that I need to know related to this matter?

UPDATE: text in bold is updated


3 Comments


 

@LarsenBagley505

You can remove the sub-domains in Webmaster Tools, but first you need to add the sub-domains as separate sites and then submit a site removal request. They should be gone within a day or so.

See these instructions for removing a site from Google: support.google.com/webmasters/answer/1663427?hl=en



 

@Jamie184

Short answer.

Put <meta name="robots" content="noindex"> in the head of your HTML on all pages. Once the search engines have spidered these pages and you are sure of it, put

User-agent: *
Disallow: /


...in a robots.txt file in the root directory of each sub-domain.
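
Since you mentioned you can't upload separate files to the sub-domains, one way to get a different robots.txt per hostname is to rewrite the request on the origin server. The following is only a sketch, assuming Apache with mod_rewrite, mod_setenvif and mod_headers, a pull-based CDN that forwards the original Host header, and a hypothetical robots_cdn.txt file (containing the Disallow rules above) in the main site's web root:

# Serve robots_cdn.txt instead of robots.txt on the cdn1-cdn5 hostnames
RewriteEngine On
RewriteCond %{HTTP_HOST} ^cdn[1-5]\. [NC]
RewriteRule ^robots\.txt$ /robots_cdn.txt [L]

# Also send a noindex header on those hostnames, which covers files
# that can't carry a meta tag (images, CSS, JS)
SetEnvIfNoCase Host ^cdn[1-5]\. is_cdn
Header set X-Robots-Tag "noindex" env=is_cdn

If the CDN does not pass the original Host header through to the origin, this won't distinguish the requests, so treat it as an idea to adapt rather than a drop-in configuration.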

This will take time, of course. It typically takes 30-60 days for Google to notice the changes and reflect them in the SERPs. It can take less or more time depending on how Google gauges freshness for your sub-domains.



 

@Yeniel560

There are different ways; here are a couple of them. You can use just one or combine them:


Use rel="canonical" on the duplicate pages, pointing to the corresponding main-site URLs.
If you can use a .htaccess file, set up a 301 redirect on the hosts that you don't want to get indexed (see the sketch below this list).
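
As a rough sketch of both options, with example.com standing in for your real domain and assuming Apache with mod_rewrite for the redirect: the canonical tag goes in the head of every page (the mirrored copies on cdn1-cdn5 will then point back to the www version), and the .htaccess rules redirect page requests on the CDN hostnames while still letting static assets be served from them.

<link rel="canonical" href="http://www.example.com/some-page.html">

# Redirect pages requested on cdn1-cdn5 back to www, but leave static assets alone
RewriteEngine On
RewriteCond %{HTTP_HOST} ^cdn[1-5]\.example\.com$ [NC]
RewriteCond %{REQUEST_URI} !\.(css|js|png|jpe?g|gif|svg|ico|woff2?)$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

The file extension list is only an example; adjust it to whatever your CDN actually serves.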


As for robots.txt, you can use it, but it's a much better option to use a solution that is more robust and that all crawlers have to follow, like the redirect.

There is a short video from Matt Cutts talking about 301 redirects vs. rel="canonical". A short extract from that page and video:


Okay, I sometimes get a question about whether Google will always use the url from rel=canonical as the preferred url. The answer is that we take rel=canonical urls as a strong hint, but in some cases we won’t use them:


For example, if we think you’re shooting yourself in the foot by accident (pointing a rel=canonical toward a non-existent/404 page), we’d reserve the right not to use the destination url you specify with rel=canonical.
Another example where we might not go with your rel=canonical preference: if we think your website has been hacked and the hacker added a malicious rel=canonical. I recently tweeted about that case. On the “bright” side, if a hacker can control your website enough to insert a rel=canonical tag, they usually do far more malicious things like insert malware, hidden or malicious links/text, etc.



In the video he mentions some more situations and reasons, like the fact that a 301 has to be followed by everybody.


