
How can I fix Google indexing URLs that are not part of my sitemap?

@Phylliss660

Posted in: #Dns #Google #Indexing #Sitemap #Url

Using site:example.com in Google is returning many results with the following format: www.w.example.com/services/edison/16mm-to-2k
Obviously this is not what I submitted and is not part of my sitemap. What are some solutions for dealing with this kind of problem?

This is a particular problem because Google indexed the HTTPS versions of these URLs, so every one of the links shows a certificate warning before the visitor reaches the site.

Getting wildcard SSL certificates for *.w.example.com and *.ww.example.com seems like a bad idea.

The site's DNS runs through AWS Route 53 and the site is running on an Ubuntu 12.04 EC2 with Apache.


5 Comments


 

@Courtney195

The question focuses a lot on what Google is doing, but to me it appears that your fundamental problem is not really Google-specific at all.

Why do these names, which you clearly don't seem to want people to use, even exist in DNS?

If it is intentional that these names exist and resolve, why are you serving your actual site when people (and Googlebot) connect using these names? If you want to lead people to the site, it would be much better to do so by redirecting (permanent redirect / 301) them to the real site, using its canonical name, instead of leaving them navigating around your site using this incorrect name.
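
As a rough illustration of the redirect idea, here is a minimal Python sketch of the decision a server would make for each request. The canonical name www.example.com and the function name redirect_target are assumptions made for the example; the thread does not say which name is the real one.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the canonical name of the site; the thread never states it.
CANONICAL_HOST = "www.example.com"


def redirect_target(url, canonical_host=CANONICAL_HOST):
    """Return the URL to 301-redirect to, or None if the host is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == canonical_host:
        return None  # already on the canonical name, serve the page normally
    # Keep the path and query so deep links land on the matching page.
    return urlunsplit(("https", canonical_host, parts.path or "/", parts.query, ""))


# Example: a request that arrived under one of the stray names.
print(redirect_target("https://www.w.example.com/services/edison/16mm-to-2k"))
```

The server would answer with status 301 and a Location header set to the returned URL whenever the function does not return None.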



 

@Gail5422790

Google not only follows links written by other content authors; it also heuristically interprets your JavaScript and even tries to "simplify" your URLs by stripping wrappers, for example /index.php?page=news.php => /news.php. One way would be to ban those mangled URLs in your robots.txt, but that would (1) grow your robots.txt and make it messy, and (2) take away your rank for those links. You must either implement a 301 Moved Permanently redirect or add a canonical URL tag

<link rel="canonical" href="http://moz.com/blog" />


pointing to the most basic address of the same content. Beware, most "Chinese" bots won't obey this, so you might consider a server-side conditional that redirects everything except Googlebot and user browsers, and leaves Googlebot and users with the metadata.
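
If the pages are generated by a script, the tag can be built from the requested URL. The following Python sketch is only one way to do it; the canonical host www.example.com, the helper name canonical_link_tag, and the choice to drop the query string (as the "most basic address") are all assumptions for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(requested_url, canonical_host="www.example.com"):
    """Build a rel=canonical tag that points at the same path on the canonical host."""
    parts = urlsplit(requested_url)
    # Assumption: the query string is not part of the canonical address.
    canonical = urlunsplit(("https", canonical_host, parts.path or "/", "", ""))
    # Escape the URL so it is safe inside the HTML attribute.
    return '<link rel="canonical" href="%s" />' % escape(canonical, quote=True)


print(canonical_link_tag("https://www.w.example.com/services/edison/16mm-to-2k"))
```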



 

@Shakeerah822

Sitemaps serve to include, not to limit, the content Google indexes. If you want to exclude some files, use a robots.txt file as mentioned, or set up redirects.
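
Because robots.txt is fetched separately for each hostname, the exclusion would have to be served under the unwanted names rather than on the real site. Here is a small Python sketch of that idea, checked with the standard library's robots.txt parser. The canonical host www.example.com and the helper names are assumptions for the example. Keep in mind that a crawler blocked this way never fetches the pages, so it would not see a redirect on them either.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the one hostname that should stay crawlable.
CANONICAL_HOST = "www.example.com"


def robots_txt_for(host):
    """Return the robots.txt body to serve for the given hostname."""
    if host.lower() == CANONICAL_HOST:
        return "User-agent: *\nDisallow:\n"  # empty Disallow allows everything
    return "User-agent: *\nDisallow: /\n"  # block the whole stray host


def is_crawlable(host, path, agent="Googlebot"):
    """Check how a compliant crawler would read the rules served for this host."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(host).splitlines())
    return parser.can_fetch(agent, "https://%s%s" % (host, path))


for name in ("www.example.com", "www.w.example.com"):
    print(name, is_crawlable(name, "/services/edison/16mm-to-2k"))
```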

The reason this URL is included is likely that Google found a link pointing to it somewhere else. It could be on your site (which you can fix) or on a third-party site as an incoming link. To figure that out, you can use the link syntax link:https://www.w.example.com/services/edison/16mm-to-2k, which will tell you which pages link there.



 

@Connie744

Do you have a Google Webmaster Tools account? If you create a free account and verify that you are the actual site owner, Google will let you request removal of a folder or of specific URLs.

My personal experience is that search engines take the liberty of not following instructions, but this step would at least remove your pages from their index.

Before you create an account, please change your robots.txt to disallow access to the specific areas. As soon as you verify, Google will check the robots.txt file and update itself.
www.google.com/webmasters/tools



 

@Debbie626

Most likely some part of your web site generated links like that, and that is how Google started to crawl the URLs.

You should check the links in your web pages to see where these incorrect URLs are, and you should fix them.
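
Checking the links by hand gets tedious on a larger site, so here is a minimal Python sketch that scans one page's HTML for links to unexpected hostnames under your own domain. The set of allowed hosts and the names LinkCollector and find_bad_links are assumptions for the example; fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumption: the only hostnames the site is meant to be reached under.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_bad_links(html, page_url, site_domain="example.com"):
    """Return absolute links that point at a non-allowed host under site_domain."""
    collector = LinkCollector()
    collector.feed(html)
    bad = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        host = (urlsplit(absolute).hostname or "").lower()
        on_site = host == site_domain or host.endswith("." + site_domain)
        if on_site and host not in ALLOWED_HOSTS:
            bad.append(absolute)
    return bad


page = '<a href="/ok">a</a> <a href="//www.w.example.com/services">b</a>'
print(find_bad_links(page, "https://www.example.com/"))
```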

Also, you could change your Apache configuration so that requests for any virtual host other than example.com or www.example.com are 301-redirected to the correct URL on the canonical host. This way Google will eventually index the correct versions.


