How to fix Blogger duplicate content m=1 m=0 problem
Does anyone have a proven solution for Blogger's duplicate content indexing problem?
My posts are indexed with both the m=0 and m=1 parameters.
What I have done so far:
Blocked m=0 and m=1 in robots.txt (added Disallow: /*/*/*.html?m=0 and Disallow: /*/*/*.html?m=1).
In Google Webmaster Central > Crawl > URL Parameters, I added the "m" parameter with the effect "Paginates", and set "Crawl (Which URLs with this parameter should Googlebot crawl?)" to "No URLs".
In the Blogger template, I added a "nofollow" robots meta tag when the "data:blog.isMobile" condition matches (sketched just below).
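For reference, that conditional meta tag would sit in the template's <head> along these lines (a sketch using Blogger's standard b:if / data:blog.isMobile syntax; note that nofollow only affects links on the page, so the noindex value shown here is an addition, and is what would actually keep the mobile URLs out of the index):

    <b:if cond='data:blog.isMobile'>
      <!-- emitted only when Blogger serves the ?m=1 mobile version -->
      <meta content='noindex,nofollow' name='robots'/>
    </b:if>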
Edited:
I'm using the canonical tag: expr:href='data:blog.canonicalUrl' rel='canonical' (the full element is shown below).
I have a custom domain for my blog.
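For completeness, the full canonical element from the edit above normally looks like this in the template's <head> (data:blog.canonicalUrl expands to the post's canonical URL, without the m parameter):

    <!-- outputs the same canonical URL on both the desktop and ?m=1 versions -->
    <link expr:href='data:blog.canonicalUrl' rel='canonical'/>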
3 Comments
Google does not treat content as duplicate when it is served differently for desktop search and mobile search; for example, m.facebook.com/some-page/ and facebook.com/some-page/ will not be considered duplicate content even though the same content appears on two different URLs.
I have used Blogger recently, and it serves example.blogspot.com?m=1 when visitors or Googlebot come from a mobile device. Blogger handles this for you, so you really don't need to do anything. By adding ?m=1 to Blogger's robots.txt you're just blocking Google's mobile crawler, which you should not do, because Google now indexes the mobile content first.
If you have already added your website to Search Console, check the HTML Improvements report; if Google shows the same title tag for two different URLs, it means Google considers your content duplicate. Otherwise, let Google do its job.
Do not use the site: or inurl: operators to check for duplicate content; Google's own official blog shows both URLs for the query inurl:"m=" "site:webmasters.googleblog.com".
So do nothing on your Blogger blog: remove those robots.txt rules and remove the nofollow meta tag as well.
I'm using the query inurl:"m=" "site:mydomain.com" to find the posts indexed with m=0 and m=1.
It would seem that what you are seeing is simply the results of a site: search. Using the site: operator is not a "normal" Google search and has been shown to return non-canonical (including redirected) URLs in the SERPs. These are URLs that don't ordinarily get returned in a "normal" organic search (when no search operators are used). Even URLs that are the source of 301 redirects have been shown to be returned for a site: search, when they are not returned normally. These non-canonical URLs are still crawled (and processed) by Google, and they are often acknowledged in a site: search.
Reference:
How to submit shortened URLs to Google so that they are included in the index
Related question: Google indexing duplicate content despite a canonical tag pointing to an external URL. Am I risking a penalty from Google?
Normally, a rel="canonical" (which you have already done) is sufficient to resolve such conflicts with query parameters and duplicate content. But note that it doesn't necessarily prevent the non-canonical pages from being indexed (which is what you see when doing a site: search); it prevents them from being returned in a "normal" Google search.
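To illustrate, with the canonical in place both parameter variants of a post should emit the same link element, so Google can consolidate them (example.com and the post path below are placeholders):

    <!-- both .../2024/05/my-post.html?m=0 and .../2024/05/my-post.html?m=1 output: -->
    <link href='https://www.example.com/2024/05/my-post.html' rel='canonical'/>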
blocked m=0 and m=1 on robots.txt ....
You probably don't want to block these URLs from being crawled as it could damage your ranking on mobile search.
BTW what about Disallow: /*.html, Allow: /*.html$ ?
Aside: This looks "dangerous". Google doesn't process the robots.txt directives in top-down order. They are processed in order of specificity (length of URL), but when it comes to the use of wildcards, the order is officially "undefined" (which also means it could even change). The Allow: directive is also an extension to the "standard" and might not be supported by all search engines. It would be better to be more explicit. eg. Disallow: /*?m=. But, as mentioned, you probably should not be blocking these URLs in robots.txt anyway.
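If you did still want a robots.txt rule for the parameter (again, not recommended here), the more explicit form suggested above would look like this (a sketch; the wildcard matches any path before the ?m= query string):

    User-agent: *
    # blocks crawling of any URL whose query string starts with ?m= (e.g. ?m=0, ?m=1)
    Disallow: /*?m=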
See also my answer to this question for more info about robots.txt and how it is processed:
Robots.txt with only Disallow and Allow directives is not preventing crawling of disallowed resources
If pages with m=0, m=1, or any other parameters have the correct canonical URL, you should have no issues.
If you don't have a canonical URL set for your pages, you should; it tells Google what the actual URL of the current page is.