
How far does Google go to check duplicate content?

@Steve110

Posted in: #DuplicateContent #Google

A long paragraph exactly matching the same duplicate on another URL can be detected by Google.

But I'm curious about content in a table whose row order has been changed. The text will not match in order, but the content is still almost a duplicate.

Can it be classified as duplicate content?

For example, Google landed me on this page when I searched for Dr. Latika Joshi: www.grotal.com/Dehradun/Dr-Latika-Joshi-C78/
Below her entry are other doctors. Now if I search for Dr. K P Joshi, I land on a similar page with Dr. K P Joshi at the top and Dr. Latika Joshi somewhere below.

The content for a doctor, Doctor-1, looks like this:


Doctor-1
Doctor-2
Doctor-3
Doctor-4
Doctor-5


Now if I search for Doctor-3, the site shows the content in this order:


Doctor-3
Doctor-5
Doctor-4
Doctor-1
Doctor-2
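
For illustration, if pages are compared as unordered sets of rows (a generic near-duplicate technique; whether Google does exactly this is not public), the reordering changes nothing. A minimal sketch under that assumption:

# A minimal sketch of order-insensitive duplicate detection: fingerprint
# each row on its own and compare the pages as unordered sets. This
# illustrates a generic technique, not Google's actual algorithm.

def row_fingerprints(rows):
    """Fingerprint each row independently; the result is an unordered set."""
    return {hash(row.strip().lower()) for row in rows}

page_a = ["Doctor-1", "Doctor-2", "Doctor-3", "Doctor-4", "Doctor-5"]
page_b = ["Doctor-3", "Doctor-5", "Doctor-4", "Doctor-1", "Doctor-2"]

a, b = row_fingerprints(page_a), row_fingerprints(page_b)
jaccard = len(a & b) / len(a | b)
print(jaccard)  # 1.0 -- the pages are identical once row order is ignored

Under that assumption, the two listings above are indistinguishable.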


It looks like such pages aren't penalized by the Panda algorithm.

So how far does Google go to detect such content?

How can we improve the content to make it more valuable? I guess using rel=canonical will not help such pages.


2 Comments


@Kristi941

Yes. This is duplicate content. Here's Google's definition of duplicate content:


Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.


This is the same content, just presented in a different way. Basically, your content just hasn't been caught yet.

In this case, you need to use canonical URLs to point to the "main" page you want indexed. This will not only prevent potential duplicate-content penalties but can also help your SEO. One thing Google does when it finds canonical URLs is:


We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.


This means all of the links pointing to the duplicate content are essentially considered to be pointing to the "main" page. That's obviously great for that page's rankings.
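
As a rough sketch of what that consolidation looks like from the site's side (the URLs below are illustrative, not the actual site's):

# A minimal sketch, assuming a cluster of near-duplicate listing pages.
# Each duplicate declares one representative ("main") URL, and Google
# consolidates link signals onto it. URLs are made up for illustration.

cluster = [
    "https://www.example.com/doctors?top=doctor-1",
    "https://www.example.com/doctors?top=doctor-3",
]
representative = "https://www.example.com/doctors"

for url in cluster:
    # The element each duplicate page would carry in its <head>:
    print(f'{url} -> <link rel="canonical" href="{representative}">')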




@Alves908

This is a good question, and I am not sure anyone really knows the exact limits at which duplication counts as duplicate content. The line is kept intentionally fuzzy to discourage this behavior. But let me try to explain a few things that might help you understand how this works.

With the advent of Google Scholar in 2008, Google realized the power of citations and how they can positively affect its search product. Without getting into too much detail, a citation for Google is any match to original content, a person, a site, a work, etc. that creates a linkage. For example, in 2012, a research paper was written and made available online that had my name as a researcher. I was able to follow the site usage data on this article. Now, please understand that my name does not appear on the web or in my various site registrations, nor do I participate in social media (except this site, starting in December 2013). However, my one site showed an increase in traffic that very closely matched the article's, day by day, for months. From one simple, minor citation, Google instantly knew the link to my primary site. This was done with historic site registration data.

Google does look for unique phrases, content snippets, and other data, and creates linkages between them. A linkage can come from link text, content of any kind, registration information, names, addresses, phone numbers, e-mail addresses (even partial e-mail addresses), and so on. This is the crux of the citation as we know it. Google keeps a database of citations of all kinds to help it better understand site quality, authorship, and other signals by which it can rate content quality and return more satisfying results for its users.

Please understand one thing. Previously, all one had to do to escape a duplicate content issue was to change the content or the formatting enough that Google would not notice. One way to do this was with CSS, which would allow the content to appear the same to you and me but be ordered differently in the HTML code as Googlebot would read it. Google needed a better mechanism.

Part of its duplicate content mechanism is looking for unusually high numbers of linkages between two elements as compared to the norm. For example, it is possible that a PDF file is posted as HTML. This would not necessarily constitute a problem; however, say, 13 citations between two HTML pages might indicate a high concentration of quoting, and more can indicate duplication. What the limits are, and exactly how they are weighed, is kept secret.
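
To make that concrete, here is a hedged sketch of counting shared phrases between two pages against a threshold. The phrase length and the threshold (borrowed from the 13-citation figure above) are assumptions; the real limits and weighting are not public:

# A rough sketch: count distinctive shared phrases ("citations" in the
# sense above) between two pages and flag unusually high overlap. The
# phrase length and threshold are assumptions, not known Google limits.

def phrases(text, n=5):
    """All n-word phrases in the text, as a set."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

THRESHOLD = 13  # the example figure from above, not a known limit

doc_a = "a long paragraph exactly matching the same duplicate on another url"
doc_b = "a long paragraph exactly matching the same text hosted on another url"

shared = phrases(doc_a) & phrases(doc_b)
verdict = "likely duplication" if len(shared) > THRESHOLD else "within quoting range"
print(f"{len(shared)} shared phrases: {verdict}")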

With enough duplicate content, no matter the format or order, Google will notice it. What the result would be depends, I think, on how much of the content is identical from one page to the next. It is not uncommon to cut and paste parts of another's work, primarily as quotes, into your own; this is done quite often, especially in the SEO market. But replicating too much content may affect one or both pages. Again, where the exact limit lies is anyone's guess.

If you are concerned, ask yourself this: how much of my content is a duplication, albeit reordered and possibly formatted differently, of another work? Would that much duplication constitute too many citation links in Google's eyes? Then ask yourself: should I be giving credit for the duplicated content, and how would I do it? The answer to the last question depends, I think, on the content, but options exist that could credit the citation without being a distraction.


