
Will Google penalize me for duplicate content on GitHub from my knowledge base?

@Samaraweera270

Posted in: #DuplicateContent #Github #Google #Indexing #Seo

I want to have a copy of each of our documentation articles in GitHub so that other users can improve, edit and add to the document. The accepted changes will be made live in our knowledge base forum.

Does Google crawl GitHub files and content?
Will I be penalized for duplicate content on GitHub and on my site?

I got the idea from MS Azure documentation. If you scroll down to the end of this page - azure.microsoft.com/en-us/documentation/articles/virtual-machines-set-up-endpoints/ you will see an option to contribute to the article in GitHub.


1 Comment


@Kristi941

In short, no. It is very, very difficult to get actually penalised (sandboxed or deindexed).

Duplicate content may be "devalued" and carry less weight than it would organically, but it is Google's job to identify the original content while devaluing the other copies.

Setting a canonical link on your website tells search engines that this URL is the original source of the content:

<link rel="canonical" href="http://example.com/document">
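
For copies that are not HTML pages (for example, a raw Markdown file served from infrastructure you control), Google also accepts the same hint as an HTTP response header, assuming your server lets you set response headers (the URL here is a placeholder):

Link: <http://example.com/document>; rel="canonical"

Note this only helps for copies you host yourself; you cannot set headers or <link> tags for files hosted on github.com.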

As for external web pages where you cannot control the <link> elements in the page's head, search engines such as Google will have to infer who the original publisher is.

That inference may rest on date of indexing, relevance, page structure and so on. Many GitHub Pages sites have their source code fully available and indexed on GitHub, so it is reasonable to infer that Google can work out the original source based on GitHub's architecture, content patterns and the like.

Syndication is a normal part of the web and Google handles it well. For example, search for a sentence from a Mashable article while excluding Mashable's own domain:

-site:mashable.com/2015/02/05/whatsapp-testing-voice-calling/ "It's not clear when the feature may be rolled out more widely or when the app's iPhone users will be able to use it."

As you can see, there are hundreds of verbatim copies, and it does not harm Mashable as the original publisher at all.

Until something like rel=syndication is fully accepted into the spec, cases like this come down to letting Google do its job; you can only truly control the content of your own website.

As a final aside, you have to understand why duplicate-content penalties exist and whom they target: they were originally formulated to devalue automated content farms and content scrapers/spinners that were intentionally trying to game the system.

This is not you.

With the way Google now indexes the web, it is usually the first-indexed page that gets the value (i.e. the original press release shows up and the 400 clones are omitted).


