
How could Google identify if a video is a duplicate?

@Gonzalez347

Posted in: #DuplicateContent #Google #Seo #Video #Youtube

A few days ago Google's John Mueller said this:


We do try to understand when something is a duplicate and treat it appropriately. So we do that with textual content, webpages for example, we try to recognize if something is a duplicate and filter it out when we show it in search. We do it with images where we can and we do try to do that with video as well.

So if you go and host your video on a number of different services that doesn’t mean it is going to show up 5 times instead of once in the search results.


Link to the source

Question: How could Google identify if a video is really a duplicate?

If I upload the same video to YouTube, Vimeo and Dailymotion, each site will transcode it differently, so the resulting files will have different hashes...




2 Comments


 

@Cofer257

There is a lot more to matching video than just comparing file hashes. Google developed an entire video matching system for YouTube called Content ID, which checks every uploaded video against a library of copyrighted videos.

For a simple explanation, let's start with images. (Google does more than matching hashes there too.) Resizing or cropping an image, even by 1px, produces a different file hash, so many other techniques are employed to determine similarity.
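To see why a file hash is no help here, this is a minimal sketch using Python's standard hashlib. The 4x4 "image" is made-up data (one byte per grayscale pixel), not a real file format; the point is only that changing a single pixel value gives a completely different digest.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical 4x4 grayscale image: 16 pixels, one byte each.
original = bytes([120] * 16)
# Same image with the last pixel one shade brighter.
tweaked = bytes([120] * 15 + [121])

hashes_match = file_hash(original) == file_hash(tweaked)
print(hashes_match)
```

A hash only answers "are these bytes identical?", which is why it cannot say whether two differently encoded files look alike.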

If the images are not the same size, we would resize one to match the other, then compare them pixel by pixel. Most of the pixels will be slightly different, but they will be quite close. So if the average "difference" between the pixels over the entire image is less than some threshold, we treat the images as the same.
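Here is a rough sketch of that idea in plain Python. It is only an illustration of the resize-then-compare approach described above, not how Google does it. The function names, the nearest-neighbour resize and the threshold of 10 are all my own assumptions, and images are simplified to lists of grayscale rows with values 0-255.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0-255


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Resize by nearest-neighbour sampling (crude, but enough for a sketch)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute pixel difference, resizing b to a's size if needed."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        b = resize_nearest(b, len(a[0]), len(a))
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def looks_same(a: Image, b: Image, threshold: float = 10.0) -> bool:
    """Treat two images as duplicates when the average difference is small.

    The threshold is an arbitrary assumption chosen for this example.
    """
    return mean_abs_diff(a, b) < threshold


small = [[100, 100], [200, 200]]
# A larger, slightly noisy copy, like a re-encoded upload.
large_copy = [[101] * 4, [99] * 4, [201] * 4, [199] * 4]
unrelated = [[0, 0], [0, 0]]

print(looks_same(small, large_copy))
print(looks_same(small, unrelated))
```

A real system would need to cope with cropping, colour shifts and watermarks, which a plain pixel average handles poorly.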

Expanding that for videos, we can repeat that process for several frames in a video, as well as snippets of audio.
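As a sketch of the video case, the snippet below samples a few frames at the same relative positions in two videos and counts how many sampled pairs are close. Everything here is an assumption made for illustration: frames are flattened grayscale pixel lists of equal size, the names are invented, and audio is left out entirely.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, values 0-255


def frame_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two same-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sample(frames: List[Frame], count: int) -> List[Frame]:
    """Pick `count` frames evenly spaced by relative position,
    so videos with different frame rates still line up."""
    n = len(frames)
    return [frames[i * n // count] for i in range(count)]


def video_match_ratio(v1: List[Frame], v2: List[Frame],
                      samples: int = 4, threshold: float = 10.0) -> float:
    """Fraction of sampled frame pairs whose difference is under the threshold."""
    pairs = zip(sample(v1, samples), sample(v2, samples))
    hits = sum(1 for a, b in pairs if frame_diff(a, b) < threshold)
    return hits / samples


# Hypothetical 4-frame video, each frame 4 pixels.
video = [[10] * 4, [50] * 4, [90] * 4, [130] * 4]
# The same content at double the frame rate, with slight encoding noise.
reencoded = [[11] * 4, [11] * 4, [51] * 4, [51] * 4,
             [91] * 4, [91] * 4, [131] * 4, [131] * 4]
other = [[200] * 4 for _ in range(4)]

print(video_match_ratio(video, reencoded))
print(video_match_ratio(video, other))
```

Sampling by relative position is what lets the comparison survive a frame-rate change, though it would still be fooled by trimming the start or end of the video.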

Of course, Google's Content ID is far more advanced than my explanation, but hopefully this gives you a basic idea.

Further reading:


YouTube vs Fair Use
YouTube Content ID technology



 

@XinRu657

Try creating different versions of your video, with different sound, frame rate and encoding, for each website you upload to.

Google maps video frames to a timeline, much like how Shazam maps an audio waveform to a timeline (for the whole track or for clips).


