Voice to text duplicate content?

@Martha676

Posted in: #DuplicateContent #Google #GoogleSearch #Seo

Will a direct transcript of a voice-to-text blog post be seen as duplicate content? Do Google's web crawlers see this as a duplicate? Do they have the technology, and do they actually use it, to recognize such possible duplicate content problems?


1 Comment


@Nimeshi995

If you are concerned that an audio file and a transcript of that file would be treated as duplicates, I assure you this is not a consideration. While it is technically possible to transcribe an audio file, it is not a pragmatic exercise for a search engine. Fears that Google is trying to understand everything ignore whether doing so would serve any purpose. It would not. Not in search. Not when the web is viewed as an ontology to be understood. Audio and video do not fit this model at all and may never.

Let me explain.

Prior to Google, search engines were fairly simple text-based search applications. This all changed when Google appeared on the scene, fully intending to be a semantics-based search engine. Semantic analysis has existed for decades, at least since the early 1970s. While the technology was mature at the time, the primary problem was the lack of opportunity to apply it. Simply put, the ontologies, in this case collections of documents, were not significant enough in size and scope to apply the technology except on rare occasions.

Enter the web.

Holy snarky SE posters, Batman! We may actually have an occasion to use semantics. Admittedly, Google applied semantics in a fairly rudimentary way when it started. However, the application of the technology exploded in ways no one could have envisioned.

That said, it is technically possible to extract text from images and videos, and speech from audio. However, since semantic analysis is designed to understand the written word and has been applied so thoroughly to text as it exists on the web, much would have to change.

Also consider that audio, for example, could contain other sounds that make the speech difficult or impossible to extract. As well, with so many recordings, the question becomes: what could be extracted from a recording, and would it be of value?

Let us simplify matters for the moment. Take extracting text from images as an example. Not many images contain text, and from the ones that do, not much text would be extracted. The next question would be: would the text be of value? For image search? Yes. For document search? No. Why? Because of the high level of noise in the data. Even when the text is clear, there would be little to no value in it. As a signal, keeping in mind that analysis does not fare well with missing or nonsensical data, its value would be nil. Applying this further, audio and video have the same problem. They do not fit the model of applying semantic analysis to the text of a large-scale ontology that has been set for decades.
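
To make the noise argument a little more concrete, here is a minimal sketch, assuming text is compared with word shingles and Jaccard similarity. That is a common textbook approach to near-duplicate detection, not a description of what Google actually runs, and the function names and sample strings are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample data, for illustration only.
page = "our spring sale starts monday with free shipping on all orders over fifty dollars"
copy = "our spring sale starts monday with free shipping on all orders over fifty dollars"
image_text = "sale free shipping"  # a few words recovered from a banner image

print(jaccard(shingles(page), shingles(copy)))        # identical text
print(jaccard(shingles(page), shingles(image_text)))  # short, noisy fragment
```

With these sample strings the first comparison prints 1.0 and the second prints 0.0. The fragment shares individual words with the page but none of its word sequences, so it contributes nothing as a duplicate signal. A comparison like this also only works text against text; an audio file that is never turned into indexed text gives it nothing to compare.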

Today, this is not a consideration: audio files are not indexed and therefore cannot create a duplicate content scenario. This may change in the future, of course, but not without significant work. Just because something is technically possible does not mean it is practical or makes sense. Think smell-o-vision for your TV. The technology has existed for decades but makes no sense at this point. (Humor)


