How to publish content from OCR of a poster
I have a website with thousands of posters containing useful information. At the moment, OCR reads the image file during the upload process, and I store the result in a separate database column so that users can run a full-text search on it.
I would like to make this data available to search engines so that my website gains the advantage of original content that others don't have, but the OCR text is unformatted and not presentable to users.
From my research, hiding data from users is not considered good practice.
My question is: what kind of tag is used for this, or how is it done in industries that deal with images, videos, books, and so on, and that have metadata not meant to be displayed?
What we already have:
All the posters have a title and alt text, written carefully by hand by moderators. We also semi-automatically generate structured data using the schema.org Event type for the event on each poster. The OCR output, however, is a large amount of unpredictable, unformatted text that differs for every poster. We do not have the capacity to edit it all, but it is mostly useful data.
2 Comments
If you want search engines to index it, you have to put it on the page where users can see it.
Users can benefit from this text too. They could copy and paste a snippet, or bump up the font size to make the words legible.
It doesn't have to be prominent on the page. It could be in a section below the fold that you need to scroll to see. Just make sure it can be found on the page using the browser's Ctrl+F search.
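As a rough sketch of that layout, the OCR text could sit in its own visible section under the poster. Everything specific here is a placeholder and not taken from the question: the file name, the event title, the `poster-text` id, and the wording of the notice are all hypothetical.

```html
<!-- Hypothetical poster page: file name, title, and id are placeholders -->
<article>
  <h1>Event title written by a moderator</h1>
  <img src="poster.jpg" alt="Description written by a moderator">

  <!-- Plain visible section below the poster. It is ordinary page text,
       so it can be selected, copied, and found with Ctrl+F. -->
  <section id="poster-text">
    <h2>Text on this poster</h2>
    <p>This text was read automatically from the image and may contain errors.</p>
    <!-- pre keeps the line breaks from the OCR output; the text must be
         HTML-escaped (&lt; &gt; &amp;) before it is written into the page -->
    <pre>OCR text from the database column goes here</pre>
  </section>
</article>
```

The short notice above the text sets expectations about its quality, which matters if the raw OCR output is shown unedited.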
If the text isn't good enough to show to users, it isn't good enough for search engines to index. If you need to improve its quality, let users edit it, as on a wiki.
"From my research, hiding data from users is not considered good practice."
That's true.
If you are looking for a way to describe an image in the markup, the alt attribute is what you want.
Here's an example:
<img src="image.jpg" alt="image description" title="image tooltip"/>
Every time you have an image in your text, the alt attribute should describe what’s on the image. Screen readers for the blind and visually impaired will read out this text and thus make your image accessible.
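For a poster, that might look like the following. The file name and the event details are invented purely for illustration; the point is that the alt text summarises what the poster says instead of reproducing the full OCR output.

```html
<!-- Hypothetical poster: file name and event details are made up -->
<img src="poster.jpg"
     alt="Poster for a jazz concert: date, venue, and ticket prices in white text on a dark blue background"
     title="Jazz concert poster">
```

A short summary like this keeps the description usable when a screen reader reads it aloud, whereas a long block of raw OCR text would not be.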