
Do search engines crawl PDFs, and if so, are there any rules to follow when making them?

@RJPawlick198

Posted in: #Pdf #Seo

The website I am working on has a few hundred PDFs in it. I don't think I have ever seen any of them come back in a search, but they are linked to directly from our site. They are also full of keywords because they are product documents.

Is there anything special we need to do to get Google or other search engines to crawl them?

Are there any hard-and-fast rules for making PDFs that help Google like them more? For instance, should I run them through Ghostscript to clean up broken PDF tags that Adobe creates during generation?


3 Comments


@Nimeshi995

Just as making a website standards-compliant can't hurt your SEO, making your PDFs accessible can't hurt either. Adobe's built-in accessibility checker is far from perfect, but fixing the issues it flags will at least get you started.

I probably spend about 5 minutes on each 4- or 5-page, mostly-text PDF we put online. The time goes up roughly in proportion to the number of pages and how complex those pages are.

Assuming you have Adobe Acrobat Pro to do your editing:


Run an Accessibility Full Check (the Quick Check is pretty pointless, in my experience)
Update the meta information in the document properties (keywords, subject, language, etc.)
Make sure tags are added
Make sure the text is tagged as text, images as images, background stuff as background
Tag useless fluff (like decoration or design) as background
Add good alt text to the images
Make sure in the reading order, the text is ordered properly
In the content toolbar, make sure the text isn't duplicated or grossly mistranslated
Use the OCR scanner on scanned pages


For more advanced editing like tables and really oddball Adobe errors, we use a plugin called CommonLook. CommonLook gets the job done, but I hate it almost as much as I hate the Adobe tools.

Get familiar with the Touch Up Reading Order tool, the Tags toolbar, the Reading Order toolbar and the Content toolbar. My job requires fully compliant documents before going out on the web, but anybody could benefit from some simple tagging and document properties.


@Candy875

Google definitely indexes PDF files, and you can restrict a search to PDF files by adding filetype:pdf to your search query.
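To see which of your own PDFs are already indexed, one option is to combine filetype:pdf with Google's site: operator. Here is a minimal sketch that builds such a search URL; the domain example.com and the helper name indexed_pdf_search_url are placeholders of mine, not anything from the site in question.

```python
from urllib.parse import urlencode


def indexed_pdf_search_url(domain: str, extra_terms: str = "") -> str:
    """Build a Google search URL that lists indexed PDFs for one domain."""
    # site: limits results to the domain, filetype:pdf limits them to PDFs
    query = f"site:{domain} filetype:pdf {extra_terms}".strip()
    return "https://www.google.com/search?" + urlencode({"q": query})


# example.com is a placeholder; substitute your own domain
print(indexed_pdf_search_url("example.com"))
```

Open the printed URL in a browser. If nothing comes back, the PDFs are probably not indexed yet, or are being blocked somewhere.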

I would say the main things to do to optimise a PDF so it's easily indexed would be:


Give it a meaningful filename
Complete all the document metadata properties (title, author, keywords, etc.)
Make sure your PDF consists of actual text and not scanned images
Ensure you have good content with correct use of headings, just as you would in an HTML document
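For the filename point, with a few hundred product documents it may be easier to derive names from the document titles than to rename by hand. This is a rough sketch only: the lowercase, hyphen-separated style is a common convention that I am assuming here, not a rule stated by any search engine, and pdf_filename_from_title is a made-up helper name.

```python
import re
import unicodedata


def pdf_filename_from_title(title: str) -> str:
    """Turn a document title into a lowercase, hyphen-separated PDF filename."""
    # Fold accented characters to their closest ASCII equivalents
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Fall back to a generic name if the title had no usable characters
    return f"{slug or 'document'}.pdf"


print(pdf_filename_from_title("Widget 3000: Installation Guide"))
# prints "widget-3000-installation-guide.pdf"
```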


For more tips, read "Optimizing PDF Documents" and "Eleven Tips For Optimizing PDFs For Search Engines".


@Cooney921

I'm not sure about other search engines, but as far as Google is concerned, the main rule is not to exclude them via robots.txt.
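A quick way to check this is to run your PDF URLs through a robots.txt parser. Below is a minimal sketch using Python's standard urllib.robotparser; the robots.txt content and the URLs are invented for illustration. As far as I know, the standard-library parser does plain path-prefix matching and does not understand the * and $ wildcard patterns that Google supports, so treat it as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste in your site's real file instead
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def blocked_pdfs(robots_txt, pdf_urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt forbids user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in pdf_urls if not parser.can_fetch(user_agent, url)]


# Placeholder URLs, not taken from the site in question
urls = [
    "https://www.example.com/docs/widget-manual.pdf",
    "https://www.example.com/private/price-list.pdf",
]
print(blocked_pdfs(ROBOTS_TXT, urls))
# prints ['https://www.example.com/private/price-list.pdf']
```

Any URL in the returned list is one that crawlers honouring robots.txt will skip, so it cannot show up in search results through crawling.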

This was their initial announcement of supporting PDF search.
