Best workflow for PDF to ePub processing and editing (large project)
Colleagues have been asked to convert a large PDF archive (about 150,000 items) to ePub and Mobi (keeping the PDFs too, of course).
I ran a few newish ones through Calibre to find the pain points: TOCs need deletion and recreation, tables dissolve into lines of text, and extra line spaces become flush-left periods. And of course the older PDFs are images, so there's no point in converting them. Might as well retype...
If they still have to figure out how to do this after reporting all of these issues, I wonder whether there are any known ways to automate some of the tasks involved. If each document needs individual editing, the project looks to me like it will take years. Anyone interested?
BTW, I'm the only designer in the group; the others are archivists and data processors. So only I use InDesign (OS X); they use Windows 7 (I believe) and MS Office. No XML pros either, as far as I know.
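As a rough illustration of the kind of automation asked about here, this is a minimal sketch for one pain point: the stray flush-left periods. It assumes they appear in Calibre's EPUB output as paragraphs whose only content is a period; that pattern is a guess, so inspect a converted file and adjust the regex before relying on it. The script rewrites each (X)HTML file inside the EPUB and leaves everything else untouched. The file name `clean_periods.py` and the function names are hypothetical.

```python
import re
import sys
import zipfile
from pathlib import Path

# Assumed pattern: a <p> element (any attributes) whose only content is a single
# period, optionally padded with whitespace or non-breaking spaces. This is a
# guess at what Calibre emits; check a converted file and adjust before use.
LONE_PERIOD_P = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|&#160;)*\.(?:\s|&nbsp;|&#160;)*</p>\s*",
    re.IGNORECASE,
)

HTML_EXTENSIONS = (".xhtml", ".html", ".htm")


def strip_lone_periods(xhtml: str) -> str:
    """Remove paragraphs that contain nothing but a period."""
    return LONE_PERIOD_P.sub("", xhtml)


def clean_epub(src: Path, dst: Path) -> int:
    """Copy an EPUB to dst, cleaning every (X)HTML file; return how many changed."""
    changed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # original order keeps "mimetype" first
            data = zin.read(info.filename)
            if info.filename.lower().endswith(HTML_EXTENSIONS):
                text = data.decode("utf-8")  # assumes UTF-8 content files
                cleaned = strip_lone_periods(text)
                if cleaned != text:
                    changed += 1
                    data = cleaned.encode("utf-8")
            # Keep each entry's compression method, so "mimetype" stays stored
            # (uncompressed) as the EPUB container format requires.
            out_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out_info.compress_type = info.compress_type
            zout.writestr(out_info, data)
    return changed


if __name__ == "__main__":
    # Usage: python clean_periods.py input.epub output.epub
    count = clean_epub(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"cleaned {count} file(s)")
```

Calibre's conversion settings also include a regex search-and-replace step, which may be a simpler place to put the same pattern if it proves reliable; a separate script like this one has the advantage of running over files that were already converted.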
I actually found an answer to the question not long ago that hasn't been mentioned yet, so I'm answering my own question. It's a simple program that reworks a large PDF for mobile/iPad screen sizes and outputs a new PDF at the dimensions you choose. It works for scanned PDFs as well.
www.willus.com/k2pdfopt/
Quote from the forum where I heard about it: "It reworks PDFs to a format that is Kindle-friendly (optimized for a 6in wide screen, with an option for 4in wide phone screen). The files remain PDFs. See the attached files for comparison (the one with _k2opt is the revised PDF). It even works on scanned PDFs, as you will see. On top of all that, it will convert a whole folder of PDFs at a time."
Using this, we could prepare the needed formats with less effort than with any other method. We haven't committed to doing this; however, we now have a way to do it in far less than the 10 years I estimated it might take one person.
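With an archive of 150,000 items spread over many folders, a run will almost certainly be interrupted at some point, so it helps to be able to restart without redoing finished files. The sketch below walks a folder tree and runs k2pdfopt only on PDFs that don't yet have a `_k2opt` companion. It assumes k2pdfopt writes `<name>_k2opt.pdf` next to the input (the naming mentioned in the quote above) and that `-ui-` (no interactive menu) and `-x` (exit when done) are the right flags for unattended use; both are assumptions to verify against your k2pdfopt version's usage text. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumed flags: "-ui-" (no interactive menu) and "-x" (exit when done).
# Check them against your k2pdfopt version's usage text before a big run.
K2PDFOPT_FLAGS = ["-ui-", "-x"]
SUFFIX = "_k2opt"  # naming from the forum quote: <name>_k2opt.pdf


def expected_output(pdf: Path) -> Path:
    """Where we assume k2pdfopt writes its result (same folder, _k2opt suffix)."""
    return pdf.with_name(pdf.stem + SUFFIX + pdf.suffix)


def pending_pdfs(root: Path) -> list[Path]:
    """All source PDFs under root that do not yet have a _k2opt companion."""
    todo = []
    for pdf in sorted(root.rglob("*")):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue  # skip folders and non-PDF files
        if pdf.stem.endswith(SUFFIX):
            continue  # skip files k2pdfopt itself produced
        if not expected_output(pdf).exists():
            todo.append(pdf)
    return todo


def build_command(pdf: Path) -> list[str]:
    """Command line for one file; flags are the assumptions noted above."""
    return ["k2pdfopt", *K2PDFOPT_FLAGS, str(pdf)]


def run_batch(root: Path) -> list[Path]:
    """Run k2pdfopt on every pending PDF; return the files that failed."""
    failed = []
    for pdf in pending_pdfs(root):
        result = subprocess.run(build_command(pdf), capture_output=True)
        if result.returncode != 0:
            failed.append(pdf)
    return failed


if __name__ == "__main__":
    for bad in run_batch(Path(sys.argv[1])):
        print("failed:", bad)
```

Collecting failures instead of stopping at the first one means a single damaged PDF doesn't halt an overnight run; the list can be reviewed by hand afterwards.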
Regarding the older PDFs that are images, you should have a look at this blog entry, which might be of some help.
Calibre will not convert scanned PDFs into readable ebooks.
You should use a program like ABBYY FineReader to convert the scanned PDFs to .doc files. Then you can import them into Adobe InDesign, arrange them, and convert them to EPUB.
From what I've read, PDF is just not a good source format for conversion to .epub and .mobi (I think the Calibre documentation confirms this), and the one time I tried it, it was a nightmare. I'm not a power user, but from what little I've learned, this project would be prohibitively expensive and still yield mediocre results. To confirm, disprove, or research this further, I'd recommend the MobileRead forums. The creator of Calibre contributes there quite often and may have already addressed this.
Does the project really need .epub and .mobi files? Who's going to access this archive that couldn't read, say, an optimized, best-possible-quality, flowable, accessibility-standards-meeting PDF?
If you absolutely can't get out of it, some suggestions:
Concentrate on producing .epub files: you know that once you have those, you can make good .mobis through Calibre.
There are software packages on offer (a quick Internet search will turn up examples) that claim to convert PDF to ebook formats; for a project this size, it might be worth buying one or two and trying them out.
And you probably already know this, but you could run OCR on the older PDFs to make the text recognizable.
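The EPUB-to-Mobi step mentioned above is the easiest part to batch. A minimal sketch, assuming Calibre's `ebook-convert` command-line tool is installed and on the PATH (it picks the output format from the output file's extension): it pairs every .epub under a folder with a .mobi beside it, skips pairs that already exist, and reports failures. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def mobi_jobs(root: Path) -> list[tuple[Path, Path]]:
    """Pair each .epub under root with a .mobi path next to it, skipping done ones."""
    jobs = []
    for epub in sorted(root.rglob("*.epub")):
        mobi = epub.with_suffix(".mobi")
        if not mobi.exists():
            jobs.append((epub, mobi))
    return jobs


def convert_all(root: Path) -> list[Path]:
    """Run Calibre's ebook-convert for each pending EPUB; return failures."""
    failed = []
    for epub, mobi in mobi_jobs(root):
        # ebook-convert infers the output format from the ".mobi" extension.
        result = subprocess.run(
            ["ebook-convert", str(epub), str(mobi)], capture_output=True
        )
        if result.returncode != 0:
            failed.append(epub)
    return failed


if __name__ == "__main__":
    for bad in convert_all(Path(sys.argv[1])):
        print("failed:", bad)
```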
Again, not a power user, so don't give up just because I said it can't be done.
Amanda