
Is there a way to save an MS Word document as HTML without the MS proprietary stuff?

@Rambettina238

Posted in: #Html #Markup #Microsoft #Xml

Normally I wouldn't use this feature ("Save as Web Page"), but I have large documents from clients that they just want put on their site as HTML, and formatting it all by hand seems like a waste of time.

I have tried "save as webpage" in Word 2007, but it produces all sorts of bad stuff. To wit:

<b style='mso-bidi-font-weight:normal'>
<span style="mso-spacerun: yes">


as well as a large block of XML formatting info:

<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Subject> </o:Subject>
<o:Author> </o:Author>
<o:Keywords> </o:Keywords>
...


As I said, formatting it all by hand seems like a waste of time, but the HTML that Word currently exports has too much cruft. Is there a way to export an MS Word document as HTML without all this?

EDIT: This document is a charter/bylaws type of document and therefore has many levels of nested lists. One of my criteria for "success" in this conversion is that the list hierarchy is retained, not discarded.
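One way to check the list-hierarchy criterion on any converter's output is to measure how deeply the ul/ol elements are nested. The following is a minimal sketch using only Python's standard library; the names ListDepthParser and max_list_depth are made up for illustration. If a converter flattens the lists into plain paragraphs, the reported depth drops to 0.

```python
from html.parser import HTMLParser


class ListDepthParser(HTMLParser):
    """Tracks how deeply <ul>/<ol> elements are nested (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in ("ul", "ol"):
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.depth > 0:
            self.depth -= 1


def max_list_depth(html):
    """Return the deepest level of list nesting found in the HTML string."""
    parser = ListDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth
```

Running max_list_depth on the exported file and comparing the result with the number of list levels in the original Word document gives a quick, if rough, indication of whether the hierarchy survived.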


5 Comments


 

@Murray155

I know this is three years old, but I came across it while looking for the same answer today. In Office 2010, at least, there is an option to save as "filtered HTML" without the extra Microsoft code:

About using filtered HTML


When you save Web pages or send e-mail messages in HTML format with
Microsoft Word, additional tags are added so that you can continue to
use the full functionality of Word to edit your content.

To reduce the size of Web pages and e-mail messages in HTML format,
you can save them in filtered HTML so that the tags used by Microsoft
Office programs are removed.

This feature is only recommended for experienced Web authors, who are
concerned with the tags that appear in their HTML files.

If you reopen a Web page in Word that you saved in filtered HTML, your
text and general appearance are preserved, but you may not be able to
use certain Word features in the usual way to edit your files. For
example, the appearance of bulleted or numbered lists is preserved;
however, some of the Word functionality associated with lists will not
be preserved.

When possible, you should only save a Web page in filtered HTML when
you are finished editing the page in Word. However, if the underlying
HTML of your Web pages is not important to you, you should save your
files as a standard Web page.

If you will need to edit the file later, you can maintain two files:
one in Word format and one in filtered HTML format. You can edit the
content in the Word document, save it in Word format for future
editing, and then save a copy in filtered HTML format.



 

@Ann8826881

Try saving the Word document as RTF, then exporting that to HTML. Hopefully the RTF document won't contain all of the complexity of the Word document, which should lead to simpler HTML.



 

@Merenda212

It has been a while since I've done this, but I believe that Google Docs' export to HTML works better than MS Word's. I also believe that Google Docs can read Word documents, so you might be able to load the document into Google Docs and export it that way.



 

@Angie530

You can always use another application as an intermediary, like LibreOffice, and use it to save it as an HTML document.

LibreOffice (a fork of OpenOffice, which is still available if you prefer it) generates much cleaner code by comparison.
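For a batch of large documents, the same conversion can be scripted instead of done through the menus. This is a minimal sketch, assuming the LibreOffice command-line binary is installed and reachable as soffice on the PATH (the binary name and location vary by platform); the function names build_convert_command and convert_to_html are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, binary="soffice"):
    """Build the headless LibreOffice command that converts one file to HTML."""
    return [
        binary,
        "--headless",            # run without opening a window
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_html(source, out_dir, binary="soffice"):
    """Run the conversion and return the path where the HTML is expected."""
    source, out_dir = Path(source), Path(out_dir)
    if shutil.which(binary) is None:
        # Assumption: the binary is on PATH; adjust `binary` if it is not.
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    subprocess.run(build_convert_command(source, out_dir, binary), check=True)
    return out_dir / (source.stem + ".html")
```

For example, convert_to_html("bylaws.docx", "out") would be expected to leave out/bylaws.html behind; the file name bylaws.docx is only a placeholder. The output may still need cleaning, so it is worth inspecting before publishing.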



 

@Candy875

There are some good answers in this question: "What is the best free way to clean up Word HTML?", with HTML Tidy coming out on top.
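If installing a cleanup tool is not an option, the specific cruft shown in the question (conditional comments, Office-namespaced tags, and mso-* style declarations) can be removed with a few regular expressions. This is only a rough sketch using Python's standard library, not a substitute for a real HTML cleaner: the function name strip_word_markup is made up, and the patterns are blunt enough that they would also remove text in the body that happens to look like an mso-* declaration.

```python
import re

# Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# Opening and closing tags in Office namespaces, e.g. <o:p> and </o:p>
OFFICE_TAG = re.compile(r"</?(?:o|w|m|v):[^>]*>", re.IGNORECASE)
# A single mso-* declaration, e.g. mso-bidi-font-weight:normal
MSO_DECLARATION = re.compile(r"\s*mso-[a-z-]+\s*:[^;\"']*;?", re.IGNORECASE)
# style attributes left empty once the mso-* declarations are gone
EMPTY_STYLE = re.compile(r"\s+style=(['\"])\s*\1", re.IGNORECASE)


def strip_word_markup(html):
    """Remove the most common Word-specific markup from an HTML string."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)
    html = MSO_DECLARATION.sub("", html)
    html = EMPTY_STYLE.sub("", html)
    return html
```

Applied to the first snippet from the question, strip_word_markup("<b style='mso-bidi-font-weight:normal'>") returns "<b>". List structure is not touched by this pass, so it neither repairs nor damages the hierarchy.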


