
Serve up syntactic XHTML5 using the text/html MIME type?

@Ogunnowo487

Posted in: #Html5 #Xhtml #Xml

I have a site currently written with HTML5 tags. I'd like to be able to parse the site as XML, with support for namespaces, etc, to facilitate programmatic extraction of data.

Currently I have <!DOCTYPE html> and

<meta charset="utf-8">


which I gather is equivalent in HTML5 to explicitly setting the content type as

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />


for my current setup. In order to serve XML it sounds like the right thing to do is

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">


Should I also change my Content-Type to

<meta http-equiv="content-type" content="application/xhtml+xml; charset=iso-8859-1" />


Or is that not necessary? What is the advantage of having the content type be "application/xhtml+xml"? What is the disadvantage? (It sounds like it may break Internet Explorer's rendering of the site, but maybe that information is out of date now?)

Many thanks!


2 Comments


@Lengel546

It sounds like you're trying to create a Polyglot Document (and I sound like Clippy!) Essentially, that's an HTML5 document which is also valid XML.

Basically, you just need to carry on as normal, writing valid HTML5. You will need to self-close any void elements (e.g. <br> becomes <br />; the same goes for img, source, hr, etc.) and make sure all attribute values are quoted (e.g. class="foo", not class=foo).
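
As a rough sketch of what that buys you, here is a small polyglot page being read by an ordinary XML parser. It assumes Python's standard-library xml.etree.ElementTree as the consumer; the page content, the "price" class and the extract_prices helper are all made up for illustration. Note that it only works while the page avoids things XML rejects, such as named entities like &nbsp;.

```python
import xml.etree.ElementTree as ET

# Hypothetical polyglot page: HTML5 doctype, XHTML namespace on the root,
# void elements self-closed, attribute values quoted.
POLYGLOT_PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Example</title>
  </head>
  <body>
    <p class="price">42</p>
    <br />
    <img src="logo.png" alt="Logo" />
  </body>
</html>"""

# Prefix-to-URI map used in the path queries below; the prefix "x" is arbitrary.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_prices(markup):
    """Parse the markup as XML and return the text of every <p class="price">."""
    root = ET.fromstring(markup)
    return [p.text for p in root.iterfind(".//x:p[@class='price']", XHTML_NS)]


if __name__ == "__main__":
    print(extract_prices(POLYGLOT_PAGE))
```

Because the root element declares the XHTML namespace, every element has to be looked up with that namespace, which is why the query uses the x: prefix.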

Have a read of these:

dev.w3.org/html5/html-xhtml-author-guide/
stackoverflow.com/questions/3106699/should-i-write-polyglot-html5-documents
philarcher.org/diary/2011/polyglot/
http://blog.whatwg.org/xhtml5-in-a-nutshell
wiki.whatwg.org/wiki/HTML_vs._XHTML

(some of those might be slightly out of date)

Other notes:


The XHTML 1.0 Strict Doctype you mentioned in your question is not XHTML5. It's XHTML 1.0. Use <!DOCTYPE html> instead.
Serving up pages as application/xhtml+xml will stop them working at all in IE8 and below. Additionally, if there are any well-formedness errors at all in your markup (an unclosed tag, an unquoted attribute), browsers that do support application/xhtml+xml will show a parse error instead of rendering the page!
You're setting charset=UTF-8 in your HTML5 example and charset=iso-8859-1 in your XHTML. They're different things. If you don't understand where and when to use them, just use UTF-8 everywhere.
Using the XML prolog (<?xml version="1.0" encoding="UTF-8"?>) drops older versions of IE (notably IE6) into "Quirks mode", which is a legacy rendering mode and to be avoided.
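
Since a single well-formedness error is fatal under application/xhtml+xml, it may be worth checking pages before they go out. A minimal sketch, again assuming Python's xml.etree.ElementTree (is_well_formed and the two sample strings are hypothetical, and this parser is not the one browsers use, though both reject markup that is not well-formed XML):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Same document twice; only the <br> differs.
GOOD = ('<!DOCTYPE html>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"><body><br /></body></html>')
BAD = ('<!DOCTYPE html>\n'
       '<html xmlns="http://www.w3.org/1999/xhtml"><body><br></body></html>')

if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(BAD))
```

The second string is perfectly fine as text/html, but the unclosed <br> makes it fail as XML, which is exactly the case that would blank the page for visitors.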


@Courtney195

Not to be off-topic, but is this a dynamically generated site? If so, why do you want people to scrape data from your markup, rather than returning a format that is better suited to software clients?

It just sounds to me like the same purpose could be served by serving JSON or actual XML or RSS to clients requesting those feeds, and then you don't have to worry about whether you're sending down XHTML.

XHTML is likely to be much less efficient than a dedicated feed, because unless you're returning really raw HTML markup without styles or any kind of interactive manipulation, you're going to be sending out a huge amount of junk that a parser is just going to discard anyhow.

Most dynamic frameworks provide easy tools for generating JSON or XML directly from data sources, so I'd argue in favor of providing efficient feeds that can be consumed without having to parse through all the extras that are needed in most modern HTML markup.
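
To give an idea of how little is involved, here is a minimal sketch using only Python's standard-library json module. The record fields, the "items" wrapper and the build_feed name are assumptions for illustration, not anything from your site or a particular framework.

```python
import json

# Hypothetical records, standing in for rows pulled from the site's data source.
RECORDS = [
    {"id": 1, "title": "First post"},
    {"id": 2, "title": "Second post"},
]


def build_feed(records):
    """Serialize the records as a JSON document with a single "items" list."""
    return json.dumps({"items": records}, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(build_feed(RECORDS))
```

A client then gets only the data, with no layout markup to wade through.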

Now, if this is not a dynamically generated site, I guess you might have a little more of a use-case, but then I'd be questioning why a static site would ever need to be used as a feed to a screen-scraper.

Sorry, just trying to get a better idea of the problem you're trying to solve, as there are a lot of ways to skin this cat, and an efficient solution may be presented with a little more detail.
