Sadly, many people use tools like Microsoft Word or OpenOffice.org to maintain large structured documents. If you’ve ever used a tool like this to maintain a large document, you’ve probably already grown to hate it. Tools like LaTeX and DocBook make handling structured documents much more bearable. I’ve come up with some simple code that allows me to integrate DocBook documents into my website.
LaTeX was my first experience with a tool designed for structured documentation. It’s a form of markup that, when passed through a processor, creates output in whatever format you’d like (text, HTML, PDF, etc.). LaTeX is widely used because it makes entering complex mathematical formulas easy and produces high-quality output. I first used LaTeX in college because a professor recommended it for a report. It didn’t take long before I developed a dislike for it. What annoyed me most was that you could not have arbitrarily nested sub-sections in your document. Having only a handful of nesting levels seemed like a needless restriction.
DocBook is a lot like LaTeX in that it’s a form of markup for structured documents. The latest versions of DocBook use XML. Generally I feel that XML is bloated and a bad idea. DocBook doesn’t change my mind, but it is a powerful standard that can be used to maintain structured documents. Maintaining an article, a book, or just a report is relatively easy with DocBook. My first serious experience with DocBook was for my master’s project report. After writing such a long report I was still reasonably happy with DocBook (that’s saying a lot!).
I wanted to write documents using DocBook and display them on my WordPress-driven website. It’s easy to generate HTML or XHTML from DocBook source, but that produces a standalone document. I wanted to integrate it right into my site (with all the goodness of WordPress themes). My first plan of attack was to write my own stylesheet that would strip out the header and footer tags as well as the XML declaration and DOCTYPE. This mostly worked, except that xsltproc (correctly) insists on emitting a DOCTYPE when the output type is set to XML. There didn’t seem to be a good way around this, so I went for another plan of attack.
My next idea was to use PHP to parse the XHTML (since it is XML) and strip out the elements that I couldn’t have in the output. This technique works, but it comes at a price: the server does a lot of extra work to display a document. My code uses a SAX parsing technique; DOM would be even worse.
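For illustration, here is a minimal sketch of what such a SAX pass might look like with PHP's built-in XML parser functions. It is not the actual xmlInclude code: the function name sketchExtractBody, the constant SKETCH_VOID_ELEMENTS and its short list of elements are all made up for this example. The idea is simply to copy everything inside the body element to the output and ignore everything else, so the XML declaration, the DOCTYPE and the html, head and body wrappers never appear.

```php
<?php
// Hypothetical sketch; these names are not taken from the original xmlInclude code.

// Elements written back in self-closing form (an assumed, partial list).
const SKETCH_VOID_ELEMENTS = ['br', 'hr', 'img', 'input'];

/**
 * Re-serialize only the markup found inside <body> of an XHTML string,
 * using SAX-style callbacks. Anything outside <body> is dropped.
 */
function sketchExtractBody(string $xhtml): string
{
    $out = '';
    $inBody = false;

    $parser = xml_parser_create('UTF-8');
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Opening tag: switch copying on at <body>, otherwise echo the tag.
    $start = function ($p, $name, $attrs) use (&$out, &$inBody) {
        if ($name === 'body') {
            $inBody = true;
            return;
        }
        if (!$inBody) {
            return;
        }
        $out .= '<' . $name;
        foreach ($attrs as $key => $value) {
            $out .= ' ' . $key . '="' . htmlspecialchars($value, ENT_QUOTES) . '"';
        }
        $out .= in_array($name, SKETCH_VOID_ELEMENTS, true) ? ' />' : '>';
    };

    // Closing tag: switch copying off at </body>, otherwise echo the tag.
    $end = function ($p, $name) use (&$out, &$inBody) {
        if ($name === 'body') {
            $inBody = false;
            return;
        }
        if ($inBody && !in_array($name, SKETCH_VOID_ELEMENTS, true)) {
            $out .= '</' . $name . '>';
        }
    };

    // Text arrives already decoded, so escape it again before output.
    $text = function ($p, $data) use (&$out, &$inBody) {
        if ($inBody) {
            $out .= htmlspecialchars($data, ENT_NOQUOTES);
        }
    };

    xml_set_element_handler($parser, $start, $end);
    xml_set_character_data_handler($parser, $text);

    if (!xml_parse($parser, $xhtml, true)) {
        $message = xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException('XML parse error: ' . $message);
    }

    return $out;
}
```

A page would then do something like echo sketchExtractBody(file_get_contents('foo.xhtml')); between the theme header and footer. The whole document is parsed again on every request, which is the extra server work mentioned above.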
To include any XML document in my WordPress-based website I use the following code:
<?php
unset($_SERVER['PATH_INFO']);
// Include current WordPress theme header etc.
require('./wp-blog-header.php');
get_header();
include('gjr-wp-include/xmlInclude.inc');
gjrXmlInclude('foo.xhtml');
get_footer();
?>
You can view the code for xmlInclude.php.