JNRowe

website-xml

website-xml is a CMS

Introduction

Since the sale of E-fort, I’ve been slowly designing and implementing a replacement content management system, mainly for my own personal use. It has given me the chance to play with some of the more interesting technologies available for putting together a website, including XML, XPath and XSLT.

The example website, which you are reading now, is really just a proof of concept design enabling me to test features and discover issues relating to display.

At some point in the future website-xml will have a public release. Until that time comes you can check the archive out from the AST dforce development archives if you have access. Currently there are approximately 10 items on the to-do list which I feel need to be completed before an initial public release would be useful.

An example page

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page PUBLIC "-//JNROWE/DTD website-xml 3.0//EN"
    "http://fserv.astdev/website-xml/4.0/page.dtd">
<!-- A global server has not yet been decided on -->
<article section="projects" name="test" keywords="test page, website-xml">
    <title>Test page</title>
    <author title="AST">
        <mail link="website@jnrowe.ukfsn.org">James Rowe</mail>
        <!--
            <position>AST Development Engineer</position>
        -->
    </author>

    <summary>
        This is for the page abstract.
    </summary>

    <!--
        If you use dforce it is possible to leave the revision and date tags
        out, and they will be filled in automatically using website repo
        information.
        <revision />
        <date />
        There is an example dforce hook available for formatting revision
        history for website-xml.
        <revhis>
            All tags available to chapter are available in revhis.
        </revhis>
    -->

    <chapter>
        <title>Note</title>
        <body>
            <para>
                A paragraph.
            </para>

<!--
website-xml defaults to using the build variables for the location of
links and images that have been specified with the 'l' prefix, for example:
<limg> for local images
<lref> for local links.
-->

            <limg link="dforce.png" desc="dforce logo" />

        </body>
    </chapter>
</article>

Note

The dforce integration is currently only available to users of dforce version 2.1 and above.
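
As a rough illustration of how the page format above can be consumed, the following sketch reads the main fields from a trimmed copy of the example page using Python's standard library. This is not how website-xml itself processes pages (that is done with XSLT); the function name describe_page and the choice of fields are assumptions made for the example, and the DOCTYPE is left out so that no DTD lookup is involved.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example page (DOCTYPE and comments omitted).
PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<article section="projects" name="test" keywords="test page, website-xml">
    <title>Test page</title>
    <author title="AST">
        <mail link="website@jnrowe.ukfsn.org">James Rowe</mail>
    </author>
    <summary>
        This is for the page abstract.
    </summary>
    <chapter>
        <title>Note</title>
        <body>
            <para>
                A paragraph.
            </para>
            <limg link="dforce.png" desc="dforce logo" />
        </body>
    </chapter>
</article>
"""


def describe_page(xml_bytes):
    """Return the headline fields of a page (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    keywords = [k.strip() for k in root.get("keywords", "").split(",")]
    return {
        "section": root.get("section"),
        "name": root.get("name"),
        "keywords": keywords,
        "title": root.findtext("title"),
        "author": root.findtext("author/mail"),
        "summary": (root.findtext("summary") or "").strip(),
        # <limg> elements refer to local images.
        "local_images": [img.get("link") for img in root.iter("limg")],
    }


if __name__ == "__main__":
    for key, value in describe_page(PAGE.encode("utf-8")).items():
        print(f"{key}: {value}")
```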

Dynamic vs static

Initially designed with static content generation in mind, website-xml has recently been updated to make dynamic usage possible.

The main features of website-xml when used to generate dynamic content are its support for Client Designed Output and On-demand Client Provision.

Client Designed Output

Client Designed Output (CDO) lets a user define a layout once and have every website-xml powered dynamic site use it when displaying content to that user.

This feature makes it very easy for users to set their own layout styles, compressed file type choices, etc., and have them available everywhere website-xml is used.

This feature requires cookies and a central server. When users design their interface or choose their options, they upload a small chunk of data to the central server, which issues an OSF DCE compatible Universally Unique IDentifier (UUID) for the user; the UUID is stored in a cookie.

Note

If users do not have third party cookies enabled, for whatever reason, the worst case scenario is the same behaviour they had before - whatever the site designer chose for them.

When the user then visits another website-xml powered site the cookie is checked, the corresponding design information is queried from the central server, and data is returned to the client modelled on their previously defined settings.

Note

There is no specific requirement to use a single central server to host this information, however it is the most applicable method at this stage of development. It is just as simple to receive the information from another source if the need arises.
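
The CDO flow described in this section can be sketched roughly as follows. This is only an in-memory model under stated assumptions: the CentralServer class, the cookie name cdo_uuid and the settings keys are hypothetical, and uuid.uuid4() merely stands in for whatever UUID generation the real server uses.

```python
import uuid

# What the site designer chose; used whenever no client design is found.
SITE_DEFAULT = {"layout": "default", "archive_format": "tar.gz"}


class CentralServer:
    """Hypothetical stand-in for the central settings server."""

    def __init__(self):
        self._settings = {}

    def register(self, settings):
        """Store a client's design and return the UUID for their cookie."""
        user_id = str(uuid.uuid4())
        self._settings[user_id] = dict(settings)
        return user_id

    def lookup(self, user_id):
        """Return the stored design, or None for an unknown UUID."""
        return self._settings.get(user_id)


def settings_for_request(cookies, server, site_default=SITE_DEFAULT):
    """Merge a client's stored design over the site default.

    A missing cookie or an unknown UUID falls back to the site
    designer's choices, which is the worst case described above.
    """
    user_id = cookies.get("cdo_uuid")  # hypothetical cookie name
    stored = server.lookup(user_id) if user_id else None
    merged = dict(site_default)
    if stored:
        merged.update(stored)
    return merged


if __name__ == "__main__":
    server = CentralServer()
    cookie_value = server.register({"layout": "wide"})
    print(settings_for_request({"cdo_uuid": cookie_value}, server))
    print(settings_for_request({}, server))
```

Because the lookup is hidden behind a single method, swapping the central server for another source of settings would only mean passing a different object with the same lookup method.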

On-demand Client Provision

On-demand Client Provision (OCP) is a method for generating files dynamically. It is especially useful for distributing files in many different formats of widely varying popularity.

This method was originally put in place for the static website-xml code to allow for simple automated generation of patches, compressed tarballs and zip archives.

To take a simple example, imagine hosting website-xml itself. Instead of maintaining packages as gzip compressed tarballs, bzip2 compressed tarballs, PK compatible ZIP archives and possibly even rzip compressed tarballs to please users, you host a single file. If and when a user requests the file in a particular format, it is generated automatically. This makes it incredibly easy to support many different file formats, and formats that nobody requests take up no space on the server.

Note

Normally it is recommended to cache all generated files to improve performance. Even with caching you benefit from zero-maintenance support for many different file types.
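
A minimal sketch of the idea, assuming a single hosted file held in memory: each format is produced the first time it is requested and cached afterwards. The OnDemandStore class and the format keys are hypothetical, rzip is left out because Python's standard library has no support for it, and the zip encoder simply wraps the hosted file rather than repacking the tarball's members.

```python
import bz2
import gzip
import io
import zipfile


def _to_zip(data, name):
    """Wrap the hosted file in a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()


# One encoder per supported format; adding a format means adding an entry.
ENCODERS = {
    "gz": lambda data, name: gzip.compress(data),
    "bz2": lambda data, name: bz2.compress(data),
    "zip": _to_zip,
}


class OnDemandStore:
    """Hypothetical store that hosts one file and derives the rest."""

    def __init__(self, name, data):
        self.name = name
        self.data = data
        self.generated = 0  # how many times an encoder actually ran
        self._cache = {}

    def fetch(self, fmt):
        """Return the file in the requested format, generating it once."""
        if fmt not in ENCODERS:
            raise ValueError(f"unsupported format: {fmt}")
        if fmt not in self._cache:
            self._cache[fmt] = ENCODERS[fmt](self.data, self.name)
            self.generated += 1
        return self._cache[fmt]


if __name__ == "__main__":
    store = OnDemandStore("my_app-0.1.0.tar", b"example payload " * 64)
    store.fetch("gz")
    store.fetch("gz")  # served from the cache
    print("encoder runs:", store.generated)
```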

This method currently also supports generation of possibly compressed patches between versions, either immediate releases or across user chosen versions.
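
To illustrate patches between adjacent releases or across user-chosen versions, here is a small sketch that uses Python's difflib on a single text file per version. It is an assumption-laden simplification: the real mechanism works on release archives, and the patch_between helper and the README file name are made up for the example.

```python
import difflib


def patch_between(versions, old, new, filename="README"):
    """Return a unified diff taking `filename` from version old to new.

    `versions` maps a version string to that version's file content.
    """
    old_lines = versions[old].splitlines(keepends=True)
    new_lines = versions[new].splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"{filename}-{old}",
        tofile=f"{filename}-{new}",
    )
    return "".join(diff)


if __name__ == "__main__":
    versions = {
        "0.1.0": "hello\n",
        "0.1.1": "hello\nworld\n",
        "0.2.0": "hello\nworld!\n",
    }
    # Adjacent releases, then a user-chosen pair further apart.
    print(patch_between(versions, "0.1.0", "0.1.1"))
    print(patch_between(versions, "0.1.0", "0.2.0"))
```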

One of the other uses OCP has been put to in recent website-xml versions is for creating rasterised versions of SVG and MathML for browsers that are incapable of rendering them.

Why GNU Autotools?

At the outset I needed a fast way to produce the data for the website, and autotools provided that, and in a standard manner too.

The choice of autoconf was simple, as I knew I was going to need an easy way to produce different versions of the site. One version contains all the pages and has no real bandwidth restrictions, for hosting on the intranet at work. The other is a smaller subset of pages that are not embargoed in any way, which I can host on the UKFSN server.

Note

The reason for the mass culling of pages on UKFSN is to ease maintenance for me. I could in theory deselect embargoed pages, but I prefer to manually select “free” pages as it reduces the chance of me publishing a document I shouldn’t [again].

The choice of autoheader is always a simple one: there is no point in maintaining your own collection of pre-processor directives when the information is available in configure.ac.

As for automake, I’ve never understood why people produce their own low-grade Makefile.in when they are using autoconf. The automake maintainers have spent an enormous amount of time and effort on producing a system that is incredibly portable and featureful. Plus, most users consider the automake targets to be almost a default.

Remote file system support

Since version 1.1.0 website-xml has supported direct output to remote file systems thanks to the AST ninstall automake feature.

Using ninstall it is possible to select a network URI as an installation prefix, for example ./configure --prefix=ftp://user:password@host/directory or ./configure --prefix=rsync://user:password@host/directory (you can escape the : and @ characters if they appear in either the username or password by using a ). Once a site is built and you run the install target, it will automatically install to the network address you provided.
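
To make the shape of such a prefix concrete, this sketch splits the example URIs into their parts with Python's urllib.parse. It only illustrates the URI layout; it says nothing about how ninstall parses prefixes or handles escaping, and the parse_prefix helper is hypothetical.

```python
from urllib.parse import urlsplit


def parse_prefix(prefix):
    """Split an installation prefix into its components.

    A prefix without a scheme and host is treated as an ordinary
    local directory.
    """
    parts = urlsplit(prefix)
    if not parts.scheme or not parts.netloc:
        return {"scheme": "file", "path": prefix}
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "path": parts.path,
    }


if __name__ == "__main__":
    for prefix in (
        "ftp://user:password@host/directory",
        "rsync://user:password@host/directory",
        "/usr/local",
    ):
        print(parse_prefix(prefix))
```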

If you do not wish to include the password in any of the prefix variables, and for security reasons it isn’t recommended, you can use the keyloop key management proxy. For more information on the security implications of remote file system support read the README.security in the ninstall source directory or /usr/share/doc/ninstall.

Note

This feature requires automake-1.8.1-ast5 or higher.

Modules

Many of website-xml’s features are implemented as pluggable modules. The list of modules includes:

  • Output
  • Images
  • Downloads
  • Patch
  • Blog
  • RDF

Output

Output from website-xml is generated using simple output modules. Currently implemented output modules are HTML 4, XHTML 1.0, PS, PDF and AbiWord’s own abw XML-based format.

It is simple to add new modules, and new modules can import other output processors. For example, the HTML 4 output is implemented using only 4 XSLT directives after importing the XHTML module.

New modules need not use XSLT for their processing requirements. Currently PDF output uses a C tool to implement its output (although future versions will use XSL-FO).

There is already a built-in mechanism for including build-time replacements in output files, whether they are simple replacements for the current date and time or complex replacements generated by passing the input through a filter. (Simple keyword replacement is implemented directly in the build system using a configuration Makefile snippet; see README.Keyword for information.)
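
The general idea of build-time replacement can be sketched as below. The @KEYWORD@ token style is borrowed from autoconf and is purely an assumption here, as is the replace_keywords helper; the actual syntax used by website-xml is documented in README.Keyword and may well differ.

```python
import re
from datetime import date

# Hypothetical token syntax: an upper-case name between two @ signs.
TOKEN = re.compile(r"@([A-Z_]+)@")


def replace_keywords(text, replacements):
    """Substitute known tokens and leave unknown ones untouched.

    Values may be plain strings (simple replacements) or zero-argument
    callables (standing in for replacements produced by a filter).
    """

    def substitute(match):
        value = replacements.get(match.group(1))
        if value is None:
            return match.group(0)
        return str(value() if callable(value) else value)

    return TOKEN.sub(substitute, text)


if __name__ == "__main__":
    page = "<p>Built on @DATE@ from revision @REVISION@.</p>"
    print(replace_keywords(page, {"DATE": lambda: date.today().isoformat()}))
```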

In version 4 and above there is also an integrated XHTML validator, that can be used in the output chain to check your personal filters are generating valid output. It can also be used to validate input provided using the web-based editor, for example in blog posts.

Images

website-xml features an automatic thumbnail generator, which can be used by specifying filenames in the project definition as timages:

# Images which do not require a thumbnail
images = sky.jpg
# This image will be installed, and a thumbnail named
# desktop_mini.png will be created and installed
timages = desktop.png
# To set the dimensions of the thumbnails use the
# thumbsize variable.  The default is 256x192 (or the closest
# possible while maintaining an image's aspect ratio).
thumbsize = 512x256

The current mechanism for generating thumbnails relies on the python-imaging module or image-magick. The choice of which to use is determined during the configuration process, with the default being python-imaging if it is available.
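
The naming and sizing rules from the example above can be sketched without any imaging library. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: dimensions are rounded down, and images already smaller than the requested size are left at their original dimensions.

```python
import os


def thumbnail_name(filename):
    """desktop.png -> desktop_mini.png, as in the example above."""
    stem, extension = os.path.splitext(filename)
    return f"{stem}_mini{extension}"


def parse_size(text):
    """Turn a thumbsize value such as '512x256' into (512, 256)."""
    width, height = text.lower().split("x")
    return int(width), int(height)


def fit_within(size, box):
    """Largest size that fits in `box` while keeping the aspect ratio."""
    width, height = size
    box_width, box_height = box
    if width <= box_width and height <= box_height:
        return size  # assumption: never scale up
    if width * box_height >= height * box_width:
        # Width is the limiting dimension.
        return box_width, max(1, height * box_width // width)
    return max(1, width * box_height // height), box_height


if __name__ == "__main__":
    box = parse_size("256x192")
    print(thumbnail_name("desktop.png"), fit_within((1600, 900), box))
```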

It is also possible to make website-xml convert all, or some, images to different filetypes or colour depths. Once again this feature requires an external utility, and is determined during the configuration process.

If you are using python-imaging it is also possible to make thumbnails that are passed through any supported filter. The current method is a little complex and is best explained in the image appendix in the manual.

Downloads

website-xml supports file downloads, which can be automatically compressed if required. This is useful, for instance, if you only wish to manually manage a single tarball for a software distribution. Just place the uncompressed file in the support/files directory and tell website-xml you wish to serve compressed files. Currently bzip2 and gzip compression are supported, although it would be easy to add your own compression methods if you understand Makefile syntax.

It is also possible to specify download links using a printf style format, which allows website-xml to automatically handle showing a “latest” link for example. This way you can just add a new file for download without the need to manually change all the references to it in your pages.
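
A rough sketch of how a printf style pattern can yield a “latest” link: the pattern is filled in with the highest available version. The pattern my_app-%s.tar.gz and both helper names are hypothetical, and versions are assumed to be dotted integers; the format actually accepted by website-xml is not shown here.

```python
def version_key(version):
    """Sort key so that 0.10.0 ranks above 0.9.2."""
    return tuple(int(part) for part in version.split("."))


def latest_link(pattern, versions):
    """Fill a printf style pattern with the newest version."""
    return pattern % max(versions, key=version_key)


if __name__ == "__main__":
    available = ["0.1.0", "0.1.1", "0.10.0", "0.9.2"]
    print(latest_link("my_app-%s.tar.gz", available))
```

Adding a new release then only means adding its version to the list; every page that uses the pattern picks up the new file name.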

Patch

If you specify a patch in the project definition, instead of using the files option, it will also be installed as a GNU enscript generated HTML file and becomes easily linkable from within a master document. It is possible to specify options to enscript or to use a different conversion tool if you wish.

Once you define a patch file it is automatically managed by the downloads module, and as such will benefit from automatic link checking and updating.

In the development branch there is a new patch handling mechanism which does not rely on enscript. It also adds some nice new features such as file filtering and dynamic patch generation, so you can store only the version tarballs in the source repository and let website-xml build the patch files in both downloadable and HTML forms.

Note

With the new patch handling mechanism the patches are in part dealt with by the downloads module, which means it is also possible to serve compressed files. For example, if you add my_app-0.1.0.tar and my_app-0.1.1.tar then website-xml will be able to serve compressed tarballs, incremental patches, and compressed incremental patches without the need for manual management.

Blog

The blog module for website-xml is in effect a fully standalone module; it can be used outside of the website-xml framework. It stores its content in docbook-xml compatible files, although it can use an SQL database if you choose.

Note

Although the default source format is docbook-xml you can choose to use any input format you wish, as long as you are willing to write a small stylesheet to transform the input into docbook-xml. For example, there is a website-xml DTD compliant blog stylesheet which allows you to use the website-xml format in the blog entries.

The blog module features “permalinks”, and is capable of handling multiple users who submit multiple entries per day. In fact, the day-based blog entry system is just the default; it is possible to deal with any event frame. For example, per-hour entries can be used to ease the publication of network information through the blog module. You could even use it to host development information: it is simple to write a script which outputs dforce repository data into an hourly updated blog style format (there is an example of such a use in the scripts directory in the source package).

The blog module supports automatic forward reference notifications. For example, if you reference an entry dated 2003-12-16 in your 2005-01-01 entry, the entry for 2003-12-16 will contain a link to the newer entry to show that you have updated some information within the context of the original entry. This feature makes it easier to add information, or clarify facts, without needing to break the content of old entries with new data.
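
Forward reference notifications amount to inverting the "references" relation between entries. A minimal sketch, assuming entries are keyed by ISO dates and each entry lists the dates it refers to (the forward_references helper and this data layout are hypothetical, not the module's storage format):

```python
from collections import defaultdict


def forward_references(entries):
    """Map each older entry to the newer entries that reference it.

    `entries` maps an entry date (ISO string) to the list of entry
    dates it references. ISO dates sort correctly as plain strings.
    """
    notices = defaultdict(list)
    for entry_date in sorted(entries):
        for target in entries[entry_date]:
            # Only known, strictly older entries receive a notice.
            if target in entries and target < entry_date:
                notices[target].append(entry_date)
    return dict(notices)


if __name__ == "__main__":
    entries = {
        "2003-12-16": [],
        "2005-01-01": ["2003-12-16"],
    }
    print(forward_references(entries))
```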

This module supports multiple categories and entry drafting to allow you to generate personal/private feeds, and make it easier for users to locate content.

RDF

Initially the RDF module was only used for creating feeds from blog data, but now it can be used for generating feeds from any content. Its main purpose is generating index newsfeeds and revision history newsfeeds for individual documents.

RDF is also used to supply Dublin Core metadata for all pages under the control of website-xml. For more information on the Dublin Core Initiative visit dublincore.org.
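
For readers unfamiliar with Dublin Core in RDF, the following sketch builds a small RDF/XML description for one page. The namespace URIs are the standard RDF and Dublin Core element set ones, but the dublin_core helper, the example URL and the choice of which page fields map to which elements are assumptions; the output of website-xml's RDF module may look different.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core(url, title, creator, date, subjects):
    """Return RDF/XML carrying Dublin Core metadata for one page."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url}
    )
    for tag, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(description, f"{{{DC_NS}}}{tag}").text = value
    # Assumed mapping: one dc:subject per page keyword.
    for subject in subjects:
        ET.SubElement(description, f"{{{DC_NS}}}subject").text = subject
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(
        dublin_core(
            "http://example.org/projects/test.html",  # placeholder URL
            "Test page",
            "James Rowe",
            "2004-01-01",  # placeholder date
            ["test page", "website-xml"],
        )
    )
```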

The RDF creation process is totally transparent to the user, regardless of whether you are generating page metadata or newsfeeds. The only high-level configuration option is the maximum number of entries you wish to present in newsfeeds.

DOAP description files can be automatically generated from the XML source if required. With the default options set most of the DOAP vocabulary can be met, however it is possible to add extra data to the source files for generation of all the available DOAP entries.
