MediaWiki Programming Side Trips: Potholes, Detours, and Dead Ends
By Eric Hartwell - last updated April 26, 2006
Potholes, Detours, and Dead Ends
I've always believed it's as important to keep a record of what doesn't
work as what does. This section documents some of the obstacles (or rather,
opportunities) I encountered along the way.
MediaWiki doesn't have a built-in XML/XSL processor, and neither does PHP 4,
which is what's installed on my hosted server.
How about running an external process? PHP theoretically supports COM, .NET,
and shell-based Program Execution functions. Of course, all of these require
the relevant functionality to be installed on the server with appropriate
rights. My site is hosted on a Windows server, so it must already have the
basic Microsoft COM components installed.
After way more munging around than seems humanly reasonable, I finally got a
PHP XSL script to work on the server:
This code only works on Windows servers.
<?php
$XmlDoc = new COM("MSXML2.DOMDocument");
$XslDoc = new COM("MSXML2.DOMDocument");
$XmlDoc->async = false;
$XslDoc->async = false;
# The DOMDocument component needs a server-relative path !!!
$XmlDoc->load("C:/Domains/ehartwell.com/wwwroot/testxsl/AS17Flight.xml");
$XslDoc->load("C:/Domains/ehartwell.com/wwwroot/wiki/testxsl/AS17Flight.xsl");
echo $XmlDoc->transformNode($XslDoc);
$XmlDoc = NULL;
$XslDoc = NULL;
?>
To verify that the XSL transform will work from inside MediaWiki, I built a test extension script:
This code only works on Windows servers.
<?php
# ApolloTranscript MediaWiki extension: <ApolloTranscript> some text </ApolloTranscript>
# The function registered by the extension gets the text between the tags as
# input and can transform it into arbitrary HTML code.
# Note: The output is not interpreted as WikiText but directly included in the
# HTML output. So Wiki markup is not supported.
# To activate the extension, include it from your LocalSettings.php with:
# include("extensions/ApolloTranscript.php");
$wgExtensionFunctions[] = "wfApolloTranscript";
function wfApolloTranscript()
{
    global $wgParser;
    # Register the extension with the WikiText parser.
    # The first parameter is the name of the new tag. In this case it defines
    # the tag <ApolloTranscript> ... </ApolloTranscript>
    # The second parameter is the callback function for processing the text
    # between the tags
    $wgParser->setHook("ApolloTranscript", "renderApolloTranscript");
}
# The callback function for converting the input text to HTML output
function renderApolloTranscript($input, $argv)
{
    # $argv is an array containing any arguments passed to the extension
    # like <example argument="foo" bar>..
    $XmlDoc = new COM("MSXML2.DOMDocument");
    $XslDoc = new COM("MSXML2.DOMDocument");
    $XmlDoc->async = false;
    $XslDoc->async = false;
    # The DOMDocument component needs a server-relative path !!!
    $XmlDoc->load("C:/Domains/ehartwell.com/wwwroot/wiki/extensions/AS17Flight.xml");
    $XslDoc->load("C:/Domains/ehartwell.com/wwwroot/wiki/extensions/AS17Flight.xsl");
    # Plain assignment: $output has no earlier value to append to
    $output = $XmlDoc->transformNode($XslDoc);
    $XmlDoc = NULL;
    $XslDoc = NULL;
    return $output;
}
?>
Some of the things I wish I knew before I started:
- The DOMDocument component needs a server-relative absolute path for the
file names, not a URL or a relative path.
- XmlDoc->text and XmlDoc->xml sometimes appear to be empty, even when
they're not.
- XSLT drops spaces between tags, but does not support &nbsp;. Many sources
recommend using &#160; as a replacement, but that character lies outside the
7-bit ASCII range, and MediaWiki outputs the &#160; characters as ?, which
can be pretty misleading until you puzzle it out.
- The MSXML parser is much stricter than the one in Internet Explorer; IE
will display an XSL transform even when the XML and/or XSL aren't perfectly
well-formed.
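The first point in the list can be handled by resolving file names before they reach the DOMDocument component. The following is a minimal sketch, not the code used on this site: absolutePathFor() is a hypothetical helper name, and it assumes the XML and XSL files live in a directory you already know, such as the one containing the script.

```php
<?php
# Hypothetical helper: absolutePathFor() is not part of PHP, MediaWiki, or MSXML.
# It turns a directory plus a file name into the absolute file-system path
# that the load() method of MSXML2.DOMDocument expects.
function absolutePathFor($baseDir, $fileName)
{
    # realpath() resolves "." and ".." and returns false if the file is missing.
    return realpath($baseDir . DIRECTORY_SEPARATOR . $fileName);
}

# Usage: resolve a file that sits next to the current script.
$xslPath = absolutePathFor(dirname(__FILE__), "AS17Flight.xsl");
if ($xslPath === false) {
    echo "AS17Flight.xsl not found next to this script";
}
```

Checking for false before calling load() separates "file not found" from "file found but not well-formed", which otherwise look much the same from the outside.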
MediaWiki uses the term "image" for any uploaded file. MediaWiki's
internal workings are not well documented, but, of course, the source is
available.
The image processing is handled by the Image object defined in the
file includes/Image.php. According to the source, you can load an image
using Image::newFromTitle( $title );
If $wgUseSharedUploads is set, the wiki will look in the shared repository
if no file of the given name is found in the local repository (for
[[Image:..]], [[Media:..]] links). Thumbnails will also be looked for and
generated in this directory.
I also found a promising code snippet in includes/ExternalEdit.php:
# ExternalEdit.php
if ($this->mMode=="file") {
    $type="Edit file";
    $image = Image::newFromTitle( $this->mTitle );
    $img_url = $image->getURL();
    if (strpos($img_url,"://")) {
        $url = $img_url;
    } else {
        $url = $wgServer . $img_url;
    }
    $extension=substr($name, $pos);
In wikimarkup, the standard image title format is [[Image:Name.jpg]].
What's a valid title string for Image::newFromTitle()? A quick series of
experiments gives an answer:
- $image = Image::newFromTitle( "[[Image:AS17FlightTranscript.xsl]]" ); Result: Image constructor given bogus title.
- $image = Image::newFromTitle( "Image:AS17FlightTranscript.xsl" ); Result: Image constructor given bogus title.
- $image = Image::newFromTitle( "AS17FlightTranscript.xsl" ); Result: Image constructor given bogus title.
Hmmm... totally bogus, dude. Another look at the source finds the very
similar function Image::newFromName(). Back for another experiment:
- $image = Image::newFromName( "AS17FlightTranscript.xsl" ); Result: works!
For the XSL transform we need either the text contents of the file or its
local path on the server's file system.
## Return the image path of the image in the local file system as an absolute path
function getImagePath()
{
    $this->load();
    return $this->imagePath;
}
This looks promising; in fact, it does return an absolute path in the format
that MSXML2.DOMDocument needs.
Note the $this->load(); function call. This actually locates the "image" file
in MediaWiki's cache, database, or file system and creates a temporary file
on the server.
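Putting the pieces together, the hard-coded paths in the test extension could be replaced by a lookup of the uploaded files. This is only a sketch under stated assumptions: transformUploadedXml() is a hypothetical helper name, Image::newFromName() and getImagePath() are the calls quoted above, and exists() is assumed to be available on the Image object in your MediaWiki version. The load(), parseError, and transformNode() members belong to MSXML2.DOMDocument.

```php
<?php
# Sketch only: transformUploadedXml() is a hypothetical helper, not MediaWiki API.
function transformUploadedXml($xmlName, $xslName)
{
    # Outside MediaWiki on a Windows server, fail with a readable message.
    if (!class_exists('Image') || !class_exists('COM')) {
        return 'XSL transform unavailable: needs MediaWiki and COM support.';
    }

    # Look up each uploaded "image" and collect its absolute local path.
    $paths = array();
    foreach (array($xmlName, $xslName) as $name) {
        $image = Image::newFromName($name);
        if (!$image || !$image->exists()) {   # exists() is an assumption
            return 'Uploaded file not found: ' . htmlspecialchars($name);
        }
        $paths[] = $image->getImagePath();
    }

    $xmlDoc = new COM('MSXML2.DOMDocument');
    $xslDoc = new COM('MSXML2.DOMDocument');
    $xmlDoc->async = false;
    $xslDoc->async = false;

    # load() returns false when a document is missing or not well-formed;
    # parseError->reason then carries the parser's explanation.
    if (!$xmlDoc->load($paths[0])) {
        return 'XML error: ' . htmlspecialchars($xmlDoc->parseError->reason);
    }
    if (!$xslDoc->load($paths[1])) {
        return 'XSL error: ' . htmlspecialchars($xslDoc->parseError->reason);
    }
    return $xmlDoc->transformNode($xslDoc);
}
```

Inside the tag callback this would be called as something like return transformUploadedXml("AS17Flight.xml", "AS17FlightTranscript.xsl"); where the XML file name is a placeholder for whatever was actually uploaded. Reporting parseError->reason also helps with the strict MSXML parser, since a document that Internet Explorer displays happily can still fail to load here.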
It might be handy to store the XSL as its own wiki page. Internally,
MediaWiki uses the Article class to load an article:
$article = new Article($result->mTitle);
$text = $article->fetchContent(0, true, false);
or, from the actual code in Parser.php line 2271:
$article = new Article( $title );
$articleContent = $article->getContentWithoutUsingSoManyDamnGlobals();
if ( $articleContent !== false ) {
    $found = true;
    $text = $articleContent;
    $replaceHeadings = true;
}
I tried loading a predefined page:
$title = "Main_Page";
$article = new Article( $title );
$output .= "Article object has title '" . $article->getTitle() . "'<br />";
This gives an error: Fatal error: Call to a member function on a
non-object in Article.php on line 371. Could it be that there's
a conflict between MediaWiki globals?
Anyway, the uploaded "image" file approach works, so I decided to shelve
this one.
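A likely cause of the fatal error, not verified against this installation, is that the Article constructor expects a Title object rather than a plain string. Note that both snippets above pass it a title variable taken from MediaWiki itself. The sketch below shows that variant; loadWikiText() is a hypothetical helper name, while Title::newFromText() and getContentWithoutUsingSoManyDamnGlobals() are MediaWiki calls, the latter quoted from Parser.php above.

```php
<?php
# Sketch only: loadWikiText() is a hypothetical helper, not MediaWiki API.
function loadWikiText($pageName)
{
    # Outside MediaWiki there is nothing to load.
    if (!class_exists('Title') || !class_exists('Article')) {
        return false;
    }
    # Build a Title object from the page name; an invalid name gives null.
    $title = Title::newFromText($pageName);
    if (!$title) {
        return false;
    }
    # Hand the Title object, not the string, to the Article constructor.
    $article = new Article($title);
    # Returns the page text, or false if it could not be loaded.
    return $article->getContentWithoutUsingSoManyDamnGlobals();
}
```

If this assumption holds, loadWikiText("Main_Page") would return the wikitext of the main page, and the XSL could then be kept on an ordinary wiki page instead of in an uploaded file.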
The XML transcript source is organized with <section> and <subsection> tags.
Originally, I used simple inline tags of the form:
<section>Countdown</section>
and used XSL to translate it into an HTML header:
<xsl:template match="section">
<h2><xsl:value-of select="node()" /></h2>
</xsl:template>
While this is good HTML, MediaWiki (usually) doesn't recognize that the
headers should be used to build a table of contents. I also tried using
wikimarkup tags:
<xsl:template match="section">
==<xsl:value-of select="node()" />==
</xsl:template>
but that doesn't work either. RTFM: as the documentation states, "Note that
the return string should be HTML, not wiki markup."
I finally decided to use MediaWiki headings outside the XmlTransform.
It makes much more sense to use the <section> tags as XML content wrappers:
<section title="Countdown" seq="1.1">
assorted transcript content
</section>
and use XSL to output the section contents only:
<xsl:template match="section">
<xsl:apply-templates />
</xsl:template>
Making the section title an attribute instead of a value keeps it
available for other XSL processing, while separating it from the MediaWiki
markup. I also added a seq attribute, which can be used for section
numbering, or just for keeping the sections in order in applications that
sort by title text.
Other possibilities:
- If we assign each transcript to its own Category, and use the time as part
of each item's title, then MediaWiki will automatically build a table of
contents. Unfortunately, the categories are in alphabetical order, organized
by first letter.
- See also Series of articles.