MediaWiki Extension Parser Debugging
By Eric Hartwell - latest revision
April 26, 2006
According to the MediaWiki extension documents, you should be able to render
wikitext within your extension by calling $wgOut->parse(). This works
just fine for inline extensions if they're all on the same page.
Things fall apart when an extension is used within a template. The parser is
confused for all other extensions on the page, and even for some built-in tags
like <pre>. The output text has strings like
"UNIQ10842e596cbb71da-PageTransform2a0abb1e6fa1476500000001" where the tag's
output should be.
Here's the wikitext for my test case:
This is a test page for troubleshooting the
extension reparse problem (see [[:Testing wiki extension
reparse]]).
Invoke <code>TestExtension</code> inline: <TestExtension>==
This is a reparse test ==</TestExtension>
Invoke <code>TestExtension</code> inline through the
<code><span>{</span>{:TestExtensionTemplate}<span>}</span></code>
template: {{:TestExtensionTemplate}}
The wikitext <code><pre>test</pre></code> displays
as this: <pre>test</pre>
-----
== Testing ==
# Same page with changes to <code>TestExtension</code> code:
#* Call <code>$wgOut->parse()</code>:<br />TestExtension:
UNIQ601176b538715c69-TestExtension715301547906bcc500000001<br
/><code><pre>test</pre></code>:
UNIQ601176b538715c69-pre456b93972d2a0c400000001
#* Don't call <code>$wgOut->parse()</code>: processes as
expected
{{:TestExtensionTemplate}}
TestExtensionTemplate is a page that simply contains a
TestExtension tag.
<TestExtension>== This is a reparse test
==</TestExtension>
TestExtension itself is a trivial extension that returns the
input, either unchanged or after a call to $wgOut->parse():
<?php
$wgExtensionFunctions[] = "wfTestExtension";

function wfTestExtension()
{
    global $wgParser;
    $wgParser->setHook( "TestExtension", "renderTestExtension" );
}

function renderTestExtension( $input, $argv ) {
    ## Render wikitext in the output
    global $wgOut;
    # return $input;
    return $wgOut->parse($input);
}
?>
What happens when the parser evaluates the first {{:TestExtensionTemplate}}
template reference, which invokes TestExtension, depends on what the extension
returns:
- return $input;
  - output is as expected
- return $wgOut->parse($input);
  - even the <pre> tag no longer works! The wikitext <pre>test</pre>
    displays as UNIQ446190381dab8dcf-pre7765f227b837f4800000001
    instead of:
    test
How MediaWiki processes extensions
Note: The comment for the parser->parse() function definition in
Parser.php clearly states, "Do not call this
function recursively."
The parser first calls parser->extractTagsAndParams() to
replace all occurrences of <$tag>content</$tag> in the text with
a random marker, and loads the $ext_content associative array
with data of the form $unique_marker => content. Then for each
tag it calls the extension's function to transform the text and
put the result back in $ext_content.
# Extensions
foreach ( $this->mTagHooks as $tag => $callback ) {
    $ext_content[$tag] = array();
    $text = Parser::extractTagsAndParams( $tag, $text, $ext_content[$tag],
        $ext_tags[$tag], $ext_params[$tag], $uniq_prefix );
    foreach( $ext_content[$tag] as $marker => $content ) {
        $full_tag = $ext_tags[$tag][$marker];
        $params = $ext_params[$tag][$marker];
        if ( $render )
            $ext_content[$tag][$marker] = call_user_func_array(
                $callback, array( $content, $params, &$this ) );
        else {
            if ( is_null( $content ) ) {
                // Empty element tag
                $ext_content[$tag][$marker] = $full_tag;
            } else {
                $ext_content[$tag][$marker] = "$full_tag$content</$tag>";
            }
        }
    }
}
- parser->extractTagsAndParams()
  - Replaces all occurrences of <$tag>content</$tag>
    in the text with a random marker and returns the new text.
    The output parameter $content will be an
    associative array filled with data of the form
    $unique_marker => content.
/**
 * Replaces all occurrences of <$tag>content</$tag> in the text
 * with a random marker and returns the new text. the output parameter
 * $content will be an associative array filled with data on the form
 * $unique_marker => content.
 *
 * If $content is already set, the additional entries will be appended
 * If $tag is set to STRIP_COMMENTS, the function will extract
 * <!-- HTML comments -->
 *
 * @access private
 * @static
 */
function extractTagsAndParams($tag, $text, &$content, &$tags, &$params, $uniq_prefix = ''){
    $rnd = $uniq_prefix . '-' . $tag . Parser::getRandomString();
    if ( !$content ) {
        $content = array( );
    }
    $n = 1;
    $stripped = '';

    if ( !$tags ) {
        $tags = array( );
    }

    if ( !$params ) {
        $params = array( );
    }

    if( $tag == STRIP_COMMENTS ) {
        $start = '/<!--()()/';
        $end = '/-->/';
    } else {
        $start = "/<$tag(\\s+[^\\/>]*|\\s*)(\\/?)>/i";
        $end = "/<\\/$tag\\s*>/i";
    }

    while ( '' != $text ) {
        $p = preg_split( $start, $text, 2, PREG_SPLIT_DELIM_CAPTURE );
        $stripped .= $p[0];
        if( count( $p ) < 4 ) {
            break;
        }
        $attributes = $p[1];
        $empty = $p[2];
        $inside = $p[3];

        $marker = $rnd . sprintf('%08X', $n++);
        $stripped .= $marker;

        $tags[$marker] = "<$tag$attributes$empty>";
        $params[$marker] = Sanitizer::decodeTagAttributes( $attributes );

        if ( $empty === '/' ) {
            // Empty element tag, <tag />
            $content[$marker] = null;
            $text = $inside;
        } else {
            $q = preg_split( $end, $inside, 2 );
            $content[$marker] = $q[0];
            if( count( $q ) < 2 ) {
                # No end tag -- let it run out to the end of the text.
                break;
            } else {
                $text = $q[1];
            }
        }
    }
    return $stripped;
}
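To make the marker mechanism easier to see, here is a minimal standalone sketch of the same idea. It is not MediaWiki code: stripTag() is a hypothetical helper written for a current PHP version, the fixed 'UNIQdemo' prefix stands in for the random string, and attributes, empty-element tags and comment stripping are all left out.

```php
<?php
// Hypothetical, simplified stand-in for Parser::extractTagsAndParams().
// Replaces each <$tag>...</$tag> with a marker and records the tag body
// in $content under that marker.
function stripTag(string $tag, string $text, array &$content, string $prefix = 'UNIQdemo'): string
{
    $n = 1;
    $quoted = preg_quote($tag, '/');
    $pattern = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/is';

    return preg_replace_callback(
        $pattern,
        function (array $m) use (&$content, &$n, $tag, $prefix) {
            $marker = $prefix . '-' . $tag . sprintf('%08X', $n++);
            $content[$marker] = $m[1]; // remember the tag body under its marker
            return $marker;            // only the marker stays in the text
        },
        $text
    );
}

$content = array();
$stripped = stripTag('pre', 'before <pre>test</pre> after', $content);

echo $stripped, "\n"; // before UNIQdemo-pre00000001 after
print_r($content);    // one entry: 'UNIQdemo-pre00000001' => 'test'
```

Anything that later wipes $content leaves the marker stranded in the text, which is exactly what the broken output looks like.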
Later on, the parser substitutes the translated text back
into the output. Now, if the "UNIQ..." markers are still in the
output text, then either $ext_content has been corrupted, or the
replacement never happens...
# Merge state with the pre-existing state, if there is one
if ( $state ) {
    $state['html'] = $state['html'] + $html_content;
    $state['nowiki'] = $state['nowiki'] + $nowiki_content;
    $state['math'] = $state['math'] + $math_content;
    $state['pre'] = $state['pre'] + $pre_content;
    $state['gallery'] = $state['gallery'] + $gallery_content;
    $state['comment'] = $state['comment'] + $comment_content;

    foreach( $ext_content as $tag => $array ) {
        if ( array_key_exists( $tag, $state ) ) {
            $state[$tag] = $state[$tag] + $array;
        }
    }
} else {
    $state = array(
        'html' => $html_content,
        'nowiki' => $nowiki_content,
        'math' => $math_content,
        'pre' => $pre_content,
        'gallery' => $gallery_content,
        'comment' => $comment_content,
    ) + $ext_content;
}
return $text;
If there's no existing state, then the $ext_content is simply
tacked on to the end of the $state array without applying the
translation.
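The + in that code is PHP's array union operator: keys already on the left side win, and keys that exist only on the right are appended. The sketch below mirrors the merge logic for extension tags only. mergeState() and the marker names are made up for illustration, and whether the skipped-tag case matters in practice depends on how $state is initialized elsewhere, which isn't shown here.

```php
<?php
// Hypothetical helper mirroring the quoted merge logic, extension tags only.
function mergeState(?array $state, array $extContent): array
{
    if ($state) {
        foreach ($extContent as $tag => $array) {
            // As in the quoted code, only tags already present in $state are merged.
            if (array_key_exists($tag, $state)) {
                $state[$tag] = $state[$tag] + $array;
            }
        }
        return $state;
    }
    // No existing state: start from the built-in tags and append the extension tags.
    return array('pre' => array()) + $extContent;
}

$ext = array('TestExtension' => array('UNIQ-b' => 'second'));

$fresh = mergeState(null, $ext);
$merged = mergeState(
    array('pre' => array(), 'TestExtension' => array('UNIQ-a' => 'first')),
    $ext
);
$skipped = mergeState(array('pre' => array()), $ext);

echo implode(', ', array_keys($fresh)), "\n";                   // pre, TestExtension
echo implode(', ', array_keys($merged['TestExtension'])), "\n"; // UNIQ-a, UNIQ-b
var_dump(array_key_exists('TestExtension', $skipped));          // bool(false)
```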
Troubleshooting the Reparse Problem
To track what's actually happening inside the parser, I added some debug code
to the test extension:
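// Pass true as the second argument so var_export() returns the dump as a
// string instead of printing it.
trigger_error("TestExtension called " . var_export(debug_backtrace(), true),
    E_USER_NOTICE);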
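A full var_export() of the backtrace is very long. If only the call chain matters, a small formatter keeps the log readable. This is a sketch for a current PHP version, and summarizeBacktrace() is a hypothetical helper, not part of MediaWiki.

```php
<?php
// Hypothetical helper: one line per stack frame instead of a full dump.
function summarizeBacktrace(array $frames): array
{
    $lines = array();
    foreach ($frames as $i => $frame) {
        // 'file' and 'line' can be missing for frames called from internal code.
        $file = isset($frame['file']) ? basename($frame['file']) : '(internal)';
        $line = isset($frame['line']) ? $frame['line'] : 0;
        $lines[] = sprintf("%d => %s:%d function '%s'", $i, $file, $line, $frame['function']);
    }
    return $lines;
}

// Stand-in for an extension hook that wants to know who called it.
function demoHook(): array
{
    return summarizeBacktrace(debug_backtrace());
}

print_r(demoHook());
```

Inside the extension, the same lines could be joined with "\n" and passed to trigger_error().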
Results for the two variants, no reparse (return $input) and reparse
(return $wgOut->parse($input)):

Wikitext input (same for both):

inline: <TestExtension>== This is a reparse test ==</TestExtension>
template: {{:TestExtensionTemplate}}
pre:<pre>test</pre>
template: {{:TestExtensionTemplate}}

Output (no reparse):

inline == This is a reparse test ==
template == This is a reparse test ==
pre: <pre>test</pre>
template <p>== This is a reparse test ==
</p><p><br />

Output (with reparse):

inline UNIQ2130a2344e48f0fe-TestExtension4ce844657b03daa800000001
template UNIQ76222ba3742437c0-TestExtension5fa4b24836d5619b00000001
pre: UNIQ2130a2344e48f0fe-pre3950620b2071ed0100000001
template <div class="editsection" style="float:right;margin-left:5px;">[<a href="/wiki/index.php?title=Wiki_extension_test_page&action=edit&section=1" title="Edit section: This is a reparse test">edit</a>]</div><a name="This_is_a_reparse_test"></a><h2> This is a reparse test </h2>
<p><br />

1st TestExtension call (both variants): 8 => OutputPage.php:314 function 'parse'

mStripState at the 1st call (no reparse):

'pre' => array (
  'UNIQ2a01516467c051fe-pre4bbb1bc3b63bbca00000001' => '<pre>test</pre>',
),
'TestExtension' => array (
  'UNIQ2a01516467c051fe-TestExtension5ed3df6f54c5b16d00000001' => '== This is a reparse test ==',
),

mStripState at the 1st call (reparse):

'pre' => array (
  'UNIQ59c1127449650efe-pre5ad453236159ce5500000001' => '<pre>test</pre>',
),
'TestExtension' => array (
  'UNIQ59c1127449650efe-TestExtension722751597395f68c00000001' => '<div class="editsection" style="float:right;margin-left:5px;">[<a href="/wiki/index.php?title=Wiki_extension_test_page&action=edit&section=1" title="Edit section: This is a reparse test">edit</a>]</div><a name="This_is_a_reparse_test"></a><h2> This is a reparse test </h2>',
),

Comment: The 'pre' array is still populated, and one 'TestExtension' element
has been parsed.

2nd TestExtension call (both variants): 8 => OutputPage.php:314 function 'parse'

mStripState at the 2nd call (no reparse):

'pre' => array (
  'UNIQ2a01516467c051fe-pre4bbb1bc3b63bbca00000001' => '<pre>test</pre>',
),
'TestExtension' => array (
  'UNIQ2a01516467c051fe-TestExtension5ed3df6f54c5b16d00000001' => '== This is a reparse test ==',
  'UNIQ2a01516467c051fe-TestExtension26e9562944a1ca4100000001' => '== This is a reparse test ==',
),

Comment: Both the 'pre' and 'TestExtension' arrays are still populated.

mStripState at the 2nd call (reparse):

'pre' => array (
),
'TestExtension' => array (
  'UNIQ6676470e18af4763-TestExtension40a20b9465a85f1500000001' => '<div class="editsection" style="float:right;margin-left:5px;">[<a href="/wiki/index.php?title=Wiki_extension_test_page&action=edit&section=1" title="Edit section: This is a reparse test">edit</a>]</div><a name="This_is_a_reparse_test"></a><h2> This is a reparse test </h2>',
),

Comment: The 'pre' array is empty, and 'TestExtension' has only one element.
The UNIQ codes are left in the output, which suggests they must have been
deleted from the state during the first render call.
To take a closer look, do another dump after the first 'TestExtension' calls $wgOut->parse()
but before it returns.
First extension call, before $wgOut->parse():

mStripState:
'pre' => array (
  'UNIQ51cab28f3b372394-pre28907c5829a2abb400000001' => '<pre>test</pre>',
),
'TestExtension' => array (
  'UNIQ51cab28f3b372394-TestExtension70e0355c37d4627d00000001' => NULL,
),

First extension call, after $wgOut->parse() but before return:

mStripState:
'pre' => array (
),
'TestExtension' => array (
),
So, when the extension is called while processing a template expansion,
the $wgOut->parse() function clears the parser's mStripState array. It
shouldn't.
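The failure mode can be reproduced outside MediaWiki. The sketch below is a toy, not the real parser: ToyParser, its marker format and its hook signature are all made up for illustration. Its parse() resets the strip state on every entry, as the dumps suggest the real one does, so a hook that calls parse() again wipes the markers the outer call still needs.

```php
<?php
// Toy model only: not MediaWiki's Parser.
class ToyParser
{
    public $stripState = array();
    private $hooks = array();
    private $counter = 0;

    public function setHook(string $tag, callable $callback): void
    {
        $this->hooks[$tag] = $callback;
    }

    public function parse(string $text): string
    {
        $this->stripState = array(); // reset on every entry, nested or not
        foreach ($this->hooks as $tag => $callback) {
            $quoted = preg_quote($tag, '/');
            $pattern = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/s';
            $text = preg_replace_callback($pattern, function (array $m) use ($tag, $callback) {
                $marker = 'UNIQ-' . $tag . '-' . (++$this->counter);
                // The hook runs while stripping is still in progress.
                $this->stripState[$marker] = $callback($m[1], $this);
                return $marker;
            }, $text);
        }
        // Unstrip: markers that are no longer in the state stay in the output.
        return strtr($text, $this->stripState);
    }
}

$preHook = function ($input) { return "<pre>$input</pre>"; };
$wikitext = '<pre>test</pre> <TestExtension>hi</TestExtension>';

// Variant 1: the extension returns its input unchanged.
$plain = new ToyParser();
$plain->setHook('pre', $preHook);
$plain->setHook('TestExtension', function ($input) { return $input; });
$plainOut = $plain->parse($wikitext);

// Variant 2: the extension re-enters parse() on the same parser object.
$reparse = new ToyParser();
$reparse->setHook('pre', $preHook);
$reparse->setHook('TestExtension', function ($input, $parser) { return $parser->parse($input); });
$reparseOut = $reparse->parse($wikitext);

echo $plainOut, "\n";   // <pre>test</pre> hi
echo $reparseOut, "\n"; // UNIQ-pre-1 hi
```

In the second variant the <pre> marker was recorded before the nested call and erased by it, so it survives into the output, just like the UNIQ strings on the test page.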
Now, take a close look at the source for the parser:
/**
 * Convert wikitext to HTML
 * Do not call this function recursively.
 */
function parse( $text, &$title, $options, $linestart = true, $clearState = true, $revid = null ) {
    /**
     * First pass--just handle <nowiki> sections, pass the rest off
     * to internalParse() which does all the real work.
     */

    global $wgUseTidy, $wgAlwaysUseTidy, $wgContLang;
    $fname = 'Parser::parse';
    wfProfileIn( $fname );

    if ( $clearState ) {
        $this->clearState();
    }

    $this->mOptions = $options;
    $this->mTitle =& $title;
    $this->mRevisionId = $revid;
    $this->mOutputType = OT_HTML;

    $this->mStripState = NULL;

    //$text = $this->strip( $text, $this->mStripState );
    // VOODOO MAGIC FIX! Sometimes the above segfaults in PHP5.
    $x =& $this->mStripState;

    wfRunHooks( 'ParserBeforeStrip', array( &$this, &$text, &$x ) );
    $text = $this->strip( $text, $x );
    wfRunHooks( 'ParserAfterStrip', array( &$this, &$text, &$x ) );
    . . . .
    . . . .
This is where the parser state gets killed. My guess is that the
$this->mStripState = NULL statement shouldn't be there; the clearState()
function is the place to reinitialize mStripState:
function clearState() {
    . . . .
    $this->mStripState = array();
What happens if we don't clear mStripState? Is this mandatory, or is it a leftover?
Why is it being cleared in the first place?
To figure this out, I had to study the MediaWiki source code on SourceForge.net.
This is the mixed blessing of open source ...
History of $this->mStripState = NULL
The $stripState = NULL; line was added in revision 1.11 (Mar 6, 2004),
probably a leftover from "Mov[ing] body of Article::preSaveTransform to
Parser.php". The $stripState variable was never used in the main parse()
routine.
- Revision 1.11: Mar 6 2004
  Changes since 1.10: +179 -56 lines
  - "In Parser.php, generalised stripping of <nowiki>, <pre> and <math> to
    allow more general use such as nesting"
  - "Moved body of Article::preSaveTransform to Parser.php"

    $stripState = NULL;
    $text = $this->strip( $text, $this->mStripState, true );
    $text = $this->doWikiPass2( $text, $linestart );
    $text = $this->unstrip( $text, $this->mStripState );
    $this->mOutput->setText( $text );
    wfProfileOut( $fname );
    return $this->mOutput;

function preSaveTransform( $text, &$title, &$user, $options, $clearState = true )
{
    $this->mOptions = $options;
    $this->mTitle = $title;
    if ( $clearState ) {
        $this->clearState;
    }
    $stripState = false;
    $text = $this->strip( $text, $stripState, false );
    $text = $this->pstPass2( $text, $user );
    $text = $this->unstrip( $text, $stripState );
    return $text;
}
The $stripState = NULL; line remained in the source until revision
1.423 (Apr 21, 2005), when it was changed to $this->mStripState = NULL;
as part of the fix for (bug 1931) "cleanup, removing unused code and
variables". The bug fix description states, "The $stripState var is not
used, ever. I am positive this is supposed to be $this->mStripState."
- Revision 1.423: Apr 21 2005
  Changes since 1.422: +15 -20 lines
  - (bug 1931) cleanup, removing unused code and variables.
  - "The $stripState var is not used, ever. I am positive this is supposed
    to be $this->mStripState."

| Revision 1.422 | Revision 1.423 |
| $this->mTitle =& $title; | $this->mTitle =& $title; |
| $this->mOutputType = OT_HTML; | $this->mOutputType = OT_HTML; |
| $stripState = NULL; | $this->mStripState = NULL; |
| global $fnord; $fnord = 1; | global $fnord; $fnord = 1; |
| //$text = $this->strip( $text, $this->mStripState ); | //$text = $this->strip( $text, $this->mStripState ); |
Alas, that's not the case. It's true that "The $stripState var is
not used, ever." But the line should have been deleted, not
changed to $this->mStripState.
Nulling out $this->mStripState causes the parser to reset its
state. This is not normally a problem, unless you call $wgOut->parse() from an
extension function that is referenced from a template, or otherwise recursively.
(Note: Yes, I also wondered about the "VOODOO MAGIC FIX!", but that turned out
to be unrelated.)
Fixing $this->mStripState = NULL
Here's the test page with $wgOut->parse() and $this->mStripState = NULL:

I deleted the "$this->mStripState = NULL" line, and here's the test page with $wgOut->parse():

The result:
No change.
D'Oh!
Revision History
- April 26, 2006 - initial version