More DirWalker in MODX

In the previous article on Extending DirWalker, I made a foolish statement about how easy it would be to modify the example program to produce an accurate report containing the actual events fired with $modx->invokeEvent() and the variables sent in the invokeEvent() call. Having said that, I couldn’t resist actually doing it, but it took a lot longer than I thought it would — mainly due to some inconsistency in the way MODX calls events and the usual difficulties with formatting the output. This version assumes that you are running the code as a snippet in MODX. It still processes the files as found, but the creation of the output is deferred until after the search is finished.

A Word About MODX Events

The point of MODX events is to allow users to intercept the MODX engine at various points and have code of their own executed. Whenever MODX saves a chunk, for example, it fires the OnChunkBeforeSave event just before saving the chunk, and the OnChunkSave event just after saving the chunk.

A plugin connected to either of those events can perform any actions it wants to, and it will receive information about the chunk being saved in the second argument of the invokeEvent() call (the first argument is the name of the event itself). In the case of OnChunkBeforeSave, the plugin could modify the actual HTML code of the chunk before it is saved.

If you’re an old-time MODX user and sometimes forget and spell it MODx, a plugin could easily correct that mistake before any chunk (or resource) is saved. For a chunk, a plugin to do that would be connected to the OnChunkBeforeSave event and would look like this:

$content = $chunk->getContent();
$content = str_replace('MODx', 'MODX', $content);
$chunk->setContent($content);

In order to write a plugin, you often need to know what variables are sent to it. That’s the point of the report we are producing with DirWalker. You may also need to know whether or not the variables sent are reference variables (written with a leading &, as in &$this). If they are reference variables, a reference to the actual object is sent in the arguments, rather than a copy. That means that you can modify the object itself in your plugin. In the invokeEvent() call for OnChunkBeforeSave, for example, you’ll find this:

'chunk' => &$this,

That means that any plugin attached to that event can modify the $chunk variable and the modified version will be saved to the database.
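
To see why the & matters, here is a minimal sketch that runs outside of MODX. The fakePlugin() function and the byRef and byVal keys are made up for the illustration and are not part of MODX. The sketch only relies on standard PHP behavior, where an array element created with & stays tied to the original variable even when the array itself is passed by value.

```php
<?php
/* Hypothetical stand-in for a plugin: it receives the argument
   array and changes both entries */
function fakePlugin(array $params) {
    $params['byRef'] .= ' (changed)';
    $params['byVal'] .= ' (changed)';
}

$a = 'first';
$b = 'second';

/* 'byRef' is tied to $a by the &; 'byVal' gets a copy of $b */
fakePlugin(array(
    'byRef' => &$a,
    'byVal' => $b,
));

echo $a . "\n"; /* first (changed) */
echo $b . "\n"; /* second */
```

Keep in mind that PHP objects are passed around as handles, so a plugin can usually change an object's properties even without the &. The difference shows up most clearly with strings, numbers, and arrays.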

Strategy

In this version, we again override the processFile() method of the DirWalker class. Instead of the $this->files array containing just the full path and the filename, however, we do a search for the events fired and the variables sent to them. Then, we make the $this->files array a little more complex so it will contain information on which events are called in which files and what arguments are sent to them.

We’ve also added a search of the manager directory, because some events are fired there as well. Notice that we call dirWalk() twice and don’t call resetFiles() after the first call, because we want the manager directory files to be added to the ones found in the core directory. We also had to add a line in the processFile() method to shorten the full path of the files found in the manager directory.
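
As a rough sketch of that strategy, the snippet below shortens two paths and collects events under the shortened path as the key. The install locations, the file names, and the 'SomeEvent' placeholder are all made up for the illustration. In the real code the prefixes come from the MODX_CORE_PATH and MODX_MANAGER_PATH constants and the events come from the regex search.

```php
<?php
/* Hypothetical install locations standing in for
   MODX_CORE_PATH and MODX_MANAGER_PATH */
$corePath    = '/var/www/example/core/';
$managerPath = '/var/www/example/manager/';

/* Paths as the walker might report them (invented for this sketch) */
$found = array(
    '/var/www/example/core/model/modx/modx.class.php',
    '/var/www/example/manager/controllers/default/index.class.php',
);

$files = array();
foreach ($found as $fullPath) {
    /* Same two replacements used in processFile() */
    $shortPath = str_replace($corePath, 'core/', $fullPath);
    $shortPath = str_replace($managerPath, 'manager/', $shortPath);

    /* Each file key holds a list of event/variables pairs */
    $files[$shortPath][] = array(
        'event'     => 'SomeEvent',
        'variables' => 'None',
    );
}

print_r(array_keys($files));
```

Because nothing resets the array between the two walks, the manager entries simply land next to the core entries.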

The Code

Here is the resulting code for the extended DirWalker Class. If you ever have to bid a programming job that involves complex regular expressions (especially multi-line regexes), multiply your time estimate by at least a factor of three.

The preg_replace(), preg_match(), and preg_match_all() functions are notoriously slow. Considering the number of files involved and the number of complex regex operations going on in each file, the code is remarkably fast. On my local machine, it runs in less than three seconds when run inside PhpStorm.

As before, this code assumes that you’ve installed the DirWalker package or downloaded the class file. See the note below on where to get DirWalker. Like our previous example, it skips some types of files and directories and only looks in files with ‘.class’ in their names.

require_once MODX_CORE_PATH . 'components/dirwalker/model/dirwalker/dirwalker.class.php';

class MyDirWalker extends DirWalker {

    /* We override this method to process the files as found */
    protected function processFile($dir, $fileName) {
        /* Note that $dir is just the directory with no
           trailing slash so we have to create the full path*/
        $fullPath  = $dir . '/' . $fileName;

        /* These are to make sure we've found them all */
        $trueCount = 0;
        $foundCount = 0;

        /* get the file's content */
        $content = file_get_contents($fullPath);
        /* Set $trueCount -- Only count instances *not*
               preceded by a space */
        preg_match_all('#[^\s]invokeEvent#', $content, $preMatches);
        $trueCount += count($preMatches[0]);

        /* Shorten the path for use in the display */
        $shortPath = str_replace(MODX_CORE_PATH, 'core/', $fullPath);
        $shortPath = str_replace(MODX_MANAGER_PATH, 'manager/', $shortPath);

        $pattern = '#[^\s]invokeEvent\([\'\"]*(\$*[^,"\']+)[\'\"\s]*,?([^;]*;)#s';

        preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

        if (!empty($matches[0])) {
            foreach($matches as $match){
                $foundCount++;
                if ($match[2] == ');') {
                    $match[2] = 'None';
                }
                $this->files[$shortPath][] = array(
                    'event' => $match[1],
                    'variables' => $match[2],
                );
            }
        }
        /* Print a message if we failed to find all of them or found
           too many of them -- ideally, this never executes. The check
           sits outside the if block so a file where the main pattern
           found nothing at all is still reported. */
        if ($trueCount != $foundCount) {
            echo "\n\n" . $shortPath;
            echo "\nTrueCount: " . $trueCount .
                ' -- ' . 'FoundCount: ' . $foundCount;
        }
    }
}

$output = '';
$dw = new MyDirWalker($modx);
$dw->setIncludes('.class');
$dw->setExcludes('-all,-min,.git,modprocessor');
$dw->setExcludeDirs('cache,.git,packages,components');
$dw->dirWalk(MODX_CORE_PATH, true);
$dw->dirWalk(MODX_MANAGER_PATH, true);

$files = $dw->getFiles();

/* Build the HTML report from the collected events */

foreach($files as $file => $events) {
    $output .= "\n<h4>" . $file . '</h4>';
    foreach($events as $event) {
        $e = $event['event'];
        $v = $event['variables'];
        if (strpos($v, 'array') !== false) {
            $v = preg_replace('#^\s*\'#m', '<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\'', $v);
            $v = preg_replace('#^\s*\)\);#m', '<br />&nbsp;&nbsp;&nbsp;&nbsp;);', $v);
        } else {
            $v = str_replace(');', '', $v);
        }
        $output .= "\n<p>&nbsp;&nbsp;&nbsp;&nbsp;Event: <b>" . $e . '</b><br />';
        $output .= "\n&nbsp;&nbsp;&nbsp;&nbsp;Variables: " . $v . '</p>';
    }
}

return $output;

The Regex Patterns

The main regular expression pattern in the code is in the overridden processFile() method:

$pattern = '#[^\s]invokeEvent\([\'\"]*(\$*[^,"\']+)[\'\"\s]*,?([^;]*;)#s';

It’s a little difficult to read because some of the parentheses represent capture groups and some are looking for actual parentheses in the file. Here are some examples of what it has to find:

$modx->invokeEvent('OnFileManagerUpload', array(
    'files' => &$objects,
    'directory' => $container,
    'source' => &$this,
));

$modx->invokeEvent('OnHandleRequest');

$modx->invokeEvent($event, array(
    'context' => $context,
));

$modx->invokeEvent('OnPageNotFound', $options);

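Surprisingly, these can all be found (with no false positives) by the single regular expression above. The pound sign (#) is the delimiter for the expression. The ‘s’ after the final pound sign makes a dot match newline characters. The pattern contains no dots, so it is really the negated character classes such as [^;], which match newlines on their own, that let a match run over multiple lines.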

In English, the pattern starts by looking for any non-whitespace character ([^\s]) followed immediately by the term ‘invokeEvent’ and an opening parenthesis. This avoids capturing references to ‘invokeEvent’ that follow a space, such as those in comments and in the definition of the invokeEvent() function itself. Next, the pattern searches for zero or more single or double quotes, because both are used to surround the event name in MODX invokeEvent() calls, and in the case of dynamically called events like the third example, there are no quotation marks.

After eating any quotation mark, the first capture group begins. It matches zero or more dollar signs followed by a run of anything but a comma or a quotation mark. Then the pattern eats (without capturing) any closing quotation mark or whitespace, and then a comma, if there is one.

After that, we get the second capture group, which captures the second argument of the call up to and including the final semicolon of the statement.
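
Here is a small sketch you can run on its own to watch the two capture groups in action. The pattern is copied from the code above. The sample source text is invented for the test and simply mimics the kinds of calls shown earlier, plus a function definition that should be skipped because ‘invokeEvent’ follows a space.

```php
<?php
$pattern = '#[^\s]invokeEvent\([\'\"]*(\$*[^,"\']+)[\'\"\s]*,?([^;]*;)#s';

/* Invented sample text; the nowdoc keeps the dollar signs literal */
$source = <<<'SRC'
public function invokeEvent($eventName) {}
$modx->invokeEvent('OnHandleRequest');
$modx->invokeEvent('OnPageNotFound', $options);
$modx->invokeEvent($event, array(
    'context' => $context,
));
SRC;

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    /* $match[1] is the event name, $match[2] is the rest of the call */
    echo 'Event: ' . $match[1] . "\n";
    echo 'Rest:  ' . trim($match[2]) . "\n\n";
}
```

The function definition on the first line is ignored, and the other three calls are found. For the call with no second argument, the second group is just ‘);’, which is why the main code swaps that value for ‘None’.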

After the preg_match_all() call runs, we look at the array to see if there are any matches. If there are, we put them in an array where the main key is the shortened path to the file and the sub-arrays each contain the name of the event and the variables (if any) in its second argument:

'core/model/modx/modmanagerrequest.class.php' => array(
    0 => array(
        'event' => 'OnHandleRequest',
        'variables' => 'None',
    ),
    1 => array(
        'event' => 'OnManagerPageInit',
        'variables' => array(
            'action' => $this->action
        ),
    ),
),

There are two preg_replace() calls in the section that produces the output:

$v = preg_replace('#^\s*\'#m', '<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\'', $v);
$v = preg_replace('#^\s*\)\);#m', '<br />&nbsp;&nbsp;&nbsp;&nbsp;);', $v);

These simply replace any number of spaces with a fixed number of non-breaking space entities, so they’re all indented by the same amount. This is necessary because the number of leading spaces in the source code varies depending on how deeply nested the invokeEvent() call is. Without the adjustment, some of them would be quite far to the right in the output. If I weren’t so lazy, all of the spaces would be removed and each line would get a CSS class that specified the indent.
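
The effect is easy to check with a short standalone sketch. The normalizeIndent() wrapper and the two sample strings are made up for the test; the two preg_replace() calls inside it are the ones from the report code.

```php
<?php
/* Hypothetical wrapper around the two replacements used in the report */
function normalizeIndent($v) {
    $v = preg_replace('#^\s*\'#m', '<br />' . str_repeat('&nbsp;', 8) . '\'', $v);
    $v = preg_replace('#^\s*\)\);#m', '<br />' . str_repeat('&nbsp;', 4) . ');', $v);
    return $v;
}

/* The same captured argument text, nested at two different depths */
$shallow = " array(\n    'action' => \$this->action,\n));";
$deep    = " array(\n                    'action' => \$this->action,\n                ));";

echo normalizeIndent($shallow) . "\n\n";
echo normalizeIndent($deep) . "\n";
```

Both strings come out identical, because the leading whitespace on each line is replaced rather than kept.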

The Output

The output of the code above is much too long to post in this article, but you can see it here. It’s a little dressed up and has a jumplist with links to each event at the top. We’ll see the code for that in a later article. After spending all night fine-tuning the code, I didn’t have the energy left to do much fancy styling. The report shows all of the event invocations in the MODX core and manager directories (excluding the few events that fire in add-on components), the files they occur in, and the arguments sent in each call to invokeEvent().

Getting DirWalker

DirWalker is a single class file. You can see it at GitHub.

You can also install it in MODX through Package Manager (though the class does not require MODX) or get it at the MODX Repository. If you install the package, you’ll also get several files showing examples of how to use DirWalker to produce reports containing information gleaned from the MODX Codebase.


For more information on how to use MODX to create a web site, see my web site Bob’s Guides, or better yet, buy my book: MODX: The Official Guide.

Author Spotlight

Bob Ray

I am the author of MODX: The Official Guide and over 30 MODX add-on components. I host Bob's Guides, a source of valuable information for MODX users, and I've been very active in the MODX Forums with over 18,000 posts.
