Parsing XML With Namespaces with SimpleXML

There’s one thing SimpleXML for PHP is horrible to use for: parsing XML containing namespaces. Namespaces requires special handling, and the only way I’ve found that allows you to refer to an element in another namespace, is to use the ->children() method with the namespace. I’m sure there’s an easier way than this, and if you know of any, please leave a comment!

Let’s start with the following XML snippet (using SOAP as an example):


    
        
            asdasd
        
    

The easiest way to do this is to “ignore” the namespaces, and simply do $root->{soap:Envelope} that to access the property. This will not work, as SimpleXML is quite peculiar about it’s namespaces (.. while everything else is simple and easy to use).

One solution is to provide the namespace you’re interested in to the $element->children() method, which returns all the children of the element in a particular namespace (or without arguments, outside any namespace):

$sxml = new SimpleXMLElement(file_get_contents('soap.xml'));

foreach($sxml->children('http://www.w3.org/2001/12/soap-envelope') as $el)
{
    if ($el->getName() == 'Body')
    {
        /* ... */
    }
}

Yes. That’s quite horrible.

But luckily the xpath method can help us:

$elements = $sxml->xpath('//soap:Envelope/soap:Body/queryInstantStreamResponse');

This will actually fetch all the elements titled “queryInstantStreamResponse” which are childs of soap:Envelope and soap:Body. And this works as you expect it to, without having to use children, provide the actual namespace URI, etc.

The xpath method returns an array containing all the matching elements, so in this case you’ll receive an array with a single element, containing the text inside the queryInstantStreamResponse element.

There should be an easier way than this.

PHP Namespaces and the What The Fuck problem

As many people has discovered during the last days, some of the developers behind PHP has finally reached a decision regarding how to do introduce namespaces into PHP. This is an issue which has been discussed on and off for the last three years (or even longer, I can’t really keep track), with probably hundreds of threads and thousand of mail list posts (read through the php.internals mail list archive if you’re up for it). The current decision is to go with \ as the namespace separator. An acceptable decision by itself, and while I personally favored a [namespace] notation, I have no reason to believe that this is not a good solution.

There does however seem to be quite a stir on the internet in regards to decision, which I now will call the “What The Fuck Problem”. Most (if not all) public reactions on blogs, reddit, slashdot and other technology oriented sites can be summed up as “What The Fuck?”. While I’m not going to dwell into the psychological reasons for such a reaction, my guess is that it’s unusual, lacks familiarity for developers in other languages already supporting namespaces and that it might be hard to understand the reasoning behind such a decision.

The problem: to retrofit namespaces into a language which were not designed to support such a construct, without breaking backwards compability. The part about not breaking backwards compability is very important here, as it leaves out everything which could result in a breakage in existing code by simply using a new PHP version.

The discussions have been long, the attempts has been several (thanks to Greg Beaver’s repeated persistence in writing different implementations along the way) and each and every single time an issue has crept in which either breaks existing functionality or results in an ambigiuity when it comes to resolving the namespace accessed. Most issues and the explaination why they are issues, are documented at Namespace Issues in the PHP Wiki. This provides some insight into why the decision to use a separate identifier was chosen.

This seemed to get through in the flamewars discussions after a while, and people instead started to point out the “gigantic flaw”:

If you were to load a class or a namespace dynamically by referencing it in a string, you’d have to take care to escape your backslash:

spl_autoload_register(array("Foo\tBar", "loader"));

.. would mean Foo<tab>Bar. Yep. It actually would. And if that’s the _biggest_ problem with the implementation of using \ as a namespace separator, then I feared its introduction earlier for no apparent reason.

The other examples has plagued us with ambiguity issues, autoloading issues, no-apparent-way-of-putting-functions-and-constants-in-the-namespace-issues and other problems. This way we are left with the task that we, as usual, have to escape \ in a string — when we want to reference a namespace in a dynamic name — and that’s the biggest problem?

I just hope that people keep it sane and don’t implement any special behaviour in regards to how strings are parsed in regards to new, $className::. This _will_ cause problems and magic issues down the road if it ever gets into the engine.

PHP is free, it’s open and it’s yours for the taking. Fork it if you want to, or provide a patch which solves the problem in a better way. The issue has been discussed to death, and so far there is no apparent better solutions on the table. If you have one, you’ve already had three years to suggest it. You better hurry if you’re going to make people realize it now.

Additional observation: most people who has an issue with this, seems to be the same people who would rather be caught dead than writing something in PHP. Yes. Python and Java does it prettier. That’s not a solution to the problem discussed in regards to PHP.