0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

XPath-based Python dictionaries, on loan

So Kimbro Staken posted this neat idea for building XPath-based Python dictionaries to access XML data, as part of his incredibly nifty Syncato microcontent management system. Eventually I've really got to break down and get that thing built and running on my server and my laptop; it really seems like I'm reinventing so many wheels by not basing dbagg3 on it.

But, while I'm in the process of wheel reinvention, how about I borrow Kimbro's idea? I just threw together a quick class called XPathDict, based on libxml2. It works a little something like this:

feed_xd = XPathDict(file="sample-atom.xml")
for entry_node in feed_xd.nodes("//atom:entry"):
    # Wrap each matching node in its own XPathDict, rooted at that entry
    entry = XPathDict(doc=entry_node.doc, node=entry_node)
    print("Title: %s" % entry['atom:title'])
    if 'atom:author' in entry:
        print("Author: %s" % entry['atom:author/atom:name'])

xml = """
   <dbagg3:user xmlns="http://purl.org/atom/ns#" 
            xmlns:dbagg3="http://decafbad.com/2004/07/dbagg3/">
        <name>deusx</name>
        <email>deus_x@pobox.com</email>
        <url>http://www.decafbad.com/</url>
        <dbagg3:prefs>
            <dbagg3:pref name="foo">bar</dbagg3:pref>
        </dbagg3:prefs>
   </dbagg3:user>
"""

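# (result key, XPath relative to the current node) pairs
field_map = (
    ('userName',  'a:name'),
    ('userEmail', 'a:email'),
    ('fooPref',   "dbagg3:prefs/dbagg3:pref[@name='foo']")
)

xd = XPathDict(xml=xml)
xd.cd("/dbagg3:user")    # move the context node to the user element
print(xd.extract(field_map))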

#    {'userName'  : 'deusx', 
#     'userEmail' : 'deus_x@pobox.com', 
#     'fooPref'   : 'bar'}

There isn't any spectacular code behind all this, and the idea was Kimbro's, but it's working. It's also incredibly convenient, especially with the little XML-to-dict extraction map method I whipped up. It would take a bit more work to pry it out of its current context: turning the hardcoded namespaces into an option, for one, among other things. But here's the code for you to peruse.
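For anyone who wants to play with the idea without libxml2, here's a rough sketch of the same dictionary-over-XPath approach. It's an assumption-laden stand-in, not the code above: it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and the constructor parameters (`xml`, `element`, `namespaces`), the `DEFAULT_NAMESPACES` table, and the lack of a `cd()` method are all hypothetical. The `namespaces` argument shows one way the hardcoded prefixes could become an option.

```python
import xml.etree.ElementTree as ET

# Hypothetical default prefix table; callers can pass their own instead.
DEFAULT_NAMESPACES = {
    "atom": "http://purl.org/atom/ns#",
    "dbagg3": "http://decafbad.com/2004/07/dbagg3/",
}


class XPathDict:
    """Dict-like view of one XML element, keyed by (limited) XPath paths.

    Sketch only: ElementTree evaluates paths relative to this element
    (e.g. './/atom:entry', not '//atom:entry').
    """

    def __init__(self, xml=None, element=None, namespaces=None):
        if element is None:
            element = ET.fromstring(xml)
        self.element = element
        self.namespaces = dict(DEFAULT_NAMESPACES if namespaces is None else namespaces)

    def nodes(self, path):
        # Every matching element, wrapped so it shares this prefix table.
        return [XPathDict(element=e, namespaces=self.namespaces)
                for e in self.element.findall(path, self.namespaces)]

    def __getitem__(self, path):
        # Text of the first match; a missing path behaves like a missing key.
        found = self.element.find(path, self.namespaces)
        if found is None:
            raise KeyError(path)
        return found.text

    def __contains__(self, path):
        return self.element.find(path, self.namespaces) is not None

    def extract(self, field_map):
        # Build {key: text} from (key, path) pairs, skipping paths with no match.
        return {key: self[path] for key, path in field_map if path in self}


USER_XML = """<dbagg3:user xmlns="http://purl.org/atom/ns#"
    xmlns:dbagg3="http://decafbad.com/2004/07/dbagg3/">
  <name>deusx</name>
  <email>deus_x@pobox.com</email>
  <url>http://www.decafbad.com/</url>
  <dbagg3:prefs><dbagg3:pref name="foo">bar</dbagg3:pref></dbagg3:prefs>
</dbagg3:user>"""

field_map = (
    ("userName", "atom:name"),
    ("userEmail", "atom:email"),
    ("fooPref", "dbagg3:prefs/dbagg3:pref[@name='foo']"),
)

user = XPathDict(xml=USER_XML)   # the root element is already the context node
print(user.extract(field_map))
# {'userName': 'deusx', 'userEmail': 'deus_x@pobox.com', 'fooPref': 'bar'}
```

Because the root element is the starting context here, the sketch doesn't need a `cd()` step; `nodes()` returns wrapped sub-elements instead, which covers the per-entry loop in the first example.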

(I got hooked early on subverting built-in language constructs through Perl's tie facilities and C++'s operator overloading. Now I'm loving Python's special class methods. Someday, maybe, I'll actually get down to doing some work in LISP and wrap my head around some real language subversion.)
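As a small illustration of how those special methods stand in for Perl's tie, here's a hypothetical case-insensitive dictionary (the class and its behavior are made up for the example). Subclassing collections.abc.Mapping means only `__getitem__`, `__iter__` and `__len__` are written by hand; the mixin derives `in`, `get()`, `keys()` and friends from them, so ordinary dict syntax routes through custom code.

```python
from collections.abc import Mapping


class CaseInsensitiveDict(Mapping):
    """Read-only mapping whose keys ignore case (illustrative example)."""

    def __init__(self, data):
        # Store everything under lowercased keys.
        self._data = {key.lower(): value for key, value in data.items()}

    def __getitem__(self, key):
        # d[key] lands here; Mapping's __contains__ and get() call it too.
        return self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


headers = CaseInsensitiveDict({"Content-Type": "text/xml"})
print(headers["content-type"])           # text/xml
print("CONTENT-TYPE" in headers)         # True
print(headers.get("X-Missing", "n/a"))   # n/a
```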

Anyway, while this is neither quite Native XML Scripting nor XML as a native language construct, it's getting there.

Archived Comments

  • Funny that. I also have one that has survived a couple of failed apps. I have a hard time dropping it, to be honest, and just keep lugging it around to each new project. http://naeblis.cx/cvs/percolator/xb/lib/xpdm.py?rev=HEAD&content-type=text/vnd.viewcvs-markup It has some pretty big issues. Among other things, creating nodes with namespace support is a little... ermmm... not there. But it does a lot of things well, like garbage collecting xmlDoc instances (freeDoc), copying nodesets between documents, encoding things when they need to be, etc. Anyway, I wonder if we all might benefit by teaming up on this and trying to define what a complete XPath-ish wrapper atop libxml2 should look like. And really, why limit it to libxml2? I'm of the opinion that the value here is an interface that embraces XPath. The fact that it's running on top of the blazingly fast libxml2 is nice, but coding against the xmltramp-like interface is the value for me. So let me see if I can get some time together to whip up a quick comparison of the three implementations. I'll shoot that over to you and Kimbro and we can go from there. If these seem to work best as backyard APIs we like to keep close to us, we'll drop it. However, I think there's a good chance that we can all benefit by combining our efforts.
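The comment's point about valuing an XPath-centric interface over any particular XML library can be pictured as a front end that depends only on a tiny backend contract. This is a hypothetical sketch: `XPathBackend`, `ElementTreeBackend`, `CannedBackend` and `XPathView` are invented names, and the standard library's ElementTree stands in for libxml2. A libxml2 or lxml backend would only need to supply the same `texts()` method.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Protocol


class XPathBackend(Protocol):
    """Hypothetical minimal contract: evaluate a path, return matching texts."""

    def texts(self, path: str) -> List[str]: ...


class ElementTreeBackend:
    """Standard-library backend; ElementTree supports only a subset of XPath."""

    def __init__(self, xml: str, namespaces: Dict[str, str]):
        self.root = ET.fromstring(xml)
        self.namespaces = namespaces

    def texts(self, path: str) -> List[str]:
        return [node.text or "" for node in self.root.findall(path, self.namespaces)]


class CannedBackend:
    """Fake backend with precomputed answers, e.g. for unit tests."""

    def __init__(self, answers: Dict[str, List[str]]):
        self.answers = answers

    def texts(self, path: str) -> List[str]:
        return self.answers.get(path, [])


class XPathView:
    """Dictionary-style front end that only talks to an XPathBackend."""

    def __init__(self, backend: XPathBackend):
        self.backend = backend

    def __getitem__(self, path: str) -> str:
        found = self.backend.texts(path)
        if not found:
            raise KeyError(path)
        return found[0]

    def __contains__(self, path: str) -> bool:
        return bool(self.backend.texts(path))


feed = ('<feed xmlns="http://purl.org/atom/ns#">'
        '<entry><title>First</title></entry>'
        '<entry><title>Second</title></entry></feed>')
ns = {"atom": "http://purl.org/atom/ns#"}

# Same front-end code, two interchangeable backends.
for view in (XPathView(ElementTreeBackend(feed, ns)),
             XPathView(CannedBackend({"atom:entry/atom:title": ["First", "Second"]}))):
    print(view["atom:entry/atom:title"], "atom:author" in view)   # First False
```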