0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

Python RDF repository wanted for web proxy metadata harvester

Okay, this is getting close to outstripping my enthusiasm and invoking my laziness: Does anyone happen to have RDFLib and ZODB working under Mac OS X 10.2.3? I've also tried compiling Redland and its Python and Java APIs, but that hasn't been a 100% success. Or can someone recommend another decent RDF repository to play with under Python? I've had fun with Jena under Java, love using RDQL, and dig switching between MySQL and BDB stores.



I want an RDF repository I can integrate into my proxy experiments, currently implemented in Python. I've been very tempted to switch to Java, which I know better and for which I have a better sense of the tools available. But I'm still pulling for Python. I suppose I could just go with an in-memory repository at first, but I don't want to stick with that.
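For a sense of how little the in-memory fallback would need, here's a rough sketch in plain Python. The `MemoryStore` class and its `add` and `match` methods are made-up names for illustration, not the API of RDFLib, Redland, or Jena, and nothing here persists to disk.

```python
class MemoryStore:
    """Hypothetical in-memory triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # A set silently ignores duplicate statements.
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard; yield every triple matching the pattern.
        pattern = (subject, predicate, obj)
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple


store = MemoryStore()
store.add("http://example.org/", "http://purl.org/dc/elements/1.1/title", "Example")
store.add("http://example.org/", "http://example.org/terms#referrer", "http://example.com/")

# Everything known about one page:
for statement in store.match(subject="http://example.org/"):
    print(statement)
```

Every query here is a linear scan with nothing saved between runs, which is exactly why I wouldn't want to stay on something like this for long.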



I'm still finishing up the PersonalWebProxy notes and plan I've been working on, but I've still got an itch to play in code. The next major thing I want to do is extract as much metadata as I can from every HTML page I visit and load the RDF repository up with statements based on what I harvest. Examples would include things like HTML title, visitation date, referring URL, any meta tags, any autodiscovered RSS and FOAF URLs, and anything else I could eventually dig out. Then, I want to amass some data and play with it. I'm thinking this could give me a kind of uber-history with which to work.
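The harvesting step might look something like the sketch below, which uses only the Python standard library and emits plain (subject, predicate, object) tuples rather than talking to any particular RDF store. The `MetadataHarvester` class, the `harvest` function, and the `http://example.org/proxy-terms#` vocabulary are all placeholders I've invented for illustration. Only the Dublin Core title URI is a real, published term. The `<link>` patterns it looks for (rel="alternate" with an RSS type, rel="meta" with an RDF type) are assumed autodiscovery conventions, and real pages will vary.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.parse import urljoin

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
EX = "http://example.org/proxy-terms#"  # hypothetical vocabulary, not a real schema


class MetadataHarvester(HTMLParser):
    """Collects the title, meta tags, and autodiscovery links from one page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.meta = {}
        self.rss = []
        self.foaf = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.meta[a["name"].lower()] = a["content"]
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            mime = (a.get("type") or "").lower()
            if "alternate" in rel and mime == "application/rss+xml":
                self.rss.append(a["href"])
            elif "meta" in rel and mime == "application/rdf+xml":
                self.foaf.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def harvest(url, html, referrer=None, visited=None):
    """Return a list of (subject, predicate, object) tuples for one page visit."""
    parser = MetadataHarvester()
    parser.feed(html)
    parser.close()
    visited = visited or datetime.now(timezone.utc)

    triples = [(url, EX + "visited", visited.isoformat())]
    title = "".join(parser.title_parts).strip()
    if title:
        triples.append((url, DC_TITLE, title))
    if referrer:
        triples.append((url, EX + "referrer", referrer))
    for name, content in parser.meta.items():
        triples.append((url, EX + "meta-" + name, content))
    # Resolve relative hrefs against the page URL before storing them.
    for href in parser.rss:
        triples.append((url, EX + "rssFeed", urljoin(url, href)))
    for href in parser.foaf:
        triples.append((url, EX + "foafProfile", urljoin(url, href)))
    return triples
```

The proxy would call `harvest(url, body, referrer)` on each HTML response it passes through and hand the resulting tuples to whatever repository ends up working.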



Update: Seems like I managed to get Python, RDFLib, and ZODB working, but I started completely from scratch and compiled everything from clean source. I guess Apple's build of Python has more hiccups in it than just the Makefile thing.

Archived Comments

  • I'm glad to hear you got everything working. Your choice of ZODB might make searching easier too. Especially since ZCatalog seems to be the best searching solution for Python. Python text searching is a hot topic lately. Here are some related links:
    http://www.oreillynet.com/pub/wlg/2317
    http://blogs.salon.com/0000002/2002/12/17/search_engine_in_python.html
    http://blogs.salon.com/0000002/2002/12/18/adapting_other_search_engines.html
    http://www.zopenx.net/archives/000570.html
    http://www.pycs.net/archive/2002/12/19/
    http://blogs.salon.com/0000002/2002/12/31/#200212311
    - Matt Griffith
  • If you haven't already, you might want to take a look at some of the OSAF/Chandler postings, since they're also using ZODB. They intend to provide an RDF interface, but I think that will be a "layer", not the native/primary interface... http://www.google.com/search?q=site%3Alists.osafoundation.org%20zodb%20rdf
  • Just to make sure--have you seen this? It might have some useful bits related to zodb/python/etc. http://www.zope.org/Members/raystream/zzKnowMan
  • Dude, Just switch to Java already ;) - itdp