0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

Blosxom, Tiger, and Spotlight

Another tinkering project idea: Use Spotlight and HFS Extended Attributes in OS X Tiger to build an enhanced rendition of Blosxom. Blosxom already uses file modification times to track blog posting dates, but now Spotlight offers indexes that are fairly easy to query from the command line or from scripts. I think it could also be used to facilitate search within blog postings, which are just text files anyway.
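Here's a rough sketch of what that search might look like from a script. It shells out to mdfind, Spotlight's command-line query tool, restricts the search to one directory, and sorts the hits newest-first by modification time the way Blosxom would. The function names and the blog directory are my own placeholders, and the search word is assumed to be a plain word with no quotes or wildcards in it.

```python
import os
import shutil
import subprocess


def build_query(word):
    """Build a Spotlight query matching `word` anywhere in a file's text.

    The trailing "cd" asks for a case- and diacritic-insensitive match.
    Assumes `word` contains no quotes, backslashes, or wildcards.
    """
    return 'kMDItemTextContent == "*%s*"cd' % word


def sort_by_mtime(paths):
    """Order paths newest-first by modification time, Blosxom style."""
    return sorted(paths, key=os.path.getmtime, reverse=True)


def search_entries(blog_dir, word):
    """Return entries under blog_dir whose text contains word, newest first."""
    if shutil.which("mdfind") is None:
        raise RuntimeError("mdfind not found; Spotlight only exists on OS X")
    result = subprocess.run(
        ["mdfind", "-onlyin", blog_dir, build_query(word)],
        capture_output=True,
        text=True,
        check=True,
    )
    # mdfind prints one matching path per line.
    paths = [line for line in result.stdout.splitlines() if line]
    return sort_by_mtime(paths)
```

Something like search_entries("/Users/me/blog", "coffee") would then hand back the matching postings in blog order, assuming the directory is on a volume Spotlight has indexed.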

Take this further: What if I could just set arbitrary name/value pairs on files and get them indexed? Title, tags, related links, author-- all things that would have been inside the Blosxom entry. But now, they're in the file system and subject to Spotlight indexing and queries. Hell, what if I could do this for any kind of file, not just text files?

On a mailing list, Lee Morgan asks this question:

I realize that the new HFS Extended Attributes are not exposed to the higher level API's yet, and also that Spotlight doesn't extract their contents during indexing. However I'm working on a application that would greatly benefit from Spotlight indexing this information. Is it possible to write a Spotlight importer that handles every file with out it interfering with other plug-ins? So that I can extract the extended attributes and add it to the Spotlight store.

If anything, this is my biggest disappointment with Tiger and Spotlight. So close to an open database-like filesystem like BeFS, but so far away.

As I understand it, the issue is this:

  • HFS Extended Attributes allow you to attach arbitrary name/value pairs to files-- e.g. lmoProject = "Client #2345".
  • Spotlight indexes name/value metadata associated with files-- e.g. kMDItemDisplayName = "Unison".
  • However, Spotlight doesn't index the name/value metadata from HFS Extended Attributes. Instead, the source of name/value data for Spotlight comes from importer plugins which interrogate file contents. Huh, that kinda sucks: HFS metadata and Spotlight indexing seem made for each other.
  • Spotlight importers are keyed on the kMDItemContentType of a file, and there can be only one Spotlight importer per file type. Hmm, that seems limiting.
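To make the first bullet concrete, here's a minimal sketch of attaching and reading back a name/value pair from a script. It assumes an xattr command-line tool is on the path (later OS X releases ship one; I'm not counting on a stock Tiger install having it), and the helper names are mine. Note that nothing here gets the value into the Spotlight index, which is exactly the gap described above.

```python
import subprocess


def write_cmd(path, name, value):
    """Command line that sets extended attribute `name` to `value` on `path`."""
    return ["xattr", "-w", name, value, path]


def read_cmd(path, name):
    """Command line that prints the value of extended attribute `name`."""
    return ["xattr", "-p", name, path]


def set_attr(path, name, value):
    """Attach a name/value pair to a file (requires the xattr tool)."""
    subprocess.run(write_cmd(path, name, value), check=True)


def get_attr(path, name):
    """Read a name/value pair back from a file (requires the xattr tool)."""
    result = subprocess.run(
        read_cmd(path, name), capture_output=True, text=True, check=True
    )
    return result.stdout.rstrip("\n")
```

With that in place, set_attr("proposal.txt", "lmoProject", "Client #2345") would tag a file, and get_attr("proposal.txt", "lmoProject") would read the tag back.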

My sad conclusion? Since there can only be one Spotlight importer per type, and none of these handle HFS Extended Attributes, I can see no way to hack an HFS Extended Attribute importer into Spotlight across all file types without either replacing every importer or hacking the Spotlight core itself.

I suppose I could create a "new" file type that's really just a text file, and write an importer for it that can read RFC822-inspired headers in blog postings and index those.
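The header-reading half of that idea is easy to sketch, since Python's standard email module already parses RFC822-style headers. The entry layout here (headers, a blank line, then the body) and the Title/Tags/Author field names are my own assumptions, not something Blosxom defines; a real importer would then have to hand these values to Spotlight, which this sketch doesn't attempt.

```python
from email import message_from_string


def parse_entry(text):
    """Split a blog posting into (metadata dict, body text).

    Assumes RFC822-style headers at the top, then a blank line, then the
    body. Header names are lowercased; a comma-separated "Tags" header is
    turned into a list.
    """
    msg = message_from_string(text)
    meta = {name.lower(): value for name, value in msg.items()}
    raw_tags = meta.get("tags", "")
    meta["tags"] = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
    return meta, msg.get_payload()
```

Feeding it a posting that starts with "Title: ..." and "Tags: osx, spotlight" lines would give back a dict with a "title" string and a "tags" list, plus the remaining text as the body.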

But I'd really like to rev things up by including MP3s (podcasts), MOVs (screencasts), and lots of other things besides text. However, I want to apply my own arbitrary metadata to them--like tags and related links and contributors and such. Instead, I'm stuck with whatever existing importers supply.

Granted, these importers supply a lot, which is better than nothing, and I can certainly think of some ways to hack things... But, it's just shy of being great.

Archived Comments

  • The importer interface is public, right? Is there anything that prevents someone from writing an importer that handles all files, but doesn't actually do anything, and just passes calls through to any number of other importers that it calls on without the aid of Spotlight?
  • SpotMeta extends Spotlight so you can add arbitrary metadata. It uses xattrs to store them, and is written in such a way that other developers can easily use them too.