Using web services and XSLT to scrape RSS from HTML
After tinkering a bit last week with web services and XSLT-based scraping to generate RSS from HTML, I ripped out some work from a Java-based scraper I'd started last year and threw together a kit of XSLT files that does most of what I was trying to do.
I'm calling this kit XslScraper, and there's further blurbage and download links available in the Wiki. Check it out. I've got shell scripts to run the stuff as a cron job, and CGI scripts to run it all from web services.
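If you just want a feel for what the scraping step boils down to, here's a rough sketch. It is not XslScraper's code: the kit does this with XSLT, while this stand-in uses only the Python standard library, and the names `LinkCollector`, `html_to_rss`, and `SAMPLE_HTML` are made up for illustration. It assumes every link with text on the page should become a feed item, which a real scraper would narrow down per site.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href="..."> that has link text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


def html_to_rss(html, feed_title, feed_link):
    """Turn the links found in an HTML string into a bare-bones RSS 2.0 document."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped from " + feed_link
    for href, title in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page content, standing in for a fetched site.
SAMPLE_HTML = """
<html><body>
  <h1>News</h1>
  <a href="http://example.com/one">First   story</a>
  <a href="http://example.com/two">Second story</a>
  <a name="top"></a>
</body></html>
"""

if __name__ == "__main__":
    print(html_to_rss(SAMPLE_HTML, "Example feed", "http://example.com/"))
```

Run from a cron job, a script like this would write its output to a file that a web server hands out as the feed. The XSLT approach gets the same result by matching on the page's structure instead of walking it by hand.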
For quick gratification, check out these feeds:
- The Nation (using Bill Humphries' XSL)
- KurzweilAI.net
- J-List -- You've got a friend in Japan!
- New JOBS at the University of Michigan (By Job Family)