Scraping with web services: Success
Okay, so I took another shot at scraping HTML with web services, this time using a site whose HTML makes it through the HTML Tidy step. Luckily, it's a site I already scrape with my own tool, so I have XPath expressions already cooked up to dig out info for RSS items. So, here are the vitals:
- Site: http://www.jlist.com
- XSL: http://www.decafbad.com/jlist.xsl
- Tidy URL: http://cgi.w3.org/cgi-bin/tidy?
- Final URL: http://www.w3.org/2000/06/webdata/xslt?
<p>Unfortunately, although it looks okay to me, this feed <a href="http://feeds.archive.org/validator/check?url=http%3A%2F%2Fwww.w3.org%2F2000%2F06%2Fwebdata%2Fxslt%3Fxslfile%3Dhttp%253A%252F%252Fwww.decafbad.com%252Fjlist.xsl%26xmlfile%3Dhttp%253A%252F%252Fcgi.w3.org%252Fcgi-bin%252Ftidy%253FdocAddr%253Dhttp%25253A%25252F%25252Fwww.jlist.com%25252FUPDATES%25252FPG%25252F365%25252F%26transform%3DSubmit">doesn’t validate yet</a>. I’m still poking around with it to get things straight. Feel free to help me out! :)</p>
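For anyone who wants to poke at it, here is a rough Python sketch of how the pieces chain together: the page URL gets wrapped in a Tidy request, the Tidy request gets wrapped in an XSLT request, and that in turn gets handed to the validator. The query parameter names (`docAddr`, `xslfile`, `xmlfile`, `transform`, `url`) and the page path are read straight off the validator link above. The function names are just made up for illustration, and the sketch only builds the URLs; it doesn't assume the services will answer.

```python
from urllib.parse import urlencode

# Service endpoints as listed in the vitals above.
TIDY_URL = "http://cgi.w3.org/cgi-bin/tidy?"
XSLT_URL = "http://www.w3.org/2000/06/webdata/xslt?"
VALIDATOR_URL = "http://feeds.archive.org/validator/check?"


def build_feed_url(page_url, xsl_url):
    """Wrap a page in the Tidy service, then feed that to the XSLT service.

    Hypothetical helper: parameter names are taken from the validator link.
    """
    # Step 1: ask Tidy to turn the page's HTML into well-formed XHTML.
    tidy_request = TIDY_URL + urlencode({"docAddr": page_url})
    # Step 2: ask the XSLT service to apply the stylesheet to Tidy's output.
    # urlencode escapes the nested URL, so each layer adds a level of %-encoding.
    return XSLT_URL + urlencode(
        {"xslfile": xsl_url, "xmlfile": tidy_request, "transform": "Submit"}
    )


def build_validator_url(feed_url):
    """Wrap the finished feed URL in a feed validator request (hypothetical helper)."""
    return VALIDATOR_URL + urlencode({"url": feed_url})


if __name__ == "__main__":
    feed = build_feed_url(
        "http://www.jlist.com/UPDATES/PG/365/",
        "http://www.decafbad.com/jlist.xsl",
    )
    print(feed)
    print(build_validator_url(feed))
```

Each wrapping step re-escapes everything inside it, which is why the page address ends up triple-encoded (`%25253A` and friends) by the time it reaches the validator.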