0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

PersonalWebProxy as personal Google and Wayback machine

Matt Griffith proposes a virtual project: Jog. For the most part, what he wants is what I want from my PersonalWebProxy, and more.



The big difference in the writing, though, is that Matt writes from features and what he wants, where I'm already describing things in terms of implementation. That is, I started talking about "proxy" where he's talking about "my personal Google and Wayback machine". I think looking at it that way makes a more compelling case for this thing being generally useful, rather than just some nerdy toy.



Another way I'm looking at this PersonalWebProxy is as an assistant in a sidecar attached to my browser. I want this assistant to watch me, learn, and pipe up from time to time with suggestions. I also want to be able to ask it questions and have it remind me of things I vaguely remember. Eventually, I'd like this assistant to be able to drive for me from time to time, doing some info hunter-gatherer work for me while I do other things.



I'm still working on this thing. So far I've got a proxy in Python and a simple-minded plugin framework. Two plugins so far: one is a cookie jar separated from any browser - that is, cookies are managed in the proxy, not in the browser; the other is a little thing based on Mark Pilgrim's rssfinder.py that quietly seeks out and gathers RSS links from every text/* resource I view. It seems to be standing up fairly well.
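As a rough sketch of how a plugin framework like this might hang together, here is a toy version of the two plugins. This is not the actual proxy code: every class, hook, and function name below is made up for illustration, headers are simplified to a plain dict, and the feed finder is a much cruder stand-in for rssfinder.py that only looks at `<link rel="alternate">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def header(headers, name):
    """Case-insensitive lookup in a plain dict of headers."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ""


class Plugin:
    """Hypothetical base class: the proxy calls these hooks on every exchange."""

    def on_request(self, url, headers):
        return headers

    def on_response(self, url, headers, body):
        return headers, body


class CookieJar(Plugin):
    """Keeps cookies in the proxy, per host, so the browser never sees them."""

    def __init__(self):
        self.jar = {}  # host -> {cookie name: value}

    def on_request(self, url, headers):
        cookies = self.jar.get(urlsplit(url).hostname, {})
        out = {k: v for k, v in headers.items() if k.lower() != "cookie"}
        if cookies:
            out["Cookie"] = "; ".join(f"{n}={v}" for n, v in cookies.items())
        return out

    def on_response(self, url, headers, body):
        # Simplification: a dict holds only one Set-Cookie header per response,
        # and cookie attributes (path, expiry, domain) are ignored.
        host = urlsplit(url).hostname
        out = {}
        for key, value in headers.items():
            if key.lower() == "set-cookie":
                name, _, val = value.split(";", 1)[0].partition("=")
                self.jar.setdefault(host, {})[name.strip()] = val.strip()
            else:
                out[key] = value
        return out, body


class _FeedLinks(HTMLParser):
    """Collects href values from <link rel="alternate"> tags with an RSS type."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        is_rss = (a.get("type") or "").lower() == "application/rss+xml"
        if "alternate" in rel and is_rss and a.get("href"):
            self.hrefs.append(a["href"])


class FeedCollector(Plugin):
    """Quietly gathers RSS links from every text/* resource passing through."""

    def __init__(self):
        self.feeds = set()

    def on_response(self, url, headers, body):
        if header(headers, "Content-Type").startswith("text/"):
            parser = _FeedLinks()
            parser.feed(body)
            self.feeds.update(urljoin(url, href) for href in parser.hrefs)
        return headers, body


def run_request(plugins, url, headers):
    """Pass outgoing request headers through each plugin in order."""
    for plugin in plugins:
        headers = plugin.on_request(url, headers)
    return headers


def run_response(plugins, url, headers, body):
    """Pass an incoming response through each plugin in order."""
    for plugin in plugins:
        headers, body = plugin.on_response(url, headers, body)
    return headers, body
```

The point of the two hooks is that a plugin can either rewrite traffic (the cookie jar strips `Set-Cookie` on the way in and adds `Cookie` on the way out) or just observe it (the feed collector returns the response untouched).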



My next steps are something along these lines: Should I continue in Python? To do so means delving deeper into Twisted, using their web app framework for the management UI and staying within their event-driven paradigm in lieu of threading. I first chose Python because I wanted something that was quickly and easily hackable, and fun to contribute plugins for. Does this still apply if things are deep in the Twisted mindset, which is not quite straightforward?



On the other hand, I took a peek at Jetty in Java, which also comes with a simple and hackable HTTP proxy implementation. I could easily cobble together in Java what I have in Python using this. I would also say that I could easily make it compatible with whatever plugins were written for the Python version, using Jython, but there's also a paradigmatic difference were I to go with Java: Threads in lieu of event-driven design.
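To make the paradigm difference concrete, here is a small sketch of the same job done both ways. It uses only the Python standard library: a thread pool stands in for the threaded design, and asyncio stands in for an event loop like Twisted's (it is not Twisted's API). The function names are hypothetical, and the sleep calls merely fake a slow upstream fetch.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def handle_blocking(name, delay):
    """Threaded style: plain sequential code; each call blocks its own thread."""
    time.sleep(delay)  # stand-in for a slow upstream fetch
    return f"{name} done"


async def handle_async(name, delay):
    """Event-driven style: the handler yields to the event loop while it waits."""
    await asyncio.sleep(delay)  # stand-in for a slow upstream fetch
    return f"{name} done"


def run_threaded(jobs):
    """One worker thread per job; results come back in job order."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(lambda job: handle_blocking(*job), jobs))


def run_event_driven(jobs):
    """A single thread interleaves all jobs on one event loop."""
    async def main():
        return await asyncio.gather(*(handle_async(*job) for job in jobs))
    return asyncio.run(main())
```

Both versions produce the same results; what changes is how a plugin author has to write the handler. The threaded handler can stay naive, while the event-driven one must never block the loop.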



Maybe I'm thinking too much about this and should just keep doing what I'm doing. I'm spending a lot of time second-guessing what anyone who might play with this thing would actually care about. As for myself, I seem to be having fun with things as is.

Archived Comments

  • Your Personal Web Proxy is starting to sound an awful lot like the personal agents described in David Brin's novel "Earth". I have to say that I'd be very interested in being a beta site if you have need for any. As a "Research Analyst" a fair bit of my job is to find out about stuff that might be interesting to my division, and then let folks know about it. A web proxy that can be proactive in constantly searching and updating information in various bins/areas of interest is certainly going to be helpful. FWIW, Ewan
  • Do it in Java! That way I can help ;) Also, you can leverage stuff like Lucene (Apache's open-source search engine).
  • Ewan: Funny you should mention Mr. Brin and his novel - I've had a blog entry brewing for a while wherein I gush about the novel and list out all the things from it I want to build. This proxy is, actually, one such agent :)
  • Disclaimer: Being on the Twisted team, of course I think you should keep the PersonalWebProxy in Python. You keep talking about the Async vs. Threaded decision when talking about whether to use Python or Java for your project. However, there's absolutely no reason you can't use threads if you continue with your Twisted implementation. Twisted fully supports the Half-Async/Half-Threaded design pattern. See twisted.internet.interfaces.IReactorThreads for information. The benefits of this design pattern are obvious. You get the speed advantages of the asynchronous bottom layer, which is already written, and the business logic code that runs in threads can be naive. Zope uses this design pattern to allow page assembly code to run synchronous db queries, XML-RPC queries, etc.
Ewan Grantham: You can help if this project is continued in Python; if you know Java, you're going to have no problem learning Python, and you're going to have tons of fun using it.
Personally, I am very interested in developing and using the PersonalWebProxy. The first thing I thought about writing was a look-ahead accelerator -- when you visit a page, the proxy fetches that page and all the pages that are linked from that page, so when you click a link the page is already downloaded.
Another thing I'd like is a permanent history. Whenever I use any browser to browse through the proxy, the PWP should keep track of where I have been, and allow me to review this history chronologically, and also perhaps automatically sort things by keyword. I'd also like this history to be (once I have set up PWP to do so) synchronized with a remote server, so for example, when I am browsing at home, my history file gets synchronized with my web host, and when I am browsing at work, the same history file on my web host is updated.
Something that goes along with this idea is a permanent bookmarks interface. Visiting the PWP application on a certain web port should show me a list of bookmarks and allow me to add to them, categorize them, etc. The big advantage here would be the synchronization I just talked about with my web host space. My bookmarks would then be available from any host on the internet; no matter what machine I am using (as long as I'm running PWP), my bookmarks would also be locally available.
Another idea I have been thinking about ever since you started talking about the PersonalWebProxy project is something like a mail proxy. The way it would work is, you would set up the PWP to connect to your mail account as a client. It would then download certain mail messages (more about that later) and re-serve them as a local mail server. You would then set up your email client to talk to the mail server running inside PWP. Thus, the same sort of transformation that occurs while you're browsing web pages occurs to your email. The first thing I would do with this is set up the server only to download email from email addresses in a certain list. Then I would allow rules-based sorting. Of course, this functionality is all present in a basic email client, but once you're relatively sure you're not downloading spam you can start doing more interesting things. (The idea of only downloading email messages from people in a list is similar to only accepting instant messages from people in your buddy list.)
So things start to get more interesting: We index email for searching, and synchronize email with the email repository on my web host; we parse email messages from friends looking for URLs and have the web proxy immediately download the pages so they are on the hard drive for later, instant viewing; we can send ourselves commands in a simple command language from anything that can send email. Commands could be things like "Bookmark this URL that I don't have time to look at right now"; "Delay delivery of this reminder until the event is near"; etc.
Anyway, this is all just wild pie-in-the-sky dreaming, which I don't normally do. But the prospect of improving our daily internet experience with these ideas, using an application built on Twisted, is very exciting, and very possible.
  • I hear your conflict on the Python/Twisted thing. I run into that working with Zwiki/Zope all the time. I've been tempted to switch to Twisted, or to Python CGI, or something in-between. So far I've just been momentum-undriven... Donovan's email ideas sound like Zoe and Chandler/OSAF. http://webseitz.fluxent.com/wiki/ZoE http://webseitz.fluxent.com/wiki/OSAF
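
The Half-Async/Half-Threaded pattern mentioned in the comments above can be sketched roughly as follows. This is only an illustration using the standard library's asyncio, not Twisted's own API (Twisted's analogue would be something like `twisted.internet.threads.deferToThread`). The function names and the fake "business logic" are assumptions made up for the example.

```python
import asyncio
import time


def naive_business_logic(page):
    """Blocking, synchronous code, e.g. a slow database query or page transform."""
    time.sleep(0.01)  # stand-in for the slow, blocking part
    return page.upper()


async def handle_page(page):
    """Async bottom layer: hands the blocking call to a worker thread."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool; the event loop itself stays free.
    return await loop.run_in_executor(None, naive_business_logic, page)


async def serve(pages):
    """Handle all pages concurrently; results come back in input order."""
    return await asyncio.gather(*(handle_page(p) for p in pages))


def run(pages):
    return asyncio.run(serve(pages))
```

The appeal is that plugin code like `naive_business_logic` can be written without any knowledge of the event loop, while the network layer underneath stays asynchronous.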