AgentFrank
Introduction
The goal of Agent Frank is to be a personal intelligent intermediary and companion to internet infovores during their daily hunter/gatherer excursions. Whew. Okay, so what does that mean? Well, let's take it one buzzword at a time:
Personal - While employing many traditionally server-side technologies, Agent Frank is intended to reside near the user, on the desktop or the laptop.
Intelligent - Agent Frank wants to learn about the user, observe preferences and habits, and become capable of automating many of the tedious tasks infovores face. Eventually, this will come to involve various forms of machine learning, analysis, etc.
Intermediary - Among Agent Frank's facilities are network proxies that can be placed between local clients and remote servers. Using these, Agent Frank can tap into the user's online activities in order to monitor, archive, analyze, and alter information as it flows. For example, using a web proxy, Agent Frank can log sites visited, analyze content, and filter out ads or harmful scripting.
Companion - Agent Frank's ultimate purpose is to accompany an infovore and assist along the way.
Agent Frank is, at least initially, a laboratory for hacker/infovores to implement and play with technologies and techniques to fulfill the above goals. At its core, Agent Frank is a patchwork of technologies stitched together into a single environment intended to enable this experimentation. At the edges, Agent Frank is open to plugins and scripting to facilitate quick development and playing with ideas.
Agent Frank wants to be slick & clean one day, but not today. Instead, it is a large and lumbering creature with all the bolts, sockets, and stitches still showing. This is a feature, not a bug.
Download
Latest tarball: http://www.decafbad.com/downloads/AgentFrank-20030215.tar.gz
CVS access: Working on it
Usage
Unpack the tarball and try running ./start.sh from within the project directory. Or, read through ./start.sh and do what it does by hand: set the classpath to include the build directory and all the JARs in lib, then run com.decafbad.agent.Main.
Note - to get up and running in Windows, try this batch file as a start.sh replacement.
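The manual steps above can be sketched as a small, platform-neutral launcher. This is only an illustration, not part of the distribution: the class name FrankLauncher and its helper methods are made up for this example, and it assumes the unpacked project has a build directory and a lib directory of JARs, as described above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher sketch; FrankLauncher is not shipped with Agent Frank.
public class FrankLauncher {

    /** Builds the classpath: the build directory plus every JAR found in lib. */
    public static String buildClasspath(File projectDir) {
        List<String> entries = new ArrayList<>();
        entries.add(new File(projectDir, "build").getPath());

        File[] jars = new File(projectDir, "lib")
                .listFiles((dir, name) -> name.toLowerCase().endsWith(".jar"));
        if (jars != null) { // null when lib is missing or unreadable
            Arrays.sort(jars); // keep the ordering predictable
            for (File jar : jars) {
                entries.add(jar.getPath());
            }
        }
        // File.pathSeparator is ":" on Unix and ";" on Windows
        return String.join(File.pathSeparator, entries);
    }

    /** Assembles the java command line that runs com.decafbad.agent.Main. */
    public static List<String> buildCommand(File projectDir) {
        return Arrays.asList("java", "-cp", buildClasspath(projectDir),
                "com.decafbad.agent.Main");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Absolute paths stay valid after the working directory changes.
        File projectDir = new File(args.length > 0 ? args[0] : ".").getAbsoluteFile();
        Process process = new ProcessBuilder(buildCommand(projectDir))
                .directory(projectDir)
                .inheritIO()
                .start();
        System.exit(process.waitFor());
    }
}
```

Run it with the project directory as the only argument, or with no argument from inside the project directory.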
Once running, you should be able to view Agent Frank's home page at http://localhost:8080/
To use the built-in web proxy, use port 51966 on localhost. The Archiver plugin will start stashing content and indexing it at data/Archiver/. The MetaMiner plugin will start accumulating RDF metadata on web resources visited.
To launch a local BeanShell desktop, visit http://localhost:8080/launchDesktop.jsp
Within the BeanShell desktop, there are a few interesting commands available:
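Any HTTP client that supports a proxy setting can be pointed at that port. As a minimal sketch, here is how a plain Java client might route one request through the proxy using the standard java.net classes. The class name ProxyClient and the example.com address are placeholders for this illustration, and it assumes Agent Frank is already running locally.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical client sketch; not part of Agent Frank itself.
public class ProxyClient {
    static final String PROXY_HOST = "localhost";
    static final int PROXY_PORT = 51966; // the built-in web proxy port

    /** Describes the local Agent Frank web proxy as a java.net.Proxy. */
    public static Proxy frankProxy() {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(PROXY_HOST, PROXY_PORT));
    }

    /** Fetches the given address through the proxy and returns the HTTP status code. */
    public static int fetchStatus(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(address).toURL().openConnection(frankProxy());
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String address = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(address + " -> HTTP " + fetchStatus(address));
    }
}
```

For an existing Java program, setting the standard system properties -Dhttp.proxyHost=localhost and -Dhttp.proxyPort=51966 should have a similar effect without code changes.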
getMain(); - Return a reference to the current main app object (see: com.decafbad.agent.Main)
- At present, there are some interesting public fields to be accessed within Main:
- There are also a few interesting public methods:
- addPlugin(File pluginPath) - given a path, try to load up the plugin found within
- (more to come)
listPlugins(); - Print out a numbered list of active plugin contexts
reloadPlugin([plugin number]); - Reload a given plugin by number
getPlugins(); - Get a list of plugin contexts
rdfQuery("select query..."); - Perform an RDQL query on the app's RDF model
rdfQuery(new File("eg/q1.rdql")); - Perform an RDQL query from a file on the app's RDF model.
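For the file-based form of rdfQuery, a query file has to exist first. The sketch below writes one. It is an assumption-heavy illustration: the class name RdqlQueryFile and the path eg/titles.rdql are invented here, and the query assumes that some resources in the RDF model carry a Dublin Core title property, which the source does not confirm.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the vocabulary used by the model is an assumption.
public class RdqlQueryFile {
    // Assumed property: the Dublin Core "title" element.
    static final String DC_TITLE = "http://purl.org/dc/elements/1.1/title";

    /** Returns an RDQL query selecting every resource that has a title. */
    public static String titleQuery() {
        return "SELECT ?resource, ?title\n"
             + "WHERE (?resource, <" + DC_TITLE + ">, ?title)\n";
    }

    /** Writes the query text to the given file, creating parent directories. */
    public static void write(Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(target, titleQuery().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : "eg/titles.rdql");
        write(target);
        System.out.println("Wrote " + target);
    }
}
```

The resulting file could then be passed to rdfQuery(new File("eg/titles.rdql")); from the BeanShell desktop, or the same query text could be given to the string form of rdfQuery.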
Overall Features
Implemented in Java, with an intent to stick to 100% pure Java.
Makes use of Jetty for an embedded browser-based user interface
Employs BeanShell to provide a shell prompt interface and scripting facilities
RDF metadata is employed via the Jena toolkit
Web proxy services are provided via the Muffin web proxy
Text indexing and searching enabled by Jakarta Lucene.
Exploring use of HSQL and/or Jisp for data storage.
Plugins / Components
Archiver
The Archiver plugin's purpose is to capture, archive, and index all web content viewed. Like other projects of its kind, it draws inspiration from AT&T's iProxy (see: A Proxy-Based Personal Web Archiving Service) and Aaron Swartz's Archiver Proxy. Its capabilities are few at present: it captures headers and content, and uses Jakarta Lucene to index text content for searching.
MetaMiner
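To make the capture-and-index idea concrete, here is a toy sketch in plain Java. It is not the Archiver plugin's code and does not use Lucene: a simple in-memory inverted index stands in for the real text index, and the class name ArchiveSketch and its methods are invented for this illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for the capture/index cycle; NOT the real Archiver plugin.
public class ArchiveSketch {

    /** One captured response: its headers and its text content. */
    static final class Capture {
        final Map<String, String> headers;
        final String body;

        Capture(Map<String, String> headers, String body) {
            this.headers = new LinkedHashMap<>(headers);
            this.body = body;
        }
    }

    private final Map<String, Capture> captures = new LinkedHashMap<>();
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Stores the response and indexes each word of the body under its URL. */
    public void archive(String url, Map<String, String> headers, String body) {
        captures.put(url, new Capture(headers, body));
        for (String term : body.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
            }
        }
    }

    /** Returns the URLs whose archived body contains the given word. */
    public Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Returns the captured headers for a URL, or null if it was never archived. */
    public Map<String, String> headers(String url) {
        Capture capture = captures.get(url);
        return capture == null ? null : capture.headers;
    }
}
```

A real text index such as Lucene adds tokenization rules, ranking, and on-disk storage, none of which this sketch attempts.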
Future Directions
Rework everything as blocks to tie together with the Avalon Phoenix server kernel
Replace Muffin with a Jetty-based proxy or a rewritten new proxy
Need a better way to efficiently compress and store archived web content
Completely BeanScripted or BSF-based plugin
Support plugins in jars, ideally without unpacking the jar
Plugin download and update tracking
BSF support for expanded scripting options
Use Quartz for scheduled tasks
Check out JXTA for P2P experimentation (distributed Google, anyone?)
Look into integrating a SOCKS proxy to intermediate other network services?
Use port forwarding proxies for intermediating other network services?
Related Projects
Pro-Active Webfilter - http://paw-project.sourceforge.net/
Proxomitron plugins - http://www.geocities.com/u82011729/prox/
PIA - http://www.risource.org/PIA/index.shtml
iProxy - http://www.research.att.com/~iproxy/intro.html
Hep - http://www.fettig.net/index.cgi/2002/12/#Personal_Proxies
WebMate - http://www-2.cs.cmu.edu/~softagents/webmate/Introduction.html
WebMate research publications - http://www-2.cs.cmu.edu/~softagents/webmate.html
ArchiverProxy - http://logicerror.com/archiverProxy
Amit's Web Proxy Project - http://theory.stanford.edu/~amitp/proxy.html
IBM's Web Intermediaries - http://www.almaden.ibm.com/cs/wbi/index.html
MeStream - http://mestream.sourceforge.net/
Internet Junkbuster Proxy - http://www.junkbusters.com/
Surfboard - http://surfboard.sourceforge.net/
FilterProxy - http://filterproxy.sourceforge.net/
WWWOFFLE - http://www.gedanken.demon.co.uk/wwwoffle/
SSL Proxying by Jon Udell - http://udell.roninhouse.com/bytecols/2001-02-14.html
Privoxy - http://www.privoxy.org/
Bookmarks
http://www.chadfowler.com/index.cgi/Computing/LatentSemanticIndexing.rdoc
UPP thoughts from Russell Beattie
(wishlist) - http://www.russellbeattie.com/notebook/index.jsp?date=20021229#224206
http://www.russellbeattie.com/notebook/index.jsp?date=20030125#181923
http://www.russellbeattie.com/notebook/index.jsp?date=20021227#160210
http://www.russellbeattie.com/notebook/index.jsp?date=20021229#213649
"Jog" - http://matt.griffith.com/weblog/stories/2002/12/22/jogMyPersonalGoogleampWaybackMachine.html
"Desktop blog" - http://radio.weblogs.com/0108194/2003/01/25.html#a629
