Clickstream

Natural Language Processing, Social Networks and Enron Emails

This article by Jeffrey Heer at UC Berkeley describes an application called enronic that he built to analyze the content of Enron's corporate email. He uses several different visualization techniques, along with algorithms for message categorization, social network inference and community analysis, to extract information from email text.

His reasoning behind this work is exactly the right way to attack the seemingly intractable problems of text mining and natural language processing, as well as the slow adoption of data mining technology. Rather than improving the algorithms to make them better or more usable, he combines visualization techniques with the underlying algorithms to allow easy exploration. The goal is to enable a person to explore data and relationships instead of trying to replace the person with a system that spits out the answer.

Here he talks about a discovery he made while exploring the data with his application:
Such an analysis revealed the role of John Shelk, who regularly reported on Congressional meetings, sending all such meeting reports to Tim Belden. In fact, the visualization reveals that their conversation is completely one-sided, with John sending reports to Tim, with no back-traffic occurring. This is a bit suspicious. Clicking on Tim Belden then reveals that according to the database he hasn't sent ANY e-mails, but receives various legal reports from throughout the company.

This is an interesting occurrence to flag for increased investigation. It could be that Tim Belden's e-mails simply were not included in the annotation set given to the class, and this would be easy to check out (of course, ideally the interface would support this check already, a point for future improvement). Otherwise, might have his e-mails been expunged? If so why? It is clear that Tim Belden was receiving important information from throughout the organization -- what happens to this information subsequently? Is it being utilized outside of e-mail? This might highlight Tim Belden as a possibly interesting person for prosecutors investigating Enron.

After performing this analysis, I did a web search on Google for "Tim Belden". On my honor, I had never heard his name before doing this analysis exercise. Little did I know he was the first person charged by prosecutors, considered the "mastermind" of Enron's manipulation of California's markets, and was found guilty on charges of federal conspiracy. The full story is available at CBS News. Already, this bodes well for the potential of the system!
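
The asymmetry he spotted visually (mail flowing one way between two people, and a person who receives but never sends) can also be expressed as a simple check on a sender/recipient log. The sketch below is only an illustration of that idea, not how enronic works: the function names are hypothetical and the message list is invented toy data, not taken from the Enron corpus.

```python
from collections import Counter


def one_sided_pairs(messages):
    """Return (sender, recipient, count) for pairs with no reply traffic."""
    counts = Counter(messages)
    return sorted(
        (sender, recipient, n)
        for (sender, recipient), n in counts.items()
        if (recipient, sender) not in counts
    )


def silent_recipients(messages):
    """Return people who receive mail but never appear as a sender."""
    senders = {sender for sender, _ in messages}
    recipients = {recipient for _, recipient in messages}
    return sorted(recipients - senders)


# Toy (sender, recipient) log; names and counts are made up for illustration.
messages = [
    ("alice", "bob"), ("alice", "bob"), ("alice", "bob"),
    ("carol", "bob"),
    ("carol", "alice"), ("alice", "carol"),
]

print(one_sided_pairs(messages))
print(silent_recipients(messages))
```

A check like this only flags candidates. As the quote points out, a silent mailbox may just mean the data set is incomplete, which is why pairing it with an interface a person can explore matters.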


Creative Commons License
This work is licensed under this Creative Commons License except where indicated.