Clickstream


Roundup of Open Source BI

I liked this roundup of open source BI products. It gives a more thorough description of some of the items I listed in an earlier post, along with a little more discussion of the topic. He hits all the bases: BI, OLAP, and even the data warehouse server appliance companies Netezza and DATAllegro. I interviewed people from both companies for an article on hardware appliances for data warehousing, but it's been shelved for a couple of months, just long enough for me to hear through the rumor mill that the third vendor I talked to was having difficulty with its current round of venture funding.

He also talks about Pentaho, which I want to dig into more before I write anything about them. From what I've seen (mostly some articles and their web site), they look promising. The question is whether their model will work, and what they really want to accomplish.

In Need of a Death Ray or Army of Combat Robots?

You can see and order useful robots and weapons from the past's future at Broton thanks to Southern California-based artist, sculptor, designer and machinist Greg Brotherton. I'm considering the ElectroLux death ray to deal with pesky intruders at my place after seeing it in action.

Free Patent Searches

Even though I believe software should not be patentable under the current insane patent laws, I'm applying for a couple of software patents. The cost of doing even simple searches is pretty high. Enter FreePatentsOnline to save the day, at least partially. We'll see how it goes.

The problem with patent searching today is that you have a choice between free, barely productive search tools and expensive premium tools aimed at people who do this all the time. There's nothing for the in-between crowd. This kind of thing is important for DIY people who want to follow the patent/copyright approach instead of the Creative Commons / GPL route for something they are doing, and I'm one of them at the moment.

Hiring More Sales People Won't Grow Revenues Faster

A while back I went to a venture dinner and heard Mark Leslie give a talk about how hiring sales people doesn't have an immediate effect on revenues, even after taking into account the normal 6-9 month ramp up where the new people aren't productive [linked below].

This was on my mind after talking to a former VP of sales from a startup where we worked together. The CEO fired him after 12 months. He was hired to turn the sales organization around, and he laid out a plan to do this. The plan involved a slow hiring ramp and about 18 months to hit revenue targets. The CEO refused to accept the plan, demanding speed by hiring more salespeople sooner. Thanks to this brilliant intrusion into the sales VP's domain, the company burned through cash too quickly, didn't meet revenue targets, and was severely cash-strapped the following quarter.

Here's what Mark Leslie had to say, echoing these experiences:
When we apply the MLC [manufacturing learning curve] to sales, we come to the following conclusion: The time it takes to achieve cash flow breakeven is reasonably independent of sales force staffing. It is, instead, entirely dependent on how well and how quickly the entire organization learns what it takes to sell the product or service while incorporating customer feedback into the product itself. Because the entire organization has to come up to speed, hiring a large initial sales staff does not speed up the time to breakeven, it simply consumes cash more quickly.
I just found and uploaded a PDF of the slides for the talk. Mark Leslie also posted a note at AlwaysOn summarizing his talk and articles.
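To make the shape of Leslie's argument concrete, here is a deliberately crude toy model in Python. Every constant, and the assumption that "learned demand" grows linearly with time regardless of headcount, is hypothetical and chosen only for illustration; none of it comes from the talk or slides. The point is just that if what the organization has learned to sell is the binding constraint, extra reps add cost without adding revenue.

```python
"""Toy model of the sales-learning-curve argument quoted above.

All constants and the linear "learned demand" curve are hypothetical
assumptions for illustration; they do not come from Mark Leslie's talk.
"""
import math

REP_COST = 15_000       # assumed loaded monthly cost of one sales rep
REP_CAPACITY = 40_000   # assumed monthly revenue one ramped rep could close
FIXED_COST = 100_000    # assumed monthly cost of everything except sales reps


def learned_demand(month: int) -> float:
    """Revenue the whole organization has learned how to win by this month.

    Assumption: it grows with organizational learning and does not depend
    on how many reps are on staff.
    """
    return 20_000 * month


def simulate(hiring_plan, months: int = 24):
    """Return (first breakeven month or None, cumulative cash) for a plan.

    hiring_plan(month) gives the number of reps on staff in that month.
    Revenue is capped by both rep capacity and learned demand.
    """
    cash = 0.0
    breakeven = None
    for month in range(1, months + 1):
        reps = hiring_plan(month)
        revenue = min(reps * REP_CAPACITY, learned_demand(month))
        cost = FIXED_COST + reps * REP_COST
        cash += revenue - cost
        if breakeven is None and revenue >= cost:
            breakeven = month
    return breakeven, cash


def hire_with_learning(month: int) -> int:
    """Hire only as many reps as learned demand can keep busy."""
    return math.ceil(learned_demand(month) / REP_CAPACITY)


def hire_up_front(month: int) -> int:
    """A hypothetical front-loaded plan: a large sales force from day one."""
    return 20


if __name__ == "__main__":
    print(simulate(hire_with_learning))  # (8, 1260000.0)
    print(simulate(hire_up_front))       # (20, -3600000.0)
```

With these made-up numbers, revenue is identical under both plans because learned demand caps it. In this crude model the front-loaded plan reaches breakeven no sooner (later, in fact) and burns far more cash over two years. That mirrors the direction of Leslie's claim; it is not a forecast for any real company.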

Open Source Event in Boston at Open Source Prices!

There's an Open Source Summit in Boston on Friday, June 24 that looks to be a great way to spend the day. It's only $20, some open source heavies will be there, and you get a free video by Dan Bricklin titled "A Developer's Introduction to Copyright and Open Source" - that's some serious value.

Here's the key info from their web site:

"The event, moderated by Dan Bricklin, President, Software Garden, and Bob Zurek, Vice President, Advanced Technology, Ascential Software, will consist of 3 key audience driven interactive panel sessions, along with a luncheon industry keynote.

Panel Sessions:

  • Open Source Licensing
    Hosted by Karen Copenhaver, general counsel of Black Duck Software, and Ira Heffan of Goodwin Procter, this panel will discuss some of the most interesting recent developments in open source licensing topics.
  • Open Source Business Models And Strategies
    Hosted by Tim Yeaton, Chief Marketing Officer, Red Hat, Paul Doscher, CEO, JasperSoft, and Douglas Heintzman, Director of Technical Strategy for IBM Software Group, this session will discuss the state of affairs in the world of open source businesses and the challenges they face as they go against the proprietary commercial giants.
  • Open Source Technical Discussion
    Hosted by Nat Friedman VP Engineering and Collaboration, Novell, and Miguel de Icaza, VP Engineering, Mono Project and Ximian Co-Founder, this session will feature a technical discussion along with tips on how to create a successful open source project.

The Luncheon Keynote will be presented by Marc Fleury, Founder, Chairman and CEO, JBoss Group."

If I were in Boston I'd be there. I hope there are some collaborative note-takers in attendance. Thanks go to Bob Zurek at IBM for telling me about this. He's blogged a note about it with a little more information.

BIRT's Out and I Didn't Even Know It

I'm embarrassed to say that even though I recently posted about some open source BI projects, I missed the fact that version 1.0 of BIRT (Business Intelligence and Reporting Tools) was released this week.

The Eclipse-based project was initiated by Actuate, a BI vendor. Interestingly, BIRT is a new project rather than a pre-existing product that was opened up. A lot of people seem to be wondering why Actuate would do this. It seems to me that they are going for the open source and Java developer audience, probably hoping to become the de facto vendor in the Java BI development space. None of the other BI vendors are paying much attention, other than making Linux ports of their products.

Here's the BIRT description from the web site:
BIRT is an Eclipse-based open source reporting system for web applications, especially those based on Java and J2EE. BIRT has two main components: a report designer based on Eclipse, and a runtime component that you can add to your app server. BIRT also offers a charting engine that lets you add charts to your own application.
There are some articles and videos on the BIRT site if you want more information.

Note: I missed OpenRPT in my list of open source reporting tools a little while back. Here's the feature list they describe:
  • Report Definitions are saved in the industry standard XML format
  • Stand-alone or Embeddable WYSIWYG Report Designer
  • Embeddable Report Renderer renders to local printers, including PDF and Postscript distillers
  • Support for All/Even/Odd/First/Last Page Headers and Footers
  • Support for Multiple Column Detail Sections
  • Support for static and database sourced images
  • Support for static and/or database sourced watermarks and page identifiers
  • Support for multiple detail sections and optional, multiple group heads and footers for each detail section

I also came across another project: OpenReports, which is built on top of Jasper. Related to that is ObjectVisualizer which they describe as "an open source business intelligence tool that builds upon Object Persistence technology to provide easy to use query, reporting, and charting capabilities."

All this sounds interesting. I wish I had time to follow up on all these projects and do some in-depth testing. Working full time really gets in the way.

New Project Management Book From O'Reilly

I like what I've read in "The Art of Project Management" so far. I have a lot of formal training in project management, and I find that most project management books fail to be useful. Most fall into the plan-centric model. The problem is that planning is just the start, and no project ever goes according to the initial plan. The plan is your guess at how things will go. Then reality intrudes in the form of changing constraints, technical problems, and surprises like a key person getting sick for a week or the DBA who forgets to test tape restores and discovers that the backups were write-only.

This book looks to be more focused on what you have to do to get the project done than on making nice charts with dependencies and resource leveling. So far my favorite books on managing technical people and projects aren't project management books per se. Topping my list are "Peopleware" by Tom DeMarco and Tim Lister, and "The Mythical Man-Month" by Fred Brooks.

There's a sample chapter from "The Art of Project Management" available at the O'Reilly site titled How to figure out what to do.

Why Development in Crunch Mode Doesn't Work

I really liked this article about why trying to shove projects into unreasonable deadlines by having everyone work crazy hours doesn't work. It's extensively researched with sources disclosed, providing sound evidence for why a 40-hour work week (or even less) is sensible, and, I'm certain, it will be completely ignored by mainstream management for years to come.

He's writing from the perspective of a software engineer in the game industry, which is legendary for insane work schedules. Each passing year I see those schedules spreading further into mainstream IT projects, not just game companies and startups. This is because constant cost pressures have driven staffing to skeleton levels at many IT shops, while the need for new applications and enhancements has not slowed.

One of my favorite parts:

In 1908 - almost a century ago - industrial efficiency pioneer Ernst Abbe published in Gesammelte Abhandlungen his conclusions that a reduction in daily work hours from nine to eight resulted in an increase in total daily output. (Nor was he the first to notice this. William Mather had adopted an eight-hour day at the Salford Iron Works in 1893.)

... When Henry Ford famously adopted a 40-hour workweek in 1926, he was bitterly criticized by members of the National Association of Manufacturers. But his experiments, which he'd been conducting for at least 12 years, showed him clearly that cutting the workday from ten hours to eight hours — and the workweek from six days to five days — increased total worker output and reduced production cost. Ford spoke glowingly of the social benefits of a shorter workweek, couched firmly in terms of how increased time for consumption was good for everyone. But the core of his argument was that reduced shift length meant more output.

I have found many studies, conducted by businesses, universities, industry associations and the military, that support the basic notion that, for most people, eight hours a day, five days per week, is the best sustainable long-term balance point between output and exhaustion. Throughout the 30s, 40s, and 50s, these studies were apparently conducted by the hundreds; and by the 1960s, the benefits of the 40-hour week were accepted almost beyond question in corporate America. In 1962, the Chamber of Commerce even published a pamphlet extolling the productivity gains of reduced hours.

But, somehow, Silicon Valley didn't get the memo.
Via kottke (check him out, you'll be glad you did, or mad you wasted so much time)

Natural Language Processing, Social Networks and Enron Emails

This article by Jeffrey Heer at UC Berkeley describes an application called Enronic that he built to analyze the content of Enron's corporate email. He uses several different visualization techniques, along with algorithms for message categorization, social network inference, and community analysis, to extract information from email text.

His reasoning behind this work reflects exactly the right way to attack the seemingly intractable problems of text mining and natural language processing, as well as the slow adoption of data mining technology. Rather than working on improvements to the algorithms to make them better or more usable, he focuses on combining visualization techniques with the underlying algorithms to allow easy exploration. The goal is to enable a person to explore data and relationships instead of trying to replace the person with a system that spits out the answer.

Here he talks about a discovery he made while exploring the data with his application:
Such an analysis revealed the role of John Shelk, who regularly reported on Congressional meetings, sending all such meeting reports to Tim Belden. In fact, the visualization reveals that their conversation is completely one-sided, with John sending reports to Tim, with no back-traffic occurring. This is a bit suspicious. Clicking on Tim Belden then reveals that according to the database he hasn't sent ANY e-mails, but receives various legal reports from throughout the company.

This is an interesting occurrence to flag for increased investigation. It could be that Tim Belden's e-mails simply were not included in the annotation set given to the class, and this would be easy to check out (of course, ideally the interface would support this check already, a point for future improvement). Otherwise, might have his e-mails been expunged? If so why? It is clear that Tim Belden was receiving important information from throughout the organization -- what happens to this information subsequently? Is it being utilized outside of e-mail? This might highlight Tim Belden as a possibly interesting person for prosecutors investigating Enron.

After performing this analysis, I did a web search on Google for "Tim Belden". On my honor, I had never heard his name before doing this analysis exercise. Little did I know he was the first person charged by prosecutors, considered the "mastermind" of Enron's manipulation of California's markets, and was found guilty on charges of federal conspiracy. The full story is available at CBS News. Already, this bodes well for the potential of the system!
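The pattern Heer spotted visually (steady one-way traffic from one person to another, and someone who receives mail but never appears as a sender) can also be checked with plain counting over a directed sender/recipient graph. Here is a minimal Python sketch, assuming messages have already been parsed into hypothetical (sender, recipients) records. The names and data are invented toy values, not the Enron corpus, and this is not how Enronic itself is implemented.

```python
"""Sketch: spot one-way email relationships and receive-only people.

The messages below are invented toy data, not the Enron corpus, and this
is not how Enronic itself is implemented.
"""
from collections import Counter

# Hypothetical (sender, [recipients]) records.
MESSAGES = [
    ("alice", ["dave"]),
    ("alice", ["dave"]),
    ("alice", ["dave", "bob"]),
    ("bob", ["carol"]),
    ("carol", ["bob"]),
    ("carol", ["dave"]),
]


def build_edges(messages):
    """Count directed sender -> recipient edges."""
    edges = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            edges[(sender, recipient)] += 1
    return edges


def one_sided_pairs(edges, min_count=2):
    """(a, b) pairs where a wrote to b at least min_count times and b never wrote back."""
    return sorted(
        (a, b)
        for (a, b), count in edges.items()
        if count >= min_count and edges.get((b, a), 0) == 0
    )


def receive_only(messages):
    """People who receive mail in the data set but never appear as a sender."""
    senders = {sender for sender, _ in messages}
    recipients = {r for _, rs in messages for r in rs}
    return sorted(recipients - senders)


if __name__ == "__main__":
    print(one_sided_pairs(build_edges(MESSAGES)))  # [('alice', 'dave')]
    print(receive_only(MESSAGES))                  # ['dave']
```

As Heer points out, a receive-only person may simply reflect missing data in the set, so checks like these flag people for a closer look rather than drawing conclusions on their own.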


Data warehousing, business intelligence, IT strategy and architecture, and occasional interesting bits.


Bio / About Me

Check out my book

Clickstream Data Warehousing

Popular Posts
Primate programming.
Why development in crunch mode doesn't work.
Enterprise data modeling sucks big rocks.
XP Exaggerated.
Ping-pong in the matrix.
Time management for anarchists.
Is Ab Initio worth evaluating?
Job posting: omniscient architect.
Why hiring more sales people won't grow revenues faster.
Some resources for Open Source CMS.

Reading List
The Cruise of the Snark
Blue Latitudes
Everyone in Silico
The Klamath Knot
Swarm Intelligence (Bonabeau)
A three year backlog of F&SF

Listening List
Toots and the Maytals
The Buena Vista Social Club
American Idiot

Watching List
Winged Migration Quicktime trailer
Genghis Blues
Howl's Moving Castle
A Bronx Tale

Daily KOS
Due Diligence
Boing Boing
Kevin Kelly (Recomendo)
Not Geniuses
3 Quarks Daily

War in Context
Valmiki's Ramayana
Choose the Blue
Third Nature
Mark Madsen
The Data Warehouse Institute
James Howard Kunstler
Clickstream Data Warehousing
Technorati Profile



This work is licensed under this Creative Commons License except where indicated.