Clickstream


Unintended Meaning in Your URL?

At least I didn't make this mistake when I chose Third Nature as a company name. I think my favorite is Go Tahoe.

Backups are a Good Thing

Company taken out by bad database backups. My guess is that they didn't test the other half, wherein things are restored from backup. Reminds me of the joke song that was circulating a few years ago:
All those backups seemed a waste of pay.
Now my source files have all gone away.
Oh I believe in yesterday.

There's not half the files there used to be,
And there's a milestone hanging over me.
The system crashed so suddenly.

I pushed something wrong
What it was I could not say.
Now all my data's gone
and I long for yesterday-ay-ay-ay.

The need for back-ups seemed so far away.
I knew my data was all here to stay,
Now I believe in yesterday.

Sun and Greenplum's Overstated Appliance Throughput Numbers

I was digging into the technical details of Sun's "Thumper" data appliance (a dual-CPU, dual-core rack unit that can hold a massive 24 TB of data) and ran into the overblown-claims problem that seems rampant among vendors selling into the high end of the market.

In this case it's overstated system I/O throughput. Here's the clip straight from Greenplum's site on the Bizgres MPP/Sun deal:
  • Scan 1 Terabyte of data in 60 seconds.
  • Leverages the first and only data server that combines a 4-way server with 24TB of storage in a single integrated system.
If you take their claims at face value, scanning 1 TB of data in 60 seconds at low cost is an impressive achievement. The problem is that getting there requires a serious hardware investment, not just one 4-core unit. The facts are pretty simple:

  • The documented I/O rate is 2 GB/sec from disk to memory for one 4-core unit (third item under "at a glance").
  • 1 TB of data at 2 GB/sec means one unit can process 1 TB in 500 seconds, or 8 minutes 20 seconds. That's a lot more than 1 minute.
  • A TB/minute scan rate works out to about 16.7 GB/sec, so we need a little over 8 units to achieve it.
  • If we use the $32,995 price (from Sun's online store) for a 12 TB unit rather than the 24 TB unit mentioned above, 8 units will cost $263,960. Eight of the 24 TB units mentioned above would set us back $559,960.
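For anyone who wants to check the arithmetic, here is a minimal Python sketch of the numbers above. The 2 GB/sec rate and the $32,995 price come from the figures quoted in this post; the function names are made up for illustration, and I'm assuming a decimal terabyte (1 TB = 1000 GB), which is what the 500-second figure implies.

```python
import math

GB_PER_TB = 1000           # assumption: decimal terabyte, matching the 500 s figure
UNIT_RATE_GB_PER_SEC = 2.0 # documented disk-to-memory rate for one unit
UNIT_PRICE_12TB = 32995    # price quoted above for the 12 TB unit, in dollars


def scan_seconds(data_tb, rate_gb_per_sec):
    """Seconds needed to scan data_tb terabytes at the given rate."""
    return data_tb * GB_PER_TB / rate_gb_per_sec


def units_needed(data_tb, target_seconds, rate_gb_per_sec):
    """Fractional number of units needed to scan data_tb in target_seconds."""
    required_rate = data_tb * GB_PER_TB / target_seconds
    return required_rate / rate_gb_per_sec


single_unit_seconds = scan_seconds(1, UNIT_RATE_GB_PER_SEC)
exact_units = units_needed(1, 60, UNIT_RATE_GB_PER_SEC)
whole_units = math.ceil(exact_units)  # you can't buy a fraction of a unit

cost_8_units = 8 * UNIT_PRICE_12TB
cost_whole_units = whole_units * UNIT_PRICE_12TB

minutes, seconds = divmod(int(single_unit_seconds), 60)
print(f"One unit scans 1 TB in {minutes} min {seconds} s")
print(f"Units for 1 TB/minute: {exact_units:.2f} (round up to {whole_units})")
print(f"Cost of 8 units: ${cost_8_units:,}")
print(f"Cost of {whole_units} units: ${cost_whole_units:,}")
```

Rounding up to whole units only makes the vendor's claim look worse, since the fractional unit has to be bought as a full one.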

That's a lot of juice to get a 1 TB/minute scan rate. The problem with this marketing message is that it mixes a performance claim at the high end with a capacity claim at the low end. I could get the same scan rate with a rack of white-box Opteron PCs and Bizgres MPP and come in a lot lower than $263,960. All I would give up is about 100 TB of additional capacity.

This is why you should always read the specs before taking performance claims at face value.

Seth Godin on BI (sort of)

Question: Why don’t you check your Technorati ranking?

Answer: Because the data won't change my actions. Getting data for no good reason just drives you crazy. The secret is to get very flexible in the face of data you care about - changing your x every time you see y changes - and incredibly inflexible in the face of data you don't care about.

This is the heart of why a lot of people have problems with BI. It's from ten questions Guy Kawasaki asked Seth Godin.

Java: Land of Nouns

There are some visualization libraries I want to use, and that means I need to learn more Java than I know, so I'm working my way through "Head First Java" with periodic breaks to swear at verbosity and the ironic lack of verbs. I stumbled on this post and I think he's described why I like C/Python/YouNameIt better than working with Java (except that I don't like Smalltalk or LISP either). It's kind of long-winded at the beginning, so I suggest skimming down to the halfway point before the substance of functional ordination hits.

Read Execution in the Kingdom of Nouns

Deploying BIRT

Jason Weathersby has a good article describing how to deploy BIRT, something that can be confusing at times due to the number of deployment options. Link


Data warehousing, business intelligence, IT strategy and architecture, and occasional interesting bits.

Reading List
The Cruise of the Snark
Blue Latitudes
Everyone in Silico
The Klamath Knot
Swarm Intelligence (Bonabeau)
A three year backlog of F&SF

Listening List
Toots and the Maytals
The Buena Vista Social Club
American Idiot

Watching List
Winged Migration Quicktime trailer
Genghis Blues
Howl's Moving Castle
A Bronx Tale


Creative Commons License
This work is licensed under this Creative Commons License except where indicated.