Clickstream


Post-TDWI Technology Highlights and Trends

I always come back from a TDWI conference with a big stack of notes from meetings with vendors and people doing work in BI and performance management. This conference was larger than usual, with a lot of interesting things going on. Some highlights:

Rapid Increases in Data Volumes - Appliances and More Appliances
Scalability and performance are a big issue and getting bigger. I spoke to several people who are dealing with increasing data volumes. I used to plan for about 20-25% annual growth in data when I managed BI and DW applications. Based on this small sample, the amount of data and the requests for new information are pushing this into the 30-40% annual growth range for existing warehouses.
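To put those growth rates in perspective, here is a minimal back-of-the-envelope sketch of compound growth. The function names and the 10 TB starting size are made up for illustration, and it assumes a constant annual rate, which real warehouses rarely have.

```python
def projected_size(start_tb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at a fixed annual rate (assumed constant)."""
    return start_tb * (1.0 + annual_growth) ** years


def years_to_double(annual_growth: float) -> int:
    """Whole years until the data at least doubles at a fixed annual rate."""
    size, years = 1.0, 0
    while size < 2.0:
        size *= 1.0 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical 10 TB warehouse, five-year horizon.
    for rate in (0.20, 0.25, 0.30, 0.40):
        size = projected_size(10.0, rate, 5)
        print(f"{rate:.0%} growth: {size:.1f} TB after 5 years, "
              f"doubles within {years_to_double(rate)} years")
```

Under these assumptions, the difference between planning for 25% and actually getting 40% is roughly 30 TB versus 54 TB after five years, which is the kind of gap that wrecks a capacity plan.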

The high end just keeps getting higher too. This probably caught off guard the people who were expecting the ceiling on data to stay relatively steady. It isn't, so specialized products to deal with vast amounts of data are still useful.

On that front, there were interesting announcements at the show. DATAllegro is moving away from a hardware appliance to a software-only solution, but has decided to partner with specific vendors. This throws more fuel on the "commodity hardware keeps getting better, why go with specialty hardware?" line of questions aimed at appliance vendors.

Netezza, the best-known data warehouse appliance vendor, continues to plug away, and they are still solidly in the custom-engineered hardware space. DATAllegro seems to be going down a path similar to that of Greenplum, who position themselves as a less expensive version of Teradata, with Greenplum preferably running on Sun hardware (and the hardware they prefer is pretty nice). Not a peep out of the startup Calpont for close to two years now, though I hear there may be signs of life coming from them again. I think the much older Kognitio is still breathing, though seen in the US about as often as the ivory-billed woodpecker (you might remember their offering better as White Cross).

One interesting company getting ready to launch is Dataupia, who have a very different take on how to deal with large-scale data storage issues. I'd love to talk more but I've been NDAd. I'd like to write more about ParAccel too, since they position themselves as a transparent query accelerator. NDAd there as well, but I liked what I heard because they tackle one of the problems faced by products based on similar architectures.

So does HyperRoll, but via a very different approach. The column-based database Vertica, yet another approach to scaling query performance and data volumes, was not in sight. Sybase IQ is another example of the column-based approach. At this point I'm moving away from dealing with large data volumes and more into query performance, whether or not it's on large data, which is a different problem.

Predictive Analytics
Predictive analytics (aka data mining) was another hot topic, rated by the Executive Summit attendees as the number one item expected to have the most impact over the next several years. I see this as big too. A lot of the engineering problems we faced in the 90's have been addressed and raw computing power on the cheap has made broader use feasible (these technologies tend to be CPU and memory intensive).

I'm not completely sold on predictive analytics for end users yet. It still takes expertise to understand which techniques work best for what types of problems. Using them requires knowledge of the technology so you avoid making common data mining mistakes. Tool support is better, but still has a ways to go before it can get out of the heavy-duty analyst / IT side.

That said, I've been seeing more PA technologies embedded into applications. My opinion has been that PA is really a back-end technology almost like ETL. The output of the tool is the meaningful element; the processing isn't so interesting to users. You can see evidence of embedded analytics all over the place.

Recommendation engines are probably the most obvious of the buried uses. I think the market is ignoring recommendation engines at the moment, and there are few standalone products out there that aren't part of some e-commerce solution. When you think about the deluge of information and the growing data on the web, recommendation engines make a lot of sense. Companies like Amazon pioneered this in the e-commerce space, but it's expanded into many other areas.
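For readers who haven't looked inside one, here is a minimal sketch of the simplest kind of recommendation logic: "people who had X also had Y" co-occurrence counting. This is a toy illustration under my own assumptions, not how Amazon or anyone else actually does it; the function names and sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate within a basket so repeats don't inflate the counts.
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, top_n=3):
    """Items most often seen with `item`; ties are broken alphabetically."""
    ranked = sorted(co.get(item, Counter()).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical purchase baskets.
    baskets = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]]
    co = build_cooccurrence(baskets)
    print(recommend(co, "a", top_n=2))
```

The output is the part a user cares about; the counting behind it is back-end plumbing, which is the point about embedded analytics.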

There are startups all over the place that are really nothing more than specialized recommendation engines disguised as services for consumers. Pandora, in the music space, is one example. A common element in companies from Amazon to Netflix to Pandora is that the recommendation engines are (largely) custom. The tooling for broad use isn't there yet, but the market is going to demand it. I'm seeing more and more buzz about recommendations in the web market (where I spend about half my time). I've got plenty more on recommendation engines that I've been putting into a presentation for a conference, so I'll save the topic for another time.

Bottom line: predictive analytics, coming thing, may be heading into overhype mode if the industry analysts and vendors start pushing it as a solid end-user technology.

Clickstream Coming Back?
I had a half-dozen conversations about clickstream data, up from the one I usually have. I haven't done a lot of work with web analytics over the past few years because the bottom more or less dropped out of that market. Most people bought web analysis packages or used hosted services.

This is changing, which I think reflects a maturing of company BI efforts. They need to marry the clickstream and internal data to get a view across processes and artificial business divisions. All the companies I talked to were talking about bringing the data in-house, even if they were planning to continue with the online analytics provided by their applications. The problem is simply that web site data alone isn't as valuable as web site data integrated with back office systems.
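As a minimal sketch of what "marrying" the two looks like, assume the clickstream and the order system share a customer identifier (a big assumption in practice). The field names `customer_id` and `amount`, the function name, and the sample records are all hypothetical.

```python
from collections import defaultdict


def page_views_with_revenue(clicks, orders):
    """Join web page views to back-office revenue on a shared customer id."""
    revenue = defaultdict(float)
    for order in orders:
        revenue[order["customer_id"]] += order["amount"]

    page_views = defaultdict(int)
    for click in clicks:
        page_views[click["customer_id"]] += 1

    # One row per customer seen on the web site; zero revenue if no orders.
    return {
        cid: {"page_views": views, "revenue": revenue.get(cid, 0.0)}
        for cid, views in page_views.items()
    }


if __name__ == "__main__":
    clicks = [
        {"customer_id": 1, "page": "/product/42"},
        {"customer_id": 1, "page": "/cart"},
        {"customer_id": 2, "page": "/product/42"},
    ]
    orders = [{"customer_id": 1, "amount": 99.5}]
    print(page_views_with_revenue(clicks, orders))
```

Neither source alone tells you that customer 2 looked and didn't buy. That only shows up once the two are side by side.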

Note to companies providing product information online: if you have a product data sheet online, I'm looking at it to see if you fit what I'm looking for. Sticking this behind a registration page just annoys me and I'll look elsewhere because you're interrupting my flow. My experience is that about 30% of the email addresses are no good, or are throwaways. It's fun to look through reg-page data for the number of entries from "Heywood Jablowme" at "123 Fake Street". Not that I would enter in that data. I use BugMeNot.

Plenty of other interesting trends continue as well:
  • BI pushing into the mid-market and smaller companies
  • BI use increasing in the mainstream majority
  • commoditization of BI tools (and to some extent integration tools) accelerating
  • on-demand / near-real-time / right-time data needs creating new architectural challenges for IT and vendors alike
Strangely missing was the impact of the web on the BI market and vice versa. The topic is largely ignored until you mention something like mashups, which are really one-off specialized BI applications built with web technology. It's like the movie "When Worlds Collide". Everything is going to be shaken up in the BI and web technology markets as web tooling, BI infrastructure, and the mix of data and text converge. At the moment we're living in parallel worlds and neither world recognizes that the other exists or is headed into the same space.

The vendors I saw at TDWI that seem to be thinking about this (either deeply or opportunistically) were Cognos (Celequest really), IBM (deep down in research-land, e.g. ManyEyes and QEDWiki) and oddly enough the EII vendor Denodo.

I've been spending an inordinate amount of time on all things web and BI since I worked on both sides of the aisle. It's fun. I can't wait for MashupCamp this summer. In the meantime, I've got preparations for a web integration talk I'm doing at the Shared Insights Portals conference.

See a trend or something I missed at TDWI? Leave a comment below.


Creative Commons License
This work is licensed under this Creative Commons License except where indicated.