Post-TDWI Technology Highlights and Trends
I always come back from a TDWI conference with a big stack of notes from meetings with vendors and people doing work in BI and performance management. This conference was larger than usual, with a lot of interesting things going on. Some highlights:
Rapid Increases in Data Volumes - Appliances and More Appliances
Scalability and performance are a big issue, and getting bigger. I spoke to several people working on ways to deal with increasing data volumes. When I managed BI and DW applications, I used to plan for about 20-25% annual growth in data. Based on this small sample, growth in data and in requests for new information is pushing that into the 30-40% annual range for existing warehouses.
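To see why the jump from 20-25% to 30-40% matters, it helps to compound it. This is a minimal Python sketch, assuming growth compounds annually at a constant rate; the five-year horizon and the function names are hypothetical, chosen only for illustration.

```python
"""Compare compound data growth at the planning rates discussed above.

Assumptions (for illustration only): growth compounds once a year at a
constant rate, starting from a size of 1.0 in any unit (e.g. terabytes).
The 5-year horizon is a hypothetical choice, not a figure from TDWI.
"""
import math


def projected_size(start: float, annual_rate: float, years: int) -> float:
    """Size after `years` of compound growth at `annual_rate` (0.25 means 25%)."""
    return start * (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years until the data volume doubles at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    for rate in (0.20, 0.25, 0.30, 0.40):
        print(f"{rate:.0%}: x{projected_size(1.0, rate, 5):.2f} after 5 years, "
              f"doubles every {doubling_time(rate):.1f} years")
```

At 25% a warehouse roughly triples in five years; at 40% it grows more than fivefold and doubles about every two years, which changes hardware and architecture planning considerably.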
The high end just keeps getting higher, too. This probably caught off guard the people who expected the ceiling on data volumes to stay relatively steady. It hasn't, so specialized products for handling vast amounts of data are still useful.
On that front, there were some interesting announcements at the show. DATAllegro is moving away from a hardware appliance to a software-only solution, while partnering with specific hardware vendors. This throws more fuel on the "commodity hardware keeps getting better, so why go with specialty hardware?" line of questioning aimed at appliance vendors.
Netezza, the best-known data warehouse appliance vendor, continues to plug away and remains solidly in the custom-engineered hardware space. DATAllegro seems to be going down a path similar to that of Greenplum, who position themselves as a less expensive version of Teradata, with Greenplum preferably running on Sun hardware (and the hardware they prefer is pretty nice). There has been not a peep out of the startup Calpont for close to two years, though I hear there may be signs of life from them again. I think the much older Kognitio is still breathing, though it is seen in the US about as often as the ivory-billed woodpecker (you might remember their offering better as White Cross).
One interesting company getting ready to launch is Dataupia, which has a very different take on large-scale data storage issues. I'd love to say more, but I've been NDAd. I'd like to write more about ParAccel too, since they position themselves as a transparent query accelerator. I'm NDAd there as well, but I liked what I heard because they tackle one of the problems faced by products built on similar architectures.
So does HyperRoll, though via a very different approach. Vertica, whose column-based database takes yet another approach to scaling query performance and data volumes, was not in sight. Sybase IQ is another example of the column-based approach. At this point, though, I'm moving away from the problem of large data volumes and into query performance regardless of data size, which is a different problem.
Predictive analytics (aka data mining) was another hot topic; Executive Summit attendees rated it the number one item expected to have the most impact over the next several years. I see it as big too. Many of the engineering problems we faced in the '90s have been addressed, and cheap raw computing power has made broader use feasible (these technologies tend to be CPU- and memory-intensive).
I'm not completely sold on predictive analytics for end users yet. It still takes expertise to understand which techniques work best for which types of problems, and using them requires enough knowledge of the technology to avoid common data mining mistakes. Tool support is better, but it still has a ways to go before it can move beyond the heavy-duty analyst / IT side.
That said, I've been seeing more PA technologies embedded into applications. My opinion has been that PA is really a back-end technology almost like ETL. The output of the tool is the meaningful element; the processing isn't so interesting to users. You can see evidence of embedded analytics all over the place.
Recommendation engines are probably the most obvious of these buried uses. I think the market is ignoring recommendation engines at the moment, and there are few standalone products out there that aren't part of some e-commerce solution. Given the deluge of information and the growing volume of data on the web, recommendation engines make a lot of sense. Companies like Amazon pioneered them in the e-commerce space, but their use has expanded into many other areas.
There are startups all over the place that are really nothing more than specialized recommendation engines disguised as consumer services; Pandora and Last.fm in the music space are examples. A common element in companies from Amazon to Netflix to Pandora or Last.fm is that the recommendation engines are (largely) custom-built. The tooling for broad use isn't there yet, but the market is going to demand it. I'm seeing more and more buzz about recommendations in the web market (where I spend about half my time). I've got plenty more on recommendation engines that I've been putting into a conference presentation, so I'll save the topic for another time.
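For readers who haven't looked inside one, about the simplest form a recommendation engine can take is item-to-item co-occurrence: recommend things that often show up alongside what the user already has. The Python sketch below is a generic textbook technique under that assumption, not the method used by Amazon, Netflix, Pandora, or anyone else mentioned here; the function names and sample baskets are hypothetical.

```python
"""Item-to-item co-occurrence recommender: a minimal sketch.

Assumption: this is a generic textbook technique, not any named company's
method. Input is a list of hypothetical "baskets" (items a user bought,
rated, or played together).
"""
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of distinct items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, owned, k=3):
    """Score unowned items by how often they co-occur with owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["b", "d"], ["a", "c"]]
    co = build_cooccurrence(baskets)
    print(recommend(co, {"a", "b"}))  # prints ['c', 'd']
```

Production engines add normalization (so popular items don't dominate), decay for old data, and far more scale engineering, which is part of why so many of them end up custom.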
Bottom line: predictive analytics is a coming thing, but it may be heading into overhype mode if industry analysts and vendors start pushing it as a solid end-user technology.
Clickstream Coming Back?
I had a half-dozen conversations about clickstream data, up from the one I usually have. I haven't done much work with web analytics over the past few years because the bottom more or less dropped out of that market: most people bought web analysis packages or used hosted services.
This is changing, which I think reflects a maturing of company BI efforts. Companies need to marry clickstream and internal data to get a view across processes and artificial business divisions. All the companies I talked to were planning to bring the data in-house, even if they intended to continue using the online analytics provided by their applications. The problem is simply that web site data alone isn't as valuable as web site data integrated with back-office systems.
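Here is a small illustration of what "integrated with back-office systems" buys you. This Python sketch assumes both feeds share a customer ID (in practice that mapping from cookies or logins is often the hard part); the record layouts, field names, and helper functions are hypothetical, not taken from any web analytics package.

```python
"""Joining clickstream with back-office orders: a minimal sketch.

Assumptions: both feeds carry a shared customer_id; the record layouts and
field names are hypothetical, not from any particular product.
"""
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageView:
    customer_id: str
    url: str


@dataclass
class Order:
    customer_id: str
    amount: float


def integrate(views, orders):
    """Full outer join on customer_id: web activity and orders side by side."""
    summary = defaultdict(lambda: {"page_views": 0, "orders": 0, "revenue": 0.0})
    for v in views:
        summary[v.customer_id]["page_views"] += 1
    for o in orders:
        row = summary[o.customer_id]
        row["orders"] += 1
        row["revenue"] += o.amount
    return dict(summary)


def browsed_but_never_bought(summary):
    """Customers visible only through the web data: visited, but no orders."""
    return sorted(c for c, r in summary.items()
                  if r["page_views"] > 0 and r["orders"] == 0)


if __name__ == "__main__":
    views = [PageView("c1", "/home"), PageView("c1", "/product/42"),
             PageView("c2", "/home")]
    orders = [Order("c1", 19.99), Order("c3", 5.00)]
    print(browsed_but_never_bought(integrate(views, orders)))  # prints ['c2']
```

Neither source alone can answer the question in the last function: the web package doesn't know who bought, and the order system doesn't know who browsed. Customer c3, who ordered without any web visits, only shows up because the join keeps both sides.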
Note to companies providing product information online: if you have a product data sheet online, I'm looking at it to see whether you fit what I need. Sticking it behind a registration page just annoys me, and I'll look elsewhere because you're interrupting my flow. In my experience, about 30% of the email addresses collected on registration pages are no good or are throwaways. It's fun to look through reg-page data for the number of entries from "Heywood Jablowme" at "123 Fake Street". Not that I would enter that data myself. I use BugMeNot.
Plenty of other interesting trends continue as well:
- BI pushing into the mid-market and smaller companies
- BI use increasing in the mainstream majority
- commoditization of BI tools (and to some extent integration tools) accelerating
- on-demand / near-real-time / right-time data needs creating new architectural challenges for IT and vendors alike
Strangely missing was the impact of the web on the BI market and vice versa. The topic is largely ignored until you mention something like mashups, which are really one-off specialized BI applications built with web technology. It's like the movie "When Worlds Collide": everything is going to be shaken up in the BI and web technology markets as web tooling, BI infrastructure, and the mix of data and text converge. At the moment we're living in parallel worlds, and neither world recognizes that the other exists or that both are headed into the same space.
The vendors I saw at TDWI that seem to be thinking about this (either deeply or opportunistically) were Cognos (Celequest really), IBM (deep down in research-land, e.g. ManyEyes and QEDWiki) and oddly enough the EII vendor Denodo.
I've been spending an inordinate amount of time on all things web and BI, since I've worked on both sides of the aisle. It's fun. I can't wait for MashupCamp this summer. In the meantime, I've got preparations for a web integration talk I'm doing at the Shared Insights Portals conference.
See a trend or something I missed at TDWI? Leave a comment below.
Labels: appliances, BI, clickstream, performance, TDWI
Posted by Mark, Wednesday, February 28, 2007 5:48:00 PM