Full Day Conference Session on Open Source BI and DW
Open source has little visibility in the mainstream data warehousing community. This is an attempt to change that. We've got four projects involved in the open source business intelligence and data warehouse course at the May TDWI. I'm finalizing the structure and lineup for the afternoon session. The plan is to do a demo-based comparison of the various projects, let them show interesting things they can do, present some short case studies, and do a panel on open source adoption.
The projects that have given me a verbal "yes" are:
The morning session will cover OSS possibilities for the entire data warehouse technology stack. The afternoon is dedicated to BI. This should be a great time and will be a lot different from my usual full-day sessions. I'll be writing more about each project as I work on open source tool evaluations for an upcoming report.
I always come back from a TDWI conference with a big stack of notes from meetings with vendors and people doing work in BI and performance management. This conference was larger than usual, with a lot of interesting things going on. Some highlights:
Rapid Increases in Data Volumes - Appliances and More Appliances
Scalability and performance are a big issue and getting bigger. I spoke to several people working on dealing with increasing data volumes. I used to plan for about 20-25% annual growth in data when I managed BI and DW applications. Based on this small sample, the amount of data and requests for new information are pushing this to the 30-40% annual growth range for existing warehouses.
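To put those growth rates in perspective, here's a quick back-of-the-envelope compounding sketch (the 10 TB starting size and five-year horizon are hypothetical, made up for illustration):

```python
def project_growth(start_tb: float, annual_rate: float, years: int) -> float:
    """Project data volume assuming steady compound annual growth."""
    return start_tb * (1 + annual_rate) ** years

# A hypothetical 10 TB warehouse over five years:
old_plan = project_growth(10, 0.25, 5)  # roughly 30.5 TB at 25% annual growth
new_plan = project_growth(10, 0.40, 5)  # roughly 53.8 TB at 40% annual growth
```

The jump from 25% to 40% nearly doubles the five-year capacity target, which is why the high end keeps pulling away from plans made only a few years ago.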
The high end just keeps getting higher too. This probably caught off guard the people who were expecting the ceiling on data to stay relatively steady. It isn't, so specialized products for dealing with vast amounts of data are still useful.
On that front, there were interesting announcements at the show. DATAllegro had an interesting announcement, in that they're moving away from a hardware appliance to a software-only solution, but decided to partner with specific vendors. This throws more fuel on the "commodity hardware keeps getting better, why go with specialty hardware?" line of questions aimed at appliance vendors.
Netezza, the best-known data warehouse appliance vendor, continues to plug away and is still solidly in the custom-engineered hardware space. DATAllegro seems to be going down a path similar to that of Greenplum, who position themselves as a less expensive version of Teradata, preferably running on Sun hardware (and the hardware they prefer is pretty nice). Not a peep out of the startup Calpont for close to two years now, though I hear there may be signs of life coming from them again. I think the much older Kognitio is still breathing, though it's seen in the US about as often as the ivory-billed woodpecker (you might remember their offering better as White Cross).
One interesting company getting ready to launch is Dataupia, who have a very different take on how to deal with large-scale data storage issues. I'd love to say more but I've been NDA'd. I'd like to write more about ParAccel too, since they position themselves as a transparent query accelerator. NDA'd there as well, but I liked what I heard because they tackle one of the problems faced by products based on similar architectures.
So does Hyperroll, but via a very different approach. Not in sight was the column-based database Vertica, yet another approach to scaling query performance and data volumes; Sybase IQ is another example. At this point I'm moving away from large data volumes per se and more into query performance whether or not large data is involved, which is a different problem.
Predictive Analytics
Predictive analytics (aka data mining) was another hot topic, rated by the Executive Summit attendees as the number one item expected to have the most impact over the next several years. I see this as big too. A lot of the engineering problems we faced in the 90's have been addressed, and raw computing power on the cheap has made broader use feasible (these technologies tend to be CPU and memory intensive).
I'm not completely sold on predictive analytics for end users yet. It still takes expertise to understand which techniques work best for which types of problems. Using them requires knowledge of the technology so you avoid making common data mining mistakes. Tool support is better, but it still has a ways to go before it can move beyond the heavy-duty analyst / IT side.
That said, I've been seeing more PA technologies embedded into applications. My opinion has been that PA is really a back-end technology almost like ETL. The output of the tool is the meaningful element; the processing isn't so interesting to users. You can see evidence of embedded analytics all over the place.
Recommendation engines are probably the most obvious of the buried uses. I think the market is ignoring recommendation engines at the moment, and there are few standalone products out there that aren't part of some e-commerce solution. When you think about the deluge of information and the growing data on the web, recommendation engines make a lot of sense. Companies like Amazon pioneered this in the e-commerce space, but it's expanded into many other areas.
There are startups all over the place that are really nothing more than specialized recommendation engines disguised as services for consumers: Pandora and Last.fm in the music space, for example. A common element in companies from Amazon to Netflix to Pandora or Last.fm is that the recommendation engines are (largely) custom. The tooling for broad use isn't there yet, but the market is going to demand it. I'm seeing more and more buzz about recommendations in the web market (where I spend about half my time). I've got plenty more on recommendation engines that I've been putting into a presentation for a conference, so I'll save the topic for another time.
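Those custom engines are far more sophisticated than anything that fits in a blog post, but the core idea behind the collaborative-filtering style of recommendation can be sketched in a few lines. Everything below (the users, the items, the ratings) is invented for illustration:

```python
from math import sqrt

# Hypothetical user-to-item ratings; real engines work with far larger, sparser data.
ratings = {
    "alice": {"dylan": 5, "beatles": 4, "stones": 1},
    "bob":   {"dylan": 4, "beatles": 5, "stones": 2},
    "carol": {"stones": 5, "dylan": 1},
}

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity over the items two users have both rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm = sqrt(sum(u[i] ** 2 for i in shared)) * sqrt(sum(v[i] ** 2 for i in shared))
    return dot / norm

def recommend(user: str) -> list:
    """Rank items the user hasn't rated, weighted by similar users' ratings."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)
```

The production versions differ mostly in scale and tuning (implicit signals instead of explicit ratings, item-to-item similarity precomputed offline), but the weighted-neighbor scoring above is the common ancestor.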
Bottom line: predictive analytics, coming thing, may be heading into overhype mode if the industry analysts and vendors start pushing it as a solid end-user technology.
Clickstream Coming Back?
I had a half-dozen conversations about clickstream data, up from the one I usually have. I haven't done a lot of work with web analytics over the past few years because the bottom more or less dropped out of that market. Most people bought web analysis packages or used hosted services.
This is changing, which I think reflects a maturing of corporate BI efforts. Companies need to marry clickstream and internal data to get a view across processes and artificial business divisions. All the companies I talked to were talking about bringing the data in-house, even if they were planning to continue with the online analytics provided by their applications. The problem is simply that web site data alone isn't as valuable as web site data integrated with back office systems.
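The integration itself is conceptually just a join on a shared customer key, which is easy to sketch even though matching web visitors to back-office records is the genuinely hard part in practice. The record layouts and field names here are hypothetical:

```python
# Hypothetical session summaries from the web analytics side.
clickstream = [
    {"visitor_id": "v1", "pages_viewed": 12, "email": "a@example.com"},
    {"visitor_id": "v2", "pages_viewed": 3,  "email": "b@example.com"},
]

# Hypothetical back-office records, e.g. from a CRM or order system.
crm_orders = [
    {"email": "a@example.com", "lifetime_value": 1200.0},
]

def merge_on_email(clicks: list, orders: list) -> list:
    """Join web sessions to back-office data on a shared customer key."""
    ltv = {o["email"]: o["lifetime_value"] for o in orders}
    return [
        {**c, "lifetime_value": ltv.get(c["email"])}  # None when no match found
        for c in clicks
    ]
```

The value shows up in the joined rows: a visitor who views many pages but has no order history means something very different from one with a four-figure lifetime value, and neither system can tell you that on its own.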
Note to companies providing product information online: if you have a product data sheet online, I'm looking at it to see if you fit what I'm looking for. Sticking this behind a registration page just annoys me and I'll look elsewhere because you're interrupting my flow. My experience is that about 30% of the email addresses are no good, or are throwaways. It's fun to look through reg-page data for the number of entries from "Heywood Jablowme" at "123 Fake Street". Not that I would enter in that data. I use BugMeNot.
Plenty of other interesting trends continue as well:
BI pushing into the mid-market and smaller companies
BI use increasing in the mainstream majority
commoditization of BI tools (and to some extent integration tools) accelerating
on-demand / near-real-time / right-time data needs creating new architectural challenges for IT and vendors alike
Strangely missing was the impact of the web on the BI market and vice versa. The topic is largely ignored until you mention something like mashups, which are really one-off specialized BI applications built with web technology. It's like the movie "When Worlds Collide": everything is going to be shaken up in the BI and web technology markets as web tooling, BI infrastructure, and the mix of data and text converge. At the moment we're living in parallel worlds, and neither world recognizes that the other exists or is headed into the same space.
The vendors I saw at TDWI that seem to be thinking about this (either deeply or opportunistically) were Cognos (Celequest really), IBM (deep down in research-land, e.g. ManyEyes and QEDWiki) and oddly enough the EII vendor Denodo.
I've been spending an inordinate amount of time on all things web and BI since I worked on both sides of the aisle. It's fun. I can't wait for MashupCamp this summer. In the meantime, I've got preparations for a web integration talk I'm doing at the Shared Insights Portals conference.
See a trend or something I missed at TDWI? Leave a comment below.
A Simple Description of Net Neutrality and Why It's Important
Net neutrality is simple: we all use the same pipes. The telcos can't mess with what's inside. Everybody gets the same speed and quality of service. To quote from the Daily Show, "It's as though the richer companies get no advantage at all."
There's no development that's so successful that people driven by greed will not try to screw it up. Telcos and their partners pushing pay-for-preference schemes is like reverting to the old days of AT&T.
Save The Internet has a great short video about what's happening and you can help. They stopped one corporate power grab last year and now they're on the offensive, trying to preserve a key design principle that's made the web possible.
Head of Appropriations Committee Doesn't Believe Copernicus or Galileo
One of the most powerful politicians in Congress (Warren Chisum, R-Texas) doesn't believe the earth revolves around the sun. Reversing 500 years of science is an amazing feat. I found this so hard to believe I had to go hunting to make sure it wasn't a hoax. Turns out it's true. I wonder what he thinks of the space program? Obviously the moon landings and Mars missions were faked. We all saw how realistic Total Recall was, so they must have used the same sets as the Mars rover.
"Still, it's enough to set the world a-spinning that the chairman of the House Appropriations Committee, the most powerful committee in the House, distributed to legislators a memo pitching crazed wingers who believe the earth stands still -- doesn't spin on its axis or revolve around the Sun -- that Copernicus was part of a Jewish conspiracy to undermine the Old Testament."
With this sort of monumental stupidity in our elected representatives, it's no wonder the US educational system is falling apart while science research blossoms overseas. Here's the blog post I first read, with his link to Chisum's memo promoting the nutjobs who say the earth stands still using god-designed magnetic levitation (it's right on the home page). Posted by Mark Friday, February 16, 2007 10:32:00 AM
Lessig on Innovation and the Read-Write Web
If you think web 2.0, YouTube, public policy, innovation and anime have nothing to do with each other, think again. I stumbled on this lecture by Lawrence Lessig on innovation and how we got what we have with the read-write web. If you think law and public policy are dull, he may change your mind. It's a great talk that could be roughly summed up as web 1.0 vs. web 2.0, consumer versus creator, and consume versus consume-create-share (and re-create). And how web 3.0 could end up being web 0.5 if we aren't careful.
Funny note: Lessig used the example of John Philip Sousa trashing the infernal music machine, an example I've used in open source talks I've given. His version is better because he concludes that Sousa was right :-) At least now the infernal machines can give us back what they took away a hundred years ago. Of course Lessig is more pessimistic. His official title for the talk is "The Withering of the Net: How DC Pathologies are Undermining the Growth and Wealth of the Net," and his view is that the last four years have eroded the economic and innovation progress in this country.
I just wrapped up the enterprise information integration (EII) webcast for TDWI so here are the slides. The general goal was to explain a little about what modern EII tools are like and scenarios where they are a good fit. EII went through a small hype cycle in 2004-2005 and kind of fell flat, but the technology has matured enough to fill gaps that ETL and EAI tools aren't well suited to. The slides are available as a PDF from Third Nature.
Next month I'll be doing a webcast on a similar topic, hybrid data warehouse architecture and options for providing access to on-demand/"right time" data. This is more focused on physical versus virtual data consolidation and when to use virtual techniques.
Aqua Teen Hunger Force Filmed Planning Boston Strike
"We will disrupt their workday with a mildly offensive blinking light!" Enough said. Go watch it.
I've avoided the topic of the idiotic Boston police response to advertisements for Aqua Teen Hunger Force made from Hasbro's Lite Brite toys, mistaking them for bombs and shutting down the city for a day. Honestly, if the anti-terrorist squads can't tell the difference between a flashing, blinking, attention-getting toy and a bomb we're wasting our money. Ironic that this paranoia should strike the home state of the witch trials. Via Boing Boing
I've noticed an uptick in hits on this blog from searches looking for BI vendor evaluations and criteria. While I don't publish much on this topic, an excellent resource is Cindi Howson's BI Scorecard, where she sells research reports and subscriptions that evaluate all the major business intelligence vendors. If you want specific evaluation criteria, she offers the blank BI scorecard she uses to evaluate products on her free resources page.
I'm currently working on an open source BI tool evaluation report and will be using Cindi's evaluation criteria with extensions specific to open source. The report will be available through BI Scorecard in late spring of this year. Posted by Mark Monday, February 12, 2007 12:46:00 PM
Data Federation and EII are Underutilized Integration Tools
I'm working my way through the current EII and data federation products as part of research I'm doing on on-demand/real-time data warehousing. The new features in these products - ease of use, non-relational sources, ERP connectors, performance and scaling enhancements, multiple input and output protocols - really make them worth a second look. They're underutilized in environments that have to deal with on-demand or "right time" data access and multi-format data.
This coming Wednesday (February 14) I'll be doing a short webcast for TDWI on EII covering a little bit of how EII and federation tools work and how they can be used both in and outside the BI environment.
In the past I haven't been a big fan of EII for a few good (I think) reasons:
1. EII/federation vendors are wedged between big ETL vendors on one side and big EAI vendors on the other (long-term product viability).
2. One class of products tended to focus on XML-only interchange, and generally sucked for relational data, particularly in the areas of performance, scalability and ease of use.
3. The other class of products focused on distributed queries but didn't work well for anything else, and were mostly limited to relational output.
4. The IT usage scenarios were/are not quite mature enough to provide a solid market.
5. Too expensive relative to the value based on use cases.
I now find myself liking the latest releases of the products. The vendors who survived the mini-hype wave circa 2005 have largely dealt with points 2, 3 and 4 but points 1 and 5 are still true.
Long-term viability of standalone vendors is still an issue, though I expect the best few may survive. We're more likely to see EII/federation slip into the data integration stacks or platforms of the major vendors, squeezing the smaller niche products out. We're seeing some of that happen already.
The better products still cost too much unless you have the perfect use case. The ROI simply isn't there for real-time data delivery when companies already have alternatives that meet "good enough" criteria: technologies like replication, EAI/queuing software, and CDC/ETL combinations. But...
At the same time, I'm seeing more use cases, and more urgency around some of the existing uses. For a time dashboard hype was lifting EII vendors, but the value/cost ratio of the dashboard use case was poor. Many EII vendors are now chasing the SOA market because EII shows more value there than as an adjunct to support dashboards. Try using EAI tools to make distributed multi-format queries some time. It's not fun.
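For readers who haven't used a federation tool, the core trick is answering a query by joining live sources at request time instead of first copying everything into a warehouse. Here's a minimal sketch of that idea, joining a relational source with a feed that might arrive from a web service; the tables, records, and field names are all hypothetical, and real EII products add query optimization, pushdown, caching, and security on top of this:

```python
import sqlite3

# One "relational" source: an in-memory SQLite table of customers.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "west"), (2, "east")])

# One "non-relational" source: records as they might arrive from a web service.
orders_feed = [
    {"customer_id": 1, "total": 250.0},
    {"customer_id": 1, "total": 90.0},
]

def federated_totals_by_region() -> dict:
    """Join both sources at query time; nothing is copied into a warehouse."""
    regions = dict(db.execute("SELECT id, region FROM customers"))
    totals = {}
    for order in orders_feed:
        region = regions.get(order["customer_id"])
        totals[region] = totals.get(region, 0.0) + order["total"]
    return totals
```

Doing this by hand for every pair of sources is exactly the drudgery the federation products automate, which is why a declarative distributed query beats wiring it up in EAI middleware.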
I'll be writing up more as I run products through their paces. Vendors I'll be doing in-depth work with include IBM, Composite, BEA, Business Objects and several others. Look here over the coming months for more detailed technical writeups. I'm going to be tying portals and mashup servers into the mix so this will be fun.
The paper edition, anyway. The magazine will live on in online form. Sadly, CMP laid off Dave Stodder, who's been writing, editing, speaking and publishing in the technology realm for years. He was my first editor, back before IE was formed from the merger of DBMS and Database Programming & Design. I still miss Database Programming & Design. It was the only magazine on database topics that ever catered to heavily technical developers and DBAs.
DM Review is the last magazine standing that covers the whole data warehousing and business intelligence spectrum. They've been getting decidedly thinner, so it's probably a matter of time before they shift to online-only distribution. These Internets can be hell on a business model.
I like the design of this graphic from the New York Times comparing the ginormous annual cost of the Iraq war to what we could have spent the money on instead. Pretty amazing what the War That Would Pay For Itself has cost us. (click thumbnail for a larger image).
I also found out that all the libraries in my county will be closing due to lack of funds. Seems the way to pay for the cost of bombs is with our children's education. (The direct cause of the cut is a sleazy federal budgeting rule under which clearcutting our public forests provides library, police and school money; funding to cover the cuts would otherwise be possible, were it not for the giant sucking sound that is the Iraq war.)
I was listening to a talk on continuous partial attention (CPA) and its goodness and badness, and it made me want to re-read Neil Postman's "Amusing Ourselves to Death" because it's a parallel discourse, written on a similar topic before the web existed. Postman asserts that different communications media shape communications and public discourse. A big part of his assertion is that print carries rational argument effectively but television (and to some extent radio) hampers or removes it. I wonder what he'd say about CPA and the interactive written word?
From the foreword to the book:
What Orwell feared were those who would ban books. What Huxley feared was that there would be no reason to ban a book, for there would be no one who wanted to read one. Orwell feared those who would deprive us of information. Huxley feared those who would give us so much that we would be reduced to passivity and egoism. Orwell feared that the truth would be concealed from us. Huxley feared the truth would be drowned in a sea of irrelevance. Orwell feared we would become a captive culture. Huxley feared we would become a trivial culture, preoccupied with some equivalent of the feelies, the orgy porgy, and the centrifugal bumblepuppy. As Huxley remarked in Brave New World Revisited, the civil libertarians and rationalists who are ever on the alert to oppose tyranny "failed to take into account man's almost infinite appetite for distractions". In 1984, Huxley added, people are controlled by inflicting pain. In Brave New World, they are controlled by inflicting pleasure. In short, Orwell feared that what we hate will ruin us. Huxley feared that what we love will ruin us.
This book is about the possibility that Huxley, not Orwell, was right.
Game Design and System Design: Get Your Hands Dirty
The final conclusion of this post on game design is that "you have to get your hands dirty," which applies equally to anyone calling themselves an architect or designer in the IT world. It's one of the reasons so many big IT projects fail: there's typically some conceptual architect/design person or team who passes off the work to the next phase and moves on to the next $300-per-hour engagement, only coming back to offer advice or tweak the design a few months later, with zero learning in between. The game design post takes a few paragraphs to get up to speed but I think he makes some excellent points. Posted by Mark Thursday, February 01, 2007 11:50:00 AM