Clickstream


Helpful Resources for Scaling Web Sites and Applications

There's a good collection of presentations about high-end scaling of web applications at Peter Van Dijck's blog. Most are hosted on slideshare, but there are some mp3s and PDFs as well. It's hard to find quality information on scaling applications of any sort. I used to deal with scalability a lot in the early days of the web. Technologies have changed quite a bit since then so I focus on scalability mostly in the data warehouse and business intelligence world now. I always pay attention when someone goes through the trouble of collecting a set of good quality material like this. I've only seen two of the presentations he's got listed, and never read the O'Reilly book he mentioned, which means more summer reading in the works.


Real Estate Data Visualization

Trulia, a real estate search service, has an interactive map of US homes animated based on when the homes were built. It's fun to play with. The search service (the first link) has a well-designed navigation and display interface, better than others I've used. Worth looking at to see how information displays can be made interactive.


LOLTrek: This Meme Has Jumped the Shark

If you don't know what this means, you probably need to stay in more. In this case, go find the "I can has cheezburger?" image, then visit a tribute to tribbles, LOLTrek. If you don't know what "jump the shark" means, then you're definitely having too much life and not enough computer and need to spend some time on Wikipedia.

Even funnier is Google correcting "I can has cheezburger" to "I can has cheezeborger" when I tried to track down one of the original images.

To finish off my news reading for today, "I'm in ur cellz killin ur mitochondrias" (Fanta screws with your mitochondria?) Good thing I don't drink those fizzy drinks.


Story About the First Computer

The New Yorker has a story about the Antikythera Mechanism, possibly the first mechanical computer, built by hand in ancient Greece. I liked the conclusion best, that the device embodied the world view of an era, in much the way that our current world view is embodied in our own technology.
"One day in the spring of 1900, a party of Greek sponge divers returning from North Africa was forced by a storm to take shelter in the lee of the small island of Antikythera, which lies between Crete and Kythera. After the storm passed, one of the divers, Elias Stadiatis, put on a weighted suit and an airtight helmet that was connected by an air hose to a compressor on the boat, and went looking for giant clams, with which to make a feast that evening."
continue reading story


US Terrorism Data: Worst Practices for Data Quality and Governance

The Department of Homeland Security is a perfect counter-example of how to deal with data. A way to learn good practices is to do the opposite of what the government does. Here's a list of worst practices to not follow from the DHS.

Worst practice #1: load all the data you can lay your hands on. Terror Database Has Quadrupled In Four Years discusses the "Terrorist Identities Datamart Environment" or TIDE, the source of data for airline, police, border and consulate watch lists. Their policy is to shovel everything possible in there, needed or not, because it might be useful some day. This runs counter to years of data warehouse practices.
"Each day, waves of new information are fed into terrorism-suspect databases... Ballooning from fewer than 100,000 files in 2003 to about 435,000, the growing database threatens to overwhelm the people who manage it."
Aside from this problem, do we really believe there are 435,000 suspected terrorists in the US? That's almost four times larger than our military in Iraq. What are they waiting for? They could take over the US now. Obviously something is wrong with all that data.

Worst practice #2: don't do anything about data quality. "The single biggest worry that I have is long-term quality control," said Russ Travers, in charge of TIDE at the National Counterterrorism Center in McLean. Yet he's doing nothing to QA the data before dumping it into the system. To make things worse, there's no way to remove or correct data once it's loaded.
"The bar for inclusion is low, and once someone is on the list, it is virtually impossible to get off it."

"In 2004 and 2005, mis-identifications accounted for about half of the tens of thousands of times a traveler's name triggered a watch-list hit."

"Sen. Ted Stevens (R-Alaska) said last year that his wife had been delayed repeatedly while airlines queried whether Catherine Stevens was the watch-listed Cat Stevens. The listing referred to the Britain-based pop singer who converted to Islam and changed his name to Yusuf Islam. The reason Islam is not allowed to fly to the United States is secret."
TSA can't tell the difference between a 70's musician and a senator's wife? That's a serious data quality problem. Given the 50% miss rate, it would be better for the police to toss a coin each time they arrest someone instead of consulting the watch lists.
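The Stevens mix-up is what matching on names alone tends to produce. As a minimal sketch, assuming a hypothetical matcher (this is not how TIDE or the TSA actually match records, which the articles don't describe), the Python below compares a loose name-only rule with one that also checks a second attribute. The `Person` class, both functions, and the birth dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str
    birth_date: Optional[date] = None


def normalize(name: str) -> list[str]:
    """Lowercase the name and split it into tokens."""
    return name.lower().split()


def name_only_match(traveler: Person, listed: Person) -> bool:
    """Loose rule: same last token, and one first token is a prefix of the other."""
    t, w = normalize(traveler.name), normalize(listed.name)
    if not t or not w or t[-1] != w[-1]:
        return False
    return t[0].startswith(w[0]) or w[0].startswith(t[0])


def match_with_birth_date(traveler: Person, listed: Person) -> bool:
    """Same loose name rule, but birth dates must agree when both are known."""
    if not name_only_match(traveler, listed):
        return False
    if traveler.birth_date and listed.birth_date:
        return traveler.birth_date == listed.birth_date
    return True  # no second attribute available, so the hit cannot be ruled out


# Hypothetical records: the dates are placeholders, not real birth dates.
listed = Person("Cat Stevens", date(1950, 1, 1))
traveler = Person("Catherine Stevens", date(1960, 1, 1))

print(name_only_match(traveler, listed))        # True: a false hit
print(match_with_birth_date(traveler, listed))  # False: the second attribute rules it out
```

Requiring a second attribute doesn't fix bad source data, but it shows why a watch list keyed on names alone generates so many false hits.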

The data management practices go one step further. They actually have a process to include names on the list that are no longer valid. Their idea is that if someone is dead (for example), then a terrorist might use that name since it won't be on the list. Perhaps all the dead people account for the growth of the database. Why not go one step further and put the names of everyone who died this year onto the list?

Worst practice #3: ignore the problems of your users.
"TIDE is a vacuum cleaner for both proven and unproven information, and its managers disclaim responsibility for how other agencies use the data. "What's the alternative?" Travers said."
Multiple agencies are complaining about wasted man-hours due to mis-identification. Airlines are routinely stopping people in airports, like Ted Stevens's wife, even though the data is obviously wrong. God forbid that we should have any data management processes in place. Travers should probably get out to a data warehousing conference once in a while, particularly since he said earlier that data quality is a problem. His current alternative of doing nothing isn't feasible.

But the problems don't stop with TIDE...

Worst practice #4: if the users aren't sold on the concept, build it and they will come.
TIA? Congress killed the program, so the people involved did it for the government of Singapore instead. Son of TIA: Pentagon Surveillance System Is Reborn in Asia tells how Poindexter and company built a system for tracking people in a totalitarian state. Now they want to sell the system they developed back to the US government. It's like doing an IT-driven data warehouse project, only with Orwellian overtones.

Worst practice #5: ignore the users. DHS has a monumental mess on their hands. Different government departments need different data, just like the finance department has different needs than the marketing department. Yet the DHS systems are being centrally mandated by (mostly) intelligence people with no idea of how other groups like the police or the border patrol need information. Combine this with poorly managed data integration and no governance and you have data being copied and misapplied all over the place.

A number of the articles I linked to came up via Bruce Schneier's crypto-gram mailing list. Always interesting reading, even if you aren't a security professional. Choice items from this month's issue:
" on preventing terrorism" indeed. (Tip #7: When transporting nuclear wastes, always be sure to padlock your truck.)
YOU are big brother: Control and track your car from the 'net. Or, as Schneier says, have someone hack into their web site and control it for you.

AMEX is Watching You. AMEX has a patent application titled "Method and System for Facilitating a Shopping Experience," that:
"describes a Minority Report style blueprint for monitoring consumers through RFID-enabled objects, like the American Express Blue Card.

According to the patent, RFID readers called "consumer trackers" would be placed in store shelving to pick up "consumer identification signals" emitted by RFID-embedded objects carried by shoppers. These would be used to identify people, track their movements, and observe their behavior."
Breaches of personal data: blaming the myth and punishing the victim. Can we stop blaming hackers for theft of information already?
The report states that "60 percent of the incidents involve missing or stolen hardware, insider abuse or theft, administrative error, or accidentally exposing data online."
Given that its data suggests that a significant portion of the blame should go to those who hold the data, the report argues forcefully for legislation that requires they meet minimum data safety standards.
Windows Vista code-signing to keep out evil spyware? I don't think so: VBootkit bypasses Windows Vista's code-signing mechanisms. Microsoft spent a lot trying to secure Vista. So far, no dice.

Local Sheriff Suspects Al-Qaeda Or Teens
"This activity matches up with the M.O. of a terrorist casing a potential target," Steinhorst said. "It also matches the M.O. of a group of teens drinking beer and fooling around."
I don't read The Onion enough.


Feel Free to Carry Explosives on Planes, You Won't Get Caught

I just got done with 6 weeks of continuous travel and I wasn't surprised to read that airport security missed 90% of improvised and hidden explosives during security tests, proving our system doesn't work. Yet we still have to take off our shoes, fork over our mouthwash and listen to "the threat alert has been raised to orange - report your neighbors". My favorite quote is this from a former TSA inspector:
"There's very little substance to security," said former Red Team leader Bogdan Dzakovic. "It literally is all window dressing that we're doing. It's big theater on TV and when you go to the airport. It's just security theater."
Dzakovic, who testified that the FAA ordered the Red Team to "not write up our findings," said the TSA is also trying to hide its results.
"The last thing TSA wants to do is look bad in front of congress and in front of the public, so rather than fix the problem, they'd rather just keep them quiet," said Dzakovic.
Sure would be nice if the idiots running homeland "security" would stop trying to keep their evil overlords in power by pretending we're under imminent attack and instead focus on doing their jobs. I know that's a lot to ask of this administration.


Web Integration Talk in Las Vegas

I'm off to Las Vegas for the Portals, Collaboration and Content Conference to give a talk on data integration for the web. Title of the talk is "Web Data Integration: Methods to Extract and Deliver Data for Portals and Web Applications." I'll be doing a run-through of integration architecture and technology choices to get data from where it is to where you want it. I had to cut back on the part that I find most interesting, getting data off web pages. Scraping, scrAPIs, RSS and the rest of the fun web stuff gets about 10-15 minutes towards the end. I'll probably repurpose the unused information into posts. The slides will make their way online some time in the next couple weeks. I can say one thing - they're pretty relative to some of the other presentations.


Microsoft Patent FUD Appears to be Exactly That

Since I just finished running a day on open source at TDWI, I thought it would be worthwhile to comment on this. It's always hard to tell what's going on when they toss out FUD, but the authors of the study about potential Linux infringements say that Microsoft is misrepresenting their conclusions:
"The point of the study was actually to eliminate the FUD about Linux's alleged legal problems by attaching a quantifiable measure versus the speculation," he said. "And the number we found, to anyone familiar with this issue, is so average as to be boring; almost any piece of software potentially infringes at least that many patents."
This looks like another case of bogus interpretation similar to the Linux TCO study, where they concluded that Linux was ten times more expensive to run than Windows. It's true, when you look at what they compared: an Intel PC running Windows vs. an IBM mainframe running Linux.


Open Source BI Case Study Slides Are Posted

We had a good open source session at the May TDWI conference. After I spent some time reviewing history, projects and adoption practices we got to the good part: short case studies, demos and a panel session with representatives from BIRT, JasperSoft, Pentaho and SpagoBI. I particularly enjoyed the panel session where we had some great insights from the panel on what's happening in this space and some of the problems people are facing. I also liked Cindi Howson (the most knowledgeable person on the subject of BI products I know) asking some basic but pointed questions that were on the minds of the mostly non-OSS audience.

Special thanks to Paul Clenahan and Jason Weathersby from Actuate/BIRT, Nick Halsey and Beth Mazur of JasperSoft, Lance Walter and Nicholas Goodman of Pentaho, and Grazia Cazzin and Daniela Tura representing SpagoBI. They did a lot of work and came at their own expense to represent their projects at this conference.

All of their overview and case study slides are available at Third Nature.



Data warehousing, business intelligence, IT strategy and architecture, and occasional interesting bits.


Creative Commons License
This work is licensed under this Creative Commons License except where indicated.