Clickstream

US Terrorism Data: Worst Practices for Data Quality and Governance

The Department of Homeland Security is a perfect counter-example of how to deal with data. One way to learn good practices is to do the opposite of what DHS does. Here is a list of DHS worst practices to avoid.

Worst practice #1: load all the data you can lay your hands on. The article "Terror Database Has Quadrupled In Four Years" discusses the "Terrorist Identities Datamart Environment," or TIDE, the source of data for airline, police, border and consulate watch lists. The policy is to shovel everything possible in there, needed or not, because it might be useful some day. This runs counter to years of data warehousing practice.
"Each day, waves of new information are fed into terrorism-suspect databases... Ballooning from fewer than 100,000 files in 2003 to about 435,000, the growing database threatens to overwhelm the people who manage it."
Aside from this problem, do we really believe there are 435,000 suspected terrorists in the US? That's almost four times larger than our military in Iraq. What are they waiting for? They could take over the US now. Obviously something is wrong with all that data.

Worst practice #2: don't do anything about data quality. "The single biggest worry that I have is long-term quality control," said Russ Travers, who is in charge of TIDE at the National Counterterrorism Center in McLean. Yet he's doing nothing to QA the data before dumping it into the system. To make things worse, there's no way to remove or correct data once it's loaded.
"The bar for inclusion is low, and once someone is on the list, it is virtually impossible to get off it."

"In 2004 and 2005, mis-identifications accounted for about half of the tens of thousands of times a traveler's name triggered a watch-list hit."

"Sen. Ted Stevens (R-Alaska) said last year that his wife had been delayed repeatedly while airlines queried whether Catherine Stevens was the watch-listed Cat Stevens. The listing referred to the Britain-based pop singer who converted to Islam and changed his name to Yusuf Islam. The reason Islam is not allowed to fly to the United States is secret."
TSA can't tell the difference between a 1970s musician and a senator's wife? That's a serious data quality problem. Given that about half of the watch-list hits were misidentifications, it would be better for the police to toss a coin each time they arrest someone instead of consulting the watch lists.

The data management practices get even worse. DHS actually has a process for putting names on the list that are no longer valid. The idea is that if someone is dead (for example), a terrorist might use that name because it won't be on the list. Perhaps all the dead people account for the growth of the database. Why not go all the way and put the name of everyone who died this year onto the list?
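The practices the DHS skips (check data before it is loaded, keep rejected records reviewable, and provide a way to correct or remove records afterward) are ordinary data warehouse staging practice. The following is a minimal Python sketch of that idea, assuming a hypothetical record layout (`Record`), hypothetical validation rules and a hypothetical `Store` class; it is not based on how TIDE actually works.

```python
from dataclasses import dataclass, field, replace
from datetime import date
from typing import Optional


# Hypothetical record layout; the fields are illustrative, not TIDE's schema.
@dataclass
class Record:
    record_id: str
    full_name: str
    date_of_birth: Optional[date]
    source: str
    date_of_death: Optional[date] = None


def validate(rec: Record) -> list[str]:
    """Return the data quality problems found; an empty list means the record passes."""
    problems = []
    if len(rec.full_name.split()) < 2:
        problems.append("name needs at least a given name and a surname")
    if rec.date_of_birth is None:
        problems.append("missing date of birth")
    if not rec.source:
        problems.append("missing source, so the record cannot be traced or audited")
    if rec.date_of_death is not None:
        problems.append("subject is recorded as deceased")
    return problems


@dataclass
class Store:
    loaded: dict[str, Record] = field(default_factory=dict)
    quarantine: dict[str, list[str]] = field(default_factory=dict)
    audit_log: list[str] = field(default_factory=list)

    def load(self, rec: Record) -> bool:
        """Load a record only if it validates; otherwise quarantine it with its problems."""
        problems = validate(rec)
        if problems:
            self.quarantine[rec.record_id] = problems
            self.audit_log.append(f"quarantined {rec.record_id}: {'; '.join(problems)}")
            return False
        self.loaded[rec.record_id] = rec
        self.audit_log.append(f"loaded {rec.record_id}")
        return True

    def correct(self, record_id: str, **changes) -> list[str]:
        """Apply a correction only if the corrected record still validates."""
        updated = replace(self.loaded[record_id], **changes)
        problems = validate(updated)
        if not problems:
            self.loaded[record_id] = updated
            self.audit_log.append(f"corrected {record_id}: {sorted(changes)}")
        return problems

    def remove(self, record_id: str, reason: str) -> None:
        """Remove a loaded record, keeping the reason in the audit log."""
        del self.loaded[record_id]
        self.audit_log.append(f"removed {record_id}: {reason}")
```

For example, `Store().load(Record("r2", "Cat", None, ""))` returns `False` and quarantines the record with three problems instead of silently adding it. Quarantining rather than discarding bad records leaves them available for review, and the audit log records why anything was loaded, corrected or removed.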

Worst practice #3: ignore the problems of your users.
"TIDE is a vacuum cleaner for both proven and unproven information, and its managers disclaim responsibility for how other agencies use the data. 'What's the alternative?' Travers said."
Multiple agencies are complaining about wasted man-hours due to misidentification. Airlines are routinely stopping people in airports, like Ted Stevens's wife, even though the data is obviously wrong. God forbid that we should have any data management processes in place. Travers should probably get out to a data warehousing conference once in a while, particularly since he said earlier that data quality is a problem. His current alternative of doing nothing isn't feasible.

But the problems don't stop with TIDE...

Worst practice #4: if the users aren't sold on the concept, build it and they will come.
Remember TIA (Total Information Awareness)? Congress killed the program, so the people involved did it for the government of Singapore instead. Son of TIA: Pentagon Surveillance System Is Reborn in Asia tells how Snowden and company built a system for tracking people in a totalitarian state. Now they want to sell the system they developed back to the US government. It's like doing an IT-driven data warehouse project, only with Orwellian overtones.

Worst practice #5: ignore the users. DHS has a monumental mess on its hands. Different government departments need different data, just as the finance department has different needs than the marketing department. Yet the DHS systems are being centrally mandated by (mostly) intelligence people with no idea of how other groups like the police or the border patrol need information. Combine this with poorly managed data integration and no governance, and you have data being copied and misapplied all over the place.


A number of the articles I linked to came up via Bruce Schneier's crypto-gram mailing list. Always interesting reading, even if you aren't a security professional. Choice items from this month's issue:
"...tips on preventing terrorism" indeed. (Tip #7: When transporting nuclear wastes, always be sure to padlock your truck.)
YOU are big brother: Control and track your car from the 'net. Or, as Schneier says, have someone hack into the web site and control it for you.

AMEX is Watching You: AMEX has a patent application titled "Method and System for Facilitating a Shopping Experience," which:
"describes a Minority Report style blueprint for monitoring consumers through RFID-enabled objects, like the American Express Blue Card.

According to the patent, RFID readers called "consumer trackers" would be placed in store shelving to pick up "consumer identification signals" emitted by RFID-embedded objects carried by shoppers. These would be used to identify people, track their movements, and observe their behavior."
Breaches of personal data: blaming the myth and punishing the victim. Can we stop blaming hackers for theft of information already?
The report states that "60 percent of the incidents involve missing or stolen hardware, insider abuse or theft, administrative error, or accidentally exposing data online."
...
Given that its data suggests that a significant portion of the blame should go to those who hold the data, the report argues forcefully for legislation that requires they meet minimum data safety standards.
Windows Vista code-signing to keep out evil spyware? I don't think so: VBootkit bypasses Windows Vista's code-signing mechanisms. Microsoft spent a lot trying to secure Vista. So far, no dice.

Local Sheriff Suspects Al-Qaeda Or Teens
"This activity matches up with the M.O. of a terrorist casing a potential target," Steinhorst said. "It also matches the M.O. of a group of teens drinking beer and fooling around."
I don't read The Onion enough.



Comments:
A great post and well researched. I had started writing a similar post about the same system. Senator Ted Kennedy got stopped several times in 2004 and his staff had difficulty getting him removed from the list.

The Cat Stevens incident was ludicrous. He was on a flight to Washington in 2004 and they diverted the plane to Maine, took him off the flight, questioned him and sent him back to Britain. How did someone on the no-fly list get on a Washington flight? Why is he on the list? A bunch of Cuban musicians are also on the list, including a Grammy-nominated Buena Vista Social Club musician. Because they compiled the list from so many disparate sources, the people on it represent radically different levels of threat.

It gets worse in June when Canada starts its own no-fly list.

It looks like the Secure Flight system will finally be implemented in 2008 where the feds will screen passengers against the watch list instead of airlines.
 
Thanks for the note.

Secure Flight in 2008 is probably when we can expect a total meltdown with nobody boarding, given the track record so far. I didn't know Canada was following the path the US has taken. Hopefully they have better people working on it than DHS does.
 


Data warehousing, business intelligence, IT strategy and architecture, and occasional interesting bits.






Check out my book

Clickstream Data Warehousing (available from Amazon.com)

Popular Posts
Primate programming.
Why development in crunch mode doesn't work.
Enterprise data modeling sucks big rocks.
XP Exaggerated.
Ping-pong in the matrix.
Time management for anarchists.
Is Ab Initio worth evaluating?
Job posting: omniscient architect.
Why hiring more sales people won't grow revenues faster.
Some resources for Open Source CMS.

Reading List
Quicksilver
The Cruise of the Snark
Blue Latitudes
Everyone in Silico
The Klamath Knot
Swarm Intelligence (Bonabeau)
A three year backlog of F&SF

Listening List
Toots and the Maytals
The Buena Vista Social Club
American Idiot

Watching List
Winged Migration Quicktime trailer
Genghis Blues
Howl's Moving Castle
Hero
A Bronx Tale

Blogroll
Daily Kos
Due Diligence
Boing Boing
Kevin Kelly (Recomendo)
Not Geniuses
3 Quarks Daily
Futurismic
Fafblog
Kottke.org

Miscellany
War in Context
Salon.com
Valmiki's Ramayana
Choose the Blue
Third Nature
Mark Madsen
The Data Warehouse Institute
James Howard Kunstler
WorldChanging
/.
Clickstream Data Warehousing
Technorati Profile




This work is licensed under this Creative Commons License except where indicated.