US Terrorism Data: Worst Practices for Data Quality and Governance
The Department of Homeland Security is a perfect counter-example of how to deal with data. One way to learn good practices is to do the opposite of what the government does. Here's a list of DHS worst practices to avoid.
Worst practice #1: load all the data you can lay your hands on. "Terror Database Has Quadrupled In Four Years" discusses the "Terrorist Identities Datamart Environment" or TIDE, the source of data for airline, police, border and consulate watch lists. Their policy is to shovel everything possible in there, needed or not, because it might be useful some day. This runs counter to years of data warehouse practice.
"Each day, waves of new information are fed into terrorism-suspect databases... Ballooning from fewer than 100,000 files in 2003 to about 435,000, the growing database threatens to overwhelm the people who manage it."
Aside from this problem, do we really believe there are 435,000 suspected terrorists in the US? That's roughly three times the size of our military in Iraq. What are they waiting for? They could take over the US now. Obviously something is wrong with all that data.
Worst practice #2: don't do anything about data quality. "The single biggest worry that I have is long-term quality control," said Russ Travers, in charge of TIDE at the National Counterterrorism Center in McLean. Yet he's doing nothing to QA the data before dumping it into the system. To make things worse, there's no way to remove or correct data once it's loaded.
"The bar for inclusion is low, and once someone is on the list, it is virtually impossible to get off it."
TSA can't tell the difference between a '70s musician and a senator's wife? That's a serious data quality problem. Given the 50% misidentification rate, the police might as well toss a coin each time they arrest someone instead of consulting the watch lists.
"In 2004 and 2005, mis-identifications accounted for about half of the tens of thousands of times a traveler's name triggered a watch-list hit."
"Sen. Ted Stevens (R-Alaska) said last year that his wife had been delayed repeatedly while airlines queried whether Catherine Stevens was the watch-listed Cat Stevens. The listing referred to the Britain-based pop singer who converted to Islam and changed his name to Yusuf Islam. The reason Islam is not allowed to fly to the United States is secret."
The data management practices go one step further. They actually have a process to include names on the list that are no longer valid. Their idea is that if someone is dead (for example), then a terrorist might use that name since it won't be on the list. Perhaps all the dead people account for the growth of the database. Why not go one step further and put the names of everyone who died this year onto the list?
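For contrast, here is a minimal sketch of what the opposite of worst practice #2 could look like: a quality gate in front of the load, plus a way to take a bad record back out. Everything in it is an assumption made for illustration. The `WatchRecord` fields, the three checks and the in-memory `warehouse` dictionary are hypothetical and have nothing to do with how TIDE is actually built.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WatchRecord:
    # Hypothetical fields for illustration; not the real TIDE schema.
    name: str
    date_of_birth: Optional[str]  # ISO date string, e.g. "1900-01-01"
    source: Optional[str]         # who supplied the record


Key = Tuple[str, Optional[str]]


def quality_problems(record: WatchRecord) -> List[str]:
    """Return the reasons to hold a record back; an empty list means it passes."""
    problems = []
    if not record.name.strip():
        problems.append("missing name")
    if not record.date_of_birth:
        problems.append("missing date of birth")
    if not record.source:
        problems.append("missing source")
    return problems


def load(records: List[WatchRecord],
         warehouse: Dict[Key, WatchRecord],
         rejects: List[Tuple[WatchRecord, List[str]]]) -> None:
    """Load records that pass the gate; park the rest for human review."""
    for record in records:
        problems = quality_problems(record)
        if problems:
            rejects.append((record, problems))
        else:
            key = (record.name.strip().lower(), record.date_of_birth)
            warehouse[key] = record


def remove(warehouse: Dict[Key, WatchRecord], name: str, date_of_birth: str) -> bool:
    """Correction path: delete a record that turned out to be wrong."""
    return warehouse.pop((name.strip().lower(), date_of_birth), None) is not None


if __name__ == "__main__":
    warehouse: Dict[Key, WatchRecord] = {}
    rejects: List[Tuple[WatchRecord, List[str]]] = []
    load([WatchRecord("J. Doe", "1900-01-01", "example agency"),
          WatchRecord("Name Only", None, None)], warehouse, rejects)
    print(len(warehouse), "loaded;", len(rejects), "held for review")
    print("removed:", remove(warehouse, "J. Doe", "1900-01-01"))
```

The point is not these particular checks. Rejected records go somewhere a person can look at them, and a record that turns out to be wrong can be taken out again.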
Worst practice #3: ignore the problems of your users.
"TIDE is a vacuum cleaner for both proven and unproven information, and its managers disclaim responsibility for how other agencies use the data. 'What's the alternative?' Travers said."
Multiple agencies are complaining about man-hours wasted on mis-identification. Airlines are routinely stopping people in airports, like Ted Stevens's wife, even though the data is obviously wrong. God forbid that we should have any data management processes in place. Travers should probably get out to a data warehousing conference once in a while, particularly since he said earlier that data quality is a problem. His current alternative of doing nothing isn't feasible.
But the problems don't stop with TIDE...
Worst practice #4: if the users aren't sold on the concept, build it and they will come.
Remember TIA (Total Information Awareness)? Congress killed the program, so the people involved did it for the government of Singapore instead. "Son of TIA: Pentagon Surveillance System Is Reborn in Asia" tells how Snowden and company built a system for tracking people in a totalitarian state. Now they want to sell the system they developed back to the US government. It's like doing an IT-driven data warehouse project, only with Orwellian overtones.
Worst practice #5: ignore the users. DHS has a monumental mess on their hands. Different government departments need different data, just like the finance department has different needs than the marketing department. Yet the DHS systems are being centrally mandated by (mostly) intelligence people with no idea of how other groups like the police or the border patrol need information. Combine this with poorly managed data integration and no governance and you have data being copied and misapplied all over the place.
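One way to picture the alternative is to have each consuming group declare the fields it needs and give it a view with only those fields, instead of a copy of everything. The sketch below is purely illustrative. The agency names, the field lists and the sample record are all made up, and nothing here reflects how DHS systems are organized.

```python
from typing import Dict, List

# Hypothetical per-consumer requirements, defined with each group rather than for it.
AGENCY_FIELDS: Dict[str, List[str]] = {
    "airline": ["name", "date_of_birth", "action"],
    "police": ["name", "date_of_birth", "action", "contact_office"],
    "border": ["name", "date_of_birth", "passport_number", "action"],
}


def view_for(agency: str, record: Dict[str, str]) -> Dict[str, str]:
    """Return only the fields this consumer has asked for."""
    if agency not in AGENCY_FIELDS:
        raise KeyError(f"no data requirements on file for {agency!r}")
    return {field: record[field] for field in AGENCY_FIELDS[agency]}


if __name__ == "__main__":
    # Made-up sample record.
    record = {
        "name": "J. Doe",
        "date_of_birth": "1900-01-01",
        "passport_number": "X0000000",
        "action": "secondary screening",
        "contact_office": "example field office",
        "analyst_notes": "unverified tip",
    }
    print(view_for("airline", record))
```

A consumer with no requirements on file gets an error rather than a dump of the whole record, which is roughly the governance step the central mandate skips.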
A number of the articles I linked to came up via Bruce Schneier's Crypto-Gram mailing list. Always interesting reading, even if you aren't a security professional. Choice items from this month's issue:
"...tips on preventing terrorism" indeed. (Tip #7: When transporting nuclear wastes, always be sure to padlock your truck.)
YOU are big brother: Control and track your car from the 'net
Or, as Schneier says, have someone hack into their web site and control it for you.
AMEX is Watching You
AMEX has a patent application titled "Method and System for Facilitating a Shopping Experience," that:
"describes a Minority Report style blueprint for monitoring consumers through RFID-enabled objects, like the American Express Blue Card. According to the patent, RFID readers called "consumer trackers" would be placed in store shelving to pick up "consumer identification signals" emitted by RFID-embedded objects carried by shoppers. These would be used to identify people, track their movements, and observe their behavior."
Breaches of personal data: blaming the myth and punishing the victim
Can we stop blaming hackers for theft of information already?
The report states that "60 percent of the incidents involve missing or stolen hardware, insider abuse or theft, administrative error, or accidentally exposing data online."
Given that its data suggests that a significant portion of the blame should go to those who hold the data, the report argues forcefully for legislation that requires they meet minimum data safety standards.
Windows Vista code-signing to keep out evil spyware? I don't think so: VBootkit bypasses Windows Vista's code-signing mechanisms
Microsoft spent a lot trying to secure Vista. So far, no dice.
Local Sheriff Suspects Al-Qaeda Or Teens
"This activity matches up with the M.O. of a terrorist casing a potential target," Steinhorst said. "It also matches the M.O. of a group of teens drinking beer and fooling around."
I don't read The Onion enough.
Labels: data quality, data warehouse, DHS, TSA
Posted by Mark, Saturday, May 26, 2007 1:52:00 PM