Last year, with the presidential campaign in full swing, Gov. Mitt Romney's campaign went looking for donors in an unusual place: a data center just off of Interstate 35W in north Fort Worth. From that data center, owned by Buxton, a Fort Worth firm that specializes in mining consumer data for clients ranging from Wal-Mart to FedEx to TCU, analysts were able to identify untapped potential campaign donors from across the United States. CEO Tom Buxton said they culled those donors from a massive internal database that has information on more than 120 million households, which have up to 75,000 data points on everything from brand loyalties to hobbies to media preferences.
"Probably 17 or 18 years ago, we already had accumulated more consumer data than the Library of Congress has in all of its wealth of data," Buxton said.
Buxton is one of several Tarrant County organizations, both public and private, that are creating value from the massive amounts of digital data we generate as part of our daily routines.
Almost everything we do throughout a day creates some sort of a digital trail. Quite a bit of that information is bought and sold on the open market. And increasingly, the data that isn't sold is often being analyzed internally by the company that owns it.
Companies like Buxton can purchase data that will tell them everything from the magazines you read to the restaurants you eat at to your favorite soft drink. Buxton has been buying such data and using it to answer clients' questions for years. In fact, the company is considered a pioneer in mining that data for insights.
"Data has been around forever. It's the use of data that people are trying to figure out," Buxton said. "Data is of no value unless you know how to understand it and make it give you an answer."
And Buxton says his company delivers that value, whether it's telling a retailer where to place a store or identifying donors for a political campaign.
The company's data is so precise that analysts were once able to tell Wal-Mart in which of its thousands of stores to sell a particular purple fishing lure, based on how likely shoppers at those stores were to buy purple things. And they were right, according to Buxton.
Buxton is now being joined by everyone from city governments to health care providers to law enforcement. Sharp drops in the cost of digital storage over the years have made it easier to store data, and increased computer processing power has made it easier and cheaper to process it.
Buxton, for example, said that 20 years ago he would have needed more than 1,000 employees to do what his 120 employees do today.
Big Data
To illustrate the size of these massive data sets, one only has to look at Buxton's data warehouse, which currently houses data on about 1 billion customer transactions. Buxton says that data is taking up about 90 terabytes of server space.
That 90 terabytes, or 90,000,000,000,000 bytes of data, can store enough MP3 files to keep you listening for 180 years. Don't even try to print that data. You're easily looking at 20 million pages - double-sided.
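The listening-time comparison can be checked with some quick arithmetic. The sketch below is a back-of-the-envelope estimate, not Buxton's own math: it assumes MP3 audio encoded at 128 kilobits per second and decimal terabytes, neither of which the article specifies, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the 90-terabyte comparison.
# Assumptions (not from the article): MP3 audio at 128 kilobits per second,
# decimal units (1 TB = 10**12 bytes), and a 365.25-day year.
TERABYTE = 10**12
WAREHOUSE_BYTES = 90 * TERABYTE
MP3_BITRATE_BPS = 128_000            # bits per second (assumed)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def listening_years(total_bytes, bitrate_bps=MP3_BITRATE_BPS):
    """Return how many years of continuous audio fit in total_bytes."""
    bytes_per_second = bitrate_bps / 8
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


print(f"{listening_years(WAREHOUSE_BYTES):.0f} years of nonstop listening")
```

Under those assumptions the result comes out to roughly 178 years, in line with the 180-year figure. A higher bitrate would shrink the number proportionally.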
Data like Buxton's is the kind that's often referred to as "big data": data so unwieldy and hard to manage that it's in a class of its own. It's way too big to simply drop into a desktop spreadsheet program and draw out some sort of business insight.
It's data that holds considerable value. Romney Victory, Inc., the campaign's joint fundraising committee, paid Buxton $276,500 for its services, according to Federal Election Commission records. And Buxton is constantly expanding into new market sectors, leveraging its data to answer questions that are simply unanswerable without big data.
Buxton's data is purchased from outsiders. The company draws on more than 250 outside sources and, according to CEO Buxton, buys every piece of quality data it can get its hands on.
Such analysis is only a small part of the big data economy that is emerging. Other firms, such as Fort Worth-based Digital Recognition Network, are investing in data collection.
DRN has partners across the country equipped with car-mounted, license plate-reading cameras. Those cameras are scanning about 50 million plates a month. And the scans, along with the location where the plate was scanned, are being stored in DRN's database.
They've amassed more than 1 billion scans in the last four years, according to CEO Chris Metaxas.
Those scans are used to locate vehicles for repossession and to help law enforcement track down suspects.
He said DRN is moving into a new building soon and is expected to grow 25 percent this year. And he doesn't see that growth slowing as the company tries to find ways to use its data in industries outside of banking and auto finance.
Some day, for example, the company may be able to tell a mall what neighborhood its customers come from. Or it may be able to tell a bank that a customer who is applying for a loan may not really live where they say they do.
Inside Track
Buxton and DRN have one thing in common: They're big data businesses at their core. Other Tarrant County organizations, however, have integrated data analysis into their operations.
Jeff Abee, president of the DFW Data Management Association, which has about 500 members, says most companies have been recording data for years and sitting on it. What's new is that they're beginning to realize that there's significant value in that data if it can be analyzed properly.
That was the case at the Baylor Health Care System, where administrators realized that administrative records that had been rigorously coded over the years for billing purposes could be analyzed along with electronic health records and other internal data sets.
That's not atypical among health care providers. In fact, a study by the McKinsey Global Institute in 2011 argued that U.S. healthcare providers could use big data to create more than $300 billion in value each year, two-thirds of which would come from cutting health care spending by about 8 percent.
At Baylor, the hospital's analysts use algorithms derived from IBM's Watson, the supercomputer that famously won Jeopardy in 2011, according to Dr. Donald Kennerly, the vice president of patient safety and chief patient safety officer at Baylor.
Baylor reorganized groups from across the system that stored and dealt with data into a single group to better equip them to make the most of it.
"I used to think my intuition was pretty good, but increasingly I'm humbled by the data suggesting that the things I thought were more important turned out to be relevant but much less important than something else," Kennerly said. "That speaks to the power of data."
Kennerly said that clinicians are more likely to listen to advice that's drawn from hard data, which makes it an effective tool in improving the hospital's practices.
"It's like the mirror," he said. "The mirror helps you know how you look, and it's key having a mirror that's flat and doesn't distort because you want to see what other people are seeing."
For Kennerly, the data is that mirror because it removes any preconceived notions or biases and approaches questions from a purely objective place.
At Tarrant County Public Health, experts are using big data analytics for biosurveillance - processing close to 100,000 data components a day from emergency rooms and ambulance services in 49 Texas counties, according to Dave Heinbaugh, the surveillance systems manager for Tarrant County Public Health.
By processing those millions of emergency room visits a year, health officials can detect and assess disease outbreaks across the region and work to contain them, if necessary. The data is shared among area health departments and epidemiologists. It's also pushed to the Centers for Disease Control and Prevention.
Heinbaugh said the regional outbreak of swine flu in 2009 was an ideal use case for the technology, which allowed health officials to track cases across the region. But he said the tools they've developed would also be ideal for spotting a bioterrorism attack, which is one of the reasons the CDC originally funded the technology.
Private or Public
All of this data sitting on all of those servers has some worried.
Most recently, the Federal Trade Commission in December launched an inquiry into the data collection policies of nine of the country's biggest data brokers, the types of companies that sell to the Buxtons of the world.
Eight members of Congress, including Rep. Joe Barton of Arlington, also began looking into data brokers in July.
There have been calls to allow consumers to examine the data that brokers have on file, but that's often seen as futile because the data is already living in so many places and owned by so many people.
Buxton said he isn't worried about regulation ever cutting into his bottom line. One of his reasons: Politicians won't pass legislation that keeps them from getting elected. And now that campaigns are using the same data-mining techniques as for-profit companies, there's a shared interest in keeping the consumer data flowing.
"Without the money, they can't get elected," Buxton said.
He also points out that Facebook, Twitter and many other everyday Web tools are collecting the same types of data and using it every day.
He is aware, though, that what his company does isn't always viewed in the most positive light. When the work the company did for the Romney campaign was reported by the national media, Buxton said he was characterized as an "evil genius."
DRN's Metaxas is similarly aware that allegations of privacy violations may be around the corner. But he points out that the law is firmly on DRN's side, despite an attempt in California last summer to regulate the use of license-plate scanners by both private companies and law enforcement.
To Metaxas, what his company does is no different than someone walking around town with a notepad, logging license plates. Technology has just made it more efficient, he says.
That's a common theme throughout the privacy debate awakened by the big data economy. Not only is technology an enabler of the field, it has accelerated it to the point that its sheer efficiency is seen as a threat by some groups. The license-plate scanners are the perfect example of that.
"We meet standards that banks have to meet by maintaining and keeping data," Metaxas said. And for every query run against its database, he said requestors must prove they have a legal right to view the data.
Every organization interviewed for this article outlined an exhaustive list of safeguards and security practices in place to secure the data.
At Baylor, for example, every record has an extensive audit trail. Tarrant County Public Health only ingests anonymized data. And DRN doesn't tie license plate scans to owner information.
Even so, organizations like the American Civil Liberties Union, the Federal Trade Commission and even, recently, the Obama administration are concerned about the issue.
As for now, though, the data continues to flow. As one analysis recently put it, storing such information about one's everyday activities is now the norm rather than the exception.
A recent IBM study estimated that 2.5 quintillion bytes of data are created every day and that creation is accelerating. It has accelerated so much in past years that the same study indicated that 90 percent of the world's data has been created in the last two years.
Each day's 2.5 quintillion bytes would be enough to fill Buxton's 90-terabyte data warehouse more than 25,000 times, which means there's plenty of untapped data still out there. And that's why much of the big data revolution is still ahead.
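The comparison between IBM's daily estimate and Buxton's 90-terabyte warehouse can also be worked out directly. This is a minimal sketch using only the figures quoted in the article and assuming decimal units; the function name is illustrative.

```python
# Rough check: how many 90-terabyte warehouses would one day's data fill?
# Assumptions: decimal units (1 TB = 10**12 bytes); both figures are the
# estimates quoted in the article, not measured values.
DAILY_BYTES_CREATED = 25 * 10**17    # 2.5 quintillion bytes per day (IBM estimate)
WAREHOUSE_BYTES = 90 * 10**12        # Buxton's 90-terabyte warehouse


def warehouse_fills(total_bytes, warehouse_bytes=WAREHOUSE_BYTES):
    """Return how many whole warehouses of the given size total_bytes would fill."""
    return total_bytes // warehouse_bytes


print(f"{warehouse_fills(DAILY_BYTES_CREATED):,} warehouses per day")
```

With these inputs the figure comes to 27,777 warehouses a day, so "25,000" reads as a conservative round number.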
by Andrew Chavez • Photography by Jason Kindig