- Date published:
- Author:Brian Wood
Truly, the summary and the original articles say it all.
Summary by Pam Baker in FierceBigData, original by Sharon Fisher in IT Knowledge Exchange.
Emphasis in red added by me.
Brian Wood, VP Marketing
Experts calculate storage capacity of an NSA data center
After all the leaks that spell out just how much data the NSA is collecting from, well, everywhere, it’s natural to wonder how much storage it would take to hold all that information. Even if such huge data loads are stored for only a short time, the need for storage would presumably be astronomical.
Sharon Fisher wrote a compelling post at IT Knowledge Exchange on the subject and included the estimations and thoughts of leading experts. It’s a compelling read with lots of information to ponder on the subject. Beyond satisfying general curiosity on the matter, her post will likely help many IT leaders to put into perspective their own storage needs for their big data projects.
Especially since the notion that “size doesn’t matter” was put forth “since the NSA is likely to be using state-of-the-art deduplication and compression technologies to reduce the amount of data stored. The capacity for storing data is not nearly as important as being able to process data and derive valuable information from it.”
That becomes a doubly interesting thought to big data users, but also to the public at large. The NSA recently responded to a Freedom of Information Act request by saying it didn’t have the capability to search its own employees’ email in bulk. That certainly casts doubt on how efficient the NSA will be in other projects, including analytics on its massive collection of data.
What’s the Storage Capacity of an NSA Data Center?
You know how people periodically like to figure out the bandwidth of a station wagon loaded with storage media? Now we have a new one: How much storage will the NSA data center in Utah actually have?
“Much has been written about just how much data that facility might hold, with estimates ranging from ‘yottabytes’ (inWired) to ‘5 zettabytes’ (on NPR), a.k.a. words that you probably can’t pronounce that translate to ‘a lot,’” writes Kashmir Hill in Forbes. “For some sense of scale, you would need just 400 terabytes to hold all of the books ever written in any language.”
However, Hill obtained what she said were actual blueprints for the data center that belied such figures.
“Within those data halls, an area in the middle of the room – marked ‘MR – machine room/data center’ on the blueprints – is the juicy center of the information Tootsie pop, where the digital dirt will reside. It’s surrounded by cooling and power equipment, which take up a goodly part of the floor space, leaving just over 25,000 square feet per building for data storage, or 100,000 square feet for all four buildings, which is the equivalent of a Wal-Mart superstore.”
Hill went to Brewster Kahle, who invented the precursor of the World Wide Web called WAIS, and who went on to found the Internet Archive.
“Kahle estimates that a space of that size could hold 10,000 racks of servers (assuming each rack takes up 10 square feet). ’One of these racks cost about $100,000,’ says Kahle. ‘So we are talking $1 billion in machines.’
Kahle estimates each rack would be capable of storing 1.2 petabytes of data. Kahle says that voice recordings of all the phone calls made in the U.S. in a year would take up about 272 petabytes, or just over 200 of those 10,000 racks.
If Kahle’s estimations and assumptions are correct, the facility could hold up to 12,000 petabytes, or 12 exabytes – which is a lot of information(!) – but is not of the scale previously reported. Previous estimates would allow the data center to easily hold hypothetical 24-hour video and audio recordings of every person in the United States for a full year. “
Other experts, such as Paul Vixie, had even lower numbers. “Assuming larger 13 square feet racks would be used, factoring in space between the racks, and assuming a lower amount of data storage per rack, he came up with an estimate of less than 3 exabytes of data capacity for the facility,” Forbes writes.
Hill isn’t the only one who’s been thinking about the storage capacity of that Utah data center.
“To put this into perspective, a yottabyte would require about a trillion 1TB hard drives and data centers the size of both Rhode Island and Delaware,” writes security consultant Mark Burnett. “Further, a trillion hard drives is more than a thousand times the number of hard drives produced each year. In other words, at current manufacturing rates it would take more than a thousand years to produce that many drives. Not to mention that the price of buying those hard drives would cost up to 80 trillion dollars–greater than the GDP of all countries on Earth.”
Even looking at a zettabyte, or .1 percent of a yottabyte, is unrealistic, Burnett continues. “Let’s assume that if you buy 250 million hard cheap consumer-grade drives you get a discount, so they get them at $150 each which would come to a $37.5 billion for the bare hard drives alone (well, and a billion tiny screws).”
That might sound familiar. You may recall that Backblaze powers its backup service (disclaimer: I use it) with commodity drives in that way. You may also recall that it occasionally has a hell of a time finding enough drives.
As it turns out, Backblaze has also examined the NSA claims — and it did so back in 2009:
“The cost per GB has dropped consistently 4% per month for the last 30 years. Assume the trend continues for the next 5 years, by when the NSA needs their yottabyte of storage. The costs in 2015 then would be:
* $8 trillion for the raw drives
*$80 trillion for a storage system
Well, that’s getting closer – a bit less than today’s global GDP.
Per historical metrics, a drive should hold 10 TB by 2015. The NSA would require:
* 100 billion hard drives
* 2 billion Backblaze storage pods
And of course, they would probably want this data backed up. That might really test our offer of $5 for unlimited storage.”
Backblaze isn’t the only vendor doing back-of-the-envelope calculations (perhaps practicing for an RFP?) NetApp technologist Larry Freeman is as well:
“Assuming that 40% of the 25,000 sq ft floor space in each of the 4 data halls would be used to house storage, 2,500 storage racks could be housed on a single floor (with accommodations for front and rear service areas). Each rack could contain about 450 high capacity 4TB HDDs which would mean that 1,125,000 disk drives could be housed on a single data center floor, with 4.5 Exabytes of raw storage capacity.”
And that’s not even getting into the power consumption aspect. The Utah data center is reportedly slated to use up to 65 megawatts of power, or as much as the entire city of Salt Lake itself. Forbes quoted Kahle’s estimate of $70 million a year for 70 megawatts, while Wired reportedly estimated $40 million a year for 65 megawatts. (And recall that Utah passed a law earlier this year that would enable it to add a new 6% tax to the power used, which could tack on up to $2.4 million annually on to $40 million.)
Burnett’s power calculation is even higher. “250 million hard drives would require 6.25 gigawatts of power (great Scott!). Of course, drives need servers and servers need switches and routers; they’re going to need a dedicated nuclear power plant. They’re going to need some fans too, 4.25 billion btu definitely would be uncomfortable.” Of course, there are other options, he notes. “Another option that would use much less electricity and far less space would be 128 GB microSDXC cards. Except that you would need 9,444,732,965,739,290 of them. At $150 each.”
Freeman’s power calculation is high as well.
“HOWEVER, each storage rack consumes about 5 Kilowatts of power, meaning the storage equipment alone would require 12.5 Megawatts. On the other hand, servers consume much more power per rack. Up to 35 Kilowatts. Assuming an equivalent number of server racks (2,500), servers would eat up 87.5 Megawatts, for a total of 100 Megawatts. Also, cooling this equipment would require another 100 Megawatts of power, making the 65 Megawatt power substation severely underpowered — and so far we’ve only populated a single floor. Think that the NSA can simply replace all those HDDs with Flash SSDs to save power? Think again, an 800GB SSD (3 watts) actually consumes more power per GB than a 4TB HDD (7.8 watts).
Something I haven’t seen anyone address is what buying that much storage would do to the revenues of the lucky hardware vendor — or vendors. How in the world would Seagate, or any of the component vendors, be able to keep a purchase of that size secret?
Moreover, with many hard drive component manufacturers located outside the U.S., and with there already being concern that computer components might have malware baked in, how would the NSA guarantee the integrity of non-U.S. components? (For that matter, with so many NSA whistleblowers wandering around, could it trust the integrity of U.S.-built components?)
Meanwhile, Datacenter Dynamics notes that, in this case, “size doesn’t matter,” particularly since the NSA is likely to be using state-of-the-art deduplication and compression technologies to reduce the amount of data stored. “The capacity for storing data is not nearly as important as being able to process data and derive valuable information from it,” writes Yevgeniy Sverdlik. “Making sense out of data is a lot harder than storing it, so the NSA’s compute capacity, in terms of processor cores, and the analytics methods its data-miners use are much more interesting questions.”
Incidentally, the NSA recently responded to a Freedom of Information Act request by saying it didn’t have the capability to search its own employees’ email in bulk.