Big Data + Hadoop

Enterprises are generating, collecting and storing data at an astounding rate, in large part because they’re collecting data from more sources than ever. Newer sources include social media streams; sensors of various types; and even call centers, which generate a seemingly never-ending stream of audio files. And, of course, enterprises still have all the traditional sorts of data they have long produced, of both a historical and transactional nature. It all adds up to big data.

The challenge is for enterprises to turn that big data into valuable information, either by mining it for useful nuggets or by analyzing it in new ways to answer questions and make predictions that were previously not possible. More and more, enterprises are finding that they can indeed extract value from big data by using a tool that makes the chore feasible: Hadoop, a platform for storing, processing and analyzing large volumes of unstructured data. Large volumes of data are exactly what organizations are dealing with. Oracle last year estimated that data was growing at a 40 percent compound annual rate and would reach 45 zettabytes (ZB) by 2020 [1]. One ZB is one thousand exabytes, or a billion terabytes.
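
To give a feel for what a 40 percent compound annual rate means, the short sketch below works through the arithmetic. Only the 40 percent rate and the 45 ZB figure come from the estimate quoted above; the function names are illustrative, and decimal (SI) units are assumed for the conversions.

```python
import math

ANNUAL_GROWTH = 0.40        # 40 percent compound annual rate (from the text)
TARGET_ZB = 45              # projected volume in zettabytes (from the text)

# Decimal (SI) unit conversions, assumed here
EB_PER_ZB = 1_000           # exabytes per zettabyte
TB_PER_ZB = 1_000_000_000   # terabytes per zettabyte


def doubling_time_years(rate):
    """Years for a quantity to double at a given compound annual rate."""
    return math.log(2) / math.log(1 + rate)


def volume_years_earlier(final_volume, rate, years):
    """Volume implied `years` before the final figure, at constant growth."""
    return final_volume / (1 + rate) ** years


doubling = doubling_time_years(ANNUAL_GROWTH)
target_eb = TARGET_ZB * EB_PER_ZB
target_tb = TARGET_ZB * TB_PER_ZB
one_year_before = volume_years_earlier(TARGET_ZB, ANNUAL_GROWTH, 1)

print(f"Doubling time at 40% growth: {doubling:.2f} years")
print(f"45 ZB = {target_eb:,} EB = {target_tb:,} TB")
print(f"Implied volume one year before the target: {one_year_before:.1f} ZB")
```

At this rate the total volume doubles roughly every two years, which is why storage approaches that were adequate a few years earlier quickly stop being so.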

When you consider the sources of all that data, it’s not hard to see how it can quickly add up. The sources include feeds from social media sites such as Facebook, which can be heavy on photographs; Twitter; and even YouTube videos. Increasingly, companies are also storing more audio from their call centers, in hopes of mining it for tidbits that can help them improve customer service, sales and operational efficiency. There’s also video from surveillance cameras. All of these sources produce unstructured data: data that doesn’t come neatly wrapped in rows and columns like that in a relational database. That makes the data far more difficult to analyze in meaningful ways. In fact, it was previously not economically feasible to collect and store all this unstructured data, never mind analyze it.

Hadoop is changing that equation. It is an open source framework that makes it possible to store huge volumes of data on many commodity computers, so there’s no need for expensive, massive data stores. What’s more, by dividing up processing chores into smaller chunks that can run simultaneously, Hadoop supports dramatically increased data analytics speeds (see “Hadoop FAQ”).
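
The idea of dividing a job into chunks that are processed simultaneously and then combined can be sketched in a few lines of Python. This is not Hadoop's API: it is a minimal, single-machine illustration of the map-and-reduce pattern, with a thread pool standing in for the cluster's worker nodes. The function names and the sample posts are hypothetical, and Python threads illustrate only the structure of the job, not the speedup a real cluster provides.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Divide the input into fixed-size chunks, one per worker task."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def count_words(records, chunk_size=2, workers=4):
    """Run the map step on all chunks concurrently, then reduce the results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data, standing in for a very large input
sample_posts = [
    "storm damage to my roof",
    "storm knocked a tree onto my car",
    "great service from my agent",
    "roof repair after the storm",
]

totals = count_words(sample_posts)
print(totals.most_common(3))
```

Because each chunk is processed without reference to any other, adding more workers (or, in Hadoop's case, more commodity machines) lets the same job handle more data in the same amount of time.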

Consider, for example, how an insurance company is using Hadoop to improve service in its telephone call center. The company takes all the audio files from its call center and uses Hadoop to analyze them in search of ways to route calls more efficiently, so callers more quickly reach an agent who can address their issues. The company is also analyzing social media sites in an effort to improve service. If there’s a storm in the Northeast, for example, it will search to find out whether any customers have posted about damage to their home or automobile. If the company finds a hit, it will proactively reach out to help those customers. Finding such proverbial needles in haystacks was simply not feasible or cost-effective prior to Hadoop.

Companies such as credit card providers are also finding Hadoop valuable in offering promotions to customers going about their daily routine. For example, a credit card company might see transactions coming in from stores where a customer is shopping at midday on a Saturday. The company could push out a promotion to the customer’s mobile device from nearby restaurants that accept the credit card. Such highly targeted, location-based, time-sensitive offers are relatively likely to meet with success, and may not even be feasible to deliver without Hadoop.