Hadoop: Moving Beyond the Big Data Hype
Let’s face it: there is a lot of hype surrounding Big Data and Hadoop, the de facto Big Data technology platform. Companies want to mine and act on massive data sets, or Big Data, to unlock insights that can help them improve operational efficiency, delight customers, and leapfrog their competition. Hadoop has become a popular place to store massive data sets because it can distribute them across inexpensive commodity servers. Hadoop is fundamentally a file system (HDFS, the Hadoop Distributed File System) paired with a specialized programming model (MapReduce) for processing the data in those files. So far, Big Data has not lived up to expectations, partly because of Hadoop’s limitations as a technology:
• A file system, not a database – Even with MapReduce, Hadoop does not provide easy access to individual records, or to record sets that are a small subset of the total data. It also lacks many of the capabilities a typical database provides for organizing data and ensuring its consistency.
• Designed for batch analysis – To find even one record, MapReduce must scan all records. It is therefore designed for large batch analyses that aggregate or process most, if not all, of a massive data set. As a result, Hadoop cannot support the interactive, ad-hoc queries that many applications require.
• No updates – HDFS is an immutable file system, so data cannot be updated or deleted in place. If you have 100 TB of data, all of it must be rewritten (for example, through a daily ETL process) whenever any of it changes. This can be a big issue for many companies: if there is any constant in business, it’s that everything changes, and both the data and the analysis need to stay up to date to remain relevant.
• Requires Java programs to manage data – Hadoop requires complex, specialized MapReduce programs, typically written in Java, to manage its data. However, Java programmers, let alone ones trained in MapReduce, are rare at most companies. This often becomes a huge bottleneck, because every data request from a data scientist or business analyst must go through a Java programmer.
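To make the batch-scan and no-update limitations more concrete, here is a toy, single-process Python sketch of the MapReduce model. It is not Hadoop’s Java API: `run_mapreduce`, the mapper and reducer functions, and the `(customer_id, amount)` record layout are hypothetical names chosen for illustration. It shows that aggregations fit the model naturally, that even a single-key lookup still visits every record, and that "updating" immutable data means producing a full new copy.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout for illustration: (customer_id, purchase_amount).
Record = Tuple[str, int]
Mapper = Callable[[Record], Iterator[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], int]


def run_mapreduce(records: Iterable[Record], mapper: Mapper,
                  reducer: Reducer) -> Tuple[Dict[str, int], int]:
    """Toy imitation of the MapReduce model (not Hadoop's API).

    Returns the reduced results and the number of input records scanned.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    scanned = 0
    for record in records:  # map phase: every input record is visited
        scanned += 1
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group emitted values by key
    # Reduce phase: combine each key's list of values into one result.
    results = {key: reducer(key, values) for key, values in groups.items()}
    return results, scanned


def per_customer(record: Record) -> Iterator[Tuple[str, int]]:
    """Mapper for an aggregation: emit (customer_id, amount) for every record."""
    customer_id, amount = record
    yield customer_id, amount


def lookup(target_id: str) -> Mapper:
    """Build a mapper that emits only the target customer's records."""
    def mapper(record: Record) -> Iterator[Tuple[str, int]]:
        if record[0] == target_id:
            yield record
    return mapper


def sum_values(key: str, values: List[int]) -> int:
    """Reducer: total the values emitted for one key."""
    return sum(values)


def rewrite_with_update(records: List[Record], customer_id: str,
                        new_amount: int) -> List[Record]:
    """Immutable-style 'update': copy every record to change one customer's data."""
    return [(cid, new_amount if cid == customer_id else amt) for cid, amt in records]


data = [("c1", 10), ("c2", 5), ("c1", 7), ("c3", 2)]
totals, scanned_all = run_mapreduce(data, per_customer, sum_values)
print(totals, scanned_all)  # {'c1': 17, 'c2': 5, 'c3': 2} 4
one, scanned_one = run_mapreduce(data, lookup("c2"), sum_values)
print(one, scanned_one)  # {'c2': 5} 4  (all records read to find one customer)
updated = rewrite_with_update(data, "c3", 9)
print(updated)  # [('c1', 10), ('c2', 5), ('c1', 7), ('c3', 9)]
```

In real Hadoop the map tasks run in parallel across the cluster and the shuffle moves intermediate data over the network, but the shape of the computation is the same: the lookup costs as much scanning as the full aggregation.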
As a result of these limitations, Hadoop has become the “roach motel” of data for many companies – easy to get data in, hard to get it out.