How YARN Opens Doors to Easier Programming Tools for Hadoop 2.0 Users


by John Lilley | February 19, 2014

The emergence of YARN for the Hadoop 2.0 platform has opened the door to new tools and applications that promise to let more companies reap the benefits of big data in ways that were not possible before. By separating cluster resource management from data processing, YARN offers a world beyond MapReduce: one that is less encumbered by complex programming protocols, faster, and lower in cost.

Yet while many Hadoop applications have migrated to YARN and other migrations are in progress, most of these applications still cling to the original Hadoop paradigm: MapReduce. That’s like putting lipstick on a pig (no pun intended). These programs basically dress up the same functionality without taking advantage of the new capabilities of YARN.

Why is YARN important? Some background may help.

Hadoop was first developed in 2005 by Doug Cutting and Mike Cafarella with the help and blessing of Yahoo, which to this day runs the largest Hadoop cluster in the world. Hadoop was open-sourced under the auspices of Apache, and major contributors include Hortonworks, Yahoo, Cloudera, and many others. Throughout Hadoop’s development, until the release of Hadoop 2.0 in October 2013, MapReduce was the computational framework. If you wanted to crunch data under Hadoop, you wrote or generated MapReduce code. Hadoop 2.0 changed that.

Under Hadoop 2.0, MapReduce is but one instance of a YARN application, and YARN has taken center stage as the “operating system” of Hadoop. Because YARN allows any application to run on equal footing with MapReduce, it opened the floodgates for a new generation of software applications with these kinds of features:

More programming models. Because YARN supports any application that can divide itself into parallel tasks, applications are no longer shoehorned into the palette of “mappers,” “combiners,” and “reducers.” This in turn supports complex data-flow applications like ETL and ELT, and iterative programs like massively parallel machine learning and modeling.

Integration of native libraries. Because YARN has robust support for any executable (not limited to MapReduce, and not even limited to Java), application vendors with a large, mature code base have a clear path to Hadoop integration.

Support for large reference data. YARN automatically “localizes” and caches large reference datasets, making them available to all nodes for “data local” processing. This supports legacy functions like address standardization, which require large reference datasets that the legacy libraries cannot access from the Hadoop Distributed File System (HDFS).

Despite these innovations, most Hadoop software developers are stuck in the Hadoop 1.0 mindset. In exchange for early market entry, they have passed up a “bigger leap” toward broader availability and greater usability of Hadoop 2.0’s powerful resources. The effect for users: Hadoop still has a tall fence around it. Most Hadoop applications still suffer from one or more of these deficiencies:
