In this video from the SpringOne 2012 event, Costin Leau presents: Building Big Data Pipelines with OSS.
Hadoop is not an island. To deliver a complete Big Data solution, a data pipeline needs to be developed that incorporates and orchestrates many diverse technologies. A Hadoop focused data pipeline not only needs to coordinate the running of multiple Hadoop jobs (MapReduce, Hive, Pig or Cascading), but also encompass real-time data acquisition and the analysis of reduced data sets extracted into relational/NoSQL databases or dedicated analytical engines.