In this special guest feature, Tom Phelan of BlueData shares his thoughts about extending the agile DevOps model and private cloud infrastructure to Big Data. Tom Phelan is Chief Architect and co-founder of BlueData, a company that makes Big Data accessible to enterprises of all sizes by taking the complexity out of data infrastructure and pioneering Big Data private clouds. Tom has spent the last 25 years as a senior architect, developer and team lead in the computer software industry in Silicon Valley.
Today’s enterprise IT teams, and the developers in their ranks, like to fail fast. They don’t have the luxury of spending months developing and testing a new application only to find out that it doesn’t, or soon won’t, meet the needs of the business. That’s something they need to find out ASAP, and it requires agility. Big Data applications are no exception. To be successful, organizations need to extend the agile, DevOps model to Big Data and allow data scientists and the developers they work with to get to the answers they need as quickly as possible. For many organizations, a private cloud infrastructure can be the answer.
An agile environment is one that’s adaptive and promotes evolutionary development and continuous improvement. It fosters flexibility and champions fast failures. Perhaps most importantly, it helps software development teams build and deliver optimal solutions as rapidly as possible. That’s because in today’s competitive market, full of tech-savvy customers who expect new apps and updates every day and who have copious amounts of data to work with, IT teams can no longer respond to requests with months-long development cycles. It doesn’t matter whether the request comes from a product manager mapping the next release’s upgrades or a data scientist asking for a new analytics model.
Agile development is closely related to DevOps, the evolving integration between the developers who build and test IT capabilities and the organizations responsible for deploying and maintaining IT operations. A relatively new concept, DevOps can help any organization dramatically speed up application development and delivery cycles. It focuses on communication and collaboration between developers and IT operations.
Typically, to build an enterprise-grade application, multiple teams work independently on the components of the application. When all the individual building and testing is done, the pieces are combined and tested together. There are usually issues, so the pieces go back to the teams for rework and more testing, and that can happen multiple times. Finally the application is handed off to the operations team to stage and deploy into production, a process that can take months.
That’s simply not tenable anymore, especially with IT’s mandate for faster delivery, faster response, and strategic outcomes, all as efficiently as possible.
Now look at how organizations doing Big Data on-premises typically work. The IT department makes a guess about the capacity, that is, the volume and velocity of Big Data their organization will require over the coming year or so. They requisition, purchase, deploy and configure the physical servers. They select a set of Big Data analytics tools, perhaps with input from the data scientists, and install them along with the software infrastructure required to run them. They then turn the system over to the data scientists. At that point, either the system sits underutilized, wasting company money because IT assumed more demand on the system than actually materialized, or, more likely, the data scientists saturate the existing hardware and immediately demand the latest analytics packages. That requires the IT department to buy more hardware, software, and so on, and the cycle continues ad infinitum.
Some enterprises are considering moving to public clouds to get the agility they need. However, public clouds come with their own issues, including security and compliance concerns. The ideal would be to get the agility of the cloud, with its consumer-like consumption model, combined with the control and security of an on-premises deployment.
A private cloud is an ideal infrastructure for supporting agile Big Data environments and for extending a DevOps model to Big Data. After all, a cloud is scalable and elastic and offers shared, virtualized and physical infrastructure. With a private cloud, IT organizations can rapidly offer data scientists and developers individual clusters so they can do their development, tear the clusters down and then build new ones to continue development. Collaboration—a fundamental tenet of DevOps—can be readily enabled because cloud services are easily shared and can scale up or down as needed.
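To make that concrete, here is a minimal sketch of what on-demand cluster provisioning can look like from the data science side, written against a hypothetical private-cloud REST API; the endpoint, payload fields and cluster image name are illustrative assumptions, not any particular vendor’s interface.

```python
# Minimal sketch of on-demand cluster provisioning against a hypothetical
# private-cloud REST API. Endpoint, token, payload fields and image names
# are illustrative assumptions, not a specific product's interface.
import requests

API = "https://privatecloud.example.com/api/v1"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}      # placeholder credentials


def create_cluster(name, node_count, image):
    """Request a new analytics cluster and return its ID."""
    spec = {"name": name, "nodes": node_count, "image": image}
    resp = requests.post(f"{API}/clusters", json=spec, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["cluster_id"]


def delete_cluster(cluster_id):
    """Tear the cluster down once the experiment is finished."""
    resp = requests.delete(f"{API}/clusters/{cluster_id}", headers=HEADERS)
    resp.raise_for_status()


if __name__ == "__main__":
    # Spin up a short-lived cluster for an experiment, then release it.
    cluster = create_cluster("churn-model-dev", node_count=4, image="spark-3.5")
    try:
        print(f"Cluster {cluster} ready for the data science team")
        # ... run the experiment against the cluster ...
    finally:
        delete_cluster(cluster)
```

The point is the lifecycle: a cluster exists only for the duration of an experiment, so capacity is consumed when it is needed rather than guessed at and provisioned a year in advance.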
Private clouds designed from the ground up for Big Data applications, such as the engineering-heavy and complex Hadoop, ideally work with all types of data, structured and unstructured, and use virtualization techniques to eliminate data silos by centralizing that data. By running Big Data private clouds, enterprises can keep data where it has always resided while creating a “centralized” experience for all that data so it can be easily managed, governed and accessed. The underlying software-defined infrastructure creates a virtualization platform for Big Data that separates analytics processing from the data storage.
With such a scalable and flexible infrastructure, software developers can deliver sophisticated applications such as Hadoop or Spark distributions that can be spun up very quickly and shared across different business units. They can fail fast, but more importantly they can deliver the value their business expects and needs.
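As an illustration, the sketch below uses the standard PySpark API to attach to a freshly provisioned cluster, run a quick aggregation against shared storage, and release the cluster when done. The master URL and storage path are assumptions for the example, standing in for whatever a particular private-cloud deployment would expose.

```python
# Minimal PySpark sketch of the spin-up / analyze / tear-down pattern.
# The master URL and storage path are illustrative assumptions; in a
# private-cloud setup they would point at a short-lived cluster and the
# shared, centrally governed data store.
from pyspark.sql import SparkSession

# Attach to a freshly provisioned cluster (hypothetical master URL).
spark = (SparkSession.builder
         .appName("quick-experiment")
         .master("spark://analytics-cluster.example.com:7077")
         .getOrCreate())

# Read from shared storage that lives outside the cluster, so the compute
# can be torn down later without touching the data.
events = spark.read.parquet("hdfs://datalake.example.com/events/2024/")

# A quick aggregation; if the approach is wrong, fail fast and iterate.
events.groupBy("event_type").count().show()

# Release the cluster's resources when the experiment is done.
spark.stop()
```

Because the data lives on shared storage outside the cluster, tearing the compute down costs nothing but the cluster itself, which is what keeps the fail-fast cycle cheap.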