Interview: Sean Suchter, CEO and Co-founder of Pepperdata

In observance of Hadoop’s 10-year anniversary, I recently caught up with Sean Suchter, CEO and cofounder of Pepperdata, to reflect on the past decade of Hadoop and look to the future. Sean has been involved with Hadoop since its earliest days. His Search Technology Team at Yahoo was the first alpha user of Hadoop and, ultimately, its first production user. In 2006, Sean helped finance the first Hadoop cluster ever used for production business applications, which initially ran on 10 nodes. Within a year, that cluster grew to hundreds of nodes and was used to analyze web page content and links, jobs necessary for ranking Yahoo! search results. From the beginning, Sean noticed that contention problems inevitably arise in multi-tenant, multi-workload deployments. These early insights into performance issues at scale led Sean to co-found Pepperdata.

Daniel – Managing Editor, insideAI News

insideAI News: At this 10-year milestone for the Hadoop platform, can you reflect on your early days at Yahoo, when you were the first production user, and say whether you had any idea Hadoop would gain such prominence a decade later?

Sean Suchter: Well, Hadoop was certainly Doug and Eric’s plan. At the time, everyone was impressed (and worried) by Google’s systems papers. Google had an infrastructure that allowed a large number of people to work on even larger amounts of data, without doing complex systems engineering every time (today these people are called data scientists, but that term had not yet been born). Yahoo didn’t have the same massive team of engineers that Google did, so we figured we couldn’t catch up by doing it all ourselves. The early reasoning was that we needed an open source package backed by multiple companies to have a chance of catching up to Google, and of reaping the benefits that could come from large groups of people working on massive data sets.

From my perspective, I wanted to jump on the bandwagon before anyone else because it meant that for many months my team was the only deployment target. Because of this, we had the advantage of extra attention and support from the Hadoop team.

insideAI News: History notes that you inadvertently crashed Yahoo Search after putting Hadoop into production. What did this teach everyone?

Sean Suchter: With the technology that existed at the time, the only choice was to do hard resource isolation, which was kind of silly. The crash was caused by Hadoop using 100% of the available network bandwidth, even though Yahoo Search only needed around 5% to function properly. The crash never would have happened if Hadoop had been using only 95%, but that technology (and that awareness of resource requirements) didn’t exist yet. Afterwards, thankfully, no one got fired, but I remember spending an hour or two in a 40-person meeting led by Jeff Weiner, then EVP of Search, who’s now the CEO of LinkedIn. It was an excellent learning experience on what not to do.

insideAI News: From your unique perspective, can you say a few words about how far the technology has come since its early days?

Sean Suchter: In the very early days of Hadoop we were happy if a 10-node cluster stayed up throughout the day. The advances over the past decade have made it dramatically better, and you can be sure Hadoop is reliable and stays up, but performance is still a challenge.

insideAI News: Tell us why you founded Pepperdata after seeing some of the hurdles of large-scale production use.

Sean Suchter: As one of the earliest adopters of the technology, I saw great promise in Hadoop’s scalable approach to data processing. I also saw the ability to drive tangible bottom-line results with distributed computing: it was behind a significant chunk of revenue for us at Yahoo. At the same time, I struggled with Hadoop’s key limitation: the lack of a distributed cluster supervisor to control hardware use in real time. My cofounder Chad Carson and I were both passionate about helping companies achieve the same kind of data-driven growth we experienced at Yahoo. Pepperdata was born out of that vision.

insideAI News: If you can look into your crystal ball, what do you see for the future of Hadoop?

Sean Suchter: I believe the future of Hadoop is bright. We are seeing more and more customers not only adopt the technology but significantly increase the workloads and use cases that Hadoop can address. I hope that in the future, it will be as easy to get off-the-shelf distributed applications as it is to install an application on your laptop. No technology is perfect, but the distros have made great strides over the past decade to provide a much more stable, secure and reliable platform. To truly drive enterprise adoption, Hadoop and distributed computing in general must be able to solve the performance problems that develop from the advanced production use cases we are now seeing. For the platform to be enterprise-grade, companies need to be able to set and manage SLAs and provide consistent QoS to the business.
