Welcome back to our series of articles sponsored by Intel – “Ask a Data Scientist.” Once a week you’ll see reader-submitted questions of varying levels of technical detail answered by a practicing data scientist – sometimes by me and other times by an Intel data scientist. Think of this new insideAI News feature as a valuable resource for getting up to speed in this flourishing area of technology. If you have a data science question you’d like answered, please enter a comment below, or send an e-mail to me at: daniel@insidehpc.com. This week’s question is from a reader who asks about the importance of “storytelling” in data science.
Q: Is “storytelling” important in data science?
A: This may be the most important question posed in this insideAI News series. Yes: in the final analysis, data science is about storytelling and communicating. A data scientist has to be a team player, because the data scientist acts as an information feedback loop that helps initiate, iterate on, and drive decision making in the enterprise. Out of necessity, the data scientist works with a broad array of staff in an organization, convincing thought leaders of actionable insights and driving product and business decisions.
Our challenge as data scientists is to translate collections of information that track an organization’s performance into guidance for staff so they can make informed decisions. In short, we’re tasked with transforming data into directives. Successful data science parses numerical predictors into an understanding of the organization. We work to humanize the data by turning raw numbers into a story describing performance.
As the cost of collecting and storing data continues to decrease, the volume of raw data an organization has available can be overwhelming. By one often-cited estimate, 90% of all the data in existence was created in the last two years. Inundated organizations can lose sight of the difference between what’s statistically significant and what’s important for decision-making. Using big data successfully requires human translation and context, whether it’s for the company’s staff or the people the organization is trying to reach. Without a human frame of reference, data will only confuse, and certainly won’t lead to intelligent organizational behavior.
It’s important to remember that data only gives you the what, but humans know the why. The best business decisions come from intuitions and insights informed by data. Using data in this way allows an organization to build institutional knowledge and creativity on top of a solid foundation of data-driven insights.
Below is a summary of how to go from data deluge to organizational change, and how organizations coping with huge amounts of information can manage the process through storytelling:
Use presentation methods allowing everyone to grasp the insights
A data scientist may appreciate a plot produced by a machine learning algorithm, but I’ve learned never to show a residuals plot from a regression analysis in R. In fact, my final presentations typically have very few numbers. It is important to focus on telling a clear story with simple slides and visuals. While you can use regression or classification algorithms to find a list of predictor variables, you must also visualize the data to find trends. For example, most people, even data analysts, are much better at discerning underlying demographic trends on maps than in regression plots.
By presenting the data visually, in a recognizable form, the entire staff is able to quickly grasp the findings and contribute to the conversation. This method often leads to significant insights. For example, in one of my client projects, someone outside the analytics team noticed that customers in the Pacific Northwest region had a higher churn rate than customers in Northeast cities. This simple observation became a rallying point in further discussions, and my effort to create an environment for discussion through data storytelling led to actionable insights.
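To make the regional comparison concrete, here is a minimal sketch of the kind of summary that sits behind such a visual. The customer records, the `churn_rate_by_region` helper, and the text bar chart are all made up for illustration; they are not the client's data or the method used in the project.

```python
from collections import defaultdict

# Hypothetical customer records: (region, churned?). Illustrative values only.
customers = [
    ("Pacific Northwest", True), ("Pacific Northwest", True),
    ("Pacific Northwest", False), ("Pacific Northwest", False),
    ("Northeast", True), ("Northeast", False),
    ("Northeast", False), ("Northeast", False),
]

def churn_rate_by_region(records):
    """Return {region: fraction of customers in that region who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for region, did_churn in records:
        totals[region] += 1
        churned[region] += int(did_churn)
    return {region: churned[region] / totals[region] for region in totals}

rates = churn_rate_by_region(customers)

# Print regions from highest to lowest churn with a crude text bar,
# so the gap is visible without a plotting library.
for region, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:<18} {rate:6.1%} {'#' * round(rate * 20)}")
```

In a real presentation the same per-region rates would more likely be drawn on a map, in keeping with the point above that audiences read geographic patterns more easily than tables.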
Use only data that affect the enterprise’s key goals
During one of my client engagements, I recall a meeting with senior management where they stressed a top goal for the SaaS company: increasing sales wins while minimizing the cost of closing sales. So when I did a deep dive into the company’s sales contact management data, I looked for ways to optimize that goal. I found that the sales department was spending far too much time on sales calls that were past the point of diminishing returns – 5 touch points turned out to be optimal for this company. But many salespeople would continue working on a prospect for 10, 15, even 25 touches, the vast majority with no closing success. During exploratory data analysis (EDA), I did a quick boxplot that exposed this fact, and as a result the company made some policy changes to address it. I told the “story” of the reality, and the company’s sales process was improved.
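A boxplot is a drawing of the five-number summary, so the comparison described above can be sketched without a plotting library. The contact log and the `five_number_summary` helper below are hypothetical and exist only to show the shape of the analysis; the numbers are not the client's.

```python
from statistics import median, quantiles

# Hypothetical contact log: (number of touches, deal closed?) per prospect.
prospects = [
    (2, True), (3, True), (4, True), (5, True), (5, True), (6, True),
    (4, False), (8, False), (10, False), (12, False), (15, False),
    (18, False), (25, False),
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot draws."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)

# Split touch counts by outcome so the two distributions can be compared.
won = [touches for touches, closed in prospects if closed]
lost = [touches for touches, closed in prospects if not closed]

for label, touches in (("won", won), ("lost", lost)):
    print(label, five_number_summary(touches))
```

With data shaped like this, the summary for lost deals sits well above the one for won deals, which is the pattern a side-by-side boxplot makes obvious at a glance.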
Keep asking new questions
Data science projects are highly cyclical, so going back to the data to ask and answer new questions is important. As an example from another client engagement, once we learned who the most frequent buyers were, I urged the team to keep going and return to the data to see which marketing campaigns those customers liked best, i.e., what led those customers to buy more. The answer turned out to be campaigns offering free or low-cost shipping options. This “story” allowed the company to structure future campaigns to take advantage of what customers favored.
Data scientists want to believe that data has all the answers. But the most important part of our job is qualitative: asking questions, devising directives from the data, and telling its story. It is up to enterprise thought leaders to take our insights and translate them into business decisions that make sense for their priorities.
If you have a question you’d like answered, please just enter a comment below, or send an e-mail to me at: daniel@insidehpc.com.
Data Scientist: Daniel D. Gutierrez – Managing Editor, insideAI News