In this special guest feature, Kevin Safford, Sr. Director of Engineering for Umbel, offers a no-nonsense look at how to answer the proverbial question “How can I become a data scientist?” Kevin is an entrepreneurial tech and data science leader with a decade of experience in big data and analytics. He holds multiple patents, has started and grown data science teams, and has destroyed 3 microwaves while sciencing.
There might not be any topic a data scientist is asked more about than “how can I get into data science?” I get it. It’s a great career, and every week for the last few years there has been a new article about the unmet demand for “the best job in America.” Working on some of the most exciting new technologies, like self-driving cars and AI-powered chatbots, is understandably appealing. And yet, it seems hard to find these jobs. If you’re baffled by this paradox, you’re not alone.
What is data science?
To understand how to become a data scientist, it’s best to get on the same page on what data science is. And if this is your career path, get accustomed to always defining your domain before you begin.
Data science, fundamentally, is about using the scientific method to solve practical problems in a business setting. Things like: “what steps can we take to measurably reduce customer churn?” or “how much of our inventory losses are due to fraud, and how can we reduce that?” The tools involved will evolve, and snazzy but vague buzzwords can be confusing and misleading. But data science isn’t defined by deep learning networks, Bayesian statistics, or however we define ‘AI’ this week. Data science is a practice, not a particular skill set.
To be successful, you’ll need to bring a variety of skills and experience to bear. For any given question, it may be necessary to write code to collect and clean data, run traditional statistical analysis to verify that your data can answer the question, build predictive machine learning models, visualize the data in creative and expressive ways, and explain the results to whoever needs to know what you’ve discovered. Beyond the technical pieces, you’ll also need a deep understanding of the business and the topic at hand.
And this is why there are so few job postings for beginner data scientists. Doing this kind of research day in, day out requires diverse knowledge and experience.
Break in through practicality
So how can anyone get started? The straight answer? There aren’t many opportunities for entry-level data scientists. An advanced degree in a mathematical discipline, substantial training in statistics, or experience in rigorous experimental practices isn’t a must, but you will have to be strong in these fundamentals.
Every year, PhD graduates in statistics, econometrics, hard sciences, and computer science—many focusing specifically on machine learning—discover they have zero interest in academia and enter the workforce. You’ll have to stand out from that cohort when going for your first data science gig.
Many people recommend doing Kaggle competitions. It’s not a bad idea, but it doesn’t make a huge splash. In practice, you will almost never be handed a data set and asked to optimize some decision function to the third decimal place. There is only a weak correlation between being great at Kaggle and being good at data science, professionally.
Instead, demonstrate that you know how to answer practical questions. Find an actual problem that exists in the world and solve it with data. For example: ‘Could your city reduce traffic with different policies?’ or ‘How can you build a Twitter bot that will get the most replies?’ Tell a story with the data. Your audience is rarely going to know what an F1 score is. You must, and you must also explain it in a way they can connect with. Show that you understand that you’re solving a business problem with math; don’t just show me you can solve a math problem.
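To make the F1 point concrete, here is a minimal sketch of what explaining the metric to a non-technical audience might look like. It assumes a hypothetical churn model; the function names and the customer counts are invented for illustration and do not come from any real project. The F1 score itself is the standard harmonic mean of precision and recall.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from raw counts.

    tp: customers flagged who really left (true positives)
    fp: customers flagged who stayed (false positives)
    fn: customers who left without being flagged (false negatives)
    """
    if tp == 0:
        # With no true positives, precision and recall are both zero
        # (or undefined), so report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)  # how often a flag was right
    recall = tp / (tp + fn)     # how many leavers were caught
    return 2 * precision * recall / (precision + recall)


def plain_language_summary(tp: int, fp: int, fn: int) -> str:
    """Describe the same counts in terms a business audience can follow."""
    flagged = tp + fp
    churned = tp + fn
    return (
        f"We flagged {flagged} customers as likely to leave, "
        f"and {tp} of the {flagged} really did. "
        f"Of the {churned} customers who left, we caught {tp} and missed {fn}. "
        f"F1 score: {f1_score(tp, fp, fn):.2f}"
    )


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    print(plain_language_summary(tp=60, fp=40, fn=20))
```

The number at the end is for you; the two sentences before it are for your audience. Both describe the same three counts.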
Finally, data science is a social profession. It might not seem that way on the surface, but it’s not a field for people who want to work in isolation, optimizing algorithms. No matter how profound your analysis, it’s wasted if no one knows about it. Get out there and network. Figure out where the data scientists are hanging out at conferences and meetups, present your work, get feedback, and improve on it.
This isn’t a profession conducive to fresh-out-of-school college grads. But with an eye constantly fixed on the story behind the data (and usually, a bit of luck), aspiring data scientists can make a career of solving practical business problems with data.
“There is only a weak correlation between being great at Kaggle and being good at data science, professionally.”
This is one of the most brazenly incorrect things I’ve read in a long time.
I’ve never met a data scientist in real life who’s spent much time on Kaggle.
Actually, I think he is spot on. In the professional practice of data science, you do need a deep understanding of the domain at hand, as well as the specific business needs of the client. Knowing how to extract a few more percentage points of performance from your classifier may be essential for being great at Kaggle, but it is of relatively little importance in real-life projects.