Ask a Data Scientist: Becoming a Data Scientist

Welcome back to our series of articles sponsored by Intel – “Ask a Data Scientist.” This is the last of our reader-submitted questions of varying levels of technical detail, each answered by a practicing data scientist – sometimes by me and other times by an Intel data scientist. Think of this insideAI News feature as a valuable resource for getting up to speed in this flourishing area of technology. This week’s question is from a reader who asks about becoming a data scientist.

Q: What advice do you have in becoming a data scientist?

A: Great question to wind up this Ask a Data Scientist series! This is the big question many are asking as it becomes clear that data science as a profession is a real winner. To answer this question properly, we need to consider what hiring companies consider requirements for being a data scientist. Here is a short list for an honest assessment:

  • Are you really good at math – undeterred by calculus, differential equations, and linear algebra? Are you also strong in statistics and probability theory?
  • Do you also know R and/or Python for developing machine learning algorithms?
  • Do you have deep domain knowledge of a particular industry?

If you can answer yes to the above questions and you possess this collection of skills, you might have what it takes to be a data scientist. These are good times for wearing the data scientist hat. LinkedIn recently ranked “statistical analysis and machine learning” the top 2014 technology skill-set. Glassdoor reports that the average salary for a data scientist is a hearty $118,709. A McKinsey & Company study predicts the U.S. could face a shortage of 140,000 to 190,000 “people with analytical expertise” as well as 1.5 million “managers and analysts with the skills to understand and make decisions based on the analysis of big data.”

Why the Surge in Data Science?

Data science is really nothing new. Its core constituents of computer science, statistics, and mathematics have been around for quite some time! What’s changed is that tech firms like Google, Amazon, and Netflix have added data science groups to analyze growing data stores. Further, the use of such technology is now filtering down to non-tech companies like Target, Walmart, AT&T, Intel and Disney.

As a result, companies across a wide swath of industries are looking to hire data scientists. Data science is the means to intelligently consume an ever-increasing amount of structured and unstructured data, arriving at a quicker pace, and having increased complexity. The hope is that data science professionals will uncover new insights that will prompt new revenue streams and/or let a company streamline its business.

Getting a Data Science Education

Although it helps to have a Ph.D. in mathematical statistics, not every data aficionado needs to go through a rigorous graduate degree program to become a practicing data scientist. There are a number of new educational resources available that can kick start your career.

For instance, many traditional educational institutions are gearing up fast to address the demand for data science. We’re now seeing many new master’s programs, such as the Master of Science in Predictive Analytics program at Northwestern University. As an alternative to formal degree programs, we’re also seeing a number of University Extension programs offering new “data science certificates” that serve as professional designations for people transitioning from other fields. For example, UC Davis is currently designing a whole new Extension program offering an online certificate in data science.

Other learning opportunities are available via massive open online course (MOOC) platforms such as Coursera and edX. For example, the Johns Hopkins Data Science Specialization program offered through Coursera includes 9 free online courses (or for a modest fee, you can get a certificate). There also are a number of “boot camp” style options that use an immersion type of learning experience. In addition, you shouldn’t discount the value of attending one or more Meetup groups in your area that cater to the data science community, as they often attract high-quality speakers from leading industry companies, not to mention networking opportunities to connect with potential employers.

The alternative educational resources are not a replacement for college, but attending a traditional college or university isn’t necessarily a requirement for being a successful data scientist. There’s a personality type that does well in the field – innately curious, ingenious, analytical, a problem solver, and a good data storyteller. Plus, being a quick study is an important attribute for this line of work.

How Data Scientists Spend Their Time

An average day for a data scientist can be highly rewarding. Working with senior enterprise staff to define the project goals and locating data sources within the organization (e.g. e-commerce and ERP systems, or even Hadoop) allows you to gain intimate knowledge of how the business operates. Performing data munging (cleaning and transformation) can be drudgery since it can consume a majority of your time spent on a project, but on the positive side the process gives you much insight into the enterprise’s data assets. Doing exploratory data analysis (EDA) with a refined data set lets you dig deep into what the data is saying about company operations. Many times, there are actionable insights that come from EDA.
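
To make the munging and EDA steps a little more concrete, here is a minimal Python sketch using only the standard library. The order records, field names, and the clean and summarize_by_region helpers are all hypothetical, invented purely for illustration; real projects typically lean on a data analysis library and far messier inputs.

```python
import statistics

# Hypothetical raw export from an e-commerce system: everything is a string,
# regions are inconsistently formatted, and some amounts are missing.
raw_orders = [
    {"order_id": "1001", "region": " West ", "amount": "250.00"},
    {"order_id": "1002", "region": "east", "amount": ""},
    {"order_id": "1003", "region": "East", "amount": "99.50"},
    {"order_id": "1004", "region": "WEST", "amount": "n/a"},
    {"order_id": "1005", "region": "west", "amount": "410.25"},
]


def clean(rows):
    """Munging: drop rows with unparseable amounts and normalize the rest."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip records whose amount is missing or not a number
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "region": row["region"].strip().lower(),
                "amount": amount,
            }
        )
    return cleaned


def summarize_by_region(rows):
    """EDA: count and average order amount for each region."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["amount"])
    return {
        region: {"count": len(amounts), "mean": statistics.mean(amounts)}
        for region, amounts in groups.items()
    }


orders = clean(raw_orders)
summary = summarize_by_region(orders)
for region, stats in sorted(summary.items()):
    print(region, stats["count"], round(stats["mean"], 2))
```

Even in a toy case like this, most of the code goes into cleaning rather than analysis, which mirrors how the time tends to be spent on a real project.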

But the most exciting part of data science is data modeling: selecting an appropriate algorithm, training it, and evaluating the test error. The intriguing part of data science is that you’re never really done. All of the steps in the data science process typically are repeated to develop further insights and improve predictive accuracy. A data scientist’s work is never done!
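
As a rough illustration of that train-and-evaluate loop, the following Python sketch fits a straight line by ordinary least squares on a training set and then measures the mean squared error on held-out test data. The data is synthetic, and the fit_line and mean_squared_error helpers are hypothetical names written for this example; in practice you would more likely reach for an established machine learning library.

```python
import random


def fit_line(xs, ys):
    """Train: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(model, xs, ys):
    """Evaluate: average squared difference between predictions and targets."""
    intercept, slope = model
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in range(100)]

# Hold out 20% of the shuffled data so the error is measured on unseen points.
rng.shuffle(data)
train, test = data[:80], data[80:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

model = fit_line(train_x, train_y)
test_mse = mean_squared_error(model, test_x, test_y)
print("intercept, slope:", model)
print("test MSE:", test_mse)
```

The key design choice is that the test points play no part in fitting the line, so the test error is an honest estimate of how the model behaves on new data. Repeating the cycle with different algorithms or features, and comparing test errors, is the iteration described above.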

Is Data Science the Right Career for You?

If the idea of spending much of your work day searching for and acquiring new data sets, munging data, exploring data, and modeling data with a variety of statistical learning algorithms appeals to you, then you might have the makings of a data scientist. If you’re only motivated by the high salaries though, you may have a rough time of it. Consider that people who fall into this line of work often spend their spare time analyzing data just for the fun of it. You really have to be in love with data to be a successful data scientist. If you don’t love data for its own sake, then you might find it hard to compete with colleagues who do. Regardless, every data scientist should learn to love data, but if after a few years you still don’t feel the passion for data you might consider making a lateral move outside the field.

The timing for becoming a data scientist is perfect, as the demand is high for this hot field. Most practicing data scientists get contacted by technical recruiters on a regular basis. This high level of employability is a definite plus, and being able to use your analytical skills, curiosity, and creativity to solve problems is very rewarding. A lot of time also is spent researching the problem domain, because you’ll undoubtedly come across new problems that require you to study the latest materials in that particular field and confer with experts in those areas for insight.

In practice, data science requires a healthy mix of both science and “art.” The science part is clear – mathematics, statistics, computer science, etc. But the art part is equally important – creativity, finesse, storytelling, etc. The two ingredients combined make for a successful data scientist.

Working in data science can be as sexy and attractive as it is touted these days; it all depends on how you apply yourself. The field is definitely gaining significance and respect across the enterprise.

Data Scientist: Daniel D. Gutierrez – Managing Editor, insideAI News

Comments

  1. narender singh says

    Hello sir
    I read your article. It is very nice and also it has cleared so many doubts of mine regarding data science.
    I love mathematics (mainly Algebra and Calculus) but I don’t know whether I love data or not, because I have never worked with it.
    So please guide me: should I start preparing to get into this field or not?

  2. Hi,

    My name is Sam. I’m a programmer and recently I started looking into machine learning, because I enjoy programming challenges. I quickly ran into a couple of walls. First, while I take some pride in my math skills, I am by no means an expert mathematician, and statistics was never my forte. Second, I realized that I didn’t have anything specific I wanted to use machine learning for. So that went on the backburner for a while.

    Now though, I have a job at a startup. Without going into too much detail, we analyze and compare 3d models. To do this, we gather a lot of data about the models. The thing is, I realized recently that we’re collecting all this data, but only using it in a few very specific ways.

    In my imagination, I started having visions of analyzing the data of individual models to find some truths about those models, and then using that data to find truths about 3d models in general. Then, I realized that I had no idea what to look for. All I have is a lot of data and a belief that more could be learned from it.

    I suppose my questions are these: Where does one begin analyzing a massive amount of data, particularly if one doesn’t know what they’re looking for? And am I putting too much faith in the powers of data analysis and machine learning?

    Thank you for your time,
    Sam Beasley

    • Hello Sam,

      Thanks for posting your question, which is very relevant to this “Ask a Data Scientist” article. First, it is true that a real understanding of machine learning will require you to brush up on your mathematics, specifically calculus, linear algebra, and probability theory. Check out this online class over at Coursera that addresses this need:

      https://www.coursera.org/specializations/mathematics-machine-learning

      RE: your imagination, this is great! Imagination is a prime requirement for success in data science and machine learning. Many data scientists start out with a new data set and engage in “exploratory data analysis.” EDA will point out patterns that can lead you to machine learning solutions for prediction. In addition, unsupervised machine learning is great for further exploring your data. So at the end of the day, you need some starter classes in the field to get your feet wet. There are many educational options online. Try Coursera. Best of luck.

      Daniel