CrowdFlower, a leading data enrichment platform for data scientists, today released its 2015 Data Scientist Report. Findings revealed that data scientists saw messy, disorganized data as a major hurdle preventing them from doing what they find most interesting in their jobs: predictive analysis and data mining for behavioral patterns and future trends. The majority of data scientists surveyed also acknowledged the skills shortage within their field.
Salient findings of the report uncover what is and isn’t working in the data science field. These findings include:
- “Data science” is a new term for something that’s been around for a while. Although the term is relatively new, 16 percent of data scientists reported that they have worked in the field for 10 years or more, which suggests that the label describes work people have been doing for many years.
- Messy, disorganized data is the number one obstacle holding data scientists back. Two-thirds of respondents say cleaning and organizing data is their least interesting and most time-consuming task, taking time away from preferred tasks such as predictive analysis and data mining.
- There are not enough data scientists. Nearly 80 percent of respondents indicate there is a shortage of data scientists, suggesting that an increase in qualified data scientists would enable companies to balance workload and improve overall breadth and depth of their data science capabilities.
- Data scientists want more support from their companies. Nearly 79 percent of respondents are satisfied in their jobs, and almost one-third find their position “totally awesome,” but they note that their organizations can still do more to equip them. Data scientists said that organizations can empower data science teams by providing the proper tools to do their job better (cited as a solution by 54.3 percent of survey respondents) and setting clearer goals and objectives on projects (cited by 52.3 percent of respondents).
- Data scientists use a diverse toolkit dominated by open source. The survey found that although Excel is still the most commonly used tool (by 55.6 percent of respondents), data scientists also use at least 47 other tools and languages to do their jobs. Nearly all data scientists (98 percent) use open source software, and tried-and-true open source languages such as R remain a major part of the data scientist’s toolbox.
- The most in-demand data science skill set is programming and coding. In addition to the survey that was conducted, CrowdFlower used its own data enrichment platform to collect and analyze 1,024 LinkedIn data scientist job postings and found that the top two skills companies are looking for are programming and coding (seen in 55.3 percent of job postings) and statistical tools (seen in 52.1 percent of job postings).
“We know that data scientists are valuable for their companies, but there’s still a disconnect between what they actually do and what they want to do,” said Lukas Biewald, co-founder and CEO of CrowdFlower. “At the end of the day, the time they invest in cleaning data is time that could be better spent doing strategic, creative work like predictive analysis or data mining. If companies can give data scientists some of that data cleaning time back, they’ll have happier teams that can focus on really exciting things.”
Survey Methodology
A total of 153 General Population respondents from CrowdFlower’s online research panel completed the survey. Respondents work for companies of varied sizes and sectors, mostly in the U.S. All respondents have “data scientist” in their job title or job description on LinkedIn.