A few weeks ago I posted a piece on San Francisco, CA-based CrowdFlower, whose technology platform takes large, data-intensive projects and divides them into small tasks that are distributed to a multi-million-person, on-demand global workforce. That article included CrowdFlower’s views on data science trends for 2015, and topping the list was the emerging importance of Chief Data Scientists.
In fact, while employees have for years dealt with various seemingly mundane but important data cleansing and security challenges, aka “data wrangling,” both the nature and importance of the data scientist role have changed markedly. The position is now considered a “must have” for enterprises around the world and has a level of cachet surrounding it that was not evident even a few months ago.
The reasons for this are quite evident. As we all are aware, it is hard to review any tech publication these days without running into the words “Big Data” and “sophisticated analytics.” These are seen as the keys to creating “actionable insights” that will enable organizations to do a host of things, ranging from better profiling all of us (including anticipating our needs before we even know them) to transforming business processes and practices in order to address an increasingly diverse array of operational challenges.
Data scientists, who now must be part mathematical genius and part business wizard, fit in by making all of that data (structured and unstructured, siloed and, hopefully, increasingly shared) literally and figuratively come to life.
The question that arises is where we are in terms of what current data scientists do, what they like to do, and what tools they need to better perform what are becoming invaluable functions. That is the subject of the recently released CrowdFlower 2015 Data Scientist Report. As Lukas Biewald, co-founder and CEO of CrowdFlower, told TMCnet, “We had no agenda for the report except to provide more information on the critical importance of data scientists to their organizations as well as context as to why. We hope that this will enable them to have more meaningful discussions across their organizations and in particular with C-levels.”
Minding the reality versus needs gaps
Data scientists who fit the most recent job description profile are a rare breed. Even so, CrowdFlower was able to survey 153 General Population respondents from its online research panel, all of whom work for companies of varied sizes and sectors, mostly in the U.S., and have "data scientist" in their job title or job description on LinkedIn.
What was fascinating in the survey was how many of the respondents were satisfied with their job (79 percent), including 30.1 percent who said it was “totally awesome.” The diversity of the roles outlined by respondents also holds some clues about their value and their skills now and going forward. There were:
An infographic highlights the survey responses, which include:
"Data science" is a new term for something that's been around for a while. In fact, as noted, while the term "data science" seems new, 16 percent of data scientists reported that they have worked in this field for 10 years or more.
Messy, disorganized data is the number one obstacle holding data scientists back. Two-thirds of respondents said cleaning and organizing data was the least interesting and most time-consuming task, taking time away from preferred tasks such as predictive analysis and data mining.
With regard to the last point, three graphics illustrate a gap between what data scientists do and what they would like to do. It starts with their challenges. As can be seen, they believe they are spending too much time cleaning dirty data, and doing so with limited tools, human as well as non-human.
Source: CrowdFlower 2015 Data Scientist Report
This compares with their wish list.
And the chart on what they are happiest doing speaks to the same gap.
There were also a couple of other findings of interest that are more than food for thought. The first is that data scientists use a diverse toolkit dominated by open source. The survey found that although Excel is still the most commonly used tool (by 55.6 percent of respondents), data scientists also use at least 47 other tools and languages to do their jobs. Nearly all data scientists (98 percent) use open source software, and tried-and-true open source languages such as R remain major parts of data scientists' toolbox.
The second, not surprisingly, is that the most in-demand data science skill set is programming and coding. Beyond the survey results, CrowdFlower used its own data enrichment platform to collect and analyze 1,024 LinkedIn data scientist job postings and found that the top two skills companies are looking for are programming and coding (seen in 55.3 percent of job postings) and statistical tools (seen in 52.1 percent of job postings).
"We know that data scientists are valuable for their companies, but there's still a disconnect between what they actually do and what they want to do," said Biewald. "At the end of the day, the time they invest in cleaning data is time that could be better spent doing strategic, creative work like predictive analysis or data mining. If companies can give data scientists some of that data cleaning time back, they'll have happier teams that can focus on really exciting things."
If a data scientist is not in your organization at present, there is a very strong likelihood one will be in your future. Providing data scientists the support they need to better help organizations succeed will be key. This obviously includes providing them the tools to free up time now spent on data cleaning. As Biewald noted, “It will be interesting to see when we do this again next year how much the responses change in terms of closing the gap.”