If you go by the latest headlines, the data scientist is the most coveted and scarce enterprise commodity on the employment market. But you could just as easily argue that it’s a trendy title that has not nearly lived up to the hype — and could eventually be automated out of a job.
According to a recent McKinsey Global Survey, 86 percent of executives said their organizations have been “at best only somewhat effective in meeting the primary objective of their data and analytics programs,” and one-quarter said they’ve been “ineffective.” Less than four years after data scientist was named the “sexiest job of the 21st century,” it’s time to confront some cold, hard realities. Data scientists are both extremely important and set up to fail unless they adapt to a new model for delivering value.
Recently, I was invited by the UC Berkeley School of Information to host a conversation with students and alums on the real-world applications of data science. During the Q&A, my favorite question was whether I thought software like ours at Alpine Data might eventually replace the data scientist altogether. My answer at the time was “no,” more or less: Data science will always be something of an art.
You need to intimately understand the problems that can be solved by data science first, which involves a very human process of interacting with the business. Crafting models will always require the subtle translation of real-world phenomena into mathematical expressions. And there is a human element to interpreting and presenting results that would be difficult to automate.
But it’s still true that, over time, more aspects of a data scientist’s work will be done by software. Feature generation has already become less important as models become more sophisticated. Model parameter selection will become increasingly automated — model deployment entirely so. It seems inevitable that the job description is going to evolve.
Consider how the work of the software engineer has changed fundamentally in the last 20 years. They no longer need to write their own logging module or database access layer or UI widget. And agile methods have brought the “customer” more immediately into the development process. More and more, the job of the engineer is to stitch together higher-level components and collaborate with product managers and UX designers.
Similarly, the job of the data scientist will be to take advantage of pre-built components in order to solve a greater variety of business problems. Instead of a few six-month analytics projects that focus on model accuracy and algorithmic niceties, business and analytics teams will be able to work on hundreds of projects that emphasize making concrete changes in the way business is done. And as the software available for analytics becomes more powerful, the result should be a continued steady demand for data scientists, playing a different but more prominent role in the day-to-day working of an organization.
But this shift isn’t happening nearly as fast as it should. Why? Time and again, the No. 1 failure point I see is that data scientists are mired in technical details, and not connecting analytics to business action. While it seems obvious to engage with business teams — “the business guys defined the scope; isn’t that enough?” — far too often big data projects get lost in the weeds of the science, statistics and technology. As a result, it is very easy for data science teams to lose sight of whether, and how, data science is actually solving real-world business problems.
In a recent conversation with a large financial organization, I was told that it typically took six months to deploy a new model. Why? Ask your analytics management about their everyday work and you’ll get a depressing list of technical minutiae, clustered around the data platforms (“we can’t connect to the CRM database” or “we’re waiting for the security on the cluster to be upgraded”), the math (“we need our own version of hierarchical models that can handle a million variables” or “we think we can get greater accuracy if we use neural networks”) and the deployment (“we have a separate team of engineers who manually convert the R scripts into SQL”).
There is a satisfaction with the status quo that can often be challenged by simply asking, “What would stop you from deploying a good-enough model tomorrow?”
The reality is that data scientists have more power than they think to effect and lead the change that’s needed. The most successful data science teams I know operate like agile software development teams. The focus is on rapid deployment followed by additional rounds of incremental improvement. They realize that analytics is not an objective addressed by a single project, but rather an ongoing organizational process with a clear methodology for continually improving performance, one that may serve a number of business objectives over time.
While adopting a process may seem at odds with practicing the “art” of data science, the reality is that there’s been far too much data science art and not enough action. Those days are now over. The artistic qualities will remain, but the work will deliver far more value and will find itself impacting far more people, far more frequently during a given day. So maybe the role of data scientist will no longer be the sexiest job, but rather the most powerful.