4 data science predictions for 2023

Data science has long been the domain of hardcore data professionals who understand the complex frameworks and languages involved, but those professionals are in notoriously short supply.

Fortunately, the landscape of tools and frameworks is constantly evolving, and in 2023 I predict new developments that will alleviate challenges for data teams and businesses alike.

On the one hand, the long-heralded citizen data scientists will finally play a greater role in analytics, thanks to sheer necessity and a simplification of the tools and platforms involved. On the other hand, data professionals will start to benefit from some of these simpler tools to accelerate their work, and a push for greater standardization will help the industry as a whole.

Data science is perhaps the most exciting area in all of enterprise technology right now, and it’s evolving at a lightning pace.

Here are four predictions for data science in the new year and how businesses can take advantage of them.

Python use will expand beyond data professionals to citizen developers

Business people can’t afford to wait for data scientists to provide the analytics they need, so they’re taking matters into their own hands. Python has become more approachable for non-professionals with the availability of preconfigured cloud runtimes and accessible tools like NumPy for numerical data, Prophet for forecasting and H3 for geospatial data. As a result, in 2023, Python use will expand beyond data professionals and into the hands of business analysts and other less technical users.
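To give a sense of how approachable this kind of analysis can be, here is a minimal sketch of something a business analyst might write with NumPy. The revenue figures and variable names are hypothetical and serve only as illustration; they are not taken from any real dataset.

```python
import numpy as np

# Hypothetical monthly revenue figures (in thousands); illustrative data only.
monthly_revenue = np.array([120.0, 132.0, 125.4, 150.48])

# Month-over-month growth rate: (current month - previous month) / previous month.
growth = np.diff(monthly_revenue) / monthly_revenue[:-1]

# Average growth rate across the observed months.
average_growth = growth.mean()

print("Month-over-month growth:", np.round(growth, 3))
print("Average growth:", round(float(average_growth), 3))
```

The point of the sketch is that a few array operations replace what would otherwise be a column of spreadsheet formulas, which is part of why the language is within reach of less technical users.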

Novice Python users should not attempt to build their own runtime environments but should opt for any of the modern cloud platforms that provide built-in security and governance. Anaconda offers a popular Python distribution that helps ensure updates and dependencies are managed properly, and Snowflake installs these packages in our cloud-based Python runtime.

There are numerous online resources for non-professionals to get started with Python, including a comprehensive beginner’s guide from Real Python.

Citizen developers also need a way to share the outputs from their work with colleagues who don’t want to learn Python, and I expect to see more tools for sharing Python results in productized form evolve and improve in 2023. These tools allow Python code to be wrapped in a meaningful user experience for non-IT users such as a marketing team. Just as self-service business intelligence tools went mainstream 15 to 20 years ago, Python is now starting to put even more powerful analytics capabilities into the hands of business users.

AutoML will ease the tension between citizen developers and data scientists

In 2023, the artificial barrier that exists between citizen developers and hardcore data scientists will start to break down. These groups have been at odds historically over the merits of automated machine learning (AutoML), which allows citizen developers to build their own simple AI models.

Data scientists and machine learning engineers traditionally favor hand-coded ML models and have been skeptical of the benefits and efficacy of AutoML.

Like Python, AutoML is quickly maturing and gaining wider acceptance. In 2023, I predict that data professionals will explore AutoML as a faster way to achieve an initial draft of their own ML models, which they can then refine.

I encourage them to also be more supportive of less technical employees using AutoML since this widens the universe of people who can work on data projects. AutoML essentially becomes the car that contains the ML engine — so when citizen developers use AutoML, data scientists and machine learning engineers can peek under the hood to understand and fine-tune the complex ML engine inside. Greater acceptance and adoption of AutoML by data professionals will turn tension between these two groups into synergies where both sides can win.
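The "initial draft, then refine" idea can be illustrated with a toy sketch of the kind of loop an AutoML tool automates: fit several candidate models, score each on held-out data and keep the best one as a starting point. This is not how any particular AutoML product works. The function name auto_select_degree, the candidate polynomial degrees and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def auto_select_degree(x, y, candidate_degrees=(1, 2, 3), holdout_fraction=0.25, seed=0):
    """Fit one polynomial per candidate degree and return the degree
    with the lowest mean squared error on a held-out subset."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))
    n_holdout = max(1, int(len(x) * holdout_fraction))
    test_idx, train_idx = indices[:n_holdout], indices[n_holdout:]

    scores = {}
    for degree in candidate_degrees:
        # Fit on the training subset only.
        coefficients = np.polyfit(x[train_idx], y[train_idx], degree)
        # Score on data the model has not seen.
        predictions = np.polyval(coefficients, x[test_idx])
        scores[degree] = float(np.mean((predictions - y[test_idx]) ** 2))

    best_degree = min(scores, key=scores.get)
    return best_degree, scores

# Synthetic, roughly linear data with a little noise (illustrative only).
x = np.linspace(0.0, 10.0, 40)
y = 3.0 * x + 2.0 + np.random.default_rng(1).normal(scale=0.5, size=x.size)

best_degree, scores = auto_select_degree(x, y)
print("Held-out error per candidate:", scores)
print("Selected degree:", best_degree)
```

A data scientist could take the selected candidate as a first draft and then inspect or tune it by hand, which is the "peek under the hood" step described above.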

Data scientists will adopt more pre-built industry- and domain-specific ML models

In 2023, we’ll see an increased number of pre-built machine learning models becoming available to data scientists. These models encapsulate area expertise within an initial ML model, which speeds time-to-value and time-to-market for data professionals and their organizations. For instance, pre-built ML models can reduce the time data scientists have to spend training and fine-tuning models for specific vertical industry use cases.

New sources for these models are emerging all the time. The Hugging Face AI community has done significant work creating a marketplace for ready-to-use ML models, and next year I expect more to be released by Hugging Face and other groups.

Data scientists should embrace these industry- and domain-specific models because they allow them to work on targeted problems using an existing set of well-defined data and avoid spending time becoming subject matter experts in fields that may not normally be core to their organization’s needs.

The data science and ML community will embrace more standardization

The market for data science and machine learning tools is highly fragmented, partly because the pace of innovation is so rapid. In 2023, two forces in particular will drive greater standardization — the traditional Python community wanting more and better ways to productize Python code and the expanding number of enterprises that are becoming important stakeholders in Python.

Both of these groups will benefit from a more stable and consistent platform on which they can build. Standardization has already started to happen among the four leading ML frameworks (scikit-learn, XGBoost, PyTorch and TensorFlow), which means innovators will be drawn to these frameworks over less standardized alternatives. In 2023, we will see further standardization in areas like ML operations and feature stores for ML. This will benefit the entire market, similar to how standardization around Linux helped that community.

The developments predicted for next year will be crucial to the continued maturation of the industry, bringing advanced analytics to a wider audience of users and helping data professionals become more efficient and effective in their work.