Product and Technology

Break Data Silos: Build, Deploy and Serve Models at Scale with Snowflake ML

Digital illustration of Snowflake ML linking to data sources and external outputs.

Despite the best efforts of many ML teams, most models still never make it to production due to disparate tooling, which often leads to fragmented data and ML pipelines and complex infrastructure management. Snowflake has continuously focused on making it easier and faster for customers to bring advanced models into production. In 2024, we launched over 200 AI features, including a full suite of end-to-end ML features in Snowflake ML, our integrated set of capabilities for machine learning model development, inference and operationalization. We are thrilled to continue the momentum this year by announcing that the following capabilities for GPU-powered ML workflows are now generally available for production workloads: 

  • Development: Snowflake Notebooks on Container Runtime — now generally available in AWS and in public preview in Azure — optimizes data loading and distributes model training and hyperparameter tuning over multiple CPUs or GPUs in a fully managed container environment that runs within your Snowflake security perimeter, with secure and instant access to your data. Snowflake ML also supports generating and using synthetic data, now in public preview.

  • Inference: Model Serving in Snowpark Container Services, now generally available in both AWS and Azure, offers easy and performant distributed inference with CPUs or GPUs for any model, regardless of where it was trained. 

  • Monitoring: ML Observability, now generally available in all regions, provides built-in tools to monitor and set alerts on quality metrics, such as performance and drift, for any model that runs in Snowflake or stores its inference logs in a Snowflake table.

  • Governance: ML objects and workflows are fully integrated with Snowflake Horizon’s governance capabilities, including data and ML Lineage, now generally available.

From November 2024 to January 2025, over 4,000 customers used Snowflake’s AI capabilities every week. One of these customers is Scene+, a large customer loyalty program in Canada that uses Snowflake ML to streamline and improve its ML workloads.

“Snowflake ML has been a game changer for bringing ML models to production at Scene+. We’ve eliminated all cross-platform data movement, reduced project timelines and lowered cost by using end-to-end capabilities in Snowflake ML including notebooks, feature store, model registry and ML observability. By building and deploying models in Snowflake ML, Scene+ has cut time to production by over 60% and cut costs by over 35% for more than 30 models."

Chris Kuusela
Director of Data Science, Scene+
Figure 1. Select examples of customers building on Snowflake ML.

Development

Snowflake Notebooks on Container Runtime are purpose-built for large-scale ML development, delivering competitive training performance without any infrastructure management or configuration.

Using the default out-of-the-box settings of Snowflake Notebooks on Container Runtime, our benchmarks show that distributed XGBoost training on Snowflake is over 2x faster on tabular data than both a managed Spark solution and a competing cloud service. For image data, distributed PyTorch on Snowflake ML, also with standard settings, processed a 50,000-image dataset over 10x faster than the same managed Spark solution. By using Snowflake ML, data scientists and ML engineers spend significantly less time on infrastructure and scalability, and more time developing and optimizing their ML models for rapid business impact.

Figure 2. Benchmark shows that ML training in Snowflake Notebooks on Container Runtime is over 10x faster on a 50,000-image dataset and almost 3x faster on a 30GB tabular dataset compared to managed Spark. Lower means faster performance.

In just a few clicks, Container Runtime abstracts away the infrastructure management and speeds up ML training by providing:

  • A simple notebook configuration: select a compute pool, choosing CPU or GPU to match the needs of the training task. All customer accounts are automatically provisioned with default CPU and GPU compute pools that are only in use during an active notebook session and are automatically suspended when inactive. See more details in the documentation.

  • A set of CPU- and GPU-specific images, pre-installed with the latest and most popular libraries and frameworks (PyTorch, XGBoost, LightGBM, scikit-learn and many more) supporting ML development, so data scientists can simply spin up a Snowflake Notebook and dive right into their work.

  • Secure access to open source repositories via pip and the ability to bring in any model from hubs such as Hugging Face (see example here).

  • Optimized data ingestion APIs that efficiently materialize Snowflake tables as pandas DataFrames or PyTorch datasets. Data is ingested in parallel across multiple CPUs or GPUs and surfaced directly in the notebook. See more details in the documentation.

  • Distributed model training and distributed hyperparameter optimization APIs that extend the familiar open source interfaces of XGBoost, LightGBM and PyTorch, distributing the processing across multiple CPUs or GPUs without any orchestration of the underlying infrastructure (see example here, and the sketch after this list).
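
To make this concrete, here is a minimal sketch of what that workflow can look like inside a Snowflake Notebook on Container Runtime. The table and column names (CUSTOMER_FEATURES, CHURNED) are hypothetical, and the single-node XGBoost shown here uses the same open source interface that the distributed training APIs extend:

```python
# Minimal sketch: train an XGBoost model inside a Snowflake Notebook.
# Table and column names are hypothetical; the Container Runtime's
# distributed training APIs extend this same open source interface.
from snowflake.snowpark.context import get_active_session
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

session = get_active_session()  # notebook session, already authenticated

# Materialize a Snowflake table as a pandas DataFrame.
df = session.table("CUSTOMER_FEATURES").to_pandas()
X = df.drop(columns=["CHURNED"])
y = df["CHURNED"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = XGBClassifier(n_estimators=200, max_depth=6)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")
```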

Many enterprises are already using Container Runtime to cost-effectively build advanced ML use cases with easy access to GPUs. Customers include CHG Healthcare, Keysight Technologies and Avios.

CHG Healthcare

CHG Healthcare, a healthcare staffing company with over 45 years of industry expertise, uses AI/ML to power its workforce staffing solutions across 700,000 medical practitioners representing 130 medical specialties. CHG builds and productionizes its end-to-end ML models in Snowflake ML. 

“Using GPUs from Snowflake Notebooks on Container Runtime turned out to be the most cost-effective solution for our machine learning needs," said Andrew Christensen, Data Scientist, CHG Healthcare. "We appreciated the ability to take advantage of Snowflake's parallel processing with any open source library in Snowflake ML, offering flexibility and improved efficiency for our workflows.”

Keysight Technologies

Keysight Technologies is a leading provider of electronic design and test solutions. With over $5.5 billion in global revenues and over 33,000 customers in 13 industries, Keysight holds over 3,800 patents for its innovations. Keysight builds scalable sales and forecasting models in Snowflake ML with Container Runtime.

“Having tried Snowflake Notebooks on Container Runtime, we can say the experience has been remarkable," said Krishna Moleyar, Analytics and Automation for IT Global Applications, Keysight Technologies. "The flexible container infrastructure supported by distributed processing on both CPUs and GPUs, optimized data loading and seamless integration with [Snowflake] Model Registry have improved our workflow efficiency.”

Avios

Avios, a leader in travel awards with more than 40 million members and 1,500 partners, uses Snowflake Notebooks on Container Runtime to perform deeper analysis and data science tasks with the flexibility the business needs.

“I have really enjoyed using Snowflake Notebooks on Container Runtime for the flexibility and speed they offer," said Olivia Brooker, Data Scientist at Avios. "I am able to run my code without worrying about it timing out or variables being forgotten. Enabling PyPI integration, I also have the added benefit of using a wider range of Python packages, making my analysis and data science tasks more flexible.”

To build models while maintaining the privacy of sensitive data sets, or to easily generate new data to enrich training, Snowflake also supports easy and secure synthetic data generation (public preview). This powerful capability lets data scientists build pipelines and models on data without compromising sensitive attributes and without waiting on lengthy, cumbersome approval processes. The synthetic data set has the same characteristics as the source data set, such as the name, number and data type of columns, and the same number of rows or fewer.
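
As a hedged illustration of that contract, the sketch below compares a source table with an already generated synthetic counterpart. The generation call itself is omitted, and both table names are hypothetical:

```python
# Sketch: verify a synthetic table preserves the source schema and has
# the same number of rows or fewer. Table names are hypothetical, and the
# synthetic table is assumed to have been generated already.
from snowflake.snowpark.context import get_active_session

session = get_active_session()
source = session.table("PATIENT_VISITS")
synthetic = session.table("PATIENT_VISITS_SYNTHETIC")

src_schema = [(f.name, f.datatype) for f in source.schema.fields]
syn_schema = [(f.name, f.datatype) for f in synthetic.schema.fields]

assert src_schema == syn_schema, "column names/types should match the source"
assert synthetic.count() <= source.count(), "synthetic set should not add rows"
```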

Serving models in production

No matter where your model is built, Snowflake ML makes it easy to run production-scale inference and manage the model lifecycle with built-in security and governance. After a model is logged in the Snowflake Model Registry, it can be seamlessly served for distributed inference using Model Serving in Snowpark Container Services (SPCS). With this capability, your inference workloads can take advantage of GPU compute clusters, run large models such as Hugging Face embedding or other transformer models, and use any Python packages from open source or private repositories. You can also deploy a model behind a REST API endpoint so applications can invoke low-latency inference (online endpoints are in public preview). With the model registry and inference solutions, users can bring any ML model trained inside or outside Snowflake, either as one of the built-in model types or through the custom model API (which also covers pre- and post-processing pipelines and partitioned models), and run scalable, distributed inference in virtual warehouses or in SPCS, depending on workload needs.
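
The sketch below shows roughly what this flow looks like with snowflake-ml-python, reusing the hypothetical model from the training sketch above. The compute pool, image repository and object names are all placeholders, and parameter names may differ slightly across library versions, so treat this as an outline rather than a definitive implementation:

```python
# Sketch: log a model to the Snowflake Model Registry and serve it in
# Snowpark Container Services (SPCS). All object names are hypothetical;
# check the snowflake-ml-python docs for the exact parameters in your version.
from snowflake.ml.registry import Registry
from snowflake.snowpark.context import get_active_session

session = get_active_session()
reg = Registry(session=session)

# `model` and `X_train` come from the training sketch earlier in this post.
mv = reg.log_model(
    model,
    model_name="CHURN_MODEL",
    version_name="v1",
    sample_input_data=X_train.head(10),  # used to infer the model signature
)

# Serve the logged version on an SPCS compute pool (CPU or GPU).
mv.create_service(
    service_name="CHURN_MODEL_SERVICE",
    service_compute_pool="MY_COMPUTE_POOL",
    image_repo="MY_IMAGE_REPO",
    ingress_enabled=True,  # also exposes a REST endpoint for online inference
)

# Batch inference; runs in a warehouse by default.
predictions = mv.run(session.table("NEW_CUSTOMERS"), function_name="predict")
```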

Figure 3. Bring any model for scalable inference in Snowflake.

Jahez Group, a Saudi Arabia-based online food delivery company, uses model serving in SPCS to productionize models that optimize logistics and maximize customer satisfaction by ensuring deliveries to customers within 30 minutes of ordering. 

“Model Serving in Snowpark Container Services tremendously helped with our iteration cycle between model versions, enabling rapid updates and reducing deployment delays," said Marwan AlShehri, Senior Data Engineer, Jahez Group. "With the support for auto-scaling capabilities as well, productionizing models is as easy as ever. The incredible support of the Snowflake team helped us achieve sub-one-second online inference for real-time predictions in our Estimated Time of Arrival use case. This has led to improved courier order assignment and optimization of the delivery process, which enabled us to cut costs and increase efficiency.”

Monitoring and alerting

In production, model behavior can change over time: the training data may capture an incomplete picture of the world, input data can drift, and data quality issues can arise. Any of these changes in the data or the environment can have a tremendous impact on model quality.

Snowflake’s ML Observability monitors models for performance, model score drift and feature value drift whenever inference/prediction logs are stored in a Snowflake table, regardless of where the model was trained or deployed. Monitoring results can be queried using Python or SQL APIs and viewed in the UI tied to your model registry, where you can easily set alerts on custom thresholds.
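
To make score drift concrete, here is a hand-rolled sketch of one common drift metric, the population stability index (PSI), computed over a hypothetical inference log table. This is for illustration only and is not the built-in ML Observability API, which computes such metrics for you:

```python
# Illustration only: a hand-rolled population stability index (PSI) over an
# inference log table, to make "score drift" concrete. This is NOT the
# built-in ML Observability API; table and column names are hypothetical.
import numpy as np
from snowflake.snowpark.context import get_active_session

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline and a recent score distribution."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

session = get_active_session()
logs = session.table("CHURN_INFERENCE_LOG").to_pandas()

baseline = logs[logs["WEEK"] == "2025-01-06"]["PREDICTION_SCORE"].to_numpy()
latest = logs[logs["WEEK"] == "2025-02-03"]["PREDICTION_SCORE"].to_numpy()

print(f"score PSI: {psi(baseline, latest):.3f}")  # > 0.2 often triggers alerts
```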

Figure 4. End-to-end ML workflow in Snowflake ML with integrated observability.

Storio Group, a European leader in personalized photo products and gifts delighting more than 11 million customers, productionizes models with integrated MLOps features in Snowflake, including ML observability. 

“At Storio, we've built a usable, scalable and well-governed MLOps platform in just a few months from concept to production in Snowflake ML," said Dennis Verheijden, Senior ML Engineer, Storio Group. "By combining the new ML Observability feature with existing Snowflake features, like Dynamic Tables and ML Lineage, we were able to automate model observability for models trained on our platform. The result is that for each deployed model we have automated dashboards, outlining live model evaluation and comparison, and feature drift over time. This enables data scientists to focus on unlocking value, while offloading the implementation of observability and monitoring to the platform.” 

Underlying governance 

At the backbone of Snowflake ML is full integration with Snowflake Horizon Catalog, the built-in data governance and discovery solution spanning compliance, security, privacy and collaboration capabilities. All data, features and models in Snowflake are governed by role-based access control (RBAC) across clouds, allowing organizations to manage access at scale and restrict confidential assets to the appropriate business roles. Snowflake Model Management builds on this strong data governance foundation, providing flexible and secure ways of managing the model lifecycle in production.

To track the full lineage, access history and logs for ML data and artifacts, Snowflake’s data and ML Lineage makes it easy to visualize the flow of data from its source to its final destination. The lineage graph supports all ML objects created in Snowflake (features, datasets and models), enabling full traceability of ML pipelines; this helps with regulatory compliance and audits, as well as reproducibility and more robust ML workloads.
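
For programmatic access, recent versions of snowflake-snowpark-python expose a lineage trace API; the sketch below assumes that API and uses hypothetical object names, so check the documentation for the exact signature in your version:

```python
# Hedged sketch: trace downstream lineage of a registered model.
# Assumes a recent snowflake-snowpark-python; object names are hypothetical
# and the trace() signature may differ in your version.
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.lineage import LineageDirection

session = get_active_session()

lineage_df = session.lineage.trace(
    "MY_DB.MY_SCHEMA.CHURN_MODEL",  # fully qualified object name
    "MODEL",                        # object domain
    direction=LineageDirection.DOWNSTREAM,
    distance=2,                     # number of hops to traverse
)
lineage_df.show()
```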

Figure 5. Lineage of ML assets from the Snowsight UI enables easy reproducibility, debugging and auditing.

Getting started

With these latest generally available announcements, data scientists and ML engineers can confidently scale out production workflows in Snowflake ML.

The following resources are the easiest way to get started with these new capabilities: 

For additional advanced use cases, check out the following solutions: 
