
Meta’s Llama 4 Large Language Models Now Available on Snowflake Cortex AI

At Snowflake, we are committed to providing our customers with industry-leading LLMs. We’re pleased to bring Meta’s latest Llama 4 models to Snowflake Cortex AI! 

Llama 4 models deliver performant inference so customers can build enterprise-grade generative AI applications and deliver personalized experiences. The Llama 4 Maverick and Llama 4 Scout models can be accessed within the secure Snowflake perimeter on Cortex AI. According to Meta, Llama 4 Scout is the best multimodal model in the world in its class and supports an industry-leading context window of up to 10M tokens, and these models are trained on large amounts of unlabeled text, image and video data to power rich end-user experiences. The models are designed for native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone. This design accommodates a wide range of use cases and developer needs, allowing developers to build enterprise-grade AI applications.

Faster, high-quality inference with a Mixture of Experts (MoE) architecture

The Llama 4 models are the first from Meta to use an MoE architecture, in which a single token activates only a fraction of the model's total parameters. As a result, MoE architectures are more compute-efficient for both model training and inference, and deliver higher-quality inference compared to other architectures. Within Snowflake, Llama 4 Maverick and Llama 4 Scout can be integrated into gen AI applications.

  • Llama 4 Maverick offers industry-leading performance in image and text understanding with support for 12 languages to bridge language barriers. As a general-purpose LLM, Llama 4 Maverick contains 17 billion active parameters (400 billion total parameters), offering higher-quality inference than Llama 3.3 70B. The model is well suited for precise image understanding and creative writing. It provides state-of-the-art intelligence at high speed, optimized for response quality on tone and refusals.

  • Llama 4 Scout is a smaller general-purpose model with 17 billion active parameters (109 billion total parameters) and supports an industry-leading context window size of 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases. 

Snowflake’s commitment to open source

Meta’s open-source Llama models have empowered enterprises to create unique AI experiences. At Snowflake, we’re leveraging these models within Cortex AI to build tailored solutions that meet evolving business needs. Customers can use Llama models to power AI agents that handle complex tasks and integrate with tools like Cortex Analyst and Cortex Search, unlocking the full value of their data on a single platform.

"As the largest travel guidance platform in the world, TripAdvisor helps over 450 million travelers make the best of their trips each month. Through harnessing Llama models in Snowflake, we’ve been able to provide those travelers with highly relevant, personalized recommendations for their trips, while simultaneously driving more engagement and revenue for our business. Our team is excited to start using Llama 4 models in Cortex AI to push the boundaries of what we can achieve in travel personalization and user experience."

— Rahul Todkar
Head of Data and AI, TripAdvisor.

Our AI Research team has been actively developing cutting-edge technologies on top of these Llama models. For example, Arctic Ulysses is a novel technology we developed that is optimized for low-latency, high-throughput inference and is especially beneficial for long-sequence tasks. Furthermore, SwiftKV, another recent innovation built upon Meta’s Llama models and available in Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B, reduces the inference costs of Llama LLMs on Cortex AI by up to 75% compared to the baseline Meta Llama models that are not SwiftKV optimized. This directly translates to tangible cost savings and improved performance for our customers, driving scalable deployment of generative AI initiatives. By optimizing the prefill stage of inference, SwiftKV ensures the efficient processing of lengthy input prompts, a critical requirement for many enterprise applications.
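The SwiftKV-optimized models are invoked like any other Cortex AI model, simply by name. Below is a minimal Python sketch that assumes the Complete helper from the snowflake-ml-python package and an active Snowpark session; the lowercase model identifier, the prompt and the connection placeholders are illustrative assumptions rather than values from this post.

# Minimal sketch: calling a SwiftKV-optimized Snowflake-Llama model from Python.
# The model identifier and connection parameters below are illustrative assumptions.
from snowflake.cortex import Complete
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
}).create()

# Assumed identifier for the SwiftKV-optimized Llama 3.3 70B model on Cortex AI.
response = Complete(
    "snowflake-llama-3.3-70b",
    "Summarize the key points of this support ticket in two sentences: <ticket text>",
    session=session,
)
print(response)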

Integrated access via SQL and Python

The Llama 4 models, now available in preview on Cortex AI, offer easy access through established SQL functions and standard REST API endpoints. Customers can bring Llama 4’s advanced inference capabilities into existing applications and data pipelines without complex integration work. The new Llama 4 models can be called with the COMPLETE function within Cortex AI.

SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama4-maverick',
    [{'role': 'user', 'content': CONCAT('Summarize this customer feedback in bullet points:<feedback>', content, '</feedback>')}],
    {'guardrails': true}
)
FROM my_table;
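
The same call can be issued from Python, for example through Snowpark. The sketch below is a minimal illustration assuming the snowflake-snowpark-python package and placeholder connection parameters; it reuses the my_table and content names from the SQL example above.

# Minimal sketch: issuing the same COMPLETE call from Python via Snowpark.
# Connection parameters are placeholders for your account.
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

summaries = session.sql("""
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'llama4-maverick',
        [{'role': 'user', 'content': CONCAT('Summarize this customer feedback in bullet points:<feedback>', content, '</feedback>')}],
        {'guardrails': true}
    ) AS summary
    FROM my_table
""")
summaries.show()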

Integrated access via REST API

To enable services or applications running outside of Snowflake to make low-latency inference calls to Cortex AI, the REST API is the way to go. Here is an example of what that looks like:

curl -X POST \
    -H "Authorization: Bearer <jwt>" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d '{
    "model": "llama4-maverick",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in San Francisco?"
      }
    ],
    "max_tokens": 4096,
    "top_p": 1,
    "stream": true
    }' \
https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/inference:complete
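
Because the request sets "stream": true and accepts text/event-stream, the response arrives incrementally rather than as a single JSON body. The Python sketch below, written under the assumption that the endpoint streams server-sent events, reads those chunks as they arrive; the JWT and account identifier are the same placeholders used in the curl example.

# Minimal sketch: calling the Cortex inference REST endpoint from Python and
# reading the streamed response line by line (server-sent events assumed).
import requests

url = "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/inference:complete"
headers = {
    "Authorization": "Bearer <jwt>",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}
payload = {
    "model": "llama4-maverick",
    "messages": [{"role": "user", "content": "What is the weather like in San Francisco?"}],
    "max_tokens": 4096,
    "top_p": 1,
    "stream": True,
}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Each non-empty "data:" line is assumed to carry one JSON chunk of the response.
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())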

The trusted path to advanced inference capabilities

Snowflake is the only cloud data platform natively integrated with premier models from both OpenAI and Anthropic, as well as others. By integrating Llama 4 into Snowflake Cortex AI, we are providing our customers with access to leading-edge AI models so they can build intelligent applications and data agents, all within Snowflake’s secure, governed and unified environment. This powerful combination will enable enterprises to automate repetitive tasks, gain deeper insights from their data, and deliver more value to their customers.

Stay tuned for more updates on how you can start building the next generation of AI applications with Llama 4 on Snowflake Cortex AI.

Learn more

  • Join us at Summit 2025 to learn more about our latest AI innovations.

  • Get the guide to industry-leading AI and data use cases — download now.

  • Read more about Meta’s latest announcements here.
