Core Platform

Asynchronous Execution Accelerates Snowflake's SQL Stored Procedures

Efficiency is key in the world of data analytics. Snowflake is committed to enhancing SQL capabilities to empower today's data professionals. Today, we're thrilled to share the general availability of asynchronously executing embedded statements in SQL Stored Procedures that can help to further streamline data workflows.

What are SQL Stored Procedures?

SQL Stored Procedures, also known as Snowflake Scripting, enables developers to incorporate procedural code elements directly into SQL syntax. This includes features like variable declaration, conditional logic, loops and error handling, which enhance the capabilities and flexibility of SQL coding within the Snowflake environment. They are often used for tasks, such as data insertion, transformation or analytics, involving large volumes of data and schedules.  

So, what’s new?

A SQL Stored Procedure may have any number of embedded statements, such as SELECT, UPDATE, DELETE, CALL, etc. Previously, all these embedded statements were executed sequentially, but this could result in inefficient resource usage and decreased performance. You now have the ability to execute the embedded statements within your SQL Stored Procedures in parallel. This allows for concurrent execution of those statements, often decreasing the overall runtime.

Asynchronous execution of embedded statements can be specified by prefixing the statements in your SQL stored procedures with the ASYNC keyword. In addition, using the AWAIT keyword ensures that these embedded statements finish execution prior to the main procedure terminating. Below is the syntax for ASYNC and AWAIT.

....

-- define asynchornous execution for embedded statements -- 
let <result_set> resultset := [ ASYNC ] ( <query1> ) ;
[ ASYNC ] ( <query2> ) ;
[ ASYNC ] ( <query3> ) ;


-- awiting on on or more embedded statement -- 
AWAIT [ALL | <result_set>]
....

In the pseudo script that follows, Snowflake will execute the two INSERT statements asynchronously, likely speeding up the time taken to run this script.

CREATE OR REPLACE PROCEDURE test_sp_async_child_jobs_insert()
RETURNS VARCHAR
LANGUAGE SQL
AS
  BEGIN
	ASYNC (INSERT INTO my_table VALUES ('Parallel 1'));
	ASYNC (INSERT INTO my_table VALUES ('Parallel 2'));
	
-- Ensures all insertions finish before returning ---- 
	AWAIT ALL

	RETURN 'Done (Async)';
  END;

We support this feature by using a non-blocking, parallel execution model. When Snowflake encounters a SQL statement that starts with the ASYNC prefix, the cloud service layer queues and schedules a child query. The statement that has the ASYNC prefix is assigned a unique query handler so that it can be AWAIT -ed later on, and the parent stored procedure proceeds without waiting for the embedded statement to finish. This allows for efficient resource utilization and concurrent execution.

Let’s look at an example 

Consider a real-world use case, where we need to ascertain the daily average product prices over the preceding 30 days, and this task is executed monthly. The source table contains raw price data, encompassing product details, price and date. Our objective is to compute the aggregated results (specifically, the average price per product per day) and subsequently store them in the target table. To achieve this, we define a procedure, named process_data_30days, which iterates through the past 30 days of raw data and asynchronously calculates the daily average product price and populates a target table with the aggregated results.

First, let’s create some sample data

CREATE OR REPLACE TABLE target_table (
    PRODUCT_ID VARCHAR,
    DATE_COL   DATE,
    AVG_PRICE  NUMBER
);

CREATE OR REPLACE TABLE source_table (
    PRODUCT_ID VARCHAR,
    DATE_COL   DATE,
    PRICE      NUMBER
);

INSERT INTO source_table 
    (PRODUCT_ID, DATE_COL, PRICE)
VALUES
    ('PROD_A', CURRENT_DATE(),     1000),
    ('PROD_A', CURRENT_DATE(),     1100),
    ('PROD_B', CURRENT_DATE(),     1100),
    ('PROD_A', CURRENT_DATE() - 1, 1200),
    ('PROD_C', CURRENT_DATE() - 1, 1300),
    ('PROD_C', CURRENT_DATE() - 1, 1400),
    ('PROD_B', CURRENT_DATE() - 2, 1400),
    ('PROD_A', CURRENT_DATE() - 2, 1500);

This is how the sample data looks:

PRODUCT_ID DATE_COL PRICE
PROD_A 2025-02-27 1000
PROD_A 2025-02-27 1100
PROD_B 2025-02-27 1100
PROD_A 2025-02-26 1200
PROD_C 2025-02-26 1300
PROD_C 2025-02-26 1400
PROD_B 2025-02-25 1400
PROD_A 2025-02-25 1500

Using ASYNC to enable parallel INSERTION

In the following stored procedure, data for each of the last 30 days is asynchronously processed, thus allowing for embedded INSERT statements to execute in parallel and significantly reducing the total processing time.

CREATE OR REPLACE PROCEDURE process_data_30days()
RETURNS STRING
LANGUAGE SQL
AS
DECLARE 
    total INTEGER DEFAULT 0;
    maximum_lookback INTEGER DEFAULT 30;
BEGIN
    FOR i IN 0 TO maximum_lookback DO 
        LET current_lookback INTEGER := i;

    	 -- fire child jobs to ascynchronoulsy insert into target_table -- 
        ASYNC (
            INSERT INTO target_table(date_col, product_id, avg_price) 
            SELECT date_col, product_id, AVG(price) 
FROM source_table 
WHERE date_col = CURRENT_DATE() - :current_lookback 
GROUP BY date_col, product_id
); 
        total := total + 1;
    END FOR;

    -- Ensures all insertions finish before returning ---
    AWAIT ALL;

    RETURN 'Done, INSERTED: ' || total;
END;
CALL process_data_30days();

To demonstrate the performance gains you might observe, we scaled the sample dataset containing raw price data (source_table) to 10 million records. We observed that the procedure process_data_30days that executed embedded statements asynchronously (i.e., the INSERT statements were prefixed by ASYNC keyword), executed nearly five times faster1 than the stored procedure that didn’t (i.e., the INSERT statements weren’t prefixed by ASYNC keyword). While your results may vary depending on factors like table size, warehouse size and the complexity of the child job, the performance gain you will observe is undeniable.

Comparison of runtime of a SQL Stored Procedure with asynchronously running embedded statements vs. synchronously running embedded statements.
Figure 1. Comparison of runtime of a SQL Stored Procedure with asynchronously running embedded statements vs. synchronously running embedded statements.

Conclusion

The addition of asynchronous execution in Snowflake SQL Stored Procedures represents improvements to both simplify and speed up your data workflows and analytics in Snowflake. We encourage you to try this feature and experience the benefits firsthand. Please refer to Snowflake documentation to learn more.


Running both versions using an XS warehouse.

Share Article

Subscribe to our blog newsletter

Get the best, coolest and latest delivered to your inbox each week

Start your 30-DayFree Trial

Try Snowflake free for 30 days and experience the AI Data Cloud that helps eliminate the complexity, cost and constraints inherent with other solutions.