The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 3

In the final article in this series, you will see how new behavior of table variables in SQL Server 2019 CTP2 called Table Variable Deferred Compilation can break existing performance workarounds.

As mentioned in the earlier posts, table variables had fixed cardinality of 1 for more than 18 years. This property caused some bad execution plans, but on the other side, it was also used (or misused) for fixing cardinality estimation issues for complex queries. In this article, I will demonstrate, how table variables can be used to reduce an extreme overestimation in a simple two-joins-query.

The query uses two large sample tables in a parent-child relation: the parent table A has 100M of rows, child table B has about 335M of rows.

TVDPart3_1

On the id columns in both tables there is a clustered index, and on both pid columns exists a nonclustered index.

Consider the following query that returns parent and child rows for a specific pid value in the parent table:

SELECT * FROM A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC;

Note that I have instantiated the child table twice in order to create a more complex query, and to produce more data volume (since I have no a third large table on my laptop).
Anyway, the above query returns 3.222 rows only. Therefore, we can say, this is a very selective query since it deals with 100M+ tables. However, although appropriate indexes exist, and query is high selective, when you click on Display Estimated Execution Plan, SQL Server suggests a very expensive plan based on Hash Match Join and Merge Join operators, with extremely high estimations:

TVDPart3_2When you click on the enlarge figure of the execution plan, you can see that SQL Server expects more than 51 billion of rows with 16 TB data volume! The reason for this overestimation is out of scope of this article, and will be discussed in one of the future articles; here is important that a workaround is required for overcoming a huge overestimation and this terrible execution plan.

Since I know from business point of view, that the query is selective and cannot return more than 10K rows, I can try to force an execution plan based on Nested Loop Joins.

SELECT * FROM A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC;

With this change, the query is executed very fast, with the appropriate execution plan:

SQL Server Execution Times:
CPU time = 31 ms,  elapsed time = 197 ms.

TVDPart3_3.pngHowever, the LOOP hint does not affect estimations and the optimizer decisions related to them; it just replaces join operators chosen by the optimizer by Nested Loop Joins specified in the hint. SQL Server still expects billions of rows, and therefore the query got more than 2 GB memory grant for sorting data, although only 3.222 rows need to be sorted. The hint helped optimizer to produce a good execution plan (which is great; otherwise this query would take very long and probably will not be finished at all), but high memory grant issue is not solved.

As you might guess, now it’s time for table variables. At the beginning of the Nested Loop based plan, you can see, where the overestimation starts:

TVDPart3_4

In the initial Index Seek operator, you can see that there are only 288 rows in the table A, but SQL Server expects more than 32.000!  You can extract this in a single query:


SELECT * FROM A
WHERE A.pid = 413032;

Here is the execution plan for this very simple query:

TVDPart3_5

This is a significant overestimation – more than 100 times more rows are expected, and this for a single predicate!

The plan of our initial query starts with this overestimation in the outer join branch, and thus it is multiplied with the inner side’s estimation and propagated to the left side of the plan. At the end, we have a huge overestimation  How can we reduce it? Well, we can decompose the big query and extract above query in a table variable, and then join with the other tables. By taking this approach, we would start with 1 instead of 32.078. Let’s ensure that the database is running in the CL 140 and perform the above described change:


ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

Let’s check the execution plan:

TVDPart3_6

It looks very good! Nested Loop Joins and only 2MB of memory grant! From 2GB to 2MB with a little help of a table variable! Note, that we did not use LOOP hints anymore; the table variable’s underestimation has corrected the input estimations so that the optimizer decided to use Nested Loop Join operators! The initial estimation in the outer join branch was very wrong, the one in inner branch too, but then we have introduced an underestimation as corrective factor. You can see, that after the first join Actual and Estimated number of rows are very similar (856 vs. 1269). The final estimation is still far away from the actual number of rows, but this maneuver was enough to enforce the optimal execution plan with appropriate memory grant.

As you might guess, new behavior of table variables in SQL Server 2019 will undo our cardinality correction efforts. Let’s change the compatibility level and run the query again:

ALTER DATABASE DB2019 SET COMPATIBILITY_LEVEL = 150;
GO
DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER JOIN B ON A.id = B.pid
INNER JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

I have executed this query with the Include Live Query Statistics option, and after two minutes, the execution looked like this:

TVDPart3_7

SQL Server estimates that 3% of processing is done; with this tempo in ca. 60 minutes, the query execution will be finished. And it will still return 3.222 rows only.

So, my workaround does not work anymore, and I have to change the code again, to introduce the LOOP hints again in order to be able to run this query at all:

DECLARE @t TABLE(id INT PRIMARY KEY, pid INT, c1 CHAR(100));
INSERT INTO @t
SELECT * FROM A
WHERE A.pid = 413032;
SELECT * FROM @t A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON C.pid = B.pid
ORDER BY B.C1 DESC;

Now, the plan is good again, but the estimations are significantly discrepant from the actual numbers (463M rows), and the memory grant is again greater than 2 GB!

TVDPart3_8

But wait, we can maybe use a hint OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')), to enforce the old table variables behavior?! Yes, we can, but the optimizer comes with the same execution plan. If you want the old behavior of table variables, the entire database must be in CL 140. What a bitter disappointment!

Thus, with CL 150, the old workaround is broken. You could reduce the estimation discrepancy by introducing another table variable to store the result of the first join and then join it with the instance of the B table, but the fact is, your solution does not work anymore; you need to find another way to correct the cardinality.

In this example, a table variable does not need to be used for getting an optimal plan. I have used it to demonstrate how new behavior of table variables can break your existing solutions. A better way to enforce an acceptable plan would be to use LOOP hints followed by a hint about maximum memory grants:


SELECT * FROM A
INNER LOOP JOIN B ON A.id = B.pid
INNER LOOP JOIN B C ON B.pid = C.pid
WHERE A.pid = 413032
ORDER BY B.C1 DESC
OPTION (MAX_GRANT_PERCENT = 0.001);

Under the both CLs 140 and 150, you’ll get the same, Nested Loop Joins based execution plan, with a limited, but sufficient 13MB memory grant. This is also a workaround; the real solution is to find the root cause of the cardinality estimation issues and to understand why SQL Server dramatically overestimates. This will be discussed in one of the future articles.

Final Thoughts

Cardinality for table variables has been fixed in SQL Server 2019 CTP2. Accurate number of rows instead of 1 row regardless of number of rows in a table variable should improve future execution plans. However, for existing solutions, you should not expect a lot of improvements. Table variables with estimation of 1 exist since SQL Server 2000, and most of the queries that had a cardinality issue due table variables have been already fixed or rewritten.

Therefore, very few of them will be improved with the Table variable deferred compilation. On the other side, some solutions that used (or misused) this 1-row-estimation could be broken with the new improved estimation. In addition to this, non-fixed estimations make queries with table variables more prone to parameter sniffing issues. That does not mean automatically significant performance regressions, but most probably, you would need to change your code and find new workarounds or solutions for cardinality issues, and monitor execution plans, previously considered as stable.

Another important and conservative thing with Table variable deferred compilation is the fact that the feature is available as a part of the 150 compatibility level, and cannot be turned off, without turning off the entire database to an < 150 compatibility level. I would rather have  this feature with an option that let me soft enable or disable it, maybe with the hint OPTION (USE HINT ('TABLE_VARIABLE_DEFERRED_COMPILATION')).

EDIT:
At the time of writing and publishing this article, I did not know about undocumented options and hints related to Table Variable Deferred Compilation, but in the meantime, thanks to Dmitry Pilugin, I learned how you can play with table variables behavior under CL 150. You can find details about appropriate undocumented options and hints in this article.

Thanks for reading!

Advertisements

Trackbacks

  1. […] disable the feature at the statement level. Let’s execute the query from the previous article The Good, the Bad and the Ugly of Table Variable Deferred Compilation – Part 3, but with the mentioned […]

  2. […] Milos Radivojevic wraps up a series on deferred compilation for table variables by looking at a hack…: […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: