New Cardinality Estimator Part 8 – Bug No. 2

In this article I will describe another query pattern where the new CE in SQL Server 2014 brings a significant regression. And, similar to the case described in this post, I would interpret the behavior of the new cardinality estimator here as a bug, too.

Let’s recreate our usual sample table and populate it this time with 10M rows. Use the following code to create and populate the test table (the helper function dbo.GetNums was developed by Itzik Ben-Gan and you can download it here).

IF OBJECT_ID('dbo.Orders', 'U') IS NOT NULL DROP TABLE dbo.Orders;
CREATE TABLE dbo.Orders(
    id INT IDENTITY(1,1) NOT NULL,
    custid INT NOT NULL,
    orderdate DATETIME NOT NULL,
    amount MONEY NOT NULL,
    note CHAR(100) NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (id ASC)
);
GO
-- Fill the table with 10M rows
DECLARE @date_from DATETIME = '19980101';
DECLARE @date_to DATETIME = '20131231';
DECLARE @number_of_rows INT = 10000000;
INSERT INTO dbo.Orders(custid, orderdate, amount, note)
SELECT
    1 + ABS(CHECKSUM(NEWID())) % 10000 AS custid,
    (SELECT @date_from + ABS(CAST(CAST(NEWID() AS BINARY(8)) AS INT)) % CAST((@date_to - @date_from) AS INT)) AS orderdate,
    50 + ABS(CHECKSUM(NEWID())) % 1000 AS amount,
    'note' AS note
FROM dbo.GetNums(@number_of_rows);
GO
CREATE INDEX IX_Cust ON dbo.Orders(custid);
GO
CREATE INDEX IX_Date ON dbo.Orders(orderdate);
GO

Please be aware that the above script will run for a few minutes and will create a table of about 1.5 GB.

[Figure 8_1]
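If you want to confirm the row count and the size mentioned above, one way is the system procedure sp_spaceused:

```sql
-- Reports row count plus reserved, data, and index sizes for the table
EXEC sys.sp_spaceused N'dbo.Orders';
```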

Now we want to get some data from the table. We’ll provide a small list of customer IDs and dates, and we want to get all orders for those customers with an order date after the corresponding date from the input list. Let’s assume that we have the following list:
[Figure 8_2]

We want to get all orders for the customer with ID 897 with an order date after 1 October 2013, and all orders for the customer with ID 2112 with an order date after 1 September 2013. Let’s write a query:

DECLARE @T AS TABLE(
    custid INT NOT NULL,
    orderdate DATETIME NOT NULL
);
INSERT INTO @T(custid, orderdate) VALUES(897, '20131001'), (2112, '20130901');
SELECT o.*
FROM @T t
INNER JOIN dbo.Orders o ON o.custid = t.custid AND o.orderdate > t.orderdate;

 

Here are the execution plan and execution details for the above query generated by SQL Server 2012 database engine:

[Figure 8_3]

[Figure 8_4]

It looks good. Whenever you join a large table with a small table variable you want to see a Nested Loops Join in the execution plan; the logical reads and execution time are OK, too.

Let’s see what happens when we execute the same query against SQL Server 2014 with the database compatibility level 120:

[Figure 8_5]

[Figure 8_6]

You don’t want to see a Hash Join against a large table for a query that returns only a few rows. The execution took more than 2 seconds and 167K logical reads, all just to return 25 rows!
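If you want to reproduce the execution details shown above on your own instance, the logical reads and elapsed time can be captured per session like this:

```sql
SET STATISTICS IO ON;   -- report logical/physical reads per statement
SET STATISTICS TIME ON; -- report CPU and elapsed time per statement

-- ... run the query here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```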

Similar to the problem discussed in this post, it seems that the new CE has a bug when it has to estimate rows for a JOIN with multiple predicates where at least one of them is a non-equality predicate or at least one column is nullable. The following queries will end up with a Hash (or Merge) Join regardless of the table size.

SELECT * FROM A
INNER JOIN B ON A.id = B.id AND A.c1 > B.c1;

SELECT * FROM A
INNER JOIN B ON A.id = B.id AND (A.c1 = B.c1 OR A.c1 IS NULL);

Why do I use the term “bug” rather than just “regression” in this case? Let’s remove the second part of the requirement and ask for all orders for the customers with IDs from the given list. This is a superset of the results returned by our original query.

DECLARE @T AS TABLE(
    custid INT NOT NULL,
    orderdate DATETIME NOT NULL
);
INSERT INTO @T(custid, orderdate) VALUES(897, '20131001'), (2112, '20130901');
SELECT o.*
FROM @T t
INNER JOIN dbo.Orders o ON o.custid = t.custid;

Let’s observe the execution plan under the new SQL Server 2014 database engine:

[Figure 8_7]

It looks good! So, for the less restrictive filter it estimates significantly fewer rows, the estimate is correct, and the plan is the desirable one. If we removed the equality predicate and left only the other one, i.e.

SELECT o.*
FROM @T t
INNER JOIN dbo.Orders o ON o.orderdate > t.orderdate;

SQL Server would come up with an estimate equal to the cardinality of the large table (10M), which is expected and common to both CEs.

The point of difference is how the two CEs combine the two predicates. It could be that the new CE cannot apply its exponential back-off algorithm in this case, and I would accept some guessed value, but this value should lie between 1 and the cardinality of the first predicate (1901). Even if the predicates were fully correlated, the resulting cardinality (pred1 AND pred2) cannot be greater than the lowest individual predicate cardinality. Here the resulting estimate is simply the cardinality of the large table, which is not only statistically but also logically incorrect. Therefore I would interpret this as a bug.
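For illustration, exponential back-off orders the individual selectivities from most to least selective and combines them as s1 · s2^(1/2) · s3^(1/4) · … The sketch below only demonstrates the arithmetic; the selectivity of the date predicate is an assumed value, not one taken from the actual plans:

```sql
-- Illustrative sketch of exponential back-off; @s2 is an assumption
DECLARE @card FLOAT = 10000000;           -- cardinality of dbo.Orders
DECLARE @s1   FLOAT = 1901.0 / 10000000;  -- selectivity of the equality predicate (1901 rows)
DECLARE @s2   FLOAT = 0.3;                -- assumed selectivity of the date predicate
SELECT @card * @s1 * SQRT(@s2) AS combined_estimate; -- roughly 1041 rows, well below 1901
```

Whatever the assumed value of @s2, the combined estimate can never exceed the cardinality of the most selective predicate alone, which is exactly the bound the observed 10M estimate violates.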

Workaround

Is there any workaround? Yes, we can split our query into two steps. In the first step we’ll process only the first predicate, return all orders for all customer IDs from the list, and store them in a temporary structure; in a separate statement we then remove from it the orders that don’t satisfy the second predicate:

DECLARE @T AS TABLE(
    custid INT NOT NULL,
    orderdate DATETIME NOT NULL
);
INSERT INTO @T(custid, orderdate) VALUES(897, '20131001'), (2112, '20130901');
DECLARE @Result AS TABLE(
    id INT NOT NULL PRIMARY KEY,
    custid INT NOT NULL,
    orderdate DATETIME NOT NULL,
    amount MONEY NOT NULL,
    note CHAR(100) NULL,
    orderdate2 DATETIME NOT NULL
);
INSERT INTO @Result
SELECT o.*, t.orderdate AS orderdate2
FROM @T t
INNER JOIN dbo.Orders o ON o.custid = t.custid;
DELETE FROM @Result WHERE orderdate2 >= orderdate;
SELECT id, custid, orderdate, amount, note
FROM @Result;

Let’s observe the execution plan:

[Figure 8_8]

A little bit more code, less intuitive, and not quite as fast and elegant as under the old CE, but the Nested Loops Join is here again. And generally, when you work with very large tables in an OLTP system you don’t want to see the other join operators. Of course, another option is to use the old compatibility level for the database or to use OPTION(QUERYTRACEON 9481) for the statement execution, both with the limits and restrictions already discussed in this post series.
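For completeness, the trace-flag option mentioned above is applied per statement and forces the legacy (pre-2014) CE for just that query:

```sql
SELECT o.*
FROM @T t
INNER JOIN dbo.Orders o ON o.custid = t.custid AND o.orderdate > t.orderdate
OPTION (QUERYTRACEON 9481); -- use the legacy cardinality estimator for this statement only
```

Note that QUERYTRACEON requires elevated permissions, which is one of the restrictions that can make this option impractical for regular application logins.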

Conclusion

The new CE definitely brings some regressions, and in a few cases it comes up with logically incorrect estimations which can significantly affect queries where large tables are joined with table variables. In the case described here the performance degradation is dramatic and unacceptable. Fortunately, there is a workaround which allows us to solve the problem and still use the new compatibility level for our database. And again, it’s very important to test all important queries from your workload. In the end, what matters to you is how the new CE behaves against your workload, rather than against most of the workloads of most Microsoft customers most of the time. Thanks for reading.
