Monday, December 21, 2009

Table Variable -> Recompilations,Statistics and Parallel Plans



My last post on Table variable vs Temp table ended with comparisons of their performance at different scenarios. This one not essentially a part -3, but will deal with few characteristics of table variable.

By definition, Table variables are never recompiled. Just to provide little intro about Recompilations,Recompilation reloads a query plan into the cache. That is , the current plan prepared for the statement is marked as invalid and a new plan is loaded into the cache and this process is briefly put as recompilation. Recompilations happen when statistics are updated to an object,when the definition of an object changes and a few other scenarios. As stats are never maintained and the structure of a declared table variable can never change, table variables never cause a recompilation. Though Recompilations come at a cost, they are useful in maintaining the correct stats for our objects and hence its a trade off between Recompilation cost vs Updated stats.

Consider the following query:


DECLARE  @dt DATETIME

SELECT @dt = Getdate()

DECLARE  @tb  TABLE(

                    row_number  INT    PRIMARY KEY,

                    supplier_id INT,

                    product_id  INT,

                    trans_city  VARCHAR(100),

                    trans_date  DATETIME,

                    qty         INT ,

                       UNIQUE (supplier_id,row_number ))


INSERT INTO @tb

SELECT Row_number()

         OVER(ORDER BY supplier_xid),

       supplier_xid,

       product_xid,

       trans_city,

       trans_date,

       qty

FROM   supplier_product_customer


SELECT *

FROM   @tb

WHERE  row_number = 100


SELECT *

FROM   @tb

WHERE  supplier_id = 1


SELECT Datediff(ms,@dt,Getdate())


The table contains 110k rows.

As you may notice, I am having a table variable, a Primary Key on the column row_number which implies, it would have a clustered index on it and a Unique key on Supplier_id,row_number which will create a non clustered index on it.There is a myth going around saying that you cant have Non clustered indexes on table variable. It is true that you cant directly create NC index but you can do so indirectly by making unique keys. Though, the NC index on table variable is not all that useful, but just wanted to remind that it is possible to create one.ok.Now for the query plan..



As you may notice the highlighted section, which shows a clear differnce between Estimated rows and actual rows.When you use table variable the estimated count is always 1 as it has no stats for judge the number of rows.

In comparison, consider the same code against a temporary table


SET statistics  io  ON


DECLARE  @dt DATETIME


SELECT @dt = Getdate()


CREATE TABLE #tb (

  row_number  INT    PRIMARY KEY,

  supplier_id INT,

  product_id  INT,

  trans_city  VARCHAR(100),

  trans_date  DATETIME,

  qty         INT)


CREATE INDEX [CIX_supplier_product_customer] ON #tb (

      supplier_id)

WITH (statistics_norecompute = OFF,

      sort_in_tempdb = OFF,

      ignore_dup_key = OFF,

      drop_existing = OFF,

      online = OFF,

      allow_row_locks = ON,

      allow_page_locks = OFF)

ON [PRIMARY]


INSERT INTO #tb

SELECT Row_number()

         OVER(ORDER BY supplier_xid),

       supplier_xid,

       product_xid,

       trans_city,

       trans_date,

       qty

FROM   supplier_product_customer


SELECT *

FROM   #tb

WHERE  row_number = 100


SELECT *

FROM   #tb

WHERE  supplier_id = 1


SELECT Datediff(ms,@dt,Getdate())


one small difference. I am creating NC index directly on supplier_id instead of a unique key,as we can directly create a NC index in temp tables. Query plan for the same provided below.




Few things to infer from the query plan. The Actual rows and estimated rows are perfect, as stats are present. Second inference is that Query uses the non clustered index. Table variable doesnt use the NC index created and uses a cluster scan instead. NC index usage would have been effective as the Number of IOs for executing the query 'Select * from #tb where supplier_id = 1' was 38 and the same query against the table variable was 631.

The query 'Select * from #tb where supplier_id = 1' was repeated at 100/1000/10k/100k rows and at all cases NC index on table variable was never used.I observed that NC index created on table var got used only when the NC index covered the query.ie. When the query was changed to 'Select supplier_id,row_number from #tb where supplier_id = 1'. So, the NCIndex on a table variable is not a great idea as the storage engine finds it effective to do a scan most of the times.
On the other hand, the Temp table uses the NC index and provides results at lesser number of IOs.But still, time taken still marginally favours table variables and so whether indexes are getting used or not, table variables win when the queries are fairly simple.

Now for another aspect of table variables:

BOL states that table variable doesn't use parallel plans for updates and deletes.
Table variables don't use parallel plans as effectively as temp tables do.If your 'select' query would gain from Parallel plans then you are better off going for temp tables. Just a example query:


SELECT   t1.supplier_id,
         t1.product_id,
         Count(1)
FROM     #tb t1,
         #tb t2
WHERE    t1.product_id = t2.product_id
         AND t1.trans_city = t2.trans_city
         AND t1.supplier_id != t2.supplier_id
GROUP BY t1.supplier_id,
         t1.product_id
HAVING   Count(1) > 100
ORDER BY t1.supplier_id,
         t1.product_id

Query plan for the same is provided below.



The query finished in about 5 seconds with a parallel plan . The table had 50,000 rows.

The query plan for the same query when table variable is used is provided below.




Query plan doesnt use a Parallel plan ( Degree of parallelism = 1 ). The query took 16 minutes to complete.5 seconds against 16 minutes makes a lot of difference. So, a Parallel plan for complex queries can provide a huge improvement on performance. So, if the query is likely to use parallel plans and its likely to gain out of it ( there are few exceptions where parallelism can slowdown as well which we will discuss on another day ).

Though the table variable doesnt use stats/parallel plans, for simpler queries with relatively small row counts table variables appear to be a fair choice. For Decision making/Datawarehouse type of applications which involve complex calulations with huge rows the balance shifts sharply towards temp tables.

No comments: