Saturday, December 2, 2017

Data Channel Interview - 09 - Mr. Madhivanan on Best Practices in TSQL Programming

Dear All,

Very happy to share the 9th interview of #DataChannel. Mr. Madhivanan Ramachandran, a 10-time Microsoft MVP Award winner and a Data Platform expert from India, discusses "Best Practices for TSQL Programming". Mr. Madhivanan demystifies quite a few concepts with his to-the-point answers in this interview. Happy Learning!!!





Regards,
Nagaraj 

Wednesday, November 29, 2017

Adaptive Query Processing and Automatic Tuning - Adaptive Joins - 2

Continuing from the introduction written over here, let’s look at one of the aspects of Adaptive Query Processing which is Adaptive Joins.

Before we get to Adaptive Joins, recall that SQL Server's query optimizer internally has 3 join types.

Nested Loop Join - usually picked by the optimizer when the tables participating in the join are small, or when one of the tables is big and indexed on the joining column
Hash Join - one of the tables is big but doesn't have an index on the joining column
Merge Join - both tables are big, and the joining column is indexed

So, the size of the tables, or the number of rows participating in the join, plays a major role in SQL Server's query optimizer picking the correct join type. Let's say SQL Server estimates at compile time that only a few rows will participate in the join and picks a nested loop join, but at runtime the join extracts a much larger number of rows, making the plan choice ineffective.

SQL Server 2017's adaptive join addresses this problem. With Adaptive Join, SQL Server introduces a plan attribute called "Adaptive Row Threshold", a row count threshold estimated by SQL Server that allows it to alter the plan choice dynamically at run time. In short, an adaptive join works in the following way:

  • If the number of rows participating in the join is greater than the "Adaptive Row Threshold", the "Hash Join" operator is used for the join operation
  • If the number of rows participating in the join is less than the "Adaptive Row Threshold", the "Nested Loop Join" operator is used for the join operation
Consider the following query:

SELECT  [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]
FROM    [Fact].[Order] AS [fo]
INNER JOIN [Dimension].[Stock Item] AS [si]
       ON [fo].[Stock Item Key] = [si].[Stock Item Key]
WHERE   [fo].[Quantity] = 360;

Observe the picture below:

The query optimizer fixes 68 rows as the adaptive join threshold.
The estimated number of rows for the outer table in the join was 213. As 213 is greater than 68, the estimated join type is Hash Match. At runtime, 206 rows from the outer table participated in the join, and as that was still greater than the adaptive join threshold of 68, the Hash Match join was retained.
Sharing another example, where the adaptive join switches the join type at runtime. Refer to the pic below.

In this case, the threshold was 46 rows and the estimate was 87 rows. However, the runtime row count was just 36 rows, hence the join type switched from the estimated "Hash Join" to a Nested Loop join.

The ability to alter an execution plan operator at runtime lets SQL Server's query optimizer weed out most incorrect plan choices during execution.
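Should a workload ever need the pre-2017 behavior back, adaptive joins can also be switched off at the database scope. A minimal sketch, assuming SQL Server 2017; note that in this release adaptive joins apply to batch mode execution, which in practice means a columnstore index is involved:

-- Disable (or re-enable) batch mode adaptive joins for the current database
ALTER DATABASE SCOPED CONFIGURATION SET DISABLE_BATCH_MODE_ADAPTIVE_JOINS = ON;
GO
ALTER DATABASE SCOPED CONFIGURATION SET DISABLE_BATCH_MODE_ADAPTIVE_JOINS = OFF;
GO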

Saturday, November 25, 2017

Tricks - Copy T-SQL Result to Excel

Sometimes the silliest of tricks saves so much time and effort. A very common scenario is copying a query result to Excel. Most people take one of two approaches:

* Copy the result from the result grid of Management Studio and paste it into Excel, as shown below


The problem with the above approach is that sometimes, when one pastes the result into Excel, the formatting can go haywire, especially if the result contains carriage returns (new lines) or commas, as shown below.


* Export to Excel via the export wizard

A little tedious, and sometimes getting the mapping right can be a pain.

Over the years, the simplest and most effective technique I have used is running the T-SQL query directly from Excel (yes, you can do that :)). Not just that, it gives a cleanly formatted result every time!!! Step-by-step instructions are provided below.

How to run a T-SQL query from Excel:

Open Excel, move to the Data tab, click on the "Other Data Sources" icon and select "SQL Server", as shown below.


Enter the server to connect to and the credentials


Select the database against which you want to run the query and pick a random table (it doesn't matter if you are not going to query it; any table is fine)


Click next and finish (we are not done yet :))



Click on properties



Uncheck the background refresh check box



Click on the "Definition" tab and change the command type to "SQL" from "Table"



Paste the query you want to run into the "Command Text" box, as shown below
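For instance, the command text can be any query. A small sketch below; the table name is purely illustrative, substitute your own:

-- Sample command text; any valid SELECT works here
SELECT ProductID, Name, ListPrice
FROM Production.Product
WHERE ListPrice > 100;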



The result is a nicely formatted Excel sheet, as shown below



From Excel, one can not only get SELECT query results but run stored procedures as well. Excel also serves as an excellent tool to query a database when one needs access from a machine that doesn't have SQL Server Management Studio installed. Hope the tip was useful!!!
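For the stored procedure case, a small sketch of command text is below; the procedure name is illustrative. Suppressing row count messages with SET NOCOUNT ON is a good habit here, as the extra messages can confuse the data connection:

-- Command text calling a stored procedure
SET NOCOUNT ON;
EXEC dbo.uspMyReport;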

Sunday, November 12, 2017

Adaptive Query Processing and Automatic Tuning - Introduction - 1


SQL Server 2017 and Azure SQL Database are being touted as self-learning, artificially intelligent databases. One of the main reasons is the release of the following features:

1) Adaptive Query Processing

2) Automatic Tuning

Adaptive Query Processing is SQL Server's first attempt at using artificial intelligence to fix poorly performing queries caused by incorrect cardinality estimates. The query optimizer produces better plans using Adaptive Query Processing, which is enabled by default when the database is on SQL Server 2017's compatibility level (140); a sketch for enabling it follows the list below. Queries can benefit from Adaptive Query Processing in the following scenarios:

1) Adaptive Join -> The query optimizer makes an intelligent choice between Nested Loop and Hash joins at runtime

2) Memory Grant Feedback -> The query optimizer adjusts the memory granted to a query depending on the previous run's execution results

3) Interleaved Execution -> A portion of the query is executed while generating the query plan to come up with better estimates. Currently applies to multi-statement table-valued functions alone

The above scenarios and operators are likely to expand in upcoming releases or patches.
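As noted above, all of this is gated on the database compatibility level. A minimal sketch for turning it on; the database name below is illustrative:

-- Compatibility level 140 (SQL Server 2017) enables Adaptive Query Processing by default
ALTER DATABASE MyDB SET COMPATIBILITY_LEVEL = 140;
GO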

The other aspect of auto-driven query tuning is the "Automatic Tuning" feature, which helps in the following scenarios:

1) Force Last Good Plan: Attempts to fix query regression by detecting a sudden change in query performance due to a plan change and forcing the last good-performing plan. One can turn it on using the command below:
ALTER DATABASE DBName SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON );
2) Automatic index creation and deletion: Available only in Azure SQL Database. By studying the workload, the database engine automatically creates and deletes indexes!!!
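To see what the engine has detected and recommended, one can query the tuning recommendations DMV that ships with SQL Server 2017 and Azure SQL Database. A minimal sketch; the state column holds JSON describing each recommendation:

-- Inspect Automatic Tuning recommendations
SELECT name, type, reason, state, valid_since, last_refresh
FROM sys.dm_db_tuning_recommendations;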
This is just an introduction post. As usual, I will be writing a series of posts to cover the individual sections in detail.

Friday, October 27, 2017

SQL Azure Databases - Active Secondary - Failover Groups - Part 4

150th blog post! So, this post comes with a video demo :)

Previous posts in this series were about using "Geo Replication" to configure an active secondary. However, it comes with a few limitations:

* No transparent failover capability like the Always On listener

* No availability groups or group failover possible

So, Azure Failover Groups provide "Read Only" / "Read Write" connection URLs, which allow applications transparent failovers. Read-only and read-write workloads get different connection strings, and upon failover, applications connect to the appropriate node automatically.

Also, one can combine a group of databases into a unit of failover and make them fail over together, as done in Always On availability groups.

Step-by-step instructions on configuring failover groups are provided below.

* Configure Primary Server, Secondary Server and Geo Replication as explained in http://strictlysql.blogspot.sg/2017/05/sql-azure-databases-active-secondary.html

* Click on the primary server (not the database) and select "Failover groups"


* Click on "Add group" and

          * Provide the failover group a name
          * Pick the existing database "db1" to be added to the failover group
          * Pick the existing secondary server
          * Click on create


* Once done, the failover group is listed as shown below


* Click on the failover group; the read-only and read-write URLs can be seen as shown below
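For reference, the listener endpoints follow a fixed pattern based on the failover group name. Assuming a failover group named fog1 (the name is illustrative), they would look like:

Read-write listener (always points to the current primary): fog1.database.windows.net
Read-only listener (points to the secondary; add ApplicationIntent=ReadOnly to the connection string): fog1.secondary.database.windows.net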


* To perform a failover, click on the failover icon. Post failover, the failover group looks like the picture below


To make the understanding easier, I have recorded a video of the demo. Please take a look. Happy learning!



Wednesday, September 6, 2017

Dynamic Threshold for Statistic Update - SQL Server 2016

By default, on a SQL Server database, statistics are updated automatically once a threshold of changes is hit. With the Auto Update Statistics setting turned on (which it is by default), the threshold of changes needed to trigger an auto statistics update before SQL Server 2016 was:

For tables with more than 500 rows - 20% of the table's row count + 500 changes
For tables with less than 500 rows - 500 changes

However, in modern times the above formula may not be effective. A 10-million-row table is pretty much the norm and not typically considered huge, yet the threshold of changes for such a table would be 2,000,500 changes, which is a huge number of changes to require before an auto stats update kicks in. This effectively implies that on larger tables one would be running without updated statistics almost all the time, as the threshold grows with the table. To address this problem, SQL Server 2016 has a behavior called the dynamic statistics threshold, which is enabled by default.


As the table gets bigger, the threshold for a statistics update reduces. The dynamic threshold applies to tables with more than 25,000 rows. For example, consider the following table with 1 million rows.

Let's update statistics before starting the task.



Make note of the updated stats time (12:12 AM).

Now let's run the following script to make 100,000 changes (10%):

SET ROWCOUNT 100000  /* cap the update to just 100K rows */
GO

UPDATE Production.product_1M
SET StandardCost = StandardCost + 5
WHERE Productid % 5 = 0
GO

Please note that on older versions of SQL Server, by default, 10% of changes wouldn't suffice to trigger an auto stats update.


Now let's check if the statistics were updated. A SELECT query has to be fired on the table to trigger the auto statistics update.


Surprisingly, they are updated, as the statistics update time has changed to 12:15 AM.

Though there is no official exact formula for how the statistics update is triggered, the following pic shows how the threshold percentage reduces with the row count of the table.


Picture referenced from the following blog: https://blogs.msdn.microsoft.com/saponsqlserver/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371/
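A commonly cited approximation for the dynamic threshold is SQRT(1000 * row count); treat it as an approximation rather than an official formula. It lines up with the demo above, as the sketch below shows:

-- Approximate dynamic threshold for the 1M-row demo table,
-- assuming the commonly cited SQRT(1000 * rows) approximation
SELECT SQRT(1000.0 * 1000000) AS approx_change_threshold;
-- Returns ~31623 changes (~3.2% of the table), so the 100,000 changes (10%)
-- made earlier comfortably cross the threshold and trigger the update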

The dynamic statistics update threshold is not a new feature; it has been available since SQL Server 2008 R2 SP1 via trace flag 2371. In other words, one can get the same behavior on those older versions by turning trace flag 2371 on. On SQL Server 2016 and later, the dynamic threshold has been made the default behavior.
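A quick sketch of turning it on for the older versions; the -1 applies the trace flag globally:

-- Enable the dynamic statistics update threshold on pre-2016 versions
DBCC TRACEON (2371, -1);
-- Alternatively, add -T2371 as a startup parameter to persist it across restarts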

Sunday, September 3, 2017

DBCC CloneDatabase for SQL Server 2012

Dear All,

In case you are wondering, the title of the post is not a typo :) DBCC CloneDatabase is going to be available for SQL Server 2012 :)

For folks who don't know, DBCC CloneDatabase is a feature introduced in SQL Server 2014 SP2 for creating a skeleton copy of a database. By skeleton copy, I mean a copy without the data but with the exact schema, table structures, indexes, constraints, files and filegroups and, best of all, table and index statistics. Using DBCC CloneDatabase, one can get an exact copy of a database (including actual statistics) but without the data. This is extremely useful in performance troubleshooting, as one can clone a production database using DBCC CloneDatabase, back up the cloned database and restore it to DEV / UAT to get estimated plans. From SQL Server 2016 onwards, Query Store data and reports are also included in the cloned database. One can read more about DBCC CloneDatabase over here.
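For reference, invoking it is a one-liner. A minimal sketch, with illustrative database names:

-- Create a data-less, statistics-preserving clone of a database
DBCC CLONEDATABASE (AdventureWorks, AdventureWorks_Clone);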

Coming back to the title of the post, the Microsoft product team will be extending DBCC CloneDatabase to SQL Server 2012 as part of SQL Server 2012 Service Pack 4. SQL Server 2012 Service Pack 4 (SP4) is due to be released in September 2017. An excellent move by the product team to extend cloned databases to SQL Server 2012. Check out the official announcement of SP4 over here. More importantly, observe the comments section :)