Sunday, November 12, 2017

Adaptive Query Processing and Automatic Tuning - Introduction - 1

SQL Server 2017 and Azure SQL Database are being touted as self-learning, artificially intelligent databases, largely due to the release of the following features:

1) Adaptive Query Processing

2) Automatic Tuning
Adaptive Query Processing is SQL Server's first attempt at using artificial intelligence to fix poorly performing queries caused by incorrect cardinality estimates. The query optimizer produces better plans using "Adaptive Query Processing", which is enabled by default when the database is at SQL Server 2017's compatibility level 140 (which brings CE 140). Queries can benefit from Adaptive Query Processing in the following scenarios.

1) Adaptive Join -> The query optimizer makes an intelligent choice between a Nested Loop join and a Hash join at runtime

2) Memory Grant Feedback -> The query optimizer adjusts the memory granted to a query based on the previous execution's results

3) Interleaved Execution -> A portion of the query is executed while generating the query plan, to come up with better estimates. This currently applies to multi-statement table-valued functions alone

The above scenarios and operators are likely to expand in upcoming releases and patches.
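These scenarios light up only when the database runs under compatibility level 140. A minimal sketch of enabling it (the database name below is a placeholder):

-- Compatibility level 140 enables Adaptive Query Processing
-- ([WideWorldImporters] is a placeholder database name)
ALTER DATABASE [WideWorldImporters] SET COMPATIBILITY_LEVEL = 140;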

The other aspect of automatic query tuning is the "Automatic Tuning" feature. Automatic Tuning helps in the following scenarios:

1) Force Last Good Plan: Attempts to fix query regression by detecting a sudden change in query performance caused by a plan change, and forcing the last good performing plan. One can turn it on using the command shown after this list.
2) Automatic index creation and deletion: A feature available only in Azure SQL Database. By studying the workload, the database engine automatically creates and drops indexes!
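A minimal sketch of the Force Last Good Plan command referred to above, using the SQL Server 2017 syntax:

-- Turn on automatic plan correction for the current database
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);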
This is just an introduction post. As usual, I will be writing a series of posts covering the individual sections in detail.

Friday, October 27, 2017

SQL Azure Databases - Active Secondary - Failover Groups - Part 4

150th blog post! So, this post involves a video demo :)

Previous posts in this series were about using "Geo Replication" to configure an active secondary. However, it comes with a few limitations:

* No transparent failover capability like an Always On listener

* No availability-group-style grouping of databases, so group failover is not possible

Azure Failover Groups address this by providing "Read Only" and "Read Write" connection URLs, which allow applications to fail over transparently. Read-only and read-write workloads get different connection strings, and upon failover, applications connect to the appropriate node automatically.
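For illustration, the two listener URLs follow this documented naming pattern (the failover group name is a placeholder):

Read-write listener (always points to the current primary): <failover-group-name>.database.windows.net
Read-only listener (points to the secondary): <failover-group-name>.secondary.database.windows.net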

Also, one can combine a group of databases into a unit of failover and make them fail over together, as done in Always On availability groups.

Step by step instructions on configuring failover groups are provided below.

* Configure the primary server, secondary server and Geo Replication as explained in the earlier posts of this series

* Click on the primary server (not the database) and select Failover Groups

* Click on Add Group and:

          * Provide a name for the failover group
          * Pick the existing database "db1" to be added to the failover group
          * Pick the existing secondary server
          * Click on Create

* Once done, the failover group is listed as shown below

* Click on the failover group; the Read Only and Read Write URLs can be seen as shown below

* To perform a failover, click on the failover icon. Post failover, the failover group looks like the picture below

To make it easier to understand, I have recorded a video of the demo. Please take a look. Happy learning!

Wednesday, September 6, 2017

Dynamic Threshold for Statistic Update - SQL Server 2016

By default, on a SQL Server database, statistics are updated automatically when a threshold of changes is hit. With the Auto Update Statistics setting turned on (which is the default), the threshold of changes that triggered an auto statistics update before SQL Server 2016 was:

For tables with more than 500 rows - 20% of the table's row count + 500 changes
For tables with fewer than 500 rows - 500 changes

However, in modern times, the above formula may not be effective. A 10 million row table is pretty much the norm and not typically considered huge. The threshold of changes for a 10 million row table would be 2,000,500 changes - a huge number of changes required to trigger an auto stats update. This effectively means that larger tables would be running without updated statistics almost all the time, as the threshold grows with the table. To address this problem, SQL Server 2016 has a behavior called dynamic statistics threshold, which is enabled by default.

As the table gets bigger, the threshold for a statistics update reduces. Dynamic statistics thresholds apply to tables with more than 25,000 rows. For example, consider the following table with 1 million rows.

Let's update statistics before starting the task.

Make note of the statistics update time (12:12 AM).

Now let's run the following script to make 100,000 changes (10%):

/* SET ROWCOUNT so that we update just 100K rows */
SET ROWCOUNT 100000;

UPDATE Production.product_1M
SET StandardCost = StandardCost + 5
WHERE ProductID % 5 = 0;

/* Reset ROWCOUNT so later statements are unaffected */
SET ROWCOUNT 0;

Please note that on older versions of SQL Server, by default, 10% of changes wouldn't suffice to trigger an auto stats update.

Now let's check if the statistics were updated. A SELECT query has to be fired on the table to trigger the auto statistics update.

Surprisingly, they are updated, as the statistics update time has changed to 12:15 AM.
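A quick way to read the statistics update time without the GUI is the STATS_DATE function - a sketch using the demo table name from above:

-- Last update time for every statistic on the demo table
SELECT name AS statistic_name,
       STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('Production.product_1M');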

Though there is no official exact formula for how the statistics update is triggered, the following picture shows how the threshold percentage reduces as the row count of the table grows.

Picture referenced from the following blog:
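As an aside, while no exact formula is published in the post above, the widely cited approximation for the dynamic threshold is SQRT(1000 * table row count). For the 1 million row demo table that works out to roughly 31,623 changes, which explains why 100,000 updates comfortably crossed the threshold:

-- Approximate dynamic threshold for a 1 million row table (~31,623 changes)
SELECT SQRT(1000.0 * 1000000) AS approx_change_threshold;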

The dynamic statistics update threshold is not a new capability; it has been available since SQL Server 2008 R2 SP1 via trace flag 2371. Meaning, one can enable the dynamic statistics update threshold on SQL Server 2008 R2 SP1 and later by turning trace flag 2371 on. On SQL Server 2016 and later, it has been made the default behavior.
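A minimal sketch of turning the trace flag on globally for those older versions (-1 applies it server-wide):

-- Enable the dynamic statistics update threshold behavior
DBCC TRACEON (2371, -1);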

Sunday, September 3, 2017

DBCC CloneDatabase for SQL Server 2012

Dear All,

In case you are wondering, the title of the post is not a typo :) DBCC CloneDatabase is going to be available for SQL Server 2012 :)

For folks who don't know, DBCC CloneDatabase is a feature introduced in SQL Server 2014 SP2 for creating a skeleton copy of a database. By skeleton copy, I mean a copy without the data but with the exact schema - tables, indexes, constraints, files and filegroups, and best of all, table and index statistics. Using DBCC CloneDatabase, one can get an exact copy of the database (including actual statistics) but without the data. This is extremely useful in performance troubleshooting, as one can clone a production database, back up the clone and restore it to DEV / UAT to get estimated plans. From SQL Server 2016 onwards, Query Store data and reports are also included in the cloned database. One can read more about DBCC CloneDatabase over here.
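A minimal sketch of the command (the database names below are placeholders):

-- Creates a schema-and-statistics-only copy of the source database
DBCC CLONEDATABASE (AdventureWorks2014, AdventureWorks2014_Clone);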

Coming back to the title of the post, the Microsoft product team is extending DBCC CloneDatabase to SQL Server 2012 as part of SQL Server 2012 Service Pack 4 (SP4), due to be released in September 2017. An excellent move by the product team to extend DBCC CloneDatabase to SQL Server 2012. Check out the official announcement of SP4 over here. More importantly, observe the comments section :)

Monday, August 28, 2017

Data Channel Interview - 08 - Leila Etaati on Machine Learning using Microsoft BI

Dear All,

Very happy to share the 8th interview of #DataChannel. Dr. Leila Etaati, MVP and MCT from New Zealand, discusses Machine Learning using Microsoft Business Intelligence tools in this interview. Leila gives us an insight into the Machine Learning process and explains how Azure ML, Power BI, R and SQL Server can be used in it. Thanks, Leila, for the informative interview!

Friday, June 9, 2017

SQL Azure Databases - Active Secondary - Failover - Part 3

Continuing the series on SQL Azure Databases - Active Secondary, this post explores the task of failing over to the secondary.

It is a fairly straightforward task; one just needs to take care of a few items.

Step 1: Log in to the Azure portal and click on SQL Databases. The primary and secondary databases are listed.

Step 2: Click on the primary and then click on "Geo Replication"

Step 3: Scroll down and click on the secondary as shown below

Step 4: Click on Failover as shown below

Step 5: Click "Ok" to confirm Failover

Step 6: The dotted line between primary and secondary, showing the failover progress, turns into a solid line upon completion of the failover

Step 7: After the failover, the replication roles change, confirming a successful failover, as shown below

Step 8: The secondary server now accepts write operations, as shown below
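For reference, the same failover can also be initiated with T-SQL by connecting to the master database on the secondary server - a sketch using the database name from this series:

-- Run on the secondary server (master database); makes it the new primary
ALTER DATABASE [rajprimary] FAILOVER;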

A few important things to know:

This failover is like a database mirroring failover: applications will need to change their connection strings to the secondary server upon failover. Unlike Always On availability groups, the Azure geo replication explained above doesn't support transparent application failover.

However, Azure Failover Groups (currently in preview) allow transparent failover for applications. More about that in the next post :)

Saturday, May 27, 2017

SQL Azure Databases - Active Secondary - How to connect? - Part 2

As a continuation of the "SQL Azure Databases - Active Secondary" series started earlier, this post covers how to connect to an Azure SQL Database primary and secondary after configuring "Geo Replication".

To some, it may sound like "what is the big deal in connecting, after configuring it?". But if you are new to Azure, it can take a while to figure out how to connect, and at times it can turn out to be one of those annoyingly simple things that refuse to come off easily :)

As described in the earlier post, we have configured the following

* Database called "rajprimary" on "rajprimarysvr" server
* Geo replicated database copy of "rajprimary" on "rajsecondary" server  

To connect to the primary or the secondary server, follow the steps provided below

Configure the firewall:

* Log in to your Azure portal

* Click on SQL Databases on the left 

* Click on the primary / secondary database (as shown under the replication role), depending on the one you would like to connect to

* Click on set firewall as shown below

* Firewall settings are turned on, and only the IP addresses listed are allowed access. Click on "Add client IP" to allow the machine you are using to connect to the Azure database.

* You may add other client IP addresses too, if they need to connect to the database. Click on Save after adding.
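The same firewall rule can also be created with T-SQL against the master database of the server - a sketch with a placeholder rule name and IP address:

-- Run in the master database of the Azure SQL server
EXECUTE sp_set_firewall_rule
    @name = N'ClientRule1',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';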

Connect to the database

* Start SQL Server Management Studio on the machine you are logged on to, and connect to "<servername>.database.windows.net". In this case it would be "rajprimarysvr.database.windows.net".

If you are not sure of your database server name, you may look it up by clicking on "SQL databases" -> "databasename" -> "Properties".

 * Provide the user account and password used while setting up the server

Upon connecting, one will notice that only database-related components are visible. Server-related components like SQL Agent and server error logs are not seen, as we are using the "SQL Database" service on Azure.

The same steps can be repeated for the secondary server too. More on Azure geo replication in upcoming posts.

Thursday, May 11, 2017

Find the number of rows for a value, without querying the table

Dear all,

Being a DBA, developers approach me with questions like "can you run this query for me in production?"

Select *
from FactInternetSales
WHERE UnitPrice = 32.6

Let's say the table contains a few billion rows and UnitPrice is not an indexed column; the query would take ages to return a result. Not just that, it would cause massive I/O on a busy production database.

For the above situation, I would usually ask the developer, "Do you need an accurate value, or is an approximate one OK?". If approximate numbers are sufficient, one can answer these types of questions without actually querying the table. How?

Simple - STATISTICS is the word :)

To find the statistic that will be useful for the query, use the following script, providing the table name and column name. The script returns the name of the statistic we should be checking.

Declare @table_name varchar(100)
Declare @Column_name varchar(100)

SET @table_name = 'FactInternetSales'
SET @Column_name = 'UnitPrice'

SELECT OBJECT_NAME(s.object_id) AS object_name,
    COL_NAME(sc.object_id, sc.column_id) AS column_name,
    s.name AS statistics_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.stats_id = sc.stats_id
    AND s.object_id = sc.object_id
WHERE OBJECT_NAME(s.object_id) LIKE @table_name
    AND COL_NAME(sc.object_id, sc.column_id) LIKE @Column_name

After finding the statistic name, go to the table in Object Explorer, expand "Statistics" and double-click the statistic you are interested in. Then click on the Details section, find the value you are interested in under the "RANGE_HI_KEY" column and read the "EQ_ROWS" column to get the approximate number of rows.
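The same histogram can also be pulled without the GUI via DBCC SHOW_STATISTICS - a sketch with a placeholder statistic name standing in for the one returned by the script above:

-- Histogram only; replace the second argument with the statistic found above
DBCC SHOW_STATISTICS ('FactInternetSales', '_WA_Sys_00000007_276EDEB3') WITH HISTOGRAM;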

Most of you would know that statistics can be used to find query estimates, but few (I guess) use them operationally for these kinds of requests. One can also use them to estimate results for queries like the ones below.

Select ProductKey, count(*)
from [FactInternetSales]
Group by ProductKey

Select count(*)
from [FactInternetSales]
WHERE UnitPrice > 40  -- ( Possible but can get a little tricky at times; more on it in upcoming posts :) )

Few quick pointers:

The above method relies on the following prerequisites

* Either Auto Create Statistics should be turned on, or the column should be the leading column of an index, for the statistic to be present. Auto Create Statistics is on by default

* For the statistic to be reasonably accurate, Auto Update Statistics needs to be on (also "on" by default)

* Automatically created statistics are named like '_WA%'. For a column to have an auto-created statistic, the column should have been queried at least once since database creation. Indexed columns do not need auto-created statistics

* To figure out when the statistic was last updated, use the following query

SELECT sp.stats_id, stat.name, stat.filter_definition, sp.last_updated, sp.rows,
    sp.rows_sampled, sp.steps, sp.unfiltered_rows, sp.modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = OBJECT_ID('people_Data')
    AND stat.name LIKE '_WA_Sys_0000000D_1273C1CD';

If the value you are looking for doesn't appear as a "RANGE_HI_KEY" in the histogram, refer to the next biggest "RANGE_HI_KEY" value and read "AVG_RANGE_ROWS" for the estimate. Please refer to this post for interpreting histograms.

There is always more to statistics. Will cover them in upcoming posts.

Sunday, May 7, 2017

SQL Azure Databases - Active Secondary - How to configure? - Part 1

Dear all,

I have taken the Azure dive at last :)
In case you are new to Azure, it is Microsoft's cloud platform, offering lots of services for handling data platform tasks. There are tons of materials on it, and you can get a good introduction from the following links - here and here.

This post is about SQL Azure Database - a PaaS service which allows one to host just a database in the cloud - and a feature of it called "active secondary". "Geo Replication" - Active Secondary is a feature of SQL Azure's Premium service tier (similar to our Enterprise edition, but certainly not as costly :) ), which synchronizes data asynchronously to another server (either in the same geographical location or a different one) and allows one to run read-only queries against it.

In my opinion, the specifications below are perhaps the most common requirements of modern day mid-size applications:

  •  A database with good performance to store and retrieve data
  • A simple disaster recovery / high availability solution
  • Readable secondary if possible

All of this at a really affordable cost (will discuss that part soon), in just a few clicks, is what makes SQL Azure tick. This post will run through the details of how to configure it.

Prerequisite:

1) Grab a Microsoft Azure subscription :)

Demo:

Now for the step by step "how to" screenshots :)

1) Log in to the Azure portal and click on SQL Databases -> Create SQL Database

2) Provide the server name. Though one pays only for the database, one needs to create a server to host it. The server is maintained by Azure and no charges apply for the server while using the "SQL Azure Database" service.

Provide unique names for:

* Database name
* Resource group (more about it later)
* Server name

 3) Click on "Configure required settings" and provide Server Details to be provided as shown below
4) Set the service level to "Premium". One needs to pick Premium to have a readable secondary.

5) Click on Create Database to complete the primary setup. Creation takes a few minutes.

6) Configure GEO replication - secondary

Once the notification bar indicates that the primary has been created, click on SQL Databases to find the newly created primary database.
Pick "Geo Replication" as shown below.

7) Configure GEO Replication

The screen below allows one to pick the geographical location of the secondary server. If one needs lower latencies, locations closer to the primary are preferred.

8) Once the location is picked, one is prompted to fill in the rest of the details of the secondary server.

9) Create a secondary server similar to the primary, by providing details like the server name and admin login name.

10) Create the secondary server.

Upon clicking OK, the primary and secondary start to synchronize as shown below.

11) Synchronization completion

The completed tick marks indicate synchronization completion, as shown below. Synchronization can take a few minutes or longer depending on the data size. Once complete, click on SQL Databases in the Azure portal to verify, as shown below.
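For completeness, the same geo replication link can also be created with T-SQL, run in the master database of the primary server - a sketch using the server and database names from this post:

-- Run in master on the primary server; creates a readable,
-- asynchronously synchronized secondary on the partner server
ALTER DATABASE [rajprimary]
    ADD SECONDARY ON SERVER [rajsecondary]
    WITH (ALLOW_CONNECTIONS = ALL);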
With that, the setup of an active secondary on a SQL Azure database is complete. Pretty simple, as it takes a few clicks and needs almost zero prerequisites. A similar setup on-premises would take days to months in most places.
That's a lengthy first post on SQL Azure - Active Secondary. In subsequent posts, I will cover how to connect to the primary and secondary, failover, monitoring, how applications can benefit from the readable secondary and, of course, the cost involved.