Tuesday, November 3, 2009

NOT IN VERSUS NOT EXISTS



Logical difference between NOT IN and NOT EXISTS:

Though 'Not in' and 'Not exists' serve the same purpose, there is a logical difference
in their operation other than 'Not Exists' being correlated.

Before I explain the difference let us see an example.


CREATE TABLE #capital (

  capital_id INT,

  city_name  VARCHAR(100))


CREATE TABLE #city (

  city_name VARCHAR(100),

  country   VARCHAR(100))



INSERT INTO #capital

SELECT 1,

       'New Delhi'

UNION

SELECT 2,

       'Singapore'

UNION

SELECT 3,

       'London'


INSERT INTO #city

SELECT 'New Delhi',

       'India'

UNION

SELECT 'Singapore',

       'Singapore'

UNION

SELECT NULL,

       NULL


Two tables are created.. namely #capital,#city. #capital table contains three cities
.. New Delhi,singapore, London. The #city table contains New Delhi,Singapore and a NULL
row.

We would like to write a query to find the city that is present in Capital and not in
city table. Expected result is London.

First let us take 'NOT IN':


SELECT *

FROM   #capital

WHERE  city_name NOT IN (SELECT city_name

                         FROM   #city)


Straight forward query..
To our surprise, the query returns Zero rows.

Now for NOT EXISTS.


SELECT *

FROM   #capital cp

WHERE  NOT EXISTS (SELECT cy.city_name

                   FROM   #city cy

                   WHERE  cy.city_name = cp.city_name)


Hey presto..it works!!! London is returned.

Why this behaviour with NOT in?

The reason lies in the way in which 'NOT IN' handles NULL.
First point to note is 'NOT IN' condition can return three possible values.
They are True,False,NULL. But 'NOT EXISTS' always returns either True/False.

Closer look at how NOT IN works:-

NOT IN condition returns TRUE when condition check is successful
EX: 'London' NOT IN ( 'New Delhi','Singapore').
NOT IN condition returns FALSE when condition check FAILS
EX: 'London' NOT IN ( 'London','Singapore').

NOT IN condition returns NULL when condition check is made against NULL.
EX:
'London' NOT IN
( 'New Delhi','Singapore',NULL)


Returns NULL because 'London' not in ( NULL) will always return NULL and not TRUE as expected. As for the row to be successfully returned the result of the condition checked ( in this case 'not in') should be TRUE. But because of NULL returned by NOT in ( instead of TRUE ) it is discarded. So if the sub query contains NULL then for NOT IN condition no rows will be returned for any value.

Though 'IN' also has a similar behavior the anomaly is realized only in NOT IN because

'London' IN
( 'New Delhi','Singapore',NULL)

Returns NULL and not TRUE, so 'London' is not returned.


'London' IN
( 'New Delhi','Singapore','London',NULL)

Returns True as 'London' in ( 'London' ) is TRUE though 'London' in ( NULL ) is NULL.
Note that 'IN' needs just one TRUE.

But NULL comparison using '=' returns FALSE always ( assuming you haven't changed the default behavior using SET ANSI_DEFAULTS option). ie NULL = 'London' is always False. So, Not Exists always works.

As I have stressed before , another reason to correlate queries.
Performance wise as well NOT EXISTS performs better than NOT IN.

2 comments:

Anita said...

Hey. i was searching for this topic for some time. Thanks for the example.

Nagaraj Venkatesan said...

Welcome :)