If I know an index will have unique values, how will declaring it as unique affect performance on inserts or selects?
If the optimiser knows the index is unique, how will that affect the query plan?
I understand that specifying uniqueness can serve to preserve integrity, but leaving that discussion aside for the moment, what are the performance consequences?
Integrity in your database is almost always of paramount importance.
Of course the optimizer will take uniqueness into consideration. It affects the expected row count in query plans.
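To illustrate the row-count point, here is a minimal sketch in SQL Server syntax. The table, column and index names are made up for illustration and do not come from the question.

```sql
-- Hypothetical table and indexes, for illustration only.
CREATE TABLE orders (
    id          INT NOT NULL PRIMARY KEY,
    order_code  INT NOT NULL,
    customer_id INT NOT NULL
);
CREATE UNIQUE INDEX ux_orders_order_code ON orders (order_code);
CREATE INDEX ix_orders_customer_id ON orders (customer_id);

-- Equality on the unique index: the optimizer knows that
-- at most one row can match.
SELECT * FROM orders WHERE order_code = 42;

-- Equality on the non-unique index: the estimate comes from
-- statistics and can be any number of rows.
SELECT * FROM orders WHERE customer_id = 42;
```

Comparing the estimated number of rows for the two lookups in the execution plans should show the difference.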
Yes, it will be taken into consideration by the query engine.
Perhaps more important: the uniqueness will protect the data integrity. Performance would not be a reason to ignore this.
Performance could be affected positively or negatively or not at all: it would depend on the query, whether the index is used, etc.
Performance is negatively affected when inserting data. It needs to check the uniqueness.
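That check is the same one that rejects duplicates. A small sketch in SQL Server syntax, with hypothetical names, shows where it happens:

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE t_demo (
    id   INT NOT NULL PRIMARY KEY,
    code INT NOT NULL
);
CREATE UNIQUE INDEX ux_demo_code ON t_demo (code);

-- Succeeds: the value 100 is not in the index yet.
INSERT INTO t_demo (id, code) VALUES (1, 100);

-- Fails: the engine looks up 100 in ux_demo_code before writing
-- and finds it already present (SQL Server error 2601).
INSERT INTO t_demo (id, code) VALUES (2, 100);
```

Every insert pays for that lookup, whether or not a duplicate exists.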
Long story short: if your data are intrinsically UNIQUE, you will benefit from creating a UNIQUE index on them.
See the article in my blog for a detailed explanation.
Now, the gory details.
As @Mehrdad said, uniqueness affects the estimated row count in the plan builder.
A UNIQUE index has the maximal possible selectivity. That's why a join on a unique indexed field will almost surely use NESTED LOOPS, while a join on a non-unique one may benefit from a HASH JOIN if the optimizer thinks that non_unique_indexed_field is not selective.
If your index is CLUSTERED (i.e. the rows themselves are contained in the index leaves) and non-UNIQUE, then a special hidden column called uniquifier is added to each index key, thus making the key larger and the index slower.
That's why a UNIQUE CLUSTERED index is in fact a little more efficient than a non-UNIQUE CLUSTERED one.
In Oracle, a join on a UNIQUE INDEX is required for so-called key preservation, which ensures that each row from a table will be selected at most once and makes a view updatable.
An update through a join on a unique indexed field will work in Oracle, while the same update through a join on a non-unique field will fail.
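As a rough sketch of the Oracle behaviour (the schema below is hypothetical, not the one from the original post), the two cases could look like this:

```sql
-- Hypothetical Oracle schema, for illustration only.
CREATE TABLE parent (
    id   NUMBER PRIMARY KEY,
    code NUMBER NOT NULL,   -- covered by a UNIQUE index
    grp  NUMBER NOT NULL,   -- covered by a non-unique index
    val  NUMBER
);
CREATE UNIQUE INDEX ux_parent_code ON parent (code);
CREATE INDEX ix_parent_grp ON parent (grp);

CREATE TABLE child (
    id     NUMBER PRIMARY KEY,
    ref_no NUMBER NOT NULL,
    val    NUMBER
);

-- Join on the UNIQUE index: each child row matches at most one
-- parent row, so child is key-preserved and can be updated.
UPDATE (
    SELECT c.val AS child_val, p.val AS parent_val
    FROM   child c, parent p
    WHERE  p.code = c.ref_no
)
SET child_val = parent_val;

-- Join on the non-unique index: a child row could match several
-- parent rows, so Oracle rejects the update with ORA-01779.
UPDATE (
    SELECT c.val AS child_val, p.val AS parent_val
    FROM   child c, parent p
    WHERE  p.grp = c.ref_no
)
SET child_val = parent_val;
```

Key preservation is decided from the schema, not from the data, so the second statement is expected to fail even if grp happens to contain no duplicates.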
This is not an issue with SQL Server, though.
One more thing: for a table with a UNIQUE index on uval and a non-unique column ival, a TOP query ordered by ival first will use a TOP N SORT, while one ordered by uval first will use just an index scan.
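The original table and queries did not survive here, so the following is only a guess at their shape in SQL Server syntax. The table name, the index names and the TOP 10 limit are assumptions.

```sql
-- Hypothetical reconstruction; all names are assumptions.
CREATE TABLE t_sample (
    id   INT NOT NULL PRIMARY KEY,
    uval INT NOT NULL,
    ival INT NOT NULL
);
CREATE UNIQUE INDEX ux_sample_uval ON t_sample (uval);
CREATE INDEX ix_sample_ival ON t_sample (ival);

-- Non-unique column first: rows that tie on ival still have to be
-- ordered by uval, so the plan needs a sort step.
SELECT TOP 10 *
FROM   t_sample
ORDER BY ival, uval;

-- Unique column first: no two rows share a uval, so the second
-- sort key can never change the order.
SELECT TOP 10 *
FROM   t_sample
ORDER BY uval, ival;
```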
For the latter query, there is no point in additional sorting on ival, since the uval values are unique anyway, and the optimizer takes this into account.
On sample data of 200,000 rows (id == uval == ival), the former query runs for 15 seconds, while the latter one is instant.
I've just tested this on my machine for a Production table containing more than 1 million rows because I figured it was a good test. The results were interesting; here are the raw numbers:
— No Index:
— Unique Constraint:
The Setup consisted of getting the Maximum of the field that I was adding the Unique Constraint to – so logically the performance was dramatically increased by adding the constraint. This would also improve performance when searching by this foreign key.
Interestingly the Insert Time improved slightly as well (by 2.7228%), so only positive impacts [in my test case] of adding the Constraint (+ inherent index).
Testing shows only positive impacts from adding the constraint, with no negative performance impact.
NOTE: For our test system I expect the values to almost always be unique, so I didn't test inserting non-unique values. In this data that truly is an exception, and not something we need to be performant.
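For anyone wanting to repeat this kind of comparison, a minimal sketch follows. It assumes SQL Server, and the table, column and constraint names are hypothetical; the test above does not say which names or timing method were actually used.

```sql
-- Hypothetical names; SQL Server syntax is an assumption.
SET STATISTICS TIME ON;

-- Baseline: without an index on ref_no, MAX has to read every row.
SELECT MAX(ref_no) FROM dbo.big_table;

-- Adding the constraint also creates a unique index on ref_no.
ALTER TABLE dbo.big_table
    ADD CONSTRAINT uq_big_table_ref_no UNIQUE (ref_no);

-- With the index in place, MAX can usually be answered from one
-- end of the index instead of a full scan.
SELECT MAX(ref_no) FROM dbo.big_table;

SET STATISTICS TIME OFF;
```

Insert timings would need a separate run with a representative batch of rows before and after the constraint is added.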