jump to navigation

Collation and SQL Server Data management May 2, 2012

Posted by fofo in Sql Server, Sql Server 2005, SQL Server 2008.
Tags:
trackback

In this rather short post I would like to explain the concept of  collation in SQL Server. This is a concept that stills confuses SQL Server DBAs and developers. I know that is a well documented feature (in books & the web) but I will give it a shot.

In this post I will show you how collation affects the storage and behavior of your character data.Data is global nowadays.That means that at some point we will deal with databases that have data from different languages.That it why we must understand collation. Collation affects various factors.It affects the sort order of the data. It also affects the case and accent sensitivity for the various string comparisons.

We will need an instance of SQL Server running in our machine.You can download and install the free SQL Server Express edition,  here.

During the installation of an SQL Server instance you are prompted to select the collation for the instance.

1) Launch SSMS and connect to an instance of SQL Server.Then select the instance of the SQL Server and select Properties.In the General tab you will see the collation of the server.Have a look at the picture below

2)We can specify the collation at a database level. Select the database and then select Properties and then Options from the popup window.

In that window we can specify the collation for the database .If not it will keep the same collation as the server.

Have a look at the picture below

Please note that we can specify the collation at the column level as well.

You cannot specify collation for variable and parameters.

3) In this first example I will show you how search results are affected in a table that uses the same collation (SQL_Latin1_General) but different case sensitivity and accent settings. I will use the TEMPDB database. In an new query window type and execute the following T-SQL code.


USE tempdb
GO

CREATE TABLE products
(
ID int IDENTITY (1,1),
Product_1 nvarchar(50) COLLATE SQL_Latin1_General_Cp1_CS_AS NULL,
Product_2 nvarchar(50) COLLATE SQL_Latin1_General_Cp1_CI_AS NULL,
Product_3 nvarchar(50) COLLATE SQL_Latin1_General_Cp1_CI_AI NULL
)

INSERT INTO products VALUES ('écrire au crayon','écrire au crayon','écrire au crayon'),
('baseball hat','baseball hat','baseball hat')

-- this will not bring back any results since the collation is case sensitive
SELECT * FROM products WHERE Product_1 ='BASEBALL HAT'

-- this will bring back  results since the collation is case insensitive
SELECT * FROM products WHERE Product_2='BASEBALL HAT'

-- this will bring back  results since the collation is case insensitive
SELECT * FROM products WHERE Product_3='BASEBALL HAT'

-- this will  bring back results since we use case sensitive search
SELECT * FROM products WHERE Product_1 ='baseball hat'

-- this will not bring back any results since the collation is accent sensitive
--note we have ecrire au crayon and not écrire au crayon
SELECT * FROM products WHERE Product_1 ='ecrire au crayon'

--this will fail as well since it is accent sensitive
SELECT * FROM products WHERE Product_2 ='ecrire au crayon'

--this will succeed since it is accent insensitive
SELECT * FROM products WHERE Product_3 ='ecrire au crayon'

--this will succeed since it is accent sensitive
SELECT * FROM products WHERE Product_1 ='écrire au crayon'

In this t-sql snippet we create a table that has 3 columns with same collation (SQL_Latin1_General) but different case sensitivity and accent settings.Then we insert some values. Then I perform some SELECT operations and in some cases I am successful and I get back results but in some other cases I am not. Please have a look at the comments in the code.

4) In this example I will show you why we get errors when we join tables with different collations. I will use the TEMPDB database. In an new query window type and execute the following T-SQL code.


USE tempdb
GO

CREATE TABLE footballers(
[id] [int] IDENTITY(1,1) NOT NULL,
[firstname] [varchar](50) NOT NULL,
[lastname] [varchar](50) NOT NULL,
[isActive] [bit] NULL,
teamcode[varchar](50) NOT NULL
)

INSERT [dbo].[footballers] ( [firstname], [lastname], [isActive],teamcode) VALUES ( N'Robbie', N'Fowler', 1,'Liv')
INSERT [dbo].[footballers] ( [firstname], [lastname], [isActive],teamcode) VALUES ( N'Steven', N'Gerrard', 1,'Liv')
INSERT [dbo].[footballers] ( [firstname], [lastname], [isActive],teamcode) VALUES ( N'Kenny', N'Dalglish', 0,'Liv')
INSERT [dbo].[footballers] ( [firstname], [lastname], [isActive],teamcode) VALUES ( N'John', N'Terry', 0,'Che')

CREATE TABLE Team(

code [varchar](50) NOT NULL,
name [varchar](50)  NOT NULL
)

INSERT Team(code,name) VALUES ('Liv','Liverpool')
INSERT Team(code,name) VALUES ('Che','Chelsea')

SELECT * FROM footballers JOIN Team
ON footballers.teamcode = team.code

CREATE TABLE GreekTeam
(
code [nvarchar](50) COLLATE Greek_100_BIN
NOT NULL ,
name [nvarchar](50) COLLATE Greek_100_BIN
NOT NULL
)

INSERT GreekTeam(code,name) VALUES (N'ΠΑΟ',N'Παναθηναικός')
INSERT GreekTeam(code,name) VALUES (N'ΟΣΦΠ',N'Ολυμπιακός')

SELECT * FROM GreekTeam

--this will fail

SELECT * FROM footballers JOIN GreekTeam
ON footballers.teamcode = GreekTeam.code

I create a table Footballers and I insert some data in it. I create a table called Team and I insert some data in it. Then I execute a JOIN statement and I get back results.

Then I create another table GreekTeam that uses the Greek_100_BIN collation.I insert some values in it.Then I try to execute another JOIN statement between Footballers table and the GreekTeam table. This results in an error, “Cannot resolve the collation conflict between “Greek_100_BIN” and “Latin1_General_CI_AI” in the equal to operation.”

This happens because the two tables do not use the same code page and it can’t compare the two values because they are stored differently.

5) Let’s have a look on another example and find out how collation affects the sorting of data.

In a new query window type the following T-SQL statements and execute them


USE TempDB
CREATE TABLE MyTable
(col1 varchar(10) COLLATE Latin1_General_CS_AS NULL,
col2 varchar(10) COLLATE Latin1_General_BIN)
GO

USE TempDB
INSERT MyTable
VALUES ('Alpha','Alpha')
INSERT MyTable
VALUES ('Bravo','Bravo')
INSERT MyTable
VALUES ('Charlie','Charlie')
INSERT MyTable
VALUES ('alpha','alpha')
INSERT MyTable
VALUES ('bravo','bravo')
INSERT MyTable
VALUES ('charlie','charlie')
GO

SELECT col1
FROM MyTable
ORDER BY col1 ASC

SELECT col2
FROM MyTable
ORDER BY col2 ASC

We create a table with 2 different columns and different collations.Then we insert some values. Finally we execute 2 select statements.When you execute them you will see that the data is returned back to us from the relational engine of SQL Server with different sort order.

Hope it helps!!!

Comments»

1. Dot Net Rules : Collation and SQL Server Data management - May 2, 2012

[…] I know that is a well documented feature (in books & the web) but I will give it a shot. (read more) Share Posted: Τετάρτη, 2 Μαΐου 2012 2:52 πμ από το μέλος […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: