SQL Server Index fragmentation

Now this was a new one for me…

If you have had to deal with big systems, you know when a table peaks to a couple of million rows and counting, things tend to get a tad slow unless you’re planning on doing some heavy optimizations however, there’s always room to get things to some new extremes when you start handling that much data.

One of the first logical things you check on a database is that you have the correct indexes on each table and if you’re searching over one field, then that field is indexed if it can be so search gets a bit faster. An index is quite the good thing to work with because basically it stores an index (Duh!) of how to access records (rows) on the table, where the numeration on those indexes is precisely that field you chose. This article is a very good and quick way to know how indexes work, if you’re a developer and don’t know, please take 5 minutes of your time to read this, don’t worry, I’ll wait for you…

Back? Very good, give yourself a pat on the back and lets move on. Now, as you may know now indexes are stored in trees and as you add or delete (fundamentally delete) rows you create gaps on the tree that would be way too costly to fix as part of those insert or delete operations and those gaps are called fragmentation. SQL Server being such a fan-fricking-tastic tool as it is (no Microsoft does no pay me) has a way for you to query off a database which of its indexes is fragmented and how fragmented it is.

The key is on the sys.dm_db_index_physical_stats function:

sys.dm_db_index_physical_stats (DB_ID(@'MY_DB'), OBJECT_ID(@'MY_TABLE'), 1, NULL, NULL)

However, that provides information for just one index, so with a bit of SQL Magic, we get this far:

SELECT DISTINCT frag.* FROM
	( SELECT 
			TableName = t.name, IndexName = ind.name, IndexId = ind.index_id, ColumnId = ic.index_column_id, ColumnName = col.name,
			(
				SELECT avg_fragmentation_in_percent
				FROM sys.dm_db_index_physical_stats (DB_ID(NULL), OBJECT_ID(t.Name), ind.index_id, NULL, NULL) AS a
				JOIN sys.indexes AS b 
				ON a.object_id = b.object_id AND a.index_id = b.index_id
				WHERE avg_fragmentation_in_percent >= 0.1
			) AS Fragmentation
	FROM 
			sys.indexes ind 
	INNER JOIN sys.index_columns ic ON  ind.object_id = ic.object_id and ind.index_id = ic.index_id 
	INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id 
	INNER JOIN sys.tables t ON ind.object_id = t.object_id 
	WHERE 
			ind.is_primary_key = 0 
			AND ind.is_unique = 0 
			AND ind.is_unique_constraint = 0 
			AND t.is_ms_shipped = 0
	) frag
ORDER BY frag.Fragmentation DESC

Whoa! That’s a lot bigger isn’t it? Well, yes, because this one will go through all the indexes on our database and call that function and return everything on a nice little table where you’ll know if there are indexes that should take your attention or not.

Defragmenting the bad ones.

Ok, so you know now which index(es) are misbehaving on your database, now we need to defragment them, so for that we’re going to use a simple ALTER statement and be done with it. To make a quick sample, if we had an index called MY_INDEX_I on a table called MY_TABLE the alter statement would be

ALTER INDEX MY_INDEX_I ON MY_TABLE REBUILD

Now this is a bit extreme because it will drop the index stored and will rebuild the whole thing which can potentially take quite some time depending on your data and the amount of rows you have. There is another way to defragment indexes but because it doesn’t work with SQL Azure and this particular investigation was done for a server over that version, I didn’t looked into it, however the call is not too far. On top of that, this whole rebuild is an overkill for any table with more than 10% of fragmentation, but will always work so it’s a very good safe bet.

Now we have a way to do it manually, however, what happens on a production database with maybe 90 or 100 indexes that gets heavily fragmented every few weeks? You just can’t do it manually in case you were wondering. For that we’re going to use a very handy stored procedure called sp_executesql that lets us invoke dynamically created SQL. All we need is a cursor to go through the indexes and then generate the sql to defragment it and voila.  It goes a bit something like this:

 

DECLARE @MySQL NVARCHAR(4000)
DECLARE @Now DATETIME
DECLARE @Message VARCHAR(255)
DECLARE @TableName VARCHAR(255)
DECLARE @IndexName   NVARCHAR(255)
DECLARE @IndexId   INTEGER
DECLARE @ColumnId   INTEGER
DECLARE @ColumnName   NVARCHAR(255)
DECLARE @Fragmentation FLOAT

DECLARE myCursor CURSOR FOR SELECT DISTINCT frag.* FROM
	( SELECT 
			TableName = t.name, IndexName = ind.name, IndexId = ind.index_id, ColumnId = ic.index_column_id, ColumnName = col.name,
			(
				SELECT avg_fragmentation_in_percent
				FROM sys.dm_db_index_physical_stats (DB_ID(NULL), OBJECT_ID(t.Name), ind.index_id, NULL, NULL) AS a
				JOIN sys.indexes AS b 
				ON a.object_id = b.object_id AND a.index_id = b.index_id
				WHERE avg_fragmentation_in_percent >= 0.1
			) AS Fragmentation
	FROM 
			sys.indexes ind 
	INNER JOIN sys.index_columns ic ON  ind.object_id = ic.object_id and ind.index_id = ic.index_id 
	INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id 
	INNER JOIN sys.tables t ON ind.object_id = t.object_id 
	WHERE 
			ind.is_primary_key = 0 
			AND ind.is_unique = 0 
			AND ind.is_unique_constraint = 0 
			AND t.is_ms_shipped = 0
	) frag
ORDER BY frag.Fragmentation DESC
OPEN myCursor

FETCH NEXT FROM myCursor INTO @TableName, @IndexName, @IndexId, @ColumnId, @ColumnName, @Fragmentation

WHILE @@FETCH_STATUS = 0
BEGIN
	SET @Now = GETDATE()
	SET @MySQL ='ALTER INDEX ' + @IndexName + ' ON [' + @TableName + '] REBUILD'

	SET @Message = 'Reindex: ' + @IndexName + ' ON ' + @TableName + ': ' + CONVERT(NVARCHAR(28), @Now, 21) + ' (' + @MySQL + ')'
	PRINT @IndexName + ' ' + @TableName + '(' + CONVERT(NVARCHAR(28), @Fragmentation, 21) + ')'

	EXEC sp_executesql @MySQL

	FETCH NEXT FROM myCursor INTO @TableName, @IndexName, @IndexId, @ColumnId, @ColumnName, @Fragmentation
END

CLOSE myCursor
DEALLOCATE myCursor

I know right? But well, this works like a charm and because it’s fully automated, you can just call it from any of your systems as a cron job and no need to worry about fragmented indexes anymore. Did you tried it? Well, give me a shout and let me know how you got on.

Inspecting SQL Server’s tables

A few days ago, I was working on a small project for autogenerating source code from a SQL Server database, and I bumped into the problem of obtaining from a given server the description of the tables in it. If using MySQL this task is quite simple, however, using SQL Server the problem is not that simple.

Long story short, after a few hours I came up with this “small” piece of code:

SELECT DISTINCT
     sys.tables.object_id AS TableId,
     sys.columns.column_id AS ColumnId,
     sys.columns.name AS ColumnName,
     sys.types.name AS TypeName,
     sys.columns.precision AS NumericPrecision,
     sys.columns.scale AS NumericScale,
     sys.columns.is_nullable AS IsNullable,
     ( 
     SELECT
          COUNT(column_name)
     FROM
          INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
     WHERE
          TABLE_NAME = sys.tables.name AND
          CONSTRAINT_NAME =
          ( 
          SELECT
               constraint_name
          FROM
               INFORMATION_SCHEMA.TABLE_CONSTRAINTS
          WHERE
               TABLE_NAME = sys.tables.name AND
               constraint_type = 'PRIMARY KEY' AND
               COLUMN_NAME = sys.columns.name
          )
     ) AS IsPrimaryKey,
     sys.columns.max_length / 2 AS CharMaxLength
FROM
     sys.columns, sys.types, sys.tables
WHERE
     sys.tables.object_id = sys.columns.object_id AND
     sys.types.system_type_id = sys.columns.system_type_id AND
     sys.types.user_type_id = sys.columns.user_type_id AND
     sys.tables.name = '{0}'
ORDER BY
     IsPrimaryKey

Nice uh?

Basically the whole deal after this code is just inspecting into the sys database, which holds the information about the tables in the database. Obviously the ‘{0}’ should be replaced with the table in question. I place the ‘{0}’ because this code was meant to be running using C#, so, all I needed was a call to string.Format.

After retrieving the data with the tables, it would be nice to inspect the relationships between those tables, uh? Well, that is simpler than you might think, just we need to retrieve the relationships list, again from the same database. SQL code for doing that is this one:

SELECT
     sys.foreign_keys.name AS RelationshipName,
     sys.foreign_keys.object_id AS RelationshipId,
     sys.foreign_keys.parent_object_id AS ParentTableId,
     sys.foreign_keys.referenced_object_id AS ReferencedTableId,
     sys.foreign_key_columns.parent_column_id AS ParentColumnId,
     sys.foreign_key_columns.referenced_column_id AS ReferencedColumnId
FROM
     sys.foreign_keys, sys.foreign_key_columns
WHERE
     object_id = constraint_object_id; 

After having all that data, we are in conditions to generate any source code based on any database inspected from a SQL Server 2k5/2k8 database, which is a cool thing.