SQL Server Hardware Performance Checklist
Auditing SQL Server Hardware Is An Important Early Step
In the previous article, on using Performance Monitor, you may have identified some potential hardware bottlenecks that are negatively affecting your SQL Server’s performance. In this section, we will take a look at each of the major hardware components of a SQL Server and examine what can be done to maximize their performance.
This portion of the audit will be divided into these major sections:
As part of this audit, you will want to complete the above checklist. As you do, you may find out things about your server that you were not aware of.
Number of CPUs
This first point is obvious: the more CPUs your SQL Server has, the faster it can perform. The standard edition of SQL Server 2000 can support up to 4 CPUs. The Enterprise Edition can support up to 32 CPUs, depending on the OS used. Multiple CPUs can be effectively used by SQL Server to boost overall performance.
It is very difficult to estimate the number of CPUs any specific SQL Server-based application will need. This is because each application works differently and is used differently. Experienced DBAs often have a feel for what kind of CPU power an application might need, although until you really test your server’s configuration under realistic conditions, it is hard to really know what is needed.
Because of the difficulty of selecting the appropriate number of CPUs to purchase for a SQL Server, you might want to consider the following rules of thumb:
Here are some potential scenarios:
I could provide many other examples, but the gist of what I am trying to get across is that it is very hard to predict exactly how many CPUs a particular SQL Server-based application will need, and that you should generally purchase a system bigger than you think you will need, because in many cases, usage demands on an application are often underestimated. It is less expensive in the long run to purchase a larger server now (with more CPUs), than to have to replace your entire server in 6-12 months because of poor estimates.
CPU Speed
Like the number of CPUs, the speed of the CPUs you purchase is hard to estimate. Generally speaking, as with the number of CPUs, purchase the fastest CPUs you can afford. It is better to purchase too large a system than too small a system.
CPU L2 Cache
One of the most common questions I get is, “should you purchase a less expensive CPU with a smaller L2 cache, or a more expensive XEON CPU with a larger L2 cache?” What complicates this decision is that chips with smaller L2 caches are often available at higher clock speeds than chips with larger L2 caches. Here’s my rule of thumb:
CPU Audit Checklist
Since this article is about an audit of your current SQL Server’s CPU capability, your focus now should be on whether or not your current servers are experiencing any CPU bottlenecks. As was discussed in the Performance Monitor section of this article, you can use the Performance Monitor to help you identify hardware bottlenecks.
If you are not currently experiencing CPU bottlenecks, then you can skip to the next section on memory. But if your current server is experiencing a CPU bottleneck, and it is bad enough to cause major performance problems, then these are your options for resolving it:
Unfortunately, none of these options to deal with CPU bottlenecks are extremely easy to implement, unless of course your company has unlimited money to spend. As a DBA in charge of a SQL Server with a CPU bottleneck, you have many difficult decisions to make, and lots of work ahead of you, especially if your only option, due to a lack of money, is to “reduce the load on your server.”
Memory
While server memory is discussed here after the CPU, don’t think that it is any less important. In fact, memory is probably the most important hardware ingredient of any SQL Server, affecting SQL Server’s performance more than any other hardware component.
When we are talking about memory, we are referring to physical RAM. In the Windows Server world, the word memory often refers to the combination of physical RAM and virtual memory (the swap file). That definition is not useful for SQL Server, because SQL Server is not really designed to use virtual memory, although it can if it has to.
Instead of using the operating system’s combination of physical RAM and virtual memory, SQL Server prefers to stay in physical RAM as much as it can. The reason for this is speed. Data in RAM is much faster to retrieve than data on disk.
When SQL Server can’t keep all of the data it manages in RAM (the SQL Server buffer cache), it accesses disk, similar to the way that the operating system manages virtual memory. But SQL Server’s “caching” mechanism is more sophisticated and faster than what the operating system virtual memory can provide.
The fastest way to find out if your SQL Server has an adequate amount of RAM is to check the SQL Server: Buffer Cache Hit Ratio counter that was discussed in the previous page. If this counter is 99% or higher, then most likely you have enough physical RAM in your SQL Server. If this counter is between 90% and 99%, and if you are happy with your SQL Server’s performance, then you probably have enough physical RAM in your SQL Server. But if you are not satisfied with your server’s performance, then more RAM should be added.
If this counter is less than 90%, the odds are that your SQL Server’s performance is unacceptable (if you are running OLAP, then less than 90% is generally OK), and you will want to add more RAM to your server.
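Besides watching Performance Monitor, you can read the same counter from inside SQL Server itself. The following is a sketch for SQL Server 2000, which exposes Performance Monitor counters through the master.dbo.sysperfinfo system table; counter and object names can be padded with trailing spaces, so they are trimmed before comparing, and the object name is matched loosely so it also works on named instances:

```sql
-- Compute the buffer cache hit ratio from its two underlying counters.
SELECT CAST(a.cntr_value AS float) * 100.0
           / NULLIF(b.cntr_value, 0) AS [Buffer Cache Hit Ratio (%)]
FROM master.dbo.sysperfinfo a
JOIN master.dbo.sysperfinfo b
  ON RTRIM(a.object_name) = RTRIM(b.object_name)
WHERE RTRIM(a.counter_name) = 'Buffer cache hit ratio'
  AND RTRIM(b.counter_name) = 'Buffer cache hit ratio base'
  AND a.object_name LIKE '%Buffer Manager%'
```

A result consistently below 90% suggests more RAM is needed, per the thresholds above.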
Ideally, the amount of physical RAM in a SQL Server should exceed the size of the largest database on the server. This is not always possible, as many databases are very large. If you are sizing a new SQL Server, and assuming your budget is large enough, try to order it with enough RAM to hold the entire projected database. Assuming that your database is 4GB or less, this generally isn’t too much of a problem. But if your database is larger than 4GB (or is expected to grow larger), then you may be unable to easily or affordably get more than 4GB of RAM. While SQL Server 2000 Enterprise Edition will support up to 64GB of RAM, there aren’t many affordable servers that support this much RAM.
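To compare a database’s size against your server’s RAM, you can ask SQL Server for the size directly. A minimal sketch (Northwind is just a stand-in for your own database name):

```sql
-- sp_spaceused, run with no arguments, reports the size of the current
-- database (data plus log) in its database_size column.
USE Northwind
EXEC sp_spaceused
```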
Even if your entire database cannot fit into the SQL Server buffer cache, SQL Server can still be very fast when it comes time to retrieve data. A 99% buffer cache hit ratio means that 99% of the time, the data SQL Server needs is already in cache, and performance will be very fast. For example, I manage one database that is 30GB, but the server only has 4GB of RAM. The buffer cache hit ratio for this server is always over 99.6%. In most cases, users don’t access all the data in a database at the same time, only a fraction of it, and SQL Server is able to keep the most-used data in cache all the time. In this particular instance, 99% of all requests are met quickly, even though the server has much less physical RAM than the size of the data in the database.
So what does all of this boil down to? If your buffer cache hit ratio is less than 90%, then seriously consider adding more RAM.
Disk Storage
After memory, disk storage is often the most important factor affecting SQL Server’s performance. It is also a complicated topic. In this section, I will focus on the “easiest” areas where disk storage performance can be bolstered.
Total Amount of Available Drive Space on Server
While the performance effect isn’t huge, it is important that all of your disk arrays have at least 20% free space. This is because NTFS (the disk format I assume you are using) needs extra space to work efficiently. If the space is not available, NTFS cannot function at full capacity and performance can degrade. A nearly full disk also leads to more fragmentation, which makes the server work harder to read and write data.
Take a look at each of the physical disks in your SQL Server, checking to see if there is at least 20% or more of free space. If there isn’t, then consider trying:
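A quick way to check free space from inside SQL Server is the xp_fixeddrives extended stored procedure. It is undocumented (though widely used with SQL Server 2000), so treat this as a convenience rather than a supported API:

```sql
-- Returns one row per local fixed drive, with the free space in MB.
-- Compare each figure against the drive's total capacity to find the
-- percentage free.
EXEC master.dbo.xp_fixeddrives
```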
Total Number of Physical Drives in Each Array
A disk array generally refers to two or more physical disk drives working together as a single unit. For example, a RAID 5 array might have 4 physical drives in it. So why is it important to know how many physical drives are in the one or more arrays in your SQL Server?
With the exception of mirrored arrays (which are two physical drives working together), the more physical drives that are in an array, the faster reads and writes are for that array.
For example, let’s say that I want to purchase a new SQL Server with a RAID 5 array and that I need at least 100GB of available space. Let’s also assume that the vendor has proposed two different array configurations:
Both of these options meet our criteria of providing at least 100GB of RAID 5 disk space (a RAID 5 array of n drives provides the usable capacity of n-1 drives, so seven 18GB drives yield about 108GB). But which array will provide better read and write performance? The answer is the second choice, the seven 18GB drives. Why?
Generally speaking, the more disks that are in an array, the more disk heads there are available to read and write data. SCSI drives, for example, have the ability to read and write data simultaneously. So the more physical drives that there are in an array, the faster data is read or written to the array. Each drive in the array shares part of the workload, and the more, the better. There are some limits to this, depending on the disk controller, but generally, more is better.
So what does this mean to you? After you take a look at the number of arrays you have in your SQL Server, and the number of drives in each array, is it feasible to reconfigure your current arrays to take better advantage of the principle of more is better?
For example, let’s say that your current server has two disk arrays used to store user databases. Each is a RAID 5 array with 3 18GB drives each. In this case, it might be beneficial to reconfigure these two arrays into a single array of 6 18GB drives. Not only would this provide faster I/O, but it would also recover 18GB of hard disk space.
Take a careful look at your current configuration. You may, or may not, be able to do much. But if you can, you will see the benefits of your changes as soon as you make them.
RAID Level of Array Used for SQL Server Databases
As you probably already know, there are various types of disk array configurations, called RAID levels. Each has its pros and cons. Here is a brief summary of the most commonly used RAID levels, and how they can best be used in your SQL Server:
Most likely, your current SQL Server configuration does not match the recommendations above. In some cases, you may be able to modify your current array configuration to come closer to what is recommended above, but in most cases, you will probably have to live with what you have until you get a new budget for a new server and array.
If you can only do one of the above recommendations, I would recommend that you move to RAID 10 over the other options. This option, above all others listed above, will give you the greatest overall boost in SQL Server I/O performance.
Hardware vs. Software RAID
RAID can be implemented through hardware or software (via the operating system). There is no debate on this topic: don’t ever use software RAID, as it is very slow. Always use hardware RAID.
Disk Fragmentation Level
If you create a new database on a brand new disk array, the database file and transaction log file created will be one contiguous file. But if your database or transaction log grows in size (and what database and transaction log doesn’t), it is possible for the files to become fragmented over time. File fragmentation, which scatters pieces of your files all over a disk array, causes your disk array to work harder to read or write data, hurting disk I/O performance.
As part of your performance audit, you need to find out how fragmented your SQL Server database and transaction log files are. If you have Windows 2000 or 2003, you can use the built-in defragmentation utility to run a fragmentation analysis and see how badly the files are fragmented. If you are running Windows NT Server 4.0, then you will have to use a third-party utility, such as Diskeeper from Executive Software, to perform the analysis.
If the analysis recommends that you defragment, you should. Unfortunately, defragmenting a SQL Server’s database and transaction log files is not always an easy task. Open files, such as the database and transaction log files of a running SQL Server, cannot always be defragmented. For example, the built-in defragmentation utility cannot defrag SQL Server MDF and LDF files; Diskeeper 8.0 can in many cases, but not all. This means that, in some cases, you may have to take SQL Server offline in order to defrag MDF and LDF files. And depending on how fragmented the files are, and their size, this could take many hours.
But do you really have much choice about defragmenting your SQL Server files? If your I/O performance is currently adequate, then you shouldn’t bother defragmenting. But if your I/O performance is a bottleneck, then defragmenting is one inexpensive way of boosting performance, albeit a time consuming one in many cases.
Ideally, you should periodically defragment your SQL Server database and transaction log files. This way, you can ensure that you don’t experience any I/O performance issues because of this very common problem.
Location of the Operating System
For best performance, operating system files should be on a disk array that does not include the SQL Server data files (MDFs or LDFs). In addition, they should be located on a disk array that supports RAID 1, 5, or 10.
Generally, I install, as most people do, the operating system on drive C: of the server. I usually configure drive C: as a RAID 1 mirrored drive for both fault tolerance and best overall performance.
In most cases, as long as you don’t locate the operating system on the same array as SQL Server data files, you have great flexibility in placing operating system files on your server.
Location of SQL Server Executables
The location of the SQL Server executables (binaries), like the location of the operating system files, is not critical, as long as they are not located on the same array as the SQL Server data files. As with operating system files, I generally place SQL Server executables on drive C:, which is generally configured as a RAID 1 mirrored drive.
If you are building a SQL Server 7.0 cluster, then the SQL Server executables cannot be located on drive C:, but instead must be located on a shared array. Unfortunately, this is often the same array on which you store the SQL Server data files, unless you have a lot of money to spend on a separate array just for the executables. While performance is somewhat hindered by locating the executables on the same shared array as the data files, it is not too bad a compromise, given the fault tolerance you get in return. On the other hand, this is a good reason to upgrade to SQL Server 2000 clustering: in a SQL Server 2000 cluster, the SQL Server executables are located on local drives, not the shared array, so performance is not an issue.
Location of Swap File
Assuming that your SQL Server is a dedicated SQL Server, and that SQL Server memory usage has been set to dynamic (the default), the swap file won’t see a lot of activity, because SQL Server doesn’t normally use it much. Because of this, it is not critical that the swap file be located in any particular place, except that you don’t want it on the same array as the SQL Server data files.
Generally, I place the swap file on the same array as the operating system and SQL Server executables, which I have indicated earlier, is a disk array that supports RAID 1, RAID 5, or RAID 10. This is usually drive C:. This makes administration much easier.
If your SQL Server is a shared server, running applications other than SQL Server, and paging is an issue (due to the other applications), you might want to consider moving the swap file to its own dedicated array for better performance. But better yet, make SQL Server a dedicated server.
Location of the tempdb Database
If your tempdb database is heavily used, consider moving it to an array of its own, either RAID 1 or RAID 10, to boost disk I/O performance. Avoid RAID 5 arrays, as they can be slow at writes, and tempdb activity tends to be write-heavy. If you can’t locate tempdb on its own array, and you want to avoid placing it on the same array as your database files, consider locating it on the same drive as the operating system. This will help reduce overall I/O contention and boost performance.
If your application uses the tempdb database a lot, and causes it to grow larger than its default size, you may want to permanently increase the default size of the tempdb file to a size closer to what is actually used by your application on a day-to-day basis. This is because every time the SQL Server service (mssqlserver) is restarted, the tempdb file is recreated to the default size. While the tempdb file can grow, it does take some resources to perform this task. By having the tempdb file at the correct size when SQL Server is restarted, you don’t have to worry about the overhead of it growing during production.
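The resize itself is a pair of ALTER DATABASE statements. The logical file names tempdev and templog are the defaults for tempdb; the sizes below are only illustrative, and should be replaced with whatever your day-to-day usage actually requires:

```sql
-- Grow tempdb's data and log files so they are already at working size
-- when the MSSQLServer service restarts and recreates tempdb.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 500MB)

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 100MB)
```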
In addition, heavy activity in the tempdb database can drag down your application’s performance. This is especially true if you create one or more large temp tables and then query or join against them. To help speed these queries, be sure the AUTOSTATS database option is turned on for tempdb, and then create one or more indexes on the temp tables that your queries can use. In many cases, you will find that this can substantially speed up your application. But like many performance tips, be sure to test this one to see if it actually helps in your particular situation.
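As a sketch of this advice, with hypothetical table and column names:

```sql
-- Make sure tempdb can auto-create statistics (the AUTOSTATS option).
EXEC sp_dboption 'tempdb', 'auto create statistics', 'true'

-- A large temp table that will be joined against repeatedly.
CREATE TABLE #OrderTotals (CustomerID int NOT NULL, Total money NOT NULL)

-- ... populate #OrderTotals with an INSERT ... SELECT here ...

-- Index the join column so subsequent queries can seek rather than scan.
CREATE CLUSTERED INDEX IX_OrderTotals_CustomerID
    ON #OrderTotals (CustomerID)
```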
Location of System Databases
The system databases (master, msdb, model) don’t experience a lot of read and write activity, so locating them on the same array as your SQL Server data files is generally not a performance issue. The only exception might be for very large databases with hundreds or thousands of users. In this case, putting them on their own array can help boost overall I/O performance somewhat.
Location of User Databases
For best performance, user database files (MDFs) should be located on their own array (RAID 1, 5, or 10), separate from all other data files, including log files. If you have multiple large databases on the same SQL Server, consider locating each database’s file(s) on its own array for less I/O contention.
Location of Log Files
Ideally, each log file should reside on its own separate array (RAID 1 or 10; RAID 5 will slow down transaction log writes more than you would like). The reason is that, most of the time, transaction logs experience sequential writes, and if the array can write the data sequentially (not having to interrupt itself to perform other reads and writes), those sequential writes are very fast. But if the array has to break off to perform other, random reads and writes, the writes can no longer be purely sequential, and performance suffers.
Of course, having a separate array for each log file is expensive, and often can’t be cost justified. At the very least though, locate all log files on an array (RAID 1 or RAID 10) other than the array used for database files. While sequential write performance won’t be as good as if each log file had its own array, it is still much better than trying to contend for disk I/O with data files.
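When you create (or move) a database, this file placement is specified explicitly. In the sketch below, the database name, drive letters, paths, and sizes are all assumptions: E: stands in for the array holding data files, and F: for a separate RAID 1 or RAID 10 array dedicated to the log:

```sql
-- Data file on one array, log file on another, so sequential log writes
-- are not interrupted by random data-file I/O.
CREATE DATABASE Sales
ON PRIMARY
  (NAME = Sales_data,
   FILENAME = 'E:\MSSQL\Data\Sales_data.mdf',
   SIZE = 2GB)
LOG ON
  (NAME = Sales_log,
   FILENAME = 'F:\MSSQL\Log\Sales_log.ldf',
   SIZE = 500MB)
```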
Number of Disk Controllers in Server
A single disk controller, whether it is SCSI or fibre, has a maximum limit on its throughput. Because of this, you will want to match the number of disk controllers to the amount of data throughput you expect. As each controller is so different, I can’t recommend specific solutions, other than to say that, at a very minimum, you will want two disk controllers. One controller should be used for non-hard-disk devices, such as the CD-ROM, backup devices, and so on, and the other for the hard disks. The goal is not to attach both slow and fast devices to the same controller.
Quite often you see this (good) scenario: one controller for non-hard-disk devices, one for a RAID 1 local hard disk, and a third (and sometimes more) for the arrays that hold SQL Server database files and logs. Be sure you don’t attach more drives to a controller than it can handle. While it may work, performance will suffer.
Type of Disk Controllers in Server
Always purchase the fastest disk controller you can afford, assuming you want the best SQL Server performance. As you may know, different disk controllers have different performance characteristics. For example, there are different types of SCSI, such as Wide SCSI, Narrow SCSI, Ultra SCSI, and so on. The same is true, although to a lesser degree, of fibre connections.
Because of the wide variety of controllers, I can’t recommend any specific ones. Generally, a hardware vendor will offer several models to choose from. Ask about the performance benefits of each one, and get the one that offers the best throughput.
Size of Cache in Disk Controllers in Server
Also, when you purchase a disk controller, consider how much disk cache it has. Some disk controllers allow you to add extra disk cache. Generally, you will want to purchase as much disk cache as your controller can hold. SQL Server is very I/O intensive, and anything we can do to boost I/O performance, like employing a large disk cache, will help out a lot.
Is Write Back Cache in Disk Controller On or Off?
Disk cache in your disk controller offers two ways to speed access: one for reads, and the other for writes. Of these, the most important use is for reads, as this is where most disk I/O time is spent in most SQL Server databases. A write back cache, on the other hand, is used to speed up writes, which usually occur less often, relatively speaking. Unfortunately, SQL Server in most cases assumes that write back cache is not on, so write back caching should be turned off on most controllers. If it is left on, then under certain circumstances (such as a loss of power) data that SQL Server believes has been written safely to disk may still be sitting in the cache, never making it to disk, and the result can be a corrupted database.
While there are some controllers that offer battery backup to help prevent such issues, they don’t always work as expected. Personally, I prefer non-corrupt data (written more slowly) to corrupt data (that was written much faster). In other words, I recommend turning write back caching off on your disk controller, even though you might suffer a very small write performance hit by doing so.
Speed of Disk Drives
The disk drives that come in your arrays can often be purchased with different speeds. As you might expect, for best performance, always purchase the fastest disks you can. Generally, this is 15,000 RPM or faster. In addition, don’t mix and match drives of different speeds in the same array. If you do, performance will suffer.
How Many Network Cards Are in Your Server?
Fortunately, network traffic to and from a SQL Server is generally not a bottleneck, and a single network card is often more than adequate. But if you find that network traffic is a problem (you have hundreds or thousands of users) then moving to multiple network cards is justified, and can boost performance. In addition, two or more network cards can add to redundancy, helping to reduce downtime.
What is the Speed of the Network Cards in Server?
At the very minimum, your server should have 100Mbps network cards. Ten megabit cards just don’t offer the bandwidth you need. If one or more 100Mbps cards don’t offer enough throughput, then consider gigabit cards. In fact, you might want to skip 100Mbps cards altogether and use only gigabit cards instead. Using a faster network card doesn’t speed up network traffic itself; it only allows more traffic to get through, which in turn allows your server to work at its optimum performance.
Are the Network Cards Hard-Coded for Speed/Duplex?
If you have a 10/100 or 10/100/1000 card in a SQL Server that is supposed to auto-sense the network’s speed and set itself accordingly, don’t assume that it has worked correctly. It is fairly common for a network card to auto-sense incorrectly, setting a less-than-optimum speed or duplex setting, which can significantly hurt network performance. Instead, manually set the card’s speed and duplex settings, so that you know for sure they are correct.
Are the Network Cards Attached to a Switch?
This may be obvious in a large data center, but in smaller organizations a hub may still be in use to connect servers. If so, seriously consider replacing the hub with an appropriate switch, and configure the switch to communicate at its highest possible performance, such as 100Mbps and full duplex. Moving from a hub to a switch can make a dramatic difference in network performance.
Are All the Hardware Drivers Up-to-Date?
Admittedly, this is a boring topic, but it is more important than you might think. One of the biggest performance hogs (not to mention causes of strange and unusual problems) is buggy drivers, whether they are found in disk controllers, network cards, or elsewhere. By using the latest drivers, the odds are that you will be getting a better, faster-performing driver, allowing SQL Server to perform at its best.
You should regularly check to see if newer drivers are available for your hardware, and install them when you have downtime. I have personally seen radical performance differences from changing an old, buggy driver to a new one that has been thoroughly debugged and tuned.
Is this Physical Server Dedicated to SQL Server?
I have alluded to this before, but I can’t say it too often. SQL Server should run on a dedicated physical server, not shared with other application software. When you share SQL Server with other software, you force SQL Server to fight over physical resources, and you make it much more difficult to tune your server for optimum SQL Server performance. Time and time again, when I get questions about poor SQL Server performance, I find out that the culprit responsible is another application running on the same server. You just have to learn to say NO.
This has been a long journey so far, but we still have a long way to go. When I first evaluate a SQL Server for performance, and perform a performance audit, I take detailed notes about all of the topics discussed above. I then compare how the server is configured to the ideal configuration, and then look for easy ways to move closer to the ideal configuration. Sometimes this is easy (obvious, easy to correct mistakes have been made), and other times, there is not too much you can do. But you won’t know this if you don’t perform the audit.
Your goal should be to perform the part of the performance audit, described on this page, for each of your SQL Servers, and then use this information to make corrections, if you can. If you can’t, then you can use this information as ammunition for getting new and better hardware.
Once you have completed this part of the performance audit, you are now ready to audit the operating system for potential performance improvements.
*Originally published at SQL-Server-Performance.com
Brad M. McGehee is a full-time DBA with a large manufacturing company, and the publisher of http://www.SQL-Server-Performance.Com, a website specializing in SQL Server performance tuning and clustering.
He is an MVP, MCSE+I, MCSD, and MCT (former).
Brad also runs another website called http://www.WorldClassGear.com It provides independent gear reviews for backpackers, trekkers, and adventure travelers.