SQL Server Parallel Execution Plans
Microsoft SQL Server introduced parallel query processing capability in version 7.0. The purpose of parallel query execution is to complete a query involving a large amount of data more quickly than possible with a single thread on computers with more than one processor.
Books Online and various Microsoft documents describe the principles of parallel execution and the use of configuration setting that affect parallel execution. However, there is very little explanation on the matter of interpreting execution plans when parallelism is involved. A comparison of several queries with parallel execution plans, to the execution plan with parallelism disabled, provide a means of explaining some of the characteristics of parallel execution plans, including the meaning the of the estimated cost. An examination of actual queries within and without parallel execution plans provides additional guidance on circumstances where parallel execution is beneficial and when it should be disabled.
Parallel Query Processing
Figure 1 below shows a portion of the execution plans for a query with and without parallel operations. The query is a simple SELECT with a WHERE clause search argument (SARG), aggregates in the SELECT list, and a GROUP BY clause. The non-parallel execution plan is forced with the OPTION (MAXDOP 1) clause.
Figure 1. Non-parallel and parallel portions of an execution plan.
Note that the symbols for the common SQL operations–such as Index Seek, Hash Match, and Compute Scalar–have a yellow circle with arrows in the lower right corner when parallel execution is involved for that step. The Parallelism/Gather Streams symbol in Figure 1 is specific to parallel plans. Other symbols specific to parallel plans include Parallelism/Broadcast, Parallelism/Distribute Streams, and Parallelism Repartition Streams.
Figure 2 below shows the left side portion of the two execution plans from Figure 1. The top query with cost 66.65% relative to the batch (Query 1 and Query 2) is the non-parallel execution plan and the bottom query with cost 33.35% is the parallel execution plan.
Figure 2. Non-parallel (top) and parallel (bottom) execution plan relative cost.
Figure 3 below show the detail boxes for the SELECT symbols on the non-parallel (left) and parallel (right) execution plans. The non-parallel plan has a total estimated cost of 121 and the parallel plan has a total estimated cost of 60.7, from which the 66.65% and 33.35% relative costs in Figure 2 are derived.
Figure 3. Non-parallel (left) and parallel (right) plan cost detail.
Nowhere in the public Microsoft documentation for SQL Server is the unit of measure for the execution plan cost explained. In fact, all Microsoft SQL Server documentation on this subject seems to be deliberately vague. Is the execution plan cost some measure of time or processor utilization? If it were processor utilization, is it relative to one processor or all processors? An examination of non-parallel and parallel execution plans provide a hint that the unit of measure is most probably time, and that lower plan costs represents lower execution time, but makes no representation of CPU resources.
Figure 4 below shows the cost details for the two major components of each execution plan discussed above. Figure 4a is the non-parallel Clustered Index Seek and Figure 4b is the parallel plan Clustered Index Seek operation. In both cases the estimated row counts are identical, yet the parallel plan Clustered Index Seek has a cost of 47.1095, approximately one-half the cost of the non-parallel Clustered Index Seek at 94.2190.
Figure 4a. Non-parallel Clustered Index Seek cost details.
Figure 4b. Parallel Clustered Index Seek cost details.
The same pattern is observed in the non-parallel and parallel Hash Match/Aggregate component operations shown in Figure 4c and Figure 4d.
Figure 4c. Non-parallel Hash Match/Aggregate cost details.
Figure 4c. Non-parallel Hash Match/Aggregate cost details.
There is no rational reason to believe that having two or more threads separately process portions of the index seek or hash match operations reduces the total CPU-cycles in processing the entire operation, except under unusual circumstances. In fact, the parallel operation should be more expensive taking into account the cost of merging the partial results from each thread. It is reasonable, however, to expect the parallel operation to complete faster than the non-parallel operation, assuming all required resources are available. The logical conclusion is that the unit of measure for the cost in SQL Server’s execution plan is probably time and not processor utilization.
The execution plan shown seems to be generating parallel execution plans with a cost that represents the time to complete the query based on the availability of two processors, even if more than two processors are available. SQL Server Book Online, in fact, states that the actual degree of parallelism is determined at the time of execution depending on the availability and utilization of resources. If resources are tight, SQL Server may even elect to execute the plan with a single thread. Hence, a parallel plan only indicates that parallel execution is possible and displays the estimated cost, in units of time, based on the use of two processors.
Figure 5 below shows a portion of the parallel execution plan for another query. Figure 6a shows the cost detail for the Clustered Index Scan on the Customer table for the non-parallel execution plan and Figure 6b shows the Customer table Clustered Index Scan cost detail for the parallel execution plan.
Figure 5. Parallel execution plan for second example.
Figure 6a. Non-parallel Clustered Index Scan cost details.
Figure 6b. Parallel Clustered Index Scan cost details.
The non-parallel version is fairly simple to interpret. The estimated IO cost is 2.45 and the estimated CPU cost is 0.165 for a total of 2.61, even though the numbers do not add up exactly. The parallel version is more difficult to interpret. The I/O cost is approximately one-half of the non-parallel IO cost, the CPU is only one-quarter, but the total cost in the parallel plan, 2.529, is only slightly less the non-parallel plan total, 2.651.
In other cases of Table or Index Scans, the I/O cost is the same between non-parallel and parallel versions, but the CPU cost on the parallel plan is one-half that of the non-parallel version. In these cases, the total cost of the parallel plan may be less than the equivalent the non-parallel plan, but more the one-half the non-parallel plan cost. It seems unusual that the execution plan suggests that table and index scans do not derive the same degree of benefit from parallel execution as index seek and hash match operations.
In some queries with parallel execution plans, the cost of the some of the parallelism operators, typically Parallel/Repartition Stream, can be expensive compared to the other operations, such that the overall query actually has a higher plan cost the than non-parallel plan. Yet this does not inhibit the use of the parallel execution plan.
Figure 7 shows another aspect of parallel execution plans. The Nation table at the upper right is a small table and is scanned with a non-parallel operation. The results are then distributed to multiple threads with the Parallelism/Distribute Streams operation.
Figure 7. Parallel execution plan with non-parallel component.
TPC-H Query Analysis on SQL Server
A simple method of investigating parallel execution plans is to use the data and code generators from the TPC-H benchmark. The Transaction Processing Council (tpc.org) provides the source code for the dgen program used to generate the dataset for the TPC-H benchmark and the qgen program to generate queries. Most TPC-H benchmarks published recently involved at least 100GB or much larger datasets. However, even the 1GB dataset is sufficient to generate parallel execution plans on SQL Server. By choosing a dataset small enough to fit in memory, the disk system is not likely to be a bottleneck, allowing for simpler test configurations focused on investigating CPU usage.
Table 1 below shows the execution plan estimated costs for the 22 TPC-H queries with a 1GB (data only) LineItem table of 6 million rows. In this case, 18 of the 22 queries generated a parallel execution plan. In some cases (Query 1 and 6), the parallel execution plan is approximately one-half the cost of the non-parallel plan. In most cases, the parallel plan is just somewhat less expensive than the non-parallel plan, but not by as much as 50% less. In one case, Query 5, the parallel plan is actually more expensive than the non-parallel plan, even though the default plan (no hints) is the parallel plan.
Table 1. Plan Cost for TPC-H queries on 1GB LineItem dataset.
Table 2 shows the best measured run times for SQL Server 2000 and Windows Server 2003 on a 2×2.4GHz Xeon server with Hyper-Threading (HT) enabled, and 2GB memory. Each query was run several times consecutively, so that the dataset should be mostly cached in memory, as evidenced by the minimal disk activity. The reason for choosing the best time from several consecutive runs was to eliminate the query optimization cost, which itself would make for a very interesting investigation. There is occasional variation in run-times, even between the second, third and fourth runs. The run times were captured in SQL Profiler along with CPU time and the degree of parallelism (DOP) which verifies the actual number of threads used.
Table 2. Best measured duration (ms) on 1GB LineItem dataset.
When the DOP is 2, it appears that the operating system or SQL Server knows to use the two different physical processors rather than both logical processors on a single physical processor. It is possible this is achieved entirely by the sequencing of physical and logical processors. At DOP 4, there are only 2 physical processors, so this is not a test of the ability of SQL Server parallel execution to scale to 4 physical processors, but rather a test of the benefit of HT in parallel execution plans.
The blue font in the DOP 2 column shows cases where the query execution time speedup was more than 30%. In all, 7 of the 18 queries with parallel plan showed better than 30% speed-up, while 4 were slower. The sum total execution time for all 18 queries with parallel execution plans was 29% faster than the same 18 queries without parallel execution plans. The queries that were slower were not predicted to be slower by the execution plan, and the query predicted to be slower was in fact faster with parallel execution. It does appear that SQL Server significantly speeds up the parallel execution of table and index scans even though this is not reflected in the plan cost estimate.
In all but one query, CPU utilization was higher with the parallel plan compared to the non-parallel plan. For the 18 queries with parallel plans, the overall average increase in CPU utilization was 52%. For this reason, parallel execution plans are not recommended for applications where very high through-put is desired, typically OLTP application. Parallel plans are very valuable when speed is desired for a single or few users, typically maintenance operations and DSS applications.
In five queries, parallel execution on all logical processors provided further speed improvement, but in five other queries, the use of the additional logical processors degraded performance to worse than both non-parallel execution and parallel execution with the two physical processors. Hyper-threading is clearly a feature with potential to be very useful. However, it should not be used blindly. Hopefully Microsoft will investigate HT very carefully in Yukon and devise clever algorithms to determine when HT should be used and when it should not be used.
The examination of the execution plans for a range of queries definitely suggests that the unit of measure in the plan cost is time rather than CPU usage relative to a single processor. Hence a parallel execution plan may show a lower cost than a non-parallel plan. Furthermore, SQL Server rates the parallel execution of certain operations, such as an index seek and hash match, as being approximately twice as fast with two threads compared to a single thread. For some reason, SQL Server does not rate table and index scans as deriving comparable benefit from parallel execution even though actual parallel query measurements indicate this to be the case.
Parallel execution plans require additional operations that in some cases the additional overhead negates the benefit of a parallel execution plan, but this needs to be determined from actual tests. It has not been determined what the time unit the plan cost represents. A reasonable guess would be that the plan cost is in seconds on a processor that was prevalent at the time SQL Server 7.0 was in development. This could anything from a Pentium 100MHz to a Pentium Pro at 200MHz. This would make sense in that the actual run time was on average 20 times faster on the Xeon 2.4GHz system than represented by the plan cost if the unit of measure was time in seconds.
Published with the express written permission of the author. Copyright 2004 Joe Chang.
|Reference material for this article can be found at :
Joe Chang is a freelance consultant specializing in SQL Server, database architecture, design, performance tuning, and scalability analysis. Joe has more than 12 years experience in software development, including performance and scalability analysis, for microprocessors, server systems and database applications. The materials and tools in this series of articles are available as a 1- or 2-day onsite training course.