【译】The Accidental DBA：Troubleshooting Performance

最近重新翻看The Accidental DBA，将Troubleshooting Performance部分稍作整理，方便以后查阅。此篇是Part 2
Part 1：The Accidental DBA：SQL Server Backup
Part 3：The Accidental DBA：Troubleshooting
一、Baselines

网友提供的性能基线的含义：每天使用windows性能计数器定时(周期为一个月，具体需要根据自己的需求)收集服务器硬件信息，然后对硬件信息进行分析统计，计算平均值、最大值、最小值，用来与之后每天硬件信息进行比较，从而快速的估算服务器硬件状态。

之前对基线的理解一直停留在使用Perfmon收集几个计数器，然后拿收集到的数值和网上推荐的数值进行对比，得以判别是否异常。
数据库服务器的基线的作用：
1、帮助你发现在它成为问题之前发生了什么变化
2、允许你主动调整你的数据库
3、允许你使用历史信息解决问题
4、提供用于环境和数据趋势的数据
5、捕获数据，提供给管理层，以及服务器和存储管理员，用于资源和容量规划
基线详细内容可参考：Capturing Baseline Data
二、Tools for On-Going Monitoring
2.1、Performance Monitor and PAL
Performance Monitor(PerfMon)提供了大量关于Windows和SQL Server的配置选项，你可以根据需要对不同的服务器进行调整，或者每次都使用相同的模板。它允许你在指定的时间段内生成一个全面的性能概要，并且你可以实时查看性能。
性能计数器的捕获可参考PerfMon模板；性能监视器文件的分析可借助PAL工具
2.2、SQL Trace and Trace Analysis Tools
客户端(Profiler)获取跟踪可参考：Trace-跟踪高消耗的语句需添加哪些事件；服务器端跟踪可参考：模板-Trace；导出正在运行的跟踪模板可参考：Trace-导出已有的服务器端跟踪
分析跟踪文件的工具有ClearTrace和RML Utilities for SQL Server
对于SQL Server 2012及以上版本，推荐使用Extended Events代替Trace~
2.3、SQLNexus
SQLNexus用于分析SQLDiag和PSSDiag捕获的数据。我们可以自定义SQLDiag和PSSDiag的默认模版：SQLdiag-配置文件-PerfmonCollector、SQLdiag-配置文件-ProfilerCollector、SQLdiag-配置文件-扩展
收集SQLDiag数据过于繁琐，现在很少有人使用SQLNexus工具~
2.4、Essential DMVs for Monitoring
DMVs非常实用，每次需要使用的时候到处查找，得找个时间把这个坑填补一番

sys.dm_os_wait_stats --I want to know what SQL Server is waiting on, when there is a problem and when there isn’t.

sys.dm_exec_requests --When I want to see what’s executing currently, this is where I start.

sys.dm_os_waiting_tasks --In addition to the overall waits, I want to know what tasks are waiting right now (and the wait_type).

sys.dm_exec_query_stats --execution count and resource usage

sys.dm_exec_query_plan --This DMV has cached plans as well as those for queries that are currently executing.

sys.dm_db_stats_properties --I always take a look at statistics in new systems, and when there’s a performance issue, initially just to check when they were last updated and the sample size.

Glenn had a great set of diagnostic queries to use for monitoring and troubleshooting.
三、Are your indexing strategies working?
对于一个全新的系统，可以按照以下三个步骤分析索引信息
step1、是否存在无效的索引
step2、是否存在臃肿和不健康的索引
step3、是否需要添加新索引
3.1、Getting rid of the dead weight
3.1.1、Fully duplicate indexes
如果不知道索引的内部机制，可能会比你想象的更难以识别重复索引。它并不总是简单的col1列上的Index1和col1上的Index2。
在内部，SQL Server会添加列到索引，大多数命令(比如sp_helpindex)不会显示这些内部添加的列。
可以参考Identifying Duplicate Indexes得到重复索引信息。
注意：你可能会打断使用索引提示的应用程序，因此，请当心！通常在删除索引之前最好先禁用一段时间。
3.1.2、Unused Indexes
从未使用的索引和重复索引一样消耗资源。你可以使用sys.dm_db_index_usage_stats获取索引的使用情况，注意在sqlserver2012某些版本，重建索引会清空sys.dm_db_index_usage_stats中此索引的条目
user_updates列只反映语句的数量，不反映影响的行数。例如，我执行以下语句

UPDATE Table SET ColumnX = VALUE

影响10000行，那么表和包含ColumnX列的索引对应的user_updates都会累加1(update/delete/insert 类似)

--数据表、索引参考Identifying Duplicate Indexes(http://www.cnblogs.com/Uest/p/6679504.html)中的测试数据

USE Test

GO

SELECT *

INTO Test.dbo.SalesOrderDetail_IndexUsage

FROM AdventureWorks2008R2.Sales.SalesOrderDetail

GO

--CREATE UNIQUE CLUSTERED INDEX SalesOrderDetail_IndexUsage ON dbo.SalesOrderDetail(SalesOrderDetailID)

CREATE INDEX IX_SalesOrderID1 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid,SalesOrderDetailID) INCLUDE(LineTotal)

--CREATE INDEX IX_SalesOrderID2 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid) INCLUDE(LineTotal)

--CREATE INDEX IX_SalesOrderID3 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid) INCLUDE(SalesOrderDetailID,LineTotal)

--CREATE UNIQUE INDEX IX_SalesOrderID4 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid) INCLUDE(SalesOrderDetailID,LineTotal)

GO

--查看索引使用情况

select o.name,i.index_id,i.name,user_seeks,user_scans,user_lookups,user_updates

from sys.dm_db_index_usage_stats ddus

inner join sys.tables o

on ddus.object_id=o.object_id

inner join sys.indexes i

on ddus.index_id=i.index_id

and ddus.object_id=i.object_id

where database_id = db_id()

and o.name='SalesOrderDetail_IndexUsage'

order by i.index_id

--查询返回12行

select * from SalesOrderDetail_IndexUsage WHERE SalesOrderID=43659

--sys.dm_db_index_usage_stats结果

name    index_id    name    user_seeks    user_scans    user_lookups    user_updates

SalesOrderDetail_IndexUsage    0    NULL    0    0    1    0

SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    1    0    0    0

--更新影响12行

UPDATE SalesOrderDetail_IndexUsage

SET LineTotal=LineTotal*1

WHERE SalesOrderID=43659

--sys.dm_db_index_usage_stats结果

name    index_id    name    user_seeks    user_scans    user_lookups    user_updates

SalesOrderDetail_IndexUsage    0    NULL    0    0    1    1

SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    2    0    0    1

--删除影响12行

DELETE from SalesOrderDetail_IndexUsage

WHERE SalesOrderID=43659

--sys.dm_db_index_usage_stats结果

name    index_id    name    user_seeks    user_scans    user_lookups    user_updates

SalesOrderDetail_IndexUsage    0    NULL    0    0    1    2

SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    3    0    0    2

--插入12行

SET IDENTITY_INSERT dbo.SalesOrderDetail_IndexUsage on

INSERT INTO SalesOrderDetail_IndexUsage

(SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,LineTotal,rowguid,ModifiedDate)

select * from AdventureWorks2008R2.Sales.SalesOrderDetail

WHERE SalesOrderID=43659

SET IDENTITY_INSERT dbo.SalesOrderDetail_IndexUsage OFF

--sys.dm_db_index_usage_stats结果

name    index_id    name    user_seeks    user_scans    user_lookups    user_updates

SalesOrderDetail_IndexUsage    0    NULL    0    0    1    3

SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    3    0    0    3

--注意在sqlserver2012某些版本，重建索引会清空sys.dm_db_index_usage_stats中此索引的条目

ALTER INDEX IX_SalesOrderID1 ON SalesOrderDetail_IndexUsage REBUILD

-- Clean up

--DROP TABLE Test.dbo.SalesOrderDetail_IndexUsage

3.1.3、Similar or semi-redundant indexes
你可能会有一些适合合并的索引
Indexes that have the same key (but possibly different included columns)

Index1: Key = LastName

Index2: Key = LastName, INCLUDE = FirstName

在这种情况下你不"需要"Index1，因为Index1能做的Index2都能做。然而Index2要宽些。因此下面的查询需要更多的I/O

SELECT LastName,COUNT(*) FROM TableName GROUP BY LastName

但是，问题是这个查询有多重要？这个索引使用频率是多少？你可以使用sys.dm_db_index_usage_stats检查索引使用情况。
Indexes that have left-based subsets of other index KEYS

Index1: Key = LastName, FirstName, MiddleInitial

Index2: Key = LastName INCLUDE = SSN

Index3: Key = LastName, FirstName INCLUDE = phone

这种情况下每个索引提供特定使用，然而你会冗余很多数据。如果我们创建一个新索引: LastName,FirstName,MiddleInitial INCLUDE(SSN,phone)
同样，这个新索引比之前的3个索引都要宽，但是这个新索引有更多的用途而且它的总开销更少(只需要维护一个索引，磁盘上只有一个索引，缓存中只有一个索引)。但是，你还是得确定使用窄索引的查询有多重要，以及使用新索引会消耗多少更多的资源。
Index consolidation is a critical step in reducing waste and table bloat but there isn’t a simple answer to every consolidation option.This is another "it depends" case.
3.2、Analyze the health of your existing indexes
在清理重复/未使用/相似索引后，要确保现有的索引是健康的
Make sure your index maintenance routines at indexes on tables AND views
Make sure your index routines use a LIMITED scan if you’re only analyzing avg_fragmentation_in_percent
更多内容可查看Index Maintenance
3.3、Adding more indexes
这是一个棘手的问题。在添加索引方面有很多好/坏的做法。最糟糕的是，大多数人在没有真正全面分析(并正确分析)现有索引的情况下添加索引。
我说正确分析索引的原因是，那些sp_helpindex和SSMS工具会隐藏部分被添加到索引的列。除非你真正了解你的索引，否则你不可能正确的添加新索引，同时合并现有索引。
虽然我强烈建议你把缺失索引作为指南，我希望你记住它们不是完美的：
The missing index DMVs only tune the plan that was executed. If the plan performed a hash join then the index is going to help the hash join. But, it’s unlikely that the join type will change. And, it might be the case that a different index would perform a different join type and the query would be even faster.
缺失索引只是针对每个索引给出最好的索引，但是你得综合考虑，你不可能为每一个查询单独创建一个索引。
缺失索引可能显示已经存在索引，SQL Server的Missing index DMV的 bug可能会使你失去理智
 四、Essential PerfMon counters
可以通过导出/编辑PAL模板，得到性能计数器配置文件。日常收集使用的计数器参考：模板-Perfmon
得到性能监视器文件后，可以先在命令行使用relog命令对其处理，之后再借助PAL工具进行分析~

--relog /?

--列出输入文件中的性能计数器

relog F:\TroubleShooting\Perfmon\SamplePerfmonLog.blg -q -o F:\TroubleShooting\Perfmon\PerfmonCounters.txt -y

--从输入文件中筛选出计数器

relog F:\TroubleShooting\Perfmon\DataCollector01.blg -c "\Memory\Available MBytes" -o F:\TroubleShooting\Perfmon\logfile.blg -y

--截取某段时间内的计数器到新的文件中

relog F:\TroubleShooting\Perfmon\DataCollector01.blg -b // :: -e // :: -o F:\TroubleShooting\Perfmon\logfile.blg -y

--转到csv文件中

relog F:\TroubleShooting\Perfmon\DataCollector01.blg -o F:\TroubleShooting\Perfmon\Counters.csv -f csv

--以2*原采样间隔重新收集

relog F:\TroubleShooting\Perfmon\DataCollector02.blg -b // :: -e // :: -o F:\TroubleShooting\Perfmon\logfile.blg -y -t

An important consideration when looking at performance counters, or any monitoring data from SQL Server for that matter, is that no single data point will tell you the root cause of a problem, if one occurs. For performance counters, you need to look across multiple counters for correlating information to pinpoint the root of problems.
总结
很巧原文提及的部分知识点可以从工具分类中找到，继续搬砖 ●-●

秒客网

【译】The Accidental DBA：Troubleshooting Performance

相关文章