使用Grafana监控Lync/Skype服务器(六)

时间:2023-03-28 13:17:54

在上一篇文章中,我们使用Grafana从SQL Server的表中取数,然后通过Dashboard实现了对Lync/Skype服务器的功能状态监控。那么在本篇文章中,我们将继续使用Grafana来实现对Lync/Skype服务器端的报错TOP 10以及用户TOP的监控数据。

如何使用Grafana监控Lync/Skype服务器端报错TOP 10以及用户TOP

Skype服务器报错数据来源

Skype服务器的报错数据来源于后端SQL Server的LcsCDR数据库的ErrorReportViewMsDiagMetaDataView视图。

  • ErrorReportView:主要存储报错的信息
  • MsDiagMetaDataView:主要存储报错的分类

报错主要分为两类:

  • Expected Failure: 即“预期故障”。例如,如果用户已将其状态设置为“请勿打扰”,则对该用户的任何调用都会失败。
  • Unexpected Failure: “意外故障”是指在看似正常的系统中发生的故障。例如,如果呼叫者处于保持状态,则不应终止呼叫。如果发生这种情况,则会被标记为意外故障。

报错信息获取

第一次处理 - 获取报错信息元数据

我们可以通过命令从上文提到的视图ErrorReportView中筛选出我们需要的数据,然后将其导入到对应的SQL表中,示例如下:

使用Grafana监控Lync/Skype服务器(六)

这些主要的字段包括:

  •  ErrorTime : 报错发生时间
  •  MsDiagHeader :报错信息
  •  FromUri:报错from方
  •  ToUri :报错to方
  •  ClientVersion:报错发生的客户端
  •  Source:报错发生的服务器

完整脚本如下:

#Connect to SQL database
Function SQLConnectionWindows
{
    param
    (
        [string]$server,
        [string]$database
 
    )
    $SQLCon = New-Object System.Data.SqlClient.SqlConnection
    $SQLCon.ConnectionString = "Data Source=$server;Initial Catalog=$database;Integrated Security=SSPI;"
 
    try
    {
        $SQLCon.Open()
        return $SQLCon
    }
 
catch [exception]
    {
        Write-Warning ('Connect to database failed with error message:{0}' -f ,$_)
        $SQLCon.Dispose()
        return $null
    }
}


Function SQLConnectionSQL
{
    param
    (
        [string]$server,
        [string]$database,
        [string]$uid,
        [string]$pwd
 
    )
    $SQLCon = New-Object System.Data.SqlClient.SqlConnection
    $SQLCon.ConnectionString = "Data Source=$server;Initial Catalog=$database;User ID=$uid;pwd=$pwd;"
 
    try
    {
        $SQLCon.Open()
        return $SQLCon
    }
 
catch [exception]
    {
        Write-Warning ('Connect to database failed with error message:{0}' -f ,$_)
        $SQLCon.Dispose()
        return $null
    }
}

Function SQLCommand
{
    param
    (
        [System.Data.SqlClient.SqlConnection]$SQLConnection,
        [string]$command
    )
    $dataset = New-Object System.Data.DataSet
    $dataAdapter = New-Object System.Data.SqlClient.SqlDataAdapter($command,$SQLConnection)
    $dataAdapter.Fill($dataset) | Out-Null
    return $dataset.Tables[0]
} 

$SQLServer = 'SkypeBackEndSQL'
$SQLDBName = 'LcsCDR'
$conn = SQLConnectionWindows -server $SQLServer -database $SQLDBName -uid 'Account' -pwd 'password'


$SqlQuery = "
declare @date1 datetime
declare @date2 datetime
set @date1 = (SELECT DATEADD(minute,-5,GETUTCDATE()))
set @date2 = (SELECT GETUTCDATE());

SELECT 
    ErrorTime
    ,MsDiagHeader
    ,FromUri
    ,ToUri
    ,ClientVersion
    ,Source

FROM [LcsCDR].[dbo].[ErrorReportView]

where [ErrorTime] > @date1 and [ErrorTime] < @date2

"


$DataTables = SQLCommand -SQLConnection $conn -command $SqlQuery

$arr = @()
foreach($Data in $DataTables){    
    
    $IsReasonExist = ($Data.MsDiagHeader -split ";") -like  '*reason=*'

    if($IsReasonExist){
        
        $DiagnosticId = $Data.MsDiagHeader.split(";")[0]
        $DiagnosticReason = $IsReasonExist.Trim(" .-`t`n`r").split('"')[1]

        $obj = New-Object PSObject
        $obj | Add-Member -MemberType NoteProperty -Name ErrorTime -Value $Data.ErrorTime
        $obj | Add-Member -MemberType NoteProperty -Name DiagnosticId -Value $DiagnosticId
        $obj | Add-Member -MemberType NoteProperty -Name DiagnosticReason -Value $DiagnosticReason
        $obj | Add-Member -MemberType NoteProperty -Name FromUri -Value $Data.FromUri
        $obj | Add-Member -MemberType NoteProperty -Name ToUri -Value $Data.ToUri
        $obj | Add-Member -MemberType NoteProperty -Name ClientVersion -Value $Data.ClientVersion
        $obj | Add-Member -MemberType NoteProperty -Name Source -Value $Data.Source

        $arr += $obj
    }
}

$Server = 'SqlServer'
$DBName = 'DBName'
$userName = "UserName"
$Cred = 'Password'
$Newconn = SQLConnectionSQL -server $Server -database $DBName -uid $userName -pwd $Cred


#更新至数据库
foreach($i in $arr){
    
    $time = $i.ErrorTime.ToString('yyyy-MM-dd HH:mm:ss.fff')
    $id = $i.DiagnosticId
    $reason = $i.DiagnosticReason
    $fromUri = $i.FromUri
    $toUri = $i.ToUri
    $clientVersion = $i.ClientVersion
    $source = $i.Source
    
    $cmd = "INSERT INTO Skype_EnterpriseVoice_ErrorReportMetaData VALUES ('$time','$id','$reason','$fromUri','$toUri','$clientVersion','$source')"

    SQLCommand -SQLConnection $Newconn -command $cmd
}

第二次处理 - 获取报错分类信息

接下来,我们需要对上文中提到的视图MsDiagMetaDataView中的数据进行处理,以获取到对应的报错分类信息,然后将其导入到对应的SQL表中,示例如下:

使用Grafana监控Lync/Skype服务器(六)

这些主要的字段包括:

  •  MsDiagID: 报错ID
  •  ErrorCategory:报错分类
  •  Type:报错类型
  •  ReasonString:报错原因
  •  Description:报错描述信息

完整脚本如下:

#Connect to SQL database
Function SQLConnectionWindows
{
    param
    (
        [string]$server,
        [string]$database
 
    )
    $SQLCon = New-Object System.Data.SqlClient.SqlConnection
    $SQLCon.ConnectionString = "Data Source=$server;Initial Catalog=$database;Integrated Security=SSPI;"
 
    try
    {
        $SQLCon.Open()
        return $SQLCon
    }
 
catch [exception]
    {
        Write-Warning ('Connect to database failed with error message:{0}' -f ,$_)
        $SQLCon.Dispose()
        return $null
    }
}


Function SQLConnectionSQL
{
    param
    (
        [string]$server,
        [string]$database,
        [string]$uid,
        [string]$pwd
 
    )
    $SQLCon = New-Object System.Data.SqlClient.SqlConnection
    $SQLCon.ConnectionString = "Data Source=$server;Initial Catalog=$database;User ID=$uid;pwd=$pwd;"
 
    try
    {
        $SQLCon.Open()
        return $SQLCon
    }
 
catch [exception]
    {
        Write-Warning ('Connect to database failed with error message:{0}' -f ,$_)
        $SQLCon.Dispose()
        return $null
    }
}

Function SQLCommand
{
    param
    (
        [System.Data.SqlClient.SqlConnection]$SQLConnection,
        [string]$command
    )
    $dataset = New-Object System.Data.DataSet
    $dataAdapter = New-Object System.Data.SqlClient.SqlDataAdapter($command,$SQLConnection)
    $dataAdapter.Fill($dataset) | Out-Null
    return $dataset.Tables[0]
} 

$SQLServer = 'SkypeBackEndSQL'
$SQLDBName = 'LcsCDR'
$conn = SQLConnectionWindows -server $SQLServer -database $SQLDBName -uid 'Account' -pwd 'password'


$SqlQuery = "

SELECT [MsDiagId]
      ,[ErrorCategory]
      ,[Type]
      ,[ReasonString]
      ,[Description]
 FROM [LcsCDR].[dbo].[MsDiagMetaDataView]

"

$DataTables = SQLCommand -SQLConnection $conn -command $SqlQuery

$Server = 'SqlServer'
$DBName = 'DBName'
$userName = "UserName"
$Cred = 'Password'
$Newconn = SQLConnectionSQL -server $Server -database $DBName -uid $userName -pwd $Cred


#更新至数据库
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $conn
$bulkCopy.DestinationTableName = "Skype_EnterpriseVoice_MsDiagMetaDataView"
$bulkCopy.WriteToServer($DataTables)

第三次处理 - 统计报错数量

当我们进行了第一,二次数据处理后,为方便之后的Grafana取数,我们可以将前面两张导入的表关联起来,对数据进行计算以统计出报错的数量信息,示例如下:

使用Grafana监控Lync/Skype服务器(六)

完整的脚本如下:

#Connect to SQL database
Function SQLConnectionSQL
{
    param
    (
        [string]$server,
        [string]$database,
        [string]$uid,
        [string]$pwd
 
    )
    $SQLCon = New-Object System.Data.SqlClient.SqlConnection
    $SQLCon.ConnectionString = "Data Source=$server;Initial Catalog=$database;User ID=$uid;pwd=$pwd;"
 
    try
    {
        $SQLCon.Open()
        return $SQLCon
    }
 
catch [exception]
    {
        Write-Warning ('Connect to database failed with error message:{0}' -f ,$_)
        $SQLCon.Dispose()
        return $null
    }
}

Function SQLCommand
{
    param
    (
        [System.Data.SqlClient.SqlConnection]$SQLConnection,
        [string]$command
    )
    $dataset = New-Object System.Data.DataSet
    $dataAdapter = New-Object System.Data.SqlClient.SqlDataAdapter($command,$SQLConnection)
    $dataAdapter.Fill($dataset) | Out-Null
    return $dataset.Tables[0]
} 

$SQLServer = 'SQLServer'
$SQLDBName = 'DBName'
$userName = "UserName"
$Cred = 'Password'
$conn = SQLConnectionSQL -server $SQLServer -database $SQLDBName -uid $userName -pwd $Cred


$SqlQuery = "
declare @date1 datetime
declare @date2 datetime
declare @date3 datetime
set @date1 = (SELECT DATEADD(minute,-5,GETUTCDATE()))
set @date2 = (SELECT GETUTCDATE());

select          
		  getdate() as InsertDate,
          count(ErrorReport.DiagnosticId) as Sessions,
          ErrorReport.DiagnosticId,
          ErrorReport.DiagnosticReason,
		  Metadata.ErrorCategory
		  --CONVERT(VARCHAR(16),ErrorReport.ErrorTime,120) as ErrorTime
          
FROM 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MetaData

inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReport] ErrorReport on MetaData.MsDiagID = ErrorReport.DiagnosticId

where 
    ErrorReport.ErrorTime > @date1 AND ErrorReport.ErrorTime < @date2

group by 
    ErrorReport.DiagnosticId,ErrorReport.DiagnosticReason,MetaData.ErrorCategory
"
$DataTables = SQLCommand -SQLConnection $conn -command $SqlQuery

#更新至数据库
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $conn
$bulkCopy.DestinationTableName = "Skype_EnterpriseVoice_ErrorReportCount"
$bulkCopy.WriteToServer($DataTables)

创建自动化任务计划(Task Schedule)

在上面的步骤中,我们主要使用了3个脚本:

  • 脚本1:从Skype后端数据库获取到报错元数据,存储在SQL表1中。
  • 脚本2:从Skype后端数据库获取到报错分类信息,存储在SQL表2中。- 只导入一次即可
  • 脚本3:关联表1,2,之后对数据进行计算,并存储在SQL表3中。

基于以上信息,我们需要分别创建2个任务计划,以调用前文中的脚本1和3,将数据每隔一定时间导入到SQL表中。由于任务计划在此系列前几篇文章中已经有所展示,将不在此文中赘述。

在Grafana中创建Dashboard

和之前一样,我们需要在Grafana中通过新建Query来生成Dashboard,由于步骤基本相同,在此我们只展示一些关键的查询语句以及图形配置参数:

Dashboard 1: Unexpected Failure Count

select 
		$__timeEpoch(InsertDate),
		Sessions,
		DiagnosticReason
  from [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReportCount]
  where ErrorCategory = 'UnexpectedFailure' and $__timeFilter(InsertDate)

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 2: Expected Failure Count

select 
		$__timeEpoch(InsertDate),
		Sessions,
		DiagnosticReason
  from [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReportCount]
  where ErrorCategory = 'ExpectedFailure' and $__timeFilter(InsertDate)

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 3: Event ID Sessions - UnexpectedFailure

select TOP 10
         count(ErrorReport.DiagnosticId) as Sessions,
          ErrorReport.DiagnosticId,
          ErrorReport.DiagnosticReason
FROM 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MetaData
inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReport] ErrorReport on MetaData.MsDiagID = ErrorReport.DiagnosticId
where 
    ErrorCategory = 'UnexpectedFailure' 
and
    $__timeFilter(ErrorReport.ErrorTime)
group by 
    ErrorReport.DiagnosticId,ErrorReport.DiagnosticReason
Order by 
    Sessions desc

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 4: Event ID Sessions - ExpectedFailure

select TOP 10
          count(ErrorReport.DiagnosticId) as Sessions,
          ErrorReport.DiagnosticId,
          ErrorReport.DiagnosticReason
          
FROM 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MetaData
inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReport] ErrorReport on MetaData.MsDiagID = ErrorReport.DiagnosticId
where 
    ErrorCategory = 'ExpectedFailure' 
and
    $__timeFilter(ErrorReport.ErrorTime)
group by 
    ErrorReport.DiagnosticId,ErrorReport.DiagnosticReason
Order by 
    Sessions DESC

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 5: Top From Users

With Temp as 
(select
Count(FromUri) as FromUserCount,ErrorReportMetaData.FromUri
from [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReportMetaData] ErrorReportMetaData

inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MsDiagMetaData on ErrorReportMetaData.DiagnosticId = MsDiagMetaData.MsDiagID
where $__timeFilter(ERRORTIME) and MsDiagMetaData.ErrorCategory = 'UnexpectedFailure'

group by FromUri
)
select TOP(10) *
From Temp
where FromUri <> ''
order by FromUserCount desc

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 6: Top To Users

With Temp as 
(select
Count(ToUri) as ToUserCount,ErrorReportMetaData.ToUri
from [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReportMetaData] ErrorReportMetaData

inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MsDiagMetaData on ErrorReportMetaData.DiagnosticId = MsDiagMetaData.MsDiagID
where $__timeFilter(ERRORTIME) 
and MsDiagMetaData.ErrorCategory = 'UnexpectedFailure'
and ErrorReportMetaData.ToUri not like '%FrontendServername%'

group by ToUri
)

select TOP(10) *
From Temp
where ToUri <> ''
order by ToUserCount desc

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 7: Top from user agents

With Temp as 
(select
Count(ClientVersion) as ClientVersionCount,ErrorReportMetaData.ClientVersion
from [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReportMetaData] ErrorReportMetaData

inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MsDiagMetaData on ErrorReportMetaData.DiagnosticId = MsDiagMetaData.MsDiagID
where $__timeFilter(ERRORTIME)
and MsDiagMetaData.ErrorCategory = 'UnexpectedFailure'

group by ClientVersion
)

select TOP(10) *
From Temp
where ClientVersion <> '' and ClientVersion <> 'UCWA'
order by ClientVersionCount desc

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard 8: Top sources

With Temp as 
(select
Count(Sources) as SourcesCount,ErrorReportMetaData.Sources
from [DBName].[dbo].[Skype_EnterpriseVoice_ErrorReportMetaData] ErrorReportMetaData

inner join 
    [DBName].[dbo].[Skype_EnterpriseVoice_MsDiagMetaDataView] MsDiagMetaData on ErrorReportMetaData.DiagnosticId = MsDiagMetaData.MsDiagID
where $__timeFilter(ERRORTIME) 
and MsDiagMetaData.ErrorCategory = 'UnexpectedFailure'
and ErrorReportMetaData.Sources <> ''
and ErrorReportMetaData.Sources not like '%not available%' 
and ErrorReportMetaData.Sources not like 'sip.contoso.com'

group by ErrorReportMetaData.Sources

)

select *
From Temp
order by SourcesCount desc

使用Grafana监控Lync/Skype服务器(六)

使用Grafana监控Lync/Skype服务器(六)

Dashboard最终效果展示

同当我们将以上Dashboard整合后,最终效果示例如下:

使用Grafana监控Lync/Skype服务器(六)


本章总结

在此篇文章中,我们了解了:

  • 如何获取Skype服务器报错信息
  • 如何对数据进行二次处理并导入到SQL Server表
  • 如何通过Grafana展示报错信息

我们搜集报错信息的好处主要有:

  • 可以对主要的报错信息搜集并集中处理,以发现潜在的风险。
  • 当某些报错大量出现时,我们可以先于用户报障,来发现并解决问题。
  • 主动联系用户,获得用户使用反馈,提升用户体验。

本系列文章《使用Grafana监控Lync/Skype服务器》到这一篇就告一段落了。监控作为运维工作中重要的一环,无论您运维的产品是什么,以及使用什么样的产品来实现运维监控,其逻辑都是相通的。也希望这系列文章能够给作为运维工程师的您一些启发。感谢您的阅读。