归档—监控ORACLE数据库告警日志

时间:2022-05-22 16:01:14

ORACLE的告警日志里面包含许多有用的信息,尤其是一些ORACLE的ORA错误信息,所以有必要及时归档、监控数据库告警日志的ORA错误,及时提醒数据库管理员DBA处理这些错误信息,那么我们首先来看看告警日志的内容片断:

Thread 1 advanced to log sequence 37749 (LGWR switch)
Current log# 6 seq# 37749 mem# 0: /u01/oradata/SCM2/redo06.log
Thu Jun 27 15:02:30 2013
Thread 1 advanced to log sequence 37750 (LGWR switch)
Current log# 2 seq# 37750 mem# 0: /u01/oradata/SCM2/redo02.log
Thu Jun 27 15:13:43 2013
Thread 1 advanced to log sequence 37751 (LGWR switch)
Current log# 3 seq# 37751 mem# 0: /u01/oradata/SCM2/redo03.log
Thu Jun 27 15:25:30 2013
Thread 1 advanced to log sequence 37752 (LGWR switch)
Current log# 4 seq# 37752 mem# 0: /u01/oradata/SCM2/redo04.log
Thu Jun 27 15:32:20 2013
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/admin/SCM2/bdump/scm2_s001_14052.trc.
Thu Jun 27 15:35:05 2013
Thread 1 advanced to log sequence 37753 (LGWR switch)
Current log# 5 seq# 37753 mem# 0: /u01/oradata/SCM2/redo05.log
Thu Jun 27 15:43:11 2013
Thread 1 advanced to log sequence 37754 (LGWR switch)
Current log# 1 seq# 37754 mem# 0: /u01/oradata/SCM2/redo01.log
Thu Jun 27 15:49:58 2013
Thread 1 advanced to log sequence 37755 (LGWR switch)
Current log# 6 seq# 37755 mem# 0: /u01/oradata/SCM2/redo06.log
Thu Jun 27 16:01:25 2013
Thread 1 advanced to log sequence 37756 (LGWR switch)
Current log# 2 seq# 37756 mem# 0: /u01/oradata/SCM2/redo02.log
Thu Jun 27 16:12:14 2013
Thread 1 advanced to log sequence 37757 (LGWR switch)
Current log# 3 seq# 37757 mem# 0: /u01/oradata/SCM2/redo03.log
Thu Jun 27 16:24:10 2013
Thread 1 advanced to log sequence 37758 (LGWR switch)

归档告警日志文件

告警日志文件如果不加管理的话,那么文件会持续增长,有时候文件会变得非常大,不利于读写。一般建议将告警日志按天归档,归档文件保留三个月(视情况而定),下面来看看将告警日志文件归档的两个Shell脚本:

alert_log_archive.sh version 1
  1. #*************************************************************************
  2. #  FileName     :alert_log_archive.sh
  3. #*************************************************************************
  4. #  Author       :Kerry
  5. #  CreateDate   :2013-07-02
  6. #  blogs       :www.cnblogs.com/kerrycode
  7. #  Description  :this script is made the alert log archived every day
  8. #*************************************************************************
  9. #! /bin/bash
  10. date=`date +%Y%m%d`
  11. alert_log_path="$ORACLE_BASE/admin/$ORACLE_SID/bdump"
  12. alert_log_file="alert_$ORACLE_SID.log"
  13. alert_arc_file="alert_$ORACLE_SID.log""."${date}
  14. cd ${alert_log_path};
  15. if [ ! -e "${alert_log_file}" ]; then
  16. echo "the alert log didn't exits, please check file path is correct!";
  17. exit;
  18. fi
  19. if [ -e ${alert_arc_file} ];then
  20. echo "the alert log file have been archived!"
  21. else
  22. cat ${alert_log_file} >> ${alert_arc_file}
  23. cat /dev/null > ${alert_log_file}
  24. fi

其实脚本1和脚本差别不大,仅仅是mv与cat >>的区别

alert_log_archive.sh version 2
  1. #*************************************************************************
  2. #  FileName     :alert_log_archive.sh
  3. #*************************************************************************
  4. #  Author       :Kerry
  5. #  CreateDate   :2013-07-02
  6. #  blogs       :www.cnblogs.com/kerrycode
  7. #  Description  :this script is made the alert log archived every day
  8. #*************************************************************************
  9. #! /bin/bash
  10. date=`date +%Y%m%d`
  11. alert_log_path="$ORACLE_BASE/admin/$ORACLE_SID/bdump"
  12. alert_log_file="alert_$ORACLE_SID.log"
  13. alert_arc_file="alert_$ORACLE_SID.log""."${date}
  14. cd ${alert_log_path};
  15. if [ ! -e "${alert_log_file}" ]; then
  16. echo "the alert log didn't exits, please check file path is correct!";
  17. exit;
  18. fi
  19. if [ -e ${alert_arc_file} ];then
  20. echo "the alert log file have been archived!"
  21. else
  22. mv ${alert_log_file}  ${alert_arc_file}
  23. cat /dev/null > ${alert_log_file}
  24. fi

然后在crontab定时任务里面加上下面语句,每天23点59对告警日志进行归档。

[oracle@DB-Server scripts]$ crontab -l

# the alert log archived every day                    Add by kerry 2013-07-02

59 23 * * * /home/oracle/scripts/alert_log_archive.sh >/dev/null 2>$1

细心的朋友可能已经发现上面的脚本、配置错误了,我在部署测试的过程中,是指定二十分钟执行一次,但是等了四十分钟,发现定时任务一次都没有执行,手工执行上面脚本是完全没有问题的,最后仔细的检查一遍,居然发现悲剧的发现时自己一时粗心将&符号写成了$,真是很二的一个错误

59 23 * * * /home/oracle/scripts/alert_log_archive.sh >/dev/null 2>$1

59 23 * * * /home/oracle/scripts/alert_log_archive.sh >/dev/null 2>&1

接下来测试发现脚本执行有问题,在crontab 里执行该shell脚本时,获取不到ORACLE的环境变量,这是因为crontab环境变量问题,Crontab的环境默认情况下并不包含系统中当前用户的环境。所以,你需要在shell脚本中添加必要的环境变量的设置,修改的脚本如下:

alert_log_archive.sh V1
  1. #*************************************************************************
  2. #  FileName     :alert_log_archive.sh
  3. #*************************************************************************
  4. #  Author       :Kerry
  5. #  CreateDate   :2013-07-02
  6. #  blogs       :www.cnblogs.com/kerrycode
  7. #  Description  :this script is made the alert log archived every day
  8. #*************************************************************************
  9. #! /bin/bash
  10. # these solved the oracle variable problem.
  11. export ORACLE_SID=gps
  12. export ORACLE_BASE=/u01/app/oracle
  13. date=`date +%Y%m%d`
  14. alert_log_path="$ORACLE_BASE/admin/$ORACLE_SID/bdump"
  15. alert_log_file="alert_$ORACLE_SID.log"
  16. alert_arc_file="alert_$ORACLE_SID.log""."${date}
  17. cd ${alert_log_path};
  18. if [ ! -e "${alert_log_file}" ]; then
  19. echo "the alert log didn't exits, please check file path is correct!";
  20. exit;
  21. fi
  22. if [ -e ${alert_arc_file} ];then
  23. echo "the alert log file have been archived!"
  24. else
  25. cat ${alert_log_file} >> ${alert_arc_file}
  26. cat /dev/null > ${alert_log_file}
  27. fi
alert_log_archive.sh V2
  1. #*************************************************************************
  2. #  FileName     :alert_log_archive.sh
  3. #*************************************************************************
  4. #  Author       :Kerry
  5. #  CreateDate   :2013-07-0
  6. #  blogs       :www.cnblogs.com/kerrycode
  7. #  Description  :this script is made the alert log archived every day
  8. #*************************************************************************
  9. #! /bin/bash
  10. # these solved the oracle variable problem.
  11. export ORACLE_SID=gps
  12. export ORACLE_BASE=/u01/app/oracle
  13. date=`date +%Y%m%d`
  14. alert_log_path="$ORACLE_BASE/admin/$ORACLE_SID/bdump"
  15. alert_log_file="alert_$ORACLE_SID.log"
  16. alert_arc_file="alert_$ORACLE_SID.log""."${date}
  17. cd ${alert_log_path};
  18. if [ ! -e "${alert_log_file}" ]; then
  19. echo "the alert log didn't exits, please check file path is correct!";
  20. exit;
  21. fi
  22. if [ -e ${alert_arc_file} ];then
  23. echo "the alert log file have been archived!"
  24. else
  25. mv ${alert_log_file}  ${alert_arc_file}
  26. cat /dev/null > ${alert_log_file}
  27. fi

监控告警日志文件

接下来看看如何监控告警日志文件的ORA错误,这里是采用Perl结合Shell的方式,因为Shell获取错误的时间、行数等不如Perl操作字符串方便。

monitoring_alert_log.pl
  1. #**********************************************************************************
  2. #       FileName         :monitoring_alert_log.pl
  3. #**********************************************************************************
  4. #       Author           :Kerry
  5. #       CreateDate       :2013-07-01
  6. #       blogs           :www.cnblogs.com/kerrycode
  7. #       Description      :check the alert log and find out the ora error
  8. #**********************************************************************************
  9. #    Modified Date    Modified User     Version   Modified Reason
  10. #    2013-07-02         Kerry          V01.0.1    add comment for this script
  11. #***********************************************************************************
  12. #! /usr/bin/perl
  13. use strict;
  14. my($argv) = @ARGV;
  15. if ( @ARGV != 1)
  16. {
  17. print '
  18. Parameter error:  you must assined the alert log file as a input parameter or the number of prarameter is not right.
  19. ';
  20. exit
  21. }
  22. if( ! -e $argv )
  23. {
  24. print '
  25. Usage: monitoring_alert_log.pl
  26. $ cat alert_[sid].log | monitoring_alert_log.pl
  27. $ tail -f alert_[sid].log | monitoring_alert_log.pl
  28. $ monitoring_alert_log.pl alert_[sid].log
  29. ';
  30. exit;
  31. }
  32. my $err_regex = '^(\w+ \w+ \d{2} \d{2}:\d{2}:\d{2} \d{4})|(ORA-\d+:.+)$';
  33. my $date = "";
  34. my $line_counter = 0;
  35. while ( <> )
  36. {
  37. $line_counter++;
  38. if( m/$err_regex/oi )
  39. {
  40. if ($1)
  41. {
  42. $date = $1;
  43. next;
  44. }
  45. print "$line_counter | $date | $2 \n" if ($2);
  46. }
  47. }
monitoring_alert_log.sh
  1. #**********************************************************************************
  2. #    FileName     :            monitoring_alert_log.sh
  3. #**********************************************************************************
  4. #    Author       :            Kerry
  5. #    CreateDate   :            2013-07-01
  6. #    blogs       :            www.cnblogs.com/kerrycode
  7. #    Description:            check the alert log and find out the ora error
  8. #**********************************************************************************
  9. #    Modified Date    Modified User  Version             Modified Reason
  10. #   2013-07-02          Kerry        V01.0.1             add comment and modified script
  11. #***********************************************************************************
  12. #!/bin/bash
  13. # these solved the oracle variable problem.
  14. export ORACLE_SID=gsp
  15. export ORACLE_BASE=/u01/app/oracle
  16. logfile="/home/oracle/scripts/alter_err_log.txt"
  17. pl_monitoring_alert="/home/oracle/scripts/monitoring_alert_log.pl"
  18. pl_sendmail="/home/oracle/scripts/sendmail.pl"
  19. alert_logfile="$ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_$ORACLE_SID.log"
  20. #delete the old alter error log file
  21. rm -f${logfile}
  22. rm -f${pl_sendmail}
  23. #run the perl and check if exists the ora error
  24. perl ${pl_monitoring_alert} ${alert_logfile}> ${logfile}
  25. #if have no error in alert log then exit the program
  26. if [[ -e "${logfile}"  &&  ! -s "${logfile}"  ]]; then
  27. exit;
  28. fi
  29. date_today=`date +%Y_%m_%d`
  30. subject="Monitoring the Oracle Alert logs and find ora errors"
  31. content="Dear All,
  32. The Instance ${ORACLE_SID}\' alert log occured the ora errors ,please see the detail in attachment and take action for it. many thanks!
  33. Oracle Alert Services
  34. "
  35. echo "#!/usr/bin/perl" >> ${pl_sendmail}
  36. echo "use Mail::Sender;" >> ${pl_sendmail}
  37. echo "\$sender = new Mail::Sender {smtp => '10.xxx.xxx.xxx', from => 'xxxx@xxxx.com'}; ">> ${pl_sendmail}
  38. echo "\$sender->MailFile({to => 'kerry@xxxxx.com',">> ${pl_sendmail}
  39. echo "cc=>'konglb@esquel.com'," >> ${pl_sendmail}
  40. echo "subject => '$subject',">> ${pl_sendmail}
  41. echo "msg => '$content',">> ${pl_sendmail}
  42. echo "file => '$logfile'});">> ${pl_sendmail}
  43. perl ${pl_sendmail}

*/20 6-21 * * * /home/oracle/scripts/monitoring_alert_log.sh  >/dev/null 2>&1

问题/优化脚本:Crontab 定时任务配置每二十分钟执行一次,结果,又有麻烦事情来了,假如8点发生了ORA错误,之后到下午6点都没有发生ORA错误,上面的脚本会每隔二十分钟发送一次邮件,重复发送,感觉比较烦人,而我需要的是:只有当新的ORA错误出现,才给DBA发送邮件,否则就不要发送,其次,感觉二十分钟的时间段太长了,如果出现了严重错误,二十分钟后才去处理,就显得时延比较滞后,但是如果你频率短的话, 基于第一个bug,你回收到N多邮件,那么我们继续改写,优化下面脚本吧

Code Snippet
  1. #****************************************************************************************************
  2. #       FileName         :monitoring_alert_log.sh
  3. #****************************************************************************************************
  4. #       Author           :Kerry
  5. #       CreateDate       :2013-07-01
  6. #       Description      :check the alert log and find out the ora error
  7. #****************************************************************************************************
  8. #       Modified Date  Modified User     Version      Modified Reason
  9. #       2013-07-02       Kerry        V01.0.1      add comment and modified script
  10. #       2013-07-02       Kerry        V01.0.2      Solved the email repated send problems, only
  11. #                                                   the new ora error occured then send the email.
  12. #****************************************************************************************************
  13. #!/bin/bash
  14. # these solved the oracle variable problem.
  15. export ORACLE_SID=gsp
  16. export ORACLE_BASE=/u01/app/oracle
  17. new_log_file="/home/oracle/scripts/new_err_log.txt"
  18. old_log_file="/home/oracle/scripts/old_err_log.txt"
  19. pl_monitoring_alert="/home/oracle/scripts/monitoring_alert_log.pl"
  20. pl_sendmail="/home/oracle/scripts/sendmail.pl"
  21. alert_logfile="$ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_${ORACLE_SID}.log"
  22. #delete the old alter error log file
  23. #rm -f${new_log_file}
  24. rm -f${old_log_file}
  25. mv ${new_log_file}${old_log_file}
  26. rm -f${pl_sendmail}
  27. #run the perl and check if exists the ora error
  28. perl ${pl_monitoring_alert} ${alert_logfile}> ${new_log_file}
  29. #if have no error in alert log then exit the program
  30. if [[ -e "${new_log_file}"  &&  ! -s "${new_log_file}"  ]]; then
  31. exit;
  32. fi
  33. new_err_num=`cat ${new_log_file} | wc -l`
  34. old_err_num=`cat ${old_log_file} | wc -l`
  35. if [ ${new_err_num} -le ${old_err_num} ]; then
  36. exit
  37. fi
  38. date_today=`date +%Y_%m_%d`
  39. subject="xxx (192.168.xxx.xxx) Monitoring the Oracle Alert logs and find ora errors"
  40. content="Dear All,
  41. The Instance ${ORACLE_SID}\' alert log occured the ora errors ,please see the detail in attachment and take action for it. many thanks!
  42. Oracle Alert Services
  43. "
  44. echo "#!/usr/bin/perl" >> ${pl_sendmail}
  45. echo "use Mail::Sender;" >> ${pl_sendmail}
  46. echo "\$sender = new Mail::Sender {smtp => '10.xxx.xxx.xxx', from => 'xxxx@xxxx.com'}; ">> ${pl_sendmail}
  47. echo "\$sender->MailFile({to => 'kerry@xxxxxx.com',">> ${pl_sendmail}
  48. echo "cc=>'xxxxx@xxxxx.com'," >> ${pl_sendmail}
  49. echo "subject => '$subject',">> ${pl_sendmail}
  50. echo "msg => '$content',">> ${pl_sendmail}
  51. echo "file => '${new_log_file}'});">> ${pl_sendmail}
  52. perl ${pl_sendmail}

但是我在部署过程中,由于环境问题(多台ORACLE服务器,不同的操作系统、不同的环境),发送邮件的部分出现改动,又有下面两个小版本的改动

Code Snippet
  1. #**********************************************************************************
  2. #    FileName       :            monitoring_alert_log.sh
  3. #**********************************************************************************
  4. #    Author         :            Kerry
  5. #    CreateDate     :            2013-07-01
  6. #    Description:            check the alert log and find out the ora error
  7. #***********************************************************************************
  8. #    Modified Date    Modified User   Version                 Modified Reason
  9. #    2013-07-02       Kerry            V01.0.1            add comment and modified script
  10. #   2013-07-02        Kerry            V01.0.2     Solved the email repated send problems, only
  11. #                                                  the new ora error occured then send the email
  12. #***********************************************************************************
  13. #!/bin/bash
  14. new_log_file="/home/oracle/scripts/new_err_log.txt"
  15. old_log_file="/home/oracle/scripts/old_err_log.txt"
  16. pl_monitoring_alert="/home/oracle/scripts/monitoring_alert_log.pl"
  17. email_content="/home/oracle/scripts/sendmail.txt"
  18. alert_logfile="$ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_$ORACLE_SID.log"
  19. #delete the old alter error log file
  20. rm -f${old_log_file}
  21. mv ${new_log_file} ${old_log_file}
  22. rm -f${pl_sendmail}
  23. #run the perl and check if exists the ora error
  24. perl ${pl_monitoring_alert} ${alert_logfile}> ${new_log_file}
  25. #if have no error in alert log then exit the program
  26. if [[ -e "${new_log_file}"  &&  ! -s "${new_log_file}"  ]]; then
  27. exit;
  28. fi
  29. new_err_num=`cat ${new_log_file} | wc -l`
  30. old_err_num=`cat ${old_log_file} | wc -l`
  31. if [ ${new_err_num} -le ${old_err_num} ]; then
  32. exit
  33. fi
  34. date_today=`date +%Y_%m_%d`
  35. subject="Monitoring the Oracle Alert logs and find ora errors"
  36. content="Dear All,
  37. The Instance ${ORACLE_SID}\' alert log occured the ora errors ,please see the detail in attachment and take action for it. many thanks!
  38. The Error is blow :
  39. "
  40. echo 'Content-Type: text/html' > ${email_content}
  41. echo 'To: xxxxx@xxxxx.com' >> ${email_content}
  42. echo ${subject} >> ${email_content}
  43. echo '<pre style="font-family: courier; font-size: 9pt">' >> ${email_content}
  44. echo ${content} >> ${email_content}
  45. cat ${new_log_file} >>${email_content} 2>&1
  46. echo 'Oracle Alert Services' >> ${email_content}
  47. /usr/sbin/sendmail -t -f ${subject} < ${email_content}
  48. rm -f ${email_content}
Code Snippet
  1. #**********************************************************************************
  2. #    FileName     :            monitoring_alert_log.sh
  3. #**********************************************************************************
  4. #    Author         :            Kerry
  5. #    CreateDate :2013-07-01
  6. #    Description:            check the alert log and find out the ora error
  7. #***********************************************************************************
  8. #    Modified Date    Modified User   Version                 Modified Reason
  9. #   2013-07-02        Kerry            V01.0.1            add comment and modified script
  10. #   2013-07-02        Kerry            V01.0.2            Solved the email repated send problems, only
  11. #                                                         the new ora error occured then send the email
  12. #***********************************************************************************
  13. #!/bin/bash
  14. new_log_file="/home/oracle/scripts/new_err_log.txt"
  15. old_log_file="/home/oracle/scripts/old_err_log.txt"
  16. pl_monitoring_alert="/home/oracle/scripts/monitoring_alert_log.pl"
  17. email_content="/home/oracle/scripts/sendmail.pl"
  18. alert_logfile="$ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_$ORACLE_SID.log"
  19. reportname="alert_log_err.txt"
  20. #delete the old alter error log file
  21. rm -f${old_log_file}
  22. mv ${new_log_file} ${old_log_file}
  23. rm -f${pl_sendmail}
  24. #run the perl and check if exists the ora error
  25. perl ${pl_monitoring_alert} ${alert_logfile}> ${new_log_file}
  26. #if have no error in alert log then exit the program
  27. if [[ -e "${new_log_file}"  &&  ! -s "${new_log_file}"  ]]; then
  28. exit;
  29. fi
  30. date_today=`date +%Y_%m_%d`
  31. subject="Monitoring the Oracle Alert logs and find ora errors"
  32. content="Dear All,
  33. The Instance ${ORACLE_SID}\' alert log occured the ora errors ,please see the detail in attachment and take action for it. many thanks!
  34. Oracle Alert Services
  35. "
  36. ( ${content} ; uuencode ${new_log_file} ${reportname} ) | /bin/mail -s ${subject} xxxx@xxxx.com xxxxx@xxx.com
  37. /bin/mail