运用索引快速删除aud$表6个月前的审计记录

时间:2022-10-28 22:56:42
文档课题:运用索引快速删除aud$表6个月前的审计记录.
应用场景:最近在生产库发现定时Job执行删除aud$表6个月前的记录需要1.5小时,该表大小30g,总记录数0.12 billion.
真是有点耗时,那么如何缩短该Job的执行时间呢?以下便是相关实验过程.
测试数据库:oracle 11.2.0.4 64位

创建一个含时间类型字段的大表.
SQL> create table aud$_ts as select * from aud$;

Table created.

Elapsed: 00:00:00.03
……(省略若干insert语句)
SQL> insert into aud$_ts select * from aud$_ts

41943040 rows created.

Elapsed: 00:21:15.85
SQL> insert into aud$_ts select * from aud$_ts

83886080 rows created.

Elapsed: 00:42:46.64

SQL> select count(*) from aud$_ts;

COUNT(*)
----------
167772160
查表大小.
SQL> select blocks,bytes/1024/1024 from dba_segments where owner='SYS' and segment_name='AUD$_TS';

BLOCKS BYTES/1024/1024
---------- ---------------
3596928 28101
说明:表AUD$_TS并不符合此处的要求,因为以上方法创建的表AUD$_TS中字段ntimestamp#基本都是重复值.
运用aud$_ts表创建含有时间字段随机产生的大表.
先创建表结构
SQL> create table TBL_TS_NTIMESTAMP(
id number,
user_id varchar2(64) not null,
UPDATE_NTIMESTAMP varchar2(20) not null
);
SQL> commit;
运用aud$_ts往TBL_TS_NTIMESTAMP插入用户id和随机时间.
保证每条记录对应一个随机时间.
SQL> insert into TBL_TS_NTIMESTAMP
select
rownum,
a.userid,
to_char(to_date(TRUNC(DBMS_RANDOM.VALUE(
to_number(to_char(to_date('20220301','yyyymmdd'),'J')),
to_number(to_char(to_date('20221023','yyyymmdd')+1,'J')))),'J')
+DBMS_RANDOM.VALUE(750,3450)/3600,'yyyy-MM-dd HH24:mi:ss')
from AUD$_TS a;
SQL> commit;
说明:以上语句重点是随机产生2022年3月1日到2022年10月23日,且每天登陆的具体时间落在5:00至23:00之间的时间.(该语句消耗时间13:13-14:00)
产生的数据类似如下:
SQL> select * from tbl_ts_ntimestamp where rownum<11

ID USER_ID UPDATE_NTIMESTAMP
---------- -------------------- --------------------
1 SYSTEM 2022-10-22 10:42:16
2 SYSTEM 2022-04-28 13:07:10
3 SYSTEM 2022-06-11 20:04:02
4 SYSTEM 2022-04-26 08:48:06
5 SYSTEM 2022-08-09 19:30:19
6 SYSTEM 2022-04-15 17:34:01
7 SYSTEM 2022-08-18 12:39:09
8 SYSTEM 2022-09-02 05:49:55
9 CTXSYS 2022-08-13 12:24:54
10 CTXSYS 2022-06-16 07:37:52

10 rows selected.

Elapsed: 00:00:00.00
SQL> select count(*) from tbl_ts_ntimestamp;

COUNT(*)
----------
167772160

Elapsed: 00:00:10.93
说明:表tbl_ts_ntimestamp与表aud$_ts记录数一样.
先看下在没有索引的情况下相关查询的执行计划.
SQL> explain plan for
2 select * from tbl_ts_ntimestamp where update_ntimestamp<sysdate-180;

Explained.

Elapsed: 00:00:00.04
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 220502802
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8059K| 245M| 265K (2)| 00:53:11 |
|* 1 | TABLE ACCESS FULL| TBL_TS_NTIMESTAMP | 8059K| 245M| 265K (2)| 00:53:11 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
1 - filter(INTERNAL_FUNCTION("UPDATE_NTIMESTAMP")<SYSDATE@!-180)
13 rows selected.

Elapsed: 00:00:00.14
说明:可以看到是全表扫描.
创建函数索引.
SQL> create index ind_tbl_ts_update_ntimestamp on tbl_ts_ntimestamp (to_date(update_ntimestamp,'yyyy-mm-dd hh24:mi:ss'));

Index created.

Elapsed: 00:07:13.60
查索引与表的大小.
SQL> select bytes/1024/1024 from dba_segments where segment_name='TBL_TS_NTIMESTAMP';

BYTES/1024/1024
---------------
7567

Elapsed: 00:00:00.04
SQL> select bytes/1024/1024 from dba_segments where segment_name='IND_TBL_TS_UPDATE_NTIMESTAMP';

BYTES/1024/1024
---------------
3365

Elapsed: 00:00:00.09
再次查看执行计划
SQL> explain plan for
select * from tbl_ts_ntimestamp where to_date(update_ntimestamp,'yyyy-mm-dd hh24:mi:ss')<sysdate-180;
Explained.

Elapsed: 00:00:00.00
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
Plan hash value: 2303792711
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8059K| 245M| 261K (1)| 00:52:15 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_TS_NTIMESTAMP | 8059K| 245M| 261K (1)| 00:52:15 |
|* 2 | INDEX RANGE SCAN | IND_TBL_TS_UPDATE_NTIMESTAMP | 1450K| | 3831 (1)| 00:00:46 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------
2 - access(TO_DATE("UPDATE_NTIMESTAMP",'yyyy-mm-dd hh24:mi:ss')<SYSDATE@!-180)

14 rows selected.

Elapsed: 00:00:00.02
说明:可以看到执行计划为索引范围扫描.
现创建存储过程.
SQL> create or replace procedure clear_TBL_TS_NTIMESTAMP(in_days in number) as
begin
delete sys.TBL_TS_NTIMESTAMP
where to_date(update_ntimestamp,'yyyy-mm-dd hh24:mi:ss') <
sysdate-in_days;
exception when others then null;
end;
/
删除存储过程.
SQL> drop procedure if exists clear_TBL_TS_NTIMESTAMP;
查存储过程.
SQL> select * from user_source
创建Job.
SQL> variable aud_job number;
SQL> begin
dbms_job.submit(:aud_job,'clear_TBL_TS_NTIMESTAMP(180);',sysdate,'TRUNC(sysdate+1)+3/24');
commit;
end;
/
说明:此Job在每天凌晨3点执行删除表TBL_TS_NTIMESTAMP 180天以前的记录.文档并没有将添加索引前后的查询时间进行对比,因为没加索引前的查询时间实在太长,查询出来的时间字段且无序.添加索引后查询速度快很多,查询结果显示时间字段为有序,这也符合索引的特点.
虽然没进行查询时间的对比,不过在生产环境中笔者运用以上测试所涉及的知识将删除aud$表180天之前记录的Job执行时间成功从1.5小时降低到132s.

大表产生随机时间解析.
SQL> insert into TBL_TS_NTIMESTAMP
select
rownum,
a.userid,
to_char(to_date(TRUNC(DBMS_RANDOM.VALUE(
to_number(to_char(to_date('20220301','yyyymmdd'),'J')),
to_number(to_char(to_date('20221023','yyyymmdd')+1,'J')))),'J')
+DBMS_RANDOM.VALUE(750,3450)/3600,'yyyy-MM-dd HH24:mi:ss')
from AUD$_TS a;
文档中随机时间的产生可分步理解.
首先确定2022年3月1日的日期整数.
SQL> select select to_char(to_date('20220301','yyyymmdd'),'J') from dual;

TO_CHAR
-------
2459640
生成指定范围[min_value, max_value]内的随机日期.
SQL> select to_date(2459640+trunc(dbms_random.value(0,365)),'J') from dual;

TO_DATE(2459640+TRU
-------------------
2022-10-05 00:00:00
生成随机日期+时间.
SQL> SELECT to_date(TRUNC(DBMS_RANDOM.VALUE(to_number(to_char(to_date('20220301',
'yyyymmdd'),
'J')),
to_number(to_char(to_date('20221023',
'yyyymmdd') + 1,
'J')))),
'J') + DBMS_RANDOM.VALUE(1, 3600) / 3600 prize_time
FROM dual;

PRIZE_TIME
-------------------
2022-03-30 21:36:12

说明:如此就生成指定时间内的随机时间.