Regex to extract a specific field with additional constraints

Time: 2022-09-13 16:35:19

I have to create a regex that extracts a specific field from a large log file. I have written one, but it is not perfect because the logs contain different types of entries.

I have attached a screenshot. There are two different types of log entries, and I want to extract the real=value field.

The problem is that the matched line contains multiple real=value fields, and I want only the first occurrence.

My Regex:

CMS-concurrent-abortable-preclean:\s.*real=(?P<cms_abortable_preclean>\d+\.\d+)\s
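The pattern above returns the wrong value because `.*` is greedy: it runs to the end of the line and backtracks only as far as the last `real=`. A minimal sketch, assuming Python's `re` module (the `(?P<name>...)` group syntax suggests it, but that is an assumption), compares it with the same pattern using a lazy `.*?` and `re.search`, which stops at the first match. The names `LINES`, `GREEDY`, `LAZY` and `first_real` are illustrative only.

```python
import re

# The two sample log lines from the question.
LINES = [
    "2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) "
    "2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: "
    "[CMS-concurrent-abortable-preclean: 0.120/0.735 secs] "
    "[Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), 0.3158400 secs] "
    "886080K->161751K(6180736K), 0.3168208 secs] [Times: user=0.33 sys=0.10, real=0.32 secs]",
    "1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] "
    "[Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time "
    "8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] "
    "[Times: user=5.92 sys=0.02, real=5.57 secs]",
]

# Original pattern: greedy .* backtracks only to the LAST real= on the line.
GREEDY = re.compile(
    r"CMS-concurrent-abortable-preclean:\s.*real=(?P<cms_abortable_preclean>\d+\.\d+)\s"
)
# Same pattern with a lazy .*?, so it stops at the FIRST real= after the marker.
LAZY = re.compile(
    r"CMS-concurrent-abortable-preclean:\s.*?real=(?P<cms_abortable_preclean>\d+\.\d+)\s"
)


def first_real(pattern, line):
    """Return the named group of the first match on the line, or None."""
    m = pattern.search(line)  # search() returns only the leftmost match
    return m.group("cms_abortable_preclean") if m else None


greedy_results = [first_real(GREEDY, line) for line in LINES]
lazy_results = [first_real(LAZY, line) for line in LINES]

print(greedy_results)  # ['0.32', '5.57']
print(lazy_results)    # ['0.73', '0.17']
```

Using `search` rather than `findall` matters for the second line: it has two `CMS-concurrent-abortable-preclean` entries, and `findall` with the lazy pattern would also return `5.57`.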

Screen Shot: Sample data and Regex command

Sample Data:

2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) 2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: [CMS-concurrent-abortable-preclean: 0.120/0.735 secs] [Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), 0.3158400 secs] 886080K->161751K(6180736K), 0.3168208 secs] [Times: user=0.33 sys=0.10, real=0.32 secs]

1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] [Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time 8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] [Times: user=5.92 sys=0.02, real=5.57 secs]

I want to extract the fields in bold, i.e. only the first real= value on each line.

2 answers

#1

You could use a regex like this:

^.*?Times.*?real=(\d+(?:\.\d+))

The idea is to capture, for each line, the first real= value that belongs to a Times block.
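A minimal sketch of this idea, assuming Python's `re` module (the answer names no language; `LOG`, `PATTERN` and `matches` are illustrative names). `re.MULTILINE` lets `^` match at the start of every line, and because `.` does not match a newline, each lazy `.*?` stays within its own line, so at most one value is captured per line.

```python
import re

# The two sample log lines from the question, newline-separated.
LOG = (
    "2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) "
    "2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: "
    "[CMS-concurrent-abortable-preclean: 0.120/0.735 secs] "
    "[Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), 0.3158400 secs] "
    "886080K->161751K(6180736K), 0.3168208 secs] [Times: user=0.33 sys=0.10, real=0.32 secs]\n"
    "1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] "
    "[Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time "
    "8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] "
    "[Times: user=5.92 sys=0.02, real=5.57 secs]\n"
)

# Answer #1's pattern: from the start of a line, lazily skip to the first
# "Times", then lazily to the first "real=", and capture the number after it.
PATTERN = re.compile(r"^.*?Times.*?real=(\d+(?:\.\d+))", re.MULTILINE)

# With one capture group, findall returns just the captured strings.
matches = PATTERN.findall(LOG)
print(matches)  # ['0.73', '0.17']
```

Note that this pattern does not check for `CMS-concurrent-abortable-preclean` at all; on the sample data the first Times block on each line happens to belong to that entry.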

#2

One approach:

(CMS-concurrent-abortable-preclean:).*?(?=.*:)(?<=real=)(\d+\.\d+)

The first group matches: CMS-concurrent-abortable-preclean:
The second group captures the number: (\d+\.\d+)

Quick test:

perl -lne 'print "$1 and $2" while /(CMS-concurrent-abortable-preclean:).*?(?=.*:)(?<=real=)(\d+\.\d+)/g;' file

The output:

CMS-concurrent-abortable-preclean: and 0.73
CMS-concurrent-abortable-preclean: and 0.17
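For readers not using Perl, a minimal sketch of the same test in Python (an assumption; `LINES`, `PATTERN` and `output` are illustrative names). Python's `re` accepts the fixed-width lookbehind `(?<=real=)`. The lookahead `(?=.*:)` is what skips the last `real=` on a line: no colon follows it, so the second `CMS-concurrent-abortable-preclean` entry on the second sample line (real=5.57) produces no match. This relies on the layout of the sample data.

```python
import re

# The two sample log lines from the question.
LINES = [
    "2017-05-16T13:21:47.420+0200: 5.114: [GC (Allocation Failure) "
    "2017-05-16T13:21:47.420+0200: 5.114: [ParNew2017-05-16T13:21:47.461+0200: 5.155: "
    "[CMS-concurrent-abortable-preclean: 0.120/0.735 secs] "
    "[Times: user=1.17 sys=0.12, real=0.73 secs] : 886080K->110720K(996800K), 0.3158400 secs] "
    "886080K->161751K(6180736K), 0.3168208 secs] [Times: user=0.33 sys=0.10, real=0.32 secs]",
    "1.583: [CMS-concurrent-abortable-preclean: 0.052/0.171 secs] "
    "[Times: user=0.20 sys=0.01, real=0.17 secs] CMS: abort preclean due to time "
    "8077.162: [CMS-concurrent-abortable-preclean: 4.850/5.566 secs] "
    "[Times: user=5.92 sys=0.02, real=5.57 secs]",
]

# Answer #2's pattern: (?=.*:) requires a colon somewhere later on the line,
# and (?<=real=) requires the number to follow "real=".
PATTERN = re.compile(r"(CMS-concurrent-abortable-preclean:).*?(?=.*:)(?<=real=)(\d+\.\d+)")

# Mirror the Perl `while /.../g` loop: report every match on every line.
output = [
    f"{m.group(1)} and {m.group(2)}"
    for line in LINES
    for m in PATTERN.finditer(line)
]
for row in output:
    print(row)
```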