Problem parsing logs with a regular expression

Time: 2021-12-12 01:23:18

I have tried separating the wowza logs using regex for data analysis, but I couldn't separate the section below.

I need a SINGLE regex pattern that matches both of the log formats below.

Format 1:

live wowz://test1.example.com:443/live/_definst_/demo01|wowz://test2.example.com:443/live/_definst_/demo01 test

Format 2:

live demo01 test

I am trying to split the line into its 3 parameters and capture them in the groups app, streamname and id, but streamname should capture only the text after the last /.

This is what I've tried:

(?<stream_name>[^/]+)$ --> Using this pattern I could only separate the format 1 "wowz" section. Not entire Format 1 example mentioned above.

Expected Output

{
    "app": [
        [
            "live"
        ]
    ],
    "streamname": [
        [
            "demo01"
        ]
    ],
    "id": [
        [
            "test"
        ]
    ]
}

1 solution

#1


2  

You can achieve what you specified using the following regex:

^(?<app>\S+) (?:\S*/)?(?<streamname>\S+) (?<id>\S+)$

regex101 demo


  • \S+ matches one or more non-whitespace characters.

  • (?:\S*/)? optionally consumes everything in the second parameter up to and including the last /. Because \S* is greedy, it backtracks to the last / rather than stopping at the first. The group is non-capturing, so this prefix does not end up in streamname.
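
To check the pattern against both formats, here is a minimal sketch in Python. It assumes Python's re module, which writes named groups as (?P<name>...) instead of (?<name>...); the rest of the pattern is unchanged. The helper name parse_line is hypothetical and only there for illustration.

```python
import re

# Same pattern as in the answer, with named groups in Python's (?P<name>...) syntax.
LOG_LINE = re.compile(r"^(?P<app>\S+) (?:\S*/)?(?P<streamname>\S+) (?P<id>\S+)$")


def parse_line(line):
    """Return a dict with app, streamname and id, or None if the line does not match.

    parse_line is a hypothetical helper name, not part of any library.
    """
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# The two sample lines from the question.
format_1 = (
    "live wowz://test1.example.com:443/live/_definst_/demo01"
    "|wowz://test2.example.com:443/live/_definst_/demo01 test"
)
format_2 = "live demo01 test"

for line in (format_1, format_2):
    print(parse_line(line))
```

Both lines should yield app "live", streamname "demo01" and id "test". In Format 2 the second parameter contains no /, so the optional group matches nothing and streamname takes the whole parameter.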
