PHP preg_split: using the delimiters as array keys

Date: 2021-12-28 03:29:06

I need to split a string by a regex delimiter, but I need each delimiter as the array key of the text that follows it.

Here is an example string:

*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times

The delimiter is an asterisk (*) followed by two uppercase letters or digits. I use this regex pattern: /\*[A-Z0-9]{2}/

This is my preg_split call:

$attributes = preg_split('/\*[A-Z0-9]{2}/', $line);

This works, but I need each matching delimiter as the key of the value in an associative array.

What I get looks like this:

$matches = [
    0 => 'the title',
    1 => 'the author',
    2 => 'other useless infos',
    3 => 'other useful infos',
    4 => 'some delimiters can be there multiple times',
];
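
As a side note on the result above: because the example string starts with a delimiter, a plain preg_split call should also return an empty string in front of the first value. The sketch below, using a hypothetical shortened $line, shows that and how the PREG_SPLIT_NO_EMPTY flag drops it:

```php
<?php
// Hypothetical shortened input in the same format as the example string.
$line = '*01the title*35the author';

// Without flags, the empty piece before the first delimiter is kept.
$plain = preg_split('/\*[A-Z0-9]{2}/', $line);

// PREG_SPLIT_NO_EMPTY removes empty pieces from the result.
$noEmpty = preg_split('/\*[A-Z0-9]{2}/', $line, -1, PREG_SPLIT_NO_EMPTY);

var_export($plain);
echo PHP_EOL;
var_export($noEmpty);
echo PHP_EOL;
```

Either way the delimiters themselves are discarded, which is the actual problem here.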

It should look like this:

$matches = [
    '*01' => 'the title',
    '*35' => 'the author',
    '*A7' => 'other useless infos',
    '*AE' => [
        'other useful infos',
        'some delimiters can be there multiple times',
    ],
];

Does anyone have suggestions on how to achieve this?

3 answers

#1


2  

Use the PREG_SPLIT_DELIM_CAPTURE flag of the preg_split function to also get the captured delimiter (see documentation).

So in your case:

# The -1 is the limit parameter (no limit)
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

Element 0 of $attributes now holds everything before the first delimiter. After that, the array alternates between a captured delimiter and the text that follows it, so you can build your $matches array like this (assuming you do not want to keep element 0):

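$matches = [];
// Odd indexes hold the delimiters, the following even index holds the text.
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    $matches[$attributes[$i]] = $attributes[$i + 1];
}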
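
To make the alternating layout concrete, here is a minimal sketch on a hypothetical shortened $line (the input is an assumption, the pattern and flag are the ones used above):

```php
<?php
// Hypothetical shortened input in the same format as the question.
$line = '*01the title*35the author';

$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
// Layout: leading text, then delimiter / text pairs.
var_export($attributes);
echo PHP_EOL;

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    // $attributes[$i] is the delimiter, $attributes[$i + 1] the text after it.
    $matches[$attributes[$i]] = $attributes[$i + 1];
}
var_export($matches);
echo PHP_EOL;
```

Note that with this simple loop a repeated delimiter overwrites the earlier value stored under the same key.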

In order to account for delimiters being present multiple times you can adjust the line inside the for loop to check whether this key already exists and in that case create an array.

Edit: one way to create an array only when a delimiter occurs more than once:

$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    $key = $attributes[$i];
    if (array_key_exists($key, $matches)) {
        // Second occurrence of this key: wrap the existing string in an array first.
        if (!is_array($matches[$key])) {
            $matches[$key] = [$matches[$key]];
        }
        array_push($matches[$key], $attributes[$i + 1]);
    } else {
        $matches[$key] = $attributes[$i + 1];
    }
}

The downstream code can certainly be simplified, especially if you put all values in (possibly single element) arrays.
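
As a rough sketch of that simplification, assuming it is acceptable for every key to map to an array (the shortened $line is a hypothetical input):

```php
<?php
// Hypothetical shortened input with one repeated delimiter.
$line = '*01the title*AEfirst*AEsecond';

$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    // The [] append creates the inner array on first use, so no key check is needed.
    $matches[$attributes[$i]][] = $attributes[$i + 1];
}
var_export($matches);
echo PHP_EOL;
```

The trade-off is that consumers always read a list, even for keys that occur once, which tends to keep the downstream code free of is_array checks.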

#2


1  

You may match and capture the keys into Group 1 and all the text before the next delimiter into Group 2 where the delimiter is not the same as the first one captured. Then, in a loop, check all the keys and values and split those values with the delimiter pattern where it appears one or more times.

The regex is

(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)

See the regex demo.

Details

  • (\*[A-Z0-9]{2}) - Delimiter, Group 1: a * followed by two uppercase letters or digits
  • (.*?) - Value, Group 2: any zero or more chars other than line break chars, as few as possible
  • (?=(?!\1)\*[A-Z0-9]{2}|$) - a positive lookahead that ends the value right before the next delimiter (\*[A-Z0-9]{2}) whose text differs from Group 1 ((?!\1)), or at the end of the string ($).

See the PHP demo:

$re = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$res = [];
if (preg_match_all($re, $str, $m, PREG_SET_ORDER)) {
    foreach ($m as $kvp) {
        // Group 2 may still contain repeats of the same delimiter; split on them.
        // {2} (not +) keeps an uppercase first letter of a value from being consumed.
        $tmp = preg_split('~\*[A-Z0-9]{2}~', $kvp[2]);
        if (count($tmp) > 1) {
            $res[$kvp[1]] = $tmp;
        } else {
            $res[$kvp[1]] = $kvp[2];
        }
    }
    print_r($res);
}

Output:

Array
(
    [*01] => the title
    [*35] => the author
    [*A7] => other useless infos
    [*AE] => Array
        (
            [0] => other useful infos
            [1] => some delimiters can be there multiple times
        )

)
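
One caveat, as far as I can tell: the (?!\1) lookahead only groups a repeated delimiter when the repeats are adjacent. If the same delimiter comes back after a different one, the later value overwrites the earlier one in $res. The following is a hedged sketch of a variant that matches every delimiter/value pair separately and merges on the key instead. The simplified pattern and the sample $str are assumptions, not part of the answer above:

```php
<?php
// Hypothetical input where *AE repeats with another delimiter in between.
$str = '*AEfirst*01the title*AEsecond';

// Each match is one delimiter (Group 1) plus the text up to the next delimiter (Group 2).
$re = '/(\*[A-Z0-9]{2})(.*?)(?=\*[A-Z0-9]{2}|$)/';

$res = [];
if (preg_match_all($re, $str, $m, PREG_SET_ORDER)) {
    foreach ($m as $kvp) {
        $key = $kvp[1];
        $value = $kvp[2];
        if (!array_key_exists($key, $res)) {
            $res[$key] = $value;
        } else {
            // (array) turns an existing string into a one-element array before merging.
            $res[$key] = array_merge((array) $res[$key], [$value]);
        }
    }
}
var_export($res);
echo PHP_EOL;
```

Keys that occur once stay plain strings, so the result has the same shape as the output shown above.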

#3


0  

OK, I'll answer my own question about how to handle a delimiter that occurs several times. Thanks to @markus-ankenbrand for the start:

$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    if (isset($matches[$attributes[$i]]) && is_array($matches[$attributes[$i]])) {
        $matches[$attributes[$i]][] = $attributes[$i + 1];
    } elseif (isset($matches[$attributes[$i]]) && !is_array($matches[$attributes[$i]])) {
        $currentValue = $matches[$attributes[$i]];
        $matches[$attributes[$i]] = [$currentValue];
        $matches[$attributes[$i]][] = $attributes[$i + 1];
    } else {
        $matches[$attributes[$i]] = $attributes[$i + 1];
    }
}

The bulky if/else statement does not look very nice, but it does what it needs to do.
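
If the branching bothers you, one possible alternative is to collect every value into a list first and unwrap the single-element lists afterwards. This is only a sketch: the helper name groupByDelimiter() is made up for illustration, and it assumes the same pattern as above.

```php
<?php
// Hypothetical helper; the name is not from the original answers.
function groupByDelimiter(string $line): array
{
    $attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

    // First pass: store every value in a list under its delimiter.
    $grouped = [];
    for ($i = 1; $i < count($attributes) - 1; $i += 2) {
        $grouped[$attributes[$i]][] = $attributes[$i + 1];
    }

    // Second pass: unwrap lists that hold exactly one value.
    // array_map keeps the string keys when it is given a single array.
    return array_map(function (array $values) {
        return count($values) === 1 ? $values[0] : $values;
    }, $grouped);
}

$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$result = groupByDelimiter($line);
print_r($result);
```

For the example string from the question this should produce the requested structure, with '*AE' mapping to an array of two strings and the other keys mapping to plain strings.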
