Extracting attribute values from a custom tag with a regular expression

Time: 2022-11-27 07:49:47

Thanks for taking a look at this. I'm using PHP. I have a string like so:

[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don't so much dance as rhythmically convulse.[/QUOTE]

And I want to pull out the values in the quotes and create an associative array like so:

["name" => "Max-Fischer", "post" => "486662533", "member" => "123"]

Then, I would like to remove the opening and closing [QUOTE] tags and replace them with custom HTML like so:

<blockquote><a href="URL_I_WILL_GENERATE_FROM_THE_ARRAY_VALUES">Max-Fischer</a> wrote: I don't so much dance as rhythmically convulse.</blockquote>

So the main problem is writing the preg_match() or preg_replace() call to do two things: first, pull the values out into an array; second, remove the tags and replace them with my custom content. I can figure out how to build the custom HTML from the array; I just don't know regular expressions well enough to do the rest.

I tried a match like this to get the attribute values:

/(\S+)=[\"\']?((?:.(?![\"\']?\s+(?:\S+)=|[>\"\']))+.)[\"\']?/

But this only returns:

[QUOTE

And that's not even addressing how to put the values (if I can get them) into an array.

Thanks in advance for your time.

Cheers.

2 solutions

#1


If the tag you're looking for is always going to be QUOTE, then perhaps something a little simpler is possible:

  $s = '[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';

  $r = '/\[QUOTE="(.*?)"\](.*)\[\/QUOTE\]/';

  $m = array();
  $arr = array();
  preg_match($r, $s, $m);
  // $m[0] = the whole matched tag
  // $m[1] = the string of attributes
  // $m[2] = the quote itself
  foreach(explode(',', $m[1]) as $valuepair) { // split the attributes on the comma
    preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
    // $mm[0] = the attribute pairing
    // $mm[1] = the attribute name
    // $mm[2] = the attribute value
    $arr[$mm[1]] = $mm[2];
  }
  print_r($arr);
  print $m[2] . "\n";

This gives the following output:

Array
(
    [name] => Max-Fischer
    [post] => 486662533
    [member] => 123
)
I don't so much dance as rhythmically convulse.
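
The snippet above assumes the pattern always matches; if the string contains no [QUOTE] tag, $m stays empty and $m[1] is undefined. A minimal sketch of a guarded variant follows, assuming PHP 7.1 or later. The function name parse_quote and the shape of its return value are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns ['attrs' => [...], 'body' => '...'],
// or null when the string contains no [QUOTE="..."]...[/QUOTE] tag.
function parse_quote(string $s): ?array
{
    // The s modifier lets "." match newlines, so multi-line quotes are handled too.
    if (preg_match('/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/s', $s, $m) !== 1) {
        return null;
    }
    $attrs = [];
    foreach (explode(',', $m[1]) as $pair) {
        // Split on the first colon only, so a value may itself contain colons.
        $parts = explode(':', $pair, 2);
        if (count($parts) === 2) {
            $attrs[trim($parts[0])] = trim($parts[1]);
        }
    }
    return ['attrs' => $attrs, 'body' => $m[2]];
}

$parsed = parse_quote('[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]');
print_r($parsed);
var_dump(parse_quote('no tag here')); // prints NULL
```

Using explode() with a limit of 2 and trim() instead of a second regex keeps the pair-splitting step tolerant of extra spaces around the colon.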

If you want to handle the case where there is more than one quote in the string, we can do this by making the second group in the regex non-greedy and then using preg_match_all instead of preg_match:

  $s ='[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';
  $s .='[QUOTE="name: Some-Guy, post: 486562533, member: 1234"]Quidquid latine dictum sit, altum videtur[/QUOTE]';

  $r = '/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/';
  //                         ^  <--- added to make it less greedy
  $m = array();
  $arr = array();
  preg_match_all($r, $s, $m, PREG_SET_ORDER);
  // $m[0] = the first quote match
  // $m[1] = the second quote match
  // $m[0][0] = the whole text of the first match
  // $m[0][1] = its string of attributes
  // $m[0][2] = the quote itself
  // (one element in $m for each quote found in the string)
  foreach($m as $match) { // since there is more than one quote, we loop and operate on them individually
    $quote = array();
    foreach(explode(',', $match[1]) as $valuepair) { // split the attributes on the comma
      preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
      // $mm[0] = the attribute pairing
      // $mm[1] = the attribute name
      // $mm[2] = the attribute value
      $quote[$mm[1]] = $mm[2];
    }
    }
    $arr[] = $quote; // we now build a parent array, to hold each individual quote
  }
  print_r($arr);

This gives output like:

Array
(
    [0] => Array
        (
            [name] => Max-Fischer
            [post] => 486662533
            [member] => 123
        )

    [1] => Array
        (
            [name] => Some-Guy
            [post] => 486562533
            [member] => 1234
        )

)
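
The question also asks for the tags to be replaced with custom HTML. One way to do both steps in a single pass is preg_replace_callback. The following is only a sketch under stated assumptions: render_quotes and quote_url are hypothetical names, and the URL scheme is invented for illustration because the question does not give one.

```php
<?php
// Hypothetical URL builder: the real URL scheme is not given in the question.
function quote_url(array $attrs): string
{
    return '/posts/' . rawurlencode($attrs['post'] ?? '')
         . '?member=' . rawurlencode($attrs['member'] ?? '');
}

// Replaces every [QUOTE="..."]...[/QUOTE] in $s with a <blockquote>.
function render_quotes(string $s): string
{
    $out = preg_replace_callback(
        '/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/s',
        function (array $m): string {
            // Build the associative array from "key: value" pairs.
            $attrs = [];
            foreach (explode(',', $m[1]) as $pair) {
                $parts = explode(':', $pair, 2);
                if (count($parts) === 2) {
                    $attrs[trim($parts[0])] = trim($parts[1]);
                }
            }
            $name = htmlspecialchars($attrs['name'] ?? 'Someone', ENT_QUOTES);
            $url  = htmlspecialchars(quote_url($attrs), ENT_QUOTES);
            // The quote body is inserted as-is; escape it too if it may hold untrusted HTML.
            return '<blockquote><a href="' . $url . '">' . $name . '</a> wrote: '
                 . $m[2] . '</blockquote>';
        },
        $s
    );
    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $out ?? $s;
}

echo render_quotes('[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]'), "\n";
```

Because the callback runs once per match, the same function handles a string with one quote or many. Nested quotes are not handled by this pattern.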

#2


I managed to solve your problem of getting an associative array. I hope it helps.

Here is the code:

$str =  <<< PP
[QUOTE=" name : Max-Fischer,post : 486662533,member : 123 "]I don't so much dance as rhythmically convulse.[/QUOTE]
PP;

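// Capture the attribute string between the quotes of the opening tag.
preg_match_all('/^\[QUOTE=\"(.*?)\"\](?:.*?)]$/', $str, $matches);
// Split it into name/value pairs; "-" is allowed in values so "Max-Fischer" is captured whole.
preg_match_all('/([a-zA-Z0-9]+)\s+:\s+([a-zA-Z0-9-]+)/', $matches[1][0], $result);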

$your_data = array_combine($result[1],$result[2]);

echo "<pre>";
print_r($your_data);
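
With this approach, the character class used for the value decides how much of each value is captured. The short comparison below is a sketch, not part of the original answer; the variable names $narrow and $wide are made up for the example. It shows a class limited to letters and digits next to a pattern that takes everything up to the next comma.

```php
<?php
$attrs = ' name : Max-Fischer,post : 486662533,member : 123 ';

// Letters and digits only: matching stops at the hyphen in "Max-Fischer".
preg_match_all('/([a-zA-Z0-9]+)\s*:\s*([a-zA-Z0-9]+)/', $attrs, $narrow);

// Everything up to the next comma (or the end), minus trailing whitespace.
preg_match_all('/([a-zA-Z0-9]+)\s*:\s*([^,]+?)\s*(?=,|$)/', $attrs, $wide);

print_r(array_combine($narrow[1], $narrow[2])); // name is cut short to "Max"
print_r(array_combine($wide[1], $wide[2]));     // name is "Max-Fischer"
```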
