PHP explode: split a string into words using a space delimiter

Time: 2022-03-07 01:43:00

$str = "This is a    string";
$words = explode(" ", $str);

This works, but the extra spaces end up in the array as empty strings:

$words === array ('This', 'is', 'a', '', '', '', 'string');//true

I would prefer to have words only with no spaces and keep the information about the number of spaces separate.

$words === array ('This', 'is', 'a', 'string');//true
$spaces === array(1,1,4);//true

Just added: (1, 1, 4) means one space after the first word, one space after the second word and 4 spaces after the third word.

Is there any way to do it fast?

Thank you.

6 solutions

#1


24  

For splitting the String into an array, you should use preg_split:

$string = 'This is a    string';
$data   = preg_split('/\s+/', $string);

Your second part (counting spaces):

$string = 'This is a    string';
preg_match_all('/\s+/', $string, $matches);
$result = array_map('strlen', $matches[0]);// [1, 1, 4]

#2


5  

$financialYear = '2015-2016'; // must be quoted: unquoted 2015-2016 is integer subtraction (-1)
$test = explode('-',$financialYear);
echo $test[0]; // 2015
echo $test[1]; // 2016

#3


2  

Here is one way, splitting the string and running a regex once, then parsing the results to see which segments were captured as the split (and therefore only whitespace), or which ones are words:

$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

$spaces = array();
$words = array_reduce( $temp, function( $result, $item) use ( &$spaces) { // $result is returned, so it needs no reference
    if( strlen( trim( $item)) === 0) {
        $spaces[] = strlen( $item);
    } else {
        $result[] = $item;
    }
    return $result;
}, array());

You can see from this demo that $words is:

Array
(
    [0] => This
    [1] => is
    [2] => a
    [3] => string
)

And $spaces is:

Array
(
    [0] => 1
    [1] => 1
    [2] => 4
)
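
For readers who find array_reduce hard to follow, the same single-split idea can be sketched with a plain foreach. This variant is an illustration rather than the answer's code; it assumes the ctype extension is available (it is enabled by default) and uses ctype_space to recognise the captured delimiters.

```php
<?php
$str = 'This is a    string';

// With PREG_SPLIT_DELIM_CAPTURE the captured whitespace runs are returned
// in between the words: ['This', ' ', 'is', ' ', 'a', '    ', 'string']
$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

$words  = [];
$spaces = [];
foreach ($temp as $item) {
    if (ctype_space($item)) {
        $spaces[] = strlen($item);  // a delimiter: record its length
    } else {
        $words[] = $item;           // a word: keep it as is
    }
}

print_r($words);
print_r($spaces);
```

If the input starts or ends with whitespace, those runs are captured as well and show up in $spaces.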

#4


1  

Another way to do it would be to use a foreach loop.

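$str = "This is a    string";
$parts = explode(" ", $str);
$words = array();
$spaces = array();
$run = 0; // length of the space run seen since the last word
foreach ($parts as $part) {
    if ($part === '') {
        $run++; // explode() yields one empty string per extra space
    } else {
        if ($run > 0) {
            $spaces[] = $run; // close the run that precedes this word
        }
        $words[] = $part;
        $run = 1; // the single space that follows this word, if any
    }
}
// $words is array('This', 'is', 'a', 'string'), $spaces is array(1, 1, 4);
// trailing spaces after the last word are not recorded.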

#5


1  

You can use preg_split() for the first array:

$str   = 'This is a    string';
$words = preg_split('#\s+#', $str);

And preg_match_all() for the $spaces array:

preg_match_all('#\s+#', $str, $m);
$spaces = array_map('strlen', $m[0]);
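
One thing to keep in mind: \s matches tabs and newlines as well as spaces. If only the literal space character should separate words, a pattern such as ' +' can be used instead. A small comparison, with a made-up input string:

```php
<?php
$str = "one\t\ttwo  three\nfour";

// \s+ treats tabs and newlines as separators and counts them too.
preg_match_all('/\s+/', $str, $any);
$anyCounts = array_map('strlen', $any[0]);

// ' +' restricts both splitting and counting to the space character.
$words = preg_split('/ +/', $str);
preg_match_all('/ +/', $str, $onlySpaces);
$spaceCounts = array_map('strlen', $onlySpaces[0]);

print_r($anyCounts);    // 2, 2, 1
print_r($words);        // "one\t\ttwo", "three\nfour"
print_r($spaceCounts);  // 2
```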

#6


0  

Here are the results of performance tests:

$str = "This is a    string";

var_dump(time());

for ($i=1;$i<100000;$i++){
//Alma Do Mundo  - the winner
$rgData = preg_split('/\s+/', $str);


preg_match_all('/\s+/', $str, $rgMatches);
$rgResult = array_map('strlen', $rgMatches[0]);// [1,1,4]


}
print_r($rgData); print_r( $rgResult);
var_dump(time());




for ($i=1;$i<100000;$i++){
//nickb
$temp = preg_split('/(\s+)/', $str, -1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
$words = array_reduce( $temp, function( $result, $item) use ( &$spaces) {
    if( strlen( trim( $item)) === 0) {
        $spaces[] = strlen( $item);
    } else {
        $result[] = $item;
    }
    return $result;
}, array());
}


print_r( $words); print_r( $spaces);
var_dump(time());

int(1378392870) Array ( [0] => This [1] => is [2] => a [3] => string ) Array ( [0] => 1 [1] => 1 [2] => 4 ) int(1378392871) Array ( [0] => This [1] => is [2] => a [3] => string ) Array ( [0] => 1 [1] => 1 [2] => 4 ) int(1378392873)
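
Since time() only has one-second resolution, the comparison is quite coarse. A finer measurement could be sketched with microtime(true). The time_it helper below is hypothetical, the array_reduce callback takes $result by value, and the actual timings will depend on the PHP version and machine, so none are shown.

```php
<?php
// Hypothetical helper: runs $fn $iterations times, returns elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = 'This is a    string';

// Approach 1: preg_split for the words, preg_match_all for the spaces.
$twoPass = function () use ($str) {
    $words = preg_split('/\s+/', $str);
    preg_match_all('/\s+/', $str, $m);
    return [$words, array_map('strlen', $m[0])];
};

// Approach 2: one preg_split with captured delimiters, then array_reduce.
$onePass = function () use ($str) {
    $temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $spaces = [];
    $words = array_reduce($temp, function ($result, $item) use (&$spaces) {
        if (trim($item) === '') {
            $spaces[] = strlen($item);
        } else {
            $result[] = $item;
        }
        return $result;
    }, []);
    return [$words, $spaces];
};

printf("two-pass: %.4f s\n", time_it($twoPass, 100000));
printf("one-pass: %.4f s\n", time_it($onePass, 100000));
```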
