How to cut off an RSS feed description after 2 sentences using preg_split?

Time: 2021-12-25 04:54:55

I want to take the description of an RSS feed, stored in $the_content, and cut it off after 2 full sentences (or after 200 words and then the next full sentence) using preg_split.

I tried a couple times, but I'm way off. I know what I want to do, but I can't seem to even start on something to make this work.

Thanks!

1 solution

#1


Properly splitting HTML is very tricky, and not worth doing with regular expressions. If you need to keep the HTML, something like a DOM text iterator will be useful.

  1. Convert the description to plain text:

    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    
  2. Take the first 200 characters (200 words is a bit too much for a sentence, isn't it?) and then look for the end of the sentence:

    // s: let . match newlines too; u: count UTF-8 characters, not bytes
    $text = preg_replace('/^(.{200}.*?[.!?]).*$/su', '\1', $text);
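
The two steps above can be wrapped in one function. This is only a minimal sketch: the function name `truncate_description` and the `$minChars` parameter are made up for illustration, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: the name and the $minChars parameter are illustrative only.
function truncate_description(string $html, int $minChars = 200): string
{
    // Step 1: drop the tags, then decode entities such as &amp; or &#39;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    // Step 2: keep at least $minChars characters, then everything up to the
    // next sentence-ending mark. s lets . match newlines; u counts characters.
    $pattern = '/^(.{' . $minChars . '}.*?[.!?]).*$/su';
    $short = preg_replace($pattern, '$1', $text);

    // preg_replace() returns null on error (e.g. invalid UTF-8),
    // so fall back to the untruncated text in that case.
    return $short ?? $text;
}
```

If the text is shorter than `$minChars`, or has no sentence-ending mark after that point, the pattern does not match and the text comes back unchanged.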
    

You could change [.!?] to something more sophisticated, e.g. require space after punctuation or require that there's no punctuation nearby:

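  (?<=[^.!?]{5})[.!?](?=[^.!?]{5})

(?=…) is a positive lookahead assertion. (?<=…) is a positive lookbehind assertion, which checks the text before the current position ((?<!…) would be the negative form). {5} means 5 times, so the punctuation mark must have 5 non-punctuation characters on each side.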
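
Since the question asked about preg_split, here is a rough sketch of the "2 full sentences" cut using the "require space after punctuation" idea. The function name `first_sentences` and the `$count` parameter are invented for this example, and it assumes `$text` is already plain text in valid UTF-8.

```php
<?php
// Hypothetical helper: the name and the $count parameter are illustrative only.
function first_sentences(string $text, int $count = 2): string
{
    // Split on whitespace that directly follows ., ! or ?. The lookbehind
    // keeps the punctuation attached to the sentence it ends.
    $parts = preg_split('/(?<=[.!?])\s+/u', trim($text), $count + 1);

    if ($parts === false) {
        return $text; // regex error (e.g. invalid UTF-8): leave the text alone
    }

    // With a limit of $count + 1, the last element holds the unsplit remainder,
    // so only the first $count pieces are kept.
    return implode(' ', array_slice($parts, 0, $count));
}
```

This will also split after abbreviations like "e.g." or "Mr.", so it is a starting point rather than a real sentence detector.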

I haven't tested it :)
