Cutting text without destroying HTML tags

Date: 2022-08-22 21:37:07

Is there a way to do this without writing my own function?

For example:

$text = 'Test <span><a>something</a> something else</span>.';
$text = cutText($text, 2, null, 20, true);
//result: Test <span><a>something</a></span>

I need to make this function robust.

My problem is similar to this thread, but I need a better solution: I would like to keep nested tags intact.

So far my algorithm is:

function cutText($content, $max_words, $max_chars, $max_word_len, $html = false) {
    $len = strlen($content);
    $res = '';

    $word_count = 0;
    $word_started = false;
    $current_word = '';
    $current_word_len = 0;

    if ($max_chars == null) {
        $max_chars = $len;
    }
    $inHtml = false;
    $openedTags = array();
    for ($i = 0; $i<$max_chars;$i++) {

        if ($content[$i] == '<' && $html) {
            $inHtml = true;
        }

        if ($inHtml) {
            $max_chars++;
        }       

        if ($html && !$inHtml) {

            if ($content[$i] != ' ' && !$word_started) {
                $word_started = true;
                $word_count++;
            }

            $current_word .= $content[$i];
            $current_word_len++;

            if ($current_word_len == $max_word_len) {
                $current_word .= '- ';
            }

            if (($content[$i] == ' ') && $word_started) {
                $word_started = false;
                $res .= $current_word;
                $current_word = '';
                $current_word_len = 0;
                if ($word_count == $max_words) {
                    return $res;
                }
            }
        }

        if ($content[$i] == '<' && $html) {
            $inHtml = true;
        }
    }
    return $res;
}

But of course it doesn't work. I thought about remembering which tags were opened and closing any that were left unclosed, but maybe there is a better way?

3 Solutions

#1


2  

This works perfectly for me:

function trimContent ($str, $trimAtIndex) {

    $beginTags = array();       
    $endTags = array();

    for($i = 0; $i < strlen($str); $i++) {
        if( $str[$i] == '<' )
            $beginTags[] = $i;
        else if($str[$i] == '>')
           $endTags[] = $i;
    }

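    foreach($beginTags as $k=>$index) {
        // The cut falls inside a tag: move it to just after that tag's '>'.
        // (+1 because substr() excludes the character at $trimAtIndex.)
        if( isset($endTags[$k]) && ( $trimAtIndex > $index ) && ($trimAtIndex <= $endTags[$k])  ) {
            $trimAtIndex = $endTags[$k] + 1;
        }
    }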

    return substr($str, 0, $trimAtIndex);
}
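
The idea behind the function above (if the cut index lands inside a tag, move it past that tag's closing `>`) can also be sketched without collecting every bracket position. This is only a rough sketch: the name `cutOutsideTag` is made up for illustration, the index is assumed to be non-negative, and attribute values are assumed not to contain a literal `>`. Like the function above, it does not close tags that are left open.

```php
<?php
// Hypothetical helper (not from the answer): cut $str at $index,
// but never in the middle of a tag.
function cutOutsideTag(string $str, int $index): string
{
    $head = substr($str, 0, $index);
    $lastOpen = strrpos($head, '<');
    $lastClose = strrpos($head, '>');

    // The cut is inside a tag when the last '<' before it has no '>' after it.
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen)) {
        $end = strpos($str, '>', strlen($head));
        // Extend the cut to just past the tag's '>';
        // if the tag is never terminated, keep the whole string.
        return $end === false ? $str : substr($str, 0, $end + 1);
    }

    return $head;
}

$text = 'Test <span><a>something</a> something else</span>.';
var_dump(cutOutsideTag($text, 8)); // index 8 falls inside "<span>"
```

With the sample string, index 8 falls inside `<span>`, so the cut is moved to just after that tag.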

#2


1  

Try something like this:

  function cutText($inputText, $start, $length) {
    $temp = $inputText;
    $res = array();
    while (strpos($temp, '>')) {
      $ts = strpos($temp, '<');
      $te = strpos($temp, '>');
      if ($ts > 0) $res[] = substr($temp, 0, $ts);
      $res[] = substr($temp, $ts, $te - $ts + 1);
      $temp = substr($temp, $te + 1, strlen($temp) - $te);
      }
    if ($temp != '') $res[] = $temp;
    $pointer = 0; 
    $end = $start + $length - 1;
    foreach ($res as &$part) {
      if (substr($part, 0, 1) != '<') {
        $l = strlen($part);
        $p1 = $pointer;
        $p2 = $pointer + $l - 1;
        $partx = "";
        if ($start <= $p1 && $end >= $p2) $partx = "";
        else {
          if ($start > $p1 && $start <= $p2) $partx .= substr($part, 0, $start-$pointer);
          if ($end >= $p1 && $end < $p2) $partx .= substr($part, $end-$pointer+1, $l-$end+$pointer);
          if ($partx == "") $partx = $part;
          }
        $part = $partx;
        $pointer += $l;
        }
      }
    return join('', $res);
    }

Parameters:

  • $inputText - input text
  • $start - position of the first character to remove
  • $length - how many characters we want to remove


Example #1 - Removing first 3 characters

  $text = 'Test <span><a>something</a> something else</span>.';
  $text = cutText($text, 0, 3);
  var_dump($text);

Output (removed "Tes")

string(47) "t <span><a>something</a> something else</span>."

Removing the first 10 characters

  $text = cutText($text, 0, 10);

Output (removed "Test somet")

string(40) "<span><a>hing</a> something else</span>."

Example #2 - Removing inner characters ("es" from "Test ")

  $text = cutText($text, 1, 2);

Output

string(48) "Tt <span><a>something</a> something else</span>."

Removing "thing something el"

  $text = cutText($text, 9, 18);

Output

string(32) "Test <span><a>some</a>se</span>."

Hope this helps.

Well, maybe this is not the best solution but it's everything I can do at the moment.
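
For comparison, the same "split into tags and text, then track an offset in the text only" idea can be written more compactly with `preg_split`. This is a sketch rather than a replacement for the function above: the name `removeTextRange` is invented here, `$length` is assumed to be non-negative, and tags are assumed not to contain `>` inside attribute values.

```php
<?php
// Hypothetical helper (not from the answer): remove $length text characters
// starting at text offset $start, leaving every tag in place.
function removeTextRange(string $html, int $start, int $length): string
{
    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold text and odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $end = $start + $length; // first text offset that is kept again
    $pos = 0;                // offset of the current piece within the tag-free text
    $out = '';

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {  // a tag: copy it unchanged
            $out .= $part;
            continue;
        }
        $len = strlen($part);
        // Clamp the removal window [$start, $end) to this piece.
        $keepHead = max(0, min($len, $start - $pos));
        $dropEnd = max(0, min($len, $end - $pos));
        $out .= substr($part, 0, $keepHead) . substr($part, $dropEnd);
        $pos += $len;
    }

    return $out;
}

$text = 'Test <span><a>something</a> something else</span>.';
var_dump(removeTextRange($text, 9, 18));
```

Using index parity to tell tags from text avoids misreading a text piece that happens to start with `<`.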

#3


1  

OK, I solved this.

I split it into two parts. First, cutting the text without destroying the HTML:

function cutHtml($content, $max_words, $max_chars, $max_word_len) {
    $len = strlen($content);
    $res = '';

    $word_count = 0;
    $word_started = false;
    $current_word = '';
    $current_word_len = 0;

    if ($max_chars == null) {
        $max_chars = $len;
    }
    $inHtml = false;
    $openedTags = array();
    $i = 0;
    $letter_count = 0; // was used below without being initialized

    // stop at the end of the string even when $max_chars is larger than it
    while ($i < $len && $i < $max_chars) {

        //skip any html tags
        if ($content[$i] == '<') {
            $inHtml = true;
            while (true) {
                $res .= $content[$i];
                $i++;
                while($content[$i] == ' ') { $res .= $content[$i]; $i++; }

                //skip any values
                if ($content[$i] == "'") {
                    $res .= $content[$i];
                    $i++;
                    while(!($content[$i] == "'" && $content[$i-1] != "\\")) {
                        $res .= $content[$i];
                        $i++;
                    }                   
                }

                //skip any values
                if ($content[$i] == '"') {
                    $res .= $content[$i];
                    $i++;
                    while(!($content[$i] == '"' && $content[$i-1] != "\\")) {
                        $res .= $content[$i];
                        $i++;
                    }                   
                }
                if ($content[$i] == '>') { $res .= $content[$i]; $i++; break;}
            }
            $inHtml = false;
        }

        if (!$inHtml) {

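            // copy spaces, stopping at the end of the string
            while ($i < $len && $content[$i] == ' ') { $res .= $content[$i]; $letter_count++; $i++; }

            $word_started = false;
            $current_word = '';
            $current_word_len = 0;
            // read one word; the bounds check avoids reading past the end of the string
            while ($i < $len && !in_array($content[$i], array(' ', '<', '.', ','))) {

                if (!$word_started) {
                    $word_started = true;
                    $word_count++;
                }

                $current_word .= $content[$i];
                $current_word_len++;

                // break up overlong words with a hyphen
                if ($current_word_len == $max_word_len) {
                    $current_word .= '-';
                    $current_word_len = 0;
                }

                $i++;
            }

            // copy '.' and ',' so the outer loop always advances past punctuation
            while ($i < $len && ($content[$i] == '.' || $content[$i] == ',')) {
                $current_word .= $content[$i];
                $i++;
            }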

            if ($letter_count > $max_chars) {
                return $res;
            }

            if ($word_count < $max_words) {
                $res .= $current_word;
                $letter_count += strlen($current_word);
            }

            if ($word_count == $max_words) {
                $res .= $current_word;
                $letter_count += strlen($current_word);
                return $res;
            }
        }

    }
    return $res;
}
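
As a much smaller illustration of this first step, here is a sketch that only enforces the word limit: it copies tags verbatim and stops before the first word over the limit. The name `cutWords` is hypothetical, the character and word-length limits of the function above are left out, and any non-whitespace token (including a lone `.`) counts as a word.

```php
<?php
// Hypothetical helper (not from the answer): keep at most $maxWords words
// of text and copy every tag met along the way unchanged.
function cutWords(string $html, int $maxWords): string
{
    // Even indexes hold text, odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    $words = 0;

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) { // a tag
            $out .= $part;
            continue;
        }
        // Split the text into words and the whitespace runs between them.
        $tokens = preg_split('/(\s+)/', $part, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $token) {
            if (preg_match('/\S/', $token)) { // a word, not whitespace
                if ($words === $maxWords) {
                    return rtrim($out);       // limit reached: stop before this word
                }
                $words++;
            }
            $out .= $token;
        }
    }

    return $out;
}

$text = 'Test <span><a>something</a> something else</span>.';
var_dump(cutWords($text, 2));
```

The result can still contain unclosed tags (here `<span>`), which is why a second step is needed.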

The next step is closing any unclosed tags:

function cleanTags(&$html) {
    $count = strlen($html);
    $i = -1;
    $openedTags = array();

    while(true) {
        $i++;
        if ($i >= $count) break;
        if ($html[$i] == '<') {

            $tag = '';
            $closeTag = '';
            $reading = false;
            //reading whole tag
            while($html[$i] != '>') {
                $i++;

                while($html[$i] == ' ') $i++; //skip any spaces (need to be idiot proof)
                if (!$reading && $html[$i] == '/') { //closing tag
                    $i++;
                    while($html[$i] == ' ') $i++; //skip any spaces

                    $closeTag = '';

                    while($html[$i] != ' ' && $html[$i] != '>') { //read the closing tag's name
                        $reading = true;
                        $html[$i] = strtolower($html[$i]); //tags to lowercase
                        $closeTag .= $html[$i];
                        $i++;
                    }
                    $c = count($openedTags);
                    if ($c > 0 && $openedTags[$c-1] == $closeTag) array_pop($openedTags);
                }

                if (!$reading) //read only tag
                while($html[$i] != ' ' && $html[$i] != '>') { //read the opening tag's name
                    $reading = true;
                    $html[$i] = strtolower($html[$i]); //tags to lowercase
                    $tag .= $html[$i];
                    $i++;
                }

                //skip any values
                if ($html[$i] == "'") {
                    $i++;
                    while(!($html[$i] == "'" && $html[$i-1] != "\\")) {
                        $i++;
                    }                   
                }

                //skip any values
                if ($html[$i] == '"') {
                    $i++;
                    while(!($html[$i] == '"' && $html[$i-1] != "\\")) {
                        $i++;
                    }                   
                }

                if ($reading && $html[$i] == '/') { //self closed tag
                    $tag = '';
                    break;
                }
            }
            if (!empty($tag)) $openedTags[] = $tag;

        }

    }

    while (count($openedTags) > 0) {
        $tag = array_pop($openedTags);
        $html .= "</$tag>";
    }
}
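
The same stack idea (push each opening tag, pop on the matching closing tag, then append whatever is still open) can be sketched far more briefly with a regular expression. Note that this swaps in a regex where the function above scans character by character, so it is not the author's method. The name `closeOpenTags` and the short list of void elements are assumptions made for illustration, and attribute values containing `>` are not handled.

```php
<?php
// Hypothetical helper (not from the answer): append closing tags
// for every element that is still open at the end of $html.
function closeOpenTags(string $html): string
{
    // Partial list of void elements that never take a closing tag (an assumption).
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    $stack = [];

    // Group 1: '/' for a closing tag, group 2: tag name, group 3: '/' when self-closed.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[3] === '/' || in_array($name, $void, true)) {
            continue; // self-closed or void: nothing to track
        }
        if ($m[1] === '/') {
            // Closing tag: pop only when it matches the innermost open tag.
            if (end($stack) === $name) {
                array_pop($stack);
            }
        } else {
            $stack[] = $name;
        }
    }

    // Close the remaining tags, innermost first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }

    return $html;
}

var_dump(closeOpenTags('Test <span><a>something</a>'));
```

Applied to a truncated string such as `Test <span><a>something</a>`, this appends the missing `</span>`, giving the result the question asks for.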

It's not foolproof, but TinyMCE will clean up the result, so further cleaning is not necessary.

It may be a little long, but I don't think it will use many resources, and it should be faster than a regex.
