How do I split a string into a given number of lines?

Time: 2021-10-04 15:05:51

Here is my question:

Given a string, which is made up of space separated words, how can I split that into N strings of (roughly) even length, only breaking on spaces?

Here is what I've gathered from research:

I started by researching word-wrapping algorithms, because it seems to me that this is basically a word-wrapping problem. However, the majority of what I've found so far (and there is A LOT out there about word wrapping) assumes that the width of the line is a known input, and the number of lines is an output. I want the opposite.

I have found a (very) few questions, such as this one, that seem helpful. However, they all treat the problem as one of optimization, e.g. how can I split a sentence into a given number of lines while minimizing the raggedness of the lines, or the wasted whitespace, or whatever, and do it in linear (or N log N, or whatever) time. These questions seem to be mostly unanswered, as the optimization part of the problem is relatively "hard".

However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to have the lowest time complexity. I just need a real-world solution that can take a string and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.

Here is what I've come up with: I think I have a workable method for the case when N=3. I start by putting the first word on the first line, the last word on the last line, and then iteratively putting another word on the first and last lines, until my total width (measured by the length of the longest line) stops getting shorter. This usually works, but it gets tripped up if your longest words are in the middle of the line, and it doesn't seem very generalizable to more than 3 lines.

var getLongestHeaderLine = function(headerText) {
  //Utility function definitions
  var getLongest = function(arrayOfArrays) {
    return arrayOfArrays.reduce(function(a, b) {
      return a.length > b.length ? a : b;
    });
  };

  var sumOfLengths = function(arrayOfArrays) {
    return arrayOfArrays.reduce(function(a, b) {
      return a + b.length + 1;
    }, 0);
  };

  var getLongestLine = function(lines) {
    return lines.reduce(function(a, b) {
      return sumOfLengths(a) > sumOfLengths(b) ? a : b;
    });
  };

  var getHeaderLength = function(lines) {
    return sumOfLengths(getLongestLine(lines));
  }

  //first, deal with the degenerate cases
  if (!headerText)
    return headerText;

  headerText = headerText.trim();

  var headerWords = headerText.split(" ");

  if (headerWords.length === 1)
    return headerText;

  if (headerWords.length === 2)
    return getLongest(headerWords);

  //If we have more than 2 words in the header,
  //we need to split them into 3 lines
  var firstLine = headerWords.splice(0, 1);
  var lastLine = headerWords.splice(-1, 1);
  var lines = [firstLine, headerWords, lastLine];

  //The header length is the length of the longest
  //line in the header. We will keep iterating
  //until the header length stops getting shorter.
  var headerLength = getHeaderLength(lines);
  var lastHeaderLength = headerLength;
  while (true) {
    //Take the first word from the middle line,
    //and add it to the first line
    firstLine.push(headerWords.shift());
    headerLength = getHeaderLength(lines);
    if (headerLength > lastHeaderLength || headerWords.length === 0) {
      //If we stopped getting shorter, undo
      headerWords.unshift(firstLine.pop());
      break;
    }
    //Take the last word from the middle line,
    //and add it to the last line
    lastHeaderLength = headerLength;
    lastLine.unshift(headerWords.pop());
    headerLength = getHeaderLength(lines);
    if (headerLength > lastHeaderLength || headerWords.length === 0) {
      //If we stopped getting shorter, undo
      headerWords.push(lastLine.shift());
      break;
    }
    lastHeaderLength = headerLength;
  }

  return getLongestLine(lines).join(" ");
};

debugger;
var header = "an apple a day keeps the doctor away";

var longestHeaderLine = getLongestHeaderLine(header);
debugger;

EDIT: I tagged javascript, because ultimately I would like a solution I can implement in that language. It's not super critical to the problem though, and I would take any solution that works.

EDIT#2: While performance is not what I'm most concerned about here, I do need to be able to perform whatever solution I come up with ~100-200 times, on strings that can be up to ~250 characters long. This would be done during a page load, so it needs to not take forever. For example, I've found that trying to offload this problem to the rendering engine by putting each string into a DIV and playing with the dimensions doesn't work, since it (seems to be) incredibly expensive to measure rendered elements.

4 solutions

#1


2  

Try this. For any reasonable N, it should do the job:

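function format(srcString, lines) {
  var target = "";
  var arr = srcString.split(" ");
  var c = 0; // word characters on the current line (spaces are not counted)
  // Per-line budget: total length spread evenly over the requested lines
  var MAX = Math.ceil(srcString.length / lines);
  for (var i = 0, len = arr.length; i < len; i++) {
    var cur = arr[i];
    if (c + cur.length > MAX) {
      // The word would exceed the budget: start a new line with it
      target += '\n' + cur;
      c = cur.length;
    } else {
      // The word fits: append it, separated by a space
      if (target.length > 0)
        target += " ";
      target += cur;
      c += cur.length;
    }
  }
  return target;
}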

alert(format("this is a very very very very " +
             "long and convoluted way of creating " +
             "a very very very long string",7));

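The greedy fill above works from a per-line budget, so the number of lines it produces can differ from the number requested. A variation (not part of the answer above, just a minimal sketch) is to cut at the space nearest each ideal break position k * length / N, which never produces more than N lines and returns an array. The name `splitNearCuts` is hypothetical, and width is assumed to be measured in characters.

```javascript
// Sketch: break at the space nearest each ideal cut position (k * length / n).
// Assumes width is measured in characters; splitNearCuts is an illustrative name.
function splitNearCuts(text, n) {
  // Normalize whitespace so every separator is a single space
  var s = text.trim().replace(/\s+/g, " ");
  if (s === "") return [];
  var lines = [];
  var start = 0; // index where the current line begins
  for (var k = 1; k < n; k++) {
    var ideal = Math.round((k * s.length) / n);
    // Nearest space at or before the ideal position, and at or after it
    var before = s.lastIndexOf(" ", ideal);
    var after = s.indexOf(" ", Math.max(ideal, start));
    if (before <= start) before = -1; // a space inside an earlier line is not usable
    if (before === -1 && after === -1) break; // no spaces left: the rest is one line
    var cut;
    if (before === -1) cut = after;
    else if (after === -1) cut = before;
    else cut = (ideal - before <= after - ideal) ? before : after;
    lines.push(s.slice(start, cut));
    start = cut + 1;
  }
  lines.push(s.slice(start));
  return lines;
}

console.log(splitNearCuts("an apple a day keeps the doctor away", 3));
```

Because each cut is chosen independently, one very long word can still make a line much longer than its neighbours.
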
#2


1  

You may want to give this solution a try, using canvas. It needs optimization and is only a quick attempt, but I think canvas might be a good idea because you can calculate real widths. You can also set the font to the one actually in use, and so on. Important to note: this won't be the most performant way of doing things. It will create a lot of canvases.

DEMO

var t = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`;


// Measure the rendered width of a string in pixels
function getTextTotalWidth(text) {
  var canvas = document.createElement("canvas");
  var ctx = canvas.getContext("2d");
  ctx.font = "12px Arial";
  ctx.fillText(text, 0, 12);
  return ctx.measureText(text).width;
}

// Target width of a single line
function getLineWidth(lines, totalWidth) {
  return totalWidth / lines;
}

// Average width of a non-whitespace character
function getAverageLetterSize(text) {
  var t = text.replace(/\s/g, "").split("");
  var sum = t.map(function(d) {
    return getTextTotalWidth(d);
  }).reduce(function(a, b) { return a + b; });
  return sum / t.length;
}

function getLines(text, numberOfLines) {
  var lineWidth = getLineWidth(numberOfLines, getTextTotalWidth(text));
  var letterWidth = getAverageLetterSize(text);
  var t = text.split("");
  return createLines(t, letterWidth, lineWidth);
}

// Walk the characters and turn the first space past the target width into a line break
function createLines(t, letterWidth, lineWidth) {
  var i = 0; // estimated width of the current line
  var res = t.map(function(d) {
    if (i < lineWidth || d != " ") {
      i += letterWidth;
      return d;
    }
    i = 0;
    return "<br />";
  });
  return res.join("");
}

var div = document.createElement("div");
div.innerHTML = getLines(t, 7);
document.body.appendChild(div);
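
Since the code above creates a new canvas for every measurement, one possible optimization is to share a single 2D context and remember widths that were already measured. This is only a sketch: `makeMeasurer` is a hypothetical helper, and it assumes nothing about the context beyond a `measureText(text)` method that returns an object with a `width` property, which is what a canvas 2D context provides.

```javascript
// Sketch: wrap one 2D context in a caching width-measuring function.
// makeMeasurer is an illustrative name, not part of the answer above.
function makeMeasurer(ctx, font) {
  var cache = new Map(); // text -> measured width
  if (font) {
    ctx.font = font; // set the font once instead of on every call
  }
  return function measure(text) {
    if (!cache.has(text)) {
      cache.set(text, ctx.measureText(text).width);
    }
    return cache.get(text);
  };
}

// In a browser this could be set up as:
//   var ctx = document.createElement("canvas").getContext("2d");
//   var measure = makeMeasurer(ctx, "12px Arial");
//   measure("apple"); // width in pixels
```

The cache is keyed on the text alone, so a separate measurer would be needed for each font.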

#3


0  

I'm sorry, this is C#. I had already created my project when you updated your post with the JavaScript tag.

Since you said all you care about is roughly the same line length... I came up with this. Sorry for the simplistic approach.

    private void DoIt() {

        List<string> listofwords = txtbx_Input.Text.Split(' ').ToList();
        int totalcharcount = 0;
        int neededLineCount = int.Parse(txtbx_LineCount.Text);

        foreach (string word in listofwords)
        {
            totalcharcount = totalcharcount + word.Count(char.IsLetter);
        }

        int averagecharcountneededperline = totalcharcount / neededLineCount;
        List<string> output = new List<string>();
        int positionsneeded = 0;

        while (output.Count < neededLineCount)
        {
            string tempstr = string.Empty;
            while (positionsneeded < listofwords.Count)
            {
                // Append the word, with a separating space after the first word on the line
                tempstr += (tempstr.Length > 0 ? " " : "") + listofwords[positionsneeded];
                // If this is not the last word and the next word would push the line past the average
                if ((positionsneeded != listofwords.Count - 1) && (tempstr.Count(char.IsLetter) + listofwords[positionsneeded + 1].Count(char.IsLetter) > averagecharcountneededperline))
                {
                    if (output.Count + 1 == neededLineCount) // if we are writing the last line
                    {
                        // the last line takes everything that is left, even if it exceeds the average
                    }
                    else
                    {
                        // we're going to exceed the allowed average, so force this loop to stop
                        positionsneeded++; // advance past the word just added
                        break;
                    }
                }
                positionsneeded++; // move on to the next word
            }

            output.Add(tempstr);//store the string in our list of string to output
        }

        //display the line on the screen
        foreach (string lineoftext in output)
        {
            txtbx_Output.AppendText(lineoftext + Environment.NewLine);
        }

    }

#4


0  

(Adapted from here, How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?)

If we consider the word lengths as a list of numbers, we can binary search the partition.

The maximum line length ranges from 0 to sum(word lengths) + (number of words - 1), where the second term accounts for the spaces, and mid is the midpoint of the current range. We check whether mid can be achieved by partitioning into N parts in O(m) time, where m is the number of words: traverse the list, adding (word_length + 1) to the current part while the running sum is less than or equal to mid. When the sum passes mid, start a new part. If the result has N or fewer parts, mid is achievable.

If mid can be achieved, try the lower half of the range; otherwise, try the upper half. The time complexity is O(m log num_chars). (You'll also have to account for the one space per part that is dropped where the line break goes.)

JavaScript code (adapted from http://articles.leetcode.com/the-painters-partition-problem-part-ii):

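// Count the lines needed when no line may be longer than maxLength
function getK(arr,maxLength) {
  var total = 0,
      k = 1;

  for (var i=0; i<arr.length; i++) {
    // Width of the current line if this word is appended;
    // a space is only needed between words, not before the first one
    var next = total === 0 ? arr[i] : total + 1 + arr[i];

    if (next > maxLength) {
      // The word does not fit: it starts a new line
      total = arr[i];
      k++;
    } else {
      total = next;
    }
  }

  return k;
}
 

function partition(arr,n) {
  // lo: the longest word must fit; hi: everything on one line, spaces included
  var lo = Math.max(...arr),
      hi = arr.reduce((a,b) => a + b) + arr.length - 1;

  while (lo < hi) {
    var mid = lo + ((hi - lo) >> 1);

    var k = getK(arr,mid);

    if (k <= n){
      hi = mid;

    } else{
      lo = mid + 1;
    }
  }

  return lo;
}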

var s = "this is a very very very very "
      + "long and convoluted way of creating "
      + "a very very very long string",
    n = 7;

var words = s.split(/\s+/),
    maxLength = partition(words.map(x => x.length),n);

console.log('max sentence length: ' + maxLength);
console.log(words.length + ' words');
console.log(n + ' lines')
console.log('')

var i = 0;

for (var j=0; j<n; j++){
  var str = '';
  
  while (true){
    if (!words[i] || str.length + words[i].length > maxLength){
      break
    }
    str += words[i++] + ' ';
  }
  console.log(str);
}
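
To get back the array of strings the question asks for, the binary search can be wrapped in one function. The following is a minimal sketch under stated assumptions: `countLines` and `splitIntoLines` are illustrative names, line width is counted in characters including the single spaces between words, and fewer than N lines may come back when the text is short.

```javascript
// Sketch: number of lines a greedy fill needs when no line may exceed maxWidth.
function countLines(lengths, maxWidth) {
  var lines = 1;
  var width = 0; // width of the line being filled
  for (var i = 0; i < lengths.length; i++) {
    // A separating space is only needed after the first word on a line
    var next = width === 0 ? lengths[i] : width + 1 + lengths[i];
    if (next > maxWidth) {
      lines++;
      width = lengths[i];
    } else {
      width = next;
    }
  }
  return lines;
}

// Sketch: binary search the smallest width that fits in n lines, then fill greedily.
function splitIntoLines(text, n) {
  var words = text.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  var lengths = words.map(function (w) { return w.length; });
  var lo = Math.max.apply(null, lengths); // the longest word must fit
  var hi = lengths.reduce(function (a, b) { return a + b; }, 0) +
           words.length - 1;              // everything on one line
  while (lo < hi) {
    var mid = lo + ((hi - lo) >> 1);
    if (countLines(lengths, mid) <= n) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  // lo is now the smallest width for which the greedy fill needs at most n lines
  var result = [];
  var current = "";
  words.forEach(function (w) {
    if (current !== "" && current.length + 1 + w.length > lo) {
      result.push(current);
      current = w;
    } else {
      current = current === "" ? w : current + " " + w;
    }
  });
  result.push(current);
  return result;
}

console.log(splitIntoLines("an apple a day keeps the doctor away", 3));
```

This minimizes the longest line only; the last line can still come out noticeably shorter than the others.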
