How can I improve text replacement in my code? I am currently using Regex to split strings

Time: 2021-12-24 07:39:24

I have a bit of code that works great on smaller files, but when the files are bigger the program locks up - or is just so slow it appears to be - I can walk away for 10 minutes and it's still sitting there. How do I improve the efficiency of this code for larger files? Also, something minor - when it's at the last split, the next item has nothing to split and I end up with a duplicate replace. How do I fix this? The efficiency issue is obviously my main problem here.

for (int i = 0; i < divs.Count; i++)
{
    Regex regex = new Regex("</div>");
    string[] hands = regex.Split(divs[i].ToString());

    string output = string.Empty;
    foreach (var item in hands)
    {

        output += item + "</div>";
        string text = File.ReadAllText(strfilename);
        text = text.Replace("style = \"#\" >", textBox1.Text);
        ////style = "#" > 
        richTextBox1.Text = text;
    }


    //supposed to output the array to a message box
    MessageBox.Show(output);
}

4 solutions

#1


1  

It doesn't look like you need a regex; try String.Split.

It also looks like you are parsing HTML with a regex. Consider using an HTML parser.

If the files are large, avoid ReadAllText, as it loads the entire file into memory. Consider StreamReader instead, although an HTML parser would be better.

And do you really need to update the richTextBox1.Text property each time around the loop?

You are reading the entire file each time around the loop? Why?

Move everything that doesn't absolutely have to happen inside the loops to outside them (before or after).
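
As a rough illustration of hoisting the invariant work out of the loops, the sketch below reads the file once and applies the replacement once. The class and method names (HoistingSketch, LoadAndReplace) are made up for this example, and the temporary file stands in for the asker's strfilename. In the original form, the returned string would be assigned to richTextBox1.Text a single time, after the loops.

```csharp
using System;
using System.IO;

static class HoistingSketch
{
    // Hypothetical helper: reads the file once and applies the replacement once.
    // The caller would assign the result to the UI control a single time.
    public static string LoadAndReplace(string path, string replacement)
    {
        string text = File.ReadAllText(path);   // one read in total, not one per item
        return text.Replace("style = \"#\" >", replacement);
    }

    static void Main()
    {
        // A temporary file stands in for the real input file.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<div style = \"#\" >a</div><div style = \"#\" >b</div>");

        string text = LoadAndReplace(path, "class=\"x\">");
        Console.WriteLine(text);

        File.Delete(path);
    }
}
```

Because neither the file contents nor the replacement depend on the loop variables, doing this once should produce the same final text as doing it on every iteration.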

#2


0  

The only obvious improvement would be to use String.Split instead of Regex. It is sufficient here and performs much better. So the first change I would make would be to change:

  Regex regex = new Regex("</div>");
  string[] hands = regex.Split(divs[i].ToString());

to

  string[] hands = divs[i].ToString().Split(new string[] { "</div>" }, StringSplitOptions.None);

As pointed out in the other answer, File.ReadAllText has some limitations that the StreamReader approach does not. However, you'll only run into them if your files are extremely large or the system the software is running on is short on RAM. In the main code base I currently work on, File.ReadAllText and File.ReadAllLines are almost always the methods used to read files.
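
On the minor issue in the question (the extra item after the last split), one possible cause is the empty string that Split returns when the input ends with the separator. The sketch below assumes that is what is happening and uses StringSplitOptions.RemoveEmptyEntries to drop it. SplitSketch and SplitOnDivClose are names invented for the example.

```csharp
using System;

static class SplitSketch
{
    // Hypothetical helper: splits on the closing tag and drops empty pieces,
    // including the empty string that follows a trailing "</div>".
    public static string[] SplitOnDivClose(string html)
    {
        return html.Split(new[] { "</div>" }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string[] hands = SplitOnDivClose("<div>a</div><div>b</div>");
        Console.WriteLine(hands.Length);   // prints 2; with StringSplitOptions.None it would be 3

        foreach (string hand in hands)
            Console.WriteLine(hand + "</div>");
    }
}
```

Note that RemoveEmptyEntries only removes pieces of length zero. If the input ends with whitespace or a newline after the last closing tag, that piece is kept and would need to be trimmed or skipped separately.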

#3


0  

See what effect each of these has on performance:

  1. Move the File.ReadAllText call out of that loop. It gets the same text every time.

  2. Move the 'regex = new Regex' outside of the loop, and use the constructor overload that accepts RegexOptions.Compiled.

  3. Use a StringBuilder instead of string concatenation.

  4. Use the Stopwatch class to time parts of the code and see where the time is spent.

  5. Watch out for Cthulhu.
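
Points 2 to 4 of the list above can be sketched together as follows. This is only an illustration under assumptions: TimingSketch and Rebuild are invented names, the input is synthetic, and the output is accumulated across all items rather than shown per item as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class TimingSketch
{
    // Created once and reused. RegexOptions.Compiled costs more at construction
    // time in exchange for faster matching afterwards.
    private static readonly Regex DivClose = new Regex("</div>", RegexOptions.Compiled);

    // Hypothetical helper: splits every item on "</div>" and re-appends the tag,
    // collecting the result in a StringBuilder instead of using string +=.
    public static string Rebuild(IEnumerable<string> divs)
    {
        var output = new StringBuilder();
        foreach (string div in divs)
        {
            foreach (string item in DivClose.Split(div))
            {
                if (item.Length == 0) continue;   // skip the empty piece after the last </div>
                output.Append(item).Append("</div>");
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        // Synthetic input standing in for the real divs collection.
        var divs = new List<string>();
        for (int i = 0; i < 10000; i++) divs.Add("<div>a</div><div>b</div>");

        Stopwatch sw = Stopwatch.StartNew();
        string result = Rebuild(divs);
        sw.Stop();

        Console.WriteLine($"{result.Length} chars in {sw.ElapsedMilliseconds} ms");
    }
}
```

Wrapping separate Stopwatch instances around the file read, the split, and the UI update should show which of them dominates for the large files.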

#4


0  

You have one loop nested inside another, and you read a file inside the inner loop. That is not a good idea in your situation. You can use a stack to recognize when a closing tag appears.
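
A minimal sketch of the stack idea follows. The answer does not say which tag it has in mind, so this assumes the div tags from the question. StackSketch and MatchDivs are invented names, and the scan is deliberately naive: it does not handle comments, attribute values containing tags, or other element names that start with "div".

```csharp
using System;
using System.Collections.Generic;

static class StackSketch
{
    // Hypothetical helper: returns (open index, close index) pairs for matched
    // "<div" / "</div>" tags, pairing each closing tag with the nearest unmatched opener.
    public static List<(int Open, int Close)> MatchDivs(string html)
    {
        var pairs = new List<(int Open, int Close)>();
        var open = new Stack<int>();
        int pos = 0;

        while (pos < html.Length)
        {
            int nextOpen = html.IndexOf("<div", pos, StringComparison.OrdinalIgnoreCase);
            int nextClose = html.IndexOf("</div>", pos, StringComparison.OrdinalIgnoreCase);
            if (nextClose < 0) break;   // no closing tag left, nothing more to match

            if (nextOpen >= 0 && nextOpen < nextClose)
            {
                open.Push(nextOpen);            // remember where this div started
                pos = nextOpen + "<div".Length;
            }
            else
            {
                if (open.Count > 0) pairs.Add((open.Pop(), nextClose));
                pos = nextClose + "</div>".Length;
            }
        }
        return pairs;
    }

    static void Main()
    {
        foreach (var pair in MatchDivs("<div><div>x</div></div>"))
            Console.WriteLine($"{pair.Open} -> {pair.Close}");
    }
}
```

For real HTML, the parser suggested in the first answer is likely the more robust choice. The sketch is only meant to show how a stack tracks nesting depth.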
