WinForms C#: use a regex to search a text file, find a match, and get the line after the match into a TextBox

Time: 2022-06-01 21:28:26

Lngton KY 40511 ARRIER — LEAVE IF NO RESPONSE LODRESS SERVICE REQUESTED


604159595920

YAWAR MUHAMMAD YOUNUS


1263 S CHILLICOTHE RD STE


AURORA OH 43192—8552 695—81


Basically, what I want to do is search for this: 0604156595920 (a random 12-digit number that always appears above the customer name), and when the match succeeds, get the next line as output: YAWAR MUHAMMAD YOUNUS (the customer name).


I am doing it this way because the layout of the text differs in each case: the line number will be different, and obviously the customer name will be different too.


Here is my code:


using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;

string strRegex = @"(?<=\d{12}\n)(\w.*)";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture);

// Reads the file one line at a time and tests each line on its own
foreach (string line in File.ReadLines("SampleText.txt"))
{
    Match match = myRegex.Match(line);
    if (match.Success)
    {
        MessageBox.Show(match.Value);
    }
    else
    {
        MessageBox.Show("Fail");
    }
}

This code works fine when the number and the name are on the same line, but it doesn't work when they are on different lines.
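Why the loop fails: File.ReadLines returns each line with its line break already removed, so a lookbehind that needs \n right after the digits has nothing to match inside a single line. A minimal sketch of a line-by-line alternative (not taken from the post or the answers) simply remembers whether the previous non-empty line was a 12-digit number. The helper name FindNameAfterNumber and the in-memory sample are hypothetical; in the real program you would pass File.ReadLines("SampleText.txt").

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first non-empty line that follows a line
// consisting only of a 12-digit number, or null if there is no such line.
static string? FindNameAfterNumber(IEnumerable<string> lines)
{
    var numberLine = new Regex(@"^\s*\d{12}\s*$");
    bool previousWasNumber = false;

    foreach (string line in lines)
    {
        if (string.IsNullOrWhiteSpace(line))
            continue; // ignore blank lines between the number and the name

        if (previousWasNumber)
            return line.Trim(); // the line right after the number line

        previousWasNumber = numberLine.IsMatch(line);
    }
    return null;
}

// Stand-in for File.ReadLines("SampleText.txt")
string[] sample =
{
    "Lngton KY 40511",
    "604159595920",
    "",
    "YAWAR MUHAMMAD YOUNUS",
    "1263 S CHILLICOTHE RD STE",
};
Console.WriteLine(FindNameAfterNumber(sample) ?? "Fail");
```

Because no regex ever has to span two lines, this works the same whether the file uses \n or \r\n line endings.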


Another regex I tried:


(?<=(^\d{13}\n))[A-Za-z].*

The above regex works fine in an online regex tester.
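A likely reason the tester succeeds: it runs the pattern over the whole text at once, while the loop above feeds it one line at a time. Two further things can break it in .NET: a file saved on Windows usually ends lines with \r\n, so \n does not directly follow the digits, and this pattern uses \d{13} although the number is described as 12 digits long. The sketch below (an illustration, not from the answers) keeps the lookbehind idea but applies it to the whole text; it relies on .NET supporting variable-length lookbehind. The sample string is hypothetical; in practice it would come from File.ReadAllText("SampleText.txt").

```csharp
using System;
using System.Text.RegularExpressions;

// Lookbehind: a line holding only a 12-digit number, then one or more
// line breaks (\r\n or \n). The match itself is the next line, which must
// start with an ASCII letter, as in the question's pattern.
var nameAfterNumber = new Regex(
    @"(?<=^[ \t]*\d{12}[ \t]*(?:\r?\n)+)[A-Za-z][^\r\n]*",
    RegexOptions.Multiline);

// Stand-in for File.ReadAllText("SampleText.txt")
string text = "Lngton KY 40511\r\n604159595920\r\n\r\nYAWAR MUHAMMAD YOUNUS\r\n1263 S CHILLICOTHE RD STE\r\n";

Match match = nameAfterNumber.Match(text);
Console.WriteLine(match.Success ? match.Value : "Fail");
```

[^\r\n]* is used instead of .* because in .NET "." also matches '\r', which would leave a carriage return at the end of the name.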


Any help and suggestions will be appreciated. Thanks :) Sorry in advance if this is a repeated question; I searched a lot on the web but couldn't find a specific answer to my problem.


EDIT: Screenshot of the regex tester: https://drive.google.com/open?id=0B6ynC-W5aF41cGhJREhtNGhHRnc


3 answers

#1



Non-regex (LINQ) solution

You can use LINQ to get the line you need with:


using System;
using System.Linq;

var s = @"Lngton KY 40511 ARRIER — LEAVE IF NO RESPONSE LODRESS SERVICE REQUESTED
604159595920
YAWAR MUHAMMAD YOUNUS
1263 S CHILLICOTHE RD STE
AURORA OH 43192—8552 695—81";
var res = s.Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) // Split into lines (you already have it)
           .SkipWhile(p => !(p.Trim().Length == 12 && p.Trim().All(m => Char.IsDigit(m))))
           .Skip(1)
           .Take(1)
           .FirstOrDefault();
if (!string.IsNullOrWhiteSpace(res))
    Console.WriteLine(res);

See the IDEONE demo


Just omit the s declaration and replace s.Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) in my code with File.ReadLines("SampleText.txt").


Explanation:

  • s.Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) - splits the string into lines (in your case, File.ReadLines already reads the file line by line; just make sure there are no empty lines!)

  • SkipWhile - skips every line until it reaches one that, after trimming, is exactly 12 characters long (p.Trim().Length == 12) and consists only of digits (p.Trim().All(m => Char.IsDigit(m)))

  • Skip(1) - skips the line with the 12-digit number

  • Take(1) - takes just the next line (the one right after the line containing only the 12-digit number)

  • FirstOrDefault() - gets the item as a string object or null if not found.
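The "no empty lines" caveat matters when switching to File.ReadLines, because it does return blank lines, and one between the number and the name would become the "next line". A minimal sketch, assuming you want to tolerate such blank lines, adds a Where filter in front of the same pipeline. The method name NextLineAfterNumber and the sample array are hypothetical; Take(1) is dropped because Skip(1).FirstOrDefault() already yields a single line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper wrapping the answer's pipeline; blank lines are removed
// first, so they can never be returned as the line after the number.
static string? NextLineAfterNumber(IEnumerable<string> lines) =>
    lines.Where(l => !string.IsNullOrWhiteSpace(l))
         .SkipWhile(p => !(p.Trim().Length == 12 && p.Trim().All(char.IsDigit)))
         .Skip(1)            // skip the 12-digit number line itself
         .FirstOrDefault();  // the next line, or null if none

// Stand-in for File.ReadLines("SampleText.txt")
string[] sample = { "Lngton KY 40511", "", "604159595920", "", "YAWAR MUHAMMAD YOUNUS", "", "1263 S CHILLICOTHE RD STE" };
string? name = NextLineAfterNumber(sample);
if (!string.IsNullOrWhiteSpace(name))
    Console.WriteLine(name);
```

Since File.ReadLines enumerates lazily and FirstOrDefault stops at the first hit, the rest of the file is not read once the name is found.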

Regex Solution

var res = Regex.Match(s, @"^\p{Zs}*\d{12}\p{Zs}*(?:\r?\n)+(?<MYLINE>.*)", RegexOptions.Multiline);
if (res.Success)
    Console.WriteLine(res.Groups["MYLINE"].Value);

See another IDEONE demo


If there can be a # before the number, replace the regex with


@"^\p{Zs}*#?\d{12}\p{Zs}*(?:\r?\n)+(?<MYLINE>.*)"
          ^

It will match 12-digit numbers both with and without a leading #.


Explanation:

  • ^ - start of a line (because RegexOptions.Multiline redefines ^ to match at the start of a line rather than the start of the string)

  • \p{Zs}* - 0+ horizontal spaces (so as not to jump to another line; can be replaced with [^\S\r\n] to match all horizontal whitespace, including tabs)

  • #? - one or zero #

  • \d{12} - 12 digits

  • \p{Zs}* - see above

  • (?:\r?\n)+ - 1 or more linebreaks (either CRLF or just LF style)

  • (?<MYLINE>.*) - Capture Group "MYLINE": any 0+ characters other than a newline
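If one file can contain several records, the same pattern can be run with Regex.Matches to collect every name. A minimal sketch, assuming CRLF line endings as is typical for files created on Windows: the capture group uses [^\r\n]* instead of .*, because in .NET "." also matches '\r' and would leave a trailing carriage return on each name. The two-record input (including the second number and name) is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Same structure as the answer's pattern, including the optional '#',
// but MYLINE stops before '\r' as well as '\n'.
var pattern = new Regex(
    @"^\p{Zs}*#?\d{12}\p{Zs}*(?:\r?\n)+(?<MYLINE>[^\r\n]*)",
    RegexOptions.Multiline);

// Hypothetical input: two records, Windows line endings, one with a '#'.
string text = "604159595920\r\nYAWAR MUHAMMAD YOUNUS\r\n1263 S CHILLICOTHE RD STE\r\n"
            + "#123456789012\r\n\r\nJANE DOE\r\n";

// Each match corresponds to one number line plus the line after it.
foreach (Match m in pattern.Matches(text))
    Console.WriteLine(m.Groups["MYLINE"].Value);
```

In a WinForms app the printed value would instead be assigned to the TextBox's Text property.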

#2



string strRegex = @"\d{12}\s+(.+)\n$";


Modifiers: Multiline


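using System;
using System.IO;
using System.Text.RegularExpressions;

string s = File.ReadAllText(@"C:\Temp\SampleText.txt");
// Group 1 captures the next non-blank line after the 12-digit number (a line break must follow it)
MatchCollection mc = Regex.Matches(s, @"\d{12}\s+(.+)\n", RegexOptions.Multiline);
foreach (Match m in mc)
{
    // "." also matches '\r', so trim it off for files with CRLF line endings
    Console.WriteLine(m.Groups[1].Value.TrimEnd('\r'));
}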

#3



Try using the following regex:


(?<=\d{12})[\n\s]*(.+)
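A minimal sketch of how this pattern might be used from C#, assuming the whole file is read with File.ReadAllText (a line-by-line loop cannot see the line break). \s already includes \n, so [\n\s]* skips all whitespace, blank lines included, and group 1 holds the rest of the next line. Because "." also matches '\r', the value is trimmed. Note that the lookbehind is not anchored to a line, so it fires after any run of 12 digits: for the question's 13-digit example it would capture just the final "0". The sample string is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-in for File.ReadAllText("SampleText.txt"), with Windows line endings
string text = "Lngton KY 40511\r\n604159595920\r\n\r\nYAWAR MUHAMMAD YOUNUS\r\n1263 S CHILLICOTHE RD STE";

// Answer #3's pattern: after 12 digits, skip whitespace, capture the next line
Match m = Regex.Match(text, @"(?<=\d{12})[\n\s]*(.+)");
string name = m.Success ? m.Groups[1].Value.TrimEnd('\r') : "Fail";
Console.WriteLine(name);
```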
