C# Regex syntax help for parsing a string

Date: 2022-08-26 12:07:29

I have this:

var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");

So this captures values until it hits a W in the string, correct? I need it to stop at a W or an S. I have tried a few different ways but cannot get it to work. Does anyone have any pointers?

More info:

            record = record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
            var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
            string strStartDate = regex.Match(record).Groups[1].ToString();
            string strEndDate = regex.Match(record).Groups[2].ToString();
            string Status = regex.Match(record).Groups[3].ToString().ToUpper().StartsWith("In") ? "Inactive" : "Active";

I am trying to parse a big string of values, and I only want 3 things: Start Date, End Date, and Status (active/inactive). However, there are 3 different values for each (3 start dates, 3 end dates, 3 statuses).

The first 2 strings look like this:

"Start Date: 

 2014-09-08 



End Date: 

 2017-09-07 



Warranty Type: 

 XXX 



Status: 

 Active 



Serial Number/IMEI: 

 XXXXXXXXXXX









Description:



XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

The 3rd string looks like this:

"Start Date: 

 2014-09-08 



End Date: 

 2017-09-07 



Status: 

 Active 



Warranty Upgrade Code:



SVC_PRIORITY"

On the last string it does not return the dates. I am guessing this is because of the W.* after the end date.

I am not getting the 2 dates on the last string.
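
For reference, a minimal sketch of why the third string yields nothing, assuming the whitespace-stripped input produced by the Replace calls above. The pattern demands a literal W somewhere between EndDate: and Status:, but in the third string the only W (in WarrantyUpgradeCode) comes after Status:, so the whole match fails and every group is empty. One possible workaround is to make the warranty segment optional and the quantifiers lazy. The names WarrantyRegexDemo, Original, Relaxed and Normalize are illustrative and not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WarrantyRegexDemo
{
    // Pattern from the question: requires a 'W' between EndDate: and Status:.
    public static readonly Regex Original =
        new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");

    // Sketch of a fix: the WarrantyType segment is optional, and the lazy
    // quantifiers stop each date capture at the next label.
    public static readonly Regex Relaxed =
        new Regex(@"StartDate:(.*?)EndDate:(.*?)(?:WarrantyType:.*?)?Status:(.*)");

    // Same normalisation as in the question: drop spaces and CRLF, '-' becomes '/'.
    public static string Normalize(string record)
    {
        return record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
    }

    public static void Main()
    {
        string third = Normalize(
            "Start Date: 2014-09-08\r\nEnd Date: 2017-09-07\r\n" +
            "Status: Active\r\nWarranty Upgrade Code:\r\nSVC_PRIORITY");

        Console.WriteLine(Original.Match(third).Success);   // False

        Match m = Relaxed.Match(third);
        Console.WriteLine(m.Groups[1].Value);               // 2014/09/08
        Console.WriteLine(m.Groups[2].Value);               // 2017/09/07
        // Group 3 still carries everything after "Status:", so test its prefix.
        Console.WriteLine(m.Groups[3].Value.StartsWith("In", StringComparison.OrdinalIgnoreCase)
            ? "Inactive" : "Active");                       // Active
    }
}
```

Note that ToUpper().StartsWith("In") in the question can never be true, because the upper-cased text would begin with "IN"; the case-insensitive comparison above sidesteps that.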

4 Answers

#1


1  

EDIT: Please try this function, which parses the input using a regex:

using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
using System.Windows.Forms;

private static List<string[]> parseString(string input)
{
    var pattern = @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)\s*";
    return Regex.Matches(input, pattern).Cast<Match>().ToList().ConvertAll(m => new string[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });

}

// To show the result string
var result1 = parseString(str1);
string result_string = string.Join("\n", result1.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);

EDIT2: For the OP's situation, you could call the function from inside the foreach loop like this:

foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
{
    if (el.GetAttribute("className") == "fluid-row Borderfluid")
    {
        string record = el.InnerText;
        //if record is the string to parse
        var result = parseString(record);
        var result_string = string.Join("\n", result.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
        MessageBox.Show(result_string);
    }
}

#2


1  

There is no need to strip the newlines in your example:

List<string> resultList = new List<string>();

var subjectString = @"Start Date: xxxxx
End Date: yyyy
Warranty Type: zzzz
Status: uuuu
Start Date: aaaa
End Date: bbbb
Status: cccc";

Regex regexObj = new Regex(@"Start Date: (.*?)\nEnd Date: (.*?)\n(.|\n)*?Status: (.*)");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value);
    resultList.Add(matchResult.Groups[2].Value);
    resultList.Add(matchResult.Groups[4].Value);
    matchResult = matchResult.NextMatch();
} 
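
A caveat worth sketching: in .NET, `.` matches every character except `\n`, so with CRLF input the lazy captures above would each carry a trailing `\r`. The variant below is one possible way to tolerate both line endings, assuming the same labels as in the sample; the name RecordScanner and its members are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordScanner
{
    // [^\r\n]* keeps each capture on a single line and excludes a trailing '\r';
    // \r?\n accepts both LF and CRLF; the Warranty Type line is optional.
    private static readonly Regex Record = new Regex(
        @"Start Date: ([^\r\n]*)\r?\n" +
        @"End Date: ([^\r\n]*)\r?\n" +
        @"(?:Warranty Type: [^\r\n]*\r?\n)?" +
        @"Status: ([^\r\n]*)");

    // Returns one { start, end, status } array per record found in the text.
    public static List<string[]> Scan(string text)
    {
        var records = new List<string[]>();
        foreach (Match m in Record.Matches(text))
        {
            records.Add(new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });
        }
        return records;
    }

    public static void Main()
    {
        // Same sample as above, joined with CRLF on purpose.
        string text = string.Join("\r\n", new[]
        {
            "Start Date: xxxxx", "End Date: yyyy", "Warranty Type: zzzz", "Status: uuuu",
            "Start Date: aaaa", "End Date: bbbb", "Status: cccc"
        });

        foreach (string[] r in Scan(text))
        {
            Console.WriteLine(string.Join(" | ", r));
        }
        // xxxxx | yyyy | uuuu
        // aaaa | bbbb | cccc
    }
}
```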

#3


0  

You may replace your code with the following (see IDEONE demo):

var s = @"Start Date: xxxxx
End Date: xxxx
Warranty Type: xxxx
Status: xxxx";
var res = Regex.Replace(s, @":\s+", ": ")            // Collapse the whitespace after each colon
        .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) // Split into lines
        .Select(line => line.Split(new[] { ": " }, StringSplitOptions.None)) // Split each line at `:`+space
        .ToDictionary(n => n[0], n => n[1]);              // Create a dictionary
string strStartDate = string.Empty;
string strEndDate = string.Empty;
string Status = string.Empty;
string Warranty = string.Empty;
// Demo & variable assignment
if (res.ContainsKey("Start Date")) {
    Console.WriteLine(res["Start Date"]);
    strStartDate = res["Start Date"];
}
if (res.ContainsKey("Warranty Type")) {
    Console.WriteLine(res["Warranty Type"]);
    Warranty = res["Warranty Type"];
}
if (res.ContainsKey("End Date")) {
    Console.WriteLine(res["End Date"]);
    strEndDate = res["End Date"];
}
if (res.ContainsKey("Status")) {
    Console.WriteLine(res["Status"]);
    Status = res["Status"];
}

Note that the best approach is to declare your own class with the fields like WarrantyType, StartDate, etc. and initialize that right in the LINQ code.
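
A minimal sketch of that idea, under the assumption that each record is a set of "Label: value" pairs where the value may sit on a later line. WarrantyInfo, WarrantyParser and their members are hypothetical names chosen for illustration. The sketch builds the dictionary first and then fills the object, which is only one of several ways to arrange it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container type; the class and property names are illustrative.
public class WarrantyInfo
{
    public string StartDate { get; set; }
    public string EndDate { get; set; }
    public string WarrantyType { get; set; }
    public string Status { get; set; }
}

public static class WarrantyParser
{
    public static WarrantyInfo Parse(string text)
    {
        // Collapse "Label:<whitespace>value" into "Label: value", then map label -> value.
        Dictionary<string, string> fields = Regex.Replace(text, @":\s+", ": ")
            .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None))
            .Where(parts => parts.Length == 2)            // skip lines that carry no value
            .GroupBy(parts => parts[0].Trim())            // tolerate repeated labels
            .ToDictionary(g => g.Key, g => g.First()[1].Trim());

        // Missing labels become null instead of throwing.
        Func<string, string> get = key =>
        {
            string value;
            return fields.TryGetValue(key, out value) ? value : null;
        };

        return new WarrantyInfo
        {
            StartDate = get("Start Date"),
            EndDate = get("End Date"),
            WarrantyType = get("Warranty Type"),
            Status = get("Status")
        };
    }

    public static void Main()
    {
        WarrantyInfo info = Parse(
            "Start Date: \r\n\r\n 2014-09-08 \r\n\r\nEnd Date: \r\n\r\n 2017-09-07 \r\n\r\n" +
            "Status: \r\n\r\n Active \r\n\r\nWarranty Upgrade Code:\r\n\r\nSVC_PRIORITY");

        Console.WriteLine(info.StartDate);                // 2014-09-08
        Console.WriteLine(info.EndDate);                  // 2017-09-07
        Console.WriteLine(info.Status);                   // Active
        Console.WriteLine(info.WarrantyType == null);     // True
    }
}
```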

#4


0  

Avoid .*; it is a catch-all that gets regex pattern authors into trouble. Instead, build the pattern around specific text that always occurs in the data.

The parts you care about are the two dates, each of the form \d\d\d\d-\d\d-\d\d. The rest is anchor text, which should be treated as static anchors that can be skipped.

Here is an example that looks for the date patterns. Once they are found, the regex puts them into named capture groups, (?<GroupNameHere>...), and LINQ projects each match into an anonymous type and parses the dates.

Data

Note the first date is reversed as per your example

var data = @"Start Date:

 2014-09-08

End Date:

 2017-09-07

Status:

 Active

Start Date:

 2014-09-09

End Date:

 2017-09-10

Status:

 In-Active
 ";

Pattern

string pattern = @"
^Start\sDate:\s+                     # An anchor of start date that always starts at the BOL
(?<Start>\d\d\d\d-\d\d-\d\d)         # actual start date pattern
\s+                                  # a lot of space including \r\n
^End\sDate:\s+                       # End date anchor and space
(?<End>\d\d\d\d-\d\d-\d\d)           # pattern of the end date.
\s+                                  # Same pattern as above for Status
^Status:\s+
(?<Status>[^\s]+)
 ";

Processing

// ExplicitCapture tells the parser to capture only named groups; unnamed parentheses (...) are not captured.
// Multiline makes ^ and $ match at the beginning and end of each line, not just of the whole buffer.
// IgnorePatternWhitespace lets us lay out and comment the pattern; it does not affect matching.
var results = Regex.Matches(data, pattern, RegexOptions.ExplicitCapture |
                                           RegexOptions.Multiline       |
                                           RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select(mt => new
            {
                Status    = mt.Groups["Status"].Value,
                StartDate = DateTime.Parse(mt.Groups["Start"].Value),
                EndDate   = DateTime.Parse(mt.Groups["End"].Value)
            })
     .ToList();
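
One detail that may be worth pinning down: DateTime.Parse interprets its input using the current culture. The sketch below is a possible variant, not part of the answer above, that reads the same named groups and parses them with a fixed yyyy-MM-dd format and the invariant culture. DateCaptureDemo and TryGetDates are illustrative names.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateCaptureDemo
{
    // Named groups only; \s+ absorbs the blank lines between a label and its value.
    private static readonly Regex DatePair = new Regex(
        @"Start\sDate:\s+(?<Start>\d{4}-\d{2}-\d{2})\s+End\sDate:\s+(?<End>\d{4}-\d{2}-\d{2})",
        RegexOptions.ExplicitCapture);

    // Returns false when the labels are missing or a captured value is not a valid date.
    public static bool TryGetDates(string text, out DateTime start, out DateTime end)
    {
        start = default(DateTime);
        end = default(DateTime);

        Match m = DatePair.Match(text);
        if (!m.Success)
        {
            return false;
        }

        return DateTime.TryParseExact(m.Groups["Start"].Value, "yyyy-MM-dd",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out start)
            && DateTime.TryParseExact(m.Groups["End"].Value, "yyyy-MM-dd",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out end);
    }

    public static void Main()
    {
        DateTime start, end;
        if (TryGetDates("Start Date:\n\n 2014-09-08\n\nEnd Date:\n\n 2017-09-07\n", out start, out end))
        {
            Console.WriteLine(start.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2014-09-08
            Console.WriteLine(end.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));   // 2017-09-07
        }
    }
}
```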
