What is the best way to parse this string in C#?

Time: 2022-09-13 08:05:27

I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:

 key:value[space]key:value[space]key:value[space]

So I wrote this code to parse it:

string myString = ReadinString();
string[] tokens = myString.Split(' ');
foreach (string token in tokens) {
     string key = token.Split(':')[0];
     string value = token.Split(':')[1];
     .  . . . 
}

The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.

NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.

9 answers

#1


22  

Use this regular expression:

\w+:[\w\s]+(?![\w+:])

I tested it on

test:testvalue test2:test value test3:testvalue3

It returns three matches:

test:testvalue
test2:test value
test3:testvalue3

You can change \w to any character set that can occur in your input.

Code for testing this:

var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";

foreach (Match match in regex.Matches(test))
{
    var key = match.Value.Split(':')[0];
    var value = match.Value.Split(':')[1];

    Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();

As Wonko the Sane pointed out, this regular expression will fail on values that contain a colon. If you expect that situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression instead. It will still fail when a colon in the value is preceded by a space, though... I'll think about a solution to this.
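
One possible way to avoid the repeated Split(':') calls, and to tolerate colons inside a value, is to capture the key and the value in named groups and let a lookahead decide where the value ends. This is only a sketch, not the pattern from this answer: the names KeyValueRegexSketch, Parse, key and value are made up for illustration, and it assumes keys consist only of \w characters. A value that itself contains a space followed by "word:" is still ambiguous and will be split there.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class KeyValueRegexSketch
{
    // key:   one or more word characters (assumption: keys never contain spaces)
    // value: as few characters as possible, up to the next " key:" or the end of the input
    private static readonly Regex PairPattern =
        new Regex(@"(?<key>\w+):(?<value>.*?)(?=\s+\w+:|$)");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match match in PairPattern.Matches(input))
        {
            // Trim removes the trailing [space] that may follow the last value.
            pairs.Add(new KeyValuePair<string, string>(
                match.Groups["key"].Value,
                match.Groups["value"].Value.Trim()));
        }
        return pairs;
    }
}
```

Calling KeyValueRegexSketch.Parse("test:testvalue test2:test value time:12:30") should give three pairs, with "12:30" kept intact as the value of time.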

#2


5  

This cannot work unless you change the separator from a space to something else, such as a "|".

Consider this:

Alfred Bester:Alfred Bester Alfred:Alfred Bester

  • Is this Key "Alfred Bester" & value "Alfred", or Key "Alfred" & value "Bester Alfred"?

#3


4  

string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";

foreach (Match match in Regex.Matches(input, @"(\w+):([^:]+)(?![\w+:])"))
{
   Console.WriteLine("{0} = {1}", 
       match.Groups[1].Value, 
       match.Groups[2].Value
      );
}

Gives you:

foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan

#4


2  

You could try to URL-encode the content between the spaces (the keys and the values, not the : symbol), but this would require that you have control over the input method.

Or you could simply use another format (like XML or JSON), but again you would need control over the input format.

If you can't control the input format, you could always use a regular expression that searches for single spaces followed by a word plus a colon.
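
As a rough sketch of that last idea, Regex.Split can split only on whitespace that a lookahead confirms is followed by "word:", so spaces inside a value are left alone. The names KeyValueSplitSketch and Parse are hypothetical, and the code assumes keys contain only \w characters and that a repeated key should simply overwrite the earlier value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class KeyValueSplitSketch
{
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();

        // Split on whitespace only when it is immediately followed by "word:".
        string[] pairs = Regex.Split(input.Trim(), @"\s+(?=\w+:)");

        foreach (string pair in pairs)
        {
            // Split at the first colon only, so later colons stay in the value.
            int colon = pair.IndexOf(':');
            if (colon < 0) continue; // fragment without a key, e.g. empty input

            // Assumption: for a duplicated key, the last value wins.
            result[pair.Substring(0, colon)] = pair.Substring(colon + 1);
        }
        return result;
    }
}
```

For the input "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan" this should produce three entries, with the full multi-word values under foo and zap.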

Update (thanks, Jon Grant): It appears that you can have spaces in both the key and the value. If that is the case, you will need to seriously rethink your strategy, as even a regex won't help.

#5


1  

string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);

The first Split will produce an array:

"key:value", "key:value"

Then an array of arrays:

{ "key", "value" }, { "key", "value" }

And then a dictionary:

"key" => "value", "key" => "value"

Note that Dictionary<K,V> doesn't allow duplicate keys; it will throw an exception in that case. If duplicates are possible, use ToLookup().
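
A minimal sketch of the ToLookup() variant might look like the following. The names LookupSketch and Parse are made up for illustration. Like the one-liner above, it still splits on every space, so it only suits input whose values contain no spaces.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class LookupSketch
{
    public static ILookup<string, string> Parse(string input)
    {
        return input
            // Ignore empty tokens caused by leading, trailing or doubled spaces.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Split at the first colon only, giving at most { key, value }.
            .Select(token => token.Split(new[] { ':' }, 2))
            // Skip tokens that have no colon at all.
            .Where(parts => parts.Length == 2)
            // Unlike ToDictionary, ToLookup groups repeated keys instead of throwing.
            .ToLookup(parts => parts[0], parts => parts[1]);
    }
}
```

With the input "key1:a key2:b key1:c", indexing the lookup with "key1" should yield both "a" and "c", and indexing with a key that is not present yields an empty sequence rather than an exception.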

#6


1  

Using a regular expression can solve your problem:

private void DoSplit(string str)
{
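    // Normalize to exactly one trailing space; the pattern below needs a terminator after the last value.
    str = str.Trim() + " ";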
    string patterns = @"\w+:([\w+\s*])+[^!\w+:]";
    var r = new System.Text.RegularExpressions.Regex(patterns);
    var ms = r.Matches(str);
    foreach (System.Text.RegularExpressions.Match item in ms)
    {
        string[] s = item.Value.Split(new char[] { ':' });
        //Do something
    }
}

#7


0  

This code will do it (given the rules below). It parses the keys and values and returns them in a Dictionary<string, string>. I have added some code at the end that assumes, given your example, that the last value of the entire string/stream is followed by a [space]:

private Dictionary<string, string> ParseKeyValues(string input)
{
    Dictionary<string, string> items = new Dictionary<string, string>();

    // Every part after the first holds "<value> <next key>".
    string[] parts = input.Split(':');

    string key = parts[0];
    string value;

    int currentIndex = 1;

    while (currentIndex < parts.Length - 1)
    {
        // The text after the last space is the next key; everything before it is the current value.
        int indexOfLastSpace = parts[currentIndex].LastIndexOf(' ');
        value = parts[currentIndex].Substring(0, indexOfLastSpace);
        items.Add(key, value);
        key = parts[currentIndex].Substring(indexOfLastSpace + 1);
        currentIndex++;
    }

    // Strip the trailing [space] from the last value, then store that trimmed value.
    value = parts[parts.Length - 1].Substring(0, parts[parts.Length - 1].Length - 1);
    items.Add(key, value);

    return items;
}

Note: this algorithm assumes the following rules:

  1. No spaces in the keys
  2. No colons in the keys
  3. No colons in the values

#8


0  

Without any regex or string concatenation, and returned as an enumerable (it assumes keys don't contain spaces, but values can):

    public static IEnumerable<KeyValuePair<string, string>> Split(string text)
    {
        if (text == null)
            yield break;

        int keyStart = 0;
        int keyEnd = -1;
        int lastSpace = -1;
        for(int i = 0; i < text.Length; i++)
        {
            if (text[i] == ' ')
            {
                lastSpace = i;
                continue;
            }

            if (text[i] == ':')
            {
                if (lastSpace >= 0)
                {
                    yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
                    keyStart = lastSpace + 1;
                }
                keyEnd = i;
                continue;
            }
        }
        if (keyEnd >= 0)
            yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
    }

#9


0  

I guess you could take your method and expand upon it slightly to deal with this stuff...

Kind of pseudocode:

List<string> parsedTokens = new List<string>();
string[] tokens = myString.Split(' ');
for (int i = 0; i < tokens.Length; i++)
{
    // A token is complete if it is the last item,
    // or if the following item contains a colon (i.e. it starts a new pair).
    if (i == tokens.Length - 1 || tokens[i + 1].IndexOf(':') > -1)
    {
        parsedTokens.Add(tokens[i]);
    }
    else
    {
        // This bit needs to be refined to deal with values with multiple spaces...
        parsedTokens.Add(tokens[i] + " " + tokens[i + 1]);
        i++; // skip the token that was just merged into this one
    }
}

Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...
