Parse (split) a string in C++ using a string delimiter (standard C++) [duplicate]

Time: 2022-03-11 22:07:50

This question already has an answer here:

Possible Duplicate:
Splitting a string in C++

I am parsing a string in C++ using the following:

string parsed,input="text to be parsed";
stringstream input_stringstream(input);

if(getline(input_stringstream,parsed,' '))
{
     // do some processing.
}

Parsing with a single char delimiter works fine. But what if I want to use a string as the delimiter?

Example: I want to split:

scott>=tiger

with >= as the delimiter, so that I get scott and tiger.

11 Answers

#1


327  

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if str is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or running to the end of the string if n is npos or exceeds the remaining length).
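The two points above can be combined into a small helper. This is only a sketch: the name token_before is hypothetical and not part of the answer, and treating an empty delimiter as a caller error is my own assumption. It relies on substr clamping its length argument, so a find() result of npos simply yields the whole input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the answer): returns the text before the
// first occurrence of `delimiter`, or the whole string if it never occurs.
std::string token_before(const std::string& s, const std::string& delimiter) {
    assert(!delimiter.empty());  // assumption: an empty delimiter is a caller error
    const std::string::size_type pos = s.find(delimiter);  // npos if not found
    // substr(0, npos) copies through the end of the string, so the
    // "not found" case needs no special handling.
    return s.substr(0, pos);
}
```

Calling token_before("scott>=tiger", ">=") gives "scott", and calling it on a string without the delimiter returns that string unchanged.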


If the delimiter occurs more than once, then after extracting one token you can remove it (delimiter included) and continue with the rest of the string. This modifies s, so work on a copy if you need to preserve the original (s = s.substr(pos + delimiter.length()); has the same effect as the erase call and also overwrites s):

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom

#2


35  

This method uses std::string::find without mutating the original string: it remembers where the previous token ended and resumes the search from there.

#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";

    std::string::size_type start = 0;  // same type as the value returned by find()
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }

    std::cout << s.substr(start) << std::endl;  // last token: everything after the final delimiter
}
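The same offset-tracking idea can be wrapped in a function that returns the tokens instead of printing them. The following is a sketch under assumptions of my own: it requires C++17 for std::string_view, the name split_views is hypothetical, and an empty delimiter is treated as a caller error. Because it returns views, no token is copied, but the source string must outlive the result.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical helper (assumes C++17). The returned views point into `s`,
// so the underlying character data must outlive the returned vector.
std::vector<std::string_view> split_views(std::string_view s, std::string_view delim) {
    assert(!delim.empty());  // assumption: an empty delimiter is a caller error
    std::vector<std::string_view> tokens;
    std::string_view::size_type start = 0;
    std::string_view::size_type end = s.find(delim);
    while (end != std::string_view::npos) {
        tokens.push_back(s.substr(start, end - start));
        start = end + delim.size();   // skip over the delimiter
        end = s.find(delim, start);   // resume the search after it
    }
    tokens.push_back(s.substr(start));  // last token (may be empty)
    return tokens;
}
```

With the input "scott>=tiger>=mushroom" and the delimiter ">=" this yields three views: scott, tiger and mushroom. Avoid passing a temporary std::string as the first argument, since the views would dangle once it is destroyed.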

#3


12  

You can use the following function to split a string:

vector<string> split(const string& str, const string& delim)
{
    vector<string> tokens;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}

#4


11  

strtok allows you to pass in multiple chars as delimiters. I bet if you passed in ">=" your example string would be split correctly (even though the > and = are counted as individual delimiters).
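For illustration, here is roughly what the strtok route could look like. This is a sketch, not the answerer's code: the function name split_with_strtok is hypothetical, and copying the input into a buffer is my own choice, made because strtok overwrites delimiters with '\0' and so cannot be used on the const pointer returned by c_str().

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on ANY character in `delimiter_chars`.
// Note that ">=" here means "'>' or '='", not the two-character sequence.
std::vector<std::string> split_with_strtok(const std::string& input,
                                           const char* delimiter_chars) {
    assert(delimiter_chars != nullptr);
    // strtok modifies its argument, so work on a NUL-terminated mutable copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiter_chars);
         tok != nullptr;
         tok = std::strtok(nullptr, delimiter_chars)) {
        tokens.emplace_back(tok);  // copy each token out of the buffer
    }
    return tokens;
}
```

This splits "scott>=tiger" into scott and tiger, but it would also split "a>b=c" into three tokens, which is the caveat mentioned above. strtok also keeps internal state between calls, so it is not safe to use from several threads or in nested tokenizing loops.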

EDIT: If you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.

string token, mystring("scott>=tiger");
while(token != mystring){
  token = mystring.substr(0,mystring.find_first_of(">="));
  mystring = mystring.substr(mystring.find_first_of(">=") + 1);
  printf("%s ",token.c_str());
}
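Because find_first_of matches each delimiter character individually, the loop above also emits an empty token between the '>' and the '='. A possible variant, sketched here with a hypothetical function name split_any, pairs find_first_of with find_first_not_of so that runs of delimiter characters are skipped and the tokens are collected in a vector.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on any character in `delim_chars`,
// treating consecutive delimiter characters as a single separator.
std::vector<std::string> split_any(const std::string& s, const std::string& delim_chars) {
    std::vector<std::string> tokens;
    // Position of the first character that is NOT a delimiter.
    std::string::size_type start = s.find_first_not_of(delim_chars);
    while (start != std::string::npos) {
        // Position of the next delimiter character, or npos at the end.
        const std::string::size_type end = s.find_first_of(delim_chars, start);
        // If end is npos the count is huge and substr clamps it to the end.
        tokens.push_back(s.substr(start, end - start));
        // Searching from npos returns npos, which ends the loop.
        start = s.find_first_not_of(delim_chars, end);
    }
    return tokens;
}
```

Like strtok, this still treats ">=" as a set of characters rather than as a two-character delimiter, so it only suits inputs where '>' and '=' never appear inside a token.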

#5


6  

This code splits text into lines and adds each one to a vector.

vector<string> split(char *phrase, string delimiter){
    vector<string> list;
    string s = string(phrase);
    size_t pos = 0;
    string token;
    while ((pos = s.find(delimiter)) != string::npos) {
        token = s.substr(0, pos);
        list.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    if (!s.empty()) list.push_back(s);  // keep any text after the last delimiter
    return list;
}

Called by:

vector<string> listFilesMax = split(buffer, "\n");

#6


4  

I would use boost::tokenizer. Here's the documentation explaining how to write an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm

Here's one that works for your case.

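#include <algorithm>  // std::search
#include <iostream>
#include <iterator>   // std::advance
#include <string>
#include <boost/tokenizer.hpp>

struct my_tokenizer_func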
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);
        return true;
    }

    void reset() {}
};

int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}

#7


3  

Here's my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.

bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool& removeEmptyEntries = false)
{
    std::vector<std::string> tokens;

    for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
    {
         size_t position = s.find(delimiter, start);
         end = position != std::string::npos ? position : s.length();

         std::string token = s.substr(start, end - start);
         if (!removeEmptyEntries || !token.empty())
         {
             tokens.push_back(token);
         }
    }

    if (!removeEmptyEntries &&
        (s.empty() || endsWith(s, delimiter)))
    {
        tokens.push_back("");
    }

    return tokens;
}

Examples

split("a-b-c", "-"); // [3]("a","b","c")

split("a--c", "-"); // [3]("a","","c")

split("-b-", "-"); // [3]("","b","")

split("--c--", "-"); // [5]("","","c","","")

split("--c--", "-", true); // [1]("c")

split("a", "-"); // [1]("a")

split("", "-"); // [1]("")

split("", "-", true); // [0]()

#8


1  

If you do not want to modify the string (as in the answer by Vincenzo Pii) and want to output the last token as well, you may want to use this approach:

inline std::vector<std::string> splitString( const std::string &s, const std::string &delimiter ){
    std::vector<std::string> ret;
    size_t start = 0;
    size_t end = 0;
    size_t len = 0;
    std::string token;
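    do {
        end = s.find(delimiter, start);
        // When end is npos, len wraps to a huge value and substr clamps it
        // to the end of the string, which yields the last token.
        len = end - start;
        token = s.substr(start, len);
        ret.emplace_back( token );
        start += len + delimiter.length();
        std::cout << token << std::endl;
    } while ( end != std::string::npos );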
    return ret;
}

#9


1  

For a string delimiter

Split a string on a delimiter string. For example, splitting "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" on the delimiter "-+" gives {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

// for string delimiter
vector<string> split(string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;
    while ((pos_end = s.find(delimiter, pos_start)) != string::npos) {
        token = s.substr(pos_start, pos_end - pos_start);
        pos_start = pos_end + delim_len;
        res.push_back(token);
    }
    res.push_back(s.substr(pos_start));
    return res;
}

int main() {
    string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
    string delimiter = "-+";
    vector<string> v = split(str, delimiter);
    for (auto i : v) cout << i << endl;
    return 0;
}



For a single-character delimiter

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<string> split(const string &s, char delim) {
    vector<string> result;
    stringstream ss(s);
    string item;
    while (getline(ss, item, delim)) {
        result.push_back(item);
    }
    return result;
}

int main() {
    string str = "adsf+qwer+poui+fdgh";
    vector<string> v = split(str, '+');
    for (auto i : v) cout << i << endl;
    return 0;
}

#10


0  

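#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Number of delimiter characters in str; the number of tokens is this plus one.
int split_count(string str, char delimit) {
    return count(str.begin(), str.end(), delimit);
}

// Caveat: the length passed to substr is always str.find(delimit), i.e. the
// length of the first token, so every token must have that same length.
void split(string str, char delimit, string res[]) {
    size_t a = 0;
    int i = 0;
    while (a < str.size()) {
        res[i] = str.substr(a, str.find(delimit));
        a += res[i].size() + 1;
        i++;
    }
}

int main() {
    string a = "abc.xyz.mno.def";
    int x = split_count(a, '.') + 1;
    vector<string> res(x);  // std::vector instead of a non-standard variable-length array
    split(a, '.', res.data());

    for (int i = 0; i < x; i++)
        cout << res[i] << endl;
    return 0;
}

P.S.: This works only if all the strings produced by the split have the same length.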

#11


-3  

std::vector<std::string> split(const std::string& s, char c) {
  std::vector<std::string> v;
  std::string::size_type i = 0;          // start of the current token
  std::string::size_type j = s.find(c);  // position of the next delimiter
  while (j != std::string::npos) {
    v.push_back(s.substr(i, j - i));
    i = ++j;
    j = s.find(c, j);
  }
  v.push_back(s.substr(i));  // last token, or the whole string if c never occurs
  return v;
}
