Parsing a .txt file into vectors more efficiently in C++

Time: 2021-11-10 15:54:44

My program uses ifstream and getline() to parse a text file into objects that are two vectors deep, i.e. a vector inside a vector. Once the text file has finished loading, the inner vector contains over 250000 string objects.

This is painfully slow. Is there a standard-library (std) alternative that is more efficient than using ifstream and getline()?

Thanks

UPDATE:

```cpp
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>

using namespace std;

class Word
{
private:
    string moniker = "";
    vector<string> definition;
    string type = "";

public:
    void setMoniker(string m) { this->moniker = m; }
    void setDefinition(string d) { this->definition.push_back(d); }
    void setType(string t) { this->type = t; }
    size_t getDefinitionSize() { return this->definition.size(); }

    string getMoniker() { return this->moniker; }
    void printDefinition()
    {
        for (const string& def : definition)
        {
            cout << def << endl;
        }
    }

    string getType() { return this->type; }
};

class Dictionary
{
private:
    vector<Word> Words;

public:
    void addWord(Word w) { this->Words.push_back(w); }
    Word getWord(int i) { return this->Words[i]; }
    size_t getTotalNumberOfWords() { return this->Words.size(); }
    void loadDictionary(string f)
    {
        // Inside [...] the dot is already literal, so the invalid escapes
        // "\." and "\ " are not needed. Names such as _IS_DEF (underscore
        // followed by a capital letter) are reserved, so they were renamed.
        const regex IS_DEF("[.]|[ ]"),
            IS_TYPE("^misc$|^n$|^adj$|^v$|^adv$|^prep$|^pn$|^n_and_v$");

        ifstream dict(f);
        if (!dict.is_open())
            return;

        string line;
        string m, t, d;

        // Stores the entry collected so far (if any) and starts a new one.
        auto flushWord = [&]()
        {
            if (m.empty())
                return; // e.g. several blank lines in a row
            Word w;
            w.setMoniker(m);
            w.setType(t);
            w.setDefinition(d);
            this->addWord(w);
            m.clear();
            t.clear();
            d.clear();
        };

        while (getline(dict, line))
        {
            if (regex_search(line, IS_DEF))
                d = line;
            else if (regex_search(line, IS_TYPE))
                t = line;
            else if (!line.empty())
                m = line;
            else
                flushWord(); // a blank line ends an entry
        }
        flushWord(); // the last entry may not be followed by a blank line
    }
};

int main()
{
    Dictionary dictionary;
    dictionary.loadDictionary("dictionary.txt");
    return 0;
}
```

1 solution

#1

You should reduce your memory allocations. Having a vector of vectors is usually not a good idea, because every inner vector makes its own heap allocations (new) and deallocations (delete).
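
A minimal sketch of one way to follow this advice, assuming entries only need to read their definitions back in order: all definition strings live in one vector owned by the container, and each entry records an offset and a count into it. `Entry`, `FlatDictionary`, `beginEntry` and `addDefinition` are hypothetical names, not part of the question's code. This removes the per-entry `std::vector` allocations, but each `std::string` may still allocate its own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One dictionary entry (hypothetical). Instead of owning its own vector of
// definitions, it refers to a contiguous range inside FlatDictionary::defs_.
struct Entry {
    std::string moniker;
    std::string type;
    std::size_t firstDef = 0;  // index of the entry's first definition
    std::size_t defCount = 0;  // how many definitions belong to the entry
};

class FlatDictionary {
public:
    // Starts a new entry; its definitions are appended with addDefinition().
    void beginEntry(std::string moniker, std::string type) {
        Entry e;
        e.moniker = std::move(moniker);
        e.type = std::move(type);
        e.firstDef = defs_.size();
        entries_.push_back(std::move(e));
    }

    // Appends a definition to the most recently started entry. Because
    // entries are built one at a time, its definitions stay contiguous.
    void addDefinition(std::string def) {
        assert(!entries_.empty());
        defs_.push_back(std::move(def));
        ++entries_.back().defCount;
    }

    std::size_t size() const { return entries_.size(); }
    const Entry& entry(std::size_t i) const { return entries_[i]; }

    // Returns the i-th definition of entry e.
    const std::string& definition(const Entry& e, std::size_t i) const {
        assert(i < e.defCount);
        return defs_[e.firstDef + i];
    }

private:
    std::vector<Entry> entries_;
    std::vector<std::string> defs_;  // every definition, back to back
};
```

Entry `i`'s definitions are then `definition(entry(i), 0)` through `definition(entry(i), entry(i).defCount - 1)`.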

You should reserve() the approximate number of elements you need in the vector at the start, so that it does not have to reallocate and copy (or move) its contents repeatedly as it grows.
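
To illustrate the effect, here is a small sketch that counts how often a `std::vector<std::string>` grows its buffer while 250000 strings (the figure from the question) are appended, with and without an up-front `reserve()`. `countGrowths` is a hypothetical helper. The exact count without `reserve()` depends on the standard library's growth policy, but with `reserve(n)` it is zero, because `push_back` only reallocates when the new size would exceed the capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends n strings to a vector and counts how many times its capacity
// changed, i.e. how many times it had to allocate a bigger buffer and
// move any existing elements across.
std::size_t countGrowths(std::size_t n, bool reserveFirst) {
    std::vector<std::string> v;
    if (reserveFirst) {
        v.reserve(n);  // a single allocation sized for the expected count
        assert(v.capacity() >= n);
    }
    std::size_t growths = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back("word");
        if (v.capacity() != lastCapacity) {
            ++growths;
            lastCapacity = v.capacity();
        }
    }
    return growths;
}
```

In the question's code the same idea would apply to `Words`, e.g. calling `Words.reserve(...)` with an estimate of the entry count before the `getline` loop; the estimate itself is an assumption the caller has to supply.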

You should use fgets() if you don't actually need to extract a std::string to get your work done. For example, if the objects can be parsed directly from char arrays, do that. Make sure to read into the same buffer every time, rather than creating a new one for each line.
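
A sketch of the `fgets()` approach, assuming the file format from the question (entries separated by blank lines) and lines shorter than the buffer. It only counts entries, to keep the example short; a real loader would copy out just the pieces it keeps. `countEntries` and the 4096-byte buffer size are illustrative choices, not anything from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Counts blank-line-separated entries in a text file using std::fgets and a
// single reusable char buffer (no std::string is created per line).
// Assumes no line is longer than the buffer; longer lines would be split.
std::size_t countEntries(const char* path) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "r");
    if (!f) return 0;

    char buf[4096];          // reused for every line
    std::size_t entries = 0;
    bool inEntry = false;    // true while reading the lines of an entry

    while (std::fgets(buf, static_cast<int>(sizeof buf), f)) {
        buf[std::strcspn(buf, "\r\n")] = '\0';  // strip the line ending
        if (buf[0] == '\0') {
            if (inEntry) ++entries;  // a blank line closes the current entry
            inEntry = false;
        } else {
            inEntry = true;
        }
    }
    if (inEntry) ++entries;  // last entry may not end with a blank line
    std::fclose(f);
    return entries;
}
```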

And most important of all, use a profiler.
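
A profiler shows where the time is actually spent. As a much coarser first check (not a substitute for a profiler), a whole phase can be timed with `std::chrono::steady_clock`; `timeMs` is a hypothetical helper written for this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// This reports how long a phase took, not where inside it the time went.
template <typename F>
double timeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock is monotonic
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `double ms = timeMs([&] { dictionary.loadDictionary("dictionary.txt"); });` in the question's `main()` would report the total load time; a profiler is then needed to tell whether reading, regex matching or copying dominates.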