Extracting specific data from a file

Time: 2023-02-04 10:44:34

I have a file (as text) from which I need to extract the names of all included files. The implementation should be in C++. My plan is to read the file line by line (getline) and check whether each line starts with #include. How do I do that? (There may be leading spaces.) After that I need to extract the file name, i.e. the string between the " ". How do I do that? Thanks.


1 solution

#1



First of all, an include may be written either between <> or between "", so you have to check for both.


Next, the best way to do what you are trying to do is to use regular expressions. If you are using C++11, they are part of the standard library (std::regex, in <regex>) and you should be good to go. If that is not the case, try boost::regex, for instance.
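
To make the regular-expression route concrete, here is a minimal sketch assuming C++11 and std::regex. The function names match_include and extract_includes are made up for this illustration, and the pattern is only one reasonable guess at what counts as an include line (it also accepts whitespace between # and include); it does not try to skip includes inside block comments or handle macro-expanded includes.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): returns true and
// stores the included name in `name` if `line` is an #include directive.
bool match_include(const std::string& line, std::string& name) {
    // Optional leading whitespace, '#', optional whitespace, "include",
    // optional whitespace, then a name written as "..." or <...>.
    static const std::regex pattern(
        R"re(^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>))re");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) {
        return false;
    }
    // Only one of the two capture groups takes part in a successful match.
    name = m[1].matched ? m[1].str() : m[2].str();
    return true;
}

// Hypothetical helper: reads the stream line by line with getline and
// collects every included name, in order of appearance.
std::vector<std::string> extract_includes(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    std::string name;
    while (std::getline(in, line)) {
        if (match_include(line, name)) {
            names.push_back(name);
        }
    }
    return names;
}
```

To use it on a real file, open a std::ifstream and pass it to extract_includes. With boost::regex the same pattern should carry over largely unchanged, but that is an assumption I have not checked here.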


If using regular expressions is not an option, you will have to implement your own parsing, but this is error-prone and you will most probably miss some freaky edge case. If I had to, I would implement such parsing by going over the file line by line. Then I would check whether the line contains #include as a substring and verify that there is only whitespace before it. If the line does contain that substring, I would then iterate over the remaining part of the line to parse the part between <> or "".
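
The manual steps described above could look roughly like the following. This is only a sketch under simplifying assumptions: parse_include_line is a hypothetical name, only spaces and tabs count as whitespace, the directive must be spelled exactly #include (so "# include" is rejected), and comments are not handled at all.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): returns true and
// stores the included name in `name` if `line` is an #include directive.
bool parse_include_line(const std::string& line, std::string& name) {
    const std::string ws = " \t";
    const std::string directive = "#include";

    // Skip leading whitespace; the first real character must start "#include".
    std::size_t pos = line.find_first_not_of(ws);
    if (pos == std::string::npos ||
        line.compare(pos, directive.size(), directive) != 0) {
        return false;
    }

    // Skip whitespace between the directive and the opening delimiter.
    pos = line.find_first_not_of(ws, pos + directive.size());
    if (pos == std::string::npos) {
        return false;
    }

    // Pick the closing delimiter that matches the opening one.
    char close;
    if (line[pos] == '"') {
        close = '"';
    } else if (line[pos] == '<') {
        close = '>';
    } else {
        return false;
    }

    // The name is everything between the two delimiters.
    const std::size_t end = line.find(close, pos + 1);
    if (end == std::string::npos) {
        return false;
    }
    name = line.substr(pos + 1, end - pos - 1);
    return true;
}
```

Call it on each line returned by getline and collect the names for which it returns true. Even this short version shows how many decisions the hand-written approach forces on you, which is why the regular expression is usually the safer choice.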

