Text file into an array of strings?

Time: 2022-02-13 01:49:14

I want to load a txt file into an array, like file() does in PHP. I want to be able to access individual lines as array[N] (which should contain the entire line N of the file). I would then need to remove each array element after using it, so that the array shrinks until it reaches size 0 and the program finishes. I know how to read the file, but I have no idea how to fill a string array that can be used as described. I am compiling with gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5).

How can I achieve this?

3 Solutions

#1


1  

I suggest you read your file into an array of pointers to strings, which would allow you to index and delete the lines as you have specified. The efficiency tradeoff with this approach is whether you count the number of lines ahead of time or allocate/extend the array as you read each line. I would opt for the former.

  1. Read the file, counting the number of line terminators you see (either \n or \r\n)
  2. Allocate an array of char * of that size
  3. Re-read the file line by line, using malloc() to allocate a buffer for each line, pointed to by the next array index
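
The steps above could be sketched roughly as follows. This is only an illustration, not tested against gcc 4.4.3: load_lines is a hypothetical name, the first pass also records the longest line so that one temporary buffer is enough for fgets(), and the array gets an extra NULL entry as an end marker.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: loads every line of `path` into a NULL-terminated
 * array of malloc'ed strings and stores the line count in *count.
 * Returns NULL on failure. */
char **load_lines(const char *path, size_t *count)
{
    FILE *fp = fopen(path, "r");
    size_t n = 0, len = 0, longest = 0, i = 0;
    int c, last = '\n';
    char **lines, *tmp;

    if (fp == NULL)
        return NULL;

    /* Pass 1: count the lines and remember the longest one. */
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            n++;
            if (len > longest)
                longest = len;
            len = 0;
        } else {
            len++;
        }
        last = c;
    }
    if (last != '\n') {             /* last line has no terminator */
        n++;
        if (len > longest)
            longest = len;
    }

    lines = malloc((n + 1) * sizeof *lines);   /* +1 for a NULL sentinel */
    tmp = malloc(longest + 2);                 /* line + '\n' + '\0' */
    if (lines == NULL || tmp == NULL) {
        free(lines);
        free(tmp);
        fclose(fp);
        return NULL;
    }

    /* Pass 2: re-read line by line, one malloc() per line. */
    rewind(fp);
    while (i < n && fgets(tmp, (int)(longest + 2), fp) != NULL) {
        tmp[strcspn(tmp, "\r\n")] = '\0';      /* cut at the first \r or \n */
        lines[i] = malloc(strlen(tmp) + 1);
        if (lines[i] == NULL)
            break;
        strcpy(lines[i], tmp);
        i++;
    }
    lines[i] = NULL;
    free(tmp);
    fclose(fp);
    *count = i;
    return lines;
}
```

After a successful call, lines[N] is the text of line N without its terminator, and *count tells you how many entries are valid.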

For your operations:

  • Indexing is just array[N]
  • Deleting is just freeing the buffer indexed by array[N] and setting the array[N] entry to NULL
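
As a small sketch of the delete operation, assuming each line was allocated with its own malloc(): delete_line and consume_all are hypothetical names, and "using" a line is just printing it here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: frees line n and marks its slot as deleted.
 * Returns 1 if a line was freed, 0 if the slot was already empty. */
int delete_line(char **lines, size_t n)
{
    if (lines[n] == NULL)
        return 0;
    free(lines[n]);
    lines[n] = NULL;
    return 1;
}

/* Hypothetical driver: prints each remaining line, deletes it, and stops
 * once nothing is left. Returns the number of lines it used. */
size_t consume_all(char **lines, size_t count)
{
    size_t n, used = 0;

    for (n = 0; n < count; n++) {
        if (lines[n] == NULL)
            continue;               /* already deleted earlier */
        puts(lines[n]);             /* "use" the line */
        used += (size_t)delete_line(lines, n);
    }
    return used;
}
```

The array itself never shrinks in this scheme; a NULL slot simply means "gone". If you need a live count of what is left, keep a counter and decrement it whenever delete_line() returns 1.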

UPDATE:

The more memory-efficient approach suggested by @r.. and @marc-van-kempen is a good optimization over malloc()ing one line at a time: slurp the file into a single buffer and replace all the line terminators with '\0'.

Assuming you've done that, and you have the file in one big buffer, char *filebuf, with the line count in int num_lines, you can allocate and fill your indexing array something like this:

// Needs <stdlib.h>. Allocates the array of pointers to strings, plus one slot for a NULL terminator
char **lines = malloc((num_lines + 1) * sizeof *lines);
lines[num_lines] = NULL; // Terminate the array as another way to stop you running off the end

char *p = filebuf; // I'm assuming the first char of the file is the start of the first line
int n;
for (n = 0; n < num_lines; n++) {
  lines[n] = p;
  while (*p != '\0') p++; // Seek to the end of this line
  if (n < num_lines - 1) {
    // Skip the terminator(s) to reach the start of the next line (if there is one).
    // Runs of '\0' are skipped as a unit, so num_lines must not count empty lines.
    while (*p == '\0') p++;
  }
}

With the single-buffer approach, "deleting" a line is merely a matter of setting lines[n] to NULL. There is no free() per line; you free filebuf once, when you are done with every line.

#2


2  

Proposed algorithm:

  1. Use fseek, ftell, fseek to seek to the end, determine the file length, and seek back to the beginning.
  2. malloc a buffer big enough for the whole file plus null-termination.
  3. Use fread to read the whole file into the buffer, then write a 0 byte at the end.
  4. Loop through the buffer byte-by-byte and count newlines.
  5. Use malloc to allocate that number + 1 char * pointers.
  6. Loop through the buffer again, assigning the first pointer to point to the beginning of the buffer, and successive pointers to point to the byte after a newline. Replace the newline bytes themselves with 0 (null) bytes in the process.
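
A minimal sketch of these steps, under a few assumptions: slurp_lines is a hypothetical name, the file is opened in binary mode so that ftell() reports a byte count (this works on POSIX systems, though the C standard does not guarantee SEEK_END for binary streams), and only '\n' is treated as a terminator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper following the six steps above. On success it returns
 * the pointer array, stores the file buffer in *buf_out and the number of
 * lines in *count. The caller frees both the array and the buffer. */
char **slurp_lines(const char *path, char **buf_out, size_t *count)
{
    FILE *fp = fopen(path, "rb");
    long size;
    size_t got, i, n = 0;
    char *buf, **lines;

    if (fp == NULL)
        return NULL;

    /* 1. Seek to the end, note the length, seek back. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    /* 2. One buffer for the whole file plus the terminating 0 byte. */
    buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* 3. Read everything and terminate it. */
    got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    buf[got] = '\0';

    /* 4. Count the newlines. */
    for (i = 0; i < got; i++)
        if (buf[i] == '\n')
            n++;

    /* 5. That number + 1 pointers. */
    lines = malloc((n + 1) * sizeof *lines);
    if (lines == NULL) {
        free(buf);
        return NULL;
    }

    /* 6. First pointer at the start of the buffer; every newline becomes
     *    a 0 byte and the next pointer points just past it. */
    lines[0] = buf;
    n = 0;
    for (i = 0; i < got; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            lines[++n] = &buf[i + 1];
        }
    }

    *buf_out = buf;
    *count = n + 1;
    return lines;
}
```

For a file that ends with a newline, the last entry is an empty string. When you are finished, free(lines) and free(buf) are the only two calls needed.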

One optimization: if you don't need random access to the lines (indexing them by line number), do away with the pointer array and just replace all the newlines with 0 bytes. Then s+=strlen(s)+1; advances to the next line. You'll need to add some check to make sure you don't advance past the end (or beginning if you're doing this in reverse) of the buffer.
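
A sketch of that pointer-free walk with the bounds check included. It assumes the newlines in buf[0..len) have already been replaced with 0 bytes and that buf[len] is the extra terminating 0 byte; for_each_line and the visit callback are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walks the '\0'-separated lines stored in buf and
 * calls visit() on each one (visit may be NULL to just count).
 * buf[len] must be the final terminating 0 byte.
 * Returns the number of lines visited. */
size_t for_each_line(const char *buf, size_t len,
                     void (*visit)(const char *line))
{
    const char *s = buf;
    const char *end = buf + len;    /* address of the final 0 byte */
    size_t visited = 0;

    while (s <= end) {              /* the check that stops us at the end */
        if (visit != NULL)
            visit(s);
        visited++;
        s += strlen(s) + 1;         /* advance to the next line */
    }
    return visited;
}
```

If the original file ended with a newline, the final visit sees an empty string, which matches the interpretation described in the drawbacks of this method.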

Either way, this method is very efficient (no memory fragmentation) but has a few drawbacks:

  • You can't individually free lines; you can only free the whole buffer once you finish.
  • You have to overwrite the newlines. Some people prefer to have them kept in the in-memory structure.
  • If the file ended with a newline, the last "line" in your pointer array will be zero-length. IMO this is the sane interpretation of text files, but some people prefer considering the empty string after the last newline a non-line and considering the last proper line "incomplete" if it doesn't end with a newline.

#3


1  

There are two slightly different ways to achieve this: one is more memory friendly, the other more CPU friendly.

I. Memory friendly

  1. Open the file and get its size (use fstat() and friends) ==> size
  2. Allocate a buffer of that size and read the whole file into it ==> char buf[size];
  3. Scan through the buffer counting the '\n' (or '\r\n' == DOS, or '\r' == classic Mac) ==> N
  4. Allocate an array: char *lines[N]
  5. Scan through the buffer again: point lines[0] to &buf[0], scan for the first '\n' or '\r' and set it to '\0' (delimiting the string), set lines[1] to the first character after that which is not '\n' or '\r', and so on.
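
The final scanning step might look something like this. It is a sketch under assumptions: split_lines is a hypothetical name, the buffer is assumed to end with a '\0' byte (so it needs one byte more than the file size), and the caller supplies an array with room for max pointers.

```c
#include <stddef.h>

/* Hypothetical helper: splits a '\0'-terminated buffer in place and stores
 * up to `max` line starts in `lines`. Any run of '\n' and '\r' that follows
 * a line is overwritten with 0 bytes and treated as a single break, so blank
 * lines between two lines are dropped. Returns the number of lines stored. */
size_t split_lines(char *buf, char **lines, size_t max)
{
    size_t n = 0;
    char *p = buf;

    while (*p != '\0' && n < max) {
        lines[n++] = p;                         /* start of a line */
        while (*p != '\0' && *p != '\n' && *p != '\r')
            p++;                                /* find its terminator */
        while (*p == '\n' || *p == '\r')
            *p++ = '\0';                        /* delimit, skip to next line */
    }
    return n;
}
```

Because "\r\n" and "\n\n" both collapse into one break, this handles DOS files without special cases, at the cost of losing empty lines.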

II. CPU friendly

  1. Create a linked list structure (if you don't know how to do this or don't want to, have a look at 'glib' (not glibc!), a utility companion of GTK).
  2. Open the file and start reading the lines using fgets(), malloc'ing each line as you go along.
  3. Keep a linked list of lines ==> list, and count the total number of lines ==> N
  4. Allocate an array: char *lines[N];
  5. Go through the linked list and assign the pointer to each element to its corresponding array element
  6. Free the linked list (not its elements!)
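
A hand-rolled sketch of the linked-list variant, without glib. The names load_lines_list, struct node, and MAX_LINE are hypothetical, and the fixed fgets() buffer is a simplifying assumption: a line longer than MAX_LINE - 1 characters would be split across several entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024   /* assumption: longer lines get split */

struct node {
    char *line;
    struct node *next;
};

/* Hypothetical helper: reads `path` in a single pass into a linked list,
 * then moves the line pointers into a NULL-terminated array.
 * Returns NULL on failure; stores the line count in *count. */
char **load_lines_list(const char *path, size_t *count)
{
    FILE *fp = fopen(path, "r");
    struct node *head = NULL, *tail = NULL, *nd, *next;
    char tmp[MAX_LINE];
    char **lines;
    size_t n = 0, i = 0;

    if (fp == NULL)
        return NULL;

    /* Read with fgets(), malloc each line, append it to the list. */
    while (fgets(tmp, sizeof tmp, fp) != NULL) {
        tmp[strcspn(tmp, "\r\n")] = '\0';       /* drop the terminator */
        nd = malloc(sizeof *nd);
        if (nd == NULL)
            break;
        nd->line = malloc(strlen(tmp) + 1);
        if (nd->line == NULL) {
            free(nd);
            break;
        }
        strcpy(nd->line, tmp);
        nd->next = NULL;
        if (tail != NULL)
            tail->next = nd;
        else
            head = nd;
        tail = nd;
        n++;
    }
    fclose(fp);

    /* Now that N is known, allocate the array (+1 for a NULL sentinel). */
    lines = malloc((n + 1) * sizeof *lines);

    /* Hand each string over to the array, then free the node only. */
    for (nd = head; nd != NULL; nd = next) {
        next = nd->next;
        if (lines != NULL)
            lines[i++] = nd->line;
        else
            free(nd->line);         /* array allocation failed: clean up */
        free(nd);
    }
    if (lines != NULL) {
        lines[i] = NULL;
        *count = i;
    }
    return lines;
}
```

The file is read only once here, which is the point of this variant; the price is one extra small allocation per line while the list exists.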
