Optimizing my read() loop in C (merging two loops into one)

Time: 2023-01-18 19:55:28

I need to read two files and store their contents in mainbuff1 and mainbuff2.

I should use only syscalls like open(), read(), write(), etc.

I don't want to store them on the stack: what if the files are very large? Heap allocation is better.

This code works:

...
    char charbuf;
    char *mainbuff1=malloc(100);
    char *mainbuff2=malloc(100);
    while (read(file1, &charbuf, 1)!=0)
            mainbuff1[len++]=charbuf;
    while (read(file2, &charbuf, 1)!=0)
            mainbuff2[len2++]=charbuf;
...

But each mainbuff is only 100 chars. A better solution is to allocate mainbuff after counting the chars in the file, like this:

...
    char charbuf;
    while (read(file1, &charbuf, 1)!=0)
            len++;
    while (read(file2, &charbuf, 1)!=0)
            len2++;
    char *mainbuff1=malloc(len);
    char *mainbuff2=malloc(len2);
...

and then repeat the while loop to read the bytes into mainbuff.

But two loops (the first reads and counts, the second reads again) will be inefficient and slow for large files. I need to do it in one pass, or in some other more efficient way. Please help! I have no idea how!

7 answers

#1


7  

You can use fstat to get the file size instead of reading twice.

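#include <fcntl.h>      /* open, O_RDONLY */
#include <stdlib.h>     /* malloc, free */
#include <sys/stat.h>   /* fstat, struct stat */
#include <unistd.h>     /* close */

int main(void) {
    struct stat sbuf;
    int fd = open("filename", O_RDONLY);
    if (fd == -1 || fstat(fd, &sbuf) == -1)
        return 1;
    /* st_size is the file size in bytes; +1 leaves room for a '\0' */
    char *buf = malloc(sbuf.st_size + 1);
    free(buf);
    close(fd);
    return 0;
}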

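To round out the fstat idea, here is a minimal sketch of reading a whole file in one pass once the size is known. The helper name read_whole_file is hypothetical, and the sketch assumes a regular file whose st_size is meaningful (pipes and terminals report 0) and does not retry on EINTR.

```c
#include <fcntl.h>      /* open, O_RDONLY */
#include <stdlib.h>     /* malloc, free */
#include <sys/stat.h>   /* fstat, struct stat */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, close */

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the file
 * and stores its length in *out_len, or returns NULL on any failure. */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat sbuf;
    if (fstat(fd, &sbuf) == -1) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)sbuf.st_size;
    char *buf = malloc(size + 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t total = 0;
    while (total < size) {
        /* read() may return fewer bytes than requested, so keep going */
        ssize_t n = read(fd, buf + total, size - total);
        if (n == -1) {              /* I/O error */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)                 /* file shrank: keep what we got */
            break;
        total += (size_t)n;
    }
    close(fd);

    buf[total] = '\0';
    *out_len = total;
    return buf;
}
```

The caller owns the returned buffer and releases it with free(). Each file is read exactly once, and usually with a single read() call.
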
But, really, the time to worry about efficiency is after it works too slowly.

#2


5  

If this is indeed a place where optimizations are needed, then what you really should optimize is the following two things:

  • buffer allocation
  • number of calls to read() and write()

For small buffers of 100 to 1000 bytes, there's no reason to use malloc() and the like. Just allocate the buffer on the stack; it's going to be the fastest. Unless, of course, you want to return pointers to these buffers from the function, in which case you probably should use malloc(). Otherwise, you should consider using global/static arrays instead of dynamically allocated ones.

As for the I/O calls, call read() and write() with the entire buffer size. Don't call them to read or write single bytes. Transitions to the kernel and back do have cost.

Further, if you expect to need to work with fairly large files in RAM, consider using file mapping.
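
File mapping on POSIX systems usually means mmap(). A minimal sketch, assuming a non-empty regular file and a read-only view; the helper name map_file is hypothetical.

```c
#include <fcntl.h>      /* open, O_RDONLY */
#include <stddef.h>     /* size_t, NULL */
#include <sys/mman.h>   /* mmap, PROT_READ, MAP_PRIVATE, MAP_FAILED */
#include <sys/stat.h>   /* fstat, struct stat */
#include <unistd.h>     /* close */

/* Hypothetical helper: maps a whole file read-only.
 * Returns the mapping and stores its length in *out_len,
 * or returns NULL on failure (an empty file counts as failure,
 * because a zero-length mapping is not allowed). */
const char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat sbuf;
    if (fstat(fd, &sbuf) == -1 || sbuf.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)sbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)sbuf.st_size;
    return p;
}
```

The caller releases the mapping with munmap((void *)p, len). No buffer is allocated and no read() loop is needed; the kernel pages the file in as it is touched.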

#3


4  

stat et al. allow you to get the file size. http://linux.die.net/man/2/fstat

Or, if you can't use that, use lseek: http://linux.die.net/man/2/lseek (pay particular attention to the return value)

If you can't use that either, you can always realloc your buffer as you go.

I'm leaving it up to you to implement it since this is obviously an assignment. ;)

#4


2  

Before optimizing anything you have to profile your code. Many tools are available to do that:

  • valgrind
  • Intel VTune
  • AQTime
  • AMD CodeAnalyst

#5


1  

Define an array that grows automatically, like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* read */

typedef struct dynarray {
    size_t size;        /* bytes stored */
    size_t capacity;    /* bytes allocated */
    char *array;
} DynArray;

DynArray *da_make(size_t init_size){
    DynArray *da;
    if(NULL==(da=malloc(sizeof(DynArray)))){
        perror("memory not enough");
        exit(EXIT_FAILURE);
    }
    if(NULL==(da->array=malloc(init_size))){
        perror("memory not enough");
        exit(EXIT_FAILURE);
    }
    da->size = 0;
    da->capacity=init_size;
    return da;
}

void da_add(DynArray *da, char value){
    da->array[da->size] = value;
    if(++da->size == da->capacity){
        /* keep the old pointer until realloc is known to have succeeded */
        char *tmp=realloc(da->array, da->capacity + 1024);
        if(NULL==tmp){
            perror("memory not enough");
            exit(EXIT_FAILURE);
        }
        da->array=tmp;
        da->capacity += 1024;
    }
}

void da_free(DynArray *da){
    free(da->array);
    free(da);
}

int main(void) {
    DynArray *da;
    char charbuf;
    size_t i;

    da = da_make(128);
    while(read(0, &charbuf, 1)>0)   /* stops on EOF (0) and on error (-1) */
        da_add(da, charbuf);
    for(i=0;i<da->size;++i)
        putchar(da->array[i]);
    da_free(da);
    return 0;
}

#6


0  

Why do you need everything in memory? You can read a chunk, process it, read the next chunk, and so on. Unless you have enough memory, you cannot keep everything in your buffer. What is your goal?

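A sketch of that chunk-at-a-time pattern, using line counting as a stand-in for whatever processing is actually needed. The name count_lines and the 4096-byte chunk are assumptions for the example.

```c
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

/* Hypothetical helper: counts '\n' bytes in fd without ever holding
 * more than one chunk in memory. Returns the count, or -1 on error. */
long count_lines(int fd)
{
    char buf[4096];           /* small fixed chunk, fine on the stack */
    long lines = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* Process this chunk, then reuse the same buffer for the next */
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    }
    return n == -1 ? -1 : lines;
}
```

Memory use stays constant no matter how large the file is, which sidesteps the allocation question entirely when the data does not need to be kept.
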
#7


0  

If, as you say, you're only using system calls, you may be able to get away with using the entire heap as a buffer.

#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>

size_t sz;
/* On SIGSEGV, re-arm the handler and double the break increment */
void fix(int sig){(void)sig;signal(SIGSEGV,fix);sbrk(sz *= 2);}
int main() {
    sz = getpagesize();
    signal(SIGSEGV,fix);
    char *buf = sbrk(sz);
    int fd = open("filename", O_RDWR);
    read(fd, buf, -1);  /* -1 converts to the largest size_t value */
}

But if you happen to call a library function which uses malloc, Kablooey!

The brk and sbrk functions give you direct access to the same heap that malloc uses, but without any of malloc's "overhead" and without any of malloc's features, such as free and realloc. sbrk is called with an increment in bytes and returns a void * (the previous program break). brk is called with a pointer value (i.e., you just imagine the pointer into existence and declare it to brk, in a way) and returns an int: 0 on success, -1 on error.

Memory allocated with brk or sbrk comes from the same space that malloc will try to set up and use on the first call to malloc or realloc. Many library functions use malloc under the hood, so there are many ways for this code to break. It's a very bizarre and interesting area.

The signal handler here is also very dangerous. It gives you automatic, unlimited space, but if you run into any other kind of segmentation violation, such as dereferencing a NULL pointer, the handler cannot fix that, and the program can no longer crash. This can send the program into a nasty loop: retrying the memory access, allocating more space, retrying the memory access, allocating more space.
