Adding gzip compression of the response body to thttpd to reduce the amount of data transferred

Date: 2021-04-01 11:40:36

thttpd

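thttpd is a very small, lightweight web server. It is extremely simple, offering only HTTP/1.1 and basic CGI support. Its official website has a comparison chart and benchmark against other web servers (such as Apache and Zeus) that is worth a look. Like lighttpd, thttpd does not fork() a child process for each concurrent request; it uses I/O multiplexing instead, which is why it performs well.

thttpd supports many platforms, including FreeBSD, SunOS, Solaris, BSD, Linux, and OSF. For a small web server, speed is practically the defining trait. Going by the benchmark on the official site, thttpd is at least as fast as mainstream web servers and faster under high load, because it uses fewer resources.

Another notable feature of thttpd is URL-based traffic throttling, which is very convenient for controlling download bandwidth. Apache needs a plugin for this, which is less efficient than thttpd's built-in support.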

For installation and debugging, see:

http://blog.csdn.net/21aspnet/article/details/7045845

http://blog.csdn.net/orzlzro/article/details/7568338

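HTTP compression

Compressing the response body reduces the amount of data transferred and so improves page response time. The effect is most noticeable with today's rich web applications, whose pages pull in large scripts.

The server and the client must use the same compression algorithm. What does HTTP specify about compression algorithms, and how is the algorithm negotiated?

As follows:

1. The content codings defined by HTTP are gzip, compress, deflate, and identity (no compression).

2. The client sends an accept-encoding request header telling the server which compression algorithms it can accept. The server parses this header to learn which algorithms the client can decode. If the server also supports one of them, it compresses the response body, sends the compressed content as the body, and includes a content-encoding response header naming the algorithm used. Note that content-length is then the length of the compressed content.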

[Figure: request and response headers showing accept-encoding and content-encoding]
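The server-side half of this negotiation can be sketched as a small parser for the accept-encoding value. This is a minimal sketch, not thttpd code: accepts_gzip is a hypothetical helper name, and it assumes the header value has already been extracted into a NUL-terminated string. It treats the legacy x-gzip alias as gzip, honors q=0 as "not acceptable", and does not handle the * wildcard.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper (not part of thttpd): returns 1 if the
 * Accept-Encoding value lists gzip or x-gzip with a non-zero q-value. */
static int
accepts_gzip( const char* accept_encoding )
{
    const char* p = accept_encoding;

    if ( p == NULL )
        return 0;
    while ( *p != '\0' )
    {
        const char* end;
        const char* semi;
        size_t name_len;
        double q = 1.0;   /* a coding without a q parameter has q=1 */

        /* Skip separators and leading whitespace. */
        while ( *p == ',' || isspace( (unsigned char) *p ) )
            ++p;
        if ( *p == '\0' )
            break;
        /* One list element runs up to the next comma. */
        end = strchr( p, ',' );
        if ( end == NULL )
            end = p + strlen( p );
        /* The coding name ends at ';' (parameters) or at the element end. */
        semi = memchr( p, ';', (size_t) ( end - p ) );
        name_len = (size_t) ( ( semi != NULL ? semi : end ) - p );
        while ( name_len > 0 && isspace( (unsigned char) p[name_len - 1] ) )
            --name_len;
        if ( semi != NULL )
        {
            /* Only a leading "q=" parameter is interpreted. */
            const char* qp = semi + 1;
            while ( qp < end && isspace( (unsigned char) *qp ) )
                ++qp;
            if ( end - qp >= 2 && ( qp[0] == 'q' || qp[0] == 'Q' ) && qp[1] == '=' )
                q = strtod( qp + 2, NULL );
        }
        if ( q > 0.0 &&
             ( ( name_len == 4 && strncasecmp( p, "gzip", 4 ) == 0 ) ||
               ( name_len == 6 && strncasecmp( p, "x-gzip", 6 ) == 0 ) ) )
            return 1;
        p = end;
    }
    return 0;
}
```

Matching whole tokens rather than searching for the substring "gzip" avoids false positives on values such as "gzipped" and respects an explicit "gzip;q=0".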

http://www.w3.org/Protocols/rfc2616/rfc2616.txt

Content coding values indicate an encoding transformation that has
been or can be applied to an entity. Content codings are primarily
used to allow a document to be compressed or otherwise usefully
transformed without losing the identity of its underlying media type
and without loss of information. Frequently, the entity is stored in
coded form, transmitted directly, and only decoded by the recipient.

    content-coding   = token

All content-coding values are case-insensitive. HTTP/1.1 uses
content-coding values in the Accept-Encoding (section 14.3) and
Content-Encoding (section 14.11) header fields. Although the value
describes the content-coding, what is more important is that it
indicates what decoding mechanism will be required to remove the
encoding.

The Internet Assigned Numbers Authority (IANA) acts as a registry for
content-coding value tokens. Initially, the registry contains the
following tokens:

gzip
    An encoding format produced by the file compression program
    "gzip" (GNU zip) as described in RFC 1952 [25]. This format is a
    Lempel-Ziv coding (LZ77) with a 32 bit CRC.

compress
    The encoding format produced by the common UNIX file compression
    program "compress". This format is an adaptive Lempel-Ziv-Welch
    coding (LZW).

    Use of program names for the identification of encoding formats
    is not desirable and is discouraged for future encodings. Their
    use here is representative of historical practice, not good
    design. For compatibility with previous implementations of HTTP,
    applications SHOULD consider "x-gzip" and "x-compress" to be
    equivalent to "gzip" and "compress" respectively.

deflate
    The "zlib" format defined in RFC 1950 [31] in combination with
    the "deflate" compression mechanism described in RFC 1951 [29].

identity
    The default (identity) encoding; the use of no transformation
    whatsoever. This content-coding is used only in the Accept-
    Encoding header, and SHOULD NOT be used in the Content-Encoding
    header.

New content-coding value tokens SHOULD be registered; to allow
interoperability between clients and servers, specifications of the
content coding algorithms needed to implement a new value SHOULD be
publicly available and adequate for independent implementation, and
conform to the purpose of content coding defined in this section.
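Two rules in the excerpt above are easy to overlook in code: tokens are case-insensitive, and x-gzip and x-compress are aliases. A minimal sketch of a lookup that applies both follows; the enum and the function name parse_content_coding are assumptions made for illustration and do not appear in thttpd or zlib.

```c
#include <stddef.h>
#include <strings.h>

/* Illustrative names only; not taken from thttpd or zlib. */
enum coding {
    CODING_UNKNOWN,
    CODING_GZIP,
    CODING_COMPRESS,
    CODING_DEFLATE,
    CODING_IDENTITY
};

/* Map a content-coding token to one of the initially registered codings.
 * The comparison ignores case, and the x- aliases map to the same value. */
static enum coding
parse_content_coding( const char* token )
{
    static const struct { const char* name; enum coding value; } table[] = {
        { "gzip", CODING_GZIP },         { "x-gzip", CODING_GZIP },
        { "compress", CODING_COMPRESS }, { "x-compress", CODING_COMPRESS },
        { "deflate", CODING_DEFLATE },   { "identity", CODING_IDENTITY },
    };
    size_t i;

    if ( token == NULL )
        return CODING_UNKNOWN;
    for ( i = 0; i < sizeof(table) / sizeof(table[0]); ++i )
        if ( strcasecmp( token, table[i].name ) == 0 )
            return table[i].value;
    return CODING_UNKNOWN;
}
```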

The gzip compression tool

http://www.gzip.org/

The gzip website says:

Can I adapt the gzip sources to perform in-memory compression?

Use the zlib data compression library instead.

zlib

http://www.zlib.net/

https://github.com/madler/zlib

The main compression function (http://www.zlib.net/manual.html#Basic):


ZEXTERN int ZEXPORT deflate OF((z_streamp strm, int flush));

deflate compresses as much data as possible, and stops when the input buffer becomes empty or the output buffer becomes full. It may introduce some output latency (reading input without producing any output) except when forced to flush.

The general-purpose wrapper functions compress and compress2 do not emit a gzip header; their output uses the zlib format:

For usage, see: http://blog.csdn.net/turingo/article/details/8148264
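The difference between the two wrappers is visible in the first two bytes of the output, which is a quick way to confirm which format a buffer holds. The sketch below is not from zlib; detect_wrapper and the enum are hypothetical names. It relies on two facts from the format specifications: a gzip stream starts with the bytes 0x1f 0x8b (RFC 1952), and a zlib stream starts with a CMF/FLG pair whose low CMF nibble is 8 and whose 16-bit value is a multiple of 31 (RFC 1950).

```c
#include <stddef.h>

/* Illustrative names only; not part of zlib. */
enum wrapper { WRAP_UNKNOWN, WRAP_GZIP, WRAP_ZLIB };

/* Guess the wrapper format of a compressed buffer from its first two bytes. */
static enum wrapper
detect_wrapper( const unsigned char* buf, size_t len )
{
    if ( buf == NULL || len < 2 )
        return WRAP_UNKNOWN;
    /* gzip magic number: ID1 = 0x1f, ID2 = 0x8b. */
    if ( buf[0] == 0x1f && buf[1] == 0x8b )
        return WRAP_GZIP;
    /* zlib header: compression method 8 (deflate) in the low nibble of CMF,
     * and CMF * 256 + FLG must be divisible by 31. */
    if ( ( buf[0] & 0x0f ) == 8 && ( ( buf[0] << 8 ) | buf[1] ) % 31 == 0 )
        return WRAP_ZLIB;
    return WRAP_UNKNOWN;
}
```

Output from compress normally begins with 0x78, so it is reported as WRAP_ZLIB. A client that was told content-encoding: gzip expects the 0x1f 0x8b form instead.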

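Functions that do emit a gzip header are gzcompress and gzdecompress (helpers defined in the snippet linked below, not part of the zlib API); see: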

http://www.oschina.net/code/snippet_65636_22542

gzcompress is the function we need here, since the server has to compress the response body as gzip.

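Key changes

Compress the contents of file_address into a buffer whose size is computed with compressBound; the buffer's address is stored in file_address_gz.

gzcompress performs the compression and writes the result to file_address_gz.

The len argument passed to send_mime is changed to match (it becomes the compressed length).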

#include <zlib.h> 
#include <zconf.h>

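static int
really_start_request( httpd_conn* hc, struct timeval* nowP )
    {
    ...

    else
        {
        uLong blen;

        hc->file_address = mmc_map( hc->expnfilename, &(hc->sb), nowP );
        if ( hc->file_address == (char*) 0 )
            {
            httpd_send_err( hc, 500, err500title, "", err500form, hc->encodedurl );
            return -1;
            }

        /* Size the output buffer.  compressBound() covers the zlib wrapper,
        ** so leave 18 more bytes for the larger gzip header and trailer.
        */
        blen = compressBound( hc->sb.st_size ) + 18;
        hc->file_address_gz = (char*) malloc( blen );
        if ( hc->file_address_gz == (char*) 0 )
            {
            httpd_send_err( hc, 500, err500title, "", err500form, hc->encodedurl );
            return -1;
            }

        /* Compress the mapped file.  On success blen holds the compressed
        ** length.  The mapped file is not NUL-terminated, so it must not be
        ** printed with %s.
        */
        if ( gzcompress( (Bytef*) hc->file_address, hc->sb.st_size,
                         (Bytef*) hc->file_address_gz, &blen ) != Z_OK )
            {
            free( hc->file_address_gz );
            hc->file_address_gz = (char*) 0;
            httpd_send_err( hc, 500, err500title, "", err500form, hc->encodedurl );
            return -1;
            }

        /* Advertise the compressed length, not the original file size. */
        send_mime(
            hc, 200, ok200title, hc->encodings, "", hc->type, blen,
            hc->sb.st_mtime );
        }

    return 0;
    }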
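The visible effect of passing blen and hc->encodings to send_mime is on two response headers: content-encoding names gzip, and content-length carries the compressed size. The sketch below only illustrates that outcome. format_gzip_headers is a hypothetical function, the header set is deliberately minimal, and thttpd's real send_mime emits more fields and builds them differently.

```c
#include <stdio.h>

/* Hypothetical illustration, not thttpd's send_mime: write the status line
 * and the headers that matter for a gzip-compressed body into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int
format_gzip_headers( char* buf, size_t size, const char* type,
                     unsigned long compressed_len )
{
    int n = snprintf( buf, size,
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: %s\r\n"
        "Content-Encoding: gzip\r\n"
        "Content-Length: %lu\r\n"   /* length of the compressed body */
        "\r\n",
        type, compressed_len );

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if ( n < 0 || (size_t) n >= size )
        return -1;
    return n;
}
```

If content-length still carried the original file size, the client would wait for bytes that never arrive, which is why the article changes the length argument together with the buffer.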

The gzip value that send_mime receives in hc->encodings is determined by the accept-encoding header: if the header value contains gzip, the encoding is set to gzip.

for ( i = 0; i < n_enc_tab; ++i )
    {
    if ( ( ext_len == enc_tab[i].ext_len && strncasecmp( ext, enc_tab[i].ext, ext_len ) == 0 )
        /* If the client accepts gzip, the server may gzip files that are
        ** not already gzip-encoded, so also select the "gz" table entry.
        ** Note that strcasestr() is a GNU/BSD extension.
        */
        || ( strcasestr( hc->accepte, "gzip" ) && strncasecmp( "gz", enc_tab[i].ext, ext_len ) == 0 ) )
        {
        if ( n_me_indexes < sizeof(me_indexes)/sizeof(*me_indexes) )
            {
            me_indexes[n_me_indexes] = i;
            ++n_me_indexes;
            }
        goto next;
        }
    }
/* No encoding extension found.  Break and look for a type extension. */
break;

In the send phase, the handle_send function is changed to read from file_address_gz instead of file_address:

/* Do we need to write the headers first? */
    if ( hc->responselen == 0 )
    {
    /* No, just write the file. */
    sz = write(
        hc->conn_fd, &(hc->file_address_gz[c->next_byte_index]),
        MIN( c->end_byte_index - c->next_byte_index, max_bytes ) );
    }
    else
    {
    /* Yes.  We'll combine headers and file into a single writev(),
    ** hoping that this generates a single packet.
    */
    struct iovec iv[2];

    iv[0].iov_base = hc->response;
    iv[0].iov_len = hc->responselen;
    iv[1].iov_base = &(hc->file_address_gz[c->next_byte_index]);
    iv[1].iov_len = MIN( c->end_byte_index - c->next_byte_index, max_bytes );
    sz = writev( hc->conn_fd, iv, 2 );
    }

Experimental results

The test downloads the HTTP specification itself: http://www.w3.org/Protocols/rfc2616/rfc2616.txt

Before gzip compression: response size 422 KB, response time 19 ms, load time 373 ms.

After gzip compression: response size 115 KB, response time 38 ms, load time 396 ms.

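Comparing the two, the response time is longer, which can be attributed to compression on the server and decompression on the client. Together these add only about 20 ms over the uncompressed case. Note that this test ran on a local network.

The size, however, shrank to roughly a quarter of the original (422 KB down to 115 KB), which is significant. On the public internet, paying about 20 ms to cut the size to a quarter is a good trade, because most of the time there goes to transmission: with about a quarter of the data to send, the transfer takes roughly a quarter of the time.

For example, transfers over the internet are often measured in seconds. A resource that took 3 s to transfer uncompressed would take roughly 0.8 s once compressed.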
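The estimate above is simple proportional arithmetic, spelled out here for checking. The sketch assumes transfer time scales linearly with body size and ignores latency and the roughly 20 ms of compression overhead; both function names are made up for illustration.

```c
/* Fraction of the original size that remains after compression. */
static double
size_ratio( double original_kb, double compressed_kb )
{
    return compressed_kb / original_kb;
}

/* Transfer time after compression, assuming time is proportional to size.
 * This is a simplifying assumption, not a measured result. */
static double
estimated_transfer_seconds( double original_seconds,
                            double original_kb, double compressed_kb )
{
    return original_seconds * size_ratio( original_kb, compressed_kb );
}
```

With the measured sizes, size_ratio(422, 115) is about 0.27, and estimated_transfer_seconds(3, 422, 115) is about 0.82 s.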

[Figure: response message content]