libcurl多线程使用及高并发性能测试

您所在的位置：网站首页 › go异步实现高并发请求 › libcurl多线程使用及高并发性能测试

libcurl多线程使用及高并发性能测试

2023-07-14 14:16| 来源: 网络整理| 查看: 265

libcurl是一个不错的socket库，而且又是开源的。如果仅仅是简单的HTTP请求，那么只需要几行代码就能轻松实现。不过要用libcurl实现高效、高频率的HTTP请求就需要对libcurl有深入的了解才行。如果阅读英文无障碍的话，那么libcurl自带的示例程序和帮助文档就是最好的老师。

一、多线程HTTP请求

libcurl提供多线程和异步请求来实现大批量HTTP请求，可参见multithread.c和multi-app.c两个示例程序。这两种批量HTTP请求的方式在测试环境下都能正常运行，但使用异步请求总是会出现问题，于是将目标转向多线程请求。

多线程HTTP请求要注意的几个问题：

1. 千万不要在多线程之间共享同一个CURL对象

在libcurl中，第一步要做的就是使用curl_easy_int函数来初始化一个CURL对象，每个CURL对应一个HTTP连接。于是，在批量请求时为了省去每次进行HTTP连接的时间，会对多个HTTP请求使用同一个CURL对象。这在非多线程状态下是不会出问题的，但在多线程下则会出问题。具体原因未知，网上查找到的资料对此解释不太详细。

所以我们需要为每一个线程建立一个CURL对象：

void threadfunc( void *p ) { CURL *curl; curl = curl_easy_init(); ... ... ... curl_easy_cleanup( curl ); }

2. 避免多个线程中同时调用curl_global_init函数

在多线程环境下，应在主线程中使用curl_global_init和curl_global_cleanup函数。

第一次调用 curl_easy_init()时，curl_easy_init 会调用 curl_global_init，在单线程环境下，这不是问题。但是多线程下就不行了，因为curl_global_init不是线程安全的。在多个线程中调用curl_easy_int，然后如果两个线程同时发现curl_global_init还没有被调用，同时调用 curl_global_init，悲剧就发生了。这种情况发生的概率很小，但可能性是存在的。

int main() { curl_global_init( CURL_GLOBAL_ALL );

/* 创建多线程代码 */ ... ...

curl_global_cleanup(); return 0; }

3. 域名解析的设定

引用：

libcurl 有个很好的特性，它甚至可以控制域名解析的超时。但是在默认情况下，它是使用alarm + siglongjmp 实现的。用alarm在多线程下做超时，本身就几乎不可能。如果只是使用alarm，并不会导致程序崩溃，但是，再加上siglongjmp，就要命了（程序崩溃的很可怕，core中几乎看不出有用信息），因为其需要一个sigjmp_buf型的全局变量，多线程修改它。（通常情况下，可以每个线程一个 sigjmp_buf 型的变量，这种情况下，多线程中使用 siglongjmp 是没有问题的，但是libcurl只有一个全局变量，所有的线程都会用）。具体是类似 curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 30L) 的超时设置（发生在域名解析阶段），导致alarm的使用，如前所述，这在多线程中是有冲突的。解决方式是禁用掉alarm这种超时， curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L)。这样，多线程中使用超时就安全了。但是域名解析就没了超时机制，碰到很慢的域名解析，也很麻烦。文档的建议是 Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals. c-ares 是异步的 DNS 解决方案。参考：http://gcoder.blogbus.com/logs/54871550.html

4. DNS共享

参考文章：http://blog.csdn.NET/colinw/article/details/6534025

由于每个CURL对象都会连接一次服务器，如果发送1000次HTTP请求都连接到同一服务器，libcurl就会返回大量连接错误和接收错误，为此使用DNS共享是很有必要的。

void set_share_handle(CURL* curl_handle) { static CURLSH* share_handle = NULL; if (!share_handle) { share_handle = curl_share_init(); curl_share_setopt(share_handle, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS); } curl_easy_setopt(curl_handle, CURLOPT_SHARE, share_handle); curl_easy_setopt(curl_handle, CURLOPT_DNS_CACHE_TIMEOUT, 60 * 5); }

void threadfunc( void *p ) { CURL *curl; set_share_handle( curl ); curl = curl_easy_init(); ... ... ... curl_easy_cleanup( curl ); }

二、接收gzip数据及解压缩gzip

假如一个网页有180KB大小，使用gzip算法压缩后可能就只有60KB大小。目前绝大部分网站都支持gzip，这样用户向网站请求获取的数据是gzip格式的，下载会用户电脑后再由浏览器对gzip数据进行解压，这样可以大大提高网站的浏览速度。

要让libcurl接受gzip编码很简单，只需要加入一行代码：

curl_easy_setopt(curl, CURLOPT_ENCODING, "gzip");

关键的问题是如何解压缩gzip数据，这需要用到zlib库。下面是从网上找的一个解压gzip数据的函数：

/* HTTP gzip decompress */ /* 参数1.压缩数据 2.数据长度 3.解压数据 4.解压后长度 */ int CHttp::httpgzdecompress(Byte *zdata, uLong nzdata, Byte *data, uLong *ndata) { int err = 0; z_stream d_stream = {0}; /* decompression stream */ static char dummy_head[2] = { 0x8 + 0x7 * 0x10, (((0x8 + 0x7 * 0x10) * 0x100 + 30) / 31 * 31) & 0xFF, }; d_stream.zalloc = (alloc_func)0; d_stream.zfree = (free_func)0; d_stream.opaque = (voidpf)0; d_stream.next_in = zdata; d_stream.avail_in = 0; d_stream.next_out = data; if(inflateInit2(&d_stream, 47) != Z_OK) return -1; while (d_stream.total_out < *ndata && d_stream.total_in < nzdata) { d_stream.avail_in = d_stream.avail_out = 1; /* force small buffers */ if((err = inflate(&d_stream, Z_NO_FLUSH)) == Z_STREAM_END) break; if(err != Z_OK ) { if(err == Z_DATA_ERROR) { d_stream.next_in = (Bytef*) dummy_head; d_stream.avail_in = sizeof(dummy_head); if((err = inflate(&d_stream, Z_NO_FLUSH)) != Z_OK) { return -1; } } else return -1; } } if(inflateEnd(&d_stream) != Z_OK) return -1; *ndata = d_stream.total_out; return 0; }

使用示例：

//解压缩buffer中的数据 int ndesize = 1024000;//此处长度需要足够大以容纳解压缩后数据 char *szdebuffer = new char[ndesize]; memset( szdebuffer, 0, ndesize ); int err; //错误变量的定义 /* 执行httpgzdecompress后，会在ndesize中保存解压后的数据长度 */ err = httpgzdecompress( ( Byte* ) szbuffer.c_str(), ( uLong ) szbuffer.size(), ( Byte* ) szdebuffer, ( uLong* ) &ndesize );

if ( err == Z_OK ) { /* 成功解压 */ }

参考文章：

安装zlib http://www.linuxidc.com/Linux/2012-06/61982p2.htm

gzip的压缩与解压缩 http://www.cppblog.com/woaidongmao/archive/2011/06/05/148089.html

三、长连接高并发高性能 1 背景介绍

项目中需要用到Curl频繁调用的情况，发现curl接口调用速度缓慢。为了实现curl高性能，高并发，需要研究如何实现高性能高并发。研究方向有三个。

（1）长连接。考虑采用长连接的方式去开发。首先研究下长连接和短连接的性能区别。curl内部是通过socket去连接通讯。socket每次连接最为耗时，如果能够复用连接，长时间连接，减少每次socket连接的时间，则可以大大减少时间，提高效率。

（2）多线程。单个线程下载速度毕竟有限，使用多线程去调用接口。实现高并发高性能，需要考虑资源分配和冲突的问题。

（3）异步调用。和socket异步调用的原理类似。同步调用会阻塞等待，造成CPU占用率高，电脑卡死等问题。异步调用则是数据接收完成后才会取通知调用成功，处理数据。

2 长短连接实测分析 2.1 长连接参数设置说明

Curl提供了三个参数来设置

/* 设置TCP连接为长连接 */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPALIVE, 1L);

/* 设置长连接的休眠时间*/

curl_easy_setopt(curl, CURLOPT_TCP_KEEPIDLE, 120L);

/* 设置心跳发送时间，心使得socket长时间保活，小于KEEPIDLE时间 */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPINTVL, 20L);

/* 设置连接的超时时间，大于心跳时间*/

curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30);

2.2 长短连接区别 2.2.1 短连接

短连接一般分为4步骤：初始化、设置参数、执行请求、清理资源。即使用curl_easy_setopt设置该curl为长连接，因为最后被curl_easy_cleanup(curl)，所以这个socket连接会被中断销毁，不会保持长连接。具体步骤如下：

（1）CURL* curl = curl_easy_init();//创建一个curl对象

（2）curl_easy_setopt(curl,……);//可以设置多个参数url，result

（3）res = curl_easy_perform(curl);//执行请求

（4）curl_easy_cleanup(curl);//清除curl

实例代码如下：

int CHttpClient::Get(const std::string & strUrl, std::string & strResponse)

{

int res;

CURL* curl = curl_easy_init();

if (NULL == curl)

{

return CURLE_FAILED_INIT;

}

if (m_bDebug)

{

curl_easy_setopt(curl, CURLOPT_VERBOSE, 1);

curl_easy_setopt(curl, CURLOPT_DEBUGFUNCTION, OnDebug);

}

curl_easy_setopt(curl, CURLOPT_URL, strUrl.c_str());

curl_easy_setopt(curl, CURLOPT_READFUNCTION, NULL);

curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, OnWriteData);

curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&strResponse);

/* enable TCP keep-alive for this transfer */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPALIVE, 1L);

/* keep-alive idle time to 120 seconds */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPIDLE, 120L);

/* interval time between keep-alive probes: 60 seconds */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPINTVL, 20L);

curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30);

/**

* 当多个线程都使用超时处理的时候，同时主线程中有sleep或是wait等操作。

* 如果不设置这个选项，libcurl将会发信号打断这个wait从而导致程序退出。

//curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);

curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 20);

res = curl_easy_perform(curl);

if (res != 0)

{

//FIRE_ERROR(" Get error %d", res);

}

//CurlMutiTreadMutex::GetInstance()->muti_curl_easy_cleanup(curl);

curl_easy_cleanup(curl);

return res;

}

2.2.2 长连接

长连接是我们创建了curl对象之后，不立刻使用curl_easy_cleanup清理掉，而是保存起来，下一个请求，只要重新设置url，执行请求，就可以复用以前的socket连接。

示例代码如下

头文件

CURL* GetCurl();

CURL* CreateCurl();

void PutCurl(CURL* curl);

QVector m_VectCurl;

QMutex m_mutex;

源文件

CURL* RestClientPool::GetCurl()

{

CURL* curl = NULL;

m_mutex.lock();

if (m_VectCurl.size()>0)

{

curl = m_VectCurl.front();

m_VectCurl.pop_front();

}

m_mutex.unlock();

if(curl==NULL)

{

curl = CreateCurl();

}

return curl;

}

CURL* RestClientPool::CreateCurl()

{

CURL* curl = curl_easy_init();

if (NULL == curl)

{

return NULL;

}

if (m_bDebug)

{

curl_easy_setopt(curl, CURLOPT_VERBOSE, 1);

curl_easy_setopt(curl, CURLOPT_DEBUGFUNCTION, OnDebug);

}

//curl_easy_setopt(curl, CURLOPT_URL, strUrl.c_str());

curl_easy_setopt(curl, CURLOPT_READFUNCTION, NULL);

curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, OnWriteData);

//curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&strResponse);

/* enable TCP keep-alive for this transfer */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPALIVE, 1L);

/* keep-alive idle time to 120 seconds */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPIDLE, 300L);

/* interval time between keep-alive probes: 60 seconds */

curl_easy_setopt(curl, CURLOPT_TCP_KEEPINTVL, 20L);

curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30);

/**

* 当多个线程都使用超时处理的时候，同时主线程中有sleep或是wait等操作。

* 如果不设置这个选项，libcurl将会发信号打断这个wait从而导致程序退出。

curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);

curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 20);

return curl;

}

void RestClientPool::PutCurl(CURL* curl)

{

m_mutex.lock();

m_VectCurl.push_back(curl);

m_mutex.unlock();

}

int RestClientPool::Get(const std::string & strUrl, std::string & strResponse)

{

int res;

//CURL* curl = CurlMutiTreadMutex::GetInstance()->muti_curl_easy_init();

CURL* curl = GetCurl();

if (NULL == curl)

{

return CURLE_FAILED_INIT;

}

curl_easy_setopt(curl, CURLOPT_URL, strUrl.c_str());

curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&strResponse);

res = curl_easy_perform(curl);

if (res != 0)

{

printf("req error %d",res);

}

PutCurl(curl);

return res;

}

2.3 长短连接测试分析

用上述的长连接和短连接进行测试，分四种情况进行测试分析。

（1） shot连接循环调用1000次url1；

（2） long连接循环调用1000次url1；

（3） long连接循环调用1000次url2；

（4） long连接循环调用1000次，每次循环中各调用一次url1和一次url2；

测试程序代码

#include

#include"RestClientPool.h"

#include "RestClient.h"

#include

using namespace std;

int main(int argc, char *argv[])

{

QCoreApplication a(argc, argv);

CHttpClient m_shotclient;

RestClientPool m_longClient;

QDateTime StartTime = QDateTime::currentDateTime();

string strUrl = "http://qt.gtimg.cn/q=sz002415";

string strUrl2= "http://hq.sinajs.cn/list=sz002415";

string strResponse = "";

for (int i=0;i

【本文地址】

公司简介

联系我们