# Asynchronous HTTP requests (GET/POST) in Python with aiohttp
`requests` should be familiar to everyone: it is the improved successor to `urllib` and `urllib2`, and when writing crawlers we constantly use this third-party library to request pages and read their content. The drawback of the `requests` methods is that they block: they cannot be placed after `await`, so `requests` cannot be used to issue requests inside a coroutine. This is why `aiohttp` exists: it performs asynchronous web requests and can be thought of as an asynchronous version of `requests`.

## Basic usage

The official documentation recommends creating a `ClientSession()` and calling the request methods on it. Inside a coroutine, we use the session's `get()` (or `request()`) method to fetch a page.

With `requests`:

```python
import requests

def hello(url):
    resp = requests.get(url)
    return resp.text  # note: .text is a property, not a method

url = "http://www.baidu.com"
print(hello(url))
```

The same with `aiohttp`:

```python
import asyncio
from aiohttp import ClientSession

async def hello(url):
    async with ClientSession() as session:
        async with session.get(url) as resp:
            print(resp.status)
            text = await resp.text()
            print(text)

url = "http://www.baidu.com"
loop = asyncio.get_event_loop()
loop.run_until_complete(hello(url))
```

The `async` and `await` keywords turn the function into a coroutine.

Passing query parameters as a dict:

```python
from aiohttp import ClientSession

parameters = {'1': '123', '2': '456'}

async def request():
    # obtain a session
    async with ClientSession() as session:
        async with session.get('http://www.baidu.com', params=parameters) as res:
            print(res.url)
```

Passing parameters as a list of tuples:

```python
from aiohttp import ClientSession

parameters = [('1', '123'), ('2', '456')]

async def request():
    async with ClientSession() as session:
        async with session.get('http://www.baidu.com', params=parameters) as res:
            print(res.url)
```

Passing parameters as a string:

```python
from aiohttp import ClientSession

async def http_request():
    # obtain a session
    async with ClientSession() as session:
        async with session.get('http://www.baidu.com', params='key1=value1') as res:
            print(res.url)
```

## Reading the response body

- `resp.text()` automatically decodes the content returned by the server.
- `resp.read()` returns the raw byte stream.
- `resp.json()` returns the data parsed as JSON.

All three are coroutines and must be awaited.

## Custom headers and cookies

Headers are added to mimic a browser's request headers; cookies are used to preserve user state.

Adding request headers:

```python
async with session.post(url, data=json.dumps(payload), headers=headers) as resp:
    text = await resp.text()
```

Setting common headers for every connection of a session:

```python
async with aiohttp.ClientSession(
        headers={"Authorization": "Basic bG9naW46cGFzcw=="}) as session:
    async with session.get("http://httpbin.org/headers") as r:
        json_body = await r.json()
        # the credentials below are base64-encoded, hence the trailing ==
        print(json_body['headers']['Authorization'] ==
              'Basic bG9naW46cGFzcw==')
```

Sending cookies to the server by passing a `cookies` argument to `ClientSession` (the endpoint must echo the cookies back as JSON, as httpbin does):

```python
url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}

async with ClientSession(cookies=cookies) as session:
    async with session.get(url) as resp:
        print(await resp.json() == {"cookies": {"cookies_are": "working"}})
```

## Limiting the number of simultaneous connections (connection pooling)

Capping the number of in-flight requests improves session reuse and prevents unbounded growth:

```python
# at most 30 simultaneous connections; the default is 100, and limit=0 means unlimited
conn = aiohttp.TCPConnector(limit=30)

# limit the number of connections open to the same endpoint, where an
# endpoint is the triple (host, port, is_ssl); the default is 0 (no limit)
conn = aiohttp.TCPConnector(limit_per_host=30)
```

## Requesting multiple URLs in sequence

The plain (synchronous) `requests` version:

```python
import requests

urls = [url1, url2, url3]
for url in urls:
    print(requests.get(url).text)
```

Asynchronous version: we wrap each task in an asyncio `Future` object, then hand the list of futures to the event loop.

```python
import asyncio
from aiohttp import ClientSession

async def fetch(url):
    # obtain a session
    async with ClientSession() as session:
        async with session.get(url) as resp:
            print(resp.status)
            text = await resp.text()
            print(text)

tasks = []
url = "http://localhost:8080/{}"
for i in range(5):
    # wrap the coroutine in an asyncio Future object
    task = asyncio.ensure_future(fetch(url.format(i)))
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
```

## Points to note

Remember to use `await` when waiting for an asynchronous response. `response.read()` is an asynchronous operation, which means it does not return the result immediately; it merely returns a generator. These generators need to be called and driven, and that does not happen by default: the `yield from` added in Python 3.4 and the `await` added in Python 3.5 exist precisely for this purpose, to iterate those generators.

## Extreme concurrency

The 10K-connections problem: exceptions raised because too many connections are open at once. One workaround is to raise the connection limits, but that is not wise. A better approach is to add some synchronization, e.g. use `asyncio.Semaphore()` to cap the number of in-flight tasks at 1000 (do not exceed 1024).

Code that runs into the "too many connections" problem:

```python
import requests

r = 10000
url = "http://localhost:8080/{}"
for i in range(r):
    res = requests.get(url.format(i))
    delay = res.headers.get("DELAY")
    d = res.headers.get("DATE")
    print("{}:{} delay {}".format(d, res.url, delay))
```

Capping the number of simultaneous connections:

```python
import asyncio
from aiohttp import ClientSession

async def fetch(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            delay = response.headers.get("DELAY")
            date = response.headers.get("DATE")
            print("{}:{} with delay {}".format(date, response.url, delay))
            return await response.read()

async def bound_fetch(sem, url):
    # getter function guarded by the semaphore
    async with sem:
        await fetch(url)

async def run(loop, r):
    url = "http://localhost:8080/{}"
    tasks = []
    # create the Semaphore instance
    sem = asyncio.Semaphore(1000)
    for i in range(r):
        # pass the Semaphore to every GET request
        task = asyncio.ensure_future(bound_fetch(sem, url.format(i)))
        tasks.append(task)
    responses = asyncio.gather(*tasks)
    await responses

number = 10000
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(loop, number))
loop.run_until_complete(future)
```
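The three `params` styles shown earlier (dict, list of tuples, raw string) all describe the same query string. A stdlib-only sketch (the base URL and key names are simply the ones from the examples above) illustrates the equivalence without touching the network:

```python
from urllib.parse import urlencode

base = "http://www.baidu.com/s"

as_dict = {"1": "123", "2": "456"}
as_list = [("1", "123"), ("2", "456")]
as_str = "1=123&2=456"

# urlencode accepts both a mapping and a sequence of pairs
url_from_dict = base + "?" + urlencode(as_dict)
url_from_list = base + "?" + urlencode(as_list)
url_from_str = base + "?" + as_str

print(url_from_dict)  # http://www.baidu.com/s?1=123&2=456
assert url_from_dict == url_from_list == url_from_str
```

aiohttp builds its URLs through `yarl` rather than `urllib`, but the resulting query component is the same for all three forms.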
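To make the "remember to await" point concrete, here is a minimal network-free sketch (the coroutine name `read_body` is made up): calling a coroutine function only creates a coroutine object; nothing runs until it is awaited.

```python
import asyncio

async def read_body():
    await asyncio.sleep(0)         # stand-in for an async I/O operation
    return b"payload"

async def main():
    pending = read_body()          # a coroutine object, not the bytes
    print(type(pending).__name__)  # coroutine
    body = await pending           # awaiting actually drives it to completion
    return body

body = asyncio.run(main())
print(body)                        # b'payload'
```

Forgetting the `await` in real aiohttp code leaves you holding a coroutine object instead of the response body, usually accompanied by a "coroutine was never awaited" warning.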
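The semaphore pattern in the last listing is independent of aiohttp. A self-contained sketch with a fake fetch (all names here are illustrative, and the limit is shrunk from 1000 to 3 so the effect is visible) confirms that no more than the permitted number of tasks are ever in flight at once:

```python
import asyncio

LIMIT = 3          # cap on simultaneous "requests" (1000 in the example above)

in_flight = 0      # tasks currently inside the guarded section
peak = 0           # highest concurrency observed

async def fake_fetch(i):
    await asyncio.sleep(0.01)      # stand-in for a network round trip
    return i

async def bound_fetch(sem, i):
    global in_flight, peak
    async with sem:                # at most LIMIT tasks get past this point
        in_flight += 1
        peak = max(peak, in_flight)
        result = await fake_fetch(i)
        in_flight -= 1
        return result

async def run(r):
    sem = asyncio.Semaphore(LIMIT)
    tasks = [asyncio.ensure_future(bound_fetch(sem, i)) for i in range(r)]
    return await asyncio.gather(*tasks)

results = asyncio.run(run(10))
print("peak concurrency:", peak)   # never exceeds LIMIT
```

`asyncio.gather` returns the results in submission order, so `results` comes back as `[0, 1, ..., 9]` even though the tasks complete in overlapping batches.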