Python aiohttp Usage Examples

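aiohttp is an asynchronous HTTP library built on asyncio, and it provides both a server and a client. With the server side we can build a web server that handles requests asynchronously, that is, a server that processes incoming requests and returns responses.
The client side is used to send requests, much like sending an HTTP request with requests and reading the response. The difference is that requests performs synchronous network requests, whereas aiohttp performs asynchronous ones.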
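The intro says aiohttp covers both the server and the client. As a rough sketch that is not part of the original article, the following starts a tiny aiohttp.web server on a free local port and then queries it with the aiohttp client in the same event loop. The route /hello/{name}, the handler name handle_hello, the function run_demo, and the use of aiohttp.test_utils.unused_port to pick a port are illustrative choices.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import unused_port


async def handle_hello(request: web.Request) -> web.Response:
    # Server side: a request handler that returns a plain-text response
    name = request.match_info.get('name', 'world')
    return web.Response(text=f'hello, {name}')


async def run_demo() -> str:
    # Build the server application and register one GET route
    app = web.Application()
    app.router.add_get('/hello/{name}', handle_hello)

    runner = web.AppRunner(app)
    await runner.setup()
    port = unused_port()  # pick a free local port for the demo
    site = web.TCPSite(runner, '127.0.0.1', port)
    await site.start()
    try:
        # Client side: send an asynchronous request to our own server
        async with aiohttp.ClientSession() as session:
            async with session.get(f'http://127.0.0.1:{port}/hello/aiohttp') as response:
                return await response.text()
    finally:
        await runner.cleanup()  # stop the server and release the port


if __name__ == '__main__':
    print(asyncio.run(run_demo()))  # hello, aiohttp
```

Running the server and the client in one script is only for demonstration; in practice they are usually separate processes.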

Basic Example

import aiohttp
import asyncio

async def fetch(session, url):  # every asynchronous function is declared with async def
    async with session.get(url) as response:  # async with: session.get(url) is an asynchronous context manager whose result is bound to response
        return await response.text(), response.status

async def main():
    async with aiohttp.ClientSession() as session:
        html, status = await fetch(session, 'https://cuiqingcai.com')
        print(f'html: {html[:100]}...')
        print(f'status: {status}')

if __name__ == '__main__':
    asyncio.run(main())  # creates an event loop, runs main() to completion, then closes the loop
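The benefit of the asynchronous client shows up when several requests are in flight at once. The sketch below reuses the shape of fetch() from the basic example and runs five requests concurrently with asyncio.gather. To keep it self-contained it assumes a local demo server whose handler (the hypothetical slow_handler on the hypothetical path /page/{n}) sleeps 0.5 s, so the total time should be close to 0.5 s rather than the 2.5 s five sequential requests would take.

```python
import asyncio
import time

import aiohttp
from aiohttp import web
from aiohttp.test_utils import unused_port


async def slow_handler(request: web.Request) -> web.Response:
    await asyncio.sleep(0.5)  # simulate a slow page
    return web.Response(text=request.match_info['n'])


async def fetch(session: aiohttp.ClientSession, url: str) -> tuple[str, int]:
    # Same shape as fetch() in the basic example
    async with session.get(url) as response:
        return await response.text(), response.status


async def fetch_all(count: int) -> tuple[list[tuple[str, int]], float]:
    # Local demo server standing in for a real website
    app = web.Application()
    app.router.add_get('/page/{n}', slow_handler)
    runner = web.AppRunner(app)
    await runner.setup()
    port = unused_port()
    await web.TCPSite(runner, '127.0.0.1', port).start()
    try:
        urls = [f'http://127.0.0.1:{port}/page/{i}' for i in range(count)]
        start = time.perf_counter()
        async with aiohttp.ClientSession() as session:
            # gather() runs all fetches concurrently and returns results in input order
            results = await asyncio.gather(*(fetch(session, url) for url in urls))
        return results, time.perf_counter() - start
    finally:
        await runner.cleanup()


if __name__ == '__main__':
    results, elapsed = asyncio.run(fetch_all(5))
    print(results, f'{elapsed:.2f}s')
```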

Setting URL Parameters

To set URL query parameters, pass a dictionary through the params argument:

import aiohttp
import asyncio
 
async def main():
    params = {'name':'germey','age':25}
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.httpbin.org/get',params=params) as response:
            print(await response.text())
 
if __name__ == '__main__':
    asyncio.run(main())
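aiohttp handles URLs with the yarl library, which is installed as one of its dependencies. Assuming that, a quick way to preview the query string a params dictionary produces, without sending any request, is to build the URL with yarl directly. This approximates what aiohttp does with params; it is a preview, not aiohttp's internal code path.

```python
from yarl import URL

params = {'name': 'germey', 'age': 25}
# with_query() encodes the dict as a query string; the int 25 is converted to the string '25'
url = URL('https://www.httpbin.org/get').with_query(params)
print(url)               # https://www.httpbin.org/get?name=germey&age=25
print(url.query['age'])  # 25 (parsed query values are always strings)
```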

POST Requests

A form submission sends the request with the Content-Type header set to application/x-www-form-urlencoded. Pass the dictionary through the data argument:

import aiohttp
import asyncio
 
async def main():
    data = {'name':'germey','age':25}
    async with aiohttp.ClientSession() as session:
        async with session.post('https://www.httpbin.org/post',data=data) as response:
            print(await response.text())
 
if __name__ == '__main__':
    asyncio.run(main())
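To check what the server actually receives, here is a sketch that swaps httpbin.org for a local echo server. The handler echo_form, the function post_form, and the /post route are hypothetical. On the server side, request.post() parses the urlencoded body and request.content_type reports the MIME type without parameters such as charset.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import unused_port


async def echo_form(request: web.Request) -> web.Response:
    # request.post() parses an application/x-www-form-urlencoded (or multipart) body
    form = await request.post()
    return web.json_response({
        'content_type': request.content_type,  # MIME type without parameters
        'form': dict(form),
    })


async def post_form() -> dict:
    # Local echo server standing in for https://www.httpbin.org/post
    app = web.Application()
    app.router.add_post('/post', echo_form)
    runner = web.AppRunner(app)
    await runner.setup()
    port = unused_port()
    await web.TCPSite(runner, '127.0.0.1', port).start()
    try:
        data = {'name': 'germey', 'age': '25'}  # form fields are transmitted as text
        async with aiohttp.ClientSession() as session:
            async with session.post(f'http://127.0.0.1:{port}/post', data=data) as response:
                return await response.json()
    finally:
        await runner.cleanup()


if __name__ == '__main__':
    print(asyncio.run(post_form()))
```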

To submit JSON data, the request's Content-Type is application/json. Simply replace the data argument of post() with json:

import aiohttp
import asyncio

async def main():
    data = {'name': 'germey', 'age': 25}
    async with aiohttp.ClientSession() as session:
        async with session.post('https://www.httpbin.org/post', json=data) as response:
            print(await response.text())

asyncio.run(main())
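A similar sketch with a hypothetical local echo server (echo_json, post_json) shows the json= variant from the server's point of view. Because the body is JSON, request.json() on the server decodes it with its types intact, and response.json() on the client decodes the echoed reply, so age arrives as the integer 25 rather than as text.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import unused_port


async def echo_json(request: web.Request) -> web.Response:
    payload = await request.json()  # decode the JSON request body
    return web.json_response({'content_type': request.content_type, 'received': payload})


async def post_json() -> dict:
    # Local echo server standing in for https://www.httpbin.org/post
    app = web.Application()
    app.router.add_post('/post', echo_json)
    runner = web.AppRunner(app)
    await runner.setup()
    port = unused_port()
    await web.TCPSite(runner, '127.0.0.1', port).start()
    try:
        data = {'name': 'germey', 'age': 25}
        async with aiohttp.ClientSession() as session:
            # json= serializes data and sets Content-Type: application/json
            async with session.post(f'http://127.0.0.1:{port}/post', json=data) as response:
                return await response.json()  # parse the reply as JSON instead of text
    finally:
        await runner.cleanup()


if __name__ == '__main__':
    print(asyncio.run(post_json()))
```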

Limiting Concurrency

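aiohttp can sustain very high concurrency, on the order of tens of thousands, hundreds of thousands, or even millions of requests. At that scale, however, the target site may be unable to respond in time, and a sudden burst of requests can even bring it down, so the crawler's concurrency needs to be kept under control.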

In general, asyncio's Semaphore can be used to limit concurrency.
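The semaphore mechanism can be checked in isolation, with no network at all. In this sketch (worker, main, and the state dictionary are hypothetical names), asyncio.sleep stands in for a request, and each worker records how many workers are inside the async with semaphore block at once. With 100 workers and a limit of 5, the observed peak is exactly 5: the first five acquire immediately, and every later worker waits until one of them leaves the block.

```python
import asyncio

CONCURRENCY = 5


async def worker(semaphore: asyncio.Semaphore, state: dict) -> None:
    async with semaphore:  # at most CONCURRENCY workers get past this point at once
        state['running'] += 1
        state['peak'] = max(state['peak'], state['running'])
        await asyncio.sleep(0.01)  # stand-in for a network request
        state['running'] -= 1


async def main(total: int = 100) -> int:
    semaphore = asyncio.Semaphore(CONCURRENCY)
    state = {'running': 0, 'peak': 0}
    await asyncio.gather(*(worker(semaphore, state) for _ in range(total)))
    return state['peak']


if __name__ == '__main__':
    print(asyncio.run(main()))  # 5
```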

import asyncio
import aiohttp

CONCURRENCY = 5  # maximum concurrency: 5
URL = 'http://www.baidu.com'  # target to scrape: Baidu

semaphore = None  # created in main() so that it belongs to the running event loop
session = None

async def scrape_api():
    # Using the semaphore as an async context manager caps how many coroutines
    # can be inside this block at the same time, i.e. the CONCURRENCY value
    async with semaphore:
        print('scraping', URL)
        async with session.get(URL) as response:
            await asyncio.sleep(1)
            return await response.text()

async def main():
    global semaphore, session
    semaphore = asyncio.Semaphore(CONCURRENCY)  # semaphore object that caps concurrency
    async with aiohttp.ClientSession() as session:  # the session is closed automatically when done
        # Declare 10,000 tasks and pass them to asyncio.gather(); the semaphore keeps at most 5 running at once
        scrape_index_tasks = [asyncio.ensure_future(scrape_api()) for _ in range(10000)]
        await asyncio.gather(*scrape_index_tasks)

if __name__ == '__main__':
    asyncio.run(main())
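Besides a semaphore, aiohttp's connection pool can also cap concurrency: aiohttp.TCPConnector accepts a limit argument (the total number of simultaneous connections, 100 by default) and a limit_per_host argument. A semaphore limits how many coroutines run the guarded block, while limit caps open connections, so extra requests wait for a free connection. The sketch below assumes a local demo server whose hypothetical handler counts how many requests it is serving at once; with limit=5 that peak should never exceed 5.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import unused_port


async def crawl(total: int = 20, limit: int = 5) -> tuple[list[str], int]:
    state = {'running': 0, 'peak': 0}  # requests being served now / maximum observed

    async def slow_handler(request: web.Request) -> web.Response:
        state['running'] += 1
        state['peak'] = max(state['peak'], state['running'])
        await asyncio.sleep(0.1)  # simulate a slow page
        state['running'] -= 1
        return web.Response(text='ok')

    # Local demo server standing in for the real target site
    app = web.Application()
    app.router.add_get('/', slow_handler)
    runner = web.AppRunner(app)
    await runner.setup()
    port = unused_port()
    await web.TCPSite(runner, '127.0.0.1', port).start()
    url = f'http://127.0.0.1:{port}/'

    try:
        # limit caps the total number of simultaneous connections in the pool
        connector = aiohttp.TCPConnector(limit=limit)
        async with aiohttp.ClientSession(connector=connector) as session:

            async def fetch() -> str:
                async with session.get(url) as response:
                    return await response.text()

            texts = await asyncio.gather(*(fetch() for _ in range(total)))
        return texts, state['peak']
    finally:
        await runner.cleanup()


if __name__ == '__main__':
    texts, peak = asyncio.run(crawl())
    print(len(texts), 'responses, peak concurrent requests on the server:', peak)
```

The session owns the connector here and closes it when the async with block exits.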
