
Using Thread Pools and Process Pools with Asyncio

The concurrent.futures module can be combined with asyncio. It provides both a thread pool and a process pool, and is used to manage concurrent work, deal with non-deterministic execution order, and synchronize results.
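As a minimal sketch (blocking_io and its sleep are placeholders of my own, not from the article), the key integration point is loop.run_in_executor, which submits a blocking callable to a concurrent.futures executor and hands back an awaitable future:

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(n):
  # stand-in for any blocking call (file I/O, requests.get, ...)
  time.sleep(n)
  return n

async def run_blocking():
  loop = asyncio.get_event_loop()
  with ThreadPoolExecutor(max_workers=2) as pool:
    # the blocking call runs in a worker thread; the coroutine just awaits its result
    result = await loop.run_in_executor(pool, blocking_io, 1)
  print(result)

if __name__ == '__main__':
  asyncio.get_event_loop().run_until_complete(run_blocking())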

Using requests asynchronously

Target site: http://www.budejie.com

The code is as follows:

import asyncio, requests
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
# ThreadPoolExecutor: thread pool
# ProcessPoolExecutor: process pool
from bs4 import BeautifulSoup
from requests.exceptions import RequestException

headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
}

def crawl(i):
  url = f'http://www.budejie.com/{i}'
  try:
    html = requests.get(url, headers=headers)
    if html.status_code == 200:
      soup = BeautifulSoup(html.text, 'lxml')
      lis = soup.select(".j-r-list ul li div .u-txt a")
      for li in lis:
        print(li.get_text())
    return "ok"
  except RequestException:
    return None

async def main():
  loop = asyncio.get_event_loop()  # get the event loop
  tasks = []
  with ThreadPoolExecutor(max_workers=10) as t:
    # 10 worker threads, one task per page
    for i in range(1, 10):
      tasks.append(loop.run_in_executor(t, crawl, i))
  #     i.e. tasks.append(loop.run_in_executor(<your executor>, <crawler function>, <crawler function arguments>))

  # The lines below are optional; use them when the crawler function returns a value:
  # done, _ = await asyncio.wait(tasks)
  # for future in done:
  #   print(future.result())

if __name__ == '__main__':
  start_time = time.time()
  loop = asyncio.get_event_loop()
  loop.run_until_complete(main())
  loop.close()
  print(time.time() - start_time)
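If you do want the return values from crawl (the "ok" / None it returns), one option, shown here as a sketch rather than as the article's exact code, is to gather the executor futures inside main:

async def main():
  loop = asyncio.get_event_loop()
  with ThreadPoolExecutor(max_workers=10) as t:
    tasks = [loop.run_in_executor(t, crawl, i) for i in range(1, 10)]
    # gather waits for every future and returns the results in submission order
    results = await asyncio.gather(*tasks)
  for result in results:
    print(result)  # "ok" on success, None on a request error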

Now write a program to compare the timings. It is best not to run all three approaches in the same run; comment out the other two before timing each one:

import requests
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from bs4 import BeautifulSoup
from requests.exceptions import RequestException

headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
}

def crawl(i):
  url = f'http://www.budejie.com/{i}'
  try:
    html = requests.get(url, headers=headers)
    if html.status_code == 200:
      soup = BeautifulSoup(html.text, 'lxml')
      lis = soup.select(".j-r-list ul li div .u-txt a")
      for li in lis:
        pass
      #   print(li.get_text())  # printing is skipped so it does not distort the timing
    return "ok"
  except RequestException:
    return None

if __name__ == '__main__':
  start_time_1 = time.time()
  for i in range(1, 10):
    crawl(i)
  print("Single-thread time: >>>", time.time() - start_time_1)

  start_time_2 = time.time()
  with ThreadPoolExecutor(max_workers=10) as t:
    for i in range(1, 10):
      t.submit(crawl, i)
  print("Thread pool time: >>>", time.time() - start_time_2)

  start_time_3 = time.time()
  with ProcessPoolExecutor(max_workers=10) as t:
    for i in range(1, 10):
      t.submit(crawl, i)
  print("Process pool time: >>>", time.time() - start_time_3)

Looking at the output, the process pool takes noticeably longer than the thread pool. Why is that?

Multithreading is a good fit for I/O-bound work, but a poor fit for CPU-bound work;

Creating and destroying processes carries a large overhead: spawning a process costs far more than spawning a thread, and tearing it down also takes considerable time (there is a lot of setup work involved). For this I/O-bound crawler, that overhead outweighs any benefit.
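For contrast, here is a sketch with a made-up CPU-bound function (a plain busy loop, not part of the article) where the process pool tends to win, because the GIL prevents Python threads from computing in parallel:

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_task(n):
  # pure computation, no I/O, so the GIL keeps threads from overlapping this work
  total = 0
  for i in range(n):
    total += i * i
  return total

if __name__ == '__main__':
  start = time.time()
  with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(cpu_task, [5_000_000] * 4))
  print("Thread pool (CPU-bound):", time.time() - start)

  start = time.time()
  with ProcessPoolExecutor(max_workers=4) as pool:
    list(pool.map(cpu_task, [5_000_000] * 4))
  print("Process pool (CPU-bound):", time.time() - start)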

Two helper functions for running a batch of coroutine tasks on a fresh event loop:

import asyncio

def rsynctask(tasks):
  """Run coroutine tasks with asyncio.gather (results come back in submission order)."""
  new_loop = asyncio.new_event_loop()
  asyncio.set_event_loop(new_loop)
  loop = asyncio.get_event_loop()
  loop.run_until_complete(asyncio.gather(*tasks))
  loop.run_until_complete(loop.shutdown_asyncgens())
  loop.close()

def rsynctaskwait(tasks):
  """Run coroutine tasks concurrently with asyncio.wait."""
  new_loop = asyncio.new_event_loop()
  asyncio.set_event_loop(new_loop)
  loop = asyncio.get_event_loop()
  loop.run_until_complete(asyncio.wait(tasks))
  loop.run_until_complete(loop.shutdown_asyncgens())
  loop.close()
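A quick usage sketch for the helpers above; the fetch coroutine is a placeholder of my own, not from the article:

import asyncio

async def fetch(i):
  # placeholder coroutine standing in for real async work
  await asyncio.sleep(0.1)
  print("task", i, "finished")

coros = [fetch(i) for i in range(3)]
rsynctask(coros)  # runs all three coroutines on a fresh event loop, then closes it

Note that on Python 3.11 and later, asyncio.wait no longer accepts bare coroutine objects, so rsynctaskwait would need the coroutines wrapped into tasks first; asyncio.gather, as used in rsynctask, still accepts them directly.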

 
