
ProtocolError: Received header value surrounded by whitespace in requests_async

Stack Overflow user
Asked on 2019-12-11 23:07:01
Answers: 1, Views: 50, Followers: 0, Votes: 0

I am writing an asynchronous scraper for RSS feeds, and some sites occasionally fail with the error shown below. For example:

In [1]: import requests_async as requests

In [2]: headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}

In [3]: r = await requests.get('https://albumorientedpodcast.com/category/album-oriented/feed/', headers=headers)

Here is the full traceback for this error:

Traceback (most recent call last):
  File "rss_parser.py", line 55, in rss_downloader
    response = await requests.get(rss, headers=headers)
  File "C:\Python3\lib\site-packages\requests_async\api.py", line 11, in get
    return await request("get", url, params=params, **kwargs)
  File "C:\Python3\lib\site-packages\requests_async\api.py", line 6, in request
    return await session.request(method=method, url=url, **kwargs)
  File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 79, in request
    resp = await self.send(prep, **send_kwargs)
  File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 157, in send
    async for resp in self.resolve_redirects(r, request, **kwargs):
  File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 290, in resolve_redirects
    resp = await self.send(
  File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 136, in send
    r = await adapter.send(request, **kwargs)
  File "C:\Python3\lib\site-packages\requests_async\adapters.py", line 48, in send
    response = await self.pool.request(
  File "C:\Python3\lib\site-packages\http3\interfaces.py", line 49, in request
    return await self.send(request, verify=verify, cert=cert, timeout=timeout)
  File "C:\Python3\lib\site-packages\http3\dispatch\connection_pool.py", line 130, in send
    raise exc
  File "C:\Python3\lib\site-packages\http3\dispatch\connection_pool.py", line 120, in send
    response = await connection.send(
  File "C:\Python3\lib\site-packages\http3\dispatch\connection.py", line 56, in send
    response = await self.h2_connection.send(request, timeout=timeout)
  File "C:\Python3\lib\site-packages\http3\dispatch\http2.py", line 52, in send
    status_code, headers = await self.receive_response(stream_id, timeout)
  File "C:\Python3\lib\site-packages\http3\dispatch\http2.py", line 126, in receive_response
    event = await self.receive_event(stream_id, timeout)
  File "C:\Python3\lib\site-packages\http3\dispatch\http2.py", line 159, in receive_event
    events = self.h2_state.receive_data(data)
  File "C:\Python3\lib\site-packages\h2\connection.py", line 1463, in receive_data
    events.extend(self._receive_frame(frame))
  File "C:\Python3\lib\site-packages\h2\connection.py", line 1486, in _receive_frame
    frames, events = self._frame_dispatch_table[frame.__class__](frame)
  File "C:\Python3\lib\site-packages\h2\connection.py", line 1560, in _receive_headers_frame
    frames, stream_events = stream.receive_headers(
  File "C:\Python3\lib\site-packages\h2\stream.py", line 1055, in receive_headers
    events[0].headers = self._process_received_headers(
  File "C:\Python3\lib\site-packages\h2\stream.py", line 1298, in _process_received_headers
    return list(headers)
  File "C:\Python3\lib\site-packages\h2\utilities.py", line 335, in _reject_pseudo_header_fields
    for header in headers:
  File "C:\Python3\lib\site-packages\h2\utilities.py", line 291, in _reject_connection_header
    for header in headers:
  File "C:\Python3\lib\site-packages\h2\utilities.py", line 275, in _reject_te
    for header in headers:
  File "C:\Python3\lib\site-packages\h2\utilities.py", line 264, in _reject_surrounding_whitespace
    raise ProtocolError(
h2.exceptions.ProtocolError: Received header value surrounded by whitespace b'3.vie _dca '
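
The last frame of the traceback, `_reject_surrounding_whitespace`, rejects the response header value `b'3.vie _dca '`, which ends with a space. The following is only a minimal illustration of that kind of check, not h2's actual code: the helper name `surrounded_by_whitespace` and the choice of ASCII whitespace are assumptions made for this sketch.

```python
import string

# Assumption for this sketch: "whitespace" means the ASCII whitespace bytes.
_WHITESPACE = frozenset(string.whitespace.encode("ascii"))


def surrounded_by_whitespace(value: bytes) -> bool:
    """Return True if a header value starts or ends with a whitespace byte."""
    # Indexing a bytes object yields an int, which matches the set's members.
    return bool(value) and (value[0] in _WHITESPACE or value[-1] in _WHITESPACE)


offending = b"3.vie _dca "  # the value reported in the traceback above
print(surrounded_by_whitespace(offending))          # True (trailing space)
print(surrounded_by_whitespace(offending.strip()))  # False once stripped
```

This strict validation belongs to the HTTP/2 code path (`h2`) that appears in the traceback. A client that talks HTTP/1.1 to the same server does not run this particular check.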

At the same time, the same site loads normally with the regular requests library:

In [1]: import requests

In [2]: headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}

In [3]: r = requests.get('https://albumorientedpodcast.com/category/album-oriented/feed/', headers=headers)

In [4]: r
Out[4]: <Response [200]>

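I tried to find at least some information about this error, but found nothing. Can anyone tell me what I can do to avoid errors like this and load the site normally?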


1 Answer

Stack Overflow user

Accepted answer

Answered on 2019-12-12 01:58:46

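requests-async has been archived, but its GitHub page links to its successor, httpx.

httpx appears to have a similar syntax and is actively maintained.

Consider giving it a try: many bugs may already have been fixed there.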

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/59288707
