Dataframe "ValueError: Data is compressed as snappy but we don't have this installed"

Stack Overflow user
Asked 2018-05-15 00:21:48
Answers: 1 · Views: 700 · Followers: 0 · Votes: 2

python-snappy appears to be installed, but Dask returns a ValueError.

Helm config for jupyter and workers:

env:
  - name: EXTRA_CONDA_PACKAGES
    value: numba xarray s3fs python-snappy pyarrow ruamel.yaml -c conda-forge
  - name: EXTRA_PIP_PACKAGES
    value: dask-ml --upgrade

The containers show python-snappy as installed (via conda list).

The dataframe is loaded from a multi-part parquet file generated by Apache Drill:

files = ['s3://{}'.format(f) for f in fs.glob(path='{}/*.parquet'.format(filename))]
df = dd.read_parquet(files)

Running len(df) on the dataframe returns:

distributed.utils - ERROR - Data is compressed as snappy but we don't have this installed
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/utils.py", line 622, in log_errors
    yield
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 921, in _handle_report
    six.reraise(*clean_exception(**msg))
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 203, in read
    msg = yield from_frames(frames, deserialize=self.deserialize)
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    return
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 315, in wrapper
    future.set_result(_value_from_stopiteration(e))
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 75, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 61, in _from_frames
    return protocol.loads(frames, deserialize=deserialize)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/core.py", line 96, in loads
    msg = loads_msgpack(small_header, small_payload)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/core.py", line 171, in loads_msgpack
    " installed" % str(header['compression']))
ValueError: Data is compressed as snappy but we don't have this installed

Can anyone suggest the correct configuration or remediation steps?

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-05-15 01:30:45

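This error is not actually coming from reading your parquet files; it comes from how Dask compresses data sent between machines. You can probably resolve it by making the python-snappy installation consistent across all of your client/scheduler/worker pods: either installed on every one of them or on none.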

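You should do either of the following:

  1. Remove python-snappy from the conda package list for your jupyter and worker pods. If you are using pyarrow it is unnecessary; I believe Arrow includes snappy at the C++ level.
  2. Add python-snappy to your scheduler pod.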
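
To see which processes actually have python-snappy available before choosing one of the two options, a check along the following lines may help. This is only a minimal sketch: `has_module` and `snappy_report` are hypothetical helper names, not part of Dask, and `client` is assumed to be an already connected `distributed.Client`.

```python
def has_module(name="snappy"):
    """Return True if `name` can be imported in the current Python process."""
    # Imported inside the function so it still works when the function
    # is shipped to the scheduler or a worker.
    from importlib.util import find_spec

    return find_spec(name) is not None


def snappy_report(client):
    """Collect has_module() results from the client, scheduler, and workers.

    Assumption: `client` is a connected distributed.Client.
    """
    return {
        "client": has_module(),
        "scheduler": client.run_on_scheduler(has_module),
        "workers": client.run(has_module),  # dict: worker address -> bool
    }
```

Calling `snappy_report(client)` from the notebook should return one flag for the client, one for the scheduler, and one per worker. If the flags disagree, that points to the inconsistent installation described above.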

FWIW, I personally recommend lz4 over snappy for inter-machine compression.
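
One possible way to request lz4 explicitly is through Dask's configuration, sketched below. It assumes a Dask release that ships the `dask.config` module, which may be newer than the installation shown in the traceback, and it assumes the `lz4` package is installed in every pod. `prefer_lz4` is a hypothetical helper name. Each process reads its own configuration, so the same setting would presumably need to be applied on the client, the scheduler, and the workers.

```python
def prefer_lz4():
    """Ask Dask to use lz4 for inter-machine compression in this process."""
    import dask  # imported lazily; requires Dask to be installed

    # Assumption: `distributed.comm.compression` is the config key that
    # controls compression of data sent between Dask processes.
    dask.config.set({"distributed.comm.compression": "lz4"})
    return dask.config.get("distributed.comm.compression")
```

In such releases the same key can usually also be supplied as the environment variable `DASK_DISTRIBUTED__COMM__COMPRESSION`, which may fit more naturally into the `env:` list of a Helm config.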

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/50340721
