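I am trying to use Athena to run some analysis on one of our S3 buckets, but I am getting errors that I can neither explain nor find a solution for.
The guide I am following is https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory-athena-query.html.
I created my S3 Inventory yesterday and have now received the first report in S3. The format is Apache ORC, the last export is shown as yesterday, and the additional fields stored are Size, Last modified, Storage class, and Encryption.
I can see the data stored under s3://{my-inventory-bucket}/{my-bucket}/{my-inventory}, so I know there is data there.
SSE-S3 encryption is enabled both as the default encryption on the inventory bucket and in the inventory configuration.
To create the table, I use the following query: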
CREATE EXTERNAL TABLE my_table (
`bucket` string,
key string,
version_id string,
is_latest boolean,
is_delete_marker boolean,
size bigint
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{my-inventory-bucket}/{my-bucket}/{my-inventory}/hive/';
Once the table is created, I load the data with:
MSCK REPAIR TABLE my_table;
The output indicates that the partition was loaded:
Partitions not in metastore: my_table=2021-07-17-00-00
Repair: Added partition to metastore my_table=2021-07-17-00-00
After loading, I check that the data is available with:
SELECT DISTINCT dt FROM my_table ORDER BY 1 DESC limit 10;
which outputs:
1 2021-07-17-00-00
Now, if I run the following query, everything works and I get the expected results:
SELECT key FROM my_table ORDER BY 1 DESC limit 10;
But as soon as I include the size column, I get an error:
SELECT key, size FROM my_table ORDER BY 1 DESC limit 10;
Your query has the following error(s):
HIVE_CURSOR_ERROR: Failed to read ORC file: s3://{my-inventory-bucket}/{my-bucket}/{my-inventory}/data/{UUID}.orc
This query ran against the "my_table" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: {UUID}.
It looks like something is wrong with my size column. Can anyone help me figure this out?
Posted on 2021-07-18 19:57:14
So frustrating. I think I found the answer here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html
IsLatest – Set to True if the object is the current version of the object. (This field is not included if the list is only for the current version of objects.)
Removing that column from the table definition fixed the problem.
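For reference, here is a rough sketch of what a table definition without the version-related columns might look like. It is not the exact statement from the answer. It assumes the inventory lists current versions only, and that version_id and is_delete_marker are absent from the ORC files for the same reason as is_latest (an assumption worth checking against your own report). Only bucket, key, and size are declared, since those are the columns the failing query needs. Athena runs one statement per query, so each statement would be submitted separately.

```sql
-- Sketch only: assumes a current-version-only inventory, so the
-- version-related columns (version_id, is_latest, is_delete_marker)
-- are left out of the table definition.
DROP TABLE IF EXISTS my_table;

CREATE EXTERNAL TABLE my_table (
  `bucket` string,
  key string,
  size bigint
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{my-inventory-bucket}/{my-bucket}/{my-inventory}/hive/';

-- Register the dt partitions again after recreating the table.
MSCK REPAIR TABLE my_table;

-- The query that previously failed with HIVE_CURSOR_ERROR.
SELECT key, size FROM my_table ORDER BY 1 DESC limit 10;
```

If in doubt about which columns a report really contains, the fileSchema entry in the inventory's manifest.json should list the fields that were written, and the table columns can be matched against it.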
https://stackoverflow.com/questions/68428789