我正在尝试构建一个包含ids成对比较的Hive表。例如,假设我们从表A开始
id val
a 10
b 11
c 12我想要创建一个表B
id_A id_B val_A val_B val_diff
a b 10 11 -1
a c 10 12 -2
b c 11 12 -1我相信这需要一个自我加入,但我不知道该如何做。任何想法或引路都会受到赞赏。
我试过:
CREATE TABLE C STORED AS ORC AS
SELECT a.id AS id_A,
b.id AS id_B,
a.val AS val_A,
b.val AS val_B,
a.val - b.val AS val_diff
FROM A AS a
INNER JOIN B as b
ON a.id != b.id;但这没有用:
FAILED: SemanticException [Error 10017]: Line 4:3 Both left and right aliases encountered in JOIN 'id'更新:问题扩展。
假设我有A'表
id val category
a 10 X
b 11 X
c 12 X
d 13 Y
e 14 Y
f 15 Y如何只在每个category的元素之间进行成对比较?
即表B'
id_A id_B val_A val_B val_diff cat
a b 10 11 -1 X
a c 10 12 -2 X
b c 11 12 -1 X
d e . . . Y
d f . . . Y
e f . . . Y发布于 2018-03-13 21:20:01
您可以使用自连接:
select a.id as id_a, b.id as id_b, a.val as val_a, b.val as val_b,
(a.val - b.val) as val_diff
from t a join
t b
on a.id < b.id;在蜂巢中,您可能需要:
select a.id as id_a, b.id as id_b, a.val as val_a, b.val as val_b,
(a.val - b.val) as val_diff
from t a cross join
t b
where a.id < b.id;https://stackoverflow.com/questions/49265933
复制相似问题