I have a table named Attributes that holds a bag of attribute names and values for items identified by ItemId.

    ╔════════╦═══════╦══════════╗
    ║ ItemId ║ Name  ║ Value    ║
    ╠════════╬═══════╬══════════╣
    ║ 1      ║ color ║ green    ║
    ║ 1      ║ mood  ║ happy    ║
    ║ 1      ║ age   ║ 5        ║
    ║ 1      ║ type  ║ A        ║
    ║ 2      ║ color ║ blue     ║
    ║ 2      ║ mood  ║ sad      ║
    ║ 2      ║ age   ║ 5        ║
    ║ 2      ║ type  ║ B        ║
    ║ 3      ║ color ║ red      ║
    ║ 3      ║ mood  ║ angry    ║
    ║ 3      ║ age   ║ 5        ║
    ║ 3      ║ type  ║ B        ║
    ║ 4      ║ color ║ yellow   ║
    ║ 4      ║ mood  ║ whatever ║
    ║ 4      ║ age   ║ 7        ║
    ║ 5      ║ color ║ green    ║
    ║ 5      ║ mood  ║ happy    ║
    ║ 5      ║ age   ║ 2        ║
    ║ 5      ║ type  ║ D        ║
    ╚════════╩═══════╩══════════╝

Here is a SQLFiddle with the above structure and data: http://sqlfiddle.com/#!17/08c4b/1
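I would like to get a list of groups of attribute names whose values are correlated with each other. The ItemId + Name combination is unique (the same item cannot have more than one value for the same attribute).

In the example above, one such group is color + mood, because the following always holds:

- when the color is `green`, the mood is `happy`
- when the color is `red`, the mood is `angry`
- when the color is `blue`, the mood is `sad`
- when the color is `yellow`, the mood is `whatever`

For example, an additional item with color `red` and mood `happy` would invalidate this correlation.

Also, in this data set, `age` + `type` and `color` + `type` are not correlated, because: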
- item 1 has age `5` and type `A`
- item 2 also has age `5`, but has type `B`
- item 1 is `green` and has type `A`
- item 5 is also `green`, but has type `D`
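
To pin down what "correlated" means in the examples above, here is a minimal Python sketch of the check, run on the sample rows. It reads the rule in one direction only (each value of the first attribute is seen with exactly one value of the second) and skips items that lack either attribute; both are assumptions of this sketch. The names `ROWS`, `determines` and `correlated_pairs` are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from the table above: (ItemId, Name, Value)
ROWS = [
    (1, "color", "green"), (1, "mood", "happy"), (1, "age", "5"), (1, "type", "A"),
    (2, "color", "blue"), (2, "mood", "sad"), (2, "age", "5"), (2, "type", "B"),
    (3, "color", "red"), (3, "mood", "angry"), (3, "age", "5"), (3, "type", "B"),
    (4, "color", "yellow"), (4, "mood", "whatever"), (4, "age", "7"),
    (5, "color", "green"), (5, "mood", "happy"), (5, "age", "2"), (5, "type", "D"),
]


def determines(rows, a, b):
    """Return True if every value of attribute `a` is seen with only one value of `b`.

    Assumption of this sketch: items missing `a` or `b` are ignored.
    """
    # Regroup the flat rows into {item_id: {name: value}}
    items = defaultdict(dict)
    for item_id, name, value in rows:
        items[item_id][name] = value

    # Collect, for each value of `a`, the set of `b` values seen next to it
    seen = defaultdict(set)
    for attrs in items.values():
        if a in attrs and b in attrs:
            seen[attrs[a]].add(attrs[b])
    return all(len(values) == 1 for values in seen.values())


def correlated_pairs(rows):
    """List the attribute pairs (in alphabetical order) that pass the check."""
    names = sorted({name for _, name, _ in rows})
    return [(a, b) for a, b in combinations(names, 2) if determines(rows, a, b)]


print(correlated_pairs(ROWS))
```

On the sample data only color + mood passes; appending an item with color `red` and mood `happy` makes that pair fail as well.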
Is it possible to write a SQL statement that discovers these correlations between attributes automatically?
Posted on 2017-11-08 09:53:42

It is definitely possible. One way to do it, possibly not the simplest, is this:
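
    with pairs as (
        -- attribute pairs (name < name2) that belong to the same item
        select l.*, r.name as name2, r.value as value2
        from Attribute l join Attribute r on l.ItemId = r.ItemId and l.name < r.name
    ),
    counts as (
        -- distinct value2 per (name, value, name2); PostgreSQL names the column "count"
        select name, name2, count(distinct value2)
        from pairs l join pairs r using (name, value, name2, value2)
        where l.itemid <= r.itemid
        group by name, value, name2
    )
    -- keep the pairs where every value of name maps to a single value2
    select name, name2 from counts group by name, name2 having max(count) = 1;

This version assumes that a missing attribute correlates with everything, which may or may not be what you want.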
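
To try the idea without a PostgreSQL instance, here is a rough sketch that runs a simplified variant of the query through Python's built-in `sqlite3` module. It is not the exact query above: it drops the self-join of `pairs` (which, as far as I can tell, does not change the result when no value is NULL) and gives the count an explicit alias `n_values`. The table layout (`itemid`, `name`, `value`) and the helper name `correlated_pairs` are assumptions made for this example.

```python
import sqlite3

# Sample rows from the question: (itemid, name, value)
ROWS = [
    (1, "color", "green"), (1, "mood", "happy"), (1, "age", "5"), (1, "type", "A"),
    (2, "color", "blue"), (2, "mood", "sad"), (2, "age", "5"), (2, "type", "B"),
    (3, "color", "red"), (3, "mood", "angry"), (3, "age", "5"), (3, "type", "B"),
    (4, "color", "yellow"), (4, "mood", "whatever"), (4, "age", "7"),
    (5, "color", "green"), (5, "mood", "happy"), (5, "age", "2"), (5, "type", "D"),
]

QUERY = """
WITH pairs AS (
    -- every (name, name2) combination held by the same item, name < name2
    SELECT l.itemid, l.name, l.value, r.name AS name2, r.value AS value2
    FROM Attribute l
    JOIN Attribute r ON l.itemid = r.itemid AND l.name < r.name
),
counts AS (
    -- how many distinct value2 accompany each (name, value, name2)
    SELECT name, name2, COUNT(DISTINCT value2) AS n_values
    FROM pairs
    GROUP BY name, value, name2
)
SELECT name, name2
FROM counts
GROUP BY name, name2
HAVING MAX(n_values) = 1
ORDER BY name, name2
"""


def correlated_pairs(rows):
    """Load rows into an in-memory SQLite table and run the simplified query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Attribute (itemid INTEGER, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO Attribute VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(correlated_pairs(ROWS))
```

On the sample data this returns only the color + mood pair. Adding a single hypothetical row such as `(1, "size", "L")` makes `size` pair up with every other attribute, which illustrates the caveat about missing attributes.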
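Posted on 2017-11-08 10:12:07

While waiting for an answer, I came up with a solution of my own and decided to post it here, even though Michał's answer looks better (more concise and probably more efficient):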

    with associations as (
        -- pairs of items (Id1 < Id2) with their values for the same attribute name
        select
            a1."ItemId" as Id1,
            a2."ItemId" as Id2,
            a1."Name" as Name,
            a1."Value" as Value1,
            a2."Value" as Value2
        from Attribute a1
        join Attribute a2
            on a1."ItemId" < a2."ItemId"
            and a1."Name" = a2."Name"
    ),
    names as (
        select distinct "Name"
        from Attribute
    )
    select *
    from names n1
    join names n2
        on n1."Name" < n2."Name"
        and not exists (
            -- try to find a miscorrelation: two items that agree on one
            -- of the attributes but differ on the other
            select *
            from associations s1
            join associations s2
                on s1.Id1 = s2.Id1
                and s1.Id2 = s2.Id2
                and s1.name in (n1."Name", n2."Name")
                and s2.name in (n1."Name", n2."Name")
                and s1.value1 = s1.value2
                and s2.value1 != s2.value2
        )
    ;

SQLFiddle link: http://sqlfiddle.com/#!17/08c4b/32
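
For anyone who wants to experiment with this approach locally, below is a hedged sketch that runs an adaptation of the query in SQLite through Python's `sqlite3` module. It is an adaptation rather than the original: the `not exists` test is moved into a `WHERE` clause, identifiers are unquoted, and an `ORDER BY` is added. The helper name `symmetric_pairs` is made up for the example.

```python
import sqlite3

# Sample rows from the question: (ItemId, Name, Value)
ROWS = [
    (1, "color", "green"), (1, "mood", "happy"), (1, "age", "5"), (1, "type", "A"),
    (2, "color", "blue"), (2, "mood", "sad"), (2, "age", "5"), (2, "type", "B"),
    (3, "color", "red"), (3, "mood", "angry"), (3, "age", "5"), (3, "type", "B"),
    (4, "color", "yellow"), (4, "mood", "whatever"), (4, "age", "7"),
    (5, "color", "green"), (5, "mood", "happy"), (5, "age", "2"), (5, "type", "D"),
]

QUERY = """
WITH associations AS (
    -- pairs of items (id1 < id2) with their values for the same attribute
    SELECT a1.ItemId AS id1, a2.ItemId AS id2, a1.Name AS name,
           a1.Value AS value1, a2.Value AS value2
    FROM Attribute a1
    JOIN Attribute a2 ON a1.ItemId < a2.ItemId AND a1.Name = a2.Name
),
names AS (
    SELECT DISTINCT Name FROM Attribute
)
SELECT n1.Name, n2.Name
FROM names n1
JOIN names n2 ON n1.Name < n2.Name
WHERE NOT EXISTS (
    -- a miscorrelation: two items agree on one attribute, differ on the other
    SELECT 1
    FROM associations s1
    JOIN associations s2 ON s1.id1 = s2.id1 AND s1.id2 = s2.id2
    WHERE s1.name IN (n1.Name, n2.Name)
      AND s2.name IN (n1.Name, n2.Name)
      AND s1.value1 = s1.value2
      AND s2.value1 != s2.value2
)
ORDER BY n1.Name, n2.Name
"""


def symmetric_pairs(rows):
    """Load rows into an in-memory SQLite table and run the adapted query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Attribute (ItemId INTEGER, Name TEXT, Value TEXT)")
    conn.executemany("INSERT INTO Attribute VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(symmetric_pairs(ROWS))
```

As far as I can tell, this check is symmetric: two items that share a mood but differ in color count as a miscorrelation just as much as the reverse, so a pair is reported only when the values match up one-to-one.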
https://stackoverflow.com/questions/47175486