在将表划分为组时,我遇到了一个有趣的问题。我有一群游客--每个人都会说一种语言和/或是一个家庭的一部分。我需要将表分成几个组,但我想让家人和类似的语言使用者聚在一起。
假设我想把游客分成3组(如果一个团体必须更大,那是可以接受的)。这个解决方案并不是那么聪明,它完全填满了所有的团队,但我正在做一个尽力而为的方法。
输入:
TouristID | LanguageID | FamilyID
---------------------------------
1 | 1 | 1
2 | 1 | 1
3 | 1 | 1
4 | 2 | 1
5 | 3 | 2
6 | 4 | 2
7 | 5 | 3
8 | 5 | 4
9 | 7 | 5预期结果:
TouristID | GroupID
-------------------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 2
6 | 2
7 | 3
8 | 3
9 | 2第一组是由所有说一种语言的人组成的,包括一个不能被排除在外的家庭成员。
第2组由两个家族成员(5,6)和一个随机成员(9)组成,组成3组。
第三组由两个相同的语言使用者(7,8)组成。
我做了什么,
INSERT TouristGroup
SELECT
t.TouristID,
DENSE_RANK() OVER (ORDER BY GroupID) AS [GroupID]
FROM Tourists t
CROSS APPLY (
SELECT MIN(TouristID) AS [GroupID]
FROM Tourists t2
WHERE
( t2.LanguageID = t.LanguageID
OR t2.FamilyID = t.FamilyID )
) x;
INSERT Groups
SELECT GroupID, COUNT(*)
FROM TouristGroup
GROUP BY GroupID;
declare
@matchID int = 0,
@currentCount int,
@desiredCount int = 0,
@candidateGroupID int = null,
@chunk int = 1
while exists (
select null
from Groups g
left join Matches m
on m.GroupID = g.GroupID
where m.GroupID is null
)
begin
set @currentCount = null
set @candidateGroupID = null
select
@currentCount = isnull(SUM([Count]), 0)
from Matches m
join Groups g
on g.GroupID = m.GroupID
where m.MatchID = @matchID
if @CurrentCount is not null
begin
set @desiredCount = @chunk - @desiredCount
select top 1
@candidateGroupID = g.GroupID
from Groups g
left join Matches m
on m.GroupID = g.GroupID
where g.[Count] <= @desiredCount
and m.GroupID is null
order by [Count] DESC
if @candidateGroupID is not null
begin
insert Matches
select @matchID, @candidateGroupID
end
else begin
set @matchID = @matchID + 1
end
end
else begin
set @matchid = @matchID + 1
end
end 问题
是否有更好的方法对表进行分区,但根据多列将行分组?
发布于 2013-03-21 21:21:47
这将产生你的“第一步”。也许它比你现在拥有的更好(没有循环)。
SELECT t.TouristID, DENSE_RANK() OVER (ORDER BY x.GroupNum) as GroupId
FROM Tourists t
CROSS APPLY (SELECT MIN(TouristId) AS GroupNum
FROM @Tourist t2
WHERE t2.LanguageId = t.LanguageId OR t2.FamilyId = t.FamilyId
) x至于至少有三个组成员的另一个要求,如果可能的话,您可能需要做一个与您正在做的类似的循环(我不确定它是否可以改进,因为您还没有共享它)。
以下是我对“第二步”的建议:
DECLARE @MinGroupSize int = 3, @rc int = 1
WHILE @rc>0
BEGIN
WITH GroupCount AS (
SELECT GroupID, COUNT(*) AS GroupCount
FROM TouristGroup
GROUP BY GroupID
), CandidateGroups AS (
SELECT TOP 1 gc1.GroupID AS ShortGroupId, singleton.GroupID as SingletonGroupID
FROM GroupCount gc1
CROSS APPLY (SELECT TOP 1 GroupID
FROM GroupCount AS gc2
WHERE gc2.GroupCount = 1 AND gc2.GroupID != gc1.GroupID
ORDER BY gc2.GroupID
) AS singleton
WHERE gc1.GroupCount < @MinGroupSize
ORDER BY GroupCount DESC, gc1.GroupID ASC
)
UPDATE tg
SET GroupID = cg.ShortGroupID
FROM TouristGroup tg
JOIN CandidateGroups cg ON cg.SingletonGroupID = tg.GroupID;
SET @rc = @@ROWCOUNT;
END
--
-- If you're anal like me and want to eliminate gaps in GroupID values
--
UPDATE tg
SET GroupID = tg2.GroupID
FROM TouristGroup tg
JOIN (SELECT TouristID, DENSE_RANK() OVER (ORDER BY GroupID) AS [GroupID]
FROM TouristGroup) AS tg2 ON tg2.TouristID = tg.TouristID
WHERE tg.GroupID != tg2.GroupID;这将查找小于所需最小组大小的组,并找到一个单例组(只有一个成员),并使用另一个GroupID更新单例,一个接一个地执行此操作,直到没有更多的候选组为止。按照顺序(由GroupCount降序,然后由GroupID升序)选择较小的组,以便首先填充较大的组。只选择单个人进行更新,这样自然组就不会被分解。
https://stackoverflow.com/questions/15537391
复制相似问题