文章/答案/技术大牛

发布

社区首页 >问答首页 >分区表，但基于多列进行分组

问分区表，但基于多列进行分组
EN

Stack Overflow用户

提问于 2013-03-21 00:32:43

回答 1查看 1.6K关注 0票数 0

在将表划分为组时，我遇到了一个有趣的问题。我有一群游客--每个人都会说一种语言和/或是一个家庭的一部分。我需要将表分成几个组，但我想让家人和类似的语言使用者聚在一起。

假设我想把游客分成3组(如果一个团体必须更大，那是可以接受的)。这个解决方案并不是那么聪明，它完全填满了所有的团队，但我正在做一个尽力而为的方法。

输入：

TouristID | LanguageID | FamilyID
---------------------------------
    1     |     1      |    1
    2     |     1      |    1
    3     |     1      |    1
    4     |     2      |    1
    5     |     3      |    2
    6     |     4      |    2
    7     |     5      |    3
    8     |     5      |    4
    9     |     7      |    5

预期结果：

TouristID | GroupID
-------------------
    1     |    1
    2     |    1
    3     |    1
    4     |    1
    5     |    2
    6     |    2
    7     |    3
    8     |    3
    9     |    2

第一组是由所有说一种语言的人组成的，包括一个不能被排除在外的家庭成员。

第2组由两个家族成员(5，6)和一个随机成员(9)组成，组成3组。

第三组由两个相同的语言使用者(7，8)组成。

我做了什么，

INSERT TouristGroup
SELECT
  t.TouristID,
  DENSE_RANK() OVER (ORDER BY GroupID) AS [GroupID]
FROM Tourists t
CROSS APPLY (
  SELECT MIN(TouristID) AS [GroupID]
  FROM Tourists t2
  WHERE
    ( t2.LanguageID = t.LanguageID
    OR t2.FamilyID = t.FamilyID )
) x;

INSERT Groups
SELECT GroupID, COUNT(*)
FROM TouristGroup
GROUP BY GroupID;

declare 
  @matchID int = 0,
  @currentCount int,
  @desiredCount int = 0,
  @candidateGroupID int = null,
  @chunk int = 1

while exists (
  select null
  from Groups g
  left join Matches m
    on m.GroupID = g.GroupID
  where m.GroupID is null
)
begin
  set @currentCount = null
  set @candidateGroupID = null

  select
    @currentCount = isnull(SUM([Count]), 0)
  from Matches m
  join Groups g
    on g.GroupID = m.GroupID
  where m.MatchID = @matchID

  if @CurrentCount is not null
  begin
    set @desiredCount = @chunk - @desiredCount

    select top 1
      @candidateGroupID = g.GroupID
    from Groups g
    left join Matches m
      on m.GroupID = g.GroupID
    where g.[Count] <= @desiredCount
      and m.GroupID is null
    order by [Count] DESC

    if @candidateGroupID is not null
    begin
      insert Matches
      select @matchID, @candidateGroupID
    end
    else begin
      set @matchID = @matchID + 1
    end
  end
  else begin
    set @matchid = @matchID + 1
  end
end

问题

是否有更好的方法对表进行分区，但根据多列将行分组？

sqlperformance

sql

sql-server

sql-server-2008

tsql

回答 1

Stack Overflow用户

回答已采纳

发布于 2013-03-21 21:21:47

这将产生你的“第一步”。也许它比你现在拥有的更好(没有循环)。

SELECT t.TouristID, DENSE_RANK() OVER (ORDER BY x.GroupNum) as GroupId
FROM Tourists t
CROSS APPLY (SELECT MIN(TouristId) AS GroupNum 
             FROM @Tourist t2 
             WHERE t2.LanguageId = t.LanguageId OR t2.FamilyId = t.FamilyId
            ) x

至于至少有三个组成员的另一个要求，如果可能的话，您可能需要做一个与您正在做的类似的循环(我不确定它是否可以改进，因为您还没有共享它)。

以下是我对“第二步”的建议：

DECLARE @MinGroupSize int = 3, @rc int = 1
WHILE @rc>0
BEGIN
    WITH GroupCount AS (
    SELECT GroupID, COUNT(*) AS GroupCount
    FROM TouristGroup
    GROUP BY GroupID
    ), CandidateGroups AS (
    SELECT TOP 1 gc1.GroupID AS ShortGroupId, singleton.GroupID as SingletonGroupID
    FROM GroupCount gc1
    CROSS APPLY (SELECT TOP 1 GroupID
                 FROM GroupCount AS gc2
                 WHERE gc2.GroupCount = 1 AND gc2.GroupID != gc1.GroupID
                 ORDER BY gc2.GroupID
                 ) AS singleton
    WHERE gc1.GroupCount < @MinGroupSize
    ORDER BY GroupCount DESC, gc1.GroupID ASC
    )
    UPDATE tg
    SET GroupID = cg.ShortGroupID
    FROM TouristGroup tg
    JOIN CandidateGroups cg ON cg.SingletonGroupID = tg.GroupID;
    SET @rc = @@ROWCOUNT;
END
--
-- If you're anal like me and want to eliminate gaps in GroupID values
--
UPDATE tg
SET GroupID = tg2.GroupID
FROM TouristGroup tg
JOIN (SELECT TouristID,  DENSE_RANK() OVER (ORDER BY GroupID) AS [GroupID]
      FROM TouristGroup) AS tg2 ON tg2.TouristID = tg.TouristID
WHERE tg.GroupID != tg2.GroupID;

这将查找小于所需最小组大小的组，并找到一个单例组(只有一个成员)，并使用另一个GroupID更新单例，一个接一个地执行此操作，直到没有更多的候选组为止。按照顺序(由GroupCount降序，然后由GroupID升序)选择较小的组，以便首先填充较大的组。只选择单个人进行更新，这样自然组就不会被分解。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/15537391

复制

相似问题

问分区表，但基于多列进行分组
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问分区表，但基于多列进行分组EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问分区表，但基于多列进行分组
EN