I have a table that stores processes.
Each process is made up of items, and each item has values that are collected during processing.
Processes are executed by clients.
Here is a sample schema with dummy data: http://sqlfiddle.com/#!15/36af4
I need to extract some information from these tables:
I also need the process ID and client ID of the newest and the oldest process per item:
Note that, for a given item in a given life, the process ID of the oldest process may not match the process ID of the minimum value for that item in that life.
I need all of this information for every item in every life. An item can have processes on different clients, so I can't group by client: that could duplicate items. The same goes for processes: an item can appear in different processes, so I can't group by process either.
This is as far as I could get on my own:
SELECT
PV.ID_ITEM AS ID_ITEM,
PV.ITEM_LIFE AS LIFE,
COUNT(PV.ID_ITEM) AS TOTAL_ITEM_PROCESS,
MIN(P.DATE_TIME) AS OLDEST_PROCESS,
MAX(P.DATE_TIME) AS NEWEST_PROCESS,
MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MIN_ITEM_VALUE
FROM PROCESS P
JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE;
However, I don't know how to add the process and client IDs of the oldest and newest processes to this query without also putting those columns in the GROUP BY clause. For example, if I group by client ID, some items will be duplicated when they have processes on different clients.
We can't use MAX or MIN to get the process ID, because the ID returned may not match the corresponding timestamp: a higher ID does not always have the newest timestamp.
For the data provided in the fiddle, this should be the output:
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
| ITEM ID | ITEM LIFE | TOTAL PROCESSES PER ITEM PER LIFE | OLDEST PROCESS PER ITEM PER LIFE | NEWEST PROCESS PER ITEM PER LIFE | MAX ITEM VALUE PER ITEM PER LIFE | MIN ITEM VALUE PER ITEM PER LIFE | PROCESS ID OF THE OLDEST PROCESS | PROCESS ID OF THE NEWEST PROCESS | CLIENT ID OF THE OLDEST PROCESS | CLIENT ID OF THE NEWEST PROCESS |
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
| 230 | 1 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 6.5 | 1.5 | 1 | 2 | 100 | 100 |
| 230 | 2 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 5.5 | 2.5 | 1 | 2 | 100 | 100 |
| 231 | 1 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 7.5 | 1.5 | 1 | 2 | 100 | 100 |
| 231 | 2 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 10.8 | 4.5 | 1 | 2 | 100 | 100 |
| 232 | 1 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 5.6 | 0.5 | 1 | 2 | 100 | 100 |
| 232 | 2 | 1 | '2018-01-01 10:00:00' | '2018-01-02 10:00:00' | 2.5 | 25.5 | 1 | 2 | 100 | 100 |
| 530 | 1 | 2 | '2018-01-05 13:00:00' | '2018-01-06 13:00:00' | 11.5 | 1.5 | 4 | 3 | 400 | 300 |
| 531 | 1 | 2 | '2018-01-05 13:00:00' | '2018-01-06 13:00:00' | 9.5 | 1.5 | 4 | 3 | 400 | 300 |
| 532 | 1 | 2 | '2018-01-05 13:00:00' | '2018-01-06 13:00:00' | 13.5 | 4.5 | 4 | 3 | 400 | 300 |
+---------+-----------+-----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+
How can I add more columns to the query without having to put them in the GROUP BY?
We use PostgreSQL.
Posted on 2018-08-13 14:39:03
It looks like a common top-n-per-group problem with a twist.
Here is one way to do it, using the ROW_NUMBER approach. Another approach is a lateral join; which is faster depends on the data distribution.
You can read a very detailed answer on dba.se: Retrieve n rows per group. (That question is about SQL Server, but Postgres has all the same features, so all of the answers there apply to Postgres as well.)
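For reference, the lateral-join alternative mentioned above could look roughly like this. This is my own untested sketch against the same PROCESS/PROCESS_VALUES schema from the fiddle; it fetches only the oldest/newest process and client IDs, and the summary columns would still come from a GROUP BY:

```sql
-- Sketch of the lateral-join alternative (assumes the schema from the fiddle).
-- For each (item, life) pair, two LATERAL subqueries pick the oldest and the
-- newest process by ordering on DATE_TIME and taking the first row.
SELECT g.ID_ITEM,
       g.ITEM_LIFE,
       o.ID_PROCESS AS OLDEST_ID_PROCESS,
       o.ID_CLIENT  AS OLDEST_ID_CLIENT,
       n.ID_PROCESS AS NEWEST_ID_PROCESS,
       n.ID_CLIENT  AS NEWEST_ID_CLIENT
FROM (SELECT DISTINCT ID_ITEM, ITEM_LIFE FROM PROCESS_VALUES) g
CROSS JOIN LATERAL (
    SELECT P._ID AS ID_PROCESS, P.ID_CLIENT
    FROM PROCESS P
    JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
    WHERE PV.ID_ITEM = g.ID_ITEM AND PV.ITEM_LIFE = g.ITEM_LIFE
    ORDER BY P.DATE_TIME ASC
    LIMIT 1
) o
CROSS JOIN LATERAL (
    SELECT P._ID AS ID_PROCESS, P.ID_CLIENT
    FROM PROCESS P
    JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
    WHERE PV.ID_ITEM = g.ID_ITEM AND PV.ITEM_LIFE = g.ITEM_LIFE
    ORDER BY P.DATE_TIME DESC
    LIMIT 1
) n;
```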
SQL Fiddle
I adjusted your sample data and made all the ID_CLIENT values distinct, so that they can be seen in the results.
CREATE TABLE PROCESS (
_ID BIGINT,
DATE_TIME TIMESTAMP WITH TIME ZONE,
ID_CLIENT BIGINT
);
CREATE TABLE PROCESS_VALUES (
ID_PROCESS BIGINT,
ID_ITEM BIGINT,
ITEM_LIFE INTEGER,
ITEM_VALUE_1 REAL,
ITEM_VALUE_2 REAL,
ITEM_VALUE_3 REAL
);
INSERT INTO PROCESS VALUES(1, '2018-01-01 10:00:00', 100);
INSERT INTO PROCESS VALUES(2, '2018-01-02 10:00:00', 200);
INSERT INTO PROCESS VALUES(3, '2018-01-06 13:00:00', 300);
INSERT INTO PROCESS VALUES(4, '2018-01-05 13:00:00', 400);
INSERT INTO PROCESS_VALUES VALUES(1, 230, 1, 5.5, 6.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(1, 231, 1, 1.5, 7.5, 3.5);
INSERT INTO PROCESS_VALUES VALUES(1, 232, 1, 5.6, 3.5, 0.5);
INSERT INTO PROCESS_VALUES VALUES(2, 230, 2, 5.5, 2.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(2, 231, 2, 10.8, 6.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(2, 232, 2, 25.5, 6.5, 2.5);
INSERT INTO PROCESS_VALUES VALUES(3, 530, 1, 1.5, 6.5, 8.5);
INSERT INTO PROCESS_VALUES VALUES(3, 531, 1, 3.5, 6.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(3, 532, 1, 6.5, 7.0, 4.5);
INSERT INTO PROCESS_VALUES VALUES(4, 530, 1, 1.5, 11.5, 4.5);
INSERT INTO PROCESS_VALUES VALUES(4, 531, 1, 9.5, 8.5, 1.5);
INSERT INTO PROCESS_VALUES VALUES(4, 532, 1, 5.5, 13.5, 4.5);
Query
I wrote the query using CTEs to keep it readable. You can inline them all into one big query if you prefer; you should test which runs faster.
CTE_RN is the two original tables joined together, along with two sets of row numbers: one for the oldest process and one for the newest (that is what ORDER BY DATE_TIME ASC/DESC is for), per item, per life (that is what PARTITION BY is for).
CTE_OLDEST keeps only the oldest processes.
CTE_NEWEST keeps only the newest processes.
CTE_GROUPS is the query from the question that calculates all the summaries.
The final SELECT joins the summaries with the details about the oldest and newest processes.
WITH
CTE_RN
AS
(
SELECT
PROCESS.DATE_TIME
,PROCESS._ID AS ID_PROCESS
,PROCESS.ID_CLIENT
,PROCESS_VALUES.ID_ITEM
,PROCESS_VALUES.ITEM_LIFE
,PROCESS_VALUES.ITEM_VALUE_1
,PROCESS_VALUES.ITEM_VALUE_2
,PROCESS_VALUES.ITEM_VALUE_3
,ROW_NUMBER() OVER (PARTITION BY PROCESS_VALUES.ID_ITEM, PROCESS_VALUES.ITEM_LIFE
ORDER BY PROCESS.DATE_TIME ASC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY PROCESS_VALUES.ID_ITEM, PROCESS_VALUES.ITEM_LIFE
ORDER BY PROCESS.DATE_TIME DESC) AS rn2
FROM
PROCESS
INNER JOIN PROCESS_VALUES ON PROCESS._ID = PROCESS_VALUES.ID_PROCESS
)
,CTE_OLDEST
AS
(
SELECT
ID_ITEM
,ITEM_LIFE
,ID_PROCESS
,ID_CLIENT
,DATE_TIME
FROM CTE_RN
WHERE rn1 = 1
)
,CTE_NEWEST
AS
(
SELECT
ID_ITEM
,ITEM_LIFE
,ID_PROCESS
,ID_CLIENT
,DATE_TIME
FROM CTE_RN
WHERE rn2 = 1
)
,CTE_GROUPS
AS
(
SELECT
ID_ITEM
,ITEM_LIFE
,COUNT(ID_ITEM) AS TOTAL_ITEM_PROCESS
,MIN(DATE_TIME) AS OLDEST_PROCESS
,MAX(DATE_TIME) AS NEWEST_PROCESS
,MAX(GREATEST(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MAX_ITEM_VALUE
,MIN(LEAST(ITEM_VALUE_1, ITEM_VALUE_2, ITEM_VALUE_3)) AS MIN_ITEM_VALUE
FROM CTE_RN
GROUP BY
ID_ITEM, ITEM_LIFE
)
SELECT
CTE_GROUPS.ID_ITEM
,CTE_GROUPS.ITEM_LIFE
,CTE_GROUPS.TOTAL_ITEM_PROCESS
,CTE_GROUPS.MAX_ITEM_VALUE
,CTE_GROUPS.MIN_ITEM_VALUE
,CTE_OLDEST.DATE_TIME AS OLDEST_DATE_TIME
,CTE_OLDEST.ID_PROCESS AS OLDEST_ID_PROCESS
,CTE_OLDEST.ID_CLIENT AS OLDEST_ID_CLIENT
,CTE_NEWEST.DATE_TIME AS NEWEST_DATE_TIME
,CTE_NEWEST.ID_PROCESS AS NEWEST_ID_PROCESS
,CTE_NEWEST.ID_CLIENT AS NEWEST_ID_CLIENT
FROM
CTE_GROUPS
INNER JOIN CTE_OLDEST
ON CTE_OLDEST.ID_ITEM = CTE_GROUPS.ID_ITEM
AND CTE_OLDEST.ITEM_LIFE = CTE_GROUPS.ITEM_LIFE
INNER JOIN CTE_NEWEST
ON CTE_NEWEST.ID_ITEM = CTE_GROUPS.ID_ITEM
AND CTE_NEWEST.ITEM_LIFE = CTE_GROUPS.ITEM_LIFE
ORDER BY
ID_ITEM, ITEM_LIFE;
Results
| id_item | item_life | total_item_process | max_item_value | min_item_value | oldest_date_time | oldest_id_process | oldest_id_client | newest_date_time | newest_id_process | newest_id_client |
|---------|-----------|--------------------|----------------|----------------|----------------------|-------------------|------------------|----------------------|-------------------|------------------|
| 230 | 1 | 1 | 6.5 | 1.5 | 2018-01-01T10:00:00Z | 1 | 100 | 2018-01-01T10:00:00Z | 1 | 100 |
| 230 | 2 | 1 | 5.5 | 2.5 | 2018-01-02T10:00:00Z | 2 | 200 | 2018-01-02T10:00:00Z | 2 | 200 |
| 231 | 1 | 1 | 7.5 | 1.5 | 2018-01-01T10:00:00Z | 1 | 100 | 2018-01-01T10:00:00Z | 1 | 100 |
| 231 | 2 | 1 | 10.8 | 4.5 | 2018-01-02T10:00:00Z | 2 | 200 | 2018-01-02T10:00:00Z | 2 | 200 |
| 232 | 1 | 1 | 5.6 | 0.5 | 2018-01-01T10:00:00Z | 1 | 100 | 2018-01-01T10:00:00Z | 1 | 100 |
| 232 | 2 | 1 | 25.5 | 2.5 | 2018-01-02T10:00:00Z | 2 | 200 | 2018-01-02T10:00:00Z | 2 | 200 |
| 530 | 1 | 2 | 11.5 | 1.5 | 2018-01-05T13:00:00Z | 4 | 400 | 2018-01-06T13:00:00Z | 3 | 300 |
| 531 | 1 | 2 | 9.5 | 1.5 | 2018-01-05T13:00:00Z | 4 | 400 | 2018-01-06T13:00:00Z | 3 | 300 |
| 532 | 1 | 2 | 13.5 | 4.5 | 2018-01-05T13:00:00Z | 4 | 400 | 2018-01-06T13:00:00Z | 3 | 300 |
The results don't exactly match the output you expected, but I believe there are typos in the question. You said you want the oldest and the newest process per item per life, and items 230, 231 and 232 have only one process per life, so for them the oldest and the newest process are the same.
You can see this in my query results.
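As a side note, Postgres also offers DISTINCT ON for picking one row per group, which can replace the ROW_NUMBER CTEs here. A minimal sketch, assuming the same schema, that fetches the oldest process per item per life (swap DATE_TIME ASC for DESC to get the newest):

```sql
-- DISTINCT ON keeps exactly one row per (ID_ITEM, ITEM_LIFE); which row is
-- kept is decided by the rest of the ORDER BY, here the earliest DATE_TIME.
SELECT DISTINCT ON (PV.ID_ITEM, PV.ITEM_LIFE)
       PV.ID_ITEM,
       PV.ITEM_LIFE,
       P._ID       AS OLDEST_ID_PROCESS,
       P.ID_CLIENT AS OLDEST_ID_CLIENT,
       P.DATE_TIME AS OLDEST_DATE_TIME
FROM PROCESS P
JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
ORDER BY PV.ID_ITEM, PV.ITEM_LIFE, P.DATE_TIME ASC;
```

Two passes of DISTINCT ON (oldest and newest) would then be joined with the GROUP BY summaries, just like the CTEs above.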
Posted on 2018-08-11 00:17:09
I think you are confusing the concept of GROUP BY, but try something like this:
SELECT
PV.ID_ITEM AS ID_ITEM,
pv.id_process as PROCESS_ID,
p.id_client as CLIENT_ID,
PV.ITEM_LIFE AS LIFE,
COUNT(PV.ID_ITEM) AS TOTAL_ITEM_PROCESS,
MIN(P.DATE_TIME) AS OLDEST_PROCESS,
MAX(P.DATE_TIME) AS NEWEST_PROCESS,
MAX(GREATEST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MAX_ITEM_VALUE,
MIN(LEAST(PV.ITEM_VALUE_1, PV.ITEM_VALUE_2, PV.ITEM_VALUE_3)) AS MIN_ITEM_VALUE
FROM PROCESS P
JOIN PROCESS_VALUES PV ON P._ID = PV.ID_PROCESS
GROUP BY PV.ID_ITEM, PV.ITEM_LIFE, pv.id_process, p.id_client;
GROUP BY is useful if, for example, you want to show the number of processes run by each client.
Posted on 2018-08-13 23:31:21
My answer uses window functions to compute the values without GROUP BY. I think the output you gave has errors, but you can adjust this query to get the values you actually need.
For example, if you need the lowest process_id over all records, remove the PARTITION BY clause for that column.
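One thing to keep in mind with last_value: when a window specification has an ORDER BY but no explicit frame, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value returns the current row's value instead of the last value of the whole partition. A minimal illustration against the PROCESS table:

```sql
-- With the default frame, LAST_VALUE sees only rows up to the current row.
SELECT _ID,
       DATE_TIME,
       LAST_VALUE(_ID) OVER (ORDER BY DATE_TIME) AS lv_default,  -- just the current row's _ID
       LAST_VALUE(_ID) OVER (ORDER BY DATE_TIME
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND UNBOUNDED FOLLOWING) AS lv_full  -- the true newest _ID
FROM PROCESS;
```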
select *
from (with prc as (
select a.*,
GREATEST(a.ITEM_VALUE_1, a.ITEM_VALUE_2, a.ITEM_VALUE_3) as great_pv,
LEAST(a.ITEM_VALUE_1, a.ITEM_VALUE_2, a.ITEM_VALUE_3) as least_pv
from PROCESS_VALUES a
)
SELECT
PV.ID_ITEM AS ID_ITEM,
PV.ITEM_LIFE AS LIFE,
count(*) over(partition by PV.ID_ITEM, PV.ITEM_LIFE ) as "TOTAL PROCESSES PER ITEM PER LIFE",
row_number() over(partition by PV.ID_ITEM, PV.ITEM_LIFE ) as rn,
first_value(P.DATE_TIME) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME) as "OLDEST PROCESS PER ITEM PER LIFE",
-- last_value needs an explicit full frame; the default frame ends at the current row
last_value(P.DATE_TIME) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME rows between unbounded preceding and unbounded following) as "NEWEST PROCESS PER ITEM PER LIFE",
last_value(pv.great_pv) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by pv.great_pv rows between unbounded preceding and unbounded following) as "MAX_ITEM_VALUE",
first_value(pv.least_pv) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by pv.least_pv) as "MIN_ITEM_VALUE",
first_value(pv.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME) "PROCESS ID OF THE OLDEST PROCESS",
last_value(pv.ID_PROCESS) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME rows between unbounded preceding and unbounded following) "PROCESS ID OF THE NEWEST PROCESS",
first_value(p.ID_CLIENT) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME) "CLIENT ID OF THE OLDEST PROCESS",
last_value(p.ID_CLIENT) over(partition by PV.ID_ITEM, PV.ITEM_LIFE order by P.DATE_TIME rows between unbounded preceding and unbounded following) "CLIENT ID OF THE NEWEST PROCESS"
FROM PROCESS P
JOIN prc PV ON P._ID = PV.ID_PROCESS
) inn
where rn = 1
order by ID_ITEM, LIFE;
https://stackoverflow.com/questions/51794085