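I want to create a column "Cured" whose value is 1 when the customer leaves have1 on the same day as the contact recorded in have2, or within the two days after it. Dataset have1 defines when a customer enters and leaves a particular state. have2 shows when the customer was contacted.
Here is the code I wrote: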
proc SQL;
create table want2 as
select b.*
, case when b.Outcome = "Answered" or
b.Outcome = "Answerphone Message Left" or
b.Outcome = "Answerphone Message Not Left" or
b.Outcome = "No Answer"
and
a.Start_date <= b.Date <= a.End_date
and
a.End_date <= b.Date+2
then 1 else 0 END as Cured
from have1 a, have2 b
where a.ID=b.ID;
quit;
The datasets I am using are:
data have1;
infile datalines dlmstr=' | ' firstobs=2;
input ID Start_date :ddmmyy10. End_date :ddmmyy10.;
format Start_date End_date date9.;
datalines;
ID | Start_date | End_date
1 | 01/01/2021 | 03/01/2021
1 | 20/01/2021 | 21/01/2021
2 | 05/01/2021 | 07/01/2021
3 | 10/01/2021 | 30/01/2021
3 | 25/01/2021 | 25/01/2021
;;;
run;
data have2;
infile datalines dlmstr=' | ' firstobs=2;
input ID Date :ddmmyy10. Outcome :$40.;
format Date date9.;
datalines;
ID | Date | Outcome
1 | 01/01/2021 | Answered
2 | 05/01/2021 | Asnwerphone Message Left
3 | 12/01/2021 | Answerphone Message Left
3 | 25/01/2021 | No Answer
;;;
run;
In this case, I should get the following:
ID | Date | Outcome | Cured
1 | 01/01/2021 | Answered | 1
2 | 05/01/2021 | Asnwerphone Message Left | 1
3 | 12/01/2021 | Answerphone Message Left | 0
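3 | 25/01/2021 | No Answer | 1
ID 1 is cured because they left have1 2 days after 01/01.
ID 2 is also cured because they left have1 2 days after 05/01.
ID 3 is not cured for the 12/01 contact because they left have1 more than 15 days after 12/01.
ID 3 is cured for the 25/01 contact because they left have1 on the same day.
This is the result of the code: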
ID| Date | Outcome | Cured
1 01JAN2021 Answered 1
1 01JAN2021 Answered 0
2 05JAN2021 Asnwerph 0
3 12JAN2021 Answerph 0
3 25JAN2021 No Answe 0
3 12JAN2021 Answerph 0
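3 25JAN2021 No Answe 1
This is not right: first, it duplicates observations, and it also fails to flag the rows that should have Cured = 1 (ID 2).
Do you know what might be wrong?
Posted on 2021-02-23 00:48:02
I suspect you want: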
select b.*,
(case when b.Outcome in ('Answered', 'Answerphone Message Left',
'Answerphone Message Not Left', 'No Answer') and
a.Start_date <= b.Date and
b.Date <= a.End_date and
a.End_date <= b.Date+2
then 1 else 0
end) as Cured
I also wonder whether you really want a Cartesian product. Normally you want a join with specific join conditions.
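As a rough sketch of what an explicit join could look like, the fragment above can be placed in a complete PROC SQL step. The output table name want_join is hypothetical, and the date test uses <= on both ends to match the condition in the question. Note that this still returns one row per matching have1 interval, so an ID with several intervals will appear more than once.

```sas
proc sql;
  /* want_join is a hypothetical output table name */
  create table want_join as
  select b.*,
         (case when b.Outcome in ('Answered', 'Answerphone Message Left',
                                  'Answerphone Message Not Left', 'No Answer')
                and a.Start_date <= b.Date
                and b.Date <= a.End_date
                and a.End_date <= b.Date + 2
               then 1 else 0
          end) as Cured
  from have2 b
  inner join have1 a
    on a.ID = b.ID   /* explicit join condition instead of a comma join plus WHERE */
  ;
quit;
```

An inner join drops any have2 row whose ID has no match in have1; whether that is acceptable depends on the data.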
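Posted on 2021-02-23 00:47:48
AND takes precedence over OR, so in your CASE expression the date conditions apply only to the "No Answer" comparison. In any programming language (not just SAS and SQL), whenever you mix OR with AND, always group the conditions with parentheses: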
CASE
WHEN (b.Outcome = "Answered" OR
b.Outcome = "Answerphone Message Left" OR
b.Outcome = "Answerphone Message Not Left" OR
b.Outcome = "No Answer")
AND
(a.Start_date <= b.Date <= a.End_date)
AND
(a.End_date <= b.Date+2)
THEN 1
ELSE 0
END AS Cured
A better approach is to use the IN operator to collect the values and avoid OR altogether:
CASE
WHEN b.Outcome IN ("Answered",
"Answerphone Message Left",
"Answerphone Message Not Left",
"No Answer")
AND
a.Start_date <= b.Date <= a.End_date
AND
a.End_date <= b.Date+2
THEN 1
ELSE 0
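END AS Cured
Posted on 2021-02-23 13:52:32
It sounds like you want to group the results and set CURED=1 if any end date for that ID meets your criteria. So use the MAX() aggregate function.
I don't know why you are testing the value of Outcome, since your example contains no values outside the list you are testing for.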
data have1;
input ID Start_date :yymmdd. End_date :yymmdd.;
format Start_date End_date yymmdd10.;
datalines4;
1 2021-01-01 2021-01-03
1 2021-01-20 2021-01-21
2 2021-01-05 2021-01-07
3 2021-01-10 2021-01-30
3 2021-01-25 2021-01-25
;;;;
data have2;
input ID Date :yymmdd. Outcome $40.;
format Date yymmdd10.;
datalines4;
1 2021-01-01 Answered
2 2021-01-05 Answerphone Message Left
3 2021-01-12 Answerphone Message Left
3 2021-01-25 No Answer
4 2021-01-25 No Answer
;;;;
proc SQL;
create table want2 as
select b.id
, b.date
, b.outcome
, max(
case when (a.Start_date <= b.Date <= a.End_date)
and (a.End_date <= b.Date+2) then 1
else 0
end
) as Cured
from have1 a
right join have2 b
on a.id=b.id
group by b.id,b.date,b.outcome
;
quit;
Results:
Obs ID Date Outcome Cured
1 1 2021-01-01 Answered 1
2 2 2021-01-05 Answerphone Message Left 1
3 3 2021-01-12 Answerphone Message Left 0
4 3 2021-01-25 No Answer 1
5 4 2021-01-25 No Answer 0
PS: Displaying dates in DMY or MDY order will only confuse half of your audience.
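To illustrate the point about date order, here is a small sketch that writes one date to the log with three standard SAS formats. The variable name d and the sample date are only illustrative.

```sas
data _null_;
  d = '05JAN2021'd;   /* illustrative date: 5 January 2021 */
  /* The same value in DMY, MDY, and YMD order */
  put d= ddmmyy10.;
  put d= mmddyy10.;
  put d= yymmdd10.;
run;
```

The first two lines in the log are easy to mistake for each other without context, while the year-first form is unambiguous.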
https://stackoverflow.com/questions/66325116