
How to calculate the duration of spells from long data with certain requirements

Stack Overflow user
Asked 2021-08-18 09:33:04
1 answer, 46 views, 0 followers, 1 vote

I have a large dataset that records each person's employment status every month for one year (>6,000,000 observations from >500,000 people). It looks like this:

ID <- c(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
Month <- c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar", "Apr", "Apr", "Apr", 
           "May", "May", "May", "Jun", "Jun", "Jun", "Jul", "Jul", "Jul", "Aug", "Aug", "Aug", 
           "Sep", "Sep", "Sep", "Oct", "Oct", "Oct", "Nov", "Nov", "Nov", "Dec", "Dec", "Dec")
Activity <- c("Unemployed", "Unemployed", "Other", "Employed", "Employed", "Unemployed", "Employed", 
         "Employed", "Employed", "Unemployed", "Unemployed", "Unemployed", "Unemployed", "Unemployed", 
         "Unemployed", "Employed", "Employed", "Employed", "Other", "NA", "Unemployed", "Employed", "Unemployed", 
         "Employed", "Inactive", "Unemployed", "Employed", "Employed", "Employed", "Unemployed", 
         "Other", "Unemployed", "Employed", "Unemployed", "Other", "Other")
df <- data.frame(ID, Month, Activity)
df[order(ID),]

   ID Month   Activity
1   1   Jan Unemployed
4   1   Feb   Employed
7   1   Mar   Employed
10  1   Apr Unemployed
13  1   May Unemployed
16  1   Jun   Employed
19  1   Jul      Other
22  1   Aug   Employed
25  1   Sep   Inactive
28  1   Oct   Employed
31  1   Nov      Other
34  1   Dec Unemployed
2   2   Jan Unemployed
5   2   Feb   Employed
8   2   Mar   Employed
11  2   Apr Unemployed
14  2   May Unemployed
17  2   Jun   Employed
20  2   Jul         NA
23  2   Aug Unemployed
26  2   Sep Unemployed
29  2   Oct   Employed
32  2   Nov Unemployed
35  2   Dec      Other
3   3   Jan      Other
6   3   Feb Unemployed
9   3   Mar   Employed
12  3   Apr Unemployed
15  3   May Unemployed
18  3   Jun   Employed
21  3   Jul Unemployed
24  3   Aug   Employed
27  3   Sep   Employed
30  3   Oct Unemployed
33  3   Nov   Employed
36  3   Dec      Other

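What I want to achieve is a new dataset with one row per employment spell, but restricted to employment spells that are both preceded and followed by unemployment. In other words, I only want to include spells in which a person moves from unemployment into employment and then back into unemployment, and I want to be able to calculate the duration of each of those spells. I also need the starting and ending month of each employment spell. The final data should look like this: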

  ID Starting Ending Duration
1  1      Feb    Mar        2
3  2      Feb    Mar        2
5  2      Oct    Oct        1
6  3      Mar    Mar        1
7  3      Jun    Jun        1
8  3      Aug    Sep        2

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-08-18 09:51:34

To keep only the employment spells that sit between two unemployment spells, you can use:

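library(dplyr)

df %>%
  # Assumes the rows are already in chronological order within each ID
  arrange(ID) %>%
  # rleid() gives every run of consecutive identical Activity values its own id
  group_by(ID, grp = data.table::rleid(Activity)) %>%
  # Collapse each run to one row; the result stays grouped by ID
  summarise(Activity = first(Activity),
            Starting = first(Month),
            Ending = last(Month),
            Duration = n(),
            .groups = 'drop_last') %>%
  #Keep only those 'Employed' rows that are surrounded by 'Unemployed' ones
  filter(Activity == 'Employed',
         lag(Activity) == 'Unemployed', lead(Activity) == 'Unemployed') %>%
  ungroup() %>%
  select(-grp, -Activity)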

#     ID Starting Ending Duration
#  <int> <chr>    <chr>     <int>
#1     1 Feb      Mar           2
#2     2 Feb      Mar           2
#3     2 Oct      Oct           1
#4     3 Mar      Mar           1
#5     3 Jun      Jun           1
#6     3 Aug      Sep           2
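
To make the run-length idea behind `rleid()` more concrete, here is a minimal base R sketch for a single person. It is not part of the answer above: the helper name `employment_spells` is hypothetical, it uses base `rle()` rather than `data.table::rleid()`, and it assumes the activities and months are passed in chronological order.

```r
# Hypothetical helper (not from the answer): employment spells for ONE person,
# assuming `activity` and `month` are in chronological order.
employment_spells <- function(activity, month) {
  runs   <- rle(activity)                # runs of consecutive identical statuses
  ends   <- cumsum(runs$lengths)         # index of the last row of each run
  starts <- ends - runs$lengths + 1      # index of the first row of each run
  n      <- length(runs$values)
  prev_status <- c(NA, runs$values[-n])  # status of the run before each run
  next_status <- c(runs$values[-1], NA)  # status of the run after each run
  # which() drops NA comparisons, so first/last runs and missing values are excluded
  keep <- which(runs$values == "Employed" &
                prev_status == "Unemployed" &
                next_status == "Unemployed")
  data.frame(Starting = month[starts[keep]],
             Ending   = month[ends[keep]],
             Duration = runs$lengths[keep])
}

# Activities of ID 1 from the question, January to December
activity <- c("Unemployed", "Employed", "Employed", "Unemployed", "Unemployed",
              "Employed", "Other", "Employed", "Inactive", "Employed",
              "Other", "Unemployed")
spells <- employment_spells(activity, month.abb)
print(spells)
```

For ID 1 this should return a single spell running from Feb to Mar with a duration of 2, matching the first row of the output above. To cover all people, the helper could be applied per ID, for example with `split()` and `lapply()`, though for millions of rows the grouped `dplyr` or `data.table` approach is likely to be faster.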
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68829871
