Q: How can I restructure a dataset based on a specific condition in one column?

Stack Overflow user

Asked 2016-03-17 15:43:23

Answers: 2, Views: 66, Followers: 0, Votes: 1

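My dataset is based on disease counts. Many of the variables are categorical, such as WeekSeries, MonthSeries, and YearSeries. These labels indicate which week, month, and year of my time series each disease count falls in.

The problem I'm facing is building another data table that sums the counts by WeekSeries, MonthSeries, and YearSeries. I need my approach to decide whether WeekSeries 1 should be coded as TS1 = 1 or TS2 = 1. For example, in the original data you can see that the third observation is not in TS1 but in TS2, and because it is in TS2 it also has HolidaysPerSeason = 10.

I would like the approach to decide that, if the majority of observations in WeekSeries 1 have TS1 = 1 and HolidaysPerSeason = 11, then that becomes the final category for WeekSeries = 1.

Original data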

 WeekSeries  Counts  TS1  TS2  TS3  TS4  TS5  TS6  HolidaysPerSeason
     1         0      1    0    0    0    0    0          11
     1         1      1    0    0    0    0    0          11
     1         1      0    1    0    0    0    0          10

Desired format

WeekSeries  Counts  TS1  TS2  TS3  TS4  TS5  TS6  HolidaysPerSeason
     1        2      1    0    0    0    0    0          11

This format is necessary for building regression models and for other analyses.

Here is fake data similar to my real data:

    library(data.table)  # needed for the data.table() call below

    # a couple of the variables within my data
    JulianDate<-c(10985, 10986,10987)
    DateRcd<-c(NA,NA,"2000-01-31")
    Counts<-c(0,1,1)
    Day<-c("Sat","Sun","Mon")
    Weekend<-c(1,1,0)
    Season<-c(1,1,2)
    HolidaysPerSeason<-c(11,11,10)
    TS1<-c(1,1,0)
    TS2<-c(0,0,1)
    TS3<-c(0,0,0)
    TS4<-c(0,0,0)
    TS5<-c(0,0,0)
    TS6<-c(0,0,0)
    WeekSeries<-c(1,1,1)
    YearSeries<-c(1,1,1)
    MonthSeries<-c(1,1,1)
    mydata<-data.table(JulianDate,DateRcd,Counts,Day,Weekend,Season,HolidaysPerSeason, TS1,TS2,TS3,TS4,TS5,TS6,YearSeries,MonthSeries,WeekSeries) #data simulation

I tried using data.table() to aggregate by WeekSeries and then merge the result back with the original data to build the desired format for analysis.

My closest attempt

install.packages("data.table")
library(data.table)

DT <- data.table(mydata)
mydata1<-DT[, by = list(WeekSeries)] #doesn't work
mydata2<-DT[,sum(CountsofCholera), by=WeekSeries] #loses all the other variables
idealdata<-merge(mydata2,mydata,by.x=mydata2$WeekSeries) #attempts to regain  the lost variable, this doesn't work because the datasets are not the same length

What can I do to recover the other categorical variables?

data.table

merge

dataframe

2 Answers

Stack Overflow user

Accepted answer

Answered 2016-03-17 16:14:56

This could be optimized in a few places, but it should give you the basic idea:

# sum up counts and count number of rows with identical values for the last several columns
DT[, .(Count = sum(Counts), .N), by = c(tail(names(DT), -4))][
   # assign same count number = total count to each row within same WeekSeries
   , Count := sum(Count), by = WeekSeries][
   # extract most frequent row (i.e. one with largest N, computed in line 1)
   , .SD[which.max(N)], by = WeekSeries]
#   WeekSeries Weekend Season HolidaysPerSeason TS1 TS2 TS3 TS4 TS5 TS6 YearSeries MonthSeries Count N
#1:          1       1      1                11   1   0   0   0   0   0          1           1     2 2
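
The chained call above may be easier to follow when split into named steps. The following is only a sketch of the same idea, not the answerer's code: it uses a reduced copy of the question's sample data, and the names `cat_cols`, `totals`, `freq`, `majority`, and `result` are made up for illustration. Ties between equally frequent category combinations are assumed to be broken by taking the first one, which is what `which.max()` does.

```r
library(data.table)

# Reduced copy of the sample data from the question (illustrative subset of columns)
DT <- data.table(
  WeekSeries        = c(1, 1, 1),
  Counts            = c(0, 1, 1),
  TS1               = c(1, 1, 0),
  TS2               = c(0, 0, 1),
  HolidaysPerSeason = c(11, 11, 10)
)

# Categorical columns whose majority combination should be kept (assumed list)
cat_cols <- c("TS1", "TS2", "HolidaysPerSeason")

# Step 1: total count per WeekSeries
totals <- DT[, .(Counts = sum(Counts)), by = WeekSeries]

# Step 2: number of rows for each category combination within a WeekSeries
freq <- DT[, .N, by = c("WeekSeries", cat_cols)]

# Step 3: keep the most frequent combination per WeekSeries
# (which.max() returns the first maximum, so the first combination wins a tie)
majority <- freq[, .SD[which.max(N)], by = WeekSeries]

# Step 4: attach the majority categories to the totals, dropping the helper column N
result <- merge(
  totals,
  majority[, c("WeekSeries", cat_cols), with = FALSE],
  by = "WeekSeries"
)
print(result)
```

For the sample data this gives a single row for WeekSeries 1 with Counts = 2, TS1 = 1, TS2 = 0, and HolidaysPerSeason = 11. To group by MonthSeries or YearSeries instead, the grouping column would be swapped in each step.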

Votes: 4

Stack Overflow user

Answered 2016-03-17 16:09:44

Is group_by what you're looking for? Something like this, perhaps? You will need dplyr and data.table installed.

library(dplyr)  # provides %>%, group_by(), summarise(), and n()
mydata_new <- mydata %>% group_by(WeekSeries, TS1, HolidaysPerSeason) %>% summarise(count = n())
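
Note that the call above counts rows per group rather than summing Counts, so it returns one row per category combination. A possible dplyr variant that also sums the counts and keeps only the most frequent combination per WeekSeries is sketched below. It is an illustration under assumptions, not part of the original answer: it assumes dplyr 1.0.0 or later (for `slice_max()` and the `.groups` argument), uses a reduced copy of the question's data, and the names `n_rows`, `majority`, and `totals` are hypothetical.

```r
library(dplyr)

# Reduced copy of the sample data from the question (illustrative subset of columns)
mydata <- data.frame(
  WeekSeries        = c(1, 1, 1),
  Counts            = c(0, 1, 1),
  TS1               = c(1, 1, 0),
  TS2               = c(0, 0, 1),
  HolidaysPerSeason = c(11, 11, 10)
)

# Most frequent category combination within each WeekSeries
majority <- mydata %>%
  count(WeekSeries, TS1, TS2, HolidaysPerSeason, name = "n_rows") %>%
  group_by(WeekSeries) %>%
  slice_max(n_rows, n = 1, with_ties = FALSE) %>%  # keep a single row even on ties
  ungroup() %>%
  select(-n_rows)

# Total count per WeekSeries
totals <- mydata %>%
  group_by(WeekSeries) %>%
  summarise(Counts = sum(Counts), .groups = "drop")

# One row per WeekSeries: summed counts plus the majority categories
mydata_new <- left_join(totals, majority, by = "WeekSeries")
print(mydata_new)
```

With the sample data, `mydata_new` has one row: WeekSeries 1, Counts 2, TS1 1, TS2 0, HolidaysPerSeason 11.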

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/36065343
