
Recoding comma-separated items in R

Stack Overflow user
Asked 2017-10-18 18:21:40
Answers: 1 · Views: 38 · Followers: 0 · Votes: 1

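I have a data frame (df2) with two variables, Mood and PartOfTown. Mood is a multi-select question (any combination of options is allowed) that grades a person's happiness, and PartOfTown describes the geographical location.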

The problem is that the centres use different codings for mood: centres in the north of the city use NorthCode, and centres in the south use SouthCode (df1).

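I would like all the entries in the dataset (df2) to be recoded to SouthCode, so that I end up with a dataset like df3. I would like a general solution, because there may be new entries with new combinations that are not currently in the dataset. Any ideas on this would be much appreciated.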

Centre codes and mood definitions:

df1 <- data.frame(NorthCode=c(4,5,6,7,99),NorthDef=c("happy","sad","tired","energetic","other"),SouthCode=c(7,8,9,5,99),SouthDef=c("happy","sad","tired","energetic","other"))

Starting point:

df2 <- data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"),Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"))

Desired outcome:

df3 <- data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"),PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"))

Current attempt: I tried to start by splitting the entries, but could not get it to work.

unlist(strsplit(df2$Mood, ","))

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-10-18 19:35:48

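You were on the right path with strsplit, but you need to add stringsAsFactors = FALSE to the data.frame() call to make sure Mood is a character vector rather than a factor. After that, you can keep the separated elements as a list and match the old codes to the new codes with lapply().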
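To illustrate why the column type matters, here is a minimal sketch using a small hypothetical vector (mood_chr and mood_fct are illustration names, not part of the original data). strsplit() only accepts character input, so a factor column has to be converted first. Note that from R 4.0.0 onward data.frame() defaults to stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions.

```r
# Hypothetical demo values, not taken from df2
mood_chr <- c("4,5", "99")
mood_fct <- factor(mood_chr)

# strsplit() rejects non-character input, so the factor triggers an error;
# tryCatch() turns that error into a marker string for inspection
fct_result <- tryCatch(strsplit(mood_fct, ","), error = function(e) "error")

# Converting to character first makes the split work
chr_result <- strsplit(as.character(mood_fct), ",")
```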

df1 <-
  data.frame(NorthCode = c(4, 5, 6, 7, 99),
             NorthDef = c("happy", "sad", "tired", "energetic", "other"),
             SouthCode = c(7, 8, 9, 5, 99),
             SouthDef = c("happy", "sad", "tired", "energetic", "other"),
             stringsAsFactors = FALSE)

df2 <-
  data.frame(Mood = c("4", "5", "6", "7", "4,5", "5,6,99", "99", "7", "8", "9", "5", "7,8", "8,5,99", "99"),
             Region = c("north", "north", "north", "north", "north", "north", "north", "south", "south", "south", "south", "south", "south", "south"),
             stringsAsFactors = FALSE)

df3 <-
  data.frame(Mood = c("7", "8", "9", "5", "7,8", "8,9,99", "99", "7", "8", "9", "5", "7,8", "8,5,99", "99"),
             PartofTown = c("north", "north", "north", "north", "north", "north", "north", "south", "south", "south", "south", "south", "south", "south"),
             stringsAsFactors = FALSE)

# Split the Moods into separate values
splitCodes <- strsplit(df2$Mood, ",")
# Add the Region as the name of each element in the new list
names(splitCodes) <- df2$Region

# Recode the values by matching the north values to the south values
recoded <- 
  lapply(
    seq_along(splitCodes),
    function(x){
      ifelse(rep(names(splitCodes[x]) == "north", length(splitCodes[[x]])),
             df1$SouthCode[match(splitCodes[[x]], df1$NorthCode)],
             splitCodes[[x]])
    }
  )

# Add the recoded values back to df2
df2$recoded <- 
  sapply(recoded,
         paste,
         collapse = ",")

# Check if the recoded values match your desired values    
identical(df2$recoded, df3$Mood)
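
Since the question asks for a general solution, the same idea can also be wrapped in a reusable function. The following is only a sketch under stated assumptions: recode_moods and north_to_south are hypothetical names, the lookup is hard-coded from the df1 codes so the block runs on its own, and codes missing from the lookup are kept unchanged (the original code would produce NA for them instead).

```r
# Hypothetical helper: recode comma-separated codes through a named lookup vector
recode_moods <- function(mood, region, lookup, from_region = "north") {
  # Split each entry into its individual codes
  parts <- strsplit(as.character(mood), ",", fixed = TRUE)
  recoded <- Map(function(codes, reg) {
    # Entries from other regions already use the target coding
    if (reg != from_region) return(codes)
    new <- unname(lookup[codes])
    # Assumption: codes absent from the lookup are left as they are
    new[is.na(new)] <- codes[is.na(new)]
    new
  }, parts, region)
  # Glue the codes of each entry back into one comma-separated string
  vapply(recoded, paste, character(1), collapse = ",")
}

# Names are the NorthCode values and elements the SouthCode values from df1
north_to_south <- setNames(c("7", "8", "9", "5", "99"),
                           c("4", "5", "6", "7", "99"))

result <- recode_moods(c("4,5", "5,6,99", "8,5,99"),
                       c("north", "north", "south"),
                       north_to_south)
# result is c("7,8", "8,9,99", "8,5,99")
```

Indexing a named character vector by code avoids the explicit match() call, and the from_region argument keeps the helper usable if the direction of recoding ever needs to change.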
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/46816788
