I have this data frame:
structure(list(rule.id = c(1, 2), rules = structure(1:2, .Label = c("Lamp1.1,Lamp1.2",
"Lamp2.1,Lamp2.2"), class = "factor")), .Names = c("rule.id",
"rules"), row.names = c(NA, -2L), class = "data.frame")
# rule.id rules
#1 1 Lamp1.1,Lamp1.2
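#2 2 Lamp2.1,Lamp2.2
I need to split the "rules" column on the comma delimiter (","). A row may hold more than two values, unlike this example. The result should then be in normalized (long) format, with each value keeping the rule.id it had in the original df. The result should look like this: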
structure(list(rule.id = c(1, 1, 2, 2), lhs = c("Lamp1.1", "Lamp1.2",
"Lamp2.1", "Lamp2.2")), .Names = c("rule.id", "lhs"), row.names = c(NA,
-4L), class = "data.frame")
# rule.id lhs
#1 1 Lamp1.1
#2 1 Lamp1.2
#3 2 Lamp2.1
#4 2 Lamp2.2
I have code that handles the string split and the normalized (long) format, but I am not sure how to handle the rule.id requirement:
lhs.norm <- as.data.frame(
cbind(
rules.df$ruleid,
unlist(strsplit(
unlist(lapply(strsplit(unlist(lapply(as.character(rules.df$rules),function(x) substr(x,2,nchar(x)))), "} =>", fixed = TRUE), function(x) x[1]))
,","))))
Thanks to @acrun for the solution:
cSplit(rules.df.lhs, "lhs", ",", "long")
I benchmarked it at 19 seconds for 1M input rows (the result has about 2 million rows).
Posted on 2016-12-17 16:02:34
We can use cSplit from splitstackshape:
library(splitstackshape)
cSplit(df, "rules", ",", "long")
# rule.id rules
#1: 1 Lamp1.1
#2: 1 Lamp1.2
#3: 2 Lamp2.1
#4: 2 Lamp2.2
If the dataset is huge, we can use stringi to do the split:
library(stringi)
lst <- stri_split_fixed(df$rules, ",")
df2 <- data.frame(rule.id = rep(df$rule.id, lengths(lst)),
rules = unlist(lst))
df2
# rule.id rules
#1 1 Lamp1.1
#2 1 Lamp1.2
#3 2 Lamp2.1
#4 2 Lamp2.2
Another option is data.table:
library(data.table)
setDT(df)[, strsplit(as.character(rules), ","), by = rule.id]
https://stackoverflow.com/questions/41200415
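The split-then-repeat idea used in the stringi answer above can also be sketched in base R alone, which may help when no extra packages are available. This is a minimal sketch under stated assumptions, not a benchmarked replacement: the output column name `lhs` follows the question's expected result, and the variable names `parts` and `long` are illustrative only.

```r
# Sample input from the question (rules stored as a factor).
df <- data.frame(
  rule.id = c(1, 2),
  rules = factor(c("Lamp1.1,Lamp1.2", "Lamp2.1,Lamp2.2"))
)

# Split each rule string on literal commas. The result is a list with one
# character vector per input row.
parts <- strsplit(as.character(df$rules), ",", fixed = TRUE)

# Repeat each rule.id once per piece so every value keeps its original id.
# 'lhs' is the column name assumed from the question's expected output.
long <- data.frame(
  rule.id = rep(df$rule.id, lengths(parts)),
  lhs = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

Because `rep()` takes the per-row piece counts from `lengths(parts)`, rows with any number of comma-separated values stay aligned with their rule.id, which is the part the `cbind()` attempt in the question did not handle.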