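I have the following dataset, in which p1 is the number of plants of the species given in column sp1, p2 is the number of plants of the species given in column sp2, and so on. I would like to create a new variable called Count1 that holds, for each row, the total number of wheat plants. For example, in the row with ID = 7 there are 7 wheat plants in total, while the row with ID = 5 contains no wheat, so Count1 should be 0 there. I would appreciate any help with this.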
plt <- data.frame(ID = c(0:10), p1 = c(1,1,1,8,8,8,8,8,4,4,4),
                  sp1 = c('wheat', 'wheat', 'wheat', 'barley','barley',
                          'barley','barley','barley', 'rice','rice','rice'),
                  p2 = c(0,0,0,2,2,2,2,2,2,2,2),
                  sp2 = c(0,0,0,'rice', 'rice', 'rice', 'rice', 'wheat',
                          'wheat', 'wheat','wheat'),
                  p3 = c(0,0,2,2,2,2, 5,5,5,5,5),
                  sp3 = c(0,0,0, 'rice', 'rice', 'rice', 'wheat', 'wheat',
                          'wheat', 'wheat', 'wheat'))
Posted on 2022-02-25 17:33:45
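You can use sapply and paste0 to check the condition for each sp/p column pair, and then add up the results across columns with rowSums: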
plt$Count1 <- rowSums(sapply(1:length(grep("^sp",colnames(plt))), function(x)
ifelse(plt[, paste0("sp", x)] == "wheat", plt[, paste0("p", x)], 0)))
plt
   ID p1    sp1 p2   sp2 p3   sp3 Count1
1   0  1  wheat  0     0  0     0      1
2   1  1  wheat  0     0  0     0      1
3   2  1  wheat  0     0  2     0      1
4   3  8 barley  2  rice  2  rice      0
5   4  8 barley  2  rice  2  rice      0
6   5  8 barley  2  rice  2  rice      0
7   6  8 barley  2  rice  5 wheat      5
8   7  8 barley  2 wheat  5 wheat      7
9   8  4   rice  2 wheat  5 wheat      7
10  9  4   rice  2 wheat  5 wheat      7
11 10  4   rice  2 wheat  5 wheat      7
Posted on 2022-02-25 17:39:45
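With tidyverse, you can try the following. First, use pivot_longer to put the data into long format and drop the rows where sp is zero.
Then, for each ID and sp, sum up p. After that, if desired, you can put the data back into wide format and join it to the original data. This gives the total for every type (wheat, barley, rice), not just wheat.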
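library(tidyverse)
plt %>%
  # one row per (ID, number) pair, with columns p and sp
  pivot_longer(cols = -ID, names_to = c(".value", "number"), names_pattern = "(p|sp)(\\d+)") %>%
  filter(sp != 0) %>%
  group_by(ID, sp) %>%
  summarise(total = sum(p), .groups = "drop") %>%
  # back to wide format: one column per species, 0 where a species is absent
  pivot_wider(id_cols = ID, names_from = sp, values_from = total, values_fill = 0) %>%
  right_join(plt, by = "ID")
Output: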
      ID wheat barley  rice    p1 sp1       p2 sp2      p3 sp3
   <int> <dbl>  <dbl> <dbl> <dbl> <chr>  <dbl> <chr> <dbl> <chr>
 1     0     1      0     0     1 wheat      0 0         0 0
 2     1     1      0     0     1 wheat      0 0         0 0
 3     2     1      0     0     1 wheat      0 0         2 0
 4     3     0      8     4     8 barley     2 rice      2 rice
 5     4     0      8     4     8 barley     2 rice      2 rice
 6     5     0      8     4     8 barley     2 rice      2 rice
 7     6     5      8     2     8 barley     2 rice      5 wheat
 8     7     7      8     0     8 barley     2 wheat     5 wheat
 9     8     7      0     4     4 rice       2 wheat     5 wheat
10     9     7      0     4     4 rice       2 wheat     5 wheat
11    10     7      0     4     4 rice       2 wheat     5 wheat
Posted on 2022-02-25 18:10:01
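This is not an efficient approach, but it is more "intuitive". Ben's and Andre's answers are better solutions because they are more compact and probably perform better.
Still, here is the idea: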
word <- "wheat"  # species to count
indexes <- which(plt == word, arr.ind = TRUE)  # (row, column) positions of the cells equal to `word`
new_matrix <- subset(plt, is.element(row.names(plt), unique(indexes[, 1])))  # keep only the rows that contain `word`
count_w <- 0; k <- 1; rowcounts <- c()
for (j in 1:nrow(new_matrix)) {  # loop over rows
  a <- which(new_matrix[j, ] == word)  # which columns contain `word`?
  for (i in a) {
    count_w <- count_w + new_matrix[j, i - 1]  # the count sits in the column just left of the species
  }
  rowcounts[k] <- count_w  # save the count for this row
  count_w <- 0  # and start again for the next row
  k <- k + 1
}
# To be consistent with the original data frame, the new column needs one entry per row,
# so fill it with zeros and substitute the counts for the rows that contain `word`.
Count_1 <- rep(0, nrow(plt)); p <- 1
for (k in sort(unique(indexes[, 1]))) {
  Count_1[k] <- rowcounts[p]
  p <- p + 1
}
final_matrix <- cbind(plt, Count_1)  # just combine
As you can see, these 23 lines of code (in my case) can be replaced by far less code (see Ben's and Andre's answers)!
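To illustrate how much shorter the loop can get, here is a minimal base R sketch of the same idea without explicit loops. The helper name count_species is hypothetical (it is not from any package or from the other answers), and it assumes that every species column spN has a matching count column pN. The data frame is rebuilt so the snippet runs on its own, with the zeros in sp2 and sp3 written as the strings R would coerce them to anyway.

```r
# Sample data from the question (sp2/sp3 zeros written as "0" strings)
plt <- data.frame(
  ID  = 0:10,
  p1  = c(1, 1, 1, 8, 8, 8, 8, 8, 4, 4, 4),
  sp1 = c("wheat", "wheat", "wheat", "barley", "barley", "barley",
          "barley", "barley", "rice", "rice", "rice"),
  p2  = c(0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2),
  sp2 = c("0", "0", "0", "rice", "rice", "rice", "rice", "wheat",
          "wheat", "wheat", "wheat"),
  p3  = c(0, 0, 2, 2, 2, 2, 5, 5, 5, 5, 5),
  sp3 = c("0", "0", "0", "rice", "rice", "rice", "wheat", "wheat",
          "wheat", "wheat", "wheat")
)

# Hypothetical helper: total count of one species per row.
# Assumption: each species column "spN" is paired with a count column "pN".
count_species <- function(df, word) {
  sp_cols <- grep("^sp[0-9]+$", names(df), value = TRUE)  # "sp1", "sp2", ...
  p_cols  <- sub("^s", "", sp_cols)                       # "p1", "p2", ...
  is_word <- as.matrix(df[sp_cols]) == word               # TRUE where the species matches
  counts  <- as.matrix(df[p_cols])                        # the corresponding plant counts
  rowSums(counts * is_word)                               # non-matching cells contribute 0
}

plt$Count1 <- count_species(plt, "wheat")
plt$Count1
# 1 1 1 0 0 0 5 7 7 7 7, the same values as the Count1 column in the first answer
```

Passing a different word, for example count_species(plt, "rice"), should give the per-row total for that species in the same way.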
https://stackoverflow.com/questions/71268234