
R: using lead/lag functions to aggregate within groups and compute percentages

Stack Overflow user
Asked 2021-06-13 00:48:36
1 answer · 43 views · 0 followers · 0 votes

Here is what my data frame looks like, shown as its dput() structure:

structure(list(tier_1 = c("Organic Search", "Organic Search", 
"Organic Search", "Organic Search", "Organic Search", "Organic Search", 
"Organic Search", "Organic Search", "Organic Search", "Organic Search", 
"Organic Social", "Organic Social", "Organic Social", "Organic Social", 
"Organic Social", "Organic Social", "Organic Social", "Paid Search", 
"Paid Search", "Paid Search", "Paid Search", "Paid Search", "Paid Search", 
"Paid Search", "Paid Search", "Paid Search", "Paid Social", "Paid Social", 
"Paid Social", "Paid Social", "Paid Social", "Paid Social", "Paid Social", 
"Paid Social", "Paid Social"), sequence_number = c(1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L), count_of_sequence_numbers = c(1176L, 460L, 119L, 41L, 21L, 
5L, 8L, 6L, 2L, 1L, 133L, 52L, 11L, 2L, 2L, 1L, 1L, 7516L, 1090L, 
284L, 90L, 36L, 21L, 12L, 6L, 2L, 1979L, 674L, 99L, 30L, 11L, 
2L, 3L, 2L, 1L), percent = c(0.637744034707158, 0.249457700650759, 
0.0645336225596529, 0.022234273318872, 0.0113882863340564, 0.0027114967462039, 
0.00433839479392625, 0.00325379609544469, 0.00108459869848156, 
0.000542299349240781, 0.655172413793103, 0.25615763546798, 0.0541871921182266, 
0.00985221674876847, 0.00985221674876847, 0.00492610837438424, 
0.00492610837438424, 0.827662151745402, 0.120030833608633, 0.0312740887567449, 
0.00991080277502478, 0.00396432111000991, 0.00231252064750578, 
0.0013214403700033, 0.000660720185001652, 0.000220240061667217, 
0.704019921736037, 0.23977232301672, 0.0352187833511206, 0.0106723585912487, 
0.00391319815012451, 0.000711490572749911, 0.00106723585912487, 
0.000711490572749911, 0.000355745286374956)), row.names = c(NA, 
-35L), groups = structure(list(tier_1 = c("Organic Search", "Organic Social", 
"Paid Search", "Paid Social"), .rows = structure(list(1:10, 11:17, 
    18:26, 27:35), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

df <- df %>% 
  group_by(tier_1, sequence_number) %>%
  summarize(count_of_sequence_numbers = length(sequence_number)) %>%
  mutate(percent = count_of_sequence_numbers / sum(count_of_sequence_numbers)) %>%
  filter(sequence_number <= 10)

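Using the code above, I was able to get the percent column, specifically the count / sum(count) part.

However, the percentages are not what I need. Within the same category, the count_of_sequence_numbers value for sequence_number = 2 should be subtracted from the value for sequence_number = 1. Likewise, the value for sequence_number = 3 should be subtracted from the values for both sequence_number = 1 and sequence_number = 2.

In other words, I need a count for each sequence number that excludes all higher sequence numbers: for sequence_number = 1 it excludes 2-10, for sequence_number = 2 it excludes 3-10, and so on. The value 1176 should really be 1176 - 460 - 119 - 41 - 21 - 5 - 8 - 6 - 2 - 1. The value 460 should be 460 - 119 - 41 - 21 - 5 - 8 - 6 - 2 - 1. The percentages should then be calculated from those adjusted counts.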
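
To make the arithmetic above concrete, here is a minimal base R sketch for the "Organic Search" group only. The names `counts`, `higher`, and `adjusted` are hypothetical and used purely for illustration. It subtracts from each count the sum of all counts at higher sequence numbers.

```r
# Counts for "Organic Search", ordered by sequence_number 1..10 (from the dput above)
counts <- c(1176, 460, 119, 41, 21, 5, 8, 6, 2, 1)

# Reverse cumulative sum gives, for each position, the total from that
# position to the end; subtracting the count itself leaves only the
# total of the strictly higher sequence numbers.
higher <- rev(cumsum(rev(counts))) - counts

# Adjusted count: own count minus everything at higher sequence numbers
adjusted <- counts - higher

print(data.frame(sequence_number = 1:10, counts, higher, adjusted))
```

Note that with these particular counts some adjusted values come out negative (for example at sequence_number = 6, where 5 - 17 = -12), so it is worth checking whether that matches the intended definition.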

I tried a lead function, but I don't think this is a valid approach. :/ That -1175 number in particular makes me nervous.

df <- df %>%
    group_by(tier_1) %>%
    arrange(tier_1, sequence_number) %>%
    mutate(diff = count_of_sequence_numbers - lead(count_of_sequence_numbers, default = first(count_of_sequence_numbers)))

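If I change it to lead(count_of_sequence_numbers, default = 0), I get better behavior, but it is still not what I want, which is to subtract from each value the sum of all other values in the same group that have a larger sequence number.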
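
For reference, the -1175 can be reproduced on a single group. The sketch below assumes dplyr is installed and uses a hypothetical vector `x` holding the "Organic Search" counts. It shows that `lead()` subtracts only the next row, and that the `default` argument fills in the value used for the last row, where no next row exists.

```r
library(dplyr)

# "Organic Search" counts ordered by sequence_number 1..10
x <- c(1176, 460, 119, 41, 21, 5, 8, 6, 2, 1)

# default = first(x): the last row is compared against the first count,
# giving 1 - 1176 = -1175
diff_first <- x - lead(x, default = first(x))

# default = 0: the last row keeps its own value (1 - 0 = 1), but every
# row still subtracts only its immediate successor, not all later rows
diff_zero <- x - lead(x, default = 0)

print(diff_first)
print(diff_zero)
```

Either way, only one following row is subtracted per row, which is why a cumulative sum is needed to cover all higher sequence numbers.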

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-06-13 01:18:32

Is this the output you're looking for?

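library(dplyr)

df %>%
  # sort each group from the highest sequence_number down to the lowest
  arrange(tier_1, desc(sequence_number)) %>%
  group_by(tier_1) %>%   # already grouped this way, only including for clarity
  # cuml: running total of the counts at all higher sequence numbers
  # diff: this row's count minus that running total
  mutate(cuml = cumsum(lag(count_of_sequence_numbers, default = 0)),
         diff = count_of_sequence_numbers - cuml) %>%
  ungroup()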


## A tibble: 35 x 6
#   tier_1         sequence_number count_of_sequence_numbers  percent  cuml  diff
#   <chr>                    <int>                     <int>    <dbl> <dbl> <dbl>
# 1 Organic Search              10                         1 0.000542     0     1
# 2 Organic Search               9                         2 0.00108      1     1
# 3 Organic Search               8                         6 0.00325      3     3
# 4 Organic Search               7                         8 0.00434      9    -1
# 5 Organic Search               6                         5 0.00271     17   -12
# 6 Organic Search               5                        21 0.0114      22    -1
# 7 Organic Search               4                        41 0.0222      43    -2
# 8 Organic Search               3                       119 0.0645      84    35
# 9 Organic Search               2                       460 0.249      203   257
#10 Organic Search               1                      1176 0.638      663   513
## … with 25 more rows
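
If keeping the rows in ascending sequence_number order matters, the same `cuml` and `diff` columns can be computed without reversing the sort. The following is only a sketch under stated assumptions: `df_small` is a hypothetical subset holding two of the four groups from the question, and `adj_percent` is one possible reading of "percentages calculated from there" (each adjusted count as a share of the group's adjusted total), which the question does not confirm.

```r
library(dplyr)

# Hypothetical subset of the question's data: two of the four tier_1 groups
df_small <- tibble(
  tier_1 = rep(c("Organic Search", "Organic Social"), times = c(10, 7)),
  sequence_number = c(1:10, 1:7),
  count_of_sequence_numbers = c(1176L, 460L, 119L, 41L, 21L, 5L, 8L, 6L, 2L, 1L,
                                133L, 52L, 11L, 2L, 2L, 1L, 1L)
)

result <- df_small %>%
  group_by(tier_1) %>%
  arrange(sequence_number, .by_group = TRUE) %>%
  mutate(
    # total of the counts at strictly higher sequence numbers in the group
    cuml = rev(cumsum(rev(count_of_sequence_numbers))) - count_of_sequence_numbers,
    diff = count_of_sequence_numbers - cuml,
    # ASSUMPTION: share of the group's adjusted total; not confirmed by the question
    adj_percent = diff / sum(diff)
  ) %>%
  ungroup()

print(result)
```

The `cuml` and `diff` values should match the output above for "Organic Search", just listed from sequence_number 1 upward.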
Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/67951014
