I have a data file that looks like this:
Year Person Office
2005 Peter Boston
2007 Peter Boston
2008 Peter Chicago
2009 Peter New York
2011 Peter New York
2003 Amy Seattle
2004 Amy Boston
2006 Amy Chicago
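2007 Amy Chicago

I want to compute a normalized measure (Count) at the office-person level that captures how many offices a person worked in before arriving at the current office. The measure is normalized by the total number of years that elapsed before the person reached the current office. The ideal output is shown below. For Peter, Boston is his first office, so his normalized count for Boston is 0. Chicago is his second office, and it took him 2008 - 2005 = 3 years to reach the Chicago office, so his normalized count for Chicago is 1/3.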
Office Person Count
Boston Peter 0
Boston Amy 1
Chicago Peter 1/3
Chicago Amy 2/3
New York Peter 1/2
Seattle Amy 0

Posted on 2022-04-28 07:46:07
You can use
library(dplyr)

df %>%
  group_by(Person, Office) %>%
  slice_min(Year) %>%          # keep the first year at each office
  arrange(Year) %>%
  add_count() %>%              # n = 1 for each (Person, Office) group
  group_by(Person) %>%
  mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
         .keep = "unused") %>%
  ungroup()

This returns
# A tibble: 6 x 3
  Person Office   Count
  <chr>  <chr>    <dbl>
1 Amy    Seattle  0
2 Amy    Boston   1
3 Peter  Boston   0
4 Amy    Chicago  0.667
5 Peter  Chicago  0.333
6 Peter  New_York 0.5

Posted on 2022-04-28 08:15:54
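library(tidyverse)

cities %>%
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>%   # first row per office; relies on rows being sorted by Year
  group_by(Person) %>%
  mutate(x = row_number() - 1, y = Year - Year[1]) %>%   # x: prior offices, y: years elapsed
  mutate(count = ifelse(is.nan(x / y), x, x / y))        # 0/0 for the first office becomes 0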
#    Year Person Office         x     y count
#   <int> <chr>  <chr>      <dbl> <int> <dbl>
# 1  2005 Peter  "Boston"       0     0 0
# 2  2008 Peter  "Chicago"      1     3 0.333
# 3  2009 Peter  "New York"     2     4 0.5
# 4  2003 Amy    "Seattle "     0     0 0
# 5  2004 Amy    "Boston"       1     1 1
# 6  2006 Amy    "Chicago"      2     3 0.667

If you want to express the count as a fraction, you can use a helper function from the pracma package to reduce the fraction.
cities %>%
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>%
  group_by(Person) %>%
  mutate(x = row_number() - 1, y = Year - Year[1]) %>%
  mutate(count = ifelse(is.nan(x / y), x, x / y)) %>%
  mutate(frac = ifelse(x == 0,
                       0,
                       ifelse(x / y == 1, 1,
                              paste0(x / pracma::gcd(x, y), "/", y / pracma::gcd(x, y)))
                       )
         ) %>%
  select(-x, -y)
#    Year Person Office     count frac
#   <int> <chr>  <chr>      <dbl> <chr>
# 1  2005 Peter  "Boston"   0     0
# 2  2008 Peter  "Chicago"  0.333 1/3
# 3  2009 Peter  "New York" 0.5   1/2
# 4  2003 Amy    "Seattle " 0     0
# 5  2004 Amy    "Boston"   1     1
# 6  2006 Amy    "Chicago"  0.667 2/3

Data:
cities <- read.delim(text = "Year,Person,Office
2005,Peter,Boston
2007,Peter,Boston
2008,Peter,Chicago
2009,Peter,New York
2011,Peter,New York
2003,Amy,Seattle
2004,Amy,Boston
2006,Amy,Chicago
2007,Amy,Chicago", sep = ",")

https://stackoverflow.com/questions/72038933
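Both answers compute the same quantity: for a person's k-th distinct office, (k - 1) divided by the number of years between their first recorded year and their first year at that office. For comparison, here is a minimal base-R sketch of that definition without dplyr. It is not taken from either answer. The helper name `office_counts` is hypothetical, and the sketch assumes a person never returns to an earlier office and never joins two new offices in the same year.

```r
# Sample data from the question
cities <- read.csv(text = "Year,Person,Office
2005,Peter,Boston
2007,Peter,Boston
2008,Peter,Chicago
2009,Peter,New York
2011,Peter,New York
2003,Amy,Seattle
2004,Amy,Boston
2006,Amy,Chicago
2007,Amy,Chicago")

# Hypothetical helper (not from either answer):
# normalized count of offices held before each office.
office_counts <- function(df) {
  # Sort so the first row per (Person, Office) is the arrival year
  df <- df[order(df$Person, df$Year), ]
  # Keep only the first year each person appears in each office
  first <- df[!duplicated(df[c("Person", "Office")]), ]
  # Number of offices held before this one (0 for the first office)
  prior <- ave(seq_along(first$Year), first$Person, FUN = seq_along) - 1
  # Years elapsed since the person's first recorded year
  elapsed <- first$Year - ave(first$Year, first$Person, FUN = min)
  # Define the first office's count as 0 instead of 0/0
  first$Count <- ifelse(prior == 0, 0, prior / elapsed)
  rownames(first) <- NULL
  first[c("Office", "Person", "Count")]
}

result <- office_counts(cities)
print(result)
```

On the sample data this gives 0, 1, and 2/3 for Amy (Seattle, Boston, Chicago) and 0, 1/3, and 1/2 for Peter (Boston, Chicago, New York), which matches the table in the question.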