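I am new to R and have searched the site for a solution. I found many similar but slightly different questions, and I'm stumped.
I have a dataset with this structure: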
SURVEY_ID CHILD_NAME CHILD_AGE
Survey1 Billy 4
Survey2 Claude 12
Survey2 Maude 6
Survey2 Constance 3
Survey3 George 22
Survey4 Marjoram 14
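Survey4 LeBron 37

I am trying to cast the data wider so that (a) each row has exactly one unique SURVEY_ID and, more importantly, (b) new columns are created for the second, third, etc. child in surveys that have multiple child rows.
So the result would be: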
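SURVEY_ID CHILD_NAME1 CHILD_NAME2 CHILD_NAME3 CHILD_AGE1 CHILD_AGE2 CHILD_AGE3
Survey1   Billy                               4
Survey2   Claude      Maude       Constance   12         6          3
Survey3   George                              22
Survey4   Marjoram    LeBron                  14         37

The actual data has thousands of surveys, and the number of child names and child ages per survey can be as high as 10. What stumps me is that the new columns are not named after existing values, and they are only needed when a survey has more than one child.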
Posted on 2022-05-26 03:15:54
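The snippets in this answer operate on a data frame called `df` that is never defined in the post. A minimal sketch for rebuilding it from the question's sample rows, assuming character columns for the ID and name and an integer age column (which matches the column types shown in the printed output):

```r
# Sample data from the question, rebuilt as a plain data frame.
# stringsAsFactors = FALSE keeps the text columns as character vectors.
df <- data.frame(
  SURVEY_ID  = c("Survey1", "Survey2", "Survey2", "Survey2",
                 "Survey3", "Survey4", "Survey4"),
  CHILD_NAME = c("Billy", "Claude", "Maude", "Constance",
                 "George", "Marjoram", "LeBron"),
  CHILD_AGE  = c(4L, 12L, 6L, 3L, 22L, 14L, 37L),
  stringsAsFactors = FALSE
)
```

Note that `setDT(df)` in the data.table snippet converts `df` in place, so rebuild it first if a plain data frame is needed afterwards.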
Using base R:
reshape(transform(df, time = ave(SURVEY_ID, SURVEY_ID, FUN = seq)),
        v.names = c('CHILD_NAME', 'CHILD_AGE'),
        direction = 'wide', idvar = 'SURVEY_ID', sep = '_')
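The key step in the call above is the `time` column: a counter that restarts at 1 for each survey, which `reshape()` then uses as the column suffix. Below is a rough sketch of that step in isolation, on a hypothetical four-row `toy` data frame. The names `toy`, `child_no`, and `wide` are illustrative and not from the answer. It uses `sep = ""` and a column reorder to get the exact `CHILD_NAME1 ... CHILD_AGE3` layout the question asks for.

```r
# Toy input mirroring the question's layout (hypothetical, for illustration).
toy <- data.frame(
  SURVEY_ID  = c("Survey1", "Survey2", "Survey2", "Survey2"),
  CHILD_NAME = c("Billy", "Claude", "Maude", "Constance"),
  CHILD_AGE  = c(4L, 12L, 6L, 3L),
  stringsAsFactors = FALSE
)

# Number the children within each survey: 1, 2, 3, ... restarting per SURVEY_ID.
# Applying seq_along() to the row index returns the counter as integers.
child_no <- ave(seq_len(nrow(toy)), toy$SURVEY_ID, FUN = seq_along)

# reshape() uses the counter (timevar) to decide which suffix each row gets.
# sep = "" yields CHILD_NAME1, CHILD_NAME2, ... instead of CHILD_NAME_1, ...
wide <- reshape(
  transform(toy, child_no = child_no),
  idvar     = "SURVEY_ID",
  timevar   = "child_no",
  v.names   = c("CHILD_NAME", "CHILD_AGE"),
  direction = "wide",
  sep       = ""
)

# Optional: put all name columns before all age columns.
name_cols <- grep("^CHILD_NAME", names(wide), value = TRUE)
age_cols  <- grep("^CHILD_AGE", names(wide), value = TRUE)
wide <- wide[, c("SURVEY_ID", name_cols, age_cols)]
```

Surveys with fewer children than the maximum get `NA` in the unused columns. The answer's own call, with `sep = '_'` and the default column order, prints: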
SURVEY_ID CHILD_NAME_1 CHILD_AGE_1 CHILD_NAME_2 CHILD_AGE_2 CHILD_NAME_3 CHILD_AGE_3
1 Survey1 Billy 4 <NA> NA <NA> NA
2 Survey2 Claude 12 Maude 6 Constance 3
5 Survey3 George 22 <NA> NA <NA> NA
6 Survey4 Marjoram 14 LeBron 37 <NA> NA

Using tidyverse:
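library(tidyverse)
df %>%
  group_by(SURVEY_ID) %>%
  # number the children within each survey; this becomes the column suffix
  mutate(name = row_number()) %>%
  # id_cols is named explicitly; recent tidyr versions deprecate passing it by position
  pivot_wider(id_cols = SURVEY_ID, names_from = name,
              values_from = c(CHILD_NAME, CHILD_AGE))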
# A tibble: 4 x 7
# Groups: SURVEY_ID [4]
SURVEY_ID CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3 CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3
<chr> <chr> <chr> <chr> <int> <int> <int>
1 Survey1 Billy NA NA 4 NA NA
2 Survey2 Claude Maude Constance 12 6 3
3 Survey3 George NA NA 22 NA NA
4 Survey4 Marjoram LeBron NA 14 37 NA

Using data.table:
library(data.table)
dcast(setDT(df), SURVEY_ID~rowid(SURVEY_ID), value.var = c('CHILD_AGE', 'CHILD_NAME'))
SURVEY_ID CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3 CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3
1: Survey1 4 NA NA Billy <NA> <NA>
2: Survey2 12 6 3 Claude Maude Constance
3: Survey3 22 NA NA George <NA> <NA>
4: Survey4 14 37 NA Marjoram LeBron <NA>

https://stackoverflow.com/questions/72386255