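I have information on hotel stays. Each row represents a new day. The vector is populated with the following options:
1. 'first start' - the start of a person's first stay at the hotel
2. 'NA' - a day on which the person is staying at the hotel (cannot be the start or end of a stay)
3. 'end' - the end of a person's stay (can be the end of any of their stays; yes, a person can have multiple stays)
4. 'another start' - the start of a stay after the first one. It can be the second, third, fourth stay, and so on (some people return to the same hotel 10+ times)
5. 'first start end' - a first stay that lasts only one day
6. 'another start end' - a stay that is not the first and lasts only one day
I also have a person id variable.
Here is a sample of what I have and what I want: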
Person_ID Have Want
[1,] "1" "first start" "1"
[2,] "1" "NA" "1"
[3,] "1" "NA" "1"
[4,] "1" "end" "1"
[5,] "1" "another start" "2"
[6,] "1" "NA" "2"
[7,] "1" "NA" "2"
[8,] "1" "NA" "2"
[9,] "1" "end" "2"
[10,] "1" "another start" "3"
[11,] "1" "NA" "3"
[12,] "1" "end" "3"
[13,] "1" "another start" "4"
[14,] "1" "NA" "4"
[15,] "1" "end" "4"
[16,] "1" "another start end" "5"
[17,] "1" "another start" "6"
[18,] "1" "NA" "6"
[19,] "1" "end" "6"
[20,] "1" "another start end" "7"
[21,] "1" "another start end" "8"
[22,] "2" "first start" "1"
[23,] "2" "NA" "1"
[24,] "2" "end" "1"
[25,] "3" "first start end" "1"
[26,] "3" "another start" "2"
[27,] "3" "NA" "2"
[28,] "3" "end" "2"
[29,] "4" "first start end" "1"
[30,] "4" "another start end" "2"
[31,] "4" "another start" "3"
[32,] "4" "NA" "3"
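[33,] "4" "end" "3"
I tried using a loop, but my file is about 500,000 rows long and it took far too long to run. Any suggestions for an efficient approach would be much appreciated! Thanks!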
Posted on 2016-10-28 23:08:02
You can use the tidyverse package. Assuming you have a matrix named df that contains the data:
library(tidyverse)

result <- df %>%
  as_tibble() %>%
  group_by(Person_ID) %>%
  # Every label containing "start" opens a new stay, so a running count of
  # those rows within each person numbers that person's stays.
  mutate(Want = cumsum(grepl("start", Have, fixed = TRUE))) %>%
  ungroup()
https://stackoverflow.com/questions/40307557
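For a quick check without any packages, here is a minimal base-R sketch run on persons 3 and 4 from the question's sample. It assumes that every label containing "start" opens a new stay and that the rows are already in date order within each person. The data frame name `stays` and the helper vector `opens_stay` are made up for illustration.

```r
# Hypothetical sample: persons 3 and 4 copied from the question's table
stays <- data.frame(
  Person_ID = c("3", "3", "3", "3", "4", "4", "4", "4", "4"),
  Have = c("first start end", "another start", "NA", "end",
           "first start end", "another start end", "another start", "NA", "end"),
  stringsAsFactors = FALSE
)

# A row opens a new stay when its label contains "start"
opens_stay <- grepl("start", stays$Have, fixed = TRUE)

# Running count of stay openings, restarted for each person
stays$Want <- ave(as.integer(opens_stay), stays$Person_ID, FUN = cumsum)

print(stays)
```

Both steps are vectorized, so this should avoid the row-by-row loop that was slow on roughly 500,000 rows, though I have not timed it on data of that size.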