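I am trying to filter coral population data in a time series. I have a set of corals that are measured every three months. What I would like to do is a.) filter for corals that at some point have a max diameter of 9, 10, or 11 mm, and b.) remove corals that were larger than 9, 10, or 11 mm earlier in the coral census. Of note, I also want to keep corals that are in the size range and, in the next TimeStep, have not left the 9-11 mm range: this constitutes 0 growth, and I want to include these corals as well.
I have created an example database to work with. Colony #1 is an example of a coral that grows past the size range (9-11 mm) and then shrinks back to 9, which I would like removed from the database entirely.
Colony #2 starts above the desired size range (9-11 mm) and later shrinks into the range. I want this coral removed as well, because I need to make sure that corals in this range did not shrink into it but grew into it.
Colony #3 is an example of a coral that grows into the size range (9-11 mm) without shrinking. This is a coral I want to keep, because it grew into the size range.
Colony #4 is an example of a coral that starts above the size range and therefore needs to be removed.
Colony #5 is an example of a coral that starts below the range, grows, and later shrinks back into the range. In this case I only want to include the first time the diameter falls within the range, not the second. The first time is natural growth, whereas the second is the result of shrinkage and subsequent recovery (which I want to exclude or filter out).
Colony #6 is an example of a coral that starts in the size range at TimeStep 1, grows out of it in the next TimeStep, and then keeps growing. In this case I want to keep all measurements from the first TimeStep onward, so that growth between TimeStep 1 and 2 can be calculated.
Colony #7 is an example of a coral that starts in the size range at TimeStep 1 and stays in the range at TimeStep 2. In this case (provided the coral does not later shrink back into the size range), I want to keep all measurements, including TimeStep 1 and 2. Here the coral shows 0 growth from when it first appears in the range, and I want to include such corals in the database for analysis.
Colony #8 is an example of a coral that grows into the size range at TimeStep 3, stays in the range at TimeStep 4 (10 => 9), then shrinks below the desired range, and grows back into the range at TimeStep 6. For this colony I want TimeStep 4 included, because the coral is considered to be the same size between TimeStep 3 and 4 (the sizes are still within measurement error).
Colony #9 is an example of a coral that grows into the size range at TimeStep 3, stays in it at TimeStep 4 (10 => 9), and then grows above the range at TimeStep 5 and TimeStep 6. All measurements of this coral (TimeStep 1-6) should therefore stay in the database, because this coral never shrinks.
Colony #10 is an example of a coral that grows into the size range at TimeStep 4, shrinks below the range at TimeStep 5, and then grows past it at TimeStep 6. In this case I want to include TimeStep 5, because I want to capture the amount of shrinkage from the size range. So only TimeStep 6 should be filtered out, because the coral shrank below the size range (9-11 mm).
In summary, I would like code that filters this database so that corals are removed entirely if they have a diameter of 9-11 mm at some point but were previously larger than this range and were never in or below it, or if they start below this range and never enter it. In addition, I want to keep any coral that grows into the range and later shrinks back into it, while removing the second time it falls within the range. I am looking for a general form of code that can filter out these cases, so that all corals in the database start below 9-11 mm and then grow into the range. Thank you for your time!

Database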
Data <- structure(list(Site = c("WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI"
), `Module #` = c(116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116),
Side = c("N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N"), TimeStep = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5,
6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6,
1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1,
2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6), Settlement_Area = c(0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336), `Colony #` = c(1,
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7,
7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10,
10, 10, 10), Location = c("C1", "C1", "C1", "C1", "C1", "C1",
"B1", "B1", "B1", "B1", "B1", "B1", "A1", "A1", "A1", "A1",
"A1", "A1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1",
"D1", "D1", "D1", "D1", "A2", "A2", "A2", "A2", "A2", "A2",
"A4", "A4", "A4", "A4", "A4", "A4", "B3", "B3", "B3", "B3",
"B3", "B3", "C2", "C2", "C2", "C2", "C2", "C2", "B4", "B4",
"B4", "B4", "B4", "B4"), `Taxonomic Code` = c("PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC"), `Cover Code` = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), `Max Diameter (cm)` = c(5, 8, 12, 15, 9, 16, 15, 13,
11, 15, 17, 20, 3, 6, 9, 12, 15, 20, 13, 16, 24, 22, 28,
30, 6, 9, 14, 9, 15, 19, 11, 14, 17, 17, 21, 24, 9, 11, 14,
16, 20, 22, 3, 6, 10, 9, 7, 10, 5, 7, 10, 9, 13, 16, 5, 7,
9, 10, 8, 13)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -60L), spec = structure(list(
cols = list(Site = structure(list(), class = c("collector_character",
"collector")), `Module #` = structure(list(), class = c("collector_double",
"collector")), Side = structure(list(), class = c("collector_character",
"collector")), TimeStep = structure(list(), class = c("collector_double",
"collector")), Settlement_Area = structure(list(), class = c("collector_double",
"collector")), `Colony #` = structure(list(), class = c("collector_double",
"collector")), Location = structure(list(), class = c("collector_character",
"collector")), `Taxonomic Code` = structure(list(), class = c("collector_character",
"collector")), `Cover Code` = structure(list(), class = c("collector_double",
"collector")), `Max Diameter (cm)` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

Desired database (filtered)
Data_2 <- structure(list(Site = c("WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI"), `Module #` = c(116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116
), Side = c("N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N"), TimeStep = c(1, 2, 3, 4, 1, 2, 3, 4, 5,
6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 1,
2, 3, 4, 5, 6, 1, 2, 3, 4, 5), Settlement_Area = c(0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336), `Colony #` = c(1, 1, 1,
1, 3, 3, 3, 3, 3, 3, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7,
7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10), Location = c("C1",
"C1", "C1", "C1", "A1", "A1", "A1", "A1", "A1", "A1", "D1", "D1",
"D1", "A2", "A2", "A2", "A2", "A2", "A2", "A4", "A4", "A4", "A4",
"A4", "A4", "B3", "B3", "B3", "B3", "C2", "C2", "C2", "C2", "C2",
"C2", "B4", "B4", "B4", "B4", "B4"), `Taxonomic Code` = c("PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC"), `Cover Code` = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), `Max Diameter (cm)` = c(5,
8, 12, 15, 3, 6, 9, 12, 15, 20, 6, 9, 14, 11, 14, 17, 17, 21,
24, 9, 11, 14, 16, 20, 22, 3, 6, 10, 9, 5, 7, 10, 9, 13, 16,
5, 7, 9, 10, 8)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -40L), spec = structure(list(cols = list(
Site = structure(list(), class = c("collector_character",
"collector")), `Module #` = structure(list(), class = c("collector_double",
"collector")), Side = structure(list(), class = c("collector_character",
"collector")), TimeStep = structure(list(), class = c("collector_double",
"collector")), Settlement_Area = structure(list(), class = c("collector_double",
"collector")), `Colony #` = structure(list(), class = c("collector_double",
"collector")), Location = structure(list(), class = c("collector_character",
"collector")), `Taxonomic Code` = structure(list(), class = c("collector_character",
"collector")), `Cover Code` = structure(list(), class = c("collector_double",
"collector")), `Max Diameter (cm)` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

Answered 2020-03-12 13:41:52
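Edit #2: Just to summarize the filter criteria as I understand them (numbered as in the code comments below):
1. The coral starts in or below the desired range, i.e. its diameter at TimeStep 1 is below 12.
2. The coral reaches a diameter of at least 9, the lower end of the range, at some point.
3. Only the measurements before the coral's first shrinkage are kept.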
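Since you kept coral #1 in your desired output, I assume it does not matter whether a coral grows into your desired range or skips right over it. Correct?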
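To make the coral #1 point concrete, the two colony-level checks used in the code below (a TimeStep 1 diameter below 12, and at least one measurement of 9 or more) can be tried on plain vectors. This is only a sketch in base R: `passes_group_checks` is a hypothetical helper name that is not part of the answer, and it assumes the diameters are already ordered by TimeStep. The vectors are colonies 1 to 3 from the example data.

```r
# Hypothetical helper (an assumption for illustration, not from the answer):
# TRUE if a colony passes both colony-level checks.
# `diam` holds one colony's diameters, ordered by TimeStep.
passes_group_checks <- function(diam) {
  starts_ok <- diam[1] < 12      # criterion 1: starts in or below the range
  reaches   <- !all(diam < 9)    # criterion 2: reaches at least 9 at some point
  starts_ok && reaches
}

passes_group_checks(c(5, 8, 12, 15, 9, 16))    # colony 1: jumps from 8 to 12, still passes
passes_group_checks(c(15, 13, 11, 15, 17, 20)) # colony 2: starts above the range, fails
passes_group_checks(c(3, 6, 9, 12, 15, 20))    # colony 3: grows into the range, passes
```

Colony 1 never records a value of 9, 10, or 11 before it shrinks, yet it passes, which is the "skips right over it" case.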
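Code:
library(dplyr)

Data_filtered <- Data %>%
  group_by(`Colony #`) %>%
  filter(any(TimeStep == 1 & `Max Diameter (cm)` < 12), # criterion 1: starts in or below the range
         !all(`Max Diameter (cm)` < 9),                 # criterion 2: reaches at least 9 at some point
         row_number() + 1 <= min(which(lag(`Max Diameter (cm)`) > `Max Diameter (cm)`))) # criterion 3: rows before the first decrease

# test whether the filtering worked ok
all_equal(Data_filtered, Data_2)
[1] TRUE
This yields a filtered data frame identical to the desired output data frame (thanks for providing that, it made things a lot easier!).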
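The row-level condition (criterion 3) is the least obvious part, so here is a minimal base R sketch of the same logic applied to a single colony's diameters. `keep_before_first_shrink` is a hypothetical helper name, not something from the answer, and it assumes the values are ordered by TimeStep.

```r
# Hypothetical helper (an assumption for illustration): marks the rows that
# criterion 3 keeps for one colony's diameters, ordered by TimeStep.
keep_before_first_shrink <- function(diam) {
  # Previous measurement for each row; the first row has none (NA),
  # and which() skips NA comparisons.
  previous <- c(NA, head(diam, -1))
  shrink_at <- which(previous > diam)
  # If the colony never shrinks, use Inf so that every row passes.
  first_shrink <- if (length(shrink_at) > 0) min(shrink_at) else Inf
  seq_along(diam) + 1 <= first_shrink
}

keep_before_first_shrink(c(5, 8, 12, 15, 9, 16))  # colony 1: keeps rows 1-4
keep_before_first_shrink(c(6, 9, 14, 9, 15, 19))  # colony 5: keeps rows 1-3
keep_before_first_shrink(c(3, 6, 9, 12, 15, 20))  # colony 3: keeps all rows
keep_before_first_shrink(c(3, 6, 10, 9, 7, 10))   # colony 8: keeps rows 1-3
```

For colonies whose diameter never decreases, `min(which(...))` in the pipeline is evaluated on an empty vector, which in R returns `Inf` with a warning, so all rows are kept; the sketch handles that case explicitly. Any decrease counts as shrinkage under this rule, including a 10 => 9 change that stays inside the range.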
https://stackoverflow.com/questions/60646724