Rolling computation to identify mismatch between two columns

Stack Overflow user

Asked 2021-06-16 11:22:51


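My data consist of students' scores on a mid-term exam and a final exam.

The data are arranged with one row per student, each identified by a unique student ID, SUID.

The data also include information about teachers, given by TUserId. Each teacher can have multiple students, so the same teacher ID appears in several rows.

I want to know whether there are teachers who give students the same score on the midterm (shown by mid_sum) but inconsistent scores on the final exam (shown by final_sum). To record this inconsistency, I want to add a column, Status, that records the mismatch.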

Input:

My data df looks like this:

TUserId  SUID   mid_sum final_sum
 115      201   7       1
 115      309   8       2
 115      404   9       1
 209      245   10      2
 209      398   10      2
 209      510   10      2
 209      602   10      1
 371      111   11      2
 371      115   11      2
 371      123   11      3
 371      124   11      2

Output:

The output I need looks like this:

TUserId  SUID   mid_sum final_sum   Status
 115      201   7       1           consistent
 115      309   8       2           consistent
 115      404   9       1           inconsistent
 209      245   10      2           consistent
 209      398   10      2           consistent
 209      510   10      2           consistent
 209      602   10      1           inconsistent
 371      111   11      2           consistent
 371      115   11      2           consistent
 371      123   11      3           inconsistent
 371      124   11      2           consistent

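Requirements:

My requirements are as follows:

Midterm score requirement:

Within a teacher, a student with a lower midterm score cannot have a higher final score than another student (scores are compared relative to each other). For example, SUID = 404 has a higher midterm score than SUID = 309 (9 vs. 8) but a lower final score (1 vs. 2). In this case, I want to mark SUID = 404 as inconsistent.

Students with the same midterm score also cannot have different final scores. For example, SUID = 602 (teacher TUserId = 209) has a lower final score than the teacher's other students with the same midterm score. Likewise, SUID = 123 (teacher TUserId = 371) has a higher final score than the teacher's other students with the same midterm score.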
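The tie rule above can be checked on its own. The following is a minimal sketch, assuming dplyr is installed and using the sample data from this question; the names `tie_check`, `n_students` and `n_final_values` are made up for illustration. It only reports which teacher/midterm groups contain more than one distinct final score and does not decide which student in the group is the inconsistent one.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  TUserId   = c(115L, 115L, 115L, 209L, 209L, 209L, 209L, 371L, 371L, 371L, 371L),
  SUID      = c(201L, 309L, 404L, 245L, 398L, 510L, 602L, 111L, 115L, 123L, 124L),
  mid_sum   = c(7L, 8L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L),
  final_sum = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L)
)

# For each teacher and midterm score, count students and distinct final scores,
# then keep only the groups where the final scores disagree
tie_check <- df %>%
  group_by(TUserId, mid_sum) %>%
  summarise(
    n_students     = n(),
    n_final_values = n_distinct(final_sum),
    .groups = "drop"
  ) %>%
  filter(n_final_values > 1)

print(tie_check)
# Two groups remain: TUserId 209 with mid_sum 10, and TUserId 371 with mid_sum 11
```

These are the two groups that contain SUID = 602 and SUID = 123 in the expected output.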

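Final score requirement:

However, the same final score may be assigned to students with different midterm scores. I know this requirement is a bit confusing. As long as the midterm score stays the same or increases, the final score may stay the same. Conversely, if the midterm score starts to decrease within a teacher, the final score cannot stay the same.

In addition, if midterm scores are increasing, final scores should also increase (or stay the same as the previous score).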
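Read this way, the final score requirement says that final scores should be non-decreasing as midterm scores increase within a teacher. A minimal base R sketch of that reading, using the sample data from this question, is shown here; the name `non_monotone` is made up for illustration. Ties in mid_sum are ordered by final_sum so that only violations across different midterm scores are detected.

```r
# Sample data copied from the question
df <- data.frame(
  TUserId   = c(115L, 115L, 115L, 209L, 209L, 209L, 209L, 371L, 371L, 371L, 371L),
  SUID      = c(201L, 309L, 404L, 245L, 398L, 510L, 602L, 111L, 115L, 123L, 124L),
  mid_sum   = c(7L, 8L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L),
  final_sum = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L)
)

# For each teacher: order students by mid_sum (ties broken by final_sum) and
# test whether final_sum ever decreases along that ordering
non_monotone <- sapply(split(df, df$TUserId), function(g) {
  is.unsorted(g$final_sum[order(g$mid_sum, g$final_sum)])
})

print(non_monotone)
#   115   209   371
#  TRUE FALSE FALSE
```

Only teacher 115 breaks this rule (SUID = 404). Teachers 209 and 371 have a single midterm score each, so their mismatches fall under the midterm tie requirement instead.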

数据导入dput()

数据框架的dput()如下：

dput(df)

structure(list(
TUserId = c(115L, 115L, 115L, 209L, 209L, 209L, 209L, 371L, 371L, 371L, 371L), 
SUID = c(201L, 309L, 404L, 245L, 398L, 510L, 602L, 111L, 115L, 123L, 124L), 
mid_sum = c(7L, 8L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L),
final_sum = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L)), 
class = "data.frame", row.names = c(NA, -11L))

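Note:

The students' mid_sum and final_sum scores are sorted in ascending order. I only want to find the cases where marks were assigned inconsistently.

From an implementation standpoint, the comparison is always made against the previous values.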
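A literal "compare with the previous row" check can be sketched as follows, assuming dplyr is installed and that the rows are already sorted by teacher and mid_sum as in the sample data; the names `prev_check`, `drops_vs_previous` and `flagged` are made up for illustration. It is one possible reading of the note above, not a full solution.

```r
library(dplyr)

# Sample data copied from the question (already sorted by TUserId and mid_sum)
df <- data.frame(
  TUserId   = c(115L, 115L, 115L, 209L, 209L, 209L, 209L, 371L, 371L, 371L, 371L),
  SUID      = c(201L, 309L, 404L, 245L, 398L, 510L, 602L, 111L, 115L, 123L, 124L),
  mid_sum   = c(7L, 8L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L),
  final_sum = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L)
)

# Within each teacher, flag a row when its final_sum is lower than the previous row's.
# The first row of each teacher is compared with itself, so it is never flagged.
prev_check <- df %>%
  group_by(TUserId) %>%
  mutate(drops_vs_previous = final_sum < lag(final_sum, default = first(final_sum))) %>%
  ungroup()

flagged <- prev_check$SUID[prev_check$drops_vs_previous]
print(flagged)
# 404 602 124
```

On the sample data this flags SUID = 124 rather than SUID = 123 for teacher 371, because the row after the outlier is the one that drops. A previous-row comparison alone therefore does not reproduce the expected Status column when several students share a midterm score.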

I am reposting my question because my last example did not make my exact requirement clear: Identify cases where data sequence changes based on other column UserIDs.

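Partial solution:

The solution below partly meets my requirements, but it does not cover the cases where students with similar midterm scores end up with a higher final score.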

library(dplyr)
df %>%
  arrange(TUserId, mid_sum) %>%
  group_by(TUserId) %>%
  mutate(
    # compare the sign of the change in final_sum with the sign of the change in mid_sum
    Status = if_else(
      sign(final_sum - lag(final_sum, default = 0) + lead(final_sum, default = 0)) 
      == sign(mid_sum - lag(mid_sum, default = 0)  + lead(mid_sum, default = 0)),
      "consistent", "inconsistent"
    )
  )

rolling-computation

group-by

multiple-columns

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-06-16 14:38:15

Cool question. Your problem is explained very clearly.

Consider the following code:

library(dplyr)

## Rule 1
# We sort by mid_sum first, then by final_sum.
# If the cumulative max of final_sum is higher than the current final_sum,
# a student with a lower mid_sum was given a higher final_sum.
# Grouping by TUserId keeps the comparison within one teacher.
df <- df %>%
  arrange(mid_sum, final_sum) %>%
  group_by(TUserId) %>%
  mutate(inconsistentRule1 = cummax(final_sum) > final_sum)

## Rule 2
# This is a shot in the dark, as the inconsistency criterion is a bit fuzzy.
# (What if a teacher with only two students at the same mid_sum assigns different grades?
# Which student is to be considered "inconsistent": the lower or the higher graded one?)
# I just used the median; students that deviate from the norm
# are considered the inconsistent ones. This works with your example.
df <- df %>%
  group_by(TUserId, mid_sum) %>%
  mutate(inconsistentRule2 = final_sum != median(final_sum))

# Combine the rules
df <- df %>%
  ungroup() %>%
  mutate(Status = ifelse(
    inconsistentRule1 | inconsistentRule2,
    "inconsistent",
    "consistent"))

# Put in order and delete the working columns
df %>%
  arrange(TUserId, SUID) %>%
  select(-c("inconsistentRule1", "inconsistentRule2"))

The result is the table you wanted.
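The two-student ambiguity raised in the Rule 2 comments can be made concrete with a small base R sketch. The vectors `final_two` and `final_three` are hypothetical final scores of students who share one teacher and one midterm score; they are not taken from the question's data.

```r
# Hypothetical: two students with the same midterm score but different finals
final_two <- c(1L, 2L)
flag_two <- final_two != median(final_two)   # the median is 1.5
print(flag_two)
# TRUE TRUE

# Hypothetical: three students, one of whom deviates from the other two
final_three <- c(2L, 2L, 3L)
flag_three <- final_three != median(final_three)   # the median is 2
print(flag_three)
# FALSE FALSE  TRUE
```

With only two differing students the median falls between the two scores, so the median rule marks both as inconsistent. With a clear majority, only the deviating student is marked. Whether that behaviour is acceptable depends on how the two-student case should be treated, which the question leaves open.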


The original content of this page is provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/68001713
