
Getting data from a dataset

Asked by a Stack Overflow user on 2020-06-02 16:43:35 · 1 answer · 103 views
Code language: R
 df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
 df$countryName = as.character(df$countryName)

I have loaded and processed the dataset.

How can I find, for a given day, the countries that reported the most cases, the most deaths and the most recoveries?

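Example:

The country that reported the most deaths on 2020-06-01, the country that reported the most cases on June 1, and the country that reported the most recoveries on June 1.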


1 answer

Accepted answer, posted by a Stack Overflow user on 2020-06-02 17:40:31

The code below uses the dplyr R package to create a data frame named records that contains the data you want. Make sure dplyr is installed by running install.packages("dplyr") in R or RStudio.

Code language: R
## load the dplyr library
library(dplyr)
## read your data into R
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## set the date (YYYY-MM-DD) you wish to query max records for
set.date <- "2020-06-01"
## copy the data to preserve the original
df1 <- df
## keep only the records for the specified date (the `day` column is stored as YYYY/MM/DD)
df1 <- filter(df1, as.Date(day, "%Y/%m/%d") == as.Date(set.date))
## determine which country had the most confirmed on the specified day
max.confirmed <- df1[which.max(df1$confirmed),]
## format the record to identify it as the record with most confirmed
max.confirmed$confirmed <- paste0("**",max.confirmed$confirmed,"**")
## determine which country had the most deaths on the specified day
max.deaths <- df1[which.max(df1$death),]
## format the record to identify it as the record with most deaths
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine which country had the most recovered on the specified day
max.recovered <- df1[which.max(df1$recovered),]
## format the record to identify it as the record with most recovered
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create the records data frame to contain your max records
records <- rbind(max.confirmed, max.deaths, max.recovered)
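
One caveat with which.max(): it returns only the first row holding the maximum, so if two countries tie for the top value, one of them is silently dropped. The sketch below uses a small made-up data frame (the numbers are invented, and the column layout is an assumption based on the code above) to show how comparing against max() keeps every tied country.

```r
# Toy data with made-up numbers, shaped like the filtered df1 above
df1 <- data.frame(
  day         = rep("2020/06/01", 3),
  countryName = c("A", "B", "C"),
  confirmed   = c(100, 250, 250),
  recovered   = c(40, 90, 10),
  death       = c(5, 12, 3),
  stringsAsFactors = FALSE
)

# which.max() keeps only the first row with the largest value ("B")
first_max <- df1[which.max(df1$confirmed), ]

# which(x == max(x)) keeps every tied row ("B" and "C");
# which() also drops NA comparisons instead of returning all-NA rows
all_max <- df1[which(df1$confirmed == max(df1$confirmed, na.rm = TRUE)), ]

print(first_max$countryName)
print(all_max$countryName)
```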

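You can change the date being queried by replacing "2020-06-01" with the date for which you want the maximum confirmed, deaths and recovered. Make sure to use the "YYYY-MM-DD" format.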
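
If you query different dates often, the same steps can be wrapped in a function that takes the date as a parameter. The following is a base-R sketch rather than the dplyr code above; the helper name max_records_for_day and the toy data are hypothetical, and it assumes the `day` column is stored as "YYYY/MM/DD" text. It also rejects dates that are not written as YYYY-MM-DD.

```r
# Hypothetical helper: rows with the most confirmed, deaths and recovered on one date.
# Assumes `day` holds "YYYY/MM/DD" strings and set_date is "YYYY-MM-DD".
max_records_for_day <- function(df, set_date) {
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", set_date)) {
    stop("set_date must use the YYYY-MM-DD format")
  }
  target <- as.Date(set_date, format = "%Y-%m-%d")
  # which() skips rows whose day could not be parsed (NA comparisons)
  day_rows <- df[which(as.Date(df$day, format = "%Y/%m/%d") == target), ]
  if (nrow(day_rows) == 0) stop("no rows found for ", set_date)
  rbind(
    day_rows[which.max(day_rows$confirmed), ],
    day_rows[which.max(day_rows$death), ],
    day_rows[which.max(day_rows$recovered), ]
  )
}

# Toy data with made-up numbers
df <- data.frame(
  day         = c("2020/05/31", "2020/06/01", "2020/06/01", "2020/06/01"),
  countryName = c("A", "A", "B", "C"),
  confirmed   = c(90, 100, 250, 80),
  recovered   = c(30, 40, 20, 95),
  death       = c(4, 5, 12, 3),
  stringsAsFactors = FALSE
)

res <- max_records_for_day(df, "2020-06-01")
print(res)
```

Returning plain numeric rows (without the `**` markers) keeps the counts usable for further arithmetic; formatting can be applied when printing.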

Alternatively, instead of editing the code by hand, you can use the readline() function to ask the user for the date they want to query.
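
A minimal sketch of the readline() approach, assuming you want to validate the typed date before filtering. The helper name parse_query_date and the fallback date are hypothetical. Note that readline() only waits for input in an interactive session; when a script is run with Rscript it returns "" immediately, so the sketch falls back to a fixed date in that case.

```r
# Hypothetical helper: check a typed date and convert it to a Date object
parse_query_date <- function(input) {
  input <- trimws(input)
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", input)) {
    stop("please enter the date as YYYY-MM-DD")
  }
  parsed <- as.Date(input, format = "%Y-%m-%d")
  # impossible dates such as 2020-02-30 parse to NA
  if (is.na(parsed)) stop("not a valid calendar date: ", input)
  parsed
}

# Ask only in an interactive session; otherwise use an assumed default date
raw_input <- if (interactive()) {
  readline("Date to query (YYYY-MM-DD): ")
} else {
  "2020-06-01"
}
query_date <- parse_query_date(raw_input)
print(query_date)
```

The resulting query_date can then be compared with as.Date(day, "%Y/%m/%d") in the filter step.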

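Addition (based on the comments): if you want to use today's data (or the most recent data, if today's is not yet available), you can use the following code: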

Code language: R
## load the dplyr library
library(dplyr)
## read the data into R
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## determine the latest date contained in the data (`day` is stored as YYYY/MM/DD)
max.date <- df[which.max(as.Date(df$day, "%Y/%m/%d")),"day"]
## copy the data to preserve the original
df1 <- df
## filter the data to only entries from the latest day
df1 <- filter(df1, day == max.date)
## determine the entry with the most deaths
max.deaths <- df1[which.max(df1$death),]
## format the number of deaths as given in the example
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine the entry with the most recovered
max.recovered <- df1[which.max(df1$recovered),]
## format the number recovered to match the format of the example
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create a data frame containing our max death and max recovered entries
max.records <- rbind(max.deaths, max.recovered)
## the `day` column already holds the latest date; keep it with the country and counts, as in the example
max.records <- select(max.records, c("day","countryName","death","recovered"))
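
For comparison, here is a base-R sketch of the "latest date" step that returns a small long-format summary (one row per measure) and keeps the values numeric instead of wrapping them in `**`. The toy data and the leaders layout are assumptions for illustration only. Dates are compared as Date objects, because which.max() does not work on character strings.

```r
# Toy data with made-up numbers; `day` assumed to be "YYYY/MM/DD" text
df <- data.frame(
  day         = c("2020/05/31", "2020/05/31", "2020/06/01", "2020/06/01"),
  countryName = c("A", "B", "A", "B"),
  recovered   = c(30, 20, 40, 35),
  death       = c(4, 10, 5, 12),
  stringsAsFactors = FALSE
)

# Parse once, then find the most recent date in the data
dates  <- as.Date(df$day, format = "%Y/%m/%d")
latest <- max(dates, na.rm = TRUE)
latest_rows <- df[which(dates == latest), ]

# One row per measure: which country leads and by how much
leaders <- data.frame(
  day         = format(latest, "%Y-%m-%d"),
  measure     = c("death", "recovered"),
  countryName = c(latest_rows$countryName[which.max(latest_rows$death)],
                  latest_rows$countryName[which.max(latest_rows$recovered)]),
  value       = c(max(latest_rows$death, na.rm = TRUE),
                  max(latest_rows$recovered, na.rm = TRUE)),
  stringsAsFactors = FALSE
)
print(leaders)
```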

I hope this helps!

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62157003