文章/答案/技术大牛

发布

问如何在R中批量查询Ids数
EN

Stack Overflow用户

提问于 2020-02-03 08:22:48

回答 4查看 917关注 0票数 5

下面我已经提到了R.

ID       Amount     Date
IK-1     100        2020-01-01
IK-2     110        2020-01-02
IK-3     120        2020-01-03
IK-4     109        2020-01-03
IK-5     104        2020-01-03

我使用ID从MySQL中获取一些使用以下代码的详细信息。

library(RMySQL)

conn<- connection

query<-paste0("SELECT c.ID,e.Parameters, d.status
FROM Table1 c
left outer join Table2 d ON d.seq_id=c.ID
LEFT outer JOIN Table3 e ON e.role_id=d.role
           where c.ID IN (", paste(shQuote(dataframe$ID, type = "sh"),
                                      collapse = ', '),") 
and e.Parameters in
           ('Section1',
           'Section2','Section3',
           'Section4');")

res1 <- dbGetQuery(conn,query)

res2<-res1[res1$Parameters=="Section1",4:5]
colnames(res2)[colnames(res2)=="status"] <- "Section1_Status"

上面的代码运行良好，如果我传递了~1000个ID，但是当每次传递10000或更多ID时，它会引发R终止错误。

如何创建一个循环并分批传递ID以获得10000 Id的最后一个输出。

错误信息：

Warning message:
In dbFetch(rs, n = n, ...) : error while fetching rows

dataframe

dplyr

rmysql

回答 4

Stack Overflow用户

回答已采纳

发布于 2020-02-12 16:16:41

# Load Packages
library(dplyr) # only needed to create the initial dataframe
library(RMySQL)

# create the initial dataframe
df <- tribble(
    ~ID,       ~Amount,     ~Date
    , "IK-1"    , 100       , 2020-01-01
    , "IK-2"    , 110       , 2020-01-02
    , "IK-3"    , 120       , 2020-01-03
    , "IK-4"    , 109       , 2020-01-03
    , "IK-5"    , 104       , 2020-01-03
)

# first helper function
createIDBatchVector <- function(x, batchSize){
    paste0(
        "'"
        , sapply(
            split(x, ceiling(seq_along(x) / batchSize))
            , paste
            , collapse = "','"
        )
        , "'"
    )
}

# second helper function
createQueries <- function(IDbatches){
    paste0("
SELECT c.ID,e.Parameters, d.status
FROM Table1 c
    LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
    LEFT OUTER JOIN Table3 e ON e.role_id = d.role
WHERE c.ID IN (", IDbatches,") 
AND e.Parameters in ('Section1','Section2','Section3','Section4');
")
}

# ------------------------------------------------------------------

# and now the actual script

# first we create a vector that contains one batch per element
IDbatches <- createIDBatchVector(df$ID, 2)

# It looks like this:
# [1] "'IK-1','IK-2'" "'IK-3','IK-4'" "'IK-5'" 

# now we create a vector of SQL-queries out of that
queries <- createQueries(IDbatches)

cat(queries) # use cat to show what they look like

# it looks like this:

# SELECT c.ID,e.Parameters, d.status
# FROM Table1 c
#     LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
#     LEFT OUTER JOIN Table3 e ON e.role_id = d.role
# WHERE c.ID IN ('IK-1','IK-2') 
# AND e.Parameters in ('Section1','Section2','Section3','Section4');
#  
# SELECT c.ID,e.Parameters, d.status
# FROM Table1 c
#     LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
#     LEFT OUTER JOIN Table3 e ON e.role_id = d.role
# WHERE c.ID IN ('IK-3','IK-4') 
# AND e.Parameters in ('Section1','Section2','Section3','Section4');
#  
# SELECT c.ID,e.Parameters, d.status
# FROM Table1 c
#     LEFT OUTER JOIN Table2 d ON d.seq_id =c.ID
#     LEFT OUTER JOIN Table3 e ON e.role_id = d.role
# WHERE c.ID IN ('IK-5') 
# AND e.Parameters in ('Section1','Section2','Section3','Section4');

# and now the loop
df_final <- data.frame() # initialize a dataframe

conn <- connection # open a connection
for (query in queries){ # iterate over the queries
    df_final <- rbind(df_final, dbGetQuery(conn,query))
}

# And here the connection should be closed. (I don't know the function call for this.)

票数 2

Stack Overflow用户

发布于 2020-02-07 16:35:51

在SQL查询之前将ID的数据帧传递到临时表中，然后使用它在您正在使用的ID上进行内部连接，这样可以避免循环。您所要做的就是使用dbWriteTable并在调用它时设置参数temporary = TRUE。

例：

library(DBI)
library(RMySQL)
con <- dbConnect(RMySQL::MySQL(), user='user', 
password='password', dbname='database_name', host='host')
#here we write the table into the DB and then declare it as temporary
dbWriteTable(conn = con, value = dataframe, name = "id_frame", temporary = T)
res1 <- dbGetQuery(con = conn, "SELECT c.ID,e.Parameters, d.status
FROM Table1 c
left outer join Table2 d ON d.seq_id=c.ID
LEFT outer JOIN Table3 e ON e.role_id=d.role
Inner join id_frame idf on idf.ID = c.ID 
and e.Parameters in
       ('Section1',
       'Section2','Section3',
       'Section4');")

这将提高代码的性能，并且不再需要使用where语句在R中循环。如果不正常请告诉我。

票数 5

Stack Overflow用户

发布于 2020-02-06 20:54:57

正如@A.Suliman提供的链接所显示的，这很可能是因为in -子句中有大量的值。以下是一些尝试的解决方案：

批处理

我是一个球迷使用模块化的批处理过程。这假设您正在批处理的ID值是数字的：

num_batches = 100
output_list = list()

for(i in 1:num_batches){
    this_subset = filter(dataframe, ID %% num_batches == (i-1))

    # subsequent processing using this_subset

    output_list[i] = results_from_subsetting
}
output = data.table::rbindlist(output_list)

在您的例子中，ID看起来是XX-123形式(两个字符，一个连字符，后面跟着一些数字)。您可以使用：just_number_part = substr(ID, 4, nchar(ID))将其转换为一个数字。

临时文件写入

如果要将dataframe从R写到sql，那么就不需要这么大的IN-子句，而可以使用join。dbplyr包包括一个可用于向数据库写入临时表的函数copy_to。

这看起来应该是：

library(RMySQL)
library(dbplyr)

conn<- connection

copy_to(conn, dataframe, name = "my_table_name") # copy local table to mysql

query<-paste0("SELECT c.ID,e.Parameters, d.status
FROM Table1 c
INNER JOIN my_table_name a ON a.ID = c.ID # replace IN-clause with inner join
left outer join Table2 d ON d.seq_id=c.ID
LEFT outer JOIN Table3 e ON e.role_id=d.role
WHERE e.Parameters in
           ('Section1',
           'Section2','Section3',
           'Section4');")

res1 <- dbGetQuery(conn,query)

作为参考，我推荐tidyverse文档。您可能还会发现这个问题在使用copy_to编写时对调试很有帮助。

增加超时延迟

当一个-子句中有很多值时，执行查询的速度会慢得多，因为-子句本质上被转换为一个OR语句序列。

根据此链接，您可以通过以下方式更改MySQL的超时选项：

编辑您的my.cnf (MySQL配置文件)
添加超时配置并调整其以适应您的服务器。
- wait_timeout = 28800
- interactive_timeout = 28800

重启MySQL

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/60035473

复制

相似问题

问如何在R中批量查询Ids数
EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在R中批量查询Ids数EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在R中批量查询Ids数
EN