
Q: PostgreSQL 11: terminating walsender process due to replication timeout

Database Administration user

Asked 2021-08-20 00:11:20

Answers: 1, Views: 1.6K, Followers: 0, Votes: 4

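I found a few questions about this same error, but none of them answered my question.

The setup: I have two Postgres 11 clusters (A and B) that use publication and subscription to replicate data from A to B.

A (source DB - publication)

This works fine, but when the volume of data inserted into the table on node A increases, the following error often (not always) occurs.

"terminating walsender process due to replication timeout"

Data is currently loaded with the COPY command at roughly 30K rows per second, continuously for several hours.

Earlier, wal_sender_timeout was set to 5 seconds and I saw this error frequently. I then raised it to 1 minute, and the error became less frequent. But I don't want to keep raising it without understanding what causes it. I looked at the code in walsender.c and found that the error comes from here: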

if (wal_sender_timeout > 0 && last_processing >= timeout)
     {
         /*
          * Since typically expiration of replication timeout means
          * communication problem, we don't send the error message to the
          * standby.
          */
         ereport(COMMERROR,
                 (errmsg("terminating walsender process due to replication timeout")));
 
         WalSndShutdown();
     }

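However, it is still unclear to me which parameter makes the sender assume that the receiving node is inactive and that it should therefore stop the walsender.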
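For what it is worth, in the PostgreSQL source the `timeout` in that check appears to be the time of the last reply the walsender processed from the subscriber plus wal_sender_timeout, so the error would mean that no reply was processed within that window. Below is a minimal sketch of a query to watch this from the publisher while the COPY load is running. It assumes the PostgreSQL 11 column names of pg_stat_replication; the alias unconfirmed_bytes is made up for illustration.

```sql
-- Run on the source (publisher). One row per connected walsender;
-- for logical replication, application_name is normally the subscription name.
SELECT pid,
       application_name,
       state,
       sent_lsn,
       flush_lsn,
       -- bytes sent but not yet confirmed as flushed by the subscriber
       -- (illustrative alias, not a built-in column)
       pg_wal_lsn_diff(sent_lsn, flush_lsn) AS unconfirmed_bytes,
       flush_lag
FROM pg_stat_replication;
```

If unconfirmed_bytes and flush_lag keep growing during the load, the subscriber is falling behind on confirming, which would be consistent with the timeout firing only under heavy inserts.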

SourceDB

sourcedb=# show wal_sender_timeout;
 wal_sender_timeout
--------------------
 1min
(1 row)

sourcedb=# select * from pg_replication_slots;
             slot_name              |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn   | confirmed_flush_lsn
------------------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+----------------+---------------------
 sub_target_DB                      | pgoutput | logical   |  16501 | sourcedb | f         | t      |      68229 |      |     98839088 | 116D0/C36886F8 | 116D0/C3E5D370

TargetDB

targetdb=# show wal_receiver_timeout;
 wal_receiver_timeout
----------------------
 1min
(1 row)


targetdb=# show wal_retrieve_retry_interval ;
 wal_retrieve_retry_interval
-----------------------------
 5s
(1 row)

targetdb=# show wal_receiver_status_interval;
 wal_receiver_status_interval
------------------------------
 2s
(1 row)

targetdb=# select * from pg_stat_subscription;
   subid    |              subname               |  pid  | relid |  received_lsn  |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
------------+------------------------------------+-------+-------+----------------+-------------------------------+-------------------------------+----------------+-------------------------------
 2378695757 | sub_target_DB                      | 62371 |       | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 | 2021-08-20 09:05:15.398471+09 | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09

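Edit 1: Is there any downside to keeping wal_sender_timeout or wal_receiver_timeout at a higher value? I understand that if an actual failure occurs, WAL segments will keep piling up in the sender's pg_wal directory. But is there a safe limit?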
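As far as I can tell, how much WAL is kept is driven by the replication slot's restart_lsn rather than by either timeout, and PostgreSQL 11 has no setting that caps it (max_slot_wal_keep_size only arrived in PostgreSQL 13). Below is a rough sketch of a query to track retained WAL per slot on the sender; the alias retained_wal is only illustrative.

```sql
-- Run on the source (publisher).
-- Shows how much WAL each replication slot is currently holding back.
SELECT slot_name,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal  -- illustrative alias
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

Comparing retained_wal against the free space on the volume that holds pg_wal gives a practical limit for a given installation.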

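Edit 2: After increasing wal_sender_timeout to 5 minutes, the error started appearing more often. Not only that, it even killed the active subscription and stopped data replication. I had to restart it. So clearly, just increasing wal_sender_timeout does not help.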

postgresql

replication

postgresql-11

write-ahead-logging

1 Answer

Database Administration user

Answered 2022-11-14 22:07:01

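I know this is an old question, but I was able to solve this problem by increasing wal_sender_timeout.

If you get more errors, you may need to state the time unit explicitly in the parameter value (for example min or s).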
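On units: a bare number for wal_sender_timeout is read as milliseconds, and PostgreSQL spells minutes as min (the accepted time units include ms, s, min, h and d). Below is a minimal sketch of setting the value with an explicit unit. It assumes superuser access on the publisher and uses 5min purely as an example value, not a recommendation.

```sql
-- Run on the source (publisher) as a superuser.
-- '5min' is an example value; a bare 5 would mean 5 milliseconds.
ALTER SYSTEM SET wal_sender_timeout = '5min';

-- In PostgreSQL 11 this parameter takes effect on a configuration reload.
SELECT pg_reload_conf();

-- Check the active value; the reload may take a moment to be picked up.
SHOW wal_sender_timeout;
```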

Votes: -1

Original page content provided by Database Administration.

Original link:

https://dba.stackexchange.com/questions/298288
