
Troubleshooting Spring Cloud Gateway's "readAddress(..) failed: Connection reset by peer" error (with Arthas)

2023-11-08 00:44 | Source: compiled from the web

Problem

    [id:48b8a8f5-1, L:/<gateway>:37187 - R:<app>/<app>:<app-port>] The connection observed an error, the request cannot be retried as the headers/body were sent
    io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
    [2021-12-28T11:31:06,329] DEBUG Stopping retries since predicate returned false, retry context: iteration=1 exception=io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer backoff={0ms}
    [2021-12-28T11:31:06,330] ERROR ==>返回错误信息
    io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
        Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
    Error has been observed at the following site(s):
        |_ checkpoint ⇢ org.springframework.web.cors.reactive.CorsWebFilter [DefaultWebFilterChain]
        ...
        |_ checkpoint ⇢ HTTP POST "<endpoint>" [ExceptionHandlingWebHandler]
    Stack trace:

(In the ERROR line, "返回错误信息" is the application's own log message, "returning error information".)

Analysis

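At first glance this is an I/O exception raised by Netty, which reactor-netty propagates up to the gateway, and the gateway returns it as an error response. reactor-netty is the underlying dependency of Spring Cloud Gateway (SCG), and its repository at https://github.com/reactor/reactor-netty has issues reporting the same problem. In summary: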

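- The backend application's Tomcat has a keepAliveTimeout, which by default takes the same value as connectionTimeout (20s here). A connection that stays idle for 20s is therefore closed and discarded. This mainly concerns keep-alive (persistent) connections.
- The SCG gateway keeps a connection pool holding the channels to the backend Tomcat. By default idle connections are never released, and connections are leased in FIFO order, so the oldest connection is handed out first.

If at least one of the two is misconfigured, so that the gateway pool keeps idle connections longer than Tomcat takes to close them, the following happens: a browser sends a request to the gateway, the gateway leases a connection that Tomcat has already closed and forwards the request over it, Tomcat sees traffic on a discarded connection and answers with a reset, and the gateway throws the exception. Subsequent requests are not affected.

The figure below gives a rough illustration.

[Figure: image missing]

Solution

① Configure the SCG gateway

    spring:
      cloud:
        gateway:
          httpclient:
            response-timeout: 10s
            pool:
              type: fixed
              max-idle-time: 5000
              max-connections: 200
              acquire-timeout: 45000

② Add a gateway startup parameter so that reactor-netty leases connections in LIFO order

    -Dreactor.netty.pool.leasingStrategy=lifo

③ Configure Tomcat's idle disconnect time

    server:
      tomcat:
        connection-timeout: 10000 # adjust as needed

④ Configure the Nacos metadata

    spring:
      cloud:
        nacos:
          discovery:
            metadata:
              response-timeout: 10000
              connect-timeout: 3000

⑤ Align the nginx keepalive configuration

    upstream gateway {
        # ip_hash;
        server x.x.x.x:xx weight=1;
        server x.x.x.x:xx weight=1;
        keepalive 100;
        keepalive_requests 30000;
        keepalive_timeout 300s;
    }
    location /xxx {
        proxy_pass http://gateway;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

Testing with Arthas (misc)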
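The startup parameter from step ② is an ordinary JVM system property, so before attaching Arthas it can be read back from inside the gateway process. The sketch below is a plain JDK helper, not part of reactor-netty; the class and method names are hypothetical, and the fallback to "fifo" assumes reactor-netty's default leasing strategy. It only reports what was passed to the JVM, whereas the vmtool command below inspects what the live connection pool actually uses.

```java
import java.util.Locale;

class LeasingStrategyCheck {
    // System property that step ② sets on the gateway's command line.
    static final String PROPERTY = "reactor.netty.pool.leasingStrategy";

    // Assumption: reactor-netty falls back to FIFO when the property is absent.
    static final String ASSUMED_DEFAULT = "fifo";

    /** Returns the leasing strategy requested via the system property, in lower case. */
    static String configuredLeasingStrategy() {
        return System.getProperty(PROPERTY, ASSUMED_DEFAULT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + configuredLeasingStrategy());
    }
}
```

Running it with `java -Dreactor.netty.pool.leasingStrategy=lifo LeasingStrategyCheck.java` should report `lifo`; the same call could be placed in a startup log statement of the gateway.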

Use Arthas: https://arthas.aliyun.com/doc/index.html

    curl -O https://arthas.aliyun.com/arthas-boot.jar
    java -jar arthas-boot.jar

Attach to the gateway process, then run:

    vmtool --action getInstances --classLoaderClass org.springframework.boot.loader.LaunchedURLClassLoader --className reactor.netty.resources.PooledConnectionProvider --express 'instances[0].defaultPoolFactory.leasingStrategy'

If it returns

    @String[lifo]

the change has taken effect.
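Why LIFO leasing and a short pool idle time help can be seen in a toy model of the analysis above. This is a plain JDK simulation, not reactor-netty code: the `Pool` class, the `scenario` method and the timestamps are hypothetical, and the 20s server-side timeout is the figure from the analysis. Two connections are returned to the pool at 0s and 15s, and one is leased at 25s.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StaleConnectionDemo {
    // Assumed server-side idle timeout (the 20s Tomcat figure from the analysis).
    static final long SERVER_KEEP_ALIVE_MS = 20_000;

    /** Toy pool: stores only the time at which each idle connection was released. */
    static class Pool {
        final Deque<Long> idleSince = new ArrayDeque<>();
        final boolean lifo;
        final long maxIdleMs; // <= 0 means idle connections are never evicted

        Pool(boolean lifo, long maxIdleMs) {
            this.lifo = lifo;
            this.maxIdleMs = maxIdleMs;
        }

        void release(long now) {
            idleSince.addLast(now);
        }

        /** Leases a connection and reports whether the server still has it open. */
        boolean leaseIsUsable(long now) {
            if (maxIdleMs > 0) {
                // Drop connections that sat idle for too long on the client side.
                idleSince.removeIf(t -> now - t >= maxIdleMs);
            }
            if (idleSince.isEmpty()) {
                return true; // nothing pooled, so a fresh connection is opened
            }
            // FIFO takes the longest-idle connection, LIFO the most recently used one.
            long lastUsed = lifo ? idleSince.pollLast() : idleSince.pollFirst();
            return now - lastUsed < SERVER_KEEP_ALIVE_MS;
        }
    }

    static boolean scenario(boolean lifo, long maxIdleMs) {
        Pool pool = new Pool(lifo, maxIdleMs);
        pool.release(0);
        pool.release(15_000);
        return pool.leaseIsUsable(25_000);
    }

    public static void main(String[] args) {
        System.out.println("FIFO, no eviction: usable=" + scenario(false, 0));
        System.out.println("LIFO, no eviction: usable=" + scenario(true, 0));
        System.out.println("FIFO, 5s max idle: usable=" + scenario(false, 5_000));
    }
}
```

In this model only the FIFO pool without eviction hands out a connection the server has already closed. LIFO alone still fails when every pooled connection has been idle longer than the server timeout, which is presumably why the fix also sets max-idle-time below Tomcat's timeout.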
