Timeout circuit breaking

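When a proxied service is overloaded or misbehaving and responds too slowly, large numbers of requests can pile up at the gateway. If the gateway gets no response for too long, it should stop waiting promptly, degrade, trip the circuit breaker and return a fallback response. The classic filter for degradation and circuit breaking is Hystrix, but Hystrix is now in maintenance mode and Spring Cloud Gateway will drop support for it in the future. Resilience4j is recommended in its place, and a starter is provided for it. The following implements a timeout circuit breaker based on Resilience4j: if the gateway gets no response within 5 s, it degrades, trips the circuit and directly returns a timeout message.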
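The behaviour described above (wait at most a fixed time, then give up and return a fallback) can be illustrated with plain JDK code. The sketch below is only an analogy built on CompletableFuture.completeOnTimeout (Java 9+); it is not how Spring Cloud Gateway or Resilience4j implement it, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Plain-JDK analogy of "wait at most N ms, otherwise return a fallback".
// Hypothetical helper; not Spring Cloud Gateway or Resilience4j code.
final class TimeoutFallbackSketch {

    // Simulates an upstream call that takes upstreamDelayMillis and returns its result,
    // or returns the fallback value if no result arrives within timeoutMillis.
    static String callWithTimeout(long upstreamDelayMillis, long timeoutMillis, String fallback) {
        CompletableFuture<String> upstream = CompletableFuture.supplyAsync(() -> {
            sleep(upstreamDelayMillis); // stand-in for a slow proxied service
            return "upstream response";
        });
        // completeOnTimeout completes the future with the fallback if it is still pending.
        // Note: the simulated task itself keeps running in the background; it is not interrupted.
        return upstream
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .join();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A fast upstream returns its own response, while one slower than the timeout yields the fallback, which mirrors the "timeout !" reply configured below.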

First, add the Resilience4j dependency:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
</dependency>

Then define a custom timeout policy:

package top.wteng.gatewayserver.filter;

import io.github.resilience4j.timelimiter.TimeLimiterConfig;
import org.springframework.cloud.circuitbreaker.resilience4j.ReactiveResilience4JCircuitBreakerFactory;
import org.springframework.cloud.client.circuitbreaker.Customizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
public class CircuitBreakerFilterFactory {
    private static final int TIMEOUT_SECONDS = 5;

    @Bean
    public Customizer<ReactiveResilience4JCircuitBreakerFactory> timeoutFuse() {
        return factory -> {
            // Timeout configuration
            TimeLimiterConfig timeLimiterConfig = TimeLimiterConfig
                    .custom()
                    .timeoutDuration(Duration.ofSeconds(TIMEOUT_SECONDS)) // time out after 5 seconds
                    .cancelRunningFuture(true) // cancel the pending call once the timeout fires
                    .build();
            // "timeoutFuse" is the circuit breaker id; routes refer to it by this name
            factory.configure(builder -> builder.timeLimiterConfig(timeLimiterConfig), "timeoutFuse");
        };
    }
}

Next, define an endpoint to fall back to when the timeout trips the circuit breaker:

package top.wteng.gatewayserver.controller;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("fallback")
public class FallbackController {

    @RequestMapping("timeoutFuse")
    public String timeoutFuse() {
        return "timeout !";
    }
}

To use it, simply configure the route in the configuration file:

spring:
  application:
    name: gateway-server
  cloud:
    gateway:
      enabled: true
      routes:
      - id: user-rest
        uri: lb://user-rest
        predicates:
        - Path=/user-rest/**
        filters:
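        - StripPrefix=1 # strip the first path segment (/user-rest)
        - name: CircuitBreaker # circuit breaker configuration
          args:
            name: timeoutFuse # circuit breaker name, i.e. the id configured above
            fallbackUri: forward:/fallback/timeoutFuse # route to fall back to when the breaker trips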

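With this in place, when the user-rest service proxied by the gateway does not respond within 5 s, the gateway stops waiting and falls back by forwarding the request to /fallback/timeoutFuse, which returns the response.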

Retry mechanism

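Spring Cloud Gateway has a built-in retry filter named Retry. When the proxied service returns an error (such as a 5xx), the request can be retried. No extra dependency is needed; just configure it in the configuration file: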

server:
  port: 18000
spring:
  application:
    name: gateway-server
  cloud:
    gateway:
      enabled: true
#      discovery:
#        locator:
#          enabled: true
#          lower-case-service-id: true
      routes:
      - id: user-rest
        uri: lb://user-rest
        predicates:
        - Path=/user-rest/**
        filters:
        - StripPrefix=1 # strip the first path segment (/user-rest)
        - name: Retry # Retry filter configuration
          args:
            retries: 3
            statuses: BAD_GATEWAY,GATEWAY_TIMEOUT
            methods: GET,PUT
            exceptions: java.io.IOException
            backoff:
              firstBackoff: 20ms
              maxBackoff: 200ms
              factor: 2
              basedOnPreviousValue: false

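The fields mean the following:

retries: the number of retries when an error occurs
statuses: the HTTP status codes that should be retried. Separately, the filter's series parameter defaults to the 5XX series, so 5xx responses may be retried even when they are not listed here.
methods: the HTTP methods that should be retried. Note that requests with a body are never retried, for example POST or PUT requests that carry a body.
exceptions: the exceptions that should trigger a retry
backoff: the exponential backoff configuration. With exponential backoff, retries are not spaced at a fixed interval; instead, the delay between retries grows exponentially according to a policy. If this item is omitted, retries happen immediately without waiting. Its fields are:
- firstBackoff: the delay before the first retry
- maxBackoff: the maximum delay. Because the delay grows exponentially with the number of retries, it can become very large after many retries, leaving the caller waiting for a long time. maxBackoff caps the delay so that no retry waits longer than this value.
- factor: the exponential factor (the multiplier applied at each step)
- basedOnPreviousValue: whether the delay for the current retry is computed from the previous retry's delay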
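To make the interplay of retries, statuses and methods concrete, here is a hypothetical decision helper. It is a simplified sketch, not the filter's actual code: it models only those three fields plus the "requests with a body are never retried" rule, and ignores exceptions, series and backoff.

```java
import java.util.Set;

// Hypothetical, simplified model of the retry decision; not Spring Cloud Gateway's implementation.
final class RetryPolicySketch {
    private final int retries;           // maximum number of retries
    private final Set<Integer> statuses; // status codes that may be retried, e.g. 502, 504
    private final Set<String> methods;   // HTTP methods that may be retried, e.g. GET, PUT

    RetryPolicySketch(int retries, Set<Integer> statuses, Set<String> methods) {
        this.retries = retries;
        this.statuses = statuses;
        this.methods = methods;
    }

    // retriesSoFar: how many retries have already been made for this request
    boolean shouldRetry(String method, int status, boolean hasBody, int retriesSoFar) {
        if (retriesSoFar >= retries) {
            return false; // retry budget used up
        }
        if (hasBody) {
            return false; // requests carrying a body are never retried
        }
        if (!methods.contains(method)) {
            return false; // method not listed under "methods"
        }
        return statuses.contains(status); // only the listed status codes trigger a retry
    }
}
```

With the values from the configuration above (retries 3, statuses 502/504, methods GET/PUT), a bodyless GET that gets a 502 is retried up to three times, while a POST, or a PUT with a body, is not.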

When basedOnPreviousValue is false, the wait before a retry is computed as:

interval = min(maxBackoff, firstBackoff * (factor ^ n))

where n is the iteration index, i.e. the retry number minus 1.
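The formula can be checked with a few lines of Java. This is a sketch of the arithmetic only, with hypothetical names; it computes the power by repeated multiplication and stops early once maxBackoff is reached, which avoids floating point and overflow.

```java
// Sketch of interval = min(maxBackoff, firstBackoff * factor^n), with n = retryNumber - 1.
final class BackoffSketch {
    static long intervalMillis(long firstBackoff, long maxBackoff, int factor, int retryNumber) {
        int n = retryNumber - 1;
        long interval = firstBackoff;
        // Multiply n times, but stop as soon as the cap is reached (the result is capped anyway).
        for (int i = 0; i < n && interval < maxBackoff; i++) {
            interval *= factor;
        }
        return Math.min(maxBackoff, interval);
    }
}
```

For the configuration in this article it gives 20, 40 and 80 ms for the three retries; a fifth retry, if it were allowed, would be capped at 200 ms.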

When basedOnPreviousValue is true, the wait before a retry is computed as:

interval = prevBackoff * factor # prevBackoff is the delay used before the previous retry
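A sketch of the previous-value mode follows. Two details are assumptions rather than something stated above: the first retry is taken to use firstBackoff (there is no previous delay yet), and each result is assumed to be capped at maxBackoff as in the other mode. Names are hypothetical.

```java
// Sketch of interval = prevBackoff * factor.
// Assumptions: the first retry uses firstBackoff; every interval is capped at maxBackoff.
final class PreviousValueBackoffSketch {
    static long[] intervals(long firstBackoff, long maxBackoff, int factor, int retries) {
        long[] result = new long[retries];
        long prev = 0;
        for (int k = 0; k < retries; k++) {
            long next = (k == 0) ? firstBackoff : prev * factor; // derive from the previous delay
            next = Math.min(maxBackoff, next);                   // assumed cap
            result[k] = next;
            prev = next;
        }
        return result;
    }
}
```

Under these assumptions both modes produce the same 20/40/80 ms schedule for this configuration; they differ only in whether each delay is derived from firstBackoff or from the previous delay.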

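With the configuration above, a retryable failure is retried at most three times. The first retry should be delayed by firstBackoff, i.e. 20 ms; the second by 20 * (2 ^ 1) = 40 ms; and the third by 20 * (2 ^ 2) = 80 ms. To test this, write an endpoint in the downstream service that always responds with 502 (BAD_GATEWAY):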

package top.wteng.userrest.controller;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseStatus;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("user")
public class UserController {
    private final Logger logger = LoggerFactory.getLogger(UserController.class);

    // Always responds with 502 BAD_GATEWAY, a status listed under "statuses", so the gateway retries it
    @GetMapping("bad_gateway")
    @ResponseStatus(HttpStatus.BAD_GATEWAY)
    public String shouldBadGateway() {
        logger.info("<-- GET: bad_gateway ...");
        return "bad gateway";
    }
}

Requesting this endpoint through the gateway produces the following output:

[Figure: gateway retry log output]

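Three retries occurred, so together with the initial request there were four requests in total. The delay before the first retry was 761 - 724 = 37 ms, before the second 808 - 761 = 47 ms, and before the third 903 - 808 = 95 ms. These are consistent with the 20 ms, 40 ms and 80 ms computed earlier; the extra 7 to 17 ms in each gap is processing and network latency.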
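The arithmetic above can be reproduced from the four request timestamps (the millisecond values 724, 761, 808 and 903 read from the log). The helper names are hypothetical.

```java
// Reproduces the gap and overhead arithmetic from the retry log.
final class RetryLogCheck {
    // Differences between consecutive timestamps, in milliseconds.
    static long[] gaps(long... timestamps) {
        long[] result = new long[timestamps.length - 1];
        for (int i = 1; i < timestamps.length; i++) {
            result[i - 1] = timestamps[i] - timestamps[i - 1];
        }
        return result;
    }

    // Observed gap minus the expected backoff for each retry.
    static long[] overhead(long[] observed, long[] expected) {
        long[] result = new long[observed.length];
        for (int i = 0; i < observed.length; i++) {
            result[i] = observed[i] - expected[i];
        }
        return result;
    }
}
```

Calling gaps(724, 761, 808, 903) yields 37, 47 and 95 ms, and subtracting the expected 20, 40 and 80 ms leaves 17, 7 and 15 ms of overhead.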

Facing the wind, pressing ahead

Spring Cloud Gateway: timeout circuit breaking and retrying failed requests


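Unless otherwise noted as reposted, all articles are original works by Tensoar, licensed under the CC BY-SA 4.0 International License.
Please credit the original source when reposting!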
