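Service Monitoring and Health Checks with Spring Boot

I. Preface

Prometheus is a new-generation real-time monitoring and alerting system built around multi-dimensional metrics. It is modeled on Google's internal monitoring system and is designed to meet the application monitoring needs of large projects.
Actuator is the introspection and monitoring feature built into Spring Boot. It exposes detailed information about an application's configuration, such as the auto-configuration report, the Spring beans that were created, and environment properties.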

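II. Technology Stack Overview

1. Prometheus

Prometheus integrates seamlessly with the Grafana dashboard system, which provides a powerful, configurable visualization layer and can accommodate a wide range of custom monitoring needs.

2. Actuator

Actuator is a special dependency module among Spring Boot's starter POMs and is very easy to integrate into a Spring Boot project. It automatically builds a set of monitoring endpoints for the application, and other Spring Cloud components extend those endpoints when they are present (Eureka, for example, adds its own indicators to the /health endpoint). It is therefore recommended for any Spring Boot project. The monitoring information is returned as JSON.

III. Practice in the Project

1. Integrating Prometheus into a Spring Boot project

1). Add the Maven dependencies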

<!-- Health monitoring: Prometheus -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_servlet</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_spring_boot</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.6.0</version>
</dependency>

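2). Enable the default endpoint on the startup class

Adding the @EnablePrometheusEndpoint annotation to the startup class of a Spring Boot project enables the endpoint.
Calling DefaultExports.initialize(); in the main method of the startup class enables JVM metrics.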

@EnablePrometheusEndpoint
@ServletComponentScan
@SpringBootApplication
public class Application extends SpringBootServletInitializer {

    public static void main(String[] args) {
        DefaultExports.initialize(); // enable JVM metrics
        SpringApplication.run(Application.class, args);
    }

}
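
To give a feel for the kind of data JVM monitoring covers, here is a minimal sketch that reads heap usage and the live thread count from the JDK's standard java.lang.management MXBeans. It uses only the JDK. The class name JvmSnapshot is made up for illustration, and this is not a description of how DefaultExports is implemented, only an example of the same sort of values.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative only: reads a few JVM values straight from the platform MXBeans.
class JvmSnapshot {

    /** Bytes of heap memory currently in use. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Number of live threads, daemon and non-daemon. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("live threads: " + liveThreads());
    }
}
```

Values like these change on every read, which is why they are exported as gauges rather than counters.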

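3). Add a request-duration histogram metric for core services

It is recommended to implement this in the gateway, since every service is reached there through its route path. It is done by registering an additional metrics filter.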

/**
 * Registers the Prometheus MetricsFilter used by the monitoring module.
 *
 * @author ysy
 * @return the filter registration bean
 * @date 2019-04-23
 */
@Bean
FilterRegistrationBean regisPrometheusMetricsFilter() {
    Filter prometheusMetricsFilter = new MetricsFilter("http_request_duration_seconds",
            "Prometheus MetricsFilter http request duration(seconds)",
            5,     // pathComponents level
            null); // null buckets: use the library defaults
    FilterRegistrationBean prometheusFilterReg = new FilterRegistrationBean();
    prometheusFilterReg.setFilter(prometheusMetricsFilter);
    prometheusFilterReg.setName("prometheusMetricsFilter");
    prometheusFilterReg.addUrlPatterns("/*");
    prometheusFilterReg.setOrder(Ordered.HIGHEST_PRECEDENCE);
    return prometheusFilterReg;
}
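
The pathComponents argument limits how many leading segments of the request path end up in the metric's path label, which keeps the number of distinct label values manageable. The sketch below illustrates that idea with a hypothetical helper, PathLabel.truncate. It is not the code MetricsFilter uses internally, and the sample path is invented.

```java
// Hypothetical helper, not MetricsFilter's own implementation:
// keep at most maxComponents leading path segments.
class PathLabel {

    static String truncate(String path, int maxComponents) {
        if (path == null || path.isEmpty() || maxComponents < 1) {
            return path;
        }
        StringBuilder out = new StringBuilder();
        int kept = 0;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the leading slash and any doubled slashes
            }
            if (kept == maxComponents) {
                break;
            }
            out.append('/').append(segment);
            kept++;
        }
        return out.length() == 0 ? "/" : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(truncate("/gateway/dyportalserver/api/v1/docs/123/detail", 5));
    }
}
```

With a depth of 5, a path carrying a document id in the sixth segment collapses to one label value instead of one per document.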

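4). Basic service health status monitoring [custom]

This feature uses a background thread to update a custom health status metric. The portal team has already wrapped the metric update logic in the utility class MyServiceStatus, and the portal plans to provide a spring-task job that refreshes the metric in real time. Each service is advised to set its own health status to healthy on startup. There are three health states: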

  • HEALTHY:1
  • ERROR:-1
  • WARNING:0

i. The health status utility class MyServiceStatus

A utility class for health status operations, built on Gauge:

/**
 * Status monitoring for Prometheus.
 *
 * @author ysy
 */
public class MyServiceStatus {
    private static final String ZHIXIN_APPCODE = "ZHIXIN";
    private static final String ZHIXIN_APPCODE_ALL = "ZHIXIN_ALL";
    private static final String PORTAL_APPCODE = "dyportalserver";
    private static final String RELEASE_APPCODE = "dyreleaseserver";
    private static final String WECAHT_APPCODE = "dywechatserver";
    private static final String CMS_APPCODE = "dywebpageserver";
    private static final String MICROBLOG_APPCODE = "dymicroblogserver";
    private static final String CFB_APPCODE = "dycfbserver";
    private static final String COMMAND_APPCODE = "newscommand";
    private static final String TV_APPCODE = "dynewsserver";

    // Custom Gauge, labelled by module, submodule and description
    private static final Gauge status = Gauge.build()
            .name("service_status")
            .help("service status")
            .labelNames("module", "submodule", "description").register();

    /** Health status constants */
    public static final double HEALTHY = 1;
    public static final double ERROR = -1;
    public static final double WARNING = 0;

    /**
     * Sets the monitoring status of a module. Safe to call concurrently from multiple threads.
     *
     * @param subModuleName module name; at least one module must be supported, and its
     *                      name matches the service name, e.g. "creapi"
     * @param healthStatus  module health status: 1 = healthy, -1 = error, 0 = warning
     */
    public static void setHealthStatus(String subModuleName, double healthStatus) {
        if (PORTAL_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "Editorial portal").set(healthStatus);
        }
        if (RELEASE_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "Press release manuscripts").set(healthStatus);
        }
        if (WECAHT_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "WeChat manuscripts").set(healthStatus);
        }
        if (CMS_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "Web page manuscripts").set(healthStatus);
        }
        if (MICROBLOG_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "Microblog manuscripts").set(healthStatus);
        }
        if (CFB_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "File library").set(healthStatus);
        }
        if (COMMAND_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "Command and dispatch").set(healthStatus);
        }
        if (TV_APPCODE.equals(subModuleName)) {
            status.labels(ZHIXIN_APPCODE, subModuleName, "TV manuscripts").set(healthStatus);
        }
    }

    /**
     * Sets the overall health status of the ZHIXIN services.
     *
     * @author ysy
     * @date 2019-05-14
     */
    public static void setSummaryHealthStatus(double healthStatus) {
        status.labels(ZHIXIN_APPCODE, ZHIXIN_APPCODE_ALL, "Editorial platform").set(healthStatus);
    }

    /**
     * Returns whether the service at the given address is healthy.
     *
     * @author ysy
     * @return true if the service reports itself as active
     * @throws Exception if the address is empty or the request fails
     * @date 2019-05-13
     */
    public static boolean monitorInspect(String serviceUrl) throws Exception {
        return checkActive(serviceUrl);
    }

    private static boolean checkActive(String cmeditAddress) throws Exception {
        // 1. Validate the argument
        if (DyStringUtils.isEmpty(cmeditAddress)) {
            throw new Exception("Service address is empty!");
        }
        // 2. Build the full URL of the check interface
        String url = DyStringUtils.combineUrl(cmeditAddress, Constants.CHECK_SERVICE_INTERFACE);
        // 3.1 Build the request headers
        Map<String, String> postHeader = new HashMap<String, String>();
        // 3.2 Build the request body
        NameValuePair[] parametersBody = new NameValuePair[1];
        parametersBody[0] = new NameValuePair("count", "11");
        // 4. Call the interface
        JSONObject paramJson = new JSONObject();
        String result = HttpClientUtil.HttpClientPost(url, paramJson.toString(), parametersBody, postHeader);
        // 5. Parse the response
        if (DyStringUtils.isNotEmpty(result) && DyStringUtils.isJson(result)) {
            JSONObject json = JSONObject.fromObject(result);
            return json.getBoolean("status");
        }
        return false;
    }
}

ii. Scheduled task that monitors submodule health

It is recommended to implement this scheduled task in the gateway.

/**
 * Task that refreshes the application status metrics.
 *
 * @author ysy
 */
@Conditional(value = { PrometheusCondition.class }) // on/off switch implemented as a condition class
@Component("prometheusStatusCheckEngineer")
public class PrometheusStatusCheckEngineer implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(PrometheusStatusCheckEngineer.class);

    private static final String PORTAL_APPCODE = "portalserver";
    private static final String RELEASE_APPCODE = "releaseserver";
    private static final String WECAHT_APPCODE = "wechatserver";
    private static final String CMS_APPCODE = "webpageserver";
    private static final String MICROBLOG_APPCODE = "microblogserver";
    private static final String CFB_APPCODE = "cfbserver";
    private static final String COMMAND_APPCODE = "commandserver";
    private static final String TV_APPCODE = "newsserver";
    private static final String[] APP_LIST = {PORTAL_APPCODE,
            RELEASE_APPCODE, WECAHT_APPCODE, CMS_APPCODE,
            MICROBLOG_APPCODE, TV_APPCODE, CFB_APPCODE, COMMAND_APPCODE};

    @Autowired
    private ZuulProperties zuulProperties;

    @Scheduled(cron = "0/30 * * * * ?")
    @Override
    public void run() {
        logger.info("---------- status update task started -------");
        int healthyCount = APP_LIST.length;
        boolean portalStatus = false;
        for (String appContext : APP_LIST) {
            // Route ids carry no "dy" prefix; MyServiceStatus expects the full app code.
            // Resolved before the try block so the catch branch reports under the same code.
            String appCode = COMMAND_APPCODE.equals(appContext) ? "newscommand" : "dy" + appContext;
            try {
                Map<String, ZuulRoute> routes = zuulProperties.getRoutes();
                ZuulRoute zuulRoute = routes.get(appContext);
                String url = zuulRoute.getUrl();
                logger.info("url: [" + url + "]");
                if (MyServiceStatus.monitorInspect(url)) {
                    MyServiceStatus.setHealthStatus(appCode, MyServiceStatus.HEALTHY);
                    if ("dyportalserver".equals(appCode)) {
                        portalStatus = true;
                    }
                } else {
                    MyServiceStatus.setHealthStatus(appCode, MyServiceStatus.WARNING);
                    healthyCount--;
                }
            } catch (Exception e) {
                // A failed check also counts against the summary
                logger.error("Health check failed for [" + appCode + "]", e);
                MyServiceStatus.setHealthStatus(appCode, MyServiceStatus.ERROR);
                healthyCount--;
            }
        }
        // The summary is set once, after every service has been checked
        if (healthyCount == 0 || !portalStatus) {
            MyServiceStatus.setSummaryHealthStatus(MyServiceStatus.ERROR);
        } else if (healthyCount == APP_LIST.length) {
            MyServiceStatus.setSummaryHealthStatus(MyServiceStatus.HEALTHY);
        } else {
            MyServiceStatus.setSummaryHealthStatus(MyServiceStatus.WARNING);
        }
    }
}
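
The rule the task applies to the overall status (ERROR when the portal or every service is down, HEALTHY when all are up, WARNING otherwise) is easier to test when it is pulled out into a pure function. The following is a minimal sketch of that idea using only the JDK. The class SummaryRule and its method are hypothetical and do not appear in the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical extraction of the summary rule; values mirror MyServiceStatus.
class SummaryRule {
    static final double HEALTHY = 1;
    static final double WARNING = 0;
    static final double ERROR = -1;

    /**
     * @param alive           per-service check results (true = check passed)
     * @param criticalService service whose failure alone makes the summary ERROR
     */
    static double summarize(Map<String, Boolean> alive, String criticalService) {
        long up = alive.values().stream().filter(Boolean::booleanValue).count();
        boolean criticalUp = Boolean.TRUE.equals(alive.get(criticalService));
        if (up == 0 || !criticalUp) {
            return ERROR;
        }
        return up == alive.size() ? HEALTHY : WARNING;
    }

    public static void main(String[] args) {
        Map<String, Boolean> alive = new LinkedHashMap<>();
        alive.put("dyportalserver", true);
        alive.put("dywechatserver", false);
        System.out.println(summarize(alive, "dyportalserver"));
    }
}
```

Keeping the rule free of Spring and Prometheus types means it can be covered by a plain unit test without starting the gateway.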

5). Access the URL to fetch the monitoring data

Requesting http://ip:port/<context path>/prometheus returns the metrics in the Prometheus format.
For example: http://10.10.0.120:9010/gateway/prometheus
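
For readers who have not seen the Prometheus text format, each sample is a single line: the metric name, an optional set of labels in braces, and the value. The sketch below builds such a line for the service_status gauge by hand. ExpositionLine is a hypothetical class written for illustration; in the real application the client library produces this output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustration of the text format only; the client library does this for you.
class ExpositionLine {

    static String gaugeSample(String name, Map<String, String> labels, double value) {
        StringJoiner joined = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            joined.add(e.getKey() + "=\"" + escape(e.getValue()) + "\"");
        }
        return name + (labels.isEmpty() ? "" : joined.toString()) + " " + value;
    }

    // Label values escape backslash, double quote and newline.
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("module", "ZHIXIN");
        labels.put("submodule", "dyportalserver");
        System.out.println(gaugeSample("service_status", labels, 1.0));
    }
}
```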

2. Integrating Actuator into a Spring Boot project

1). Add the Maven dependency

<!-- Health check -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

2). Configure the application file

  • Configuration for Spring Boot 1.x
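    ######################## Health check configuration   ########################
    # Disable security authentication
    management.security.enabled=false
    # Change the access path: the default is / before 2.0 and /actuator in 2.0; this property overrides it
    management.context-path=/monitor
  • Configuration for Spring Boot 2.x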
    ######################## Health check configuration   ########################
    # Disable security authentication (this property no longer exists in 2.x)
    #management.security.enabled=false
    spring.application.name=gateway
    management.endpoints.web.exposure.include=*
    management.metrics.tags.application=${spring.application.name}
    management.metrics.export.atlas.enabled=true
    management.metrics.export.prometheus.enabled=true
    management.endpoints.web.base-path=/actuator
    management.endpoint.health.show-details=always

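3). Custom monitoring data

Adding entries to application-*.properties exposes the relevant configuration values for display, which gives a simple way to spot misconfiguration in production. The entries use the prefix info.

  • Project metadata read from the POM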

    info.build.name=@project.name@
    info.build.description=@project.description@
    info.build.version=@project.version@
    info.build.groupId=@project.groupId@
    info.build.artifactId=@project.artifactId@
    info.build.spring-boot-version=@project.parent.version@
    info.build.java-version=@java.version@
    info.build.user-home=@user.home@
    info.build.source-encoding=@project.build.sourceEncoding@
    info.build.output-encoding=@project.reporting.outputEncoding@
    info.build.buidTimestamp=@maven.build.timestamp@
  • Entries that reference other values in the properties file

    info.config.description=Basic configuration
    info.config.version=${version}
    info.config.profilesActive=${spring.profiles.active}
    info.config.casServerUrlPrefix=${casServerUrlPrefix}
    info.config.enablePrometheus=${gateway.prometheus}
    info.filter.api.description=API encryption filter status
    info.filter.api.enable=${dyportalserver.enableApiFilter}
    info.filter.api.filterPattern=${dyportalserver.apiFilterPattern}
    info.filter.api.ignorePattern=${dyportalserver.apiIgnorePattern}
    info.filter.cas.description=CAS filter status
    info.filter.cas.enable=${dyportalserver.enableCasFilter}
    info.filter.cas.filterPattern=${dyportalserver.casFilterPattern}
    info.filter.cas.ignorePattern=${dyportalserver.casIgnorePattern}
    info.mq.description=Message middleware configuration
    info.mq.enable=${gateway.rabbitmq}
    info.mq.virtual-host=${spring.rabbitmq.virtual-host}
    info.mq.addresses=${spring.rabbitmq.addresses}
    info.mq.queueTv=${spring.rabbitmq.queueTv}
    info.mq.queueRelease=${spring.rabbitmq.queueRelease}
    info.mq.queueMicroblo=${spring.rabbitmq.queueMicroblo}
    info.mq.queueWechat=${spring.rabbitmq.queueWechat}
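
The /info endpoint returns these flat info.* keys as nested JSON, with each dot-separated part of the key becoming one level. The sketch below mimics that grouping with plain maps to show what shape to expect. InfoTree is a hypothetical class, the sample values are invented, and it assumes no key is both a leaf and a parent of other keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of how flat info.* keys map onto a nested structure.
class InfoTree {

    @SuppressWarnings("unchecked")
    static Map<String, Object> nest(Map<String, String> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            if (!e.getKey().startsWith("info.")) {
                continue; // only info.* entries are exposed
            }
            String[] parts = e.getKey().substring("info.".length()).split("\\.");
            Map<String, Object> node = root;
            for (int i = 0; i < parts.length - 1; i++) {
                // Assumes parts[i] is never already bound to a plain string value
                node = (Map<String, Object>) node.computeIfAbsent(
                        parts[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(parts[parts.length - 1], e.getValue());
        }
        return root;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("info.build.name", "gateway");
        flat.put("info.mq.enable", "true");
        System.out.println(nest(flat));
    }
}
```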

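4). Access the URLs to fetch the monitoring data

  • /info: returns custom application information
  • /beans: lists all beans created in the application context
  • /autoconfig: returns the application's auto-configuration report, including every auto-configuration candidate (renamed /conditions in Spring Boot 2.x)
  • /env: unlike /configprops, returns a report of all environment properties available to the application
  • /mappings: returns a report of all Spring MVC controller mappings
  • /configprops: returns a report of the properties configured in the application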

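IV. FAQ

1. The default Prometheus endpoint of the project returns no data

The data is fetched by calling the service's query_range endpoint. Check whether that path is being intercepted by the API encryption filter or by another global pre-filter.

2. Is exposing monitoring data a security problem?

Yes, it is. On the one hand, you can add the security dependency spring-boot-starter-security so that every request to a monitoring endpoint must be authenticated. On the other hand, be careful about exposing sensitive project information in plaintext.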
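
As one way to act on the second point, values can be masked before they are published under info.*. The sketch below is a minimal, JDK-only illustration. InfoMasker, its marker list and the sample keys are all assumptions made for this example, not part of Actuator or of the project.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: hide values whose key looks sensitive.
class InfoMasker {
    private static final String[] SENSITIVE = {"password", "secret", "token", "key"};

    static Map<String, String> mask(Map<String, String> props) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.put(e.getKey(), isSensitive(e.getKey()) ? "******" : e.getValue());
        }
        return out;
    }

    static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        for (String marker : SENSITIVE) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("info.mq.addresses", "10.10.0.120:5672");
        props.put("info.mq.password", "example-only");
        System.out.println(mask(props));
    }
}
```

Matching on substrings of the key is deliberately broad, so it may hide a harmless value now and then; that is usually the safer trade-off for a public endpoint.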

3. Hot deployment fails on startup after integrating Prometheus

This problem has shown up in testing and is not yet resolved.
