[Monitoring] DCGM-Exporter metrics are not collected by Prometheus

장그래 2022. 8. 16. 16:37

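Overview

To monitor GPUs, I installed DCGM-Exporter, which NVIDIA provides, but when I queried its metrics in Prometheus, nothing was returned.

(Connecting to the dcgm-exporter pod and running curl against it showed that the exporter was serving metrics normally.)
So I suspected that the cause was in the Prometheus configuration.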

https://github.com/NVIDIA/dcgm-exporter

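Cause and fix

When serviceMonitorSelectorNilUsesHelmValues is set to true and no serviceMonitorSelector is specified, the chart creates the Prometheus resource with a selector based on the Helm release name, so only ServiceMonitors labeled with that release are picked up. (Prometheus was installed through kube-prometheus-stack, while dcgm-exporter was installed from plain YAML, so its resources cannot carry a matching release name.)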
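
To make the mechanism concrete, the fragment below is a rough sketch of the relevant part of the Prometheus custom resource in that situation. As far as I know, the chart matches on a `release` label. The resource name and the release name `my-release` are placeholders assumed for illustration, not values from this cluster.

```yaml
# Sketch only: the selector part of the Prometheus resource when
# serviceMonitorSelectorNilUsesHelmValues is true and no selector is set.
# "my-release" is a placeholder for the Helm release name (assumption).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-release-prometheus   # hypothetical name
spec:
  serviceMonitorSelector:
    matchLabels:
      release: my-release       # only ServiceMonitors with this label match
```

A ServiceMonitor created outside the Helm release normally has no such label, so Prometheus never adds it to the scrape configuration.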

The fix is to set serviceMonitorSelectorNilUsesHelmValues to false in the kube-prometheus-stack values file. The Prometheus resource is then created without a release-based selector, so the DCGM-Exporter metrics are collected normally.

https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/dcgm-exporter.html#gpu-telemetry

prometheus:
  prometheusSpec:
    # If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector will cause the
    # prometheus resource to be created with selectors based on values in the helm deployment,
    # which will also match the servicemonitors created
    #
    serviceMonitorSelectorNilUsesHelmValues: false
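
For reference, a ServiceMonitor for dcgm-exporter applied from plain YAML might look roughly like the sketch below. The resource name, labels, and port name are assumptions for illustration and should be checked against the manifests actually applied. What matters is that it has no `release` label, so it is matched only once the selector no longer depends on the Helm release.

```yaml
# Hypothetical ServiceMonitor for dcgm-exporter installed from plain YAML.
# Names, labels, and the port name are assumptions, not taken from the post.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  labels:
    app.kubernetes.io/name: dcgm-exporter
    # No "release" label here, so a release-based selector skips this object.
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dcgm-exporter   # must match the Service's labels
  endpoints:
    - port: metrics     # name of the Service port that exposes the metrics
      path: /metrics
```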
