Feature #1176

closed

[infra] Healthcheck GPU voice-clone by AWS Lambda Schedule

Added by Phước Ngọc Trần 3 months ago. Updated 18 days ago.

Status:
Closed
Priority:
Normal
Category:
-
Start date:
10/06/2024
Due date:
10/10/2024
% Done:

100%

Estimated time:
4:00 h
Spent time:

Description

- Upload the credential files for the k8s GPU cluster to S3.
- Use a Lambda function to fetch those credentials and read the last line of the pod's log.
- If the timestamp on that last log line is more than 15 minutes older than the current time, restart the pod so that a new one is created.
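The 15-minute staleness rule above can be sketched as a small, dependency-free check. This is only an illustration, not the code deployed in the Lambda: it assumes each log line starts with an RFC 3339 UTC timestamp (the format produced by `kubectl logs --timestamps`), and the names `parse_log_timestamp`, `needs_restart`, and `MAX_AGE` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the log line is prefixed with an RFC 3339 UTC timestamp,
# e.g. "2024-10-06T11:44:59.123456789Z message".
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z(?:\s|$)")

# Threshold taken from the ticket description: 15 minutes.
MAX_AGE = timedelta(minutes=15)


def parse_log_timestamp(line: str) -> datetime:
    """Return the timezone-aware UTC timestamp at the start of a log line."""
    match = _TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp prefix in log line: {line!r}")
    base, fraction = match.groups()
    # Kubernetes emits nanoseconds; datetime only stores microseconds.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def needs_restart(last_line: str, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True when the last log line is strictly older than max_age."""
    return now - parse_log_timestamp(last_line) > max_age
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the check easy to test. A line that is exactly 15 minutes old is treated as healthy here; whether the real function uses a strict or inclusive comparison is not stated in the ticket.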

Actions #1

Updated by Phước Ngọc Trần 3 months ago

  • Status changed from In Progress to Resolved - Dev
  • % Done changed from 10 to 100

Evidence:
https://eu-central-1.console.aws.amazon.com/lambda/home?region=eu-central-1#/functions/tts-openai-prod-kubernetes-healthcheck-lastline-log?tab=code

There are two ways to trigger it:
1. On a schedule: the healthcheck runs once every hour.
2. Manually, by sending this message in the Slack channel #monitoring-general-notifications:

   @aws lambda invoke --payload {"manuallyCheck": "true"} --function-name tts-openai-prod-kubernetes-healthcheck-lastline-log --region eu-central-1
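To illustrate how a handler might tell the two triggers apart, here is a minimal sketch. It assumes the hourly trigger is an EventBridge schedule, whose events carry `"source": "aws.events"`, and that the manual path is identified by the `manuallyCheck` key from the Slack command above. The function name `trigger_kind` is hypothetical, and the ticket does not say what the deployed function actually does with the flag.

```python
def trigger_kind(event: dict) -> str:
    """Classify a Lambda invocation event as manual, scheduled, or unknown."""
    # The Slack command sends the flag as the string "true", not a boolean.
    if str(event.get("manuallyCheck", "")).lower() == "true":
        return "manual"
    # Assumption: the hourly run comes from an EventBridge scheduled rule.
    if event.get("source") == "aws.events":
        return "scheduled"
    return "unknown"
```

A handler could call `trigger_kind(event)` first and, for example, always post the result back to Slack on a manual check while staying silent on healthy scheduled runs.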

Actions #2

Updated by Phước Ngọc Trần 18 days ago

  • Status changed from Resolved - Dev to Closed