Notify Monitoring on k3s
This runbook deploys Notify into the apps namespace and adds it to Homepage.
What gets deployed
- Deployment notify (apps namespace)
- Service notify on port 9090
- IngressRoute notify at https://notify.smartmur.ca
- ServiceAccount + read-only ClusterRole + ClusterRoleBinding
- namespaced restart Role + RoleBinding for allowlisted deployments
- PersistentVolumeClaim notify-data for alert state persistence
- ConfigMap notify-config (runtime monitoring config)
- ConfigMap notify-app-code (app source injected at deploy time)
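The sketch below is one quick way to confirm that the objects above exist and that RBAC is scoped as described. It assumes two names that this list does not state: a ServiceAccount called notify, and obsidian as one of the allowlisted deployments. Adjust both to match the manifest. The `--as` impersonation flag also requires that your own kubeconfig user is allowed to impersonate service accounts, which cluster-admin is.

```bash
#!/usr/bin/env bash
# Sketch: confirm the Notify objects exist and that RBAC matches the runbook.
# Assumptions (not stated in the runbook): ServiceAccount name "notify",
# and "obsidian" is on the restart allowlist.
notify_check_resources() {
  local ns=apps
  local sa="system:serviceaccount:${ns}:notify"

  # A single `kubectl get` accepts several TYPE/NAME arguments.
  kubectl -n "$ns" get deploy/notify svc/notify ingressroute/notify \
    pvc/notify-data configmap/notify-config configmap/notify-app-code

  # `kubectl rollout restart` works by patching the Deployment, so test "patch".
  kubectl auth can-i patch deployment/obsidian -n "$ns" --as="$sa"
  # Read-only ClusterRole: listing pods across namespaces should be allowed.
  kubectl auth can-i list pods --all-namespaces --as="$sa"
  # Outside the namespaced Role, patching deployments should be denied.
  kubectl auth can-i patch deployments -n kube-system --as="$sa"
}
```

Run `notify_check_resources` after deploying. With the RBAC described above, you would expect the first two `can-i` checks to print `yes` and the last one to print `no`.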
Deploy
If you omit the argument, the script defaults to /Users/dre/Desktop/LABing/Projects/Notify.
Verify
kubectl -n apps get deploy notify
kubectl -n apps get pods -l app=notify
kubectl -n apps logs deploy/notify --tail=100
kubectl -n apps get svc notify
kubectl -n apps get ingressroute notify
Health endpoints:
- https://notify.smartmur.ca/ (web dashboard UI)
- https://notify.smartmur.ca/status (JSON status payload)
- https://notify.smartmur.ca/readyz (service readiness)
- https://notify.smartmur.ca/healthz (strict health; returns 503 when checks fail)
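A small curl loop can check all four endpoints in one pass. The function name `notify_probe` is hypothetical. Its optional first argument overrides the base URL, and it assumes curl is installed and the hostname resolves from the machine you run it on.

```bash
#!/usr/bin/env bash
# Sketch: print the HTTP status of each Notify endpoint listed above.
# Assumes curl is installed and the hostname resolves from this machine.
notify_probe() {
  local base=${1:-https://notify.smartmur.ca}
  local path code
  for path in / /status /readyz /healthz; do
    # -s: no progress meter; -o /dev/null: drop the body;
    # -w: print only the status code (000 means no HTTP response at all).
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "${base}${path}")
    printf '%-9s %s\n' "$path" "$code"
  done
}
```

On a healthy instance, all four paths should return 200. If /readyz returns 200 while /healthz returns 503, the service is likely up but one or more monitoring checks are failing.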
k8s_pods behavior note:
- max_container_restarts is evaluated together with recent_restart_window_minutes.
- A pod is flagged for its restart count only when the latest restart falls inside that time window.
- Current default window: 360 minutes (6 hours).
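The small shell function below illustrates how the two settings interact. It is a hypothetical sketch of the rule as described above, not Notify's code. It assumes a pod is flagged when its restart count is strictly greater than max_container_restarts and its latest restart is no older than the window.

```bash
#!/usr/bin/env bash
# Hypothetical model of the k8s_pods restart rule (not Notify's source).
# Assumed semantics: flag when RESTARTS > MAX and the latest restart is
# no older than WINDOW_MINUTES.
# Usage: restart_flagged RESTARTS MAX LAST_RESTART_EPOCH NOW_EPOCH WINDOW_MINUTES
restart_flagged() {
  local restarts=$1 max=$2 last=$3 now=$4 window_min=$5
  local age_s=$(( now - last ))          # seconds since the latest restart
  local window_s=$(( window_min * 60 ))  # window converted to seconds
  if (( restarts > max && age_s <= window_s )); then
    echo "flag"
  else
    echo "ok"
  fi
}
```

For example, assume max_container_restarts is 3 and the window is the default 360 minutes. A pod with 5 restarts whose last restart was 2 hours ago is flagged: `restart_flagged 5 3 "$(( $(date +%s) - 7200 ))" "$(date +%s)" 360` prints `flag`. The same pod is not flagged if its last restart was 7 hours ago, because that falls outside the 6-hour window.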
Homepage integration
manifests/apps/homepage/homepage.yml includes a Notify card in the Apps section:
- href: https://notify.smartmur.ca
- monitor: http://notify:9090/readyz
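The monitor URL uses the short Service name notify, which only resolves from pods in the apps namespace. A check from outside the cluster therefore does not exercise the same path. One way to reproduce it is with a throwaway pod. The curlimages/curl image and the pod name notify-readyz-probe are assumptions; any image that ships curl would do.

```bash
#!/usr/bin/env bash
# Sketch: hit Homepage's monitor URL from a short-lived pod in the apps namespace.
# Assumptions: the curlimages/curl image can be pulled; the pod name is arbitrary.
notify_probe_in_cluster() {
  # --rm deletes the pod on exit; -i attaches so curl's output is shown;
  # --restart=Never stops the pod from being restarted after curl exits.
  kubectl -n apps run notify-readyz-probe --rm -i --restart=Never \
    --image=curlimages/curl -- \
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 http://notify:9090/readyz
}
```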
Secure channel credentials
Notify reads channel credentials from secret notify-channel-secrets in namespace apps.
Create/update it without writing plaintext to git:
kubectl create secret generic notify-channel-secrets -n apps \
--from-literal=TELEGRAM_BOT_TOKEN='REPLACE_ME' \
--from-literal=TELEGRAM_CHAT_ID='REPLACE_ME' \
--from-literal=TWILIO_ACCOUNT_SID='REPLACE_ME' \
--from-literal=TWILIO_AUTH_TOKEN='REPLACE_ME' \
--from-literal=TWILIO_WHATSAPP_FROM='whatsapp:+14155238886' \
--from-literal=TWILIO_WHATSAPP_TO='whatsapp:+10000000000' \
--from-literal=ALERT_SMTP_HOST='smtp.gmail.com' \
--from-literal=ALERT_SMTP_PORT='587' \
--from-literal=ALERT_EMAIL_USER='REPLACE_ME' \
--from-literal=ALERT_EMAIL_PASSWORD='REPLACE_ME' \
--from-literal=ALERT_FROM_EMAIL='notify@kwe2.org' \
--from-literal=ALERT_TO_EMAIL_1='you@example.com' \
--dry-run=client -o yaml | kubectl apply -f -
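Before restarting, it can help to confirm which keys the secret holds and whether any are still set to the REPLACE_ME placeholder. The sketch below does this without echoing real values to the terminal. It relies on kubectl's go-template output, which provides a `base64decode` function. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: audit notify-channel-secrets without printing secret values.
# Only key names are printed; decoded values are compared inside kubectl.
notify_secret_audit() {
  local secret=notify-channel-secrets
  echo "Keys present:"
  kubectl -n apps get secret "$secret" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
  echo "Keys still set to REPLACE_ME:"
  kubectl -n apps get secret "$secret" \
    -o go-template='{{range $k, $v := .data}}{{if eq (base64decode $v) "REPLACE_ME"}}{{$k}}{{"\n"}}{{end}}{{end}}'
}
```

An empty second list means every placeholder has been replaced.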
After updating secret values, restart Notify:
kubectl -n apps rollout restart deployment/notify
kubectl -n apps rollout status deployment/notify --timeout=300s
Telegram note:
- TELEGRAM_BOT_TOKEN alone is not enough.
- The destination user or group must first send at least one message (for example /start) to the bot.
- Then read message.chat.id from:
Enable channels
Channels are disabled by default in manifests/apps/notify/notify.yml.
Enable whichever you want by changing enabled: false to enabled: true under:
- channels.telegram
- channels.whatsapp
- channels.email
Then re-apply Notify:
Telegram ChatOps¶
Notify can process Telegram commands when chatops.telegram.enabled=true.
Supported commands:
- /status
- /diag <check_id>
- /restart [namespace/]deployment
- /help
Controls:
- only admin_chat_ids can execute commands
- the restart command is restricted by:
  - chatops.telegram.restart.allowed_namespaces
  - chatops.telegram.restart.allowed_deployments
- command history is written to Notify local state (chatops.telegram.audit)
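The restart gate can be modeled as shown below. This is a hypothetical sketch of the described controls, not Notify's implementation. It assumes that an omitted namespace defaults to apps, that both allowlists are plain space-separated lists of names, and that a target must match both lists. It omits the admin_chat_ids check.

```bash
#!/usr/bin/env bash
# Hypothetical model of the /restart allowlist check (not Notify's code).
# Assumed: omitted namespace defaults to "apps"; allowlists are
# space-separated names; both namespace and deployment must be listed.
DEFAULT_NS="apps"

# Usage: chatops_restart_allowed "[namespace/]deployment" "ALLOWED_NS" "ALLOWED_DEPLOYS"
chatops_restart_allowed() {
  local target=$1 allowed_ns=$2 allowed_deploys=$3
  local ns deploy
  if [[ $target == */* ]]; then
    ns=${target%%/*}       # text before the first slash
    deploy=${target#*/}    # text after the first slash
  else
    ns=$DEFAULT_NS
    deploy=$target
  fi
  # Pad with spaces so only whole names match, not substrings.
  if [[ " $allowed_ns " == *" $ns "* && " $allowed_deploys " == *" $deploy "* ]]; then
    echo "allow $ns/$deploy"
  else
    echo "deny $ns/$deploy"
  fi
}
```

For example, `chatops_restart_allowed obsidian "apps" "obsidian homepage"` prints `allow apps/obsidian`. By contrast, `chatops_restart_allowed kube-system/coredns "apps" "obsidian coredns"` prints `deny kube-system/coredns`: the deployment is listed, but its namespace is not.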
Telegram Ops Cheat Sheet
Quick commands:
- /help
- /status
- /diag <check_id>
- /restart [namespace/]deployment
Common check IDs in this stack:
- k8s_nodes
- k8s_pods
- k8s_deployments
- k8s_dashboard_http
- obsidian_http
- homepage_http
Obsidian recovery flow:
- /diag obsidian_http
- /restart apps/obsidian
- wait 30-60 seconds
- /diag obsidian_http
- /status
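If Telegram is unreachable, you can run roughly the same recovery from a workstation with kubectl, reusing the rollout commands from the credentials section. The app=obsidian label is an assumption; it mirrors the app=notify selector used under Verify. The final /status call only shows Notify's current view, and its JSON layout is not documented here.

```bash
#!/usr/bin/env bash
# Sketch: kubectl equivalent of the Telegram Obsidian recovery flow.
# Assumption: Obsidian pods carry the label app=obsidian.
obsidian_recover() {
  # Roughly what `/restart apps/obsidian` is expected to do: roll the pods.
  kubectl -n apps rollout restart deployment/obsidian
  # Wait for the new pods instead of a fixed 30-60 second pause.
  kubectl -n apps rollout status deployment/obsidian --timeout=300s
  kubectl -n apps get pods -l app=obsidian
  # Rough stand-in for /status: fetch Notify's JSON status payload.
  curl -s --max-time 10 https://notify.smartmur.ca/status
  echo
}
```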