# DevOps

# Servers and clusters

All our services are listed in the apliteni/infra/inventory repo.

# Monitoring

# Project monitoring checklist

  • Send errors to Sentry.io.
  • Add a Site24x7 monitor for external monitoring.
  • Configure readiness and liveness probes.
  • (Optional) Export Prometheus metrics.
  • (WIP) Push logs to the Loki instance.
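
For services running in Kubernetes (for example under Rancher), the readiness and liveness probes from the checklist might look like the sketch below. The Deployment name, image, port, and the `/ready` and `/healthz` paths are assumptions made for illustration, not project conventions.

```yaml
# Hypothetical Deployment; all names, the port, and the probe paths are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app:1.0.0
          ports:
            - containerPort: 8080
          # Readiness: traffic is routed to the pod only while this check passes.
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          # Liveness: the container is restarted after 3 consecutive failures.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
```

Keeping the two endpoints separate lets a service report "alive but not ready" while it warms up or waits for a dependency.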

# Cluster monitoring

We use Prometheus to store metrics and Grafana for dashboards.

# Guidelines

# Infrastructure as code

  • If the project is deployed on a VPS, its deployment should be described as an Ansible playbook.
  • Every project should test and deploy itself. See GitLab CI/CD.
  • Every project should at least be launchable with docker compose.
  • Use bash as a scripting language. Use PHP for complicated logic. Use Golang for performance.
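
As a rough illustration of "test and deploy itself", a minimal `.gitlab-ci.yml` could have one test job and one deploy job. The script path `scripts/test.sh`, the inventory file `ansible/inventory.ini`, and the playbook `ansible/deploy.yml` are hypothetical names, and the runner is assumed to have bash and Ansible available.

```yaml
# Sketch only: file names and runner capabilities are assumptions.
stages:
  - test
  - deploy

test:
  stage: test
  script:
    - ./scripts/test.sh   # hypothetical bash entry point that runs the test suite

deploy:
  stage: deploy
  script:
    # Deploy through the project's own playbook.
    - ansible-playbook -i ansible/inventory.ini ansible/deploy.yml
  rules:
    # Deploy only from the default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```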

# Ansible guidelines

  • Use .ini files for inventory vaults.
  • Store Ansible files in an ansible/ folder and roles in ansible/roles.
  • Commit all external roles to the project repository.
  • If you use an external role, a code audit is a must: public roles may ship harmful code.
  • Use - as the delimiter in role names, e.g. apliteni.postgres-wale.
  • Encrypt important passwords and tokens with ansible-vault. A simple manual on how to use ansible-vault: https://gist.github.com/tristanfisher/e5a306144a637dc739e7. Send the encryption key to at least 2 colleagues.
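
A playbook that follows these guidelines might look roughly like this. The file name `ansible/deploy.yml`, the `db` host group, and the `vars/secrets.yml` file are assumptions; only the role name comes from the guideline above.

```yaml
# ansible/deploy.yml (hypothetical file name)
- name: Deploy PostgreSQL with WAL-E backups
  hosts: db                 # hypothetical group defined in the .ini inventory
  become: true
  vars_files:
    # Hypothetical secrets file, encrypted beforehand with
    # `ansible-vault encrypt ansible/vars/secrets.yml`.
    - vars/secrets.yml
  roles:
    # Role stored in ansible/roles and committed to the project repository.
    - role: apliteni.postgres-wale
```

Such a playbook would be run with `ansible-playbook -i ansible/inventory.ini ansible/deploy.yml --ask-vault-pass`, where the inventory path is again a placeholder.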

# Docker-compose guidelines

  • The file name is docker-compose.yml.
  • Put environment parameters in a .env file. Commit only .env.example to the repository, never .env itself.
  • It's preferable to prepare a docker-sync configuration for big projects.
  • Add Grafana/Prometheus/AlertManager if you need custom monitoring, including tracking business indicators.
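
A minimal `docker-compose.yml` along these lines is sketched below. The service name, image, and the `APP_VERSION` and `APP_PORT` variables are hypothetical; they would be documented in `.env.example` and set in each developer's local `.env`.

```yaml
# docker-compose.yml (sketch; names and variables are placeholders)
services:
  app:
    # ${VAR:-default} is interpolated from .env in the project directory.
    image: registry.example.com/example-app:${APP_VERSION:-latest}
    # Pass every variable from .env into the container environment.
    env_file:
      - .env
    ports:
      - "${APP_PORT:-8080}:8080"
    restart: unless-stopped
```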

# Recommendations for documenting the project

  • README.md is the main documentation file. If you create another documentation file, link to it from README.md.
  • List the paths to the service logs and configs. Even if they are standard, it will save time finding them.
  • Create a Troubleshooting (or Problem Solving) section. After each outage, briefly describe how it was resolved.
  • Re-read README.md from time to time and remove outdated or unnecessary information.

# Recommendations for connecting Sentry.io

Every application running in production mode should send errors to Sentry.io.

# Recommendations for setting up Site24x7

What we are monitoring:

  • Server indicators (cpu, ram, disk) with alerts.
  • The state and load of databases (MySQL, Postgres, Redis).
  • Processes whose failure may cause a service interruption (haproxy, nginx).

# Recommendations when deploying new Grafana

  • Prometheus is the preferred data source.
  • Access to the UI should be available only from our VPN. In Rancher's case, the access is already limited.
  • Recommended Dashboards:
    • 10242 (Node Exporter Full with Node Name).
    • 455 (Postgres Performance).
    • 893 (Docker).
    • 9628 (PostgreSQL database stats).
  • Recommended plugins:
    • camptocamp-prometheus-alertmanager-datasource.
  • External Grafana instances (not the ones from Rancher) should use auth.google authorization:
    • [auth.google] allow_sign_up = true - otherwise, authorization of new users will not work.
    • [users] auto_assign_org_role = Admin - so that everyone could manage Grafana.
    • [auth] disable_login_form = true - disable password authorization.
    • Remove admin user.
  • For non-critical situations, send notifications to the #alerts channel or the project channel.
  • It is important to disable notifications that do not require a response in order to reduce the noise level.
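
For an external Grafana started from docker compose, the settings above can be expressed as environment variables, since Grafana maps `GF_<SECTION>_<KEY>` onto its ini sections. This is a sketch under assumptions: the VPN interface address `10.8.0.1` and the `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET` variables are placeholders, and `GF_INSTALL_PLUGINS` is the plugin mechanism of the official Docker image, which may differ between Grafana versions.

```yaml
# docker-compose.yml fragment (sketch; address and credentials are placeholders)
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      # Bind only to the (hypothetical) VPN interface, not to all interfaces.
      - "10.8.0.1:3000:3000"
    environment:
      # [auth.google]
      GF_AUTH_GOOGLE_ENABLED: "true"
      GF_AUTH_GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      GF_AUTH_GOOGLE_CLIENT_SECRET: ${GOOGLE_CLIENT_SECRET}
      GF_AUTH_GOOGLE_ALLOW_SIGN_UP: "true"
      # [users]
      GF_USERS_AUTO_ASSIGN_ORG_ROLE: Admin
      # [auth]
      GF_AUTH_DISABLE_LOGIN_FORM: "true"
      # Recommended plugin, installed at container start.
      GF_INSTALL_PLUGINS: camptocamp-prometheus-alertmanager-datasource
    restart: unless-stopped
```

Removing the built-in admin user is not covered by these settings and still has to be done separately.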

# Recommendations for setting up Prometheus

  • Access to the UI should be available only from our VPN, or closed entirely.
  • Recommended jobs:
    • prom/node-exporter - servers (cpu, ram, disk).
    • google/cadvisor - containers.
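
The two recommended jobs could be declared in `prometheus.yml` roughly as follows. The host names `node-exporter` and `cadvisor` are assumptions (for example, service names on a shared Docker network); the ports are the exporters' defaults.

```yaml
# prometheus.yml (sketch; target host names are placeholders)
global:
  scrape_interval: 15s

scrape_configs:
  # Server metrics (cpu, ram, disk) from prom/node-exporter, default port 9100.
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]

  # Container metrics from google/cadvisor, default port 8080.
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
```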

# Recommendations for setting up AlertManager

  • For non-critical situations, send notifications to the #alerts channel or the project channel.
  • It is important to disable notifications that do not require a response in order to reduce the noise level.
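
One possible way to express this routing in `alertmanager.yml` is sketched below. It assumes that #alerts is a Slack channel, that alerts carry a `severity` label with values such as `warning` and `info`, and that a separate `on-call` receiver handles everything else; none of these details are confirmed by this document.

```yaml
# alertmanager.yml (sketch; receivers, labels, and the webhook URL are assumptions)
route:
  # Default receiver for alerts that do require a response.
  receiver: on-call
  group_by: ["alertname", "job"]
  routes:
    # Non-critical alerts go to the shared channel.
    - matchers:
        - 'severity =~ "warning|info"'
      receiver: slack-alerts

receivers:
  # Hypothetical receiver; add pager or email settings here.
  - name: on-call
  - name: slack-alerts
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK"
        channel: "#alerts"
        send_resolved: true
```

Alerts that never require a response are better removed at the source (the alerting rule) than routed to a muted receiver.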