Full-time
On-site
Philippines
Test Lead
Configure and maintain Datadog dashboards, alerts, monitors, SLOs & SLIs.

Integrate Datadog with cloud environments (AWS / Azure / GCP), Kubernetes, and on-prem applications.

Implement APM traces, RUM, Infrastructure Monitoring, and Log Management.

Develop and standardize observability best practices across teams.

Troubleshoot performance issues using Datadog metrics, logs & traces.

Automate monitoring setup using Terraform / Ansible / CI/CD tools.

Work closely with DevOps, SRE, and development teams to ensure platform reliability.

Optimize alerting to reduce noise and enhance incident response processes.

Required Skills

Hands-on experience with Datadog (Dashboards, Log Pipelines, Metrics, Alerts, APM).

Strong knowledge of Linux-based systems and system performance metrics.

Experience working with Containers & Kubernetes (EKS / AKS / GKE).

Proficiency with at least one scripting language: Python / Bash / Shell.

Experience with Cloud platforms: AWS / Azure / GCP.

Understanding of CI/CD pipelines and Infrastructure as Code (Terraform preferred).

Good to Have

Experience with Incident Management / SRE practices.

Familiarity with Prometheus, Grafana, Splunk, New Relic, or similar tools.

Knowledge of Service Mesh / Microservices architecture.

Networking basics (DNS, Load balancing, SSL/TLS).