Lead Observability Engineer (US Time Zone) (#4779)

N-iX is a global software development company founded in 2002, connecting 2,400+ tech professionals across 40+ countries. We deliver innovative technology solutions in cloud computing, data analytics, AI, embedded software, IoT, and more to global industry leaders and Fortune 500 companies. Join us to create technology that drives real change for businesses and people across the world.

 

About the position

We are seeking a Lead Observability Engineer with proven expertise in designing, operating, and scaling analytical data systems, specifically ClickHouse or similar distributed databases. In this role, you will take a hands-on leadership position in architecting the migration of our existing custom Cosmos telemetry storage to a robust, high-performance ClickHouse-based solution. You’ll be instrumental in building the foundation for alerting, notification, and telemetry workflows that provide visibility into our production systems.

 

Responsibilities

  • Lead the migration of DocuSign’s custom Cosmos telemetry storage solution to a scalable, end-to-end ClickHouse platform
  • Architect, deploy, and productionize ClickHouse clusters for high-volume observability, alerting, notifications, and telemetry workloads
  • Own the operations and reliability of ClickHouse environments (self-managed or cloud), including upgrades, scaling, backup/recovery, and incident response
  • Tune performance, optimize queries, and refine storage strategies (replication, sharding, compactions/merges)
  • Develop robust alerting and monitoring frameworks for the ClickHouse platform and related infrastructure
  • Collaborate with developers and SREs to define requirements and best practices around telemetry, data pipelines, and observability tooling
  • Drive automation of platform management leveraging Kubernetes at scale

 

Requirements

  • Strong hands-on experience hosting and operating ClickHouse (self-managed or cloud), or a similar distributed analytical database, in production environments
  • In-depth understanding of distributed data systems internals, including storage engines, replication, sharding, compactions/merges, and query optimization
  • Proven experience designing, deploying, and scaling ClickHouse for observability or high-volume telemetry use cases
  • Production experience running analytics infrastructure on Kubernetes at scale
  • Expertise in troubleshooting, performance analysis, and reliability engineering for large-scale database systems

 

Would be a plus

  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
  • Development experience with infrastructure-as-code and automation tools (Terraform, Helm, etc.)
  • Knowledge of cloud-native observability solutions and best practices

 

We offer*:

  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

*not applicable for freelancers

Required languages

English B2 - Upper Intermediate
Published 18 February · Updated 3 March