Senior DevOps / AWS Cloud Engineer

Stack: ASP.NET Core (C#) for most microservices, Go/Java for matching/wallet components (where applicable), PostgreSQL (transactional), ClickHouse (analytics), Kafka (AWS MSK), Redis, AWS, HSM (e.g., Thales Luna 7), CQRS/Event Sourcing where appropriate, gRPC + REST.

Responsibilities

  • Design, build, and operate production AWS infrastructure for a high-load exchange, with a focus on security, scalability, and reliability.
  • Own and evolve CI/CD pipelines (GitHub Actions / AWS CodeBuild/CodePipeline), release strategies (blue/green, canary), rollbacks, and safe migrations.
  • Run container workloads on ECS Fargate or EKS (based on final architecture decisions), including deployment automation and operational playbooks.
  • Operate and tune Kafka on AWS MSK: topics, partition strategy, retention, ACL/SASL, consumer lag, retry/DLQ patterns, schema/versioning practices.
  • Operate PostgreSQL (preferably Aurora): performance, replication, backup/restore, failover testing.
  • Maintain ClickHouse (cluster, replication, partitions, merges, backups) and Redis (ElastiCache) for caching, rate limiting, and low-latency access.
  • Implement observability: OpenTelemetry, metrics/logs/traces, alerting, SLO/SLA, incident response, RCA and postmortems.
  • Security ownership: IAM design, KMS, Secrets Manager/Parameter Store, encryption at rest/in transit, secret rotation, hardening, network segmentation (VPC, SG, NACL).
  • Integrate with the signing/HSM perimeter (Thales Luna / PKCS#11, or CloudHSM/KMS where applicable): secure key workflows, audit trails.
  • Protect public endpoints: WAF/Shield, rate limiting, API exposure via ALB/NLB/API Gateway/CloudFront (per final design).
  • Capacity planning, cost optimization (FinOps), and DR/BCP readiness (multi-AZ now, multi-region roadmap).

Required qualifications

  • 4+ years in DevOps/SRE with hands-on AWS production experience.
  • Strong knowledge of AWS core services: VPC, IAM, EC2, ECS/EKS, ALB/NLB, Route53, CloudWatch, CloudTrail, KMS, S3, RDS/Aurora, ElastiCache, ECR.
  • Proven experience with CI/CD and release engineering (blue/green, canary, rollback, safe DB migrations).
  • Production experience with Kafka (ideally AWS MSK): partitioning/retention, consumer groups, idempotency, at-least-once processing, monitoring lag.
  • Solid PostgreSQL skills (ops + performance basics).
  • Containers: Docker, orchestration on ECS/EKS, Infrastructure as Code (Terraform preferred).
  • Strong Linux fundamentals, networking (TLS, DNS), and a practical security mindset.
  • Incident management experience: on-call, debugging under pressure, clear RCA, runbooks.

Nice to have

  • Deep ClickHouse operations experience (replication, partitions, performance tuning, backups).
  • Experience with HSM/PKCS#11, secure signing flows, key custody, audit/compliance requirements.
  • FinTech/Crypto background, familiarity with AML/KYC and audit requirements.
  • Observability stacks: Prometheus/Grafana, Loki/ELK/OpenSearch (or equivalents) alongside OpenTelemetry.
  • Multi-region active-active / DR drills and failover automation.
  • Load testing, capacity planning, and performance engineering.

What we offer

  • A genuinely challenging system: low latency, high throughput, strict security, real-time eventing and analytics.
  • Strong ownership and impact: you’ll shape our CI/CD, MSK strategy, security baseline, ClickHouse operations, and DR plan.
  • Remote/hybrid (by agreement) and competitive compensation.
  • Solid engineering culture: IaC, automation-first, blameless postmortems, and documentation.

Required languages

English B1 - Intermediate
Docker, Linux, AWS, Terraform, CI/CD, Kubernetes, Git, DevOps, Nginx, Prometheus+Grafana
Published 29 December 2025 · Updated 13 January