Senior DevOps / AWS Cloud Engineer
Stack: ASP.NET Core (C#) for most microservices, Go/Java for matching/wallet components (where applicable), PostgreSQL (transactional), ClickHouse (analytics), Kafka (AWS MSK), Redis, AWS, HSM (e.g., Thales Luna 7), CQRS/Event Sourcing where appropriate, gRPC + REST.
Responsibilities
- Design, build, and operate production AWS infrastructure for a high-load exchange, with a focus on security, scalability, and reliability.
- Own and evolve CI/CD pipelines (GitHub Actions / AWS CodeBuild/CodePipeline), release strategies (blue/green, canary), rollbacks, and safe migrations.
- Run container workloads on ECS Fargate or EKS (based on final architecture decisions), including deployment automation and operational playbooks.
- Operate and tune Kafka on AWS MSK: topics, partition strategy, retention, ACL/SASL, consumer lag, retry/DLQ patterns, schema/versioning practices.
- Operate PostgreSQL (preferably Aurora): performance, replication, backup/restore, failover testing.
- Maintain ClickHouse (clustering, replication, partitions, merges, backups) and Redis (ElastiCache) for caching, rate limiting, and low-latency access.
- Implement observability: OpenTelemetry, metrics/logs/traces, alerting, SLO/SLA, incident response, RCA and postmortems.
- Own security: IAM design, KMS, Secrets Manager/Parameter Store, encryption at rest and in transit, secret rotation, hardening, network segmentation (VPC, SG, NACL).
- Integrate with the signing/HSM perimeter (Thales Luna / PKCS#11, or CloudHSM/KMS where applicable): secure key workflows, audit trails.
- Protect public endpoints: WAF/Shield, rate limiting, API exposure via ALB/NLB/API Gateway/CloudFront (per final design).
- Capacity planning, cost optimization (FinOps), and DR/BCP readiness (multi-AZ now, multi-region roadmap).
Required qualifications
- 4+ years in DevOps/SRE with hands-on AWS production experience.
- Strong knowledge of AWS core services: VPC, IAM, EC2, ECS/EKS, ALB/NLB, Route 53, CloudWatch, CloudTrail, KMS, S3, RDS/Aurora, ElastiCache, ECR.
- Proven experience with CI/CD and release engineering (blue/green, canary, rollback, safe DB migrations).
- Production experience with Kafka (ideally AWS MSK): partitioning/retention, consumer groups, idempotency, at-least-once processing, consumer lag monitoring.
- Solid PostgreSQL skills (ops + performance basics).
- Containers: Docker, orchestration on ECS/EKS, Infrastructure as Code (Terraform preferred).
- Strong Linux fundamentals, networking (TLS, DNS), and practical security mindset.
- Incident management experience: on-call, debugging under pressure, clear RCA, runbooks.
Nice to have
- Deep ClickHouse operations experience (replication, partitions, performance tuning, backups).
- Experience with HSM/PKCS#11, secure signing flows, key custody, audit/compliance requirements.
- FinTech/Crypto background, familiarity with AML/KYC and audit requirements.
- Observability stacks: Prometheus/Grafana, Loki/ELK/OpenSearch (or equivalents) alongside OpenTelemetry.
- Multi-region active-active / DR drills and failover automation.
- Load testing, capacity planning, and performance engineering.
What we offer
- A genuinely challenging system: low latency, high throughput, strict security, real-time eventing and analytics.
- Strong ownership and impact: you’ll shape our CI/CD, MSK strategy, security baseline, ClickHouse operations, and DR plan.
- Remote/hybrid (by agreement) and competitive compensation.
- Solid engineering culture: IaC, automation-first, blameless postmortems, and documentation.
Required languages
- English: B1 (Intermediate)
Docker, Linux, AWS, Terraform, CI/CD, Kubernetes, Git, DevOps, Nginx, Prometheus+Grafana