Senior DevOps Engineer

CodeSmart is looking for a hands-on, systems-minded Senior DevOps Engineer to own and evolve our cloud infrastructure, deployment pipelines, and runtime reliability. You’ll work across AWS, Terraform, Rancher/Kubernetes, and Linux to build highly available, production-grade systems that can scale with our products.

This role is ideal for someone who enjoys designing resilient architectures, automating everything, and partnering closely with backend/frontend engineers (Python, Java, and JavaScript stacks) to ship safely and operate confidently.

Responsibilities

Design and operate AWS infrastructure for scalable, secure, and highly available production environments.
Build and maintain Infrastructure-as-Code using Terraform (modules, state management, workspaces, best practices).
Manage container orchestration and workloads using Rancher (Kubernetes lifecycle, clusters, upgrades, policies).
Own CI/CD pipelines (build, test, deploy), release workflows, and environment promotion strategies.
Implement reliability practices: monitoring, logging, alerting, incident response, postmortems, SLOs/SLIs.
Design high availability and disaster recovery approaches (multi-AZ, backups, failover, runbooks).
Manage Linux systems and networking: hardening, performance tuning, troubleshooting, OS-level automation.
Support and optimize VM-based workloads where needed (images, scaling, patching, sizing, cost control).
Partner with engineering teams to improve operability for Python/Java/JS services (deploy patterns, configs, secrets, rollback).
Establish secure secrets management, IAM best practices, and least-privilege access across environments.
Drive continuous improvement in infrastructure security, stability, and developer experience.

Requirements

6+ years of commercial DevOps / SRE / Platform Engineering experience (or equivalent).
Strong AWS experience across core services (e.g., VPC, EC2, IAM, ALB/NLB, RDS, S3, CloudWatch) and production operations.
Strong Terraform experience: reusable modules, remote state, state locking, drift detection, and multi-environment setups.
Hands-on Rancher experience managing Kubernetes clusters in production (upgrades, node pools, ingress, RBAC, policies).
Expert Linux skills: debugging, networking basics, system performance, permissions, automation, and security hardening.
Proven experience designing and operating high availability systems (multi-AZ, failover, redundancy, DR planning).
Solid understanding of virtual machines: compute sizing, images, networking, storage, patching, and lifecycle management.
Strong scripting and automation skills in Python for DevOps work (automation tooling, CLI scripts, operational utilities).
Working knowledge of Java stack operations (JVM service deployment patterns, tuning basics, observability, rollout/rollback).
Working knowledge of JS stack operations (Node.js services, build/release, environment config, runtime monitoring).
Experience with containers and Kubernetes fundamentals: deployments, services, ingress, configmaps/secrets, autoscaling.
Familiarity with secure SDLC practices: secrets, IAM policies, vulnerability scanning, least privilege, audit readiness.
Strong troubleshooting mindset with demonstrated production incident ownership.
Excellent communication skills and ability to collaborate in a distributed team.

Nice to Have

Experience with GitOps (e.g., Argo CD / Flux), Helm, and progressive delivery (blue/green, canary).
Experience with centralized logging/metrics stacks (Prometheus, Grafana, ELK/OpenSearch).
Experience optimizing cloud costs (rightsizing, reserved instances/savings plans, storage lifecycle policies).
Experience with service mesh / advanced Kubernetes networking (as relevant to the environment).
Prior experience supporting AI/LLM-heavy workloads (bursty traffic, queueing, cost monitoring, reliability patterns).

What to Include (Helps Us Review Faster)

A short write-up of a production system you operated (AWS + Terraform + Rancher), including HA/DR approach.
Examples of incident ownership (what happened, how you mitigated, what you improved afterward).
Links to IaC examples (Terraform modules), pipeline examples, or public repos (if available).

Required languages

English

C1 - Advanced

CI/CD, Prometheus+Grafana, DevOps, AWS, Git

Published 19 January

Experience: 6+ years
Work format: Full remote
Countries where we consider candidates: Worldwide

DevOps

Employment: Full-time
Domain: Security
Product
