Senior DevOps Engineer IRC292108
We are building a production-grade AI-driven SDLC platform on AWS, integrating Bedrock, enterprise knowledge bases, agent orchestration, and QuickSuite-based UI flows.
We are looking for a Senior DevOps Engineer who will own the cloud architecture, infrastructure automation, security posture, and reliability of the AI platform.
This is a hands-on technical leadership role. You will define and implement the production foundation on which AI agents and orchestration services operate.
Requirements
- 6+ years of DevOps / Cloud Engineering experience
- 4+ years of deep AWS architecture experience
- Strong Terraform expertise (modules, state management, multi-env)
- Deep knowledge of:
  - Lambda
  - API Gateway
  - Step Functions
  - S3
  - IAM
  - CloudWatch
  - VPC networking
- Experience designing secure multi-account AWS environments
- CI/CD pipeline architecture experience
- Infrastructure security hardening experience
- Experience deploying distributed production systems
- Experience with ECS or EKS
- Strong understanding of:
  - AI/ML production infrastructure
  - Data pipelines
  - RAG system infrastructure requirements
- Experience integrating Amazon Bedrock or other enterprise LLM services
- Strong cost optimization mindset
- Excellent architecture documentation skills
Nice to Have
- Experience supporting LLM tool backends or MCP server hosting
- Experience with OpenSearch or vector database infrastructure
- Experience with QuickSuite or enterprise workflow platforms
- FinOps certification or formal cost governance experience
- SOC2 / HIPAA / enterprise compliance background
- Experience in regulated industries
Job responsibilities
- Design and own AWS cloud architecture (multi-account, prod/non-prod isolation)
- Build infrastructure using Terraform (modular, reusable, environment-aware)
- Architect and implement:
  - AWS Lambda services
  - API Gateway
  - Step Functions
  - Event-driven patterns (EventBridge, SQS, SNS)
- Design secure Bedrock integrations (private networking, IAM, VPC endpoints)
- Architect knowledge base storage (S3, OpenSearch, vector storage infra)
- Design networking topology (VPCs, private subnets, endpoints, routing)
- Implement CI/CD pipelines (GitHub Actions, GitLab CI, or similar)
- Implement containerized services (ECS or EKS where required)
- Establish observability stack (CloudWatch, tracing, structured logs)
- Implement secrets management (Secrets Manager, KMS)
- Define IAM least-privilege architecture
- Implement DevSecOps and secure SDLC standards
- Define SRE practices (SLAs, SLOs, monitoring, alerting)
- Implement cost governance and FinOps practices
- Support AI engineers in productionizing model services and tool backends
- Ensure system scalability, reliability, and auditability
Required languages
- English: B2 (Upper Intermediate)