Senior AWS DevOps Engineer
About Us:
OnlyMonster provides advanced management software for growing content-selling businesses. We’re seeking a talented Senior AWS DevOps Engineer to join our innovative team.
Job Description:
We are looking for an experienced Senior AWS DevOps Engineer to design, implement, and maintain our cloud infrastructure. The ideal candidate will ensure the reliability and performance of our platform, optimize resource usage, and support our AI/ML teams across diverse operational conditions.
Key Responsibilities:
- Develop and maintain cloud infrastructure using Infrastructure as Code (Terraform) and configuration management (Ansible).
- Collaborate closely with AI/ML teams to deploy and maintain models on GPU-accelerated infrastructure (Runpod, Tensordock).
- Optimize CI/CD pipelines for efficient deployment processes.
- Implement monitoring, alerting, and observability solutions (Grafana, Prometheus, centralized logging).
- Ensure the system stays resilient under varying traffic loads and when external dependencies are unreliable.
- Collaborate with international teams in the US and Europe.
Required Skills and Experience:
- Proven experience in DevOps or Site Reliability Engineering (SRE) roles.
- Strong Linux background: Deep understanding of OS internals, SSH, networking, and troubleshooting.
- Proficiency in Ansible: Proven experience writing roles and playbooks for configuration management.
- Strong background in cloud services (AWS) and experience with other providers (e.g., Hetzner, GPU clouds).
- Proficiency in containerization (Docker) and orchestration (Kubernetes).
- Experience with IaC tools (Terraform).
- Programming skills in Python (preferred for AI ops) and Bash.
- Experience with version control systems (specifically Git/GitHub).
- Experience with PostgreSQL and other databases.
Nice-to-Have (Big Plus):
- MLOps Experience: Experience deploying and managing AI models, working with CUDA/GPU environments.
- Experience with GPU cloud providers (Runpod, Tensordock).
- FinOps / Cloud Cost Optimization experience.
- Knowledge of high availability system design.
Technologies We Use:
AWS, Hetzner, Runpod/Tensordock, Docker, Kubernetes, Terraform, Ansible, Linux, GitHub, ArgoCD, Helm, JavaScript, Python, Bash.
What We Offer:
- Opportunity to work on cutting-edge cloud and AI technologies.
- Collaborative, distributed team environment.
- Challenging projects that impact the growth of content selling businesses.
Required Languages:
- Ukrainian: Native