Site Reliability Engineer, up to $5000
About Behavox
Behavox is a cloud-native AI company providing an integrated controls platform for global banks, asset managers, hedge funds, private equity firms, insurance businesses, and commodity firms. The platform unifies communications and trade surveillance, compliant archiving, policy management, and front-office analytics on a single, AI-native technology stack, delivered as a globally scalable SaaS-based cloud service.
At Behavox, our engineering culture is built around speed, experimentation, and technical excellence, following agile principles and rapid iteration. We constantly test and adopt the latest cloud technologies and AI tooling, optimising for fast feedback loops and execution. We look for people who can move fast, challenge conventional wisdom, and who want to work at the frontier of modern AI, SaaS platforms, and distributed systems.
Behavox is a high-performance organisation with a strong bias toward delivery, ownership, and responsibility. We commit, and we execute. We are building systems that are complex, mission-critical, and global in scale; systems that many consider too large or too difficult.
To do that, we seek the smartest, most technically capable engineers and technologists who take end-to-end responsibility and want to win by building what others cannot.
Founded in 2014 and backed by SoftBank Vision Fund, Behavox is headquartered in London, with offices worldwide, including New York City, Montreal, Seattle, Singapore, and Tokyo.
About the Role
The Behavox Platform is a scalable, fault-tolerant, and highly performant storage and processing system that allows us to manage and analyze massive volumes of data. We have an extensive and flexible set of APIs for developing products that let our clients work through millions of data items by searching, filtering, and visualizing relationships between entities in the system.
As a Site Reliability Engineer, you will be responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of all production systems and services. You will work with other DevOps, Product, and Engineering teams to design and implement SRE practices at Behavox and to build the foundational infrastructure that supports the rapid growth of the Behavox client base.
This is an incredible opportunity to discover the world of high-load data processing and face the challenges of distributed Big Data systems. It will also give you the opportunity to:
1. Work with high-load and business-critical services that will have a big impact on the company
2. Implement your ideas in an environment that strives for continuous improvement
3. Be part of a fast-growing, dynamic company and work with modern technologies
More information about the tools and solutions used at Behavox can be found on our engineering blog: https://blog.behavox.engineering
What You’ll Bring
- Linux mastery (5+ years). You understand how the kernel works, not just how to use it. You're comfortable with: systemd, strace, system calls, inodes, iptables/netfilter, namespace isolation, cgroups, process management, filesystem internals. You can debug a hanging process or network issue from first principles.
- Kubernetes in production (3+ years). Not just "I deployed a pod once." You've run production K8s clusters, debugged CNI issues, understood resource limits and QoS, troubleshot DNS problems, dealt with pod evictions, and know when to use StatefulSets vs Deployments vs DaemonSets.
- Production troubleshooting and incident leadership. You've been paged at 3 AM and fixed it. You've led incidents as a DRI or Incident Commander, not just participated as a responder. You know how to methodically isolate failures in distributed systems, write blameless postmortems, and improve systems based on lessons learned. You can read application logs, correlate metrics, check network connectivity, profile resource usage, and find root causes under time pressure.
- Python or Golang (hands-on, production experience). You've built real automation tools, not just scripts. You understand error handling, testing, logging, and writing maintainable code that other engineers will use and modify.
- Cloud platforms (GCP required, AWS is a plus). Real production experience with Google Cloud (Compute Engine, GKE, Cloud Storage, IAM, VPC networking) or AWS equivalents. You've designed cloud architecture, optimized costs, and debugged cloud-specific issues.
What You'll Do
- Be on-call and lead incident response. You'll carry the pager and act as Incident Commander or DRI during major outages. This means coordinating response teams, making decisive calls under pressure, running structured incident management (severity classification, communication, escalation, resolution, postmortems), and keeping stakeholders informed. You must know how to run an incident - not just fix technical problems.
- Deep troubleshooting. We believe in observability-first approaches with proper monitoring and metrics. But when observability doesn't give you the answer - when a Java service is leaking memory in Kubernetes, network packets are dropping mysteriously, or a production database is hitting inode limits - you need to be unafraid to go deeper. Grab strace, dive into kernel logs, check iptables rules, and analyze system calls. No handholding.
- Build real automation. Not bash one-liners. You'll write Python or Golang tools that solve complex operational problems - deployment automation, self-healing systems, capacity planning tools, incident response automation. Code that other engineers will depend on.
- Maintain high-load distributed systems. Our platform processes massive data volumes across GCP (primary) and AWS. You'll deploy, scale, monitor, and optimize these systems while meeting SLAs.
- Own the observability stack. Prometheus is your foundation. You'll design monitoring, write meaningful alerts (not alert spam), build dashboards that actually help during incidents, and implement quality control gates for AI services.
What We Offer & Expect
- The opportunity to work on a global, mission-critical AI platform alongside the best engineers and technologists across multiple geographies.
- A role with real ownership and impact, building complex systems at scale in an environment that values speed, experimentation, and technical excellence.
- A highly attractive benefits package, including competitive cash compensation, an equity award aligned with long-term value creation, and comprehensive health insurance for employees and their families.
- A modern, comfortable office in central Lviv, with an expectation of working from the office five (5) days per week, reflecting our belief in strong in-person collaboration, while remaining flexible to accommodate occasional personal circumstances that may require working from home.
- A generous time-off policy of 30 days annually, plus public holidays and sick leave, recognizing the importance of sustained high performance.
About Our Process
Our selection process is designed to rigorously assess a candidate’s depth of technical knowledge, problem-solving ability, and alignment with Behavox’s mission and core values.
As part of the process, candidates will first participate in a series of interviews focused on evaluating their technical expertise and engineering judgment. Candidates who successfully progress through these interviews will then be invited to complete a live technical exercise with a group of Behavox engineers and engineering managers.
The purpose of this live technical assessment is to validate the candidate’s stated technical competencies and assess their ability to solve complex problems with speed, accuracy, and sound engineering judgment. Note that whenever possible, we aim to conduct interviews in person at our offices.
We recognize and respect the time candidates invest in this process. In return, Behavox commits significant time and resources to ensure that those who join us have the capability, judgment, and alignment required to operate at the speed and level of complexity our work demands. We value efficiency and clarity on both sides; if at any point we determine that a candidate is not a fit, we reserve the right to immediately conclude the interview or the technical assessment.
Please note the following:
- A core aim of the process is to objectively assess individual knowledge and competencies. The use of AI tools or external assistance during live interviews or technical exercises is strictly prohibited (unless explicitly instructed otherwise) and will result in immediate disqualification.
- Interviews and technical sessions may be recorded for internal review to support fairness, consistency, and collaborative decision-making within the hiring team.
Required skills experience
| Skill | Experience |
| Kubernetes | 5 years |
| Python | 5 years |
| GCP (Google Cloud Platform) | 4 years |
| Terraform | 4 years |
| Jenkins | 4 years |
| GitLab | 4 years |
Required languages
| Language | Level |
| English | B2 - Upper Intermediate |