Senior Platform Reliability Engineer

Client

Our client is a division of a global business and financial news and information company. It is a leading market index provider and the owner and distributor of multiple financial services: a dynamic information network with data, news, and analytics covering cash and derivatives markets, money markets, government and municipal bonds, currencies, commodities, mortgages, indices, insurance, and legal information.

Position overview

We’re seeking a Senior Platform Reliability Engineer to keep our Kubernetes-centric provisioning and Linux estate running smoothly. You’ll coordinate fixes when OS builds or upgrades hit exceptions, working across teams to find root causes from logs and metrics and to recommend changes.

You’ll automate repeat work (Bash/Python), strengthen runbooks and observability, and document configurations and procedures. You’ll be partnering with hands-on engineers and architects in a highly technical, delivery-focused environment.

Responsibilities

Operate and improve a Kubernetes-centric, open-source platform across provisioning and maintenance workflows.
Coordinate resolution of exceptions in a multi-stage (≈10) provisioning pipeline; engage the right owners with clear, actionable context.
Build and maintain automation and runbooks (Bash/Python) to reduce toil and increase reliability.
Lead triage, log analysis, and root-cause investigation to minimize downtime.
Enhance observability (metrics/logs/traces) and promote SLO-oriented practices.
Operate and tune distributed data stores (e.g., Cassandra) and platform services.
Evolve OS/network provisioning (PXE boot, Subiquity, Foreman, imaging) and server management (BMCs, multi-NIC).
Partner with platform teams to improve automation, performance, security, and cost efficiency.
Document system configurations, procedures, and changes for repeatability.

Requirements

Strong Linux administration and troubleshooting (Ubuntu/Debian preferred).
Production experience with Kubernetes (or similar orchestrator).
Hands-on network/OS provisioning (PXE, Foreman, Subiquity, imaging) and server hardware management (BMCs, multiple NICs).
Proficiency in scripting (Bash, Python) for automation and diagnostics.
Ability to debug across the stack (infrastructure, workloads, automation, networks) and deliver RCA.
Experience with distributed databases (Cassandra or similar).
Familiarity with runbooks, incident management, and SRE/reliability practices.
Clear communicator and process facilitator: knows whom to engage, what signals to collect, and how to drive issues to closure.
CI/CD and IaC mindset (Git and pipelines; Terraform/Ansible a plus).

Nice to have

Observability stacks (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
Workflow systems and retry logic (Argo Workflows, Jenkins).
Python for internal tooling (Go a plus).
Distributed systems fundamentals (consistency, replication, partition tolerance).
Experience operating Cassandra at scale.
Experience with Agile development methodologies.
Experience working with foreign clients.

Required skills experience

Kubernetes	5 years
Python	5 years
Bash	5 years
SRE	4 years
CI/CD	4 years

Required languages

English

B2 - Upper Intermediate

Published 12 December

Only from 5 years of experience
Full Remote
Countries where we consider candidates: Ukraine

DevOps

Employment: Full-time
Domain: Fintech
Outsource
