Production Reliability Engineer (relocate) Offline

Ukrainian Product 🇺🇦

We are the creators of a new fintech era!
Our mission is to change this world by making blockchain accessible to everyone in everyday life. WhiteBIT is a global team of over 1,200 professionals united by one mission: to shape the new world order in the Web3 era. Each of our employees is fully engaged in this transformative journey.

We work on our blockchain platform, providing maximum transparency and security for more than 8 million users worldwide. Our breakthrough solutions, incredible speed of adaptation to market challenges, and technological superiority are the strengths that take us beyond ordinary companies. Our official partners include the National Football Team of Ukraine, FC Barcelona, Lifecell, FACEIT and VISA.

We are seeking a highly skilled Production Reliability Engineer to join our core infrastructure team supporting ultra-low latency, mission-critical trading systems. This role involves direct ownership of production environments with a focus on performance, reliability, and real-time operability. You'll collaborate with developers, researchers, and platform engineers to continuously improve and support high-throughput, distributed systems deployed across global colocation sites.

The future of Web3 starts with you: join us as a Production Reliability Engineer!


Requirements

Education & Experience:
- Degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- 3+ years of hands-on experience in a Production Engineering, SRE, or DevOps role supporting real-time systems in high-availability environments.

Core Technical Skills:
- Deep knowledge of Linux internals: system performance tuning, kernel behavior, process scheduling, memory management, and I/O optimization.
- Strong networking fundamentals: TCP/IP, routing, multicast, VLANs, LLDP, Ethernet, and experience with low-latency switches (e.g., Arista, Mellanox).
- Proficiency in Python and Shell scripting; experience debugging or extending Java/C applications is a plus.
- Experience with configuration management tools (Ansible, Puppet, Chef) and observability stacks (Prometheus, Grafana, ELK, etc.).
- Ability to analyze latency bottlenecks, from NIC to application, including familiarity with DPDK, kernel bypass, or FPGA-assisted networks.
- Clear communication skills and the ability to collaborate with diverse teams including developers, quants, traders, and risk.
- Strong sense of ownership, urgency, and accountability.
- Willingness to participate in shared on-call rotation and incident response.

Problem-Solving & Analytical Skills:
- Strong analytical and problem-solving skills, with a commitment to testing and quality assurance.
- Track record of independently solving complex technical challenges with real industry impact.
- Experience with continuous integration and deployment processes.

Mindset & Soft Skills:
- Passion for technology, problem-solving, and continuous learning.
- Intellectual curiosity and a strong drive to grow within the quantitative finance industry.
- Team player, with strong communication skills and a collaborative attitude.
- Reliable and predictable availability to ensure smooth operation of production trading systems.


Responsibilities

- Own and optimize the production trading infrastructure, focusing on reliability, uptime, and system scalability.
- Proactively monitor, troubleshoot, and tune large-scale, low-latency trading systems and exchange connectivity across multiple asset classes and markets.
- Design, build, and maintain a comprehensive DevOps toolkit: configuration management, process orchestration, deployment automation, metrics, logging, alerting, and visualization.
- Continuously improve latency, throughput, and fault tolerance using quantitative performance metrics.
- Partner with trading and quant teams to analyze complex system behaviors, investigate anomalies, and resolve incidents in real time.
- Collaborate with network and systems engineering teams to optimize hardware performance, including kernel tuning, BIOS configs, and CPU isolation strategies.
- Drive the operational readiness of all platform changes: risk-assess deployments, implement rollout procedures, and maintain rollback strategies.


Work conditions

Immerse yourself in Crypto & Web3:
- Master cutting-edge technologies and become an expert in the most innovative industry.
Work with the Fintech of the Future:
- Develop your skills in digital finance and shape the global market.
Take Your Professionalism to the Next Level:
- Gain unique experience and be part of global transformations.
Drive Innovations:
- Influence the industry and contribute to groundbreaking solutions.
Join a Strong Team:
- Collaborate with top experts worldwide and grow alongside the best.
Work-Life Balance & Well-being:
- Modern equipment.
- Comfortable working conditions and an inspiring environment to help you thrive.
- 30 calendar days of paid leave.
- Additional days off for national holidays.

With us, you'll dive into the world of unique blockchain technologies, reshape the crypto landscape, and become an innovator in your field. If you're ready to take on challenges and join our dynamic team, apply now and start a new chapter in your career!
Let's Build the Future Together!

WhiteBIT offers all candidates an equal opportunity to join the team. All hiring decisions are made without regard to race, national origin, gender identity or sexual orientation, age, religion, disability, medical condition, marital status, familial status, veteran status, or any other legally protected characteristic of an individual.

The job ad is no longer active
