Principal Platform Reliability Engineer

About the Customer

Brightgrove is partnering with a global mass media and entertainment conglomerate based in the USA. The company is recognized for its cutting-edge products and services, creating entertainment experiences that drive conversation and culture worldwide.

Through television, film, digital media, live events, merchandise, and software solutions, they connect with diverse, young, and young-at-heart audiences in more than 180 countries.

About the Project

We’re looking for a world-class senior engineer with strong computer science fundamentals who can quickly pick up new technologies and has a true passion for working on a fast-growing, consumer-facing product that reaches more than 79 million active users.

The project is a free online television service streaming more than 250 channels and over 150,000 hours of unique programming.

The platform is available on all mobile, web, and connected TV streaming devices.

The technical stack includes, but is not limited to: microservices, AWS, TypeScript/JavaScript, React with hooks, Redux and Saga, Node.js, Golang, and MongoDB.

About the Team

At Brightgrove, we work with a diverse and talented team of engineers and quality assurance specialists worldwide. Our team members work remotely or from our offices in Ukraine, Romania, Poland, and Costa Rica.

The client team consists of more than 300 IT specialists, and as a team member, you'll have the opportunity to collaborate closely with them. Our Agile working environment ensures we use modern tools and technologies to build and deploy code quickly and efficiently. We expect you to work closely with product owners, QA, and backend teams to deliver the best-in-class user experience and entertainment to millions of users worldwide.

Responsibilities

This critical Platform Reliability Engineering role covers system development, cloud resource management, configuration management, troubleshooting, preventive and corrective maintenance, performance monitoring, and enhancement of our cloud and hosted large-scale consumer video service.

- Analyze and improve system design to reduce failure modes and promote self-healing systems

- Develop reliability tools and frameworks for use by all engineers

- Work with development partners to shape the architecture, design, and implementations of new and existing systems to enhance their reliability, performance, efficiency, and scalability

- Ability to work both independently and as part of a geographically dispersed yet integrated team

- Ability to balance multiple priorities in a fast-paced environment; demonstrable experience supporting large-scale projects

- Ability to identify measures or indicators of application performance and the actions needed to improve or correct application performance

- Ability to deal with ambiguity, uncertainty, and incomplete information when evaluating alternatives and making recommendations

- Ability to work seamlessly within a team as well as manage individual tasks

- Build and maintain the necessary observability pipelines and resources: logging, monitoring, distributed tracing, alerting, and offline test tools

- Respond to emerging incidents, solve critical issues, and follow through with a plan for resolution or future mitigation

- Act as a subject-matter expert (SME) on the Engineering Operations team, partnering with backend service and application teams to overcome challenges across all the platforms where we stream our service

Candidate Requirements

To be considered for the Principal Platform Reliability Engineer position, you should meet the following requirements:

- 7+ years of experience in software development

- Degree in Computer Science or related field, or equivalent work experience

- Experience building service-oriented APIs and cloud services, preferably on AWS

- Experience building and deploying microservices on AWS

- Proven track record of engineering and coding excellence, with solid data structure knowledge and the ability to write high-performance production-quality code

- Good programming skills in at least one language, such as Golang or JavaScript, and the ability to quickly learn new ones

- Experience in the Linux environment and a good understanding of its fundamentals and internals, including filesystems, modern memory management, threads and processes, and the user/kernel-space divide

- Understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems

- Grit, drive, and a deep sense of ownership

- Working knowledge of the TCP/IP stack, internet routing, and load balancing

- Nice to have: experience with Datadog

The job ad is no longer active
Job unpublished on 9 April 2023


English level: Upper-Intermediate or higher (Pre-Intermediate considered)
Experience: 5+ years
Office or Remote
Countries where we consider candidates: EU

Tech Leadership
SRE

Domain: Media
Outstaff
Office: Costa Rica, Poland, Romania, Ukraine (Kyiv, Lviv, Kharkiv)
