Senior Site Reliability Engineer
We are looking for a Senior Site Reliability Engineer (SRE) with strong infrastructure experience to help ensure platform stability and optimize backend systems written in Python. You will play a key role in keeping the company's SMS marketing platform fast, reliable, and scalable. This is a highly technical position at the intersection of backend engineering and infrastructure. You’ll work hands-on with a Python/Flask application, Linux servers, and the networking stack to make sure millions of SMS messages are delivered without delay or downtime.
This is a full-time remote role.
We are looking for a Senior Site Reliability Engineer who meets these requirements:
- 5+ years of experience as a Site Reliability Engineer, Systems Engineer, Infrastructure Engineer, Platform Engineer, Backend Systems Engineer, or in a similar role, ideally with a Python development background.
- Experience running and maintaining Python/Flask applications in production.
- Advanced Python development skills, particularly with Python libraries/frameworks.
- In-depth knowledge of Linux server administration (Debian/Ubuntu).
- Proficiency with network analysis tools: intercepting proxies, packet captures (Wireshark, mitmproxy, tcpdump, etc.).
- Familiarity with distributed systems, scaling strategies, and performance tuning.
- Strong understanding of monitoring and logging systems (e.g., Prometheus, Grafana, ELK, Datadog).
- Experience with version control (Git) and CI/CD workflows.
- Comfort with automation tools and scripting for infrastructure management.
- Excellent troubleshooting and analytical skills.
- Strong sense of ownership and accountability for uptime, stability, and performance.
Your responsibilities will include (but are not limited to):
- Maintain and optimize infrastructure: Manage Linux-based (Debian/Ubuntu) servers running Python/Flask applications, ensuring stability and performance.
- Ensure high uptime: Continuously monitor system health and proactively address bottlenecks or weak points to maximize reliability of SMS send-outs.
- Troubleshoot complex issues: Use intercepting proxies, packet captures, and diagnostic tools to identify, analyze, and resolve traffic or delivery issues.
- Optimize backend workflows: Work with Python/Flask and asynchronous frameworks to streamline message queuing, delivery, and scaling mechanisms.
- Implement monitoring and alerting: Set up dashboards, logs, and alerts that provide visibility into system health and performance.
- Automate infrastructure tasks: Build tools/scripts to reduce manual work and ensure consistency in deployments and optimizations.
- Own decision-making: Take initiative in addressing infrastructure needs and make competent technical decisions without requiring constant supervision.
Growth Opportunities/Perks:
- Endless growth opportunities, as the company is in a scale-up phase.
- Potential to move into a more senior R&D or leadership role.
- Flexible working schedule as long as deadlines and quality are met.
- Work alongside highly skilled developers in a unique and challenging industry.
- Performance bonuses as the company grows.
- Fully remote setup.
This Position Is Perfect For You If…
You’re a fast learner.
You won’t be expected to know everything from the start, but you’ll need to be motivated and quick to learn new tools, technologies, and patterns in a complex infrastructure environment.
You’re detail-oriented.
You notice flaws in systems before they become problems, and you enjoy digging into logs, metrics, or packet captures until you find the root cause.
You’re reliable under pressure.
When systems break, you don’t panic; you troubleshoot calmly, take action, and make the right call to stabilize the platform.
Required languages:
- English: C1 (Advanced)