Senior Data Engineer (US-Based Product, Real-Time Data Platform)
About the Product
We are building a US-based, data-driven product with a strong focus on scalability, performance, and cost efficiency.
Our mission is to design a modern data platform that transforms raw behavioral and monetization data into reliable, actionable business insights — in near real-time.
For us, data engineering is not just about moving data.
It’s about:
- Designing resilient architecture
- Optimizing for performance and cost
- Building reliable automation
- Ensuring architectural integrity at scale
Role Overview
We are looking for a Senior Data Engineer who will take ownership of the data platform architecture and drive technical excellence across ingestion, modeling, and performance optimization.
This role requires deep expertise in SQL, Python, AWS infrastructure, and modern data stack principles. You will not only build pipelines — you will define standards, lead architectural decisions, and proactively improve system efficiency.
You will play a critical role in ensuring that data flows seamlessly from event streams to business-ready datasets while maintaining high performance, reliability, and cost control.
What Makes This Role Senior-Level
As a Senior Data Engineer, you will:
- Own architectural decisions for the data platform
- Identify scalability bottlenecks before they become incidents
- Optimize data infrastructure for performance and cost
- Lead technical code reviews and set engineering standards
- Mentor mid-level engineers
- Act as a technical partner to Product and Analytics stakeholders
- Strategically balance real-time and batch processing
Technical Requirements
Must-Have
Expert-Level SQL
- Complex analytical queries and window functions
- Query optimization and execution plan analysis
- Identifying and eliminating performance bottlenecks
- Reducing query complexity and compute costs
- Designing partitioning and clustering strategies (see the query sketch after this list)
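As a rough illustration of the query-level work these bullets cover, the sketch below runs a partition-pruned window-function query through the BigQuery Python client. It assumes BigQuery (listed under nice-to-have) as the warehouse; the project, dataset, table, and column names are hypothetical placeholders, not a description of any actual schema.

```python
# Minimal sketch: a windowed, partition-pruned query executed via the
# BigQuery Python client. All identifiers below are illustrative only.
import datetime
from google.cloud import bigquery

client = bigquery.Client()  # relies on default application credentials

SQL = """
SELECT
  user_id,
  event_ts,
  -- window function: rank each user's events within the scanned range
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) AS event_rank
FROM `my_project.analytics.events`
WHERE event_date BETWEEN @start_date AND @end_date  -- prunes partitions, cuts scanned bytes
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("start_date", "DATE", datetime.date(2024, 1, 1)),
        bigquery.ScalarQueryParameter("end_date", "DATE", datetime.date(2024, 1, 7)),
    ]
)

job = client.query(SQL, job_config=job_config)
job.result()
# scanned bytes are one input to query cost monitoring
print(job.total_bytes_processed)
```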
Python
- Advanced data manipulation
- Building scalable ETL/ELT frameworks
- Writing production-grade data services
- Automation and monitoring scripts
AWS Core Infrastructure
- Amazon Kinesis Data Firehose (near-real-time data streaming)
- Amazon S3 (data lake architecture and storage optimization)
- Designing reliable ingestion layers (see the ingestion sketch after this list)
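A minimal sketch of the ingestion layer described above: batching JSON events into a Kinesis Data Firehose delivery stream (which buffers and delivers to S3) via boto3. The stream name, region, and event shape are assumptions for the example; production code would add retries for failed records, schema validation, and monitoring.

```python
# Minimal ingestion sketch: push a batch of JSON events to a Firehose
# delivery stream. Stream name and event fields are hypothetical.
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

def put_events(events, stream_name="behavioral-events"):
    """Send a batch of events; Firehose buffers and delivers them to S3."""
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]
    # Firehose accepts up to 500 records per PutRecordBatch call
    response = firehose.put_record_batch(
        DeliveryStreamName=stream_name,
        Records=records,
    )
    # FailedPutCount > 0 means some records should be retried
    return response["FailedPutCount"]

if __name__ == "__main__":
    failed = put_events([{"event": "page_view", "user_id": "u-123"}])
    print(f"failed records: {failed}")
```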
Version Control
- Git (GitHub / GitLab)
- Branching strategies
- Leading technical code reviews
- Enforcing best practices in code quality
Nice-to-Have
Modern Data Stack
- dbt (modular SQL modeling, documentation, testing)
- Experience structuring layered data models (staging → intermediate → marts)
Data Warehousing
- Google BigQuery
- Slot management
- Cost-efficient querying
- Storage and compute optimization
Advanced Optimization Techniques
- Partitioning
- Clustering
- Bucketing
- Storage layout optimization
Integrations & Infrastructure
- Salesforce data integration
- Docker / ECS
- CI/CD for data workflows
AI / ML Exposure
- Supporting feature pipelines
- Understanding data requirements for ML systems
Key Responsibilities
Data Platform Architecture
- Design and maintain a scalable real-time and batch data platform
- Architect ingestion pipelines using AWS Kinesis and Python
- Ensure high availability and reliability of data flows
Real-Time Processing
- Enable near-real-time (seconds–minutes latency) data processing
- Build systems for operational alerting and anomaly detection
- Ensure early detection of monetization and traffic issues (see the alerting sketch below)
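A deliberately simplified sketch of the kind of threshold-based check this implies is shown below. The metric, baseline window, and drop ratio are illustrative assumptions, not the actual detection logic or alerting stack.

```python
# Minimal alerting sketch: flag a sharp drop in a monetization metric
# versus its recent per-minute baseline. Numbers are illustrative only.
from statistics import mean

def impressions_dropped(recent: list[int], current: int, drop_ratio: float = 0.5) -> bool:
    """Return True if the current minute falls below drop_ratio * recent baseline."""
    baseline = mean(recent) if recent else 0
    return baseline > 0 and current < baseline * drop_ratio

if __name__ == "__main__":
    last_minutes = [1200, 1150, 1300, 1250, 1180]  # recent impression counts
    if impressions_dropped(last_minutes, current=400):
        print("ALERT: ad impressions dropped more than 50% vs baseline")
```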
Data Modeling
- Transform raw event data into business-ready datasets using dbt
- Design scalable, maintainable schemas aligned with product evolution
Performance & Cost Engineering
- Optimize SQL queries and storage structures
- Design cost-efficient partitioning strategies
- Monitor and reduce warehouse and infrastructure costs
- Balance real-time and batch processing appropriately
Engineering Excellence
- Lead and participate in code reviews
- Enforce high standards of performance, security, and maintainability
- Improve observability and monitoring across pipelines
Cross-Functional Collaboration
- Work closely with Data Analysts and Product Managers
- Translate business requirements into scalable technical solutions
- Clearly communicate trade-offs between speed, cost, and complexity
Types of Data We Process
- User behavior events (page views, clicks, searches, conversions)
- Ad & monetization events (impressions, clicks, CTR, attribution)
- System and integration logs (latency, errors, rate limits)
Why Real-Time Is Critical
- Detect broken ads or impression drops before revenue is lost
- Identify traffic anomalies or abuse early
- Enable same-day operational intervention
- Prevent negative user and advertiser experience
Near-real-time (seconds to minutes latency) is required for operational awareness.
Batch processing remains important for historical analysis and reporting — but not for incident detection.
Working Schedule
- Monday – Friday
- 16:00 – 00:00 Kyiv time
- Full alignment with US-based stakeholders
What We Value
- Strong ownership mindset
- Strategic thinking about architecture
- Focus on scalability, reliability, and cost efficiency
- Proactive problem-solving
- Clear communication with both technical and non-technical teams
- Ability to think beyond “just making it work”
Required Languages
| Language | Level |
| --- | --- |
| English | B2 (Upper-Intermediate) |