Senior Data Engineer


A leading organization in the media streaming industry focuses on delivering innovative solutions that enhance user engagement and content accessibility. The company caters to a diverse audience, including individual consumers and content creators, providing them with advanced tools for content production and distribution. With a strong emphasis on technology, the organization leverages cutting-edge AI and machine learning capabilities to optimize user experiences and operational efficiency. Their commitment to continuous improvement and adaptation positions them as a significant player in the evolving media landscape.

We are looking for an experienced Data Engineer to participate in the design and architecture of the client's data platform and GenAI initiatives. The primary focus of this role is building scalable distributed data processing systems that will serve as the foundation for the company's intelligent services.



Job Description

Required:

  • Bachelor’s/Master’s degree in Computer Science, Computer Engineering, Machine Learning, or a related field.
  • 4+ years of experience in software development, focusing on Python programming and data engineering.
  • Strong expertise in Data Lake architecture and modern data warehousing.
  • Hands-on experience with distributed computing systems (Apache Spark is a must).
  • Strong Python proficiency for building complex pipelines and internal tooling.
  • Good understanding of distributed systems principles, real-time data processing, and batch processing.
  • Experience in scaling data infrastructure (AWS/GCP cloud solutions).
  • Proven experience in designing and scaling vector databases (e.g., Pinecone, Milvus, Weaviate, pgvector), and in developing recommendation systems and retrieval-augmented generation (RAG) pipelines.
  • Ability to independently drive technical discussions, make informed architectural assumptions, and justify technology choices.
  • Ability to decompose complex "from scratch" tasks without detailed specifications.

Nice to have:

  • Intermediate knowledge of Airflow for orchestrating data processing workflows.
  • Generative AI: Experience with vector databases, RAG architectures, or integrating LLMs into data pipelines.
  • Experience designing and building middleware platforms, REST APIs, and distributed systems at scale.
  • Web Development: Basic API development skills to provide data access to other services.

Job Responsibilities

  • Data Platform: Designing and maintaining high-load Spark pipelines for processing terabytes of data.
  • GenAI Support: Building data infrastructure for training and operating generative models (ingestion, cleaning, vectorization).
  • Architectural Contribution: Active participation in technical brainstorming, developing data quality standards, and processing protocols.
  • Optimization: Identifying and resolving bottlenecks in current distributed processing systems.

Required languages

English B1 - Intermediate
Published 7 April