Are you an experienced Data Engineer ready to tackle complex, high-load, and data-intensive systems? We are looking for a Senior professional to join our team in Ukraine, Europe, working full-time on a project that will make a real impact in the public sector.
At Sigma Software, we specialize in delivering innovative solutions for enterprise clients and public organizations. In this role, you will contribute to building an integrated platform that collects, processes, and visualizes critical indicators, enabling better decision-making and analytics.
Why join us? You will work with a modern big data stack, have end-to-end involvement from ingestion to machine learning workflows, and be part of a professional team that values ownership, collaboration, and continuous improvement.
Project
You will be involved in developing an integrated platform that processes both batch and streaming data, ensures secure and governed data environments, and supports advanced analytics and machine learning workflows. The solution will leverage modern big data technologies to provide actionable insights for the public sector.
Responsibilities
- Design and implement data ingestion pipelines for batch and streaming data
- Configure and maintain data orchestration workflows (Airflow, NiFi) and CI/CD automation for data processes
- Design and organize data layers within Data Lake architecture (HDFS, Iceberg, S3)
- Build and maintain secure and governed data environments using Apache Ranger, Atlas, and SDX
- Develop SQL queries and optimize performance for analytical workloads in Hive/Impala
- Collaborate on data modeling for analytics and BI, ensuring clean schemas and dimensional models
- Support machine learning workflows using Spark MLlib or Cloudera Machine Learning (CML)
Requirements
- Proven experience in building and maintaining large-scale data pipelines (batch and streaming)
- Strong knowledge of data engineering fundamentals: ETL/ELT, data governance, data warehousing, Medallion architecture
- Strong SQL skills for serving data from a data warehouse
- Minimum 3 years of experience in Python or Scala for data processing
- Hands-on experience with Apache Spark, Kafka, Airflow, and distributed systems optimization
- Experience with Apache Ranger and Atlas for security and metadata management
- Upper-Intermediate English proficiency
Would be a plus
- Experience with Cloudera Data Platform (CDP)
- Advanced SQL skills and Hive/Impala query optimization
- BS in Computer Science or a related field
- Exposure to ML frameworks and predictive modeling
Personal profile
- Ownership mindset and proactive approach
- Ability to drive initiatives forward and suggest improvements
- Team player with shared responsibility for delivery speed, efficiency, and quality
- Excellent written and verbal communication skills