Senior Big Data Engineer
Position Summary:
We are looking for a Sr. Data Engineer with a diverse background in data integration to join the Data Services team. Some of our data is small, some is very large (1 trillion+ rows); some is structured, some is not. It comes in all kinds of sizes, shapes, and formats, and lives in many systems: traditional RDBMSs like PostgreSQL, Oracle, and SQL Server; MPPs like StarRocks, Vertica, Snowflake, and Google BigQuery; and unstructured or key-value stores like MongoDB and Elasticsearch, to name a few.
We are looking for individuals who can design and solve any data problem using the different types of databases and technologies supported within our team. We use MPP databases to analyze billions of rows in seconds. We use Spark and Iceberg, in batch or streaming mode, to process whatever the data requires. We also use Trino to connect all these different types of data without moving them around.
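As a rough illustration of the federation idea above (one query joining data that lives in separate systems, without moving it first), here is a minimal stdlib-only Python sketch that uses SQLite's ATTACH as a loose analogy for Trino's catalogs. All table names, column names, and values are invented for the example.

```python
import os
import sqlite3
import tempfile

# "Source" 1: a customers table in its own database file,
# standing in for one external system.
fd, cust_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
cust = sqlite3.connect(cust_path)
cust.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cust.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(10, "Acme"), (11, "Globex")])
cust.commit()
cust.close()

# "Source" 2: an orders table in a separate, in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, 10, 99.5), (2, 10, 15.0), (3, 11, 42.0)])

# One query spans both databases in place, loosely like a Trino
# federated query joining tables from two different catalogs.
db.execute("ATTACH DATABASE ? AS cust", (cust_path,))
rows = db.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN cust.customers AS c ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 114.5), ('Globex', 42.0)]

db.close()
os.unlink(cust_path)
```

The analogy is deliberately loose: Trino federates live queries across heterogeneous connectors (Hive, PostgreSQL, Elasticsearch, and so on), while ATTACH only joins SQLite files, but the shape of the query is the same.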
Responsibilities:
1) Implement ETL/ELT processes using various tools and programming languages (Scala, Python) against our MPP databases (StarRocks, Vertica, and Snowflake)
2) As Data Engineers, we are not only developers: we also maintain and administer our MPP ecosystem, tuning and maximizing hardware potential at the OS, network, and storage levels
3) Work with the Hadoop team and optimize Hive and Iceberg tables
4) Run proofs of concept (POCs) comparing different table formats
5) Contribute to the existing Data Lake and Data Warehouse initiative using Hive, Spark, Iceberg, Presto/Trino
6) Analyze business requirements, design and implement required data models
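The ETL/ELT work in responsibility 1 can be sketched in a few lines of stdlib-only Python. This is a minimal illustration, not our pipeline: SQLite stands in for an MPP target such as StarRocks or Vertica, and the dataset, table, and column names are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract, standing in for a source-system export.
raw = """patient_id,visit_date,charge
101,2024-03-01,250.00
102,2024-03-02,
101,2024-03-05,125.50
"""

# Extract: parse the raw feed.
records = list(csv.DictReader(io.StringIO(raw)))

# Transform: drop records with missing charges, cast to proper types.
clean = [(int(r["patient_id"]), r["visit_date"], float(r["charge"]))
         for r in records if r["charge"]]

# Load: write into the warehouse table (SQLite here; an MPP in practice).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (patient_id INTEGER, visit_date TEXT, charge REAL)")
db.executemany("INSERT INTO visits VALUES (?, ?, ?)", clean)

total = db.execute("SELECT SUM(charge) FROM visits").fetchone()[0]
print(total)  # 375.5
```

In an ELT variant, the raw rows would be loaded first and the cleanup pushed down to the target database as SQL, which is often the better fit for MPP engines.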
Qualifications: (must have)
1) BA/BS in Computer Science or a related field
2) 2+ years of experience with MPP databases such as StarRocks, Vertica, Snowflake
3) 5+ years of experience with RDBMSs such as Oracle, MSSQL or PostgreSQL
4) 2+ years of experience managing or developing in the Hadoop ecosystem
5) Programming background in Scala, Python, Java or C/C++
6) Experience with Elasticsearch or the ELK stack
7) Working knowledge of streaming technologies such as Kafka
8) Strong in any of the major Linux distributions (RHEL, CentOS or Fedora)
9) Deep knowledge of shell scripting, scheduling, and monitoring processes on Linux
10) Experience working in both OLAP and OLTP environments
11) Experience working in on-prem environments, not just the cloud
Desired: (nice to have)
1) Working knowledge of data unification and setup using Presto/Trino
2) Working knowledge of orchestration tools such as Oozie and Airflow
3) Experience with Spark: PySpark, Spark SQL, Spark Streaming, etc.
4) Experience using ETL tools such as Informatica, Talend and/or Pentaho
5) Understanding of Healthcare data
6) Data Analyst or Business Intelligence experience would be a plus
About Mellivora Software
Mellivora Software helps SME and enterprise businesses build custom IT solutions for a range of industries. Our core expertise is focused on:
- Big Data and AWS technologies
- DevOps practices
- NLP/Machine Learning technologies
Company website:
https://www.mellivorasoft.com/
DOU company page:
https://jobs.dou.ua/companies/mellivora-software/
Job posted on
15 April 2024
130 views 19 applications