Lead Big Data Engineer

- 5+ years of hands-on experience in the big data field: Hadoop, Spark, Spark Streaming, MapReduce.

- 3+ years of experience with relevant cloud data services (AWS, Azure, GCP) such as EMR, Glue, S3, Lambda, Fargate, DynamoDB, ADF, Azure Functions, Azure Blob Storage, Dataproc, Dataflow, and BigQuery.

- Knowledge of Databricks will be a significant benefit for the candidate.

- Excellent knowledge of and hands-on experience with SQL in the context of big data: Spark SQL, HiveQL.

- Ability to understand and optimize Spark execution plans via the Spark UI.

- Excellent knowledge of Python, Scala (including functional libraries such as Cats), or Java SE.

- Batch processing and ETL principles in data warehouses.

- Data completeness signals and orchestration.

- Approaches for historical reprocessing and data correction.

- Handling bad and late-arriving data in inputs and outputs.

- Schema migrations and dataset evolution.

- Strong spoken English (B2+ level).

- Ability to quickly learn the new set of tools and technologies used internally: Radar, platform services, telemetry providers, Spark-as-a-Service, the build system, and much more.

Will be a plus

- Understanding of functional programming ideas and principles.

- Experience in building and using web services.

- Experience with any of Teradata, Vertica, Oracle, or Tableau.

- Spark Streaming and Kafka.

- Experience with or knowledge of Apache Iceberg, Trino (Presto), Druid, Cassandra, and blob/object storage such as AWS S3.

- Understanding of or experience with Azkaban or Airflow.