Data Scientist Offline

The project involves the collection, visualization, and analysis (including real-time analysis) of data obtained from the evaluation of production and test wells.

Responsibilities:

- Collecting statistics from large data sets (Spark, Data Analysis)

- Building predictive models based on oil industry data (Spark, Machine Learning)

- Analyzing data to support business goals such as cost and risk reduction (Spark, Data Analysis, Data Mining)

- Performing distributed data computation (Spark)

Mandatory Skills:

C++/Python

Apache Spark

Data Engineering

SQL

Nice-to-Have Skills:

Data Warehousing, Data Lakes

Understanding of fundamental computer science algorithms (sorting, binary search, statistics, etc.) and data structures (array, linked list, stack, queue, tree, heap, hash table, graph)

Knowledge of object-oriented programming principles (encapsulation, inheritance, polymorphism, SOLID principles)

Knowledge of common signal processing and filtering algorithms (FFT, bandpass filter, median filter, smoothing, etc.)

Expertise in optimizing algorithm performance

Expertise in parallelizing computational algorithms (Python multiprocessing, C/C++ multithreading)

Expertise in distributed parallel computation frameworks (e.g., PySpark)

Experience embedding Python programs in C/C++ code

Understanding of Python object serialization (pickle files)

Languages:

English: B2 (Upper Intermediate)