Cyber Harbor is a fast-growing Ukraine-based company founded by engineers and researchers who helped defend critical cyber infrastructure during the war. Our work is shaped by real-world experience, and we build AI-powered systems designed to handle complex, high-stakes environments.
We are currently expanding and looking for a strong Data Engineer to help us design and build scalable pipelines for processing large volumes of both structured and unstructured data. These pipelines power comprehensive, interactive knowledge bases used in real-world operations.
In this role, you will work closely with the core data processing workflows and collaborate daily with software engineers, AI/ML engineers, DevOps specialists, and the QA team. Because much of the work happens across different parts of the system, strong communication and the ability to collaborate effectively with other teams are essential.
What You Will Do
- Design and build ETL/ELT pipelines for large volumes of unstructured data using orchestrators such as Temporal, Prefect, Celery, or Airflow, along with distributed processing engines like Ray and Spark.
- Automate AI-related tasks including text extraction, summarization, OCR, named entity recognition, and embedding generation using various self-hosted models, including some developed in-house.
- Design data schemas and manage storage across multiple systems, including relational databases (PostgreSQL), NoSQL databases (Elasticsearch, MongoDB), vector search engines (Qdrant, Milvus), and graph databases (Neo4j).
- Set up and maintain message queues for distributed data processing using tools such as Kafka and RabbitMQ.
- Develop custom Python tools and logic to ensure high throughput, reliability, and fault tolerance in data pipelines.
- Containerize services with Docker and orchestrate deployments using Kubernetes, including on-premise environments.
- Build and maintain CI/CD pipelines to support reliable development and deployment workflows.
- Monitor infrastructure and pipeline health using Prometheus and Grafana, and manage centralized logging with the ELK Stack.
What You Need to Join Us
- Strong hands-on experience with SELF-HOSTED data engineering tools.
- Solid Python skills and the ability to build reliable, production-grade data workflows.
- Experience automating workflows that involve different AI models, including self-hosted ones.
- Good understanding of how modern search and RAG systems are designed and operate.
- Familiarity with messaging systems such as Apache Kafka and RabbitMQ.
- Strong understanding of relational databases (PostgreSQL) as well as other data systems such as Elasticsearch, Qdrant/Milvus, and Neo4j.
- Practical experience with Docker, Kubernetes, and CI/CD tools (GitHub Actions or GitLab CI).
- Familiarity with monitoring and observability tools such as Prometheus, Grafana, and the ELK Stack.
- Experience working with on-premise infrastructure environments.
Why Join Us
- Make a Real Impact: Our systems have already been validated in real-world, nation-state-level operations.
- Right Place, Right Time: Work at the intersection of AI, large-scale data processing, and cybersecurity.
- Military Deferment: Available for full-time employees.
- Flexible Schedule: Remote-friendly environment focused on results rather than rigid hours.