Software Engineer in Data (Senior, Rust, Arrow, Datafusion, Flink, Spark) $6000-10000

Kamu.dev is developing a new-generation decentralized data lake and a global data processing network based on Streaming SQL.


Think of us as a "decentralized Snowflake" or a "GitHub for data pipelines" - a network of data flows that can cross company boundaries and are collectively maintained by multiple parties.


Our focus:

- Privacy-preserving data sharing between companies

- Collaborative data processing

- 100% verifiability, provenance, and accountability of all data through cryptography (a toy sketch of the idea follows this list)
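
To make the verifiability point concrete: conceptually, every event that touches a dataset is recorded in a tamper-evident chain of metadata blocks, each committing to its predecessor by cryptographic hash. The Rust toy below illustrates the idea only; the struct layout and event strings are invented for this sketch and are not Kamu's actual metadata format.

```rust
// Toy illustration of hash-chained metadata; all names here are invented.
use sha2::{Digest, Sha256}; // sha2 = "0.10"

struct MetadataBlock {
    prev_hash: [u8; 32], // hash of the previous block; zeroed for the first block
    event: String,       // e.g. "data slice appended" or "schema changed"
}

impl MetadataBlock {
    fn hash(&self) -> [u8; 32] {
        let mut h = Sha256::new();
        h.update(self.prev_hash);
        h.update(self.event.as_bytes());
        h.finalize().into()
    }
}

fn main() {
    let genesis = MetadataBlock { prev_hash: [0u8; 32], event: "schema set".into() };
    let next = MetadataBlock { prev_hash: genesis.hash(), event: "data slice appended".into() };
    // Anyone can recompute the chain; rewriting any past event changes every
    // later hash, so tampering is detectable.
    println!("head = {:x?}", next.hash());
}
```

Because each block pins its entire history, provenance and accountability reduce to verifying a hash chain, much like Git commits.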


This is your opportunity to join an ambitious early-stage startup that has already secured funding, and to work on technology that will shape the future of data science, all from a place of relative financial stability.


Required skills:

- BSc in CS or equivalent experience

- 6+ years of industry experience

- Mastery of one of the languages: Rust, C++, Java, or Scala

- Deep experience with one of the following: Apache Arrow, Apache Flink, Apache Spark, or Kafka Streams/ksqlDB

- Strong knowledge of SQL and database internals

- Working knowledge of modern data lake architecture and horizontal scaling

- Good written English and the ability to write clear documentation


Desired skills:

- Stateful stream processing

- Data integration systems and patterns

- Data science toolkits (Pandas, R)

- Software quality (test pyramid, CI/CD)

- Structured data formats (Parquet, Arrow)

- CDC, Event sourcing

- Docker, AWS, Kubernetes

- Data visualization (PowerBI, Tableau, Jupyter)

- Development methodologies (Agile, Scrum)

- Open source collaboration

- Blockchain indexing and analytics (Dune)

- Decentralized storage (IPFS)


Responsibilities:

You will be working on the core technologies that power our network and platform:

- A stream-oriented data format for structured dynamic data that can work with conventional (S3, GCS) and decentralized (IPFS, Arweave) storage

- A metadata format that serves as a passport of data and describes every event that influenced it

- A protocol for 100% verifiable, reproducible, and auditable multi-party data processing

- A fleet of plug-in data processing engines, including Flink, Spark, Datafusion, and Arroyo (see the sketch after this list)

- And an infrastructure that turns this technology into a novel decentralized and near real-time data lake!
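
To give a feel for the plug-in engine architecture mentioned above: each engine sits behind a common, deterministic interface, so the same transformation can be executed - and later re-executed for verification - by whichever engine a pipeline declares. The trait below is a hypothetical sketch; names and signatures are illustrative, not Kamu's actual interfaces.

```rust
// Hypothetical engine abstraction; all names here are illustrative.
use std::error::Error;
use std::path::{Path, PathBuf};

/// One pluggable engine (e.g. Flink, Spark, Datafusion, Arroyo) that can
/// deterministically apply a streaming-SQL transformation to an input slice.
trait Engine {
    fn name(&self) -> &str;

    /// Execute `query` over the data at `input` and return the output path.
    /// Determinism is what makes results reproducible and thus verifiable.
    fn execute(&self, query: &str, input: &Path) -> Result<PathBuf, Box<dyn Error>>;
}
```

Keeping engines behind a narrow, deterministic boundary is what lets a third party re-run a processing step and compare the outputs.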


Core technology stack:

- Rust

- Parquet, Apache Arrow

- Streaming (temporal) SQL

- Apache Spark, Flink, Datafusion (see the example after this list)

- IPLD, IPFS, Filecoin

- Ethereum blockchain
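
As a taste of the Rust side of this stack, here is a minimal batch DataFusion example that queries a Parquet file with SQL. The file name and query are placeholders, and it assumes a recent DataFusion release plus the tokio runtime.

```rust
// Minimal DataFusion sketch; "events.parquet" and the query are placeholders.
// Cargo deps: datafusion, tokio (features "rt-multi-thread" and "macros").
use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Expose a Parquet file as a SQL table.
    ctx.register_parquet("events", "events.parquet", ParquetReadOptions::default())
        .await?;
    let df = ctx
        .sql("SELECT city, COUNT(*) AS cnt FROM events GROUP BY city")
        .await?;
    df.show().await?; // Pretty-print the resulting Arrow record batches
    Ok(())
}
```

Kamu's actual pipelines run streaming (temporal) SQL over continuously evolving datasets, which plain batch SQL like this does not capture.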


Your work will include:

- Evolving the core data formats and protocols

- Improving the existing data engines and integrating new ones

- Building an efficient distributed processing infrastructure for running data pipelines and API queries

- Designing data access APIs for ingress and egress of data

- Building a federated data sharing and compute network

- Integrating Kamu with 3rd-party data providers and consumers

- Integrating Kamu with blockchain decoding/indexing technologies

- Researching and implementing features such as privacy-preserving compute, fine-grained provenance, and AI/ML integration with Kamu data pipelines

- Communicating your progress to users and the community

- Contributing to the product documentation and automated testing
