Software Engineer in Data (Senior, Rust, Arrow, Datafusion, Flink, Spark), $6000-10000
Kamu.dev is developing a new-generation decentralized data lake and a global data processing network based on Streaming SQL.
Think of us as a "decentralized Snowflake" or a "GitHub for data pipelines" - a network of data flows that can cross company boundaries and are collectively maintained by multiple parties.
Our focus:
- Privacy-preserving data sharing between companies
- Collaborative data processing
- 100% verifiability, provenance, and accountability of all data through cryptography.
This is your opportunity to join an ambitious early-stage startup that has already secured funding and to work, from a position of relative financial stability, on technology that will shape the future of data science.
Required skills:
- BSc in CS or equivalent experience
- 6+ years of industry experience
- Mastery of one of the languages: Rust, C++, Java, or Scala
- Deep experience in one of the following: Apache Arrow, Apache Flink, Apache Spark, Kafka SQL/Streams
- Strong knowledge of SQL and database internals
- Knowledge of modern data lake architectures and horizontal scaling
- Good written English and the ability to write clear documentation
Desired skills:
- Stateful stream processing
- Data integration systems and patterns
- Data science toolkits (Pandas, R)
- Software quality (test pyramid, CI/CD)
- Structured data formats (Parquet, Arrow)
- CDC, Event sourcing
- Docker, AWS, Kubernetes
- Data visualization (PowerBI, Tableau, Jupyter)
- Development methodologies (Agile, Scrum)
- Open source collaboration
- Blockchain indexing and analytics (Dune)
- Decentralized storage (IPFS)
Responsibilities:
You will be working on the core technologies that power our network and platform:
- A stream-oriented data format for structured dynamic data that can work with conventional (S3, GCS) and decentralized (IPFS, Arweave) storage
- A metadata format that serves as a passport of data and describes every event that influenced it
- A protocol for 100% verifiable, reproducible, and auditable multi-party data processing
- A fleet of plug-in data processing engines including Flink, Spark, Datafusion, Arroyo
- And an infrastructure that turns this technology into a novel decentralized and near real-time data lake!
Core technology stack:
- Rust
- Parquet, Apache Arrow
- Streaming (temporal) SQL
- Apache Spark, Flink, Datafusion
- IPLD, IPFS, Filecoin
- Ethereum blockchain
Your work will include:
- Evolving the core data formats and protocols
- Improving the existing data engines and integrating new ones
- Building an efficient distributed processing infrastructure for running data pipelines and API queries
- Designing APIs for data ingress and egress
- Building a federated data sharing and compute network
- Integrating Kamu with 3rd-party data providers and consumers
- Integrating Kamu with blockchain decoding/indexing technologies
- Research and implementation of features such as privacy-preserving compute, fine-grained provenance, and AI/ML integration with Kamu data pipelines
- Communicating your progress to users and the community
- Contributing to the product documentation and automated testing