At Kamu we are developing a novel decentralized data lake technology that, much like the invention of the SQL database 40 years ago, will write a new chapter in humanity’s transition towards a data economy.
This is your opportunity to join an ambitious early-stage startup that has already secured funding, and to work, from a place of relative financial stability, on a technology that will shape the future of data science.
- BSc in CS or equivalent experience
- 6+ years of industry experience
- Mastery of one of the following languages: Rust, C++, Java, or Scala
- Deep experience in one of the following: Apache Arrow, Apache Flink, Apache Spark, Kafka SQL/Streams
- Strong knowledge of SQL and database internals
- Modern data lake architecture and horizontal scaling
- Good written English skills, ability to write clear documentation
- Stateful stream processing
- Data integration systems and patterns
- Data science toolkits (Pandas, R)
- Software quality (test pyramid, CI/CD)
- Structured data formats (Parquet, Arrow)
- Change data capture (CDC), event sourcing
- Docker, AWS, Kubernetes
- Data visualization (PowerBI, Tableau, Jupyter)
- Development methodologies (Agile, Scrum)
- Open source collaboration
- Blockchain indexing and analytics (Dune, TrueBlocks)
- Decentralized storage (IPFS)
You will be working on the core technologies that serve our network and the platform:
- A stream-oriented data format for structured dynamic data that can work with conventional (S3, HDFS) and decentralized (IPFS, Arweave) storage
- A metadata format that serves as a passport of data and describes every event that influenced it
- A protocol for 100% verifiable, reproducible, and auditable multi-party data processing
- A fleet of plug-in data processing engines
- And an infrastructure that turns this technology into a novel decentralized and near real-time data lake!
Core technology stack:
- Apache Arrow
- Streaming (temporal) SQL
- Apache Spark, Flink, DataFusion
- IPLD, IPFS, Filecoin
- Ethereum blockchain
Your work will include:
- Evolving the core data formats and protocols
- Improving the existing data engines and integrating new ones
- Building an efficient distributed processing infrastructure for running data pipelines and API queries
- Designing data access APIs for ingress and egress of data
- Building a federated data sharing and compute network
- Integrating Kamu with 3rd-party data providers and consumers
- Integrating Kamu with blockchain decoding/indexing technologies
- Researching and implementing features such as privacy-preserving compute, fine-grained provenance, and AI/ML integration with Kamu data pipelines
- Communicating your progress to users and the community
- Contributing to the product documentation and automated testing
About Kamu Data
Kamu is a three-year-old startup backed by investors such as Protocol Labs (IPFS, Filecoin).
We are a fully distributed company with a presence in Canada (Vancouver), Ukraine, and Portugal.
We are building the world's first decentralized data lake and collaborative data processing network. Think "GitHub for data", where people build streaming pipelines with SQL that continuously process data from governments, industry, and blockchains into high-quality datasets ready for AI training and use in Smart Contracts, while data is 100% auditable and verifiable.
Our goal is to bring the effects of the Open Source Revolution to data.
Job posted on 19 November 2023