The project is a fault-detection and alerting system that serves the network of one of the biggest cable internet providers in the United States.
Technologies: Java, Big Data, Spring, AWS, Kafka, Spark, Scala
We collect various data (logs, metrics, statistics) from a huge number of network routers and process it in real-time streaming mode.
Currently our input flow is more than 350 thousand messages per second.
The whole project is logically split into two parts:
1) A set of Apache Spark applications that ingest the initial data flow and perform filtering and enrichment. The code is written in Scala (we strive for a functional programming approach). We use Amazon Kinesis and Apache Kafka as input message buses, and we run on Amazon EMR.
2) A set of Java-based applications that take messages from the Spark part, analyze them to detect network anomalies, and automatically create incidents for network engineers. These Java applications run on Spring Boot and use Spring Integration to build message-processing chains.
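To give a feel for the kind of processing chain described above, here is a minimal sketch in plain Java (Java 8 compatible) of a filter, enrich, detect pipeline. It is only an illustration under stated assumptions: the class and field names, the region lookup, and the fixed threshold rule are all hypothetical, and a Java stream stands in for the Spring Integration chains the real applications use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch: every class, field, and threshold here is an illustrative assumption.
final class PipelineSketch {

    // A raw metric reading from a router (assumed shape).
    static final class RouterMetric {
        final String routerId;
        final String name;
        final double value;

        RouterMetric(String routerId, String name, double value) {
            this.routerId = routerId;
            this.name = name;
            this.value = value;
        }
    }

    // An incident handed over to network engineers (assumed shape).
    static final class Incident {
        final String routerId;
        final String region;
        final String description;

        Incident(String routerId, String region, String description) {
            this.routerId = routerId;
            this.region = region;
            this.description = description;
        }
    }

    // Filtering: drop readings without a router id or with a non-finite value.
    static boolean isValid(RouterMetric m) {
        return m.routerId != null && !m.routerId.trim().isEmpty() && Double.isFinite(m.value);
    }

    // Enrichment: look up the router's region; "unknown" when the router is not in the map.
    static String regionOf(RouterMetric m, Map<String, String> regionByRouter) {
        return regionByRouter.getOrDefault(m.routerId, "unknown");
    }

    // Detection: a plain threshold check standing in for real anomaly detection.
    static Optional<Incident> detect(RouterMetric m, String region, double threshold) {
        if (m.value <= threshold) {
            return Optional.empty();
        }
        String description = m.name + "=" + m.value + " exceeds " + threshold;
        return Optional.of(new Incident(m.routerId, region, description));
    }

    // The whole chain: filter, enrich, detect, and keep only the incidents that were raised.
    static List<Incident> process(List<RouterMetric> input,
                                  Map<String, String> regionByRouter,
                                  double threshold) {
        return input.stream()
                .filter(PipelineSketch::isValid)
                .map(m -> detect(m, regionOf(m, regionByRouter), threshold))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> regions = new HashMap<>();
        regions.put("r1", "us-east");

        List<RouterMetric> input = Arrays.asList(
                new RouterMetric("r1", "packet_loss_pct", 12.5),
                new RouterMetric("r2", "packet_loss_pct", 0.3),
                new RouterMetric("", "packet_loss_pct", 99.0));

        for (Incident i : process(input, regions, 5.0)) {
            System.out.println(i.routerId + " [" + i.region + "]: " + i.description);
        }
    }
}
```

Keeping each stage a small pure function makes the chain easy to test in isolation, which is also the reason the Scala part leans toward a functional style.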
We are very flexible in choosing technologies, frameworks and libraries for our solutions.
Currently we use:
Amazon Kinesis and Apache Kafka for transporting massive message flows
Amazon SQS for transactional messaging
InfluxDB for storing time series data and calculating metrics on it
Amazon RDS (PostgreSQL and MySQL) for transactional data
Apache Spark for both batch and streaming (we use structured streaming) processing
Elasticsearch for information that requires text-based search
Hazelcast as a distributed cache
Grafana + Elasticsearch and Amazon CloudWatch for metrics
Amazon EMR to run our Spark jobs
Docker, ECS, EC2, and Ansible to run Java applications, etc.
Requirements:
· Strong knowledge of Java 8+
· Ability to work with any of the message queues (JMS, RabbitMQ, Amazon SQS, etc.)
· Basic relational database skills
· Basic knowledge of the Spring framework
· Understanding the principles of parallel data processing
· Understanding the principles of developing high-load applications
· Basic Linux skills
· Willingness and desire to learn Scala and Apache Spark
Would be a plus:
· Experience with Hazelcast, Elasticsearch, Redis
· Experience with NoSQL databases
· Experience with AWS
· Experience developing message stream-processing systems with Apache Spark, Apache Flink, or Apache Storm
· Knowledge of Scala and Big Data
About The Product Engine
Founded in 2001 and located in the heart of Silicon Valley, The Product Engine provides end-to-end consulting and software development services.
Our mission is to be a highly dependable, single-source digital solutions provider to a wide range of customers and organizations.
The Product Engine strives to deliver the most suitable and intelligent software & technology solutions on demand. Our goal is to be a leading force in the successful development and implementation of innovative products and services that fully satisfy the evolving goals of our customers and their businesses.
With our headquarters and project managers in Silicon Valley and our offshore development centers in Ukraine (Odessa), The Product Engine delivers the best of offshore IT talent, using intensive project management, efficient and reliable communication practices, sound risk management, and an unwavering commitment to quality.
The Product Engine delivers high-quality software applications and solutions on time and on budget in a variety of fields. From designing sophisticated software architectures for data-intensive applications to building intricate websites, we have the flexibility to take on tasks either by complementing your existing engineering team or by functioning as your complete virtual engineering department.