Lead Observability Engineer (IRC259837)
Description
The Generative AI Platform project leverages cutting-edge AI and ML technologies to develop and deploy cloud-agnostic Gen AI applications on demand and enable users to create custom solutions for Sales, Marketing, Software Engineering, and Customer Service use cases. It provides a fully managed RAG workflow, including ingestion, retrieval, and augmentation of data, securely connected to many types of information consolidated and aggregated from multiple sources. The Platform’s ultimate goal is to deliver a user-friendly LLM as a Service (LLMaaS) model trained on internal company resources, which contributes to the adoption of AI among multiple consumers across the business.
Requirements
- Experience designing monitoring, alerting, and observability, and implementing standardization across an extensive modular architecture
- Experience designing and implementing observability solutions tailored to product requirements and industry standards
- Experience with tools such as Prometheus, Grafana, ELK (Elasticsearch, Logstash, Kibana), OpenTelemetry, and Fluent Bit.
- Hands-on experience querying and transforming large volumes of data collected with these tools.
- Familiarity with AWS, GCP, or Azure observability solutions (CloudWatch, Stackdriver, Azure Monitor) and, ideally, with on-prem implementations.
- Experience monitoring Kubernetes (K8s) workloads
- Ability to analyze logs, traces, and metrics to identify system performance issues.
Job responsibilities
Description:
Designing, implementing, and maintaining monitoring, logging, and tracing components of the product to ensure system reliability, performance, and availability.
Responsibilities:
- Develop and maintain data collection within the observability layer (end-to-end tracing, logging, and metric collection) to support dashboards, alerts, and reports for real-time system visibility.
- Design and maintain an infrastructure monitoring approach that fulfills product SLA/SLO requirements.
- Design the observability layer to collect the data needed for any product operation, including deployment, upgrades, and other day-2 operations, to provide visibility for product administrators.
- Design observability tooling to provide product performance insights for product configuration and improvements
- Optimize log aggregation, metric collection, and distributed tracing for scalable applications. Ensure a unified logging strategy across all product components.
- Collaborate with DevOps, SREs, and developers to enhance incident response and root cause analysis.
- Automate observability processes using Infrastructure-as-Code (IaC) tools.
What we offer
Culture of caring. At GlobalLogic, we prioritize a culture of caring. Across every region and department, at every level, we consistently put people first. From day one, you’ll experience an inclusive culture of acceptance and belonging, where you’ll have the chance to build meaningful connections with collaborative teammates, supportive managers, and compassionate leaders.
Learning and development. We are committed to your continuous learning and development. You’ll learn and grow daily in an environment with many opportunities to try new things, sharpen your skills, and advance your career at GlobalLogic. With our Career Navigator tool as just one example, GlobalLogic offers a rich array of programs, training curricula, and hands-on opportunities to grow personally and professionally.
Interesting & meaningful work. GlobalLogic is known for engineering impact for and with clients around the world. As part of our team, you’ll have the chance to work on projects that matter. Each is a unique opportunity to engage your curiosity and creative problem-solving skills as you help clients reimagine what’s possible and bring new solutions to market. In the process, you’ll have the privilege of working on some of the most cutting-edge and impactful solutions shaping the world today.
Balance and flexibility. We believe in the importance of balance and flexibility. With many functional career areas, roles, and work arrangements, you can explore ways of achieving the perfect balance between your work and life. Your life extends beyond the office, and we always do our best to help you integrate and balance the best of work and life, having fun along the way!
High-trust organization. We are a high-trust organization where integrity is key. By joining GlobalLogic, you’re placing your trust in a safe, reliable, and ethical global company. Integrity and trust are cornerstones of our value proposition to our employees and clients. You will find truthfulness, candor, and integrity in everything we do.
About GlobalLogic
GlobalLogic, a Hitachi Group Company, is a trusted digital engineering partner to the world’s largest and most forward-thinking companies. Since 2000, we’ve been at the forefront of the digital revolution – helping create some of the most innovative and widely used digital products and experiences. Today we continue to collaborate with clients in transforming businesses and redefining industries through intelligent products, platforms, and services.