PulseRise Technologies

Systems Engineer (HPC) to $4000


We are seeking a highly skilled Systems Engineer specializing in High Performance Computing (HPC) to support, maintain, and optimize our HPC infrastructure. The ideal candidate has deep technical expertise, hands-on experience with HPC environments, and a strong understanding of performance engineering, systems operations, and automation.

 

Project Start: ASAP

Project Duration: Until December 2026

Location: Remote (with on‑site onboarding in Cologne)

English: Fluent

German: a plus

 

Responsibilities

Incident & Service Operations

Incident Management: Respond to, diagnose, and resolve HPC-related incidents to ensure system stability and minimize downtime.

Service Request Management: Process and fulfill service requests related to HPC resources, tooling, and services.

 

Technical Tasks

Troubleshooting: Investigate and resolve complex technical issues across HPC clusters, applications, networking, and performance workflows.

Testing & Validation: Develop, execute, and document test plans to validate system reliability, scalability, and performance.

Documentation: Create and maintain detailed documentation on system architecture, configurations, workflows, and optimizations.

Manage, monitor, and optimize HPC clusters, job scheduling systems, and related infrastructure.

Analyze performance bottlenecks and apply optimization techniques across compute, memory, and networking layers.

Support software development, integration, and deployment workflows within HPC environments.

 

Required Qualifications

Minimum 3 years of experience in software development and/or systems engineering with a strong focus on HPC environments.

Expertise in Linux operating systems, specifically Red Hat Enterprise Linux (RHEL).

Strong programming/scripting skills: C, C++, Python, Bash, Ansible

Hands-on experience with parallel computing frameworks: MPI, OpenMP, CUDA

Solid knowledge of computer architecture, performance tuning, and system optimization.

Experience managing HPC clusters, including job schedulers (e.g., Slurm, PBS, LSF).

Strong networking knowledge, particularly InfiniBand.

Understanding of ITIL best practices, especially Incident Management, Service Management, and Process Optimization.

 

Soft Skills

Strong analytical and problem-solving capabilities

Ability to work in distributed, remote teams

Clear communication and documentation skills

Proactive, structured, and solution-oriented mindset

Required languages

English C1 - Advanced
German B2 - Upper Intermediate
Published 2 April