Lead Research Systems Engineer

Columbia University in the City of New York

Manhattanville, NY

ID: 7089699
Posted: November 24, 2021
Application Deadline: Open Until Filled

Job Description

Position Summary

Reporting to the Manager, High Performance Computing, the Lead Research Systems Engineer participates in the design, development, implementation, and operations of Columbia’s portfolio of high performance computing services. The position collaborates with other CUIT technical teams and Columbia researchers to support research computing resources, including but not limited to high performance computing (HPC) clusters, ensuring user requirements are met and planning ongoing improvements and modifications to these systems and services.

Responsibilities

Takes a lead role in the planning and design of research computing services.
Investigates new and emerging technologies, evaluating usefulness to Columbia researchers and making recommendations for future services.
Interacts with Columbia researchers on various topics, including (but not limited to) the use of existing services, service policies, and research requirements.
Takes a lead role in HPC system troubleshooting, including coordinating with users, vendors and other CUIT departments to resolve system problems.
Manages storage systems.
Resolves incidents and service requests.
Administers systems in the research computing infrastructure, including the installation and management of configuration, monitoring and notification tools, as well as basic network administration.
Assists in the creation and maintenance of user documentation.
Interacts with vendors, assessing products and making purchasing recommendations.
All other duties as assigned.

Minimum Qualifications

Bachelor’s degree or equivalent required. Advanced degree desirable.
Minimum 5-7 years’ related experience.
5 years’ Linux/Unix experience.
Prior experience in programming, software development, or system administration.
Excellent written and verbal communication skills.
Demonstrated ability to work in a fast-paced, deadline-driven environment.
Demonstrated excellence in a variety of competencies including teamwork/collaboration, analytical thinking, communication and influencing skills, and technical expertise.
Ability to work with changing priorities and with multiple projects.
Ability to be precise and attentive to detail is essential.
Ability to work with minimal supervision.
Ability to work weekends and off-hours on occasion.

Preferred Qualifications

Experience with Linux system administration, particularly Red Hat (7, 8).
Familiarity with Google Cloud Platform or other cloud services.
Knowledge of GPFS, Lustre, NFS and other network or parallel file systems.
Experience with shell scripting and Python.
Experience with version control systems, such as Git.
Familiarity with HPC programming technologies (such as MPI, OpenMP, or CUDA).
Familiarity with other HPC technologies (such as InfiniBand or GPUs).
Familiarity with machine learning software such as TensorFlow.
Familiarity with JupyterHub.
Familiarity with standard programming languages (such as C, C++, Fortran, or Java).
Knowledge of TCP/IP.
Familiarity with statistical tools (such as R) or mathematical tools (such as MATLAB).
Knowledge of technology, applications and interfaces designed to support research.