Systems Engineer II

University of Texas at Dallas

Dallas, TX

ID: 7140026
Posted: April 14, 2023
Application Deadline: Open Until Filled

Job Description

Job Summary
The HPC Systems Engineer II is responsible for provisioning, deploying, administering, monitoring, maintaining, troubleshooting, upgrading, and patching HPC/HPN and CI resources and services. The role also assists the HPC Systems Engineer I with learning and development.
Minimum Education and Experience
No degree with 6 years of recent applicable experience
Associate degree with 4 years of applicable experience
Bachelor’s degree with 2 years of applicable experience
Preferred Education and Experience
BS degree with at least 3 years of experience working in IT HPC.
Working knowledge of CentOS, Red Hat Enterprise Linux, and RHEL derivatives.
Familiarity with Linux administration in a virtual environment.
Familiarity with the architecture, provisioning, and maintenance of Linux-based HPC systems.
Ability to work with PIs and researchers to understand their needs and provide solutions.
Networking experience with both conventional and High Performance Networking (HPN) solutions.
Troubleshooting methodology for systems, networking, and OpenHPC-related systems.
Knowledge of OS and third-party application vulnerabilities and patching.
Good communication and problem-solving skills.
Essential Duties and Responsibilities

Provide cyber infrastructure technical support to departmental and end users.
Setup/usage/documentation/training of OIT-CI Request Tracker ticketing system
Installation, configuration, and application of software and hardware.
Creation of software modules in the OpenHPC stack
Install/remove/maintain server hardware (hard drives, CPU, memory, PCIe cards, etc.)
Preparation of hardware for disposal in UTD surplus department
Setup/maintenance of Splunk forwarding clients on OIT-CI and NSM systems
Setup/maintenance of Telegraf/InfluxDB/Grafana (TIG) stack on OIT-CI and NSM systems
Setup/maintenance of RAID arrays on servers (via LSI, HP, or Dell RAID controllers)
Document processes, procedures, system configurations and specifications.
Perform installation, configuration, updating, and performance monitoring and troubleshooting of servers.
Server firmware: installation/updating/troubleshooting
Administer various Linux- or UNIX-based applications.
Develop, document and maintain various related shell scripts.
Conduct research into technology problems and have direct interaction with technology vendors.
Create, modify, and update technical and process documentation. Share documentation with the CI Team.
Assist researchers with computational issues.
Assist entire OIT CI team as a critical team member.
Other duties as assigned.
Work with vendors to build out and price research computing hardware
Determine scientific workloads for PI hardware purchases
Data center architecture and setup: power/cooling/space requirements
CPU architecture, specifically setting up memory population in servers to get optimal bandwidth
Networking: Ethernet, InfiniBand, Omni-Path cabling/setup/configuration
Networking: Dell Ethernet Switch setup/configuration
Storage: creation of software-defined file systems (Ceph, GPFS, BeeGFS, MooseFS)
Virtualization: Installation of Proxmox VE, setup of cluster, administration of virtual machines
Primary administration of HPC clusters: setup, maintenance, user assistance, user setup
Secondary administration of 1.2PB IBM GPFS cluster on primary campus cluster

Extended Duties and Responsibilities (CVL, CI, and NSM)

Configuring/quoting/purchasing of researcher workstations
Working with Red Hat licenses to install/update RHEL workstations
Maintain workstation hardware (drives, CPU, memory, etc.)
Software installation on NSM Linux workstations
Mac first-time setup: wipe/install/user addition
Mac setup: MS Active Directory binding
Mac administration: backup of user data on mobile workstations
Windows setup: OS imaging, Active Directory addition/binding, SCCM setup, software installation
Assist users with printing posters on the HP DesignJet printer (when applicable)
Working with the central IT department to spec out laptops for the CI team