Systems Architect

Colorado State University

Fort Collins, CO

ID: 7234429
Posted: April 3, 2024
Application Deadline: Open Until Filled

Job Description

Description of Work Unit
Colorado State University (CSU) is a Carnegie Research I institution located in Fort Collins, Colorado, 60 miles north of Denver and situated at the foothills of the Rocky Mountains. Enhanced by relationships with CSU, Fort Collins has become a thriving center for high-tech and in 2018 was rated one of Milken’s top five best performing cities.

The computer science department has over 900 undergraduate majors and 150 graduate students in our master’s and Ph.D. programs. The department has 21 tenure-track faculty with strong research programs in artificial intelligence, big data, bioinformatics, computer vision, networks, parallel and distributed computing, algorithms, security, and software engineering. More information is at https://compsci.colostate.edu.

The CSU computer science department systems support group is a collaborative, energetic, and hard-working team, with a reputation for providing strong systems and user support. The group will be composed of three full-time Systems Administrators and one half-time graduate student systems assistant.
Position Summary
This position designs, builds, and develops computing systems, distributed computing systems, clusters, services, and system architectures to support the research and teaching mission of a leading university academic computer science department. The position seeks and evaluates state-of-the-art technologies that advance the mission of the department, adopting and implementing relevant solutions. The systems and architectures support high performance computing, resource scheduling, virtualization, containerization, and DevOps, along with a variety of highly specialized requirements related to computer science research.
Required Job Qualifications
A Master’s degree in Computer Science.
At least three years of experience supporting high performance computing.
At least three years of experience developing computing infrastructure and services to support post-secondary computer science research and teaching.
The experience should demonstrate competence with the following:
Building and managing VMware/vCenter clusters.
Cluster management technologies.
Container management tools.
Designing, building, and administering HPC clusters and HPC storage technologies such as SAS, NVMe, and/or Fibre Channel.
CUDA support.
Familiarity with a large set of programming languages such as C++, Java, JavaScript, Bash, Csh, Python, Perl, etc.
Familiarity with a large set of operating systems such as Red Hat, Ubuntu, Windows, and macOS.
An ability to integrate and configure the full spectrum of computing hardware componentry.
Preferred Job Qualifications
Experience with the following:
GPU and CUDA programming.
Shared and distributed memory parallelism using OpenMP and MPI.
Industry standard automation tools such as Ansible or Terraform.
Red Hat, VMware, or Kubernetes certifications.
Diversity Statement
Reflecting departmental and institutional values, candidates are expected to have the ability to advance the Department’s commitment to diversity and inclusion.
Essential Duties
Job Duty Category: Consulting and Mentoring
Duty/Responsibility
Consult with CS researchers and other Department stakeholders to understand requirements, specify system hardware and software components, and design and build systems or architectures to meet specialized needs.
Consult with researchers to evaluate the suitability of various platforms for the development of systems or applications they plan to produce.
Assist CS researchers and other systems users with the development, enhancement, and troubleshooting of complex or specialized computing applications and software systems.
Advise systems users regarding methods, tools, and strategies for optimizing application performance on particular platforms.
Consult with and advise system users to solve a variety of complex problems, including problems with application or system performance, portability, and security.
Percentage Of Time: 25
Job Duty Category: High Performance Computing
Duty/Responsibility
Design, deploy, and maintain efficient High Performance Computing (HPC) clusters and distributed system architectures.
Qualify and integrate HPC components including CPUs, GPUs, RAM, storage, and interconnection devices such as buses and networks.
Implement and configure efficient and fair scheduling mechanisms for HPC resources including CPUs, GPUs, RAM, and storage, utilizing appropriate tools such as Slurm, Torque/PBS, etc. Develop optimal scheduling heuristics per site requirements.
Evaluate performance of alternative parallelization methods for particular applications.
Percentage Of Time: 25
Job Duty Category: Virtualization
Duty/Responsibility
Design, build, maintain, and develop a scalable, interoperable, and secure VMware/vCenter virtualization architecture for the Department.
Install and configure ESXi servers and physical networking.
Develop storage architecture using protocols and tools such as NFS, iSCSI, vSAN, etc.
Maintain VMware infrastructure health and stability. Identify system performance metrics. Monitor and optimize server, network, and storage performance. Monitor and verify security.
Design and implement a comprehensive disaster recovery system for virtualization infrastructure.
Provision, monitor, and troubleshoot virtual machines (VMs), virtual networking (both standard and distributed vSwitches), and guest operating systems.
Perform VM migrations across different virtualization platforms (VMware, VirtualBox, etc.).
Percentage Of Time: 20
Job Duty Category: Containers and Orchestration
Duty/Responsibility
Design and implement secure, highly available, portable, and scalable Kubernetes clusters.
Architect, implement, and maintain end-to-end Kubernetes/OpenShift infrastructure, including installation, patching, migration, application on-boarding, SSL certificate deployment, ingress controllers, load balancers, storage solutions, container registry, CI/CD tools, identity management tools, etc.
Identify container performance metrics. Configure monitoring tools such as Metrics Server, Prometheus, or Grafana on Kubernetes/OpenShift clusters, and employ these to optimize computation, networking, data storage, and load balancing resources per application requirements.
Develop persistent container storage architecture using tools such as NFS, GlusterFS, etc.
Build and administer public and private image repositories using tools such as Docker Hub, Quay, etc.
Develop automation for new Kubernetes/OpenShift site deployments and for DevOps lifecycles.
Percentage Of Time: 15
Job Duty Category: Systems Administration and Automation
Duty/Responsibility
Design, deploy, and maintain configuration management software tools to automate administration in a complex, multi-platform environment.
Develop and deploy tools and strategies to automate patch and software deployments, in both physical and virtual environments.
Provision and maintain a variety of database management systems including MariaDB, PostgreSQL, and MongoDB.
Provision, maintain, and secure a variety of web servers and services.
Automate, provision, and perform a variety of systems administration tasks such as software installation, patching, upgrades, security maintenance, backups, disaster recovery, etc.
Percentage Of Time: 15
Application Details
Special Instructions to Applicants
Applicants must submit:
A cover letter addressing how professional experiences align with required and preferred qualifications of the position.
A current resume.
The names and contact information of three (3) professional references. References of candidates advancing to the finalist stage will be contacted to upload a letter of recommendation.*

* References will not be contacted without prior notification to candidates.

CSU is committed to full inclusion of qualified individuals. If you need assistance or accommodations with the search process, please reach out to the listed search contact.