Job Role Summary
The HPC Systems Engineer works within a team to provide a performant, reliable, and secure high performance computing (HPC) environment. The HPC Systems Engineer will be involved in various aspects of designing and engineering our HPC system and will be responsible for managing day-to-day operations and maintenance activities including, but not limited to, the following: general troubleshooting of any issues that may arise, monitoring overall system health, performing system maintenance tasks, and evaluating new hardware and system software.
Primary Job Functions
Establish strategies for overall support of the system
Evaluate new hardware and software and understand the potential benefits and impacts they can have in the environment
Perform hardware maintenance
Perform software installations and upgrades, including the operating system
Monitor overall system performance and health
Be available periodically for on-call support and weekend maintenance activities
Provide support for the management of data in the environment
Work with users to resolve problems and ensure they are able to effectively utilize the system
Interact with both business customers and technical teams that are globally distributed across varied time zones
Engage with vendors to resolve problems with existing infrastructure and to discuss roadmaps and new technologies for evaluation
Foster a supportive work environment and maintain open, productive interactions within the team and across organizations
Build and maintain cross-organizational contacts to facilitate execution of work
Job Requirements
B.S. in Computer Science or a related field (e.g. Computer Engineering, Information Systems), or equivalent skills and work experience
Excellent technical, analytical, and communication skills
A minimum of 3 years of hands-on Linux experience (e.g. RHEL, CentOS) and production infrastructure support (e.g. networking, storage, monitoring, compute)
Experience in system administration and technical support (e.g. installation, configuration, maintenance, upgrade, retirement, problem resolution)
Experience in HPC technologies such as parallel/distributed file systems (e.g. Lustre, GPFS), high-speed interconnect fabrics (e.g. InfiniBand, Omni-Path), and HPC batch scheduling software suites (e.g. PBS Pro, Slurm)
Proficiency in technical writing and documentation of solutions
Works well in a team environment
Self-motivated
Preferred Knowledge/Skills/Abilities
Strong IT skills in infrastructure and applications
Experience supporting large-scale production environments
Experience in implementing changes and security controls in a global framework
Understanding of data center operations fundamentals in networking, cooling, and power
Knowledge and experience with installing and compiling vendor and open-source software
Knowledge and experience with application/infrastructure deployment and support in one or more of the major cloud environments
Irving, TX
Exxon Mobil Corporation explores for and produces crude oil and natural gas in the United States, Canada/Other Americas, Europe, Africa, Asia, and Australia/Oceania. It operates through Upstream, Downstream, and Chemical segments. The company is also involved in the manufacture, trade, transport, and sale of crude oil, petroleum products, and other specialty products; and manufactures and markets petrochemicals, including olefins, polyolefins, aromatics, and various other petrochemicals. As of December 31, 2018, it had approximately 24,696 net operated wells with proved reserves of 24.3 billion oil-equivalent barrels. The company was founded in 1870 and is headquartered in Irving, Texas.