Employer: Oracle
Location: Jackson, MS, USA
Posted: June 23, 2026

Job Details

Director, Data Center Reliability Engineering
**Job Description**

**Key Responsibilities**

+ Lead reliability engineering and analytics teams across multiple sites.

+ Standardize and enforce FMEA, RCA, and continuous improvement methodologies.

+ Oversee deployment of monitoring, analytics, and automation tools supporting reliability programs.

+ Define, track, and report reliability KPIs to executive and global operations leadership.

+ Ensure corrective actions are implemented, verified, and sustained.

+ Develop engineers and analysts in disciplined, data-driven problem solving.

**Ideal Candidate Profile**

+ Senior experience in reliability engineering, maintenance engineering, or uptime-critical environments.

+ Strong background in analytics, RCA rigor, and reliability frameworks.

**Skills and Competencies**

+ Strong technical leadership and stakeholder influence.

+ Comfortable translating analysis into executive-level decisions.

**Why Oracle Cloud Infrastructure?**

+ Global impact at scale: Contribute directly to how mission-critical OCI data centers operate across regions and continents, influencing infrastructure reliability, security, sustainability, and long-term capacity growth.

+ Technically rigorous environment: Work alongside experienced engineers, automation specialists, and compliance teams in a rapidly scaling hyperscale cloud infrastructure, where disciplined execution and technical depth matter.

+ Culture built on operational excellence: Join an organization that values safety, process rigor, clear accountability, and continuous improvement as foundational to protecting uptime and customer trust.

+ Long-term career development: Benefit from internal mobility, role-based technical training, and development opportunities designed for professionals building long-term careers in cloud infrastructure and facilities operations.

**Responsibilities**

**Key Responsibilities**

**Data Center Site Portfolio Management:**

-Serves as the data center country leader, typically responsible for one or more sites and teams in a region.

**Performance Monitoring and Analysis:**

-Sets strategic direction for data center operations performance monitoring, in collaboration with executive leadership.

-Defines strategic direction for network performance evaluation, in collaboration with executive leadership.

-Establishes strategic direction for analysis of physical, power, and cooling capacity, in collaboration with executive leadership.

-Defines the strategic direction for continuous improvement and collaborates with executive leadership to achieve KPIs and objectives.

**Issue Management and Automation:**

-Oversees all aspects of support for escalated complex technical issues across multiple data centers.

-Defines and enforces strategies for issue triage, leveraging advanced automation, scheduling, and monitoring tools.

-Identifies, documents, and standardizes issues, processes, and solutions, ensuring the data center knowledge base is comprehensive, accurate, and strategically aligned with department goals.

-Oversees the implementation of strategy for incident or crisis management protocols in alignment with business continuity plans.

-Establishes best practices for conducting Root Cause Analysis (RCA) following crises or incidents, and updates documentation to capture process improvements.

**Data Center Expansion Support:**

-Sets the strategic direction and oversees the entire process of new region builds and expansion activities, both onsite and remotely.

-Acts as the primary liaison with senior project teams and data center engineering leadership, organizing resources and ensuring strategic timelines and long-term capacity needs are effectively managed for all expansion projects and site builds.

-Collaborates at the highest level with project teams to ensure the delivery of world-class standards across all expansion projects and site builds.

**Installation and Maintenance:**

-Directs all aspects of installations, repairs, inventory management, and logistics tasks across several data centers.

-Establishes standards and best practices for component replacements and upgrades.

-Advises on and manages large-scale purchases or upgrades for data centers.

-Ensures implementation of proactive maintenance and lifecycle management strategies for data center facilities, with a focus on efficiency and stability (e.g., containment, airflow and pressure, power trains).

**Core Responsibilities**

**Planning & Execution:**

-Oversees and guides multiple teams on managing complex projects or initiatives, monitoring timelines, deliverables, and budgets when applicable to ensure strategic objectives are met. Serves as a role model for appropriately delegating work, setting priorities, and ensuring alignment with business needs. Coaches others on adjusting resources or project timelines in anticipation of business changes.

**Collaboration & Partnership:**

-Models leadership of cross-functional collaborative efforts to ensure alignment of expectations and strategic objectives. Empowers the team to build and maintain partnerships with business leaders, stakeholders, and/or customers to address barriers and contribute to organizational success. Drives transparency and inclusivity by actively seeking, listening to, and leveraging diverse perspectives.

**Problem Solving:**

-Shares problem-solving strategies across teams, providing oversight on complex operational and/or technical issues, as needed. Coaches teams on analyzing highly complex data and/or information to identify solutions to ambiguous issues, and provides direction on identifying root causes to prevent recurrence of issues.

**Continuous Learning:**

-Pursues strategic learning opportunities to maintain expertise and apply best practices at the organizational level. Creates opportunities for team members and leaders to build their expertise in new areas, coaching them to build innovative skills. Identifies skill gap trends across the organization, and upholds a culture that places significant emphasis on sharing knowledge and pursuing learning opportunities that advance the organization. Evaluates efficiency of learning strategies and recommends adjustments as needed.

**Continuous Improvement:**

-Empowers the team to own the development and implementation of ideas that increase the efficiency and effectiveness of processes, protocols, and workflows across the department. Coaches teams to gain buy-in for ideas and to seek feedback on approaches and methods for continued improvement. Prioritizes and reviews the roadmap of improvement initiatives to ensure alignment with strategic direction and maximize return on investment.

**Performance and Development:**

-Serves as a role model for driving performance across teams through tailored feedback and coaching in alignment with performance management processes, guidelines, and expectations. Drives consistency in the application of talent development procedures and socializes performance expectations across the organization. Ensures that individual development goals are aligned with organizational strategic initiatives. Collaborates with HR to implement talent strategy through hiring and promotion processes.

**Minimum Job Qualifications**

**Education and/or Experience:**

12 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout

OR

Bachelor's Degree in Computer Science, Engineering, Information Systems, Information Technology, or related field AND 8 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout

OR

Master's Degree in Computer Science, Engineering, Information Systems, Information Technology, or related field AND 6 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout

OR

Doctorate in Computer Science, Engineering, Information Systems, Information Technology, or related field AND 4 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout.

**Job Skills:**

Same as prior level, plus:

Technical Leadership: Demonstrated ability to provide technical leadership and mentoring on complex projects.

Compliance: Demonstrated knowledge of and adherence to regulatory, legal, and organizational compliance requirements.

Quality Assurance: Demonstrated ability to plan and carry out quality assurance activities for high-standard deliverables.

Vendor Management: Demonstrated ability to select, onboard, and manage vendor relationships to ensure optimal performance.

Data Center Facilities Management: Demonstrated experience managing data center facilities, including systems, operations, and resource allocation.

**Preferred Job Qualifications**

**Education and/or Experience:**

13 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout

OR

Bachelor's Degree in Computer Science, Engineering, Information Systems, Information Technology, or related field AND 9 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout

OR

Master's Degree in Computer Science, Engineering, Information Systems, Information Technology, or related field AND 7 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout

OR

Doctorate in Computer Science, Engineering, Information Systems, Information Technology, or related field AND 5 years of experience in IT infrastructure support, server administration, or data center operations, design, and layout.

**People Leadership / Management Experience:**

5 years of experience in a leadership role with direct reports.

**Budget Experience:**

3 years of experience working with operating budgets and/or project financials.

**Additional Experience:**

Data Center or Cloud Industry Certifications.

**About Us**

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Job #NLX293412724