Lead SRE

Job Title: Lead Integration & Observability Specialist (SRE Lead)

Location: McKinney, TX (Hybrid role)

Client: NTT DATA / Globe Life Insurance

 

Job Summary:

We are seeking a Lead Integration & Observability Specialist to design, implement, and lead enterprise observability and reliability solutions, while supporting cloud-based integration platforms on AWS/Azure. The role focuses on monitoring, automation, and operational readiness of applications, APIs, data pipelines, and messaging systems.

This is a hands-on technical leadership role with mentoring and solution ownership responsibilities.

 


Key Responsibilities

  • Lead the implementation of enterprise observability for applications, APIs, services, batch jobs, and data pipelines.
  • Design and standardize monitoring, alerting, logging, metrics, and health checks across distributed systems.
  • Integrate observability platforms with incident management and automation tools to support proactive issue detection and remediation.
  • Support the reliability and availability of integration platforms built on AWS/Azure.
  • Perform advanced troubleshooting using logs, metrics, and traces to resolve production issues.
  • Define operational readiness standards and non-functional requirements.
  • Mentor engineers on observability best practices and platform usage.
  • Collaborate with product, support, and operations teams to improve service stability and delivery.

 

Required Skills (Mandatory)

  • 15+ years of overall IT experience
  • 7+ years of relevant experience in Observability / Monitoring / Reliability Engineering
  • Strong hands-on experience with enterprise observability tools, such as:
    • Instana, Dynatrace, AppDynamics, Prometheus, Grafana
  • Expertise in:
    • Monitoring and alerting design
    • Log management and analysis
    • Metrics and distributed tracing
    • Health checks and SLO/SLI concepts
  • Experience monitoring AWS/Azure workloads
  • Strong troubleshooting and incident analysis skills
  • Experience defining operational and non-functional requirements
  • Technical leadership and mentoring experience
  • Automation and ITSM integration (ServiceNow workflows, incident automation)
  • CI/CD and release management exposure
  • Cloud integration and messaging exposure