Site Reliability Engineer

  • Toronto, Ontario
  • Full Time
Duration: 12 months

Pay range: C49 INC

Years of Experience: 6-8

We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of platform services. The ideal candidate will bring strong expertise in SRE practices, observability, infrastructure automation, and developer platform enablement, with exposure to modern technologies including policy-as-code and emerging GenAI-driven systems.

Key Responsibilities
Implement and manage SRE practices including:
  • Incident management, root cause analysis, and postmortems
  • Reliability engineering and performance optimization
  • Tracking and improving DORA metrics

Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Build and manage monitoring, logging, and distributed tracing frameworks
Ensure platform reliability through proactive alerting, observability, and automation
Automate infrastructure and governance using:
  • Terraform (Infrastructure as Code)
  • Policy-as-Code tools (OPA/Rego, Sentinel)

Enhance developer experience and productivity by:
  • Designing self-service platform capabilities
  • Managing service catalogs and platform standards
  • Building reusable templates and golden paths
Work with tools like Backstage to enable internal developer platforms
Collaborate with engineering teams to improve system stability, deployment reliability, and operational efficiency
Support the integration and reliability of GenAI-based systems (RAG, prompt workflows, model evaluation)

Required Skills
Strong experience in SRE practices and reliability engineering

Hands-on expertise with:
  • Monitoring/logging platforms and distributed tracing
  • SLO/SLI frameworks and observability design
Experience in incident management and performance engineering
Strong understanding of DORA metrics and operational excellence

Proficiency in:
  • Terraform (Infrastructure as Code)
  • Policy-as-Code (OPA/Rego, Sentinel)

Experience with:
  • Developer platform tools (Backstage, service catalogs)
  • Golden paths and platform standardization

Nice to Have
Exposure to GenAI platforms, RAG, and prompt engineering concepts
Experience in developer productivity measurement and platform engineering initiatives

Tools & Methodologies
Experience with Agile methodologies and supporting tools (Jira, Confluence)
Familiarity with DevOps and platform engineering practices

Soft Skills
Strong problem-solving and analytical skills
Ability to work in high-pressure production environments
Excellent communication and cross-team collaboration
Job ID: 521049599
Originally Posted on: 5/13/2026