Position Description:

At CGI, we’re dedicated to creating innovative and tailored solutions that drive success for our clients. We are looking for an experienced Site Reliability Engineer (SRE) to play a critical role in designing and implementing ITSM workflows, incident management processes, and system reliability frameworks. This position requires expertise in creating scalable, secure, and efficient solutions that align with operational goals and service-level objectives. You will collaborate with cross-functional teams, drive monitoring and observability strategies, and ensure seamless communication and escalation paths for incident resolution. By leveraging automation tools, cloud platforms, and industry best practices, you will help build and maintain systems that deliver exceptional reliability and performance.

Who You Are:

You are a skilled Site Reliability Engineer with a proven track record of over four years in designing ITSM workflows and ensuring system reliability. With strong expertise in observability tools like New Relic and Grafana, cloud platforms such as AWS, Azure, or GCP, and automation technologies like Ansible and Kubernetes, you bring a holistic approach to optimizing performance and scalability. Your problem-solving mindset and proficiency in scripting languages like Python empower you to automate and innovate effectively. Collaborative, detail-oriented, and forward-thinking, you thrive in dynamic environments, ensuring operational excellence and continuous improvement in every project you undertake.

Your future duties and responsibilities:

In this role, you will:

• Design ITSM Incident Management Workflows to ensure alignment with operational goals and service-level objectives.

• Define ITSM Problem Management Processes to improve long-term system reliability and reduce recurring incidents.

• Establish Incident Response and Communication Plans with clear escalation paths and communication protocols for efficient ticket assignment and resolution.

• Design Capacity Planning Frameworks to ensure systems can handle growth and prevent performance degradation.

• Create a Metrics Framework for System Reliability to track system performance, incident response, and overall service reliability.

• Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) based on business and user expectations.

• Develop a Monitoring Strategy for Critical Systems to align monitoring efforts with business needs.

• Establish ITSM Change Management Workflows to minimize operational risk and downtime during releases.

• Facilitate Cross-Team Training on ITSM & SRE Processes to promote operational consistency and reliability.

• Implement Light Observability Practices to ensure visibility into system health, performance, and incident trends.

• Collaborate closely with project managers, solution architects, and team members, ensuring SRE projects are delivered with precision, on time, and within budget.

• Work directly with clients, designing, building, and maintaining systems that aren’t just scalable and secure, but also reliably meet our customers’ needs.

• Contribute to the development of vital documentation, policies, and procedures within the realm of site reliability engineering.

• Stay abreast of emerging technologies and industry trends to recommend innovative tools and processes.

Required qualifications to be successful in this role:

• Minimum 4 years of experience in a Site Reliability Engineer role or similar, with a focus on designing ITSM workflows, incident management, and metrics frameworks.

• In-depth knowledge of New Relic, ELK, Dynatrace, Prometheus, and Grafana for observability and system health monitoring.

• Proficiency in cloud platforms such as GCP, AWS, or Azure (preferred) for scalable infrastructure and deployment automation.

• Solid experience with Kubernetes for container orchestration and ensuring system reliability.

• Strong programming and scripting skills in Python and Bash for automation and custom tool development.

• Hands-on experience with automation tools like Ansible, Chef, and Puppet to streamline operations.

• In-depth knowledge of ITSM tools like ServiceNow for managing incident, change, and problem management workflows.

• Experience with incident management tools such as PagerDuty for escalation and incident response.

• Familiarity with Agile methodologies for driving continuous improvements and iterative operations.

• Exceptional problem-solving and troubleshooting abilities in resolving complex production issues.

• Strong communication and collaborative skills to work effectively across teams.

Use of the term ‘engineering’ in this job posting refers to the technical sense related to Information Technology (IT) and does not imply that the individual practices engineering or possesses the requisite license as prescribed by the applicable provincial or territorial engineering regulator. We are seeking individuals with expertise in IT engineering-related functions, but licensure from an engineering regulator is not a prerequisite for this position. Engineering is a regulated profession in Canada, and the use of engineering titles and designations is restricted.

Skills:

  • Azure DevOps

What you can expect from us:

Together, as owners, let’s turn meaningful insights into action.

Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you’ll reach your full potential because…

You are invited to be an owner from day 1 as we work together to bring our Dream to life. That’s why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company’s strategy and direction.

Your work creates value. You’ll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise.

You’ll shape your career by joining a company built to grow and last. You’ll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons.

At CGI, we recognize the richness that diversity brings. We strive to create a work culture where all belong and collaborate with clients in building more inclusive communities. As an equal-opportunity employer, we want to empower all our members to succeed and grow. If you require an accommodation at any point during the recruitment process, please let us know. We will be happy to assist.

Come join our team—one of the largest IT and business consulting services firms in the world.
