Program Manager
Requisition ID: 44110
Organization
Located on the traditional, ancestral and unceded lands of the xʷməθkʷəy̓əm (Musqueam), Sḵwx̱wú7mesh (Squamish), and səlilwətaɬ (Tsleil-Waututh) Peoples, Vancouver has a commitment to becoming a City of Reconciliation. Vancouver consistently ranks as one of the world’s most liveable and environmentally sustainable cities. Named among Canada's Top 100 Employers, BC's Top Employers, and Canada's Greenest Employers, the City of Vancouver seeks colleagues who can help shape and embody our core commitments to sustainability, reconciliation, equity and outstanding quality of life for all residents.
Consider joining our committed team of staff and being part of an innovative, inclusive and engaging workplace. Working at the City of Vancouver and within the public service can be a rewarding career where you play a key role in ensuring impartial and equitable access to services, upholding ethical governance, and addressing the needs of citizens with integrity and dedication.
Main Purpose and Function
The Manager, Incident Response plays a pivotal role in safeguarding the City's technology infrastructure by leading the development, testing and management of comprehensive incident recovery processes. This strategic position ensures the resilience of Technology Services (TS) through disaster recovery planning, rigorous failover capability testing, and management of the interdepartmental Change Advisory Board that reviews all planned modifications to TS production environments. Additionally, in the event of data centre, network, or application failures, this position is relied upon to manage and coordinate the TS response.
Specific Duties/Responsibilities
Business Continuity Planning
This position is responsible for developing and maintaining the TS department’s Business Continuity Plan (BCP). This work entails:
- Ensures currency, relevance, and actionability of TS BCP
- Ensures TS BCP addresses response activities for multiple scenarios of varying scale and duration, e.g. major systems or data centre failures, network cuts, earthquakes, floods, fire, pandemic, major public disturbances, extreme weather, staffing and facilities impacts
- Coordinates regular review by managers and staff of the TS BCP procedures applicable to them, updating them as necessary
- Liaises with Risk Management and ensures TS’ business continuity planning aligns with and complements other departments’ BCP processes
- Incorporates post-incident reviews and lessons learned into TS BCP
- Establishes a distinct but complementary relationship between TS’s BCP and TS’s Cybersecurity Incident Response Plan (CSIRP)
- In coordination with Applications Services, Database Solutions, Digital Services and Enterprise Architecture, establishes the definition of critical applications and services; ensures BCP processes prioritize their recovery
- Schedules and manages periodic tabletop exercises to ensure TS has organizational familiarity with BCP procedures (readiness)
Change Advisory Board (CAB)
This position manages TS’s existing cross-functional CAB group, which includes representation from across TS and reviews all proposed changes to production systems prior to their implementation.
- Schedules and manages (twice-weekly) CAB meetings
- Maintains and updates CAB Terms of Reference and process documentation
- Develops and conducts regular orientation sessions to ensure TS staff are familiar with CAB processes
- Reviews and updates onboarding materials to ensure new staff are made aware of CAB processes
- Leverages TS’s Enterprise Service Management platform to manage the execution of CAB, using the CAB Manager administrative privileged role
Failover Tests and Exercises
The Manager, Incident Response develops and manages failover testing processes and exercises to ensure TS resiliency in the event of failures.
- In coordination with Applications and Database Solutions teams, develops a reusable and standardized failover testing process for critical application systems and the services they enable
- Schedules and manages regular (annual) failover tests of critical applications
- Records and documents lessons learned from failover tests and actual failover incidents, plans and tracks resulting improvements with appropriate teams
- In coordination with IO teams, Database Solutions and Applications teams, develops a failover test of the City’s primary and secondary data centres (PDC and SDC)
- Schedules and manages the annual PDC and SDC failover exercise
Incident Response Management
In the event of technology-related failures and incidents, this position will be required to coordinate and manage the work of multiple teams within TS and communicate timely and relevant updates to the TS Directors, Chief Technology Officer (CTO), and City Leadership Team (CLT), as well as the City’s Civic Engagement and Communications department (CEC).
- Identifies and engages the teams and resources needed based on the type of failure, establishing an ad hoc collaboration channel using the best available technology
- In coordination with relevant team managers and directors, establishes, records and assigns mitigation and recovery activities and anticipated timelines, leveraging the TS BCP where pertinent, and tracks the execution of the work
- Ensures compliance with legal and regulatory standards during incidents
- Provides timely, concise, simply stated updates to the following bodies as appropriate:
  - All TS staff
  - All City staff
  - TS Leadership, including the CTO
  - City Leadership/Incident Management Team (IMT)
- Should the incident necessitate the activation of the City’s IMT (e.g. privacy or PCI breach, public services impacted), this position will be involved in developing communications with Civic Engagement and Communications (for use in communicating to all City staff or the public). With the IMT activated, this position will also, as appropriate, be involved in communication and coordination with the Privacy Office, Legal Services, Police and Fire
- Manages the work, incident escalation and communication, and ensures required notifications are completed within the defined timeframes, delegating as needed, until the City has recovered
- Upon conclusion of the recovery from (major) incidents, authors a Major Incident Report (MIR), reviews it for accuracy with the involved resources, and shares it with the TS leadership team, CLT and Risk Management as appropriate
- Ensures the MIR contains a Lessons Learned section, and develops an implementation plan that defines the work needed to prevent avoidable failures from recurring; assigns and tracks that work with appropriate teams
Other Duties and Responsibilities
The above four sections define the primary responsibilities of this position, but the duties and responsibilities also include:
- Reviews the systems and infrastructure of all TS groups, and advocates for work that improves City resiliency; evaluates and recommends improvements in incident response strategies and tools
- Occupies the role of major incident manager within the City’s enterprise service management (ESM) platform; ensures the platform’s major incident management module is ready to be used to its full capability, and that TS teams and individuals understand how to use it and what their responsibilities are when it is needed
- In coordination with TS leadership and Risk Management, develops and documents a Disaster Recovery Plan (DRP) that leverages the City’s geo-remote data centre, to be used in the event of a major disaster (e.g. a major earthquake that compromises both local data centres)
- Oversees and monitors daily data backups to the City’s geo-remote data centre
- Ensures the DRP aligns with and complements the TS BCP, including an annual tabletop exercise of the DRP
- In coordination with Enterprise Architecture and TS leadership, ensures long-term technology strategy takes resiliency into account, prioritizing the City’s ability to deliver critical services in the event of failures or disasters
- Manages the Incident Response team, including hiring, assigning work, setting goals and standards, approving absences, and planning training and other work-related activities
- Other duties and responsibilities as assigned
Minimum Qualification Requirements
Education and Experience:
- Degree in information technology or a related discipline, supplemented by technical courses related to the work, or an equivalent combination of education, training and experience.
- Minimum 6 years working in a technical position within Information Technology and computer support services.
- Considerable related experience supervising technical support staff, working with collective agreements and dealing effectively with labour relations issues.
- Training and experience with IT best practice frameworks (Enterprise Architecture, ITIL, MCSE, etc.) is an asset.
- Project Management training, experience, or certification is an asset.
Knowledge, Skills and Abilities:
- Thorough understanding of business continuity planning and disaster recovery planning
- Proven skills in managing cross-functional teams, where some or all team members have no reporting relationship to you; ability to establish and maintain effective working relationships with a variety of internal and external contacts
- Knowledge of how to develop and conduct failover tests and exercises in a technology environment
- Thorough knowledge of ITIL Change Management processes, and ITIL risk management practices
- Strong incident response management skills
- Effective communication and documentation skills: excellent verbal, written, and presentation skills for business-focused and project documents, including the ability to prepare and maintain a variety of records and documentation
- Experience incorporating post-incident reviews and lessons learned to improve processes
- Considerable knowledge of information technology components, processes and developments, including local and wide area networks, cloud and cloud-hybrid architectures, data centre server and storage, cooling and power infrastructure, database and applications architecture
- Ability to determine priorities and structure work activities to achieve goals in a timely manner; ability to work effectively in a multi-task dynamic environment punctuated by frequent interruptions
- Ability to develop and communicate strategic direction in the technology resiliency space, providing leadership to operational and project resources, as well as providing advice, guidance, and recommendations to the Technology Services leadership
- Ability to exercise independent judgment in emergencies and non-routine matters in a manner that meets City priorities and obligations
- Ability to establish and implement service standards, set work priorities, and schedule and lead assigned projects
- Demonstrated ability to work collaboratively within a team environment and promote a supportive, respectful and safe work environment in an economically and culturally diverse workplace and community
- Proven skills in supervising staff, including providing feedback, dealing with conflict, corrective discipline, managing performance, and coaching
- Effective planning, organization and work management skills
- Demonstrated ability to keep up with evolving technology use and trends
- Understanding of City human resources practices and principles, collective agreements and BC legislation related to safety and labour standards and practices is an asset
- Knowledge of industry-standard technology and security frameworks (NIST 2.0, ISO, PCI DSS) is an asset
Where operationally appropriate and subject to change, the City of Vancouver has a Flexible Work Program. This program allows staff to work a hybrid work week from locations within a daily commutable distance of their City worksite. At this time, this position is eligible to be part of the Flexible Work Program.
Business Unit/Department: IT, Digital Strategy & 311 (1070)
Affiliation: Exempt
Employment Type: Regular Full Time
Position Start Date: 01/01/2025
Salary Information: Pay Grade RNG-101: $118,946 to $148,688 per annum
Application Close: August 6, 2025
At the City of Vancouver, we are committed to recruiting a diverse workforce that represents the community we so proudly serve. Indigenous peoples, people of colour, 2SLGBTQ+ persons including all genders and persons with disabilities are encouraged to apply. Accommodations will be provided upon request during the selection process. Learn more about our commitment to diversity and inclusion.
Before you click Apply now
Once you start your application, you can save your work and leave the application page; however, please remember to submit your profile to the specific job requisition before the posting closing date.
In addition to uploading your cover letter and resume, part of the application process may include answering application questions related to the preferred requirements of the role, which may take approximately 5-10 minutes. Cover letters should express interest and highlight additional information relevant to the position, and resumes should include a summary of skills and experience related to the position.