Incident Response & Escalation Protocols
Overview
This document gives IT managers protocols for incident response, escalation, and crisis management. Effective incident management minimizes business disruption, protects company assets, and keeps stakeholders appropriately informed during critical events.
Incident Classification
Severity Levels
| Severity | Definition | Examples | Response Time | Escalation |
|---|---|---|---|---|
| P1 - Critical | Complete service outage affecting all users or critical business functions | Email down company-wide, ERP system unavailable, data breach, ransomware | 15 minutes | Immediately to IT Director and executives |
| P2 - High | Significant service degradation or outage affecting multiple users or departments | Critical application slow, VPN intermittent, department file share unavailable | 30 minutes | IT Manager within 1 hour |
| P3 - Medium | Service disruption affecting small group or non-critical system | Printer not working, single user cannot access shared drive, minor software bug | 2 hours | IT Manager if unresolved after 4 hours |
| P4 - Low | Minor issue or inconvenience with minimal impact | Password reset, software installation request, cosmetic UI issue | 8 hours | Standard queue management |
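The response times in the table can be turned into concrete deadlines for a ticket. The following is a minimal sketch, not part of any existing tooling: the names `RESPONSE_TARGETS`, `response_deadline`, and `is_response_breached` are hypothetical, and it assumes the targets are measured in clock time from the moment the incident is logged (the table does not say whether business hours apply).

```python
from datetime import datetime, timedelta

# Initial response targets copied from the severity table.
# Assumption: clock time, counted from when the incident is logged.
RESPONSE_TARGETS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(minutes=30),
    "P3": timedelta(hours=2),
    "P4": timedelta(hours=8),
}


def response_deadline(severity: str, logged_at: datetime) -> datetime:
    """Return the latest time at which a first response is still on target."""
    return logged_at + RESPONSE_TARGETS[severity]


def is_response_breached(severity: str, logged_at: datetime, now: datetime) -> bool:
    """True when the response target has passed without a response."""
    return now > response_deadline(severity, logged_at)
```

For example, a P1 incident logged at 09:00 would have a response deadline of 09:15 under these assumptions.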
Impact Assessment Criteria
Consider these factors when classifying incidents:
- Number of Users Affected: 1 user, department, multiple departments, entire company
- Business Impact: Revenue impact, customer-facing, regulatory compliance, reputation
- Duration: Short-term vs. extended outage
- Workaround Availability: Can users continue working through alternate means?
- Security Implications: Data exposure, unauthorized access, compliance violations
- Time Sensitivity: Is there a critical deadline or time-bound business need?
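As an illustration of how these factors might feed a first-pass classification, here is a rough sketch. It is an assumption-laden simplification, not the official classification rule: the `Scope` values, the `Impact` fields, and `suggest_severity` are all hypothetical names, only three of the factors are modeled, and the result is a suggestion that a person should review.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SINGLE_USER = 1
    DEPARTMENT = 2
    MULTIPLE_DEPARTMENTS = 3
    ENTIRE_COMPANY = 4


@dataclass
class Impact:
    scope: Scope
    # Critical business function unavailable (e.g. ERP down).
    critical_function_down: bool = False
    # Data breach or ransomware, which the severity table lists as P1 examples.
    # Other security incidents may be P1 or P2 depending on scope.
    critical_security_event: bool = False
    # False for requests and cosmetic issues (password reset, install request).
    service_disrupted: bool = True


def suggest_severity(impact: Impact) -> str:
    """Suggest a severity level; a human should confirm the final value."""
    if (
        impact.scope is Scope.ENTIRE_COMPANY
        or impact.critical_function_down
        or impact.critical_security_event
    ):
        return "P1"
    if impact.scope in (Scope.DEPARTMENT, Scope.MULTIPLE_DEPARTMENTS):
        return "P2"
    if impact.service_disrupted:
        return "P3"
    return "P4"
```

Factors such as duration, workaround availability, and time sensitivity are deliberately left out here; in practice they would raise or lower the suggested level.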
Incident Response Process
Phase 1: Detection and Logging
- Incident Reported: Via service desk, monitoring alerts, or direct communication
- Initial Assessment: Gather basic information about the issue
- Create Ticket: Log in IT service management system with all details
- Assign Severity: Classify based on impact and urgency
- Acknowledge Receipt: Confirm to reporter that incident is logged and being addressed
Phase 2: Initial Response
- Route to Appropriate Team: Assign to correct technical team or specialist
- Meet Response SLA: Respond within the SLA defined for the severity level
- Begin Investigation: Technical team starts troubleshooting
- Document Actions: Record all troubleshooting steps and findings
- Status Updates: Provide updates per communication plan
Phase 3: Escalation (if needed)
Escalate when:
- Incident cannot be resolved within SLA timeframe
- Technical expertise beyond current team is required
- Business impact is greater than initially assessed
- Vendor involvement is needed
- Multiple teams must coordinate response
- Management awareness and decisions are required
Phase 4: Resolution and Recovery
- Implement Fix: Apply permanent or temporary solution
- Verify Resolution: Test and confirm issue is resolved
- Communicate Resolution: Notify affected users and stakeholders
- Monitor Stability: Watch for recurrence
- Document Resolution: Record root cause and fix in ticket
Phase 5: Post-Incident Review
For P1 and P2 incidents, conduct a post-incident review within 48 hours, covering:
- Timeline of events from detection to resolution
- Root cause analysis
- Response effectiveness evaluation
- Identification of preventive measures
- Process improvement recommendations
- Documentation of lessons learned
Escalation Procedures
Technical Escalation Path
| Level | Team/Role | Responsibilities | Escalation Trigger |
|---|---|---|---|
| L1 | Service Desk | Initial triage, basic troubleshooting, known issues | 30 minutes without resolution |
| L2 | Technical Support Specialists | Advanced troubleshooting, system administration | 2 hours without resolution or requires specialized knowledge |
| L3 | Subject Matter Experts / Engineering | Deep technical expertise, architecture, development | 4 hours without resolution or complex technical issue |
| L4 | Vendor Support | Product-specific expertise, code fixes, patches | When issue is vendor product defect or requires vendor intervention |
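The time-based triggers in this table could be checked automatically. The sketch below is one possible reading, with hypothetical names (`ESCALATION_TIMERS`, `NEXT_TIER`, `next_tier`). It treats each trigger as the point at which a tier hands the incident to the next one; whether the clock starts when the incident is logged or when the tier receives it is not stated in the table, so that choice is left to the caller.

```python
from datetime import timedelta
from typing import Optional

# Time without resolution after which each tier escalates (from the table).
# L4 (Vendor Support) has no timer: it is engaged on a condition instead.
ESCALATION_TIMERS = {
    "L1": timedelta(minutes=30),
    "L2": timedelta(hours=2),
    "L3": timedelta(hours=4),
}
NEXT_TIER = {"L1": "L2", "L2": "L3", "L3": "L4"}


def next_tier(current: str, unresolved_for: timedelta) -> Optional[str]:
    """Return the tier to escalate to, or None if no timer has expired."""
    limit = ESCALATION_TIMERS.get(current)
    if limit is not None and unresolved_for >= limit:
        return NEXT_TIER[current]
    return None
```

The non-timer triggers (specialized knowledge, complex technical issue, vendor defect) still need a human decision and are not modeled here.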
Management Escalation Path
| Level | Role | Contact Method | When to Escalate |
|---|---|---|---|
| 1 | IT Manager | Email/Phone/Slack | P2 incidents; P3 if unresolved after 4 hours |
| 2 | IT Director | Phone/Text | All P1 incidents immediately; P2 if unresolved after 2 hours |
| 3 | CTO/CIO | Phone | P1 incidents affecting business operations or data security |
| 4 | CEO / Executive Team | Phone (via CTO/CIO) | Major security breach, significant financial impact, public relations concern |
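The "When to Escalate" column can be expressed as a small rule set. This is a hedged sketch only: `management_contacts` and its two boolean flags are hypothetical, and the flags stand in for judgment calls (business or data-security impact, executive-level concern) that the table leaves to the incident handler.

```python
from datetime import timedelta


def management_contacts(
    severity: str,
    unresolved_for: timedelta,
    affects_operations_or_security: bool = False,
    executive_concern: bool = False,
) -> list[str]:
    """List the management roles to notify, lowest level first."""
    contacts = []
    # Level 1: P2 incidents; P3 if unresolved after 4 hours.
    if severity == "P2" or (severity == "P3" and unresolved_for >= timedelta(hours=4)):
        contacts.append("IT Manager")
    # Level 2: all P1 immediately; P2 if unresolved after 2 hours.
    if severity == "P1" or (severity == "P2" and unresolved_for >= timedelta(hours=2)):
        contacts.append("IT Director")
    # Level 3: P1 incidents affecting business operations or data security.
    if severity == "P1" and affects_operations_or_security:
        contacts.append("CTO/CIO")
    # Level 4: major breach, significant financial impact, or PR concern.
    if executive_concern:
        contacts.append("CEO / Executive Team (via CTO/CIO)")
    return contacts
```

A P2 incident open for three hours would therefore return both the IT Manager and the IT Director.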
Critical Incident Response (P1)
Immediate Actions for P1 Incidents
- 0-5 minutes:
  - Log P1 incident in ticketing system
  - Call IT Director immediately
  - Assemble incident response team via conference bridge
- 5-15 minutes:
  - Assess scope and business impact
  - Initiate incident command structure
  - Notify CTO/CIO if warranted
  - Begin status page updates
- 15-30 minutes:
  - Implement emergency workarounds if available
  - Engage vendors if needed
  - Notify affected business units
  - Establish communication cadence (hourly updates)
- Ongoing:
  - Continue troubleshooting and resolution efforts
  - Provide regular status updates
  - Coordinate with business leadership
  - Document all actions and decisions
Incident Command Structure
For major P1 incidents, establish clear roles:
| Role | Responsibilities | Assigned To |
|---|---|---|
| Incident Commander | Overall coordination, decision-making authority, stakeholder communication | IT Director or Senior Manager |
| Technical Lead | Lead technical investigation and resolution, coordinate technical teams | Senior Engineer or Architect |
| Communications Lead | Manage all internal and external communications, status updates | IT Manager or designated liaison |
| Scribe | Document timeline, actions, decisions, participants | Service Desk Lead or Admin |
| Business Liaison | Interface with affected business units, communicate business impact | Business Relationship Manager |
Security Incident Response
Security Incident Types
- Malware or ransomware infection
- Data breach or unauthorized access
- Phishing attack or social engineering
- Denial of service (DoS/DDoS)
- Insider threat or policy violation
- Lost or stolen device with company data
- System compromise or unauthorized changes
Security Incident Response Process
- Identification:
  - Security alert or user report received
  - Log incident as P1 or P2 based on scope
  - Notify Security Team immediately
- Containment:
  - Isolate affected systems from network
  - Disable compromised accounts
  - Block malicious IP addresses or domains
  - Preserve evidence for investigation
- Eradication:
  - Remove malware or unauthorized access
  - Patch vulnerabilities
  - Reset compromised credentials
  - Verify systems are clean
- Recovery:
  - Restore systems from clean backups
  - Gradually bring systems back online
  - Monitor for signs of persistent threat
  - Validate business operations
- Lessons Learned:
  - Conduct post-incident review
  - Document attack vector and response
  - Implement preventive controls
  - Update security policies and training
Security Escalation and Notification
Security incidents require specific notifications:
| Incident Type | Immediate Notification | Additional Notifications |
|---|---|---|
| Suspected Data Breach | IT Director, Security Manager, Legal | CEO, CFO, Board (within 24 hours) |
| Ransomware | IT Director, Security Manager, CTO/CIO | FBI Cyber Division, Insurance carrier |
| Customer Data Exposure | IT Director, Legal, Compliance | CEO, affected customers per legal guidance |
| Regulatory Violation | IT Director, Legal, Compliance | Regulatory agency per compliance requirements |
Communication Protocols
Internal Communication
Status update frequency based on severity:
| Severity | Update Frequency | Communication Channels | Audience |
|---|---|---|---|
| P1 | Every 30-60 minutes | Email, Slack incident channel, status page | All affected users, IT leadership, executives |
| P2 | Every 2-4 hours | Email, Slack, status page | Affected users, IT management |
| P3 | Daily or upon significant progress | Ticket updates, email to requestor | Affected users |
| P4 | Upon resolution | Ticket updates | Requestor only |
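To make the update frequencies actionable, the window for the next status update can be derived from the time of the last one. A minimal sketch, assuming "daily" means 24 hours and using the hypothetical names `UPDATE_INTERVALS` and `next_update_window`:

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (shortest, longest) interval between updates, from the table.
# P4 is absent because it is updated only upon resolution.
# Assumption: "daily" for P3 is treated as exactly 24 hours.
UPDATE_INTERVALS = {
    "P1": (timedelta(minutes=30), timedelta(minutes=60)),
    "P2": (timedelta(hours=2), timedelta(hours=4)),
    "P3": (timedelta(days=1), timedelta(days=1)),
}


def next_update_window(
    severity: str, last_update: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return (earliest, latest) time for the next update, or None for P4."""
    interval = UPDATE_INTERVALS.get(severity)
    if interval is None:
        return None
    shortest, longest = interval
    return last_update + shortest, last_update + longest
```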
Status Update Template
All incident communications should include:
- Subject Line: [Incident] - [Brief Description] - [Status]
- Current Status: What is happening right now?
- Impact: Who/what is affected?
- Actions Taken: What have we done so far?
- Next Steps: What are we doing next?
- Estimated Resolution: When do we expect resolution? (if known)
- Workaround: Can users do anything to continue working?
- Next Update: When will we provide the next update?
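One way to keep updates consistent is to generate them from a fixed structure. The sketch below mirrors the template fields listed above; the `StatusUpdate` class, its field names, and the default placeholder texts ("Unknown", "None available") are illustrative assumptions rather than an existing tool.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    incident: str          # e.g. a ticket number
    description: str       # brief description for the subject line
    status: str            # e.g. "Investigating", "Resolved"
    current_status: str
    impact: str
    actions_taken: str
    next_steps: str
    next_update: str
    estimated_resolution: str = "Unknown"   # placeholder text is an assumption
    workaround: str = "None available"      # placeholder text is an assumption

    def subject(self) -> str:
        """Subject line in the form [Incident] - [Brief Description] - [Status]."""
        return f"{self.incident} - {self.description} - {self.status}"

    def render(self) -> str:
        """Render every template field, one per line, in template order."""
        lines = [
            f"Subject: {self.subject()}",
            f"Current Status: {self.current_status}",
            f"Impact: {self.impact}",
            f"Actions Taken: {self.actions_taken}",
            f"Next Steps: {self.next_steps}",
            f"Estimated Resolution: {self.estimated_resolution}",
            f"Workaround: {self.workaround}",
            f"Next Update: {self.next_update}",
        ]
        return "\n".join(lines)
```

Making the required fields mandatory constructor arguments means an update cannot be sent with, say, the impact or next-update time left out.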
External Communication
For incidents affecting customers or requiring a public statement:
- All external communications must be approved by Legal and Executive Leadership
- Marketing/PR coordinates messaging
- Customer support provides approved talking points
- IT provides technical facts and timeline
- Avoid technical jargon; focus on business impact and resolution
Vendor Escalation Procedures
When to Engage Vendors
- Issue is related to vendor product or service
- Internal troubleshooting has exhausted options
- Vendor-specific knowledge is required
- Software bug or defect is suspected
- Performance issue related to vendor service
Vendor Support Levels
| Vendor | Support Tier | Contact Method | SLA |
|---|---|---|---|
| Microsoft | Premier Support | Portal + Phone: 1-800-xxx-xxxx | P1: 1 hour; P2: 4 hours |
| AWS | Enterprise Support | Portal + Phone: 1-888-xxx-xxxx | Critical: 15 min; Urgent: 1 hour |
| Salesforce | Premier Success | Portal + Phone: 1-800-xxx-xxxx | P1: 1 hour; P2: 4 hours |
| Cisco | SmartNet 24x7 | Phone: 1-800-xxx-xxxx | P1: 1 hour; P2: 4 hours |
Information to Provide Vendors
- Internal incident ticket number
- Customer/account number
- Product version and configuration
- Detailed problem description and symptoms
- Steps to reproduce the issue
- Troubleshooting already performed
- Error messages and log files
- Business impact and urgency
Business Continuity During Incidents
Workaround Strategies
When primary systems are unavailable:
- Alternative Systems: Use backup or redundant systems
- Manual Processes: Temporarily revert to manual workflows
- Reduced Functionality: Operate with limited features
- Cloud Alternatives: Leverage cloud-based alternatives
- Mobile Solutions: Use mobile apps when desktops are unavailable
Critical System Recovery Priorities
Recovery order based on business impact:
- Email and communication systems
- ERP and financial systems
- CRM and customer-facing systems
- Core business applications
- Collaboration and productivity tools
- Administrative and support systems
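When several systems are down at once, the list above can drive the order of recovery work. A small sketch, assuming each affected system has already been tagged with one of the six categories; the category keys and the `recovery_order` function are hypothetical names.

```python
# Recovery categories in priority order, highest first (from the list above).
RECOVERY_PRIORITY = [
    "email_and_communication",
    "erp_and_financial",
    "crm_and_customer_facing",
    "core_business_applications",
    "collaboration_and_productivity",
    "administrative_and_support",
]


def recovery_order(affected: dict[str, str]) -> list[str]:
    """Sort system names by the priority of their category.

    `affected` maps a system name to its category key.
    """
    rank = {category: i for i, category in enumerate(RECOVERY_PRIORITY)}
    return sorted(affected, key=lambda name: rank[affected[name]])
```

Systems in the same category keep their original relative order, so ties still need a judgment call from the Incident Commander.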
Incident Documentation Requirements
Required Documentation During Incident
- Timeline of events with timestamps
- All troubleshooting steps performed
- Conference bridge participants
- Decisions made and decision-makers
- Communications sent to stakeholders
- Vendor interactions and case numbers
- System changes or configurations modified
Post-Incident Report
Required for all P1 and P2 incidents:
- Executive Summary: Brief overview in business terms
- Incident Details: What happened, when, who was affected
- Timeline: Chronological sequence of events
- Root Cause: Why did this happen?
- Resolution: How was it fixed?
- Business Impact: Quantify downtime, revenue impact, user impact
- Response Assessment: What went well, what didn't?
- Preventive Measures: How will we prevent recurrence?
- Action Items: Follow-up tasks with owners and due dates
After-Hours and Weekend Support
On-Call Rotation
IT maintains 24/7 on-call coverage:
- Primary On-Call: IT Manager or Senior Engineer (rotating weekly)
- Secondary On-Call: IT Director (escalation point)
- Tertiary On-Call: CTO/CIO (critical incidents only)
On-Call Responsibilities
- Monitor phone/email for critical alerts
- Respond to P1 incidents within 15 minutes
- Triage and escalate as needed
- Coordinate vendor engagement if required
- Document all actions and hand off to the day team
After-Hours Contact Methods
- Emergency Hotline: 1-800-555-HELP (redirects to on-call)
- PagerDuty: Automated alerting and escalation
- IT Leadership Mobile: Direct contact numbers in emergency contact list
Metrics and Reporting
Incident Management KPIs
| Metric | Definition | Target |
|---|---|---|
| Mean Time to Acknowledge (MTTA) | Average time from incident logged to first response | P1: 15 min; P2: 30 min; P3: 2 hours |
| Mean Time to Resolve (MTTR) | Average time from incident logged to resolution | P1: 4 hours; P2: 8 hours; P3: 24 hours |
| First Contact Resolution (FCR) | % of incidents resolved by first responder | 70% |
| SLA Compliance | % of incidents meeting response/resolution SLAs | 95% |
| Escalation Rate | % of incidents requiring escalation | Under 20% |
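Three of these metrics can be computed directly from ticket timestamps. The following is a minimal sketch under stated assumptions: `IncidentRecord` and its fields are hypothetical stand-ins for whatever the ticketing system exports, every record is assumed to be acknowledged and resolved, and the input list is assumed to be non-empty.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    logged_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime
    escalated: bool = False


def _mean(deltas: list[timedelta]) -> timedelta:
    # Assumes a non-empty list; an empty reporting period needs separate handling.
    return sum(deltas, timedelta()) / len(deltas)


def mtta(records: list[IncidentRecord]) -> timedelta:
    """Mean time from incident logged to first response."""
    return _mean([r.acknowledged_at - r.logged_at for r in records])


def mttr(records: list[IncidentRecord]) -> timedelta:
    """Mean time from incident logged to resolution."""
    return _mean([r.resolved_at - r.logged_at for r in records])


def escalation_rate(records: list[IncidentRecord]) -> float:
    """Percentage of incidents that required escalation."""
    return 100 * sum(r.escalated for r in records) / len(records)
```

Because the targets differ by severity, these functions would normally be called once per severity level rather than on the whole ticket population.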
Monthly Incident Reports
IT Managers must provide monthly reports including:
- Total incidents by severity and category
- Top 5 incident types and trends
- SLA compliance by severity level
- Average MTTA and MTTR
- Major incidents summary and lessons learned
- Repeat incidents and systemic issues
- Recommendations for improvements
Continuous Improvement
Incident Trend Analysis
Regularly analyze incident data to identify:
- Recurring problems requiring permanent fixes
- Knowledge gaps needing training or documentation
- Process inefficiencies in incident handling
- Tool or automation opportunities
- Vendor performance issues
Process Improvements
- Update runbooks and procedures after major incidents
- Conduct tabletop exercises for critical scenarios
- Enhance monitoring and alerting to catch issues earlier
- Invest in automation to reduce manual response time
- Improve documentation and knowledge base articles
Contact Information
Emergency contacts for incident escalation:
- IT Service Desk: ext. 4357 (24/7)
- IT Manager On-Call: Via PagerDuty or 1-800-555-HELP
- IT Director: director@company.com, mobile: xxx-xxx-xxxx
- Security Team: security@company.com, ext. 4500
- CTO/CIO: cto@company.com, mobile: xxx-xxx-xxxx
Last Updated: November 2025
Policy Owner: IT Director
Confidentiality: IT Management Only
