Incident Response Playbook: From Detection to Recovery

A comprehensive guide to building and executing an incident response plan, with practical templates and real-world scenarios.

Why Incident Response Matters

On average, a breach takes 204 days to identify and another 73 days to contain, or 277 days in total. Organizations with tested incident response plans reduce breach costs by $2.66 million on average.

The Incident Response Lifecycle

| Phase | Activities | Output |
| --- | --- | --- |
| 1. Preparation | Build team, create policies, deploy tools, train staff | IR plan, playbooks, contact lists |
| 2. Detection & Analysis | Monitor alerts, validate incidents, determine scope | Incident classification, timeline |
| 3. Containment | Isolate systems, preserve evidence, limit damage | Contained threat, forensic images |
| 4. Eradication | Remove malware, close vulnerabilities, reset credentials | Clean systems, patched gaps |
| 5. Recovery | Restore systems, verify functionality, monitor closely | Business operations restored |
| 6. Post-Incident | Document lessons, update procedures, improve defenses | Updated IR plan, metrics |

Note: This is a continuous cycle - lessons learned feed back into preparation.

Phase 1: Preparation

Build Your Team

Incident Response Team Structure:
  Core Team:
    - IR Manager/Lead
    - Security Analysts (Tier 1-3)
    - Forensic Investigators
    - Threat Intelligence Analyst

  Extended Team:
    - IT Operations
    - Network Engineering
    - Legal Counsel
    - Communications/PR
    - Human Resources
    - Executive Sponsor

  External Resources:
    - IR Retainer (DFIR firm)
    - Cyber Insurance Provider
    - Law Enforcement Contacts
    - Regulatory Contacts

Essential Documentation

## IR Documentation Checklist

### Policies & Procedures
- [ ] Incident Response Plan
- [ ] Incident Classification Matrix
- [ ] Escalation Procedures
- [ ] Communication Templates
- [ ] Evidence Handling Procedures

### Technical Documentation
- [ ] Network Diagrams
- [ ] Asset Inventory
- [ ] Critical System List
- [ ] Backup Procedures
- [ ] Recovery Procedures

### Contact Information
- [ ] On-call Rotation Schedule
- [ ] Escalation Contact List
- [ ] Vendor Support Contacts
- [ ] Law Enforcement Contacts
- [ ] Legal/PR Contacts

Incident Classification

| Severity | Examples | Response |
| --- | --- | --- |
| SEV 1 - Critical | Active data breach, ransomware encryption in progress, critical infrastructure compromise | All hands on deck, 15-min updates |
| SEV 2 - High | Confirmed compromise (limited scope), malware on multiple systems, privileged account compromise | Core team engaged, hourly updates |
| SEV 3 - Medium | Single system compromise, phishing with credential capture, policy violation with security impact | On-call team, daily updates |
| SEV 4 - Low | Attempted attack blocked, minor policy violation, suspicious activity requiring investigation | Normal queue, standard SLA |
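
To make the matrix usable from ticketing or paging automation, it can be encoded as a small lookup. The sketch below is only an illustration: the `ResponsePolicy` class and `policy_for` function are hypothetical names, and the values are transcribed from the table above, with SEV 4 given no fixed update cadence because the table only says "standard SLA".

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ResponsePolicy:
    label: str
    team: str
    # None means "standard SLA": no fixed status-update cadence.
    update_interval: Optional[timedelta]


# Values transcribed from the classification table.
POLICIES = {
    1: ResponsePolicy("Critical", "All hands", timedelta(minutes=15)),
    2: ResponsePolicy("High", "Core team", timedelta(hours=1)),
    3: ResponsePolicy("Medium", "On-call team", timedelta(days=1)),
    4: ResponsePolicy("Low", "Normal queue", None),
}


def policy_for(severity: int) -> ResponsePolicy:
    """Return the response policy for SEV 1-4 and reject anything else."""
    if severity not in POLICIES:
        raise ValueError(f"unknown severity: {severity}")
    return POLICIES[severity]
```

Rejecting unknown severities outright, rather than defaulting to SEV 4, keeps a typo from silently downgrading an incident.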

Phase 2: Detection & Analysis

Initial Triage

## First 15 Minutes Checklist

### Validate the Alert
- [ ] Is this a true positive?
- [ ] What triggered the alert?
- [ ] Initial scope assessment
- [ ] Assign incident number

### Initial Data Collection
- [ ] Alert details and timeline
- [ ] Affected systems/users
- [ ] Network logs (5-min window)
- [ ] Initial IOCs

### Immediate Decisions
- [ ] Severity classification
- [ ] Escalation needed?
- [ ] Containment required?
- [ ] Evidence preservation priority
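
The "Assign incident number" step is easy to fumble under pressure, so it helps to generate IDs rather than invent them. Here is a minimal sketch that assumes the `IR-YYYY-NNN` format used in the report template later in this guide; the function name `next_incident_id` and the idea of passing in the list of existing IDs are assumptions for illustration, since a real deployment would read them from the ticketing system.

```python
import re
from datetime import datetime, timezone

# Matches IDs such as IR-2025-001 (year, then a zero-padded sequence number).
ID_PATTERN = re.compile(r"^IR-(\d{4})-(\d{3,})$")


def next_incident_id(existing_ids, now=None):
    """Return the next free ID for the current UTC year."""
    now = now or datetime.now(timezone.utc)
    highest = 0
    for incident_id in existing_ids:
        match = ID_PATTERN.match(incident_id)
        # Only IDs from the same year count toward the sequence.
        if match and int(match.group(1)) == now.year:
            highest = max(highest, int(match.group(2)))
    return f"IR-{now.year}-{highest + 1:03d}"
```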

Investigation Queries

Windows Event Log Analysis:

# Recent successful logins
Get-WinEvent -FilterHashtable @{LogName='Security';ID=4624} -MaxEvents 100 |
  Select-Object TimeCreated, @{N='User';E={$_.Properties[5].Value}},
  @{N='LogonType';E={$_.Properties[8].Value}},
  @{N='SourceIP';E={$_.Properties[18].Value}}

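# Process creation events (requires process creation auditing;
# CommandLine is empty unless command-line logging is also enabled)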
Get-WinEvent -FilterHashtable @{LogName='Security';ID=4688} -MaxEvents 100 |
  Select-Object TimeCreated,
  @{N='User';E={$_.Properties[1].Value}},
  @{N='Process';E={$_.Properties[5].Value}},
  @{N='CommandLine';E={$_.Properties[8].Value}}

# Service installations
Get-WinEvent -FilterHashtable @{LogName='System';ID=7045} -MaxEvents 50
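
The numeric `LogonType` values from event 4624 are hard to read at a glance. As a rough sketch, assuming the first query's output was exported to JSON or CSV and loaded as dictionaries with the same `User`, `LogonType`, and `SourceIP` keys, the following decodes the type and keeps only remote logons. The helper names are hypothetical.

```python
# Standard Windows logon type codes reported in event 4624.
LOGON_TYPES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",
    11: "CachedInteractive",
}

REMOTE_TYPES = {3, 10}  # network logons and RDP sessions
LOCAL_SOURCES = {"", "-", "127.0.0.1", "::1"}


def describe_logon(logon_type):
    """Translate a numeric logon type into its name."""
    return LOGON_TYPES.get(int(logon_type), f"Unknown ({logon_type})")


def remote_logons(events):
    """Keep logons that came over the network from a non-local address."""
    remote = []
    for event in events:
        logon_type = int(event["LogonType"])
        source = str(event.get("SourceIP") or "").strip()
        if logon_type in REMOTE_TYPES and source not in LOCAL_SOURCES:
            remote.append({**event, "LogonTypeName": describe_logon(logon_type)})
    return remote
```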

Linux Investigation:

# Recent authentication (Debian/Ubuntu; the file is /var/log/secure on RHEL/CentOS)
grep -E "Accepted|Failed" /var/log/auth.log | tail -100

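# Files modified in the last 24 hours
# (-xdev stays on the root filesystem, skipping /proc and /sys)
find / -xdev -type f -mtime -1 -ls 2>/dev/null

# Listening sockets and active connections with owning processes (run as root)
netstat -tunap   # or: ss -tunap
lsof -i -P -n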

# Cron jobs (persistence)
cat /etc/crontab
ls -la /etc/cron.d/
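crontab -l  # current user only
# Every user's crontab (run as root)
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$u" 2>/dev/null; done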

# Recently installed packages
rpm -qa --last | head -20  # RHEL/CentOS
grep " install " /var/log/dpkg.log | tail -20  # Debian/Ubuntu (dpkg -l is alphabetical, not by date)
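
Scrolling through `grep` output works for a quick look, but counting failures per source address makes brute-force attempts stand out. The sketch below assumes the standard OpenSSH "Failed password for ... from ... port ..." message format; the function name is hypothetical, and other services that write to the same log would need their own patterns.

```python
import re
from collections import Counter

# OpenSSH logs failures as:
#   Failed password for [invalid user ]NAME from ADDRESS port N ssh2
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def failed_logins_by_source(lines):
    """Count failed SSH password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

With a real log, something like `failed_logins_by_source(open("/var/log/auth.log", errors="replace")).most_common(10)` would list the ten noisiest sources.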

Network Traffic Analysis:

# Capture traffic for analysis
tcpdump -i eth0 -w capture.pcap -c 10000

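# Top (source, destination, port) tuples by packet count; many repeated
# connections to one external host are a starting point for spotting beaconing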
tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e tcp.dstport |
  sort | uniq -c | sort -rn | head -20

# DNS queries
tshark -r capture.pcap -Y "dns.flags.response == 0" -T fields -e dns.qry.name |
  sort | uniq -c | sort -rn
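
Connection counts alone do not prove beaconing; what gives a beacon away is regular timing. One simple heuristic is the coefficient of variation of the gaps between connections: a value near zero means the traffic is almost perfectly periodic. The sketch below assumes you have already grouped connection timestamps (epoch seconds, for example from tshark's `frame.time_epoch` field) by flow. The function names and the 0.1 threshold are illustrative assumptions, and malware that adds jitter will need a looser threshold.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation of inter-arrival gaps (lower = more regular).

    Returns None when there are fewer than three gaps to compare.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 3:
        return None
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def likely_beacons(flows, max_cv=0.1):
    """flows maps (src, dst, port) to a list of epoch-second timestamps."""
    hits = []
    for flow, timestamps in flows.items():
        score = beacon_score(timestamps)
        if score is not None and score <= max_cv:
            hits.append((flow, score))
    # Most regular flows first.
    return sorted(hits, key=lambda item: item[1])
```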

Timeline Building

Example Timeline:

| Date/Time (UTC) | Source | Event Description |
| --- | --- | --- |
| 2025-01-08 14:23:15 | Email GW | Phishing email received |
| 2025-01-08 14:25:42 | Proxy | User clicked malicious URL |
| 2025-01-08 14:25:47 | EDR | Malware download blocked |
| 2025-01-08 14:26:01 | EDR | Second attempt successful |
| 2025-01-08 14:26:15 | EDR | Process injection detected |
| 2025-01-08 14:30:00 | DC Logs | Lateral movement attempt |
| 2025-01-08 14:32:00 | SIEM | Alert generated |
| 2025-01-08 14:35:00 | SOC | Incident declared |
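
In practice these rows come from different tools and have to be merged by hand. A minimal sketch of that merge, assuming every source has already been normalized to UTC and to `(timestamp, source, description)` tuples (the `build_timeline` name is hypothetical):

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_timeline(*sources):
    """Merge event tuples from several logs into one list sorted by time."""
    events = []
    for source in sources:
        for timestamp, name, description in source:
            parsed = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
            events.append((parsed, name, description))
    return sorted(events, key=lambda event: event[0])


# Sample rows taken from the example timeline.
edr = [
    ("2025-01-08 14:26:01", "EDR", "Second attempt successful"),
    ("2025-01-08 14:25:47", "EDR", "Malware download blocked"),
]
email_gw = [("2025-01-08 14:23:15", "Email GW", "Phishing email received")]
soc = [("2025-01-08 14:35:00", "SOC", "Incident declared")]

timeline = build_timeline(edr, email_gw, soc)
# Time from the first recorded event to the incident declaration.
time_to_declare = timeline[-1][0] - timeline[0][0]
```

The clock-normalization step matters more than the code: a source logging in local time will quietly scramble the order.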

Phase 3: Containment

Short-term Containment

## Immediate Containment Actions

### Network Isolation
- [ ] Isolate affected systems (network ACLs/VLANs)
- [ ] Block malicious IPs at firewall
- [ ] Sinkhole malicious domains
- [ ] Disable compromised accounts

### Evidence Preservation (BEFORE imaging)
- [ ] Capture volatile data (memory, connections)
- [ ] Screenshot active sessions
- [ ] Document running processes
- [ ] Note network connections
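
Before pushing IOC addresses to a firewall, it is worth filtering the list so that a typo or an internal address does not take down something you need. The sketch below uses Python's standard `ipaddress` module and keeps only globally routable addresses; the `blockable_ips` name is hypothetical. Note that `is_global` also excludes reserved documentation ranges such as 203.0.113.0/24.

```python
import ipaddress


def blockable_ips(candidates):
    """Return unique, globally routable IPs from a raw IOC list.

    Malformed entries and private, loopback, or otherwise
    non-global addresses are dropped.
    """
    unique = set()
    for raw in candidates:
        try:
            address = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # not a valid IPv4/IPv6 address
        if address.is_global:
            unique.add(address)
    # Sort IPv4 before IPv6, then numerically within each family.
    ordered = sorted(unique, key=lambda a: (a.version, int(a)))
    return [str(address) for address in ordered]
```

Dropped entries should still be reviewed by a person: an internal address on an IOC list usually means a compromised host, which calls for isolation rather than a firewall rule.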

Memory Acquisition:

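# Linux: most modern kernels restrict /dev/mem (CONFIG_STRICT_DEVMEM),
# so this usually fails or produces an incomplete image; prefer LiME
sudo dd if=/dev/mem of=/mnt/forensics/memory.raw bs=1M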

# Using LiME
sudo insmod lime.ko "path=/mnt/forensics/memory.lime format=lime"

# Windows (with winpmem)
winpmem_mini_x64.exe memory.raw
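
Once an image is written, hashing it immediately gives you a value to record in the evidence log and to re-check before analysis. Here is a minimal sketch using SHA-256 from the standard library; the function name is hypothetical, and your evidence handling procedure may call for a different or additional algorithm.

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_file("/mnt/forensics/memory.raw")` right after acquisition, and again on any working copy, shows whether the copy still matches the original.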

Long-term Containment

## Sustained Containment

### System Hardening
- [ ] Patch exploited vulnerability
- [ ] Reset compromised credentials
- [ ] Implement additional monitoring
- [ ] Block newly discovered IOCs

### Business Continuity
- [ ] Activate backup systems if needed
- [ ] Communicate with affected users
- [ ] Coordinate with business units

Phase 4: Eradication

Malware Removal

## Eradication Checklist

### Identify All Affected Systems
- [ ] Scan all systems with IOCs
- [ ] Review EDR detections
- [ ] Check for persistence mechanisms
- [ ] Identify patient zero

### Remove Threats
- [ ] Delete malicious files
- [ ] Remove persistence (scheduled tasks, services, registry)
- [ ] Remove malicious accounts
- [ ] Revoke compromised certificates/keys

### Verify Clean State
- [ ] Rescan with updated signatures
- [ ] Verify persistence removal
- [ ] Confirm no ongoing C2 communication

Common Persistence Locations:

Windows Persistence:
├── Registry Run Keys
│   └── HKLM/HKCU\Software\Microsoft\Windows\CurrentVersion\Run
├── Scheduled Tasks
│   └── C:\Windows\System32\Tasks\
├── Services
│   └── HKLM\System\CurrentControlSet\Services
├── Startup Folders
│   └── %AppData%\Microsoft\Windows\Start Menu\Programs\Startup
└── WMI Subscriptions

Linux Persistence:
├── Cron Jobs
│   └── /etc/crontab, /etc/cron.d/, user crontabs
├── Systemd Services
│   └── /etc/systemd/system/
├── Init Scripts
│   └── /etc/init.d/
├── SSH Keys
│   └── ~/.ssh/authorized_keys
└── Shell Profiles
    └── ~/.bashrc, ~/.profile, /etc/profile.d/
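
A quick way to triage the Linux locations is to list files in them that changed recently. The sketch below is one possible approach, not a complete persistence hunt: the function name, the seven-day window, and the path list (system-wide locations only, taken from the tree above) are assumptions, and attackers can forge modification times, so an empty result proves nothing. The `root` parameter lets you point it at a mounted forensic image instead of the live system.

```python
import os
import time

# System-wide locations from the list above, relative to the filesystem root.
LINUX_PERSISTENCE_PATHS = [
    "etc/crontab",
    "etc/cron.d",
    "etc/systemd/system",
    "etc/init.d",
    "etc/profile.d",
]


def recently_modified(root, rel_paths, max_age_days=7, now=None):
    """Return files under the given paths modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    hits = []
    for rel in rel_paths:
        target = os.path.join(root, rel)
        if os.path.isfile(target):
            candidates = [target]
        elif os.path.isdir(target):
            candidates = [
                os.path.join(directory, name)
                for directory, _, names in os.walk(target)
                for name in names
            ]
        else:
            continue  # path does not exist on this system
        for path in candidates:
            try:
                modified = os.lstat(path).st_mtime
            except OSError:
                continue  # unreadable or vanished while scanning
            if modified >= cutoff:
                hits.append(path)
    return sorted(hits)
```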

Phase 5: Recovery

Recovery Plan

## Recovery Procedures

### System Restoration
- [ ] Rebuild from known-good images (preferred)
- [ ] Restore from clean backups
- [ ] Reinstall and reconfigure (if no backup)
- [ ] Apply all patches before reconnecting

### Validation
- [ ] Vulnerability scan restored systems
- [ ] Verify business functionality
- [ ] Confirm security controls active
- [ ] Test backup/recovery procedures

### Reconnection
- [ ] Gradual reconnection to network
- [ ] Enhanced monitoring during transition
- [ ] User communication and testing

Recovery Priority

| Priority | Systems | RTO Target |
| --- | --- | --- |
| P1 | Domain Controllers, Core Network Infra, Security Systems | 4 hours |
| P2 | Email/Communication, Critical Applications, Database Servers | 8 hours |
| P3 | Business Applications, File Servers | 24 hours |
| P4 | End User Devices, Non-critical Systems | 48-72 hours |
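
During a long recovery it helps to track which systems are past their target. A small sketch of that bookkeeping follows, using the RTO values from the table and the upper bound of 72 hours for P4. The tuple layout and function names are assumptions for illustration.

```python
# RTO targets in hours, from the priority table (P4 uses the 72-hour upper bound).
RTO_HOURS = {"P1": 4, "P2": 8, "P3": 24, "P4": 72}


def recovery_order(systems):
    """Sort (name, priority, restored) tuples by priority, then by name."""
    return sorted(systems, key=lambda system: (system[1], system[0]))


def rto_breaches(systems, hours_elapsed):
    """Names of systems still down after their RTO target has passed."""
    return [
        name
        for name, priority, restored in recovery_order(systems)
        if not restored and hours_elapsed > RTO_HOURS[priority]
    ]
```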

Phase 6: Post-Incident

Lessons Learned Meeting

## Post-Incident Review Agenda

### Timeline Review (30 min)
- Walk through incident timeline
- Identify key decision points
- Note what information was available when

### What Went Well (15 min)
- Effective detection
- Good team coordination
- Successful containment

### What Needs Improvement (30 min)
- Detection gaps
- Process bottlenecks
- Communication issues
- Tool/capability gaps

### Action Items (15 min)
- Assign owners
- Set deadlines
- Define success criteria

Incident Report Template

# Incident Report: [IR-2025-001]

## Executive Summary
Brief 2-3 paragraph overview for leadership.

## Incident Details
- **Incident ID:** IR-2025-001
- **Date Detected:** 2025-01-08 14:32 UTC
- **Date Contained:** 2025-01-08 16:45 UTC
- **Date Resolved:** 2025-01-09 09:00 UTC
- **Severity:** SEV 2 - High
- **Classification:** Malware/Ransomware

## Impact Assessment
- Systems affected: 15 workstations, 2 servers
- Data affected: No confirmed exfiltration
- Business impact: 4 hours downtime
- Financial impact: ~$50,000 (estimated)

## Root Cause Analysis
[Detailed technical analysis]

## Timeline of Events
[Detailed timeline]

## Response Actions
[What was done]

## Recommendations
[Improvements to prevent recurrence]

## Appendices
- IOCs
- Affected asset list
- Evidence inventory
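
The three dates in the template are enough to derive the response metrics most leadership teams ask for. As a quick worked example, assuming timestamps are written exactly as in the template (`YYYY-MM-DD HH:MM UTC`) and using a hypothetical `response_metrics` helper:

```python
from datetime import datetime, timezone

REPORT_FORMAT = "%Y-%m-%d %H:%M UTC"


def parse_report_time(value):
    """Parse a report timestamp and mark it explicitly as UTC."""
    return datetime.strptime(value, REPORT_FORMAT).replace(tzinfo=timezone.utc)


def response_metrics(detected, contained, resolved):
    """Return time-to-contain and time-to-resolve, both measured from detection."""
    start = parse_report_time(detected)
    return {
        "time_to_contain": parse_report_time(contained) - start,
        "time_to_resolve": parse_report_time(resolved) - start,
    }


metrics = response_metrics(
    "2025-01-08 14:32 UTC",
    "2025-01-08 16:45 UTC",
    "2025-01-09 09:00 UTC",
)
# time_to_contain is 2 h 13 min; time_to_resolve is 18 h 28 min.
```

Tracking these two numbers across incidents is one way to fill the "metrics" output of the post-incident phase.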

Tabletop Exercise Template

Scenario: Ransomware Attack

## Tabletop Exercise: Ransomware Scenario

### Inject 1 (T+0 min)
"It's Monday 2:00 AM. Your SIEM alerts on unusual
PowerShell activity on multiple workstations.
EDR shows attempted disabling of security tools."

Discussion Questions:
- Who gets notified?
- What's our first action?
- Do we have after-hours coverage?

### Inject 2 (T+15 min)
"Investigation reveals 50+ systems showing encryption
activity. Ransom notes appearing. Domain admin
credentials may be compromised."

Discussion Questions:
- Do we isolate the network?
- Who authorizes shutdown of systems?
- How do we communicate internally?

### Inject 3 (T+30 min)
"Attackers claim to have exfiltrated data. They're
demanding $2M in Bitcoin. Media is calling."

Discussion Questions:
- Do we engage with attackers?
- What's our disclosure obligation?
- Who handles media?

### Inject 4 (T+45 min)
"Backups from the past 30 days are encrypted.
Last clean backup is 45 days old."

Discussion Questions:
- What's our recovery strategy?
- How long can the business operate?
- Do we consider paying?

The best incident response is the one you’ve practiced. Train like it’s real, so when it’s real, it feels like training.