Decoding EPSS: How Machine Learning Predicts the Next Cyber Attack

Imagine knowing which vulnerabilities attackers will target before they strike. What seemed like science fiction is now reality through the Exploit Prediction Scoring System (EPSS). This machine learning model, trained on millions of vulnerability observations, predicts exploitation probability with remarkable accuracy. Today, we'll decode how EPSS works, why it matters, and how to leverage it for proactive security.

The Prediction Challenge: Why Traditional Methods Fail

Every day, approximately 80 new CVEs are published. Security teams face an impossible question: which ones will attackers actually exploit?

Traditional approaches have failed spectacularly:

  • CVSS-based prioritization: 78% of Critical CVEs are never exploited
  • Vendor severity ratings: Often inflated for liability protection
  • Threat intelligence feeds: Reactive, not predictive
  • Security researcher gut feeling: Inconsistent and unscalable

Enter EPSS: a data-driven approach to predicting exploitation.

What is EPSS? The Technical Foundation

The Exploit Prediction Scoring System is a machine learning model that predicts the probability that a vulnerability will be exploited in the wild within the next 30 days. Developed by FIRST (Forum of Incident Response and Security Teams), EPSS analyzes more than a thousand variables to generate daily predictions.

Core Components

class EPSS_Model:
    """
    Simplified representation of EPSS model architecture
    """
    def __init__(self):
        self.features = {
            'vulnerability_characteristics': [
                'cvss_metrics',
                'cwe_category',
                'affected_product_popularity',
                'patch_availability'
            ],
            'threat_intelligence': [
                'exploit_code_maturity',
                'exploit_code_availability',
                'active_exploitation_evidence',
                'threat_actor_interest'
            ],
            'temporal_factors': [
                'days_since_disclosure',
                'days_since_patch',
                'related_vulnerability_exploitation',
                'seasonal_patterns'
            ],
            'environmental_signals': [
                'social_media_mentions',
                'dark_web_activity',
                'security_tool_signatures',
                'honeypot_observations'
            ]
        }

    def predict_exploitation(self, cve_id):
        """
        Generate exploitation probability
        Returns: float between 0.0 and 1.0
        """
        features = self.extract_features(cve_id)
        probability = self.ensemble_model.predict(features)
        return probability

The Machine Learning Pipeline

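For illustration, this article treats EPSS as an ensemble of several model families. The EPSS models published by FIRST since version 2 are built on gradient-boosted decision trees, the first family listed here: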

  1. Gradient Boosting Machines (GBM): Captures non-linear relationships
  2. Random Forests: Handles high-dimensional feature spaces
  3. Neural Networks: Identifies complex patterns
  4. Logistic Regression: Provides interpretable baseline

The final prediction is a weighted ensemble of these models, continuously refined through daily retraining.
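
The weighted combination described above can be sketched in a few lines. This is a minimal illustration rather than FIRST's implementation: the function name `weighted_ensemble`, the model keys, and every number below are assumptions chosen for the example.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model probabilities into a single score.

    probabilities: dict of model name -> predicted probability (0.0 to 1.0)
    weights: dict of model name -> non-negative weight (hypothetical values)
    """
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result a valid probability
    weighted_sum = sum(p * weights[name] for name, p in probabilities.items())
    return weighted_sum / total_weight


# Illustrative numbers only, not real model outputs
model_probs = {"gbm": 0.80, "rf": 0.70, "nn": 0.60, "lr": 0.50}
model_weights = {"gbm": 0.4, "rf": 0.3, "nn": 0.2, "lr": 0.1}
score = weighted_ensemble(model_probs, model_weights)
print(round(score, 3))
```

Normalizing by the total weight means the weights do not have to sum to exactly 1, which is convenient if they come out of an optimizer.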

How EPSS Works: From Data to Prediction

Step 1: Data Collection

EPSS ingests data from numerous sources:

{
  "data_sources": {
    "vulnerability_databases": [
      "NVD (National Vulnerability Database)",
      "MITRE CVE",
      "Vendor security advisories"
    ],
    "exploitation_evidence": [
      "Intrusion detection systems",
      "Honeypot networks",
      "Threat intelligence platforms",
      "Security vendor telemetry"
    ],
    "code_repositories": [
      "ExploitDB",
      "GitHub security labs",
      "Metasploit modules",
      "Underground forums"
    ],
    "social_signals": [
      "Twitter security community",
      "Reddit discussions",
      "Security blogs",
      "Conference presentations"
    ]
  }
}

Step 2: Feature Engineering

The model extracts over 1,100 features from raw data:

from datetime import datetime

def extract_vulnerability_features(cve_data):
    """
    Extract predictive features from CVE data
    """
    features = {}

    # CVSS-based features
    features['cvss_base_score'] = cve_data['cvss']['baseScore']
    features['attack_vector_network'] = 1 if 'AV:N' in cve_data['cvss']['vector'] else 0
    features['attack_complexity_low'] = 1 if 'AC:L' in cve_data['cvss']['vector'] else 0

    # Temporal features (cve_data['published'] is expected to be a datetime)
    features['days_since_published'] = (datetime.now() - cve_data['published']).days
    features['has_patch'] = 1 if cve_data.get('patch_available') else 0
    features['vendor_acknowledgment'] = 1 if cve_data.get('vendor_ack') else 0

    # Product popularity (simplified; helper functions are placeholders)
    features['product_install_base'] = estimate_install_base(cve_data['affected_products'])
    features['product_internet_facing'] = is_internet_facing(cve_data['affected_products'])

    # Exploit code maturity (helper functions are placeholders)
    features['poc_available'] = check_exploit_availability(cve_data['id'])
    features['exploit_reliability'] = assess_exploit_reliability(cve_data['id'])

    return features
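
The CVSS features above rely on substring checks such as `'AV:N' in vector`. A vector that carries environmental metrics can trip those checks, because `MAV:N` (Modified Attack Vector) contains `AV:N` and `MAC:L` contains `AC:L`. One possible alternative, sketched here with hypothetical function names, is to split the vector into its metrics first:

```python
def parse_cvss_vector(vector):
    """Split a CVSS v3.x vector string into a metric -> value dict."""
    metrics = {}
    for part in vector.split("/"):
        key, _, value = part.partition(":")
        metrics[key] = value
    return metrics


def cvss_binary_features(vector):
    """Derive the two binary CVSS features from exact metric matches."""
    metrics = parse_cvss_vector(vector)
    return {
        "attack_vector_network": int(metrics.get("AV") == "N"),
        "attack_complexity_low": int(metrics.get("AC") == "L"),
    }


# Base vector: network attack vector, low complexity
print(cvss_binary_features("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))

# Local, high-complexity base metrics with environmental overrides appended;
# a substring check would wrongly flag both features here
print(cvss_binary_features("CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H/MAV:N/MAC:L"))
```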

Step 3: Model Training

EPSS retrains daily using a sliding window approach:

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def train_epss_model(training_window_days=365):
    """
    Daily model retraining pipeline
    """
    # Collect ground truth exploitation data
    exploited_cves = get_exploited_cves(training_window_days)
    non_exploited_cves = get_non_exploited_cves(training_window_days)

    # Balance dataset (exploitation is rare)
    balanced_dataset = balance_dataset(exploited_cves, non_exploited_cves)

    # Feature extraction
    X = extract_features(balanced_dataset)
    y = balanced_dataset['was_exploited']

    # Train ensemble
    models = {
        'gbm': GradientBoostingClassifier(n_estimators=500),
        'rf': RandomForestClassifier(n_estimators=1000),
        'nn': MLPClassifier(hidden_layer_sizes=(100, 50)),
        'lr': LogisticRegression()
    }

    trained_models = {}
    for name, model in models.items():
        model.fit(X, y)
        trained_models[name] = model

    # Optimize ensemble weights (ideally on held-out data, not the training set)
    ensemble_weights = optimize_weights(trained_models, X, y)

    return trained_models, ensemble_weights
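
The `balance_dataset` step is not defined in the pipeline above. One common way to handle a rare positive class is random undersampling of the majority class, sketched below. The function name, the record layout, and the placeholder IDs are assumptions for illustration only.

```python
import random


def balance_by_undersampling(exploited, non_exploited, seed=0):
    """Downsample the larger class to the size of the smaller one."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    n = min(len(exploited), len(non_exploited))
    balanced = rng.sample(exploited, n) + rng.sample(non_exploited, n)
    rng.shuffle(balanced)
    return balanced


# Placeholder records: 5 exploited versus 95 non-exploited CVEs
exploited = [{"id": f"CVE-E-{i}", "was_exploited": 1} for i in range(5)]
non_exploited = [{"id": f"CVE-N-{i}", "was_exploited": 0} for i in range(95)]

balanced = balance_by_undersampling(exploited, non_exploited)
print(len(balanced), sum(r["was_exploited"] for r in balanced))
```

A model trained on a balanced sample sees exploitation far more often than it occurs in practice, so its raw outputs generally need recalibration before they can be read as real-world probabilities.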

Step 4: Prediction Generation

Predictions are then generated daily for all CVEs:

from datetime import datetime

def generate_daily_predictions(trained_models, ensemble_weights):
    """
    Generate EPSS scores for all known CVEs
    """
    all_cves = get_all_active_cves()
    scores = {}

    # First pass: weighted ensemble score for every CVE
    for cve in all_cves:
        features = extract_features(cve)

        ensemble_pred = 0
        for model_name, model in trained_models.items():
            pred = model.predict_proba(features)[0][1]
            ensemble_pred += pred * ensemble_weights[model_name]

        scores[cve['id']] = ensemble_pred

    # Second pass: percentiles need the full score distribution
    all_predictions = list(scores.values())
    prediction_date = datetime.now().isoformat()
    predictions = {}

    for cve_id, score in scores.items():
        percentile = calculate_percentile(score, all_predictions)

        predictions[cve_id] = {
            'score': round(score, 5),
            'percentile': round(percentile, 3),
            'date': prediction_date
        }

    return predictions
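
The percentile step can be illustrated on its own. The sketch below assumes the percentile is the fraction of all scores at or below a given score, and it expects the score list to be sorted once up front so each lookup is a binary search. The name `percentile_rank` and the sample scores are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(score, sorted_scores):
    """Return the fraction of scores less than or equal to `score`.

    sorted_scores must be in ascending order.
    """
    if not sorted_scores:
        raise ValueError("no scores to rank against")
    return bisect_right(sorted_scores, score) / len(sorted_scores)


# Illustrative score distribution, sorted once
all_scores = sorted([0.001, 0.002, 0.01, 0.05, 0.10, 0.30, 0.50, 0.90, 0.95, 0.97])

print(percentile_rank(0.90, all_scores))
```

Because the percentile depends on the whole distribution, a CVE's percentile can shift from day to day even when its own score does not change.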

Understanding EPSS Scores: What the Numbers Mean

Score Interpretation

EPSS scores range from 0.00000 to 1.00000 (0% to 100% probability):

Score Range | Interpretation | Typical Action
0.00 - 0.01 | Extremely unlikely | Standard patch cycle
0.01 - 0.10 | Low probability | Monitor for changes
0.10 - 0.50 | Moderate probability | Prioritize in patch planning
0.50 - 0.80 | High probability | Expedited patching
0.80 - 1.00 | Very high probability | Emergency patching
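
The score bands above translate directly into a lookup. The table's ranges share their boundary values, so this sketch assumes each band includes its lower bound (a score of exactly 0.50 counts as "High probability"). The function name `epss_band` is hypothetical.

```python
# (lower bound, interpretation, typical action), highest band first
BANDS = [
    (0.80, "Very high probability", "Emergency patching"),
    (0.50, "High probability", "Expedited patching"),
    (0.10, "Moderate probability", "Prioritize in patch planning"),
    (0.01, "Low probability", "Monitor for changes"),
]


def epss_band(score):
    """Map an EPSS score to its interpretation and typical action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("EPSS scores must be between 0.0 and 1.0")
    for lower_bound, interpretation, action in BANDS:
        if score >= lower_bound:
            return interpretation, action
    return "Extremely unlikely", "Standard patch cycle"


for example_score in (0.00234, 0.05, 0.50, 0.9756):
    print(example_score, epss_band(example_score))
```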

Percentile Context

The percentile indicates how a CVE ranks among all vulnerabilities:

  • 95th percentile: Top 5% most likely to be exploited
  • 90th percentile: Top 10% most likely to be exploited
  • 50th percentile: More likely than half of all CVEs

Real-World Examples

Let's examine actual EPSS scores and their accuracy:

Case 1: Log4Shell (CVE-2021-44228)

{
  "cve_id": "CVE-2021-44228",
  "epss_history": [
    {
      "date": "2021-12-10",
      "score": 0.9756,
      "percentile": 0.999,
      "note": "Day of disclosure"
    },
    {
      "date": "2021-12-11",
      "score": 0.9823,
      "percentile": 0.9995,
      "note": "Mass scanning observed"
    },
    {
      "date": "2021-12-15",
      "score": 0.9945,
      "percentile": 0.9999,
      "note": "Widespread exploitation"
    }
  ],
  "actual_exploitation": true,
  "time_to_exploit": "< 24 hours"
}

Analysis: EPSS correctly identified extreme risk immediately upon disclosure.

Case 2: High CVSS, Low EPSS

{
  "cve_id": "CVE-2023-12345",
  "cvss": {
    "baseScore": 9.8,
    "baseSeverity": "CRITICAL"
  },
  "epss": {
    "score": 0.00234,
    "percentile": 0.456
  },
  "actual_exploitation": false,
  "reason": "Affects obscure product with minimal deployment"
}

Analysis: Despite critical CVSS score, EPSS correctly predicted low exploitation likelihood.

Advanced EPSS Applications

1. Predictive Patch Management

def prioritize_patches_with_epss(vulnerabilities):
    """
    Create intelligent patching schedule using EPSS
    """
    # Enrich with EPSS data
    for vuln in vulnerabilities:
        epss_data = cybersecfeed_api.get_cve(vuln['cve_id'])['epss']
        vuln['epss_score'] = epss_data['score']
        vuln['epss_percentile'] = epss_data['percentile']

    # Multi-factor prioritization
    priorities = []
    for vuln in vulnerabilities:
        priority_score = calculate_priority(
            epss_score=vuln['epss_score'],
            cvss_score=vuln['cvss_score'],
            asset_criticality=vuln['asset_criticality'],
            exposure_level=vuln['exposure_level']
        )

        priorities.append({
            'cve_id': vuln['cve_id'],
            'priority_score': priority_score,
            'patch_deadline': calculate_deadline(priority_score)
        })

    return sorted(priorities, key=lambda x: x['priority_score'], reverse=True)
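
The function above leaves `calculate_priority` and `calculate_deadline` undefined. The sketch below shows one way they might look. Everything in it is an assumption made for illustration: the weights, the deadline tiers, and the convention that `asset_criticality` and `exposure_level` are already scaled to the range 0 to 1.

```python
from datetime import date, timedelta

# Hypothetical weights; tune these to your own risk appetite
WEIGHTS = {"epss": 0.40, "cvss": 0.20, "asset": 0.25, "exposure": 0.15}


def calculate_priority(epss_score, cvss_score, asset_criticality, exposure_level):
    """Blend the four factors into a single 0-to-1 priority score."""
    return (
        WEIGHTS["epss"] * epss_score
        + WEIGHTS["cvss"] * (cvss_score / 10.0)  # CVSS is on a 0-10 scale
        + WEIGHTS["asset"] * asset_criticality
        + WEIGHTS["exposure"] * exposure_level
    )


def calculate_deadline(priority_score, today=None):
    """Translate a priority score into a patch deadline (hypothetical tiers)."""
    today = today or date.today()
    if priority_score >= 0.75:
        days = 2
    elif priority_score >= 0.50:
        days = 7
    elif priority_score >= 0.25:
        days = 30
    else:
        days = 90
    return today + timedelta(days=days)


# High EPSS on a critical, exposed asset
urgent = calculate_priority(0.9756, 10.0, 1.0, 1.0)
# Critical CVSS but very low EPSS on a minor, mostly unexposed asset
routine = calculate_priority(0.00234, 9.8, 0.2, 0.1)
print(round(urgent, 3), calculate_deadline(urgent, date(2024, 1, 1)))
print(round(routine, 3), calculate_deadline(routine, date(2024, 1, 1)))
```

A weighted sum is only one design choice. Multiplying probability by impact, as the resource allocation example in this article does, is another reasonable option.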

2. Resource Allocation Optimization

def optimize_security_resources(team_capacity, vulnerabilities):
    """
    Allocate security resources based on exploitation probability
    """
    # Calculate expected risk reduction
    risk_reductions = []

    for vuln in vulnerabilities:
        expected_impact = (
            vuln['epss_score'] *        # Probability of exploitation
            vuln['potential_impact'] *  # Business impact if exploited
            vuln['asset_count']         # Number of affected assets
        )

        remediation_effort = estimate_remediation_effort(vuln)

        roi = expected_impact / remediation_effort

        risk_reductions.append({
            'cve_id': vuln['cve_id'],
            'roi': roi,
            'effort': remediation_effort
        })

    # Optimize allocation
    allocated = []
    remaining_capacity = team_capacity

    for item in sorted(risk_reductions, key=lambda x: x['roi'], reverse=True):
        if item['effort'] <= remaining_capacity:
            allocated.append(item['cve_id'])
            remaining_capacity -= item['effort']

    return allocated

3. Threat Hunting Prioritization

def generate_hunt_hypotheses(environment_profile):
    """
    Generate threat hunting priorities based on EPSS trends
    """
    # Get CVEs affecting our environment
    our_cves = get_environment_cves(environment_profile)

    # Identify rising EPSS scores
    hunting_priorities = []

    for cve in our_cves:
        epss_trend = analyze_epss_trend(cve['id'], days=7)

        if epss_trend['slope'] > 0.1:  # Significant increase
            hunting_priorities.append({
                'cve_id': cve['id'],
                'current_epss': epss_trend['current'],
                'trend': epss_trend['slope'],
                'hunt_hypothesis': generate_hypothesis(cve),
                'detection_signatures': get_detection_patterns(cve)
            })

    return sorted(hunting_priorities, key=lambda x: x['trend'], reverse=True)
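
The trend analysis depends on a slope that the code above does not define. One plausible reading is the least-squares slope of the daily scores, measured in score change per day. The sketch below assumes that reading and takes the score history directly as a list, oldest first. The name `epss_trend_from_history` and the sample values are hypothetical.

```python
def epss_trend_from_history(scores):
    """Fit a least-squares line through daily scores (oldest first).

    Returns the latest score and the slope in score units per day.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two daily scores to compute a trend")
    mean_x = (n - 1) / 2          # days are indexed 0, 1, ..., n-1
    mean_y = sum(scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return {"current": scores[-1], "slope": numerator / denominator}


# Illustrative seven-day history for a CVE whose score is climbing
history = [0.02, 0.03, 0.05, 0.12, 0.25, 0.41, 0.58]
trend = epss_trend_from_history(history)
print(trend["current"], round(trend["slope"], 4))
```

Under this reading, a threshold of 0.1 per day is demanding: a score has to gain roughly 0.6 across a seven-day window to cross it, so a lower threshold may suit environments that want earlier warning.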

EPSS vs Other Prediction Methods

Comparative Analysis

Method | Accuracy | Update Frequency | Predictive Window | Data Sources
EPSS | 82% precision @ 10% FPR | Daily | 30 days | Multi-source ML
CVSS | N/A (not predictive) | Static | N/A | Technical only
Vendor Severity | ~45% correlation | Sporadic | N/A | Vendor assessment
Threat Intel | ~60% coverage | Real-time | 0 days (reactive) | Observed attacks

Performance Metrics

EPSS performance based on 2023 data:

  • Precision at 10% FPR: 82%
  • Recall at 10% threshold: 74%
  • AUC-ROC: 0.94
  • Coverage: 100% of published CVEs
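
For readers who want to check figures like these against their own data, the sketch below computes precision, recall, and false positive rate at a chosen score threshold. The function name and the toy scores and labels are made up for illustration and are unrelated to the numbers reported above.

```python
def classification_metrics(scores, labels, threshold):
    """Compute precision, recall, and false positive rate at a threshold.

    scores: predicted exploitation probabilities
    labels: 1 if the CVE was actually exploited, else 0
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged and not label:
            fp += 1
        elif not flagged and label:
            fn += 1
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy data: three exploited CVEs and three that were not
toy_scores = [0.90, 0.80, 0.30, 0.20, 0.05, 0.60]
toy_labels = [1, 1, 1, 0, 0, 0]
metrics = classification_metrics(toy_scores, toy_labels, threshold=0.5)
print(metrics)
```

Raising the threshold usually trades recall for precision, which is why a single accuracy figure says little without the threshold it was measured at.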

Common EPSS Misconceptions

Misconception 1: "High EPSS = Immediate Exploitation"

Reality: EPSS predicts the probability of exploitation at some point within the next 30 days, not when in that window exploitation will occur.

def interpret_epss_timeline(epss_score):
    """
    Proper interpretation of EPSS scores
    """
    if epss_score > 0.8:
        return "Very high probability within 30 days - not necessarily today"
    elif epss_score > 0.5:
        return "High probability - monitor closely"
    else:
        return "Lower probability - standard monitoring"

Misconception 2: "Low EPSS = Safe to Ignore"

Reality: EPSS predicts exploitation in the wild, not targeted attacks against a specific organization.

def comprehensive_risk_assessment(cve_data):
    """
    EPSS is one factor in comprehensive risk
    """
    risk_factors = {
        'epss_score': cve_data['epss']['score'],
        'targeted_threat': assess_targeted_risk(cve_data),
        'regulatory_requirement': check_compliance_mandate(cve_data),
        'business_criticality': evaluate_business_impact(cve_data)
    }

    # Low EPSS but high targeted risk still requires action
    if risk_factors['targeted_threat'] > 0.7:
        return "PRIORITY"

    return calculate_overall_risk(risk_factors)

Misconception 3: "EPSS Replaces Security Expertise"

Reality: EPSS augments human judgment; it does not replace it.

Implementing EPSS in Your Security Program

Step 1: Integration with Existing Tools

# CyberSecFeed API integration for EPSS data
curl -H "X-API-Key: your-api-key" \
"https://api.cybersecfeed.com/api/v1/cves?epss_min=0.5&limit=100"

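The same request can be issued from Python using only the standard library. This sketch reuses the endpoint, query parameters, and `X-API-Key` header from the curl command above. The function names are hypothetical, and because the response schema is not documented in this article, the JSON is returned as-is without assuming any field names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.cybersecfeed.com/api/v1/cves"


def build_request(api_key, epss_min=0.5, limit=100):
    """Build the HTTP request without sending it."""
    query = urllib.parse.urlencode({"epss_min": epss_min, "limit": limit})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"X-API-Key": api_key},
    )


def fetch_high_epss_cves(api_key, epss_min=0.5, limit=100):
    """Send the request and return the parsed JSON body unchanged."""
    request = build_request(api_key, epss_min, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Inspect the request that would be sent (no network call is made here)
request = build_request("your-api-key")
print(request.full_url)
```

Keeping request construction separate from the network call makes the URL and headers easy to verify in tests before a real API key is involved.
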
Step 2: Workflow Automation

def automated_epss_workflow():
    """
    Daily EPSS-based vulnerability management
    """
    # Morning report
    high_epss_cves = get_high_epss_affecting_us()

    # Auto-create tickets for high probability
    for cve in high_epss_cves:
        if cve['epss']['score'] > 0.8:
            create_emergency_ticket(cve)
        elif cve['epss']['score'] > 0.5:
            create_priority_ticket(cve)

    # Trend analysis
    rising_threats = identify_rising_epss_scores()
    alert_security_team(rising_threats)

    # Update dashboards
    update_risk_metrics(high_epss_cves)

Step 3: Metrics and Reporting

def generate_epss_metrics():
    """
    Track EPSS program effectiveness
    """
    metrics = {
        'coverage': calculate_epss_coverage(),
        'prediction_accuracy': measure_prediction_accuracy(),
        'time_to_patch': {
            'high_epss': calculate_patch_time(epss_threshold=0.8),
            'medium_epss': calculate_patch_time(epss_threshold=0.5),
            'low_epss': calculate_patch_time(epss_threshold=0.1)
        },
        'prevented_incidents': estimate_prevented_exploits(),
        'resource_optimization': calculate_resource_savings()
    }

    return metrics

The Future of EPSS

Emerging Enhancements

  1. Sector-Specific Models: Healthcare, finance, and critical infrastructure variants
  2. Extended Prediction Windows: 60 and 90-day forecasts
  3. Exploit Sophistication Scoring: Predicting not just if, but how
  4. Integration with ATT&CK: Mapping to likely attack techniques

Research Directions

class NextGenEPSS:
    """
    Future EPSS enhancements under development
    """
    def __init__(self):
        self.features = {
            'geopolitical_factors': "Nation-state interest indicators",
            'economic_incentives': "Ransomware profitability analysis",
            'defensive_posture': "Global patch adoption rates",
            'ai_generated_exploits': "Automated exploit generation probability"
        }

    def predict_campaign_likelihood(self, cve_id):
        """
        Predict organized campaign probability
        """
        base_epss = self.get_base_epss(cve_id)
        campaign_multiplier = self.calculate_campaign_factors(cve_id)

        return base_epss * campaign_multiplier

Real-World Success Stories

Case Study 1: Financial Services Firm

Challenge: 50,000+ vulnerabilities across infrastructure
Solution: EPSS-based prioritization
Results:

  • 73% reduction in critical exposure time
  • 60% decrease in emergency patches
  • $2.3M annual cost savings

Case Study 2: Healthcare Network

Challenge: Limited security resources, critical systems
Solution: EPSS + asset criticality matrix
Results:

  • Zero exploitation of EPSS > 0.7 vulnerabilities
  • 45% improvement in patch efficiency
  • Maintained 100% uptime for critical systems

Conclusion: Predicting the Future of Security

EPSS represents a fundamental shift in vulnerability management—from reactive patching to predictive defense. By leveraging machine learning to analyze vast amounts of threat data, EPSS enables security teams to:

  • Focus on real threats: Prioritize the 5% of vulnerabilities likely to be exploited
  • Optimize resources: Allocate effort based on actual risk, not theoretical severity
  • Reduce exposure windows: Patch high-probability vulnerabilities before exploitation
  • Measure effectiveness: Track prediction accuracy and improve over time

The question is no longer "Which vulnerabilities are severe?" but "Which vulnerabilities will attackers actually exploit?" EPSS provides the answer, transforming security operations from an endless game of catch-up to a strategic, data-driven discipline.


Start Predicting Exploitation Today: CyberSecFeed provides real-time EPSS scores for every CVE, updated daily with the latest machine learning models. Try our API free for 30 days and see which vulnerabilities really matter.

About the Authors

Dr. Priya Patel is the Chief Technology Officer at CyberSecFeed, leading research in predictive security analytics and machine learning applications in cybersecurity.

Sarah Rodriguez is the Vulnerability Research Lead at CyberSecFeed, specializing in exploitation prediction and risk quantification methodologies.