Decoding EPSS: How Machine Learning Predicts the Next Cyber Attack

April 8, 2024 · 9 min read

Chief Technology Officer

Vulnerability Research Lead

Imagine knowing which vulnerabilities attackers will target before they strike. What seemed like science fiction is now reality through the Exploit Prediction Scoring System (EPSS). This machine learning model, trained on millions of vulnerability observations, predicts exploitation probability with remarkable accuracy. Today, we'll decode how EPSS works, why it matters, and how to leverage it for proactive security.

The Prediction Challenge: Why Traditional Methods Fail

Every day, approximately 80 new CVEs are published. Security teams face an impossible question: which ones will attackers actually exploit?

Traditional approaches have failed spectacularly:

CVSS-based prioritization: 78% of Critical CVEs are never exploited
Vendor severity ratings: Often inflated for liability protection
Threat intelligence feeds: Reactive, not predictive
Security researcher gut feeling: Inconsistent and unscalable

Enter EPSS: a data-driven approach to predicting exploitation.

What is EPSS? The Technical Foundation

The Exploit Prediction Scoring System is a machine learning model that predicts the probability of a vulnerability being exploited in the wild within the next 30 days. Developed by FIRST (Forum of Incident Response and Security Teams), EPSS analyzes hundreds of variables to generate daily predictions.

Core Components

class EPSS_Model:
    """
    Simplified representation of EPSS model architecture
    """
    def __init__(self):
        self.features = {
            'vulnerability_characteristics': [
                'cvss_metrics',
                'cwe_category',
                'affected_product_popularity',
                'patch_availability'
            ],
            'threat_intelligence': [
                'exploit_code_maturity',
                'exploit_code_availability',
                'active_exploitation_evidence',
                'threat_actor_interest'
            ],
            'temporal_factors': [
                'days_since_disclosure',
                'days_since_patch',
                'related_vulnerability_exploitation',
                'seasonal_patterns'
            ],
            'environmental_signals': [
                'social_media_mentions',
                'dark_web_activity',
                'security_tool_signatures',
                'honeypot_observations'
            ]
        }

    def predict_exploitation(self, cve_id):
        """
        Generate exploitation probability
        Returns: float between 0.0 and 1.0
        """
        features = self.extract_features(cve_id)
        probability = self.ensemble_model.predict(features)
        return probability

The Machine Learning Pipeline

EPSS employs an ensemble approach combining multiple models:

Gradient Boosting Machines (GBM): Captures non-linear relationships
Random Forests: Handles high-dimensional feature spaces
Neural Networks: Identifies complex patterns
Logistic Regression: Provides interpretable baseline

The final prediction is a weighted ensemble of these models, continuously refined through daily retraining.

How EPSS Works: From Data to Prediction

Step 1: Data Collection

EPSS ingests data from numerous sources:

{
	"data_sources": {
		"vulnerability_databases": ["NVD (National Vulnerability Database)", "MITRE CVE", "Vendor security advisories"],
		"exploitation_evidence": [
			"Intrusion detection systems",
			"Honeypot networks",
			"Threat intelligence platforms",
			"Security vendor telemetry"
		],
		"code_repositories": ["ExploitDB", "GitHub security labs", "Metasploit modules", "Underground forums"],
		"social_signals": ["Twitter security community", "Reddit discussions", "Security blogs", "Conference presentations"]
	}
}

Step 2: Feature Engineering

The model extracts over 1,100 features from raw data:

def extract_vulnerability_features(cve_data):
    """
    Extract predictive features from CVE data
    """
    features = {}

    # CVSS-based features
    features['cvss_base_score'] = cve_data['cvss']['baseScore']
    features['attack_vector_network'] = 1 if 'AV:N' in cve_data['cvss']['vector'] else 0
    features['attack_complexity_low'] = 1 if 'AC:L' in cve_data['cvss']['vector'] else 0

    # Temporal features
    features['days_since_published'] = (datetime.now() - cve_data['published']).days
    features['has_patch'] = 1 if cve_data.get('patch_available') else 0
    features['vendor_acknowledgment'] = 1 if cve_data.get('vendor_ack') else 0

    # Product popularity (simplified)
    features['product_install_base'] = estimate_install_base(cve_data['affected_products'])
    features['product_internet_facing'] = is_internet_facing(cve_data['affected_products'])

    # Exploit code maturity
    features['poc_available'] = check_exploit_availability(cve_data['id'])
    features['exploit_reliability'] = assess_exploit_reliability(cve_data['id'])

    return features

Step 3: Model Training

EPSS retrains daily using a sliding window approach:

def train_epss_model(training_window_days=365):
    """
    Daily model retraining pipeline
    """
    # Collect ground truth exploitation data
    exploited_cves = get_exploited_cves(training_window_days)
    non_exploited_cves = get_non_exploited_cves(training_window_days)

    # Balance dataset (exploitation is rare)
    balanced_dataset = balance_dataset(exploited_cves, non_exploited_cves)

    # Feature extraction
    X = extract_features(balanced_dataset)
    y = balanced_dataset['was_exploited']

    # Train ensemble
    models = {
        'gbm': GradientBoostingClassifier(n_estimators=500),
        'rf': RandomForestClassifier(n_estimators=1000),
        'nn': MLPClassifier(hidden_layer_sizes=(100, 50)),
        'lr': LogisticRegression()
    }

    trained_models = {}
    for name, model in models.items():
        model.fit(X, y)
        trained_models[name] = model

    # Optimize ensemble weights
    ensemble_weights = optimize_weights(trained_models, X, y)

    return trained_models, ensemble_weights

Step 4: Prediction Generation

Daily predictions for all CVEs:

def generate_daily_predictions():
    """
    Generate EPSS scores for all known CVEs
    """
    all_cves = get_all_active_cves()
    predictions = {}

    for cve in all_cves:
        # Extract features
        features = extract_features(cve)

        # Generate ensemble prediction
        ensemble_pred = 0
        for model_name, model in trained_models.items():
            pred = model.predict_proba(features)[0][1]
            ensemble_pred += pred * ensemble_weights[model_name]

        # Calculate percentile
        percentile = calculate_percentile(ensemble_pred, all_predictions)

        predictions[cve['id']] = {
            'score': round(ensemble_pred, 5),
            'percentile': round(percentile, 3),
            'date': datetime.now().isoformat()
        }

    return predictions

Understanding EPSS Scores: What the Numbers Mean

Score Interpretation

EPSS scores range from 0.00000 to 1.00000 (0% to 100% probability):

Score Range	Interpretation	Typical Action
0.00 - 0.01	Extremely unlikely	Standard patch cycle
0.01 - 0.10	Low probability	Monitor for changes
0.10 - 0.50	Moderate probability	Prioritize in patch planning
0.50 - 0.80	High probability	Expedited patching
0.80 - 1.00	Very high probability	Emergency patching

Percentile Context

The percentile indicates how a CVE ranks among all vulnerabilities:

95th percentile: Top 5% most likely to be exploited
90th percentile: Top 10% most likely to be exploited
50th percentile: More likely than half of all CVEs

Real-World Examples

Let's examine actual EPSS scores and their accuracy:

Case 1: Log4Shell (CVE-2021-44228)

{
	"cve_id": "CVE-2021-44228",
	"epss_history": [
		{
			"date": "2021-12-10",
			"score": 0.9756,
			"percentile": 0.999,
			"note": "Day of disclosure"
		},
		{
			"date": "2021-12-11",
			"score": 0.9823,
			"percentile": 0.9995,
			"note": "Mass scanning observed"
		},
		{
			"date": "2021-12-15",
			"score": 0.9945,
			"percentile": 0.9999,
			"note": "Widespread exploitation"
		}
	],
	"actual_exploitation": true,
	"time_to_exploit": "< 24 hours"
}

Analysis: EPSS correctly identified extreme risk immediately upon disclosure.

Case 2: High CVSS, Low EPSS

{
	"cve_id": "CVE-2023-12345",
	"cvss": {
		"baseScore": 9.8,
		"baseSeverity": "CRITICAL"
	},
	"epss": {
		"score": 0.00234,
		"percentile": 0.456
	},
	"actual_exploitation": false,
	"reason": "Affects obscure product with minimal deployment"
}

Analysis: Despite critical CVSS score, EPSS correctly predicted low exploitation likelihood.

Advanced EPSS Applications

1. Predictive Patch Management

def prioritize_patches_with_epss(vulnerabilities):
    """
    Create intelligent patching schedule using EPSS
    """
    # Enrich with EPSS data
    for vuln in vulnerabilities:
        epss_data = cybersecfeed_api.get_cve(vuln['cve_id'])['epss']
        vuln['epss_score'] = epss_data['score']
        vuln['epss_percentile'] = epss_data['percentile']

    # Multi-factor prioritization
    priorities = []
    for vuln in vulnerabilities:
        priority_score = calculate_priority(
            epss_score=vuln['epss_score'],
            cvss_score=vuln['cvss_score'],
            asset_criticality=vuln['asset_criticality'],
            exposure_level=vuln['exposure_level']
        )

        priorities.append({
            'cve_id': vuln['cve_id'],
            'priority_score': priority_score,
            'patch_deadline': calculate_deadline(priority_score)
        })

    return sorted(priorities, key=lambda x: x['priority_score'], reverse=True)

2. Resource Allocation Optimization

def optimize_security_resources(team_capacity, vulnerabilities):
    """
    Allocate security resources based on exploitation probability
    """
    # Calculate expected risk reduction
    risk_reductions = []

    for vuln in vulnerabilities:
        expected_impact = (
            vuln['epss_score'] *  # Probability of exploitation
            vuln['potential_impact'] *  # Business impact if exploited
            vuln['asset_count']  # Number of affected assets
        )

        remediation_effort = estimate_remediation_effort(vuln)

        roi = expected_impact / remediation_effort

        risk_reductions.append({
            'cve_id': vuln['cve_id'],
            'roi': roi,
            'effort': remediation_effort
        })

    # Optimize allocation
    allocated = []
    remaining_capacity = team_capacity

    for item in sorted(risk_reductions, key=lambda x: x['roi'], reverse=True):
        if item['effort'] <= remaining_capacity:
            allocated.append(item['cve_id'])
            remaining_capacity -= item['effort']

    return allocated

3. Threat Hunting Prioritization

def generate_hunt_hypotheses(environment_profile):
    """
    Generate threat hunting priorities based on EPSS trends
    """
    # Get CVEs affecting our environment
    our_cves = get_environment_cves(environment_profile)

    # Identify rising EPSS scores
    hunting_priorities = []

    for cve in our_cves:
        epss_trend = analyze_epss_trend(cve['id'], days=7)

        if epss_trend['slope'] > 0.1:  # Significant increase
            hunting_priorities.append({
                'cve_id': cve['id'],
                'current_epss': epss_trend['current'],
                'trend': epss_trend['slope'],
                'hunt_hypothesis': generate_hypothesis(cve),
                'detection_signatures': get_detection_patterns(cve)
            })

    return sorted(hunting_priorities, key=lambda x: x['trend'], reverse=True)

EPSS vs Other Prediction Methods

Comparative Analysis

Method	Accuracy	Update Frequency	Predictive Window	Data Sources
EPSS	82% precision @ 10% FPR	Daily	30 days	Multi-source ML
CVSS	N/A (not predictive)	Static	N/A	Technical only
Vendor Severity	~45% correlation	Sporadic	N/A	Vendor assessment
Threat Intel	~60% coverage	Real-time	0 days (reactive)	Observed attacks

Performance Metrics

EPSS performance based on 2023 data:

Precision at 10% FPR: 82%
Recall at 10% threshold: 74%
AUC-ROC: 0.94
Coverage: 100% of published CVEs

Common EPSS Misconceptions

Misconception 1: "High EPSS = Immediate Exploitation"

Reality: EPSS predicts probability within 30 days, not immediate timeline.

def interpret_epss_timeline(epss_score):
    """
    Proper interpretation of EPSS scores
    """
    if epss_score > 0.8:
        return "High probability within 30 days - not necessarily today"
    elif epss_score > 0.5:
        return "Moderate probability - monitor closely"
    else:
        return "Lower probability - standard monitoring"

Misconception 2: "Low EPSS = Safe to Ignore"

Reality: EPSS predicts wild exploitation, not targeted attacks.

def comprehensive_risk_assessment(cve_data):
    """
    EPSS is one factor in comprehensive risk
    """
    risk_factors = {
        'epss_score': cve_data['epss']['score'],
        'targeted_threat': assess_targeted_risk(cve_data),
        'regulatory_requirement': check_compliance_mandate(cve_data),
        'business_criticality': evaluate_business_impact(cve_data)
    }

    # Low EPSS but high targeted risk still requires action
    if risk_factors['targeted_threat'] > 0.7:
        return "PRIORITY"

    return calculate_overall_risk(risk_factors)

Misconception 3: "EPSS Replaces Security Expertise"

Reality: EPSS augments, not replaces, human judgment.

Implementing EPSS in Your Security Program

Step 1: Integration with Existing Tools

# CyberSecFeed API integration for EPSS data
curl -H "X-API-Key: your-api-key" \
  "https://api.cybersecfeed.com/api/v1/cves?epss_min=0.5&limit=100"

Step 2: Workflow Automation

def automated_epss_workflow():
    """
    Daily EPSS-based vulnerability management
    """
    # Morning report
    high_epss_cves = get_high_epss_affecting_us()

    # Auto-create tickets for high probability
    for cve in high_epss_cves:
        if cve['epss']['score'] > 0.8:
            create_emergency_ticket(cve)
        elif cve['epss']['score'] > 0.5:
            create_priority_ticket(cve)

    # Trend analysis
    rising_threats = identify_rising_epss_scores()
    alert_security_team(rising_threats)

    # Update dashboards
    update_risk_metrics(high_epss_cves)

Step 3: Metrics and Reporting

def generate_epss_metrics():
    """
    Track EPSS program effectiveness
    """
    metrics = {
        'coverage': calculate_epss_coverage(),
        'prediction_accuracy': measure_prediction_accuracy(),
        'time_to_patch': {
            'high_epss': calculate_patch_time(epss_threshold=0.8),
            'medium_epss': calculate_patch_time(epss_threshold=0.5),
            'low_epss': calculate_patch_time(epss_threshold=0.1)
        },
        'prevented_incidents': estimate_prevented_exploits(),
        'resource_optimization': calculate_resource_savings()
    }

    return metrics

The Future of EPSS

Emerging Enhancements

Sector-Specific Models: Healthcare, finance, and critical infrastructure variants
Extended Prediction Windows: 60 and 90-day forecasts
Exploit Sophistication Scoring: Predicting not just if, but how
Integration with ATT&CK: Mapping to likely attack techniques

Research Directions

class NextGenEPSS:
    """
    Future EPSS enhancements under development
    """
    def __init__(self):
        self.features = {
            'geopolitical_factors': "Nation-state interest indicators",
            'economic_incentives': "Ransomware profitability analysis",
            'defensive_posture': "Global patch adoption rates",
            'ai_generated_exploits': "Automated exploit generation probability"
        }

    def predict_campaign_likelihood(self, cve_id):
        """
        Predict organized campaign probability
        """
        base_epss = self.get_base_epss(cve_id)
        campaign_multiplier = self.calculate_campaign_factors(cve_id)

        return base_epss * campaign_multiplier

Real-World Success Stories

Case Study 1: Financial Services Firm

Challenge: 50,000+ vulnerabilities across infrastructure Solution: EPSS-based prioritization Results:

73% reduction in critical exposure time
60% decrease in emergency patches
$2.3M annual cost savings

Case Study 2: Healthcare Network

Challenge: Limited security resources, critical systems Solution: EPSS + asset criticality matrix Results:

Zero exploitation of EPSS > 0.7 vulnerabilities
45% improvement in patch efficiency
Maintained 100% uptime for critical systems

Conclusion: Predicting the Future of Security

EPSS represents a fundamental shift in vulnerability management—from reactive patching to predictive defense. By leveraging machine learning to analyze vast amounts of threat data, EPSS enables security teams to:

Focus on real threats: Prioritize the 5% of vulnerabilities likely to be exploited
Optimize resources: Allocate effort based on actual risk, not theoretical severity
Reduce exposure windows: Patch high-probability vulnerabilities before exploitation
Measure effectiveness: Track prediction accuracy and improve over time

The question is no longer "Which vulnerabilities are severe?" but "Which vulnerabilities will attackers actually exploit?" EPSS provides the answer, transforming security operations from an endless game of catch-up to a strategic, data-driven discipline.

Start Predicting Exploitation Today: CyberSecFeed provides real-time EPSS scores for every CVE, updated daily with the latest machine learning models. Try our API free for 30 days and see which vulnerabilities really matter.

Additional Resources

FIRST EPSS Model Documentation
EPSS Research Papers
CyberSecFeed EPSS API Guide
EPSS Score Calculator

About the Authors

Dr. Priya Patel is the Chief Technology Officer at CyberSecFeed, leading research in predictive security analytics and machine learning applications in cybersecurity.

Sarah Rodriguez is the Vulnerability Research Lead at CyberSecFeed, specializing in exploitation prediction and risk quantification methodologies.

The Prediction Challenge: Why Traditional Methods Fail​

What is EPSS? The Technical Foundation​

Core Components​

The Machine Learning Pipeline​

How EPSS Works: From Data to Prediction​

Step 1: Data Collection​

Step 2: Feature Engineering​

Step 3: Model Training​

Step 4: Prediction Generation​

Understanding EPSS Scores: What the Numbers Mean​

Score Interpretation​

Percentile Context​

Real-World Examples​

Case 1: Log4Shell (CVE-2021-44228)​

Case 2: High CVSS, Low EPSS​

Advanced EPSS Applications​

1. Predictive Patch Management​

2. Resource Allocation Optimization​

3. Threat Hunting Prioritization​

EPSS vs Other Prediction Methods​

Comparative Analysis​

Performance Metrics​

Common EPSS Misconceptions​

Misconception 1: "High EPSS = Immediate Exploitation"​

Misconception 2: "Low EPSS = Safe to Ignore"​

Misconception 3: "EPSS Replaces Security Expertise"​

Implementing EPSS in Your Security Program​

Step 1: Integration with Existing Tools​

Step 2: Workflow Automation​

Step 3: Metrics and Reporting​

The Future of EPSS​

Emerging Enhancements​

Research Directions​

Real-World Success Stories​

Case Study 1: Financial Services Firm​

Case Study 2: Healthcare Network​

Conclusion: Predicting the Future of Security​

Additional Resources​

About the Authors​