How Graphene Works: AI-Generated Attack Graphs for Infrastructure Security Analysis

Breaking down how AI-Generated Attack Graphs can be used for Comprehensive Security Posture Assessment

Paulo Nascimento

Sep 20, 2024

👋 Hey there, my name is Paulo.

Welcome to my blog where I write about AI, Security, and Product.


References:

https://arxiv.org/html/2312.13119v2#S3

The other day, I was playing around with knowledge graphs when I stumbled upon the paper, “Graphene: Infrastructure Security Posture Analysis with AI-generated Attack Graphs”.

Finding vulnerabilities is easy. Checking whether they actually matter (whether they are exploitable, and how much risk they carry) is not.
It’s like having to pick out a specific violin in a five-thousand-person orchestra.
This complexity is due to all the moving components of a production environment: networking, servers, versioning, etc.

Graphene takes a holistic approach by analyzing each security layer in the environment: hardware, system, network, and cryptography.
By extracting insights from each layer, Graphene reveals how vulnerabilities can be exploited within and across layers, stringing them together.

An attack graph can be defined as a structured representation of the potential paths an attacker can take to compromise a network or system by exploiting vulnerabilities (Idika, 2010; Aksu, 2018).

Graphene's ultimate goal is to create attack graphs. Keep that in mind: every step in the pipeline contributes to this one objective.

Here’s a high-level view of Graphene’s pipeline:

Figure: Data Curation → ML Processing → Attack Graph Construction → Risk Analysis

Phase 1: Data Ingestion

Ingest data about the network infrastructure, such as the network topology, communicating entities, and device specifications.

Then, for each application and device, retrieve the CVE (Common Vulnerabilities and Exposures) disclosures that apply to it.

Output: a list of CVEs pertaining to specific applications & devices

Next step: Pass this list to the next phase to understand its semantic meaning.
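
To make the shape of this phase concrete, here is a minimal sketch. It assumes a hypothetical in-memory inventory and CVE feed; `Asset`, `CVE_FEED`, `collect_cves`, the product names, and the `CVE-0000-…` IDs are all made up for illustration, and a real system would query a vulnerability database instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One application or device found in the infrastructure inventory."""
    name: str
    product: str
    version: str


# Hypothetical stand-in for a CVE feed, keyed by (product, version).
CVE_FEED = {
    ("ec-cube", "3.0.18"): ["CVE-2020-5679"],
    # Placeholder IDs, not real CVEs
    ("example-router-os", "1.2"): ["CVE-0000-0001", "CVE-0000-0002"],
}


def collect_cves(assets):
    """Return {asset name: [CVE IDs]} for every asset in the inventory."""
    return {
        asset.name: list(CVE_FEED.get((asset.product, asset.version), []))
        for asset in assets
    }


inventory = [
    Asset("shop-frontend", "ec-cube", "3.0.18"),
    Asset("edge-router", "example-router-os", "1.2"),
    Asset("build-server", "example-ci", "9.9"),  # no known CVEs in the feed
]
cves_by_asset = collect_cves(inventory)
```

Assets with no match simply get an empty list, so the next phase can iterate over every asset without special cases.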

Phase 2: Data Preprocessing

Goal: Construct attack graph nodes

TLDR: Use an LLM to extract entities from CVE descriptions, then map these entities to construct attack graph node attributes.

At this point, we want to break down each CVE text description into attack graph node attributes so we can start building context.

For each CVE we will want to extract the following node attributes:

Precondition: “the preconditions required for an adversary to exploit a vulnerability”
Postcondition: “the result after exploiting the vulnerability”
Input: “the actions that attackers need to take to trigger the vulnerability and perform the exploit”
Output: “the final values or results that the system returns or produces when exploits to vulnerabilities are executed”

Based on the MITRE CVE Template, each vulnerability generally has these entities listed in its description:

vulnerability type
affected product
root cause
impact
attacker type
attack vector

Graphene maps these entities to these attack graph node attributes:

affected product → Precondition
vulnerability type → Postcondition
attacker type + root cause → Inputs
impact + attack vector → Outputs

Optional: if the CVE record contains “manual evaluation scores such as exploitability, severity, and impact score”, we can attach them to the attack graph node as extra data.

This entity-to-attribute mapping gives us a consistent and reliable way to extract the necessary node attributes (precondition, postcondition, input, and output).
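
The mapping is small enough to write down as a lookup table. The sketch below is my own illustration, not code from the paper: `build_node` is a hypothetical helper, joining multiple entities with "; " is an assumption, and the entity values are invented.

```python
# Entity-to-attribute map, as listed above.
ENTITY_TO_ATTRIBUTE = {
    "precondition": ["affected product"],
    "postcondition": ["vulnerability type"],
    "input": ["attacker type", "root cause"],
    "output": ["impact", "attack vector"],
}


def build_node(cve_id, entities):
    """Combine extracted entities into the four node attributes.

    Joining several entities with '; ' is an assumption of this sketch.
    Missing or empty entities are skipped.
    """
    node = {"id": cve_id}
    for attribute, names in ENTITY_TO_ATTRIBUTE.items():
        parts = [entities[name] for name in names if entities.get(name)]
        node[attribute] = "; ".join(parts)
    return node


# Invented entity values, for illustration only
example_entities = {
    "vulnerability type": "buffer overflow",
    "affected product": "ExampleServer 1.0",
    "root cause": "missing length check",
    "impact": "remote code execution",
    "attacker type": "remote unauthenticated attacker",
    "attack vector": "crafted network packet",
}
node = build_node("CVE-0000-0001", example_entities)
```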

Here’s a concrete example of what the precondition, postcondition, input, and output look like for CVE-2020-5679:

Precondition: “Improper restriction of rendered UI layers or frames in EC-CUBE versions from 3.0.0 to 3.0.18”
Postcondition: “logged into the administrative page”
Input: “a user accesses a specially crafted page”
Output: “clickjacking attacks”

To extract the node attributes,

→ send CVE description to an LLM

→ ask the LLM to run entity extraction (based on: “vulnerability type”, “affected product”, “root cause”, “impact”, “attacker type”, “attack vector”)

→ follow the entity-to-attribute map to construct each attribute

Output: A graph node for each CVE.
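
The extraction steps above can be sketched as follows. This is a hedged illustration: the prompt wording, the JSON reply format, and the `call_llm` callable are my assumptions, and `fake_llm` is a stub that returns a canned answer rather than calling a real model.

```python
import json

ENTITY_NAMES = [
    "vulnerability type", "affected product", "root cause",
    "impact", "attacker type", "attack vector",
]


def build_prompt(description):
    """Ask for the six entities as a JSON object (wording is an assumption)."""
    keys = ", ".join(f'"{name}"' for name in ENTITY_NAMES)
    return (
        "Extract the following entities from the CVE description and "
        f"reply with a JSON object using exactly these keys: {keys}. "
        "Use null for any entity that is not mentioned.\n\n"
        f"CVE description: {description}"
    )


def extract_entities(description, call_llm):
    """Send the prompt through `call_llm` and keep only the expected keys."""
    raw_reply = call_llm(build_prompt(description))
    parsed = json.loads(raw_reply)
    return {name: parsed.get(name) for name in ENTITY_NAMES}


def fake_llm(prompt):
    """Stub standing in for a real model call; returns a canned reply."""
    return json.dumps({
        "vulnerability type": "buffer overflow",
        "affected product": "ExampleServer 1.0",
        "impact": "remote code execution",
        "unexpected key": "dropped by extract_entities",
    })


entities = extract_entities("Buffer overflow in ExampleServer 1.0", fake_llm)
```

Passing the model call in as a function keeps the sketch independent of any particular LLM provider, and filtering to the six expected keys guards against the model adding fields of its own.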

Next step: Pass each graph node to the next phase to start constructing the attack graph.

Phase 3: Attack Graph Construction

Goal: Construct a comprehensive attack graph

TLDR: Create a multi-layered graph representing potential attack paths by connecting different types of nodes (attacker, CVEs, CWEs) based on their semantic relationships and vulnerability data.

Now that we have several nodes, it's time to start creating the graph. Each node we've created represents a vulnerability or potential attack point in our system. In this phase, we'll connect these nodes to form a comprehensive picture of possible attack paths.

The attack graph in Graphene consists of three primary node types:

Attacker nodes (source nodes)
- The starting point of potential attacks in the graph.
CVE nodes associated with network entities (intermediate nodes)
- Represent the vulnerabilities that could be exploited in an attack chain.
CWE nodes serving as attack targets (sink nodes)
- Serve as the end goal/target in the attack graph.

Graph Construction Process:

Attacker to CVE connections:
- By default, connect the attacker node to each CVE node.
- Edge weights are determined by CVSS base scores, indicating exploit likelihood and vulnerability severity.
- This approach generates all conceivable attack scenarios, including those requiring specific access (e.g., physical access to a device).
CVE to CVE connections:
- Connect CVE nodes if the postcondition of one aligns with the precondition of another.
- Use word embeddings to capture semantic meaning of node attributes (preconditions, postconditions, inputs, outputs).
- Calculate similarity scores between node attributes using cosine similarity.
- Edge weights are based on: a) CVSS scores b) Node matching scores derived from word-embedding-based semantic similarity
CVE to CWE connections:
- Link each CVE node to its associated CWE node(s) based on the CWE mapping recorded for that CVE in the vulnerability database (for example, the NVD).
- Edge weights are determined by CVSS scores.
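
Here is a minimal sketch of the three connection rules. Several things in it are assumptions on my part: a bag-of-words vector stands in for real word embeddings, the 0.5 threshold is arbitrary, CVE-to-CVE edges carry the target's CVSS score, and the CVSS and match scores are stored side by side because this post does not specify how Graphene combines them. The node data is invented.

```python
import math
from collections import Counter


def embed(text):
    """Bag-of-words vector: a crude stand-in for real word embeddings."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(nodes, threshold=0.5):
    """Return a list of (source, target, attributes) edges."""
    edges = []
    for node in nodes:
        # Attacker -> CVE, weighted by the CVSS base score
        edges.append(("attacker", node["id"], {"cvss": node["cvss"]}))
        # CVE -> CWE
        for cwe in node["cwes"]:
            edges.append((node["id"], cwe, {"cvss": node["cvss"]}))
    for a in nodes:
        for b in nodes:
            if a is b:
                continue
            # CVE -> CVE when a's postcondition resembles b's precondition
            match = cosine(embed(a["postcondition"]), embed(b["precondition"]))
            if match >= threshold:
                edges.append((a["id"], b["id"], {"cvss": b["cvss"], "match": match}))
    return edges


nodes = [
    {"id": "CVE-A", "cvss": 7.5, "cwes": ["CWE-287"],
     "precondition": "network access to web server",
     "postcondition": "attacker gains local user access"},
    {"id": "CVE-B", "cvss": 7.8, "cwes": ["CWE-269"],
     "precondition": "attacker has local user access",
     "postcondition": "attacker gains root privileges"},
]
edges = build_graph(nodes)
```

In this toy data, CVE-A's postcondition shares four of five words with CVE-B's precondition, so an edge from CVE-A to CVE-B is created, while the reverse direction has no overlap and gets none.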

Two things make this graph construction possible:

Semantic Matching: Captures relationships between vulnerabilities by comparing the cosine similarity of their node attributes.
Configurable Thresholds: Users can set similarity score thresholds for edge creation, which controls graph complexity and keeps the focus on the most relevant attack paths.

If you want to optimize the graph construction, you can prune the edges with low matching scores or CVSS-based weights to focus on the most feasible attack paths.
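
Pruning can be as simple as a filter over the edge list. In this sketch the edge format, the `prune_edges` helper, and both threshold values are assumptions chosen for illustration, not values from the paper.

```python
def prune_edges(edges, min_match=0.6, min_cvss=4.0):
    """Keep an edge only if every score it carries clears its threshold.

    Edges without a "match" score (for example attacker -> CVE edges)
    are judged on CVSS alone.
    """
    kept = []
    for source, target, attrs in edges:
        if attrs.get("match", 1.0) < min_match:
            continue  # weak semantic match
        if attrs.get("cvss", 10.0) < min_cvss:
            continue  # low CVSS-based weight
        kept.append((source, target, attrs))
    return kept


candidate_edges = [
    ("attacker", "CVE-A", {"cvss": 7.5}),
    ("CVE-A", "CVE-B", {"cvss": 7.8, "match": 0.8}),
    ("CVE-A", "CVE-C", {"cvss": 9.1, "match": 0.3}),  # dropped: weak match
    ("attacker", "CVE-D", {"cvss": 2.1}),             # dropped: low CVSS
]
pruned = prune_edges(candidate_edges)
```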

The final attack graph provides a representation of potential attack vectors and allows users to get an in-depth analysis of the system vulnerabilities as well as their connections to each other.

Output: A comprehensive attack graph.

Next step: Pass the attack graph to the next phase to analyze the security posture.

Phase 4: Risk Scoring System

Goal: Assess the overall security posture by analyzing the attack graph

TLDR: Evaluate and score risks associated with the attack paths, identify critical vulnerabilities, and provide actionable insights for security enhancement.

Now that we have constructed our attack graph, we move on to a crucial phase: risk assessment. This phase transforms our graph into actionable intelligence, helping us understand the most significant threats and prioritize our security efforts.

Let’s take a look at the key components of the Risk Scoring System:

Edge Score Calculations:
- Edge Exploitability Score (EES): Measures how easily an attacker can exploit a particular edge in the attack graph. It considers both the immediate vulnerability and the chain of vulnerabilities leading up to it.
- Edge Impact Score (EIS): Assesses the potential damage if a vulnerability is exploited.
- Edge Risk Score (ERS): Combines exploitability and impact, weighted by the edge's importance.
Graph-level Scores:
- Compute overall exploit, impact, and risk scores for the entire graph.
Critical Path Identification:
- Find the shortest (most exploitable) paths to attacker goals.
- Identify high-severity attack paths based on risk, exploitability, and impact.
Key Vulnerability Analysis:
- Pinpoint vulnerabilities present in multiple attack paths (high-degree nodes).
- Determine the minimum set of vulnerabilities that, if patched, would disrupt all attack paths.
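
To illustrate critical path identification, here is a sketch using Dijkstra's algorithm. The post does not give Graphene's exact edge-weight formulas, so I assume each edge carries an exploitability score between 0 and 10 and convert it to a cost of `10 - score`, so that easier-to-exploit edges are "shorter". The graph and scores are invented.

```python
import heapq


def most_exploitable_path(edges, source, target):
    """Dijkstra over cost = 10 - exploitability (scores assumed in 0..10)."""
    graph = {}
    for u, v, exploitability in edges:
        graph.setdefault(u, []).append((v, 10.0 - exploitability))

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best:
        return None, float("inf")
    # Walk back from the target to rebuild the path
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


scored_edges = [
    ("attacker", "CVE-A", 9.0),
    ("attacker", "CVE-B", 6.0),
    ("CVE-A", "CWE-X", 8.0),
    ("CVE-B", "CWE-X", 9.5),
    ("CVE-A", "CVE-B", 9.0),
]
path, cost = most_exploitable_path(scored_edges, "attacker", "CWE-X")
```

Note that with this cost function a longer chain of easy steps can beat a shorter chain with one hard step, which is the behavior you want when ranking paths by how exploitable they are.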

To perform all this analysis, Graphene relies on:

Score Calculations: Uses CVSS standards as a baseline and normalizes scores on a scale of 0 (low) to 10 (high) for interpretability.
Path Analysis: Uses graph algorithms to find the most critical paths.
Vulnerability Prioritization: Uses degree centrality to identify key vulnerabilities and a minimum vertex cover algorithm to find the most efficient patching strategy.
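
Here is a small sketch of the prioritization step. Finding a true minimum vertex cover is NP-hard in general, so this example swaps in a greedy heuristic: it always covers every edge but is not guaranteed to return the smallest possible set. I don't know which algorithm Graphene itself uses; the function names and the edge data are my own.

```python
from collections import Counter


def degree_counts(edges):
    """Number of edges touching each node (unnormalized degree centrality)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def greedy_vertex_cover(edges):
    """Repeatedly pick the node that covers the most remaining edges."""
    remaining = set(edges)
    cover = []
    while remaining:
        degree = degree_counts(remaining)
        # Highest degree first; ties broken alphabetically for determinism
        node = min(degree, key=lambda n: (-degree[n], n))
        cover.append(node)
        remaining = {edge for edge in remaining if node not in edge}
    return cover


cve_edges = [
    ("CVE-A", "CVE-B"),
    ("CVE-B", "CVE-C"),
    ("CVE-B", "CVE-D"),
    ("CVE-D", "CVE-E"),
]
degrees = degree_counts(cve_edges)
patch_first = greedy_vertex_cover(cve_edges)
```

In this toy graph CVE-B touches three edges, so it is both the highest-degree node and the first pick for patching; adding CVE-D covers the one remaining edge.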

Output: Risk scores, a list of attack paths ranked by severity, the most critical vulnerabilities, and a minimum set of vulnerabilities to patch for the maximum security improvement. Think of the last one as answering the question, “how can I get the best security return on investment for each vulnerability I fix?”

Graphene turns free-text CVE descriptions into a structured attack graph, which gives us a deeper understanding of how vulnerabilities connect within and across infrastructure layers. Let me know if you want to see an implementation of this!

Disclaimer: all images and quotes are from the Graphene paper linked above.

Cheers,

Paulo
