Evaluating the CAFIIE Method for Event Extraction in Cyber Threat Intelligence

In Cyber Threat Intelligence (CTI), accurately extracting events and their attributes from text is essential. This article examines how the proposed CAFIIE (Context-Aware Feature Interaction for Information Extraction) method performs against baseline methods. It focuses on event type detection accuracy and on the extraction of multiple event attributes, both measured by F1 score.

Experiment Setup

Datasets

CAFIIE is evaluated on four datasets, summarized in Table 2: three public datasets and one private dataset.

CASIE Dataset: This dataset comprises 1,000 articles selected from a pool of 5,000 cyber news articles, encompassing five event types and over 20 event arguments. It serves as a foundational dataset for evaluating event extraction methods in cybersecurity.
DNRTI Dataset: A large-scale dataset designed for named entity recognition, DNRTI contains over 6,500 annotated sentences and 36,400 entities categorized into 13 types, including Hacker Organization and Attack.
MalwareTextDB: This dataset features 6,819 tagged sentences and 10,983 tagged entities from 39 Advanced Persistent Threat (APT) reports, categorized into Action, Entity, and Modifier.
Private Dataset: Compiled from 1,000 cyber threat event news articles sourced from an open CTI website in 2023, this dataset includes 7,346 named entities, categorized into five event types and 23 event arguments.

Evaluation Metrics

Performance was measured with four standard metrics: precision (P), recall (R), F1 score (F1), and accuracy (Accu), defined as follows:

\[
P = \frac{N_{TP}}{N_{TP} + N_{FP}}, \quad R = \frac{N_{TP}}{N_{TP} + N_{FN}}, \quad F1 = 2 \times \frac{P \times R}{P + R}, \quad Accu = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{FN} + N_{FP} + N_{TN}}
\]

Here, \(N_{TP}\), \(N_{FP}\), \(N_{TN}\), and \(N_{FN}\) are the numbers of true positives, false positives, true negatives, and false negatives in the confusion matrix.
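The four formulas translate directly into code. The following is a minimal Python sketch, assuming the confusion-matrix counts have already been tallied; the names `ConfusionCounts` and `evaluate` are illustrative and do not come from the original work, and empty denominators return 0.0 instead of raising.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts: N_TP, N_FP, N_TN, N_FN in the formulas above."""
    tp: int
    fp: int
    tn: int
    fn: int


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero (e.g. no predictions).
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> dict[str, float]:
    """Compute P, R, F1, and Accu exactly as defined above."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fn + c.fp + c.tn)
    return {"P": precision, "R": recall, "F1": f1, "Accu": accuracy}


if __name__ == "__main__":
    # Illustrative counts only, not results from the experiments.
    print(evaluate(ConfusionCounts(tp=80, fp=20, tn=50, fn=10)))
```

In sequence labeling, precision, recall, and F1 are usually computed over predicted entity spans, where true negatives are not well defined, so accuracy is typically reported at the token level.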

Comparison Models

The CAFIIE method was compared against eight high-performing information extraction algorithms, including CRF, Naivebayes-CRF, BiLSTM-CRF, and several BERT-based models. To keep the comparison fair, every algorithm was paired with the same Conditional Random Field (CRF) output layer.
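Because every model ends in a CRF layer, prediction reduces to finding the highest-scoring tag sequence, which a linear-chain CRF does with Viterbi decoding. The minimal sketch below assumes that per-token emission scores (for example, from a BiLSTM) and tag-transition scores are already available; the function name and score layout are illustrative, and CRF training (the forward algorithm) is omitted.

```python
from typing import Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> tuple[list[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]  : score of tag j at token t (e.g. from a BiLSTM)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]         : score of starting the sequence with tag j
    """
    if not emissions:
        return [], 0.0
    num_tags = len(start)
    # score[j] = best score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers: list[list[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag to arrive at tag j from.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]
```

A very low transition score (for example, from `O` directly to an `I-` tag) is one way such a layer can discourage invalid BIO sequences; this is a general CRF property, not a documented detail of CAFIIE.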

Hyperparameter Configuration of CAFIIE

The CAFIIE architecture comprises several layers, including Domain-Word2Vec/BERT Embedding, Dense Features, BiLSTM, Attention, and CRF. The initial word embedding dimension was set to 100, consistent with the CASIE experiment. The remaining parameter settings were chosen through experimentation and are listed in Table 4.
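The layer list places an Attention layer between the BiLSTM and the CRF but does not specify its form. Assuming plain scaled dot-product self-attention without learned projections, the layer can be sketched as follows: each token's output is a weighted average of all BiLSTM states, with weights given by a softmax over scaled dot products. The function names are hypothetical, and a practical implementation would use a tensor library.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product self-attention over T hidden vectors of size d.

    hidden[t] stands for the (assumed) concatenated forward/backward BiLSTM
    state of token t. Queries, keys, and values are the hidden states
    themselves; no learned weight matrices are used in this sketch.
    """
    d = len(hidden[0])
    scale = math.sqrt(d)
    outputs = []
    for query in hidden:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in hidden]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all hidden states.
        outputs.append([sum(w * h[i] for w, h in zip(weights, hidden)) for i in range(d)])
    return outputs
```

Because the weights sum to 1, every output coordinate lies between the smallest and largest values of that coordinate across the input states.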

Experimental Results

Event Type Detection

On the CTI event-type annotations of the CASIE dataset, CAFIIE outperformed the baselines in event type detection. Notably, the “I-Phishing” event type reached a detection rate of 87%, exceeding the highest previously reported rate of 85% on the CASIE dataset. This gain indicates that CAFIIE improves cybersecurity event type detection.
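The tag name “I-Phishing” suggests a BIO-style labeling scheme, in which B- marks the first token of a span, I- a continuation, and O a token outside any span. The source does not say whether detection was scored per tag or per span; when span-level scoring is needed, tags first have to be grouped into spans. Below is a minimal sketch with a hypothetical function name, using the lenient convention that a stray I- tag opens a new span.

```python
def bio_to_spans(tags: list[str]) -> list[tuple[str, int, int]]:
    """Convert BIO tags into (type, start, end) spans, with end exclusive.

    An I- tag that does not continue a span of the same type starts a
    new span (a lenient convention; strict schemes would discard it).
    """
    spans = []
    current_type, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))  # close the open span
            current_type = None
        if prefix == "B" or (prefix == "I" and not continues):
            current_type, start = label, i  # open a new span
    return spans
```

For instance, the tags `O B-Phishing I-Phishing O` yield a single span, `("Phishing", 1, 3)`.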

Event Argument Detection

The CAFIIE method was also evaluated on event argument extraction across three public datasets and one private dataset. The results, summarized in Table 6, show that CAFIIE consistently outperformed the baseline methods in extracting key entity types. Among the CAFIIE configurations tested, Base+BERT+FI+BiLSTM+ANN+CRF achieved the highest precision on all datasets.
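Per-type argument scores such as those in Table 6 are commonly computed by exact-match span comparison: a predicted argument counts as a true positive only if its type, start, and end all match a gold span. The source does not state its scoring protocol, so this is an assumption; the function names, span format, and argument types in the example are illustrative.

```python
from collections import Counter

Span = tuple[str, int, int]  # (argument type, start token, end token exclusive)


def span_prf(gold: list[Span], pred: list[Span]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over argument spans."""
    gold_c, pred_c = Counter(gold), Counter(pred)
    tp = sum((gold_c & pred_c).values())  # spans present in both lists
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def per_type_prf(gold: list[Span], pred: list[Span]) -> dict[str, tuple[float, float, float]]:
    """Score each argument type separately, as in a per-type results table."""
    types = sorted({s[0] for s in gold} | {s[0] for s in pred})
    return {
        t: span_prf([s for s in gold if s[0] == t], [s for s in pred if s[0] == t])
        for t in types
    }
```

Under this protocol a boundary error, such as predicting `("Tool", 5, 7)` for a gold `("Tool", 5, 6)`, counts as both a false positive and a false negative.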

Few-Shot Learning Scenarios

For few-shot learning, the CAFIIE method was tested on the CASIE and private datasets. The experiments simulated few-shot scenarios by limiting the number of training samples, and the results support the effectiveness of the feature interaction (FI)-based algorithm for extracting CTI information when labeled data are sparse.
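The source does not describe how the few-shot scenarios were constructed. A common approach, assumed here, is to subsample the training set to at most K labeled examples per event type while leaving the test set unchanged. The sketch below uses a fixed random seed so runs are repeatable; the function name and the `(event_type, sentence)` data layout are hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(
    examples: list[tuple[str, str]], k: int, seed: int = 0
) -> list[tuple[str, str]]:
    """Keep at most k training examples per event type.

    examples: (event_type, sentence) pairs. Types with fewer than k
    examples keep all of them.
    """
    by_type: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for event_type, sentence in examples:
        by_type[event_type].append((event_type, sentence))
    rng = random.Random(seed)  # local RNG: same seed gives the same subset
    subset: list[tuple[str, str]] = []
    for event_type in sorted(by_type):
        pool = by_type[event_type]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset
```

Repeating the experiment over several seeds and averaging the scores reduces the variance that small training sets introduce.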

Ablation Study

An ablation study measured the contribution of each component of the CAFIIE method. Among the ablated variants, removing the FI component produced the poorest performance, highlighting its critical role in capturing contextual information. Removing the ANN component, by contrast, produced the best performance among the ablated variants, suggesting that ANN contributes less to the overall result than FI does.
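An ablation of this kind removes one component at a time and compares each reduced model with the full one, so the poorest-performing variant identifies the component whose removal costs the most. The sketch below builds leave-one-out configuration names from the components in the configuration string Base+BERT+FI+BiLSTM+ANN+CRF and ranks removed components by score drop. Which components were actually ablated, and which metric was compared, are assumptions; the function names are hypothetical, and no scores from the study are included.

```python
FULL_MODEL = ["Base", "BERT", "FI", "BiLSTM", "ANN", "CRF"]


def ablation_variants(components: list[str], removable: list[str]) -> dict[str, str]:
    """Map each removed component to the configuration name without it."""
    return {
        name: "+".join(c for c in components if c != name)
        for name in removable
    }


def rank_by_drop(full_score: float, variant_scores: dict[str, float]) -> list[str]:
    """Order removed components from largest to smallest score drop vs. the full model."""
    drops = {name: full_score - score for name, score in variant_scores.items()}
    return sorted(drops, key=drops.__getitem__, reverse=True)


if __name__ == "__main__":
    for removed, config in ablation_variants(FULL_MODEL, ["FI", "ANN"]).items():
        print(f"without {removed}: {config}")
```

`rank_by_drop` takes the full model's score and one score per ablated variant, all on the same metric and dataset.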

Conclusion

The CAFIIE method advances the extraction of events and attributes from CTI texts. In evaluations across multiple datasets and scenarios, including few-shot settings, it outperformed the baseline methods. The results highlight the value of interactive feature mining for handling the complexities of cybersecurity information extraction. As the field evolves, CAFIIE could support the construction of more sophisticated cybersecurity knowledge graphs and more effective threat intelligence work.

