Evaluating the CAFIIE Method for Event Extraction in Cyber Threat Intelligence
In Cyber Threat Intelligence (CTI), accurately extracting events and their attributes from text is a core task. This article evaluates the proposed CAFIIE (Context-Aware Feature Interaction for Information Extraction) method against baseline methods, focusing on the accuracy of event type extraction and on the extraction of multiple event attributes, measured by F1 score.
Experiment Setup
Datasets
The evaluation of the CAFIIE method uses the datasets outlined in Table 2: three public datasets and one private dataset.
- CASIE Dataset: This dataset comprises 1,000 articles selected from a pool of 5,000 cyber news articles, encompassing five event types and over 20 event arguments. It serves as a foundational dataset for evaluating event extraction methods in cybersecurity.
- DNRTI Dataset: A large-scale dataset designed for named entity recognition, DNRTI contains over 6,500 annotated sentences and 36,400 entities categorized into 13 types, including Hacker Organization and Attack.
- MalwareTextDB: This dataset features 6,819 tagged sentences and 10,983 tagged entities from 39 Advanced Persistent Threat (APT) reports, categorized into Action, Entity, and Modifier.
- Private Dataset: Compiled from 1,000 cyber threat event news articles sourced from an open CTI website in 2023, this dataset includes 7,346 named entities, categorized into five event types and 23 event arguments.
Evaluation Metrics
To assess the performance of the CAFIIE method, commonly used evaluation metrics were adopted: precision (P), recall (R), F1 score (F1), and accuracy (Accu). The formulas for these metrics are as follows:
\[
P = \frac{N_{TP}}{N_{TP} + N_{FP}}, \quad R = \frac{N_{TP}}{N_{TP} + N_{FN}}, \quad F1 = 2 \times \frac{P \times R}{P + R}, \quad Accu = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{FN} + N_{FP} + N_{TN}}
\]
Here \(N_{TP}\), \(N_{FP}\), \(N_{TN}\), and \(N_{FN}\) are the confusion-matrix counts of true positives, false positives, true negatives, and false negatives, respectively.
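As a quick check of the formulas above, the following minimal Python sketch computes all four metrics from confusion-matrix counts. The function name `classification_metrics` and the counts in the example are made up for illustration and are not taken from the paper's results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute P, R, F1 and Accu from confusion-matrix counts.

    Empty denominators yield 0.0 instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"P": precision, "R": recall, "F1": f1, "Accu": accuracy}


# Illustrative counts only (not results from the evaluation).
metrics = classification_metrics(tp=80, fp=20, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, precision is 80/100 and recall is 80/90, so F1 falls between the two, while accuracy also credits the 90 true negatives.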
Comparison Models
The CAFIIE method was compared against eight high-performing information extraction algorithms, including CRF, Naivebayes-CRF, BiLSTM-CRF, and various BERT-based models. Each algorithm was uniformly integrated with a Conditional Random Field (CRF) layer to ensure a fair comparison.
Hyperparameter Configuration of CAFIIE
The CAFIIE architecture incorporates several layers, including Domain-Word2Vec/BERT Embedding, Dense Features, BiLSTM, Attention, and CRF. The initial word embedding dimension was standardized at 100, consistent with the CASIE experiment. The optimal parameter settings were identified through extensive experimentation, as detailed in Table 4.
Experimental Results
Event Type Detection
Using the CTI event-type annotations from the CASIE dataset, the CAFIIE method demonstrated superior detection performance. Notably, the “I-Phishing” event type achieved a detection rate of 87%, surpassing the highest rate of 85% previously recorded on the CASIE dataset. This improvement indicates that the CAFIIE method is effective for cybersecurity event type detection.
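The text does not define "detection rate" precisely. One plausible reading, assumed here, is the per-tag fraction of gold tokens that the model labels correctly (that is, per-tag recall over BIO tags such as "I-Phishing"). The sketch below illustrates that reading. The helper `per_tag_detection_rate` and the toy tag sequences are hypothetical and do not reproduce the reported figures.

```python
from collections import Counter


def per_tag_detection_rate(gold, pred):
    """Fraction of gold tokens of each tag that were predicted with that same tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    return {tag: hits[tag] / totals[tag] for tag in totals}


# Toy token-level BIO tags for a single sentence (illustrative only).
gold = ["O", "B-Phishing", "I-Phishing", "I-Phishing", "O"]
pred = ["O", "B-Phishing", "I-Phishing", "O", "O"]

rates = per_tag_detection_rate(gold, pred)
print(rates["I-Phishing"])  # one of the two gold I-Phishing tokens was found
```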
Event Argument Detection
The CAFIIE method was also evaluated for its ability to extract event arguments across three public datasets and one private dataset. The results, summarized in Table 6, indicate that CAFIIE consistently outperformed baseline methods in extracting key entity types. Among the configurations tested, the Base+BERT+FI+BiLSTM+ANN+CRF model achieved the highest precision across all datasets.
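Argument extraction with a CRF output layer is commonly scored at the span level: a predicted argument counts as correct only if its boundaries and label both match the gold annotation. The article does not state whether the paper scores tokens or spans, so the following is a sketch under the span-level assumption. The functions `bio_spans` and `span_prf` and the labels `Attacker` and `Tool` are illustrative names, not the datasets' actual schema.

```python
def bio_spans(tags):
    """Return the set of (start, end, label) spans encoded by a BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue the current span
    is treated leniently as the start of a new span.
    """
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def span_prf(gold_tags, pred_tags):
    """Span-level precision, recall and F1 for one tag sequence."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


gold_tags = ["B-Attacker", "I-Attacker", "O", "B-Tool", "O"]
pred_tags = ["B-Attacker", "I-Attacker", "O", "O", "O"]
p, r, f1 = span_prf(gold_tags, pred_tags)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

In this toy case the model finds one of the two gold arguments and predicts nothing spurious, so precision is perfect while recall is halved.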
Few-Shot Learning Scenarios
In the context of few-shot learning, the CAFIIE method was tested on the CASIE and private datasets. The experiments simulated few-shot scenarios with limited samples, affirming the efficacy of the FI-based algorithm in extracting CTI information even with sparse data.
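The article does not describe how the few-shot scenarios were constructed. One common way to simulate them, assumed here, is to draw at most k labelled examples per event type from the full training set. The sketch below shows that idea. The function `sample_k_shot`, the document identifiers, and the event-type names in the example are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_k_shot(examples, k, seed=0):
    """Draw up to k examples per event type from (text, event_type) pairs."""
    by_type = defaultdict(list)
    for text, event_type in examples:
        by_type[event_type].append((text, event_type))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for event_type in sorted(by_type):
        pool = by_type[event_type]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset


# Toy corpus: 10 documents of one type and 3 of another (illustrative only).
examples = [(f"doc{i}", "Phishing") for i in range(10)]
examples += [(f"doc{i}", "Ransom") for i in range(10, 13)]

few_shot = sample_k_shot(examples, k=5)
counts = Counter(event_type for _, event_type in few_shot)
print(dict(counts))
```

Types with fewer than k examples contribute everything they have, which keeps rare event types represented in the reduced training set.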
Ablation Study
An ablation study was conducted to evaluate the impact of different components on the CAFIIE method. Results indicated that removing the FI component led to the poorest performance, highlighting its critical role in enhancing contextual understanding. Conversely, the variant without the ANN component yielded the best performance, suggesting a nuanced interplay between the components.
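An ablation study of this kind can be organised by deriving each variant from the full component list with exactly one component removed. The sketch below uses the component names from the Base+BERT+FI+BiLSTM+ANN+CRF configuration mentioned earlier. Only the FI and ANN ablations are named in the text, so the `removable` list is limited to those two, and the helper `ablation_variants` is a hypothetical name.

```python
FULL_MODEL = ["Base", "BERT", "FI", "BiLSTM", "ANN", "CRF"]


def ablation_variants(components, removable):
    """Map a label to the component list with one named component removed."""
    variants = {"full": list(components)}
    for name in removable:
        if name not in components:
            raise ValueError(f"unknown component: {name}")
        variants[f"w/o {name}"] = [c for c in components if c != name]
    return variants


variants = ablation_variants(FULL_MODEL, removable=["FI", "ANN"])
for label, parts in variants.items():
    print(f"{label:8s} -> {'+'.join(parts)}")
```

Each variant would then be trained and scored with the same data splits and metrics as the full model, so that any change in score can be attributed to the removed component.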
Conclusion
The CAFIIE method advances the extraction of events and attributes from CTI texts. Across multiple datasets and scenarios, including few-shot settings, it outperformed the baseline methods. The findings underscore the value of interactive feature mining in addressing the complexities of cybersecurity information extraction. As the field evolves, the CAFIIE method could support the construction of richer cybersecurity knowledge graphs and improve the effectiveness of threat intelligence efforts.