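Summary:
Building an accidental explosion damage knowledge graph from explosion accident investigation reports, which are multi-source, heterogeneous, and overlapping, is important for data-driven explosion assessment and source tracing. Because accidental explosion investigation data contain overlapping and nested events, an accidental explosion damage knowledge graph was constructed from explosion investigation reports using a construction method centered on joint event extraction. Similar explosion events were retrieved from the knowledge graph by cosine similarity and then classified with a Bayesian classification method, which determined the type of explosion source material in the Beirut port explosion with fairly good accuracy. The knowledge graph construction results show that, for event classification and event element classification on the accidental explosion damage corpus, the proposed joint event extraction method based on a dynamic mask raises the F1 score by at least 2% and 5.4%, respectively, compared with existing extraction models. The source-tracing analysis shows that knowledge graph-based tracing is considerably faster and more accurate than traditional manual tracing.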
Abstract:
Constructing an accidental explosion damage knowledge graph from accidental explosion investigation reports, which are multi-source, heterogeneous, and overlapping, plays a significant role in enabling data-driven explosion assessment and source tracing. To extract the overlapping and nested events commonly present in accidental explosion investigation data, a knowledge graph construction method centered on joint extraction of Chinese events was proposed. The construction process involved four key steps: semi-automatic ontology construction, corpus building, joint extraction of Chinese events based on a dynamic mask, and event coreference resolution. During semi-automatic ontology construction, TextRank was employed to compute keyword importance scores, K-Means clustering was applied to identify core domain terminology, and domain knowledge was then used to analyze the clustering results. This yielded 13 top-level ontologies and four types of accidental explosion events. Subsequently, a corpus for accidental explosion damage was built by annotating texts gathered from online explosion investigation reports, using an annotation method that combines manual labeling with prior knowledge. To enhance the recognition of overlapping and nested events, a GlobalPointer layer was integrated into the pre-trained RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers approach) model to form the RoBERTa-GPointer model, which was trained on the constructed corpus. Finally, extracted entities were aligned based on semantic and syntactic similarity, and extracted events with the same time and location were aligned. The constructed knowledge graph was applied to trace the explosion source of the Beirut port explosion on August 4, 2020. First, common explosion source materials were categorized into 11 types based on domain knowledge.
The accidental explosion knowledge graph was then vectorized using confidence-optimized embedding. Descriptive texts of the Beirut port explosion phenomena were extracted, and cosine similarity was used to retrieve a set of similar historical explosion cases from the constructed knowledge graph. Finally, a Bayesian classifier was applied to predict the type of explosion source material. The source-tracing analysis correctly identified the explosion source of the Beirut port explosion. The knowledge graph construction experiments show that the proposed RoBERTa-GPointer improves the F1 scores for event classification and event element classification on the accidental explosion damage corpus by at least 2% and 5.4%, respectively, compared with existing extraction models. The source-tracing results demonstrate that the knowledge graph-based approach offers substantial improvements in both speed and accuracy over traditional manual methods. They also indicate that the constructed knowledge graph for accidental explosions can be adapted to downstream applications related to damage effects.
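The retrieve-then-classify step summarized above can be illustrated with a minimal sketch. It is not the implementation used in this work: the case vectors, phenomenon features, and labels (`material_A` etc.) are invented toy values, the function names are hypothetical, and a multinomial naive Bayes classifier with Laplace smoothing is assumed here as one simple instance of a Bayesian classifier.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query_vec, cases, top_k=3):
    """Return the top_k cases whose embedding is most similar to the query."""
    ranked = sorted(
        cases,
        key=lambda c: cosine_similarity(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def naive_bayes_predict(query_features, cases, alpha=1.0):
    """Multinomial naive Bayes over the retrieved cases (assumed variant)."""
    label_counts = Counter(c["label"] for c in cases)
    vocab = {f for c in cases for f in c["features"]} | set(query_features)
    feature_counts = {label: Counter() for label in label_counts}
    for c in cases:
        feature_counts[c["label"]].update(c["features"])

    scores = {}
    for label, n in label_counts.items():
        total = sum(feature_counts[label].values())
        log_p = math.log(n / len(cases))  # class prior
        for f in query_features:
            # Laplace-smoothed likelihood of each observed phenomenon
            log_p += math.log(
                (feature_counts[label][f] + alpha) / (total + alpha * len(vocab))
            )
        scores[label] = log_p
    return max(scores, key=scores.get)


# Toy historical cases: every value below is invented for illustration only.
cases = [
    {"id": "c1", "vector": [1.0, 0.0, 0.0],
     "features": ["crater", "colored_smoke"], "label": "material_A"},
    {"id": "c2", "vector": [0.9, 0.1, 0.0],
     "features": ["crater", "colored_smoke", "blast_cloud"], "label": "material_A"},
    {"id": "c3", "vector": [0.0, 1.0, 0.0],
     "features": ["fireball"], "label": "material_B"},
    {"id": "c4", "vector": [0.0, 0.0, 1.0],
     "features": ["fireball", "fragments"], "label": "material_C"},
]

query_vec = [1.0, 0.2, 0.0]                   # embedding of the new event (toy)
query_features = ["crater", "colored_smoke"]  # observed phenomena (toy)

similar = retrieve_similar(query_vec, cases, top_k=3)
predicted = naive_bayes_predict(query_features, similar)
```

In this sketch the classifier is fitted only on the retrieved cases, so the prior reflects how often each material type appears among the most similar historical events; whether the original method restricts the classifier in the same way is an assumption.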