2021-02-07

Graph-Based Document-Level Relation Extraction

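1. Overview

In document-level relation extraction, the two entities of a pair are often spread across several sentences of the document, so the task needs more information than sentence-level relation extraction does.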

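Task description for document-level relation extraction:
Input: entity 1, entity 2, and the document
Output: the relation between the two entities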

New models are therefore needed to capture this extra information. These notes cover three of them:
(1) GCNN (Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network)
(2) EoG (Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs)
(3) GAIN (Graph Aggregation-and-Inference Network, from Double Graph Based Reasoning for Document-level Relation Extraction)

2. Models

2.1 GCNN

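This model uses a graph convolutional neural network (GCNN). It resembles an ordinary GCN, but the graph structure differs: the edges come in several types, listed in 2.1.1.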

2.1.1 Structure

Nodes: the words of the document. Each node contains
(1) the word's embedding
(2) the word's position relative to the target entities
![GCNN_example.png]

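Edges: relations between words
(1) Syntactic dependency edge: syntactic dependency between words
(2) Coreference edge: coreference links
(3) Adjacent sentence edge: links between neighboring sentences
(4) Adjacent word edge: links between neighboring words
(5) Self-node edge: each node to itself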

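2.1.2 Algorithm

A GCN operation is run on the graph of each edge type, and the outputs for the different edge types are combined in a weighted sum to give the final result.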
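
The step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes one weight matrix per edge type, a plain sum over edge types, and a ReLU at the end. The names `gcnn_layer`, `edges_by_type`, and `weights` are made up for this example.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gcnn_layer(h, edges_by_type, weights):
    """One graph-convolution layer over a graph with typed edges.

    h:             list of node vectors
    edges_by_type: {edge_type: [(src, dst), ...]}, directed src -> dst
    weights:       {edge_type: weight matrix}, one per edge type (assumption)
    """
    dim_out = len(next(iter(weights.values())))
    out = [[0.0] * dim_out for _ in h]
    for etype, edges in edges_by_type.items():
        W = weights[etype]
        for src, dst in edges:
            # Message from src, transformed by the weights of this edge type.
            msg = matvec(W, h[src])
            out[dst] = [a + b for a, b in zip(out[dst], msg)]
    # ReLU nonlinearity (assumption).
    return [[max(0.0, v) for v in row] for row in out]


# Toy graph: three word nodes, two of the five edge types.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges_by_type = {
    "adjacent_word": [(0, 1), (1, 2)],
    "self": [(0, 0), (1, 1), (2, 2)],
}
weights = {
    "adjacent_word": [[0.5, 0.0], [0.0, 0.5]],
    "self": [[1.0, 0.0], [0.0, 1.0]],
}
h_next = gcnn_layer(h, edges_by_type, weights)
```

Keeping a separate weight matrix per edge type lets the layer treat, say, a coreference link differently from a neighboring-word link.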

Finally, two separate fully connected layers are applied, one for the first entity and one for the second.

![GCNN.png]

2.2 EoG

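Unlike the previous model, EoG is built around the edges of the graph rather than its nodes. It rests on two ideas:
(1) The mentions of an entity matter for the relations between entities
(2) The relation between the two entities of a pair can be represented by the paths between their nodes rather than by the nodes themselves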

2.2.1 Structure

Nodes:
(1) Mention node
The average of the word embeddings of the words in the mention
(2) Entity node
The average of all mention nodes of the entity
(3) Sentence node
The average of the word embeddings in the sentence
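
A small sketch of how the three node types could be computed by averaging. The embeddings, spans, and the names `mean_vectors`, `mentions`, and `entity_mentions` are invented for illustration only.

```python
def mean_vectors(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy word embeddings for a two-sentence document (made-up values).
word_emb = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [2.0, 2.0], [4.0, 0.0]]
sentences = [(0, 3), (3, 5)]      # [start, end) word spans of each sentence
mentions = [(0, 2), (3, 4)]       # [start, end) word spans of each mention
entity_mentions = {"e0": [0, 1]}  # entity id -> indices into `mentions`

# Mention node: average of the word embeddings inside the mention.
mention_nodes = [mean_vectors(word_emb[s:e]) for s, e in mentions]
# Entity node: average of the mention nodes of that entity.
entity_nodes = {
    ent: mean_vectors([mention_nodes[i] for i in idx])
    for ent, idx in entity_mentions.items()
}
# Sentence node: average of the word embeddings in the sentence.
sentence_nodes = [mean_vectors(word_emb[s:e]) for s, e in sentences]
```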

Edges (M, E, S stand for Mention, Entity, Sentence):
(1) MM: two mentions in the same sentence
(2) ME
(3) MS
(4) ES
(5) SS
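
The edge types above could be built roughly as follows. Only the MM rule is stated in these notes; the rules used here for ME (a mention and its entity), MS (a mention and its sentence), ES (an entity and any sentence holding one of its mentions), and SS (every pair of sentences) are assumptions made for the sketch, and `build_eog_edges` is a hypothetical helper.

```python
from itertools import combinations


def build_eog_edges(mention_sentence, mention_entity, num_sentences):
    """Return {edge_type: set of node pairs} for an EoG-style graph.

    mention_sentence: sentence index of each mention
    mention_entity:   entity id of each mention
    Only the MM rule comes from the notes; the other rules are assumptions.
    """
    edges = {"MM": set(), "ME": set(), "MS": set(), "ES": set(), "SS": set()}
    # MM: two mentions located in the same sentence.
    for a, b in combinations(range(len(mention_sentence)), 2):
        if mention_sentence[a] == mention_sentence[b]:
            edges["MM"].add((("M", a), ("M", b)))
    for m, (s, e) in enumerate(zip(mention_sentence, mention_entity)):
        edges["ME"].add((("M", m), ("E", e)))  # assumed rule
        edges["MS"].add((("M", m), ("S", s)))  # assumed rule
        edges["ES"].add((("E", e), ("S", s)))  # assumed rule
    # SS: every pair of sentences (assumed rule).
    for a, b in combinations(range(num_sentences), 2):
        edges["SS"].add((("S", a), ("S", b)))
    return edges


edges = build_eog_edges(
    mention_sentence=[0, 0, 1],
    mention_entity=[0, 1, 0],
    num_sentences=2,
)
```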

![EoG.png]

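2.2.2 Algorithm

The EoG inference procedure produces entity-to-entity edge representations:
(1) For two entity nodes, pick another node as an intermediate node; the two edges of the resulting path are passed through a neural network to give a representation of that path.
(2) If a representation already exists for the edge between the two entities, it is mixed with the representations of the two-edge paths.
(3) Repeating the steps above several times yields the mixed entity-to-entity edge representation.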
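
One iteration of steps (1) and (2) might look like the sketch below. The notes only say the two edges go "through a neural network"; the specific form used here, a sigmoid over the element-wise product of one edge with a linearly transformed second edge, followed by a `beta`-weighted mix with the direct edge, is an assumption, as are the names `walk_step`, `W`, and `beta`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def walk_step(edge, W, beta=0.5):
    """One inference iteration over all ordered node pairs.

    edge: {(i, j): vector} for every ordered pair that currently has an edge
    W:    square matrix standing in for learned weights
    beta: share of the existing direct edge kept when mixing
    """
    nodes = sorted({n for pair in edge for n in pair})
    dim = len(W)
    updated = {}
    for i in nodes:
        for j in nodes:
            if i == j:
                continue
            # Step (1): combine the two edges of every path i -> k -> j.
            path_sum = [0.0] * dim
            found = False
            for k in nodes:
                if k in (i, j) or (i, k) not in edge or (k, j) not in edge:
                    continue
                found = True
                w_kj = matvec(W, edge[(k, j)])
                path = [sigmoid(a * b) for a, b in zip(edge[(i, k)], w_kj)]
                path_sum = [p + q for p, q in zip(path_sum, path)]
            if not found and (i, j) not in edge:
                continue  # neither a direct edge nor a two-edge path
            # Step (2): mix the path information with the direct edge, if any.
            direct = edge.get((i, j), [0.0] * dim)
            updated[(i, j)] = [
                beta * d + (1 - beta) * p for d, p in zip(direct, path_sum)
            ]
    return updated


# Toy graph: 0 -> 1 -> 2, with no direct edge from 0 to 2.
edge = {(0, 1): [0.0, 0.0], (1, 2): [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
updated = walk_step(edge, W)
```

Step (3) corresponds to calling `walk_step` repeatedly on its own output; each call doubles the maximum path length that an edge representation can cover.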

Finally, this edge representation is used for classification.

2.3 GAIN

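GAIN (Graph Aggregation-and-Inference Network) builds on the EoG model and targets three problems:
(1) The subject and object of a relation may lie in different sentences, so the relation cannot be determined from a single sentence
(2) The same entity may appear in several sentences, so cross-sentence information is needed to represent it
(3) Some relations require logical reasoning
These motivate the GAIN model.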

The model contains two graph structures:
(1) hMG (heterogeneous mention-level graph)
Models the interactions among the different mentions in the document

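Nodes: mention nodes and a document node
(1) mention node: represents one mention
(2) document node: a virtual node that carries document-level information and acts as a relay, which helps mentions interact
Edges: intra-entity edge, inter-entity edge, document edge
(1) intra-entity edge: connects mentions of the same entity
(2) inter-entity edge: connects mentions of different entities within one sentence
(3) document edge: connects every mention to the document node
A GCN is then applied to obtain document-level mention representations; for each node, the representations from all layers are concatenated to form its final representation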
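
A rough sketch of the hMG edge rules and the layer concatenation described above. The GCN itself is omitted; `build_hmg_edges`, `final_node_repr`, and the `DOC` node id are hypothetical names used only for this example.

```python
from itertools import combinations

DOC = "doc"  # hypothetical id for the virtual document node


def build_hmg_edges(mention_entity, mention_sentence):
    """Return the three hMG edge lists for a list of mentions.

    mention_entity:   entity id of each mention
    mention_sentence: sentence index of each mention
    """
    edges = {"intra_entity": [], "inter_entity": [], "document": []}
    for a, b in combinations(range(len(mention_entity)), 2):
        if mention_entity[a] == mention_entity[b]:
            # Mentions of the same entity.
            edges["intra_entity"].append((a, b))
        elif mention_sentence[a] == mention_sentence[b]:
            # Mentions of different entities in one sentence.
            edges["inter_entity"].append((a, b))
    # Every mention is connected to the document node.
    for m in range(len(mention_entity)):
        edges["document"].append((m, DOC))
    return edges


def final_node_repr(layer_outputs):
    """Concatenate one node's vectors from every layer into one vector."""
    return [v for layer in layer_outputs for v in layer]


edges = build_hmg_edges(mention_entity=[0, 1, 0], mention_sentence=[0, 0, 1])
# Made-up per-layer vectors for a single node (input layer plus two GCN layers).
node0 = final_node_repr([[1.0, 2.0], [0.5, 0.5], [0.1, 0.9]])
```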

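(2) EG (entity-level graph)
The representation of an entity is the average of the representations of all its mentions
A path reasoning mechanism infers the relation between an entity pair, which lets the model perform multi-hop relational reasoning
Mentions in the hMG that refer to the same entity are merged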

The model has four parts:
(1) encoding module:
Input: word embedding, entity type, coreference type
The encoder turns these into the output of this layer.
(2) mention-level graph aggregation module
(3) entity-level graph inference module
(4) classification module