2020-12-15

LSTM Analysis

About the LSTM model:

Reference: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

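The cell state is the core of the LSTM; in the diagrams of the referenced post it is the horizontal line running along the top of each cell.

The cell state works like a conveyor belt: it runs along the entire chain with only a few minor linear interactions, which lets information flow along it unchanged.

Through its gate structures, the LSTM can remove information from the cell state or add information to it.

There are three gates, which protect and control the cell state.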

forget gate layer

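First, we need to decide how much of the previous information to forget.
This depends on h_{t-1} and x_t, and the gate outputs a number between 0 and 1 for each element of the cell state.
1 means the previous information C_{t-1} is kept completely; 0 means it is forgotten completely.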
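
As a minimal sketch of the forget gate, assuming the formulation from the referenced post, f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): the weights, bias, and input values below are made-up toy numbers, and the helper names `sigmoid` and `forget_gate` are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, h_prev, x_t):
    """Compute f_t = sigmoid(W_f . [h_prev, x_t] + b_f), one value per cell-state element."""
    concat = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]


# Hypothetical toy values: hidden size 2, input size 1.
W_f = [[0.5, -0.3, 0.8],
       [0.1, 0.4, -0.6]]
b_f = [0.0, 1.0]
h_prev = [0.2, -0.1]
x_t = [1.0]

f_t = forget_gate(W_f, b_f, h_prev, x_t)
print(f_t)  # two numbers, each strictly between 0 and 1
```

Values close to 1 keep the corresponding element of C_{t-1}; values close to 0 erase it.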

input gate layer

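Next, we need to decide how much new information to add to the current cell state.
There are two parts. First, a sigmoid layer, called the input gate layer, decides the update coefficient i_t. Then a tanh layer creates a candidate value C~_t. In the next step, we combine the two to update the state.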
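
In the notation of the referenced post, these two parts can be written as follows. Here W_i, W_C, b_i and b_C are the learned weights and biases, and [h_{t-1}, x_t] denotes the concatenation of the previous hidden state with the current input.

```latex
\begin{aligned}
i_t         &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
\end{aligned}
```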

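Updating the old state C_{t-1}

With these pieces, the update is simple: C_t = f_t * C_{t-1} + i_t * C~_t. First the previous state is multiplied by a coefficient, which forgets some information; then the candidate state, multiplied by its own coefficient, is added, which remembers some of the current information.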
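
The element-wise update can be sketched in a few lines. The gate values here are picked by hand (not computed from weights) to show the three typical cases; `update_cell_state` is an illustrative name.

```python
def update_cell_state(f_t, c_prev, i_t, c_tilde):
    """Element-wise C_t = f_t * C_{t-1} + i_t * C~_t."""
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]


c_prev = [2.0, -1.0, 0.5]    # old cell state C_{t-1}
c_tilde = [0.9, 0.9, -0.4]   # candidate values C~_t

# Hand-picked gate values, one case per element:
#   element 0: f=1, i=0   -> old value is kept unchanged
#   element 1: f=0, i=1   -> old value is replaced by the candidate
#   element 2: f=0.5, i=0.5 -> a blend of old value and candidate
f_t = [1.0, 0.0, 0.5]
i_t = [0.0, 1.0, 0.5]

c_t = update_cell_state(f_t, c_prev, i_t, c_tilde)
print(c_t)
```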

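Finally, we decide what the node outputs. This output is based on the cell state, but it is a filtered version of it.

We first run a sigmoid layer to decide which parts of the cell state to output (o_t), then pass the cell state through tanh to squash its values into the range -1 to 1, and multiply the two to obtain the current hidden state h_t = o_t * tanh(C_t).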
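
Putting the gates together, one full time step might look like the sketch below. It follows the equations of the referenced post, but the parameter layout (a dict with keys such as `W_f` and `b_f`) and the constant toy weights are assumptions made for illustration, not how any particular library stores its parameters.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W . v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(params, h_prev, c_prev, x_t):
    """One LSTM time step; returns the new hidden state and cell state (h_t, c_t)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]        # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]        # input gate
    c_tilde = [math.tanh(a) for a in affine(params["W_C"], params["b_C"], z)]  # candidate
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]        # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # filtered view of the cell state
    return h_t, c_t


# Hypothetical toy parameters: hidden size 2, input size 1, every weight 0.1, zero biases.
hidden, inputs = 2, 1
params = {}
for name in ("f", "i", "C", "o"):
    params["W_" + name] = [[0.1] * (hidden + inputs) for _ in range(hidden)]
    params["b_" + name] = [0.0] * hidden

h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0], [0.5], [-1.0]):  # a sequence of three inputs
    h, c = lstm_step(params, h, c, x)
print(h, c)
```

Because h_t is a sigmoid output multiplied by a tanh output, every element of h_t stays strictly between -1 and 1, while the cell state itself is not bounded in this way.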

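The LSTM has many variants, which are skipped here.

One of the more important ones is the BiLSTM, which runs two LSTMs over the sequence in opposite directions and concatenates their output states.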
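
The concatenation can be illustrated with a small sketch. To keep it short, a toy recurrent cell h_t = tanh(w_x * x_t + w_h * h_{t-1}) stands in for a real LSTM cell; only the bidirectional wiring is the point here, and the function names and weights are made up.

```python
import math


def run_direction(xs, w_x, w_h):
    """Run a toy recurrent cell over xs (a simplified stand-in for an LSTM)."""
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append([h])  # hidden size 1, stored as a list so it can be concatenated
    return outputs


def bidirectional(xs, fwd_weights, bwd_weights):
    """Concatenate forward and backward outputs at each time step."""
    forward = run_direction(xs, *fwd_weights)
    # Process the reversed sequence, then reverse the outputs to re-align them with time.
    backward = run_direction(xs[::-1], *bwd_weights)[::-1]
    return [f + b for f, b in zip(forward, backward)]


xs = [1.0, -0.5, 0.25]
out = bidirectional(xs, (0.7, 0.3), (-0.4, 0.6))
print(out)  # 3 time steps, each with 2 values: [forward state, backward state]
```

At each position the forward half has seen everything up to that position and the backward half has seen everything after it, so the concatenated state carries context from both sides.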

Finally, a note for writing code: the official documentation of torch.nn.LSTM describes the parameters as follows:

input_size – The number of expected features in the input x

hidden_size – The number of features in the hidden state h

num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False

dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional – If True, becomes a bidirectional LSTM. Default: False
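
A short usage sketch of these parameters, assuming PyTorch is installed. The sizes (batch 4, sequence length 10, 8 input features, 16 hidden units) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(
    input_size=8,        # features per time step in x
    hidden_size=16,      # features in the hidden state h
    num_layers=2,        # two stacked LSTM layers
    batch_first=True,    # input and output are (batch, seq, feature)
    dropout=0.1,         # applied between layers, not after the last one
    bidirectional=True,  # forward and backward directions are concatenated
)

x = torch.randn(4, 10, 8)  # (batch, seq, feature)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 10, 32]): last dim is 2 * hidden_size
print(h_n.shape)     # torch.Size([4, 4, 16]): (num_layers * 2 directions, batch, hidden_size)
print(c_n.shape)     # torch.Size([4, 4, 16])
```

Note that batch_first only affects the input and output tensors; h_n and c_n keep the layer-and-direction dimension first.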