Transformers (Udemy course notes)

- Attention asks: how is this word related to the other words in the sentence?
- Use a similarity function that scores the current (decoder) hidden state against each of the encoder's hidden states.
- Apply a softmax to those scores to turn them into attention weights.
- This is how attention works in an RNN, and how sequence-to-sequence (seq2seq) tasks used to be addressed with RNNs.
- An RNN is sequential and doesn't have a global state.
- A seq2seq problem may only need the global state, that is…
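A minimal sketch of the score → softmax → weighted-sum steps noted above, assuming a plain dot product as the similarity function. That is one common choice; the course may use a different scoring function (e.g. an additive/MLP score). The helper names `dot`, `softmax` and `attend` and the toy vectors are hypothetical, made up for illustration.

```python
import math

def dot(u, v):
    """Similarity function (assumed here to be a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Turn raw similarity scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score the current decoder hidden state against every encoder hidden
    state, normalise with softmax, and return (weights, context vector)."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder hidden states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example (made-up numbers): 3 encoder hidden states of size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

Here the first and third encoder states score equally high against the decoder state, so they receive equal (and larger) weights, while the second state, which is orthogonal to the decoder state, receives a small weight.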