• Chapter 3 of the book
  • Kaggle notebook “Iterate like a grandmaster”
  • seq2seq
  • named entity recognition
  • H2O Hydrogen Torch
  • text token classification
  • IIRC, spaCy has displaCy for these pretty NER visualisations
  • get links from chat
  • encoder-decoder architecture built from 2 LSTMs for language translation
  • paper
  • sanyambhutani.com
  • Some models are encoder-only, like BERT
  • The GPT series is decoder-only
  • T5 uses both an encoder and a decoder
  • “How I read a paper” by Yannic Kilcher
  • 6 stacked encoder layers, each with its own (unshared) weights
  • Tensor2Tensor, the library with the original Transformer implementation
  • multi-head attention in PyTorch
  • // is floor division: it divides and returns an integer for integer operands
  • scale the attention scores (by 1/√d_k) before applying softmax
  • the ConvNeXt paper, which brings Transformer design ideas to CNNs, is good
  • Transformers may not give very good results on time-series data
  • PyTorch Tabular by Manu Joseph
  • matrix multiplication only works with numbers, not words, so text must first be turned into token IDs and then embedding vectors
  • Transformer anatomy notebook
  • demystifying queries, keys and values
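The NER / text token classification notes above boil down to "one label per token". A common labelling scheme is BIO (B- begins an entity, I- continues it, O is outside), and the sketch below assumes that scheme. It turns a list of per-token BIO tags into (entity_type, start, end) spans. bio_to_spans is a hypothetical helper, not part of spaCy, H2O Hydrogen Torch or any other library, and the example sentence and tags are made up.

```python
def bio_to_spans(tags):
    """Hypothetical helper: convert per-token BIO tags into
    (entity_type, start, end) spans, where end is exclusive."""
    spans = []
    current_type, start = None, None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then open a new one at this token.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            # Outside any entity: close the open one, if there is one.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = None, None
        # Otherwise it is an "I-" tag continuing the current entity.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


tokens = ["Angela", "Merkel", "visited", "New", "York", "yesterday"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
for label, s, e in bio_to_spans(tags):
    print(label, " ".join(tokens[s:e]))
```

This prints "PER Angela Merkel" and "LOC New York". Treating a stray I- tag as the start of a new entity is one lenient design choice among several.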
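The "multi-head attention" and "//" notes above go together. Each head gets an equal slice of the embedding, so the per-head size is an integer division of the embedding size by the number of heads. This is a minimal sketch in plain Python. head_dim and split_heads are hypothetical names, and 768 and 12 are only example sizes (they happen to match BERT-base). PyTorch's nn.MultiheadAttention requires the same divisibility.

```python
def head_dim(embed_dim, num_heads):
    # Every head needs an equal share of the embedding dimensions.
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    # // is floor division: int // int gives an int, usable as a size.
    return embed_dim // num_heads


def split_heads(vector, num_heads):
    # Hypothetical helper: cut one embedding vector into per-head chunks.
    d = head_dim(len(vector), num_heads)
    return [vector[h * d:(h + 1) * d] for h in range(num_heads)]


print(768 // 12)   # 64
print(768 / 12)    # 64.0 (true division always gives a float)
print(-7 // 2)     # -4 (floors toward minus infinity, does not truncate)
print(split_heads([1, 2, 3, 4, 5, 6], num_heads=3))  # [[1, 2], [3, 4], [5, 6]]
```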
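The note that matrix multiplication only works with numbers, not words, is why every Transformer pipeline starts with tokenization and an embedding lookup. This toy sketch assumes a naive whitespace tokenizer and a vocabulary built from the sentence itself. Real models use subword tokenizers and learned embedding tables, and the random table here only stands in for one. Once each token is a vector, the sentence becomes a matrix that can be multiplied by a weight matrix (W is a made-up example).

```python
import random


def matmul(a, b):
    # Plain-Python matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


sentence = "time flies like an arrow"
tokens = sentence.split()  # naive whitespace tokenization (assumption)
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
input_ids = [vocab[tok] for tok in tokens]
print(input_ids)  # [4, 2, 3, 0, 1]

# Random embedding table: one row per vocabulary entry (learned in a real model).
random.seed(0)
embed_dim = 4
embedding_table = [[random.uniform(-1, 1) for _ in range(embed_dim)] for _ in vocab]
embeddings = [embedding_table[i] for i in input_ids]  # 5 x 4 matrix

# Now the text is numeric, so it can go through a (made-up) 4 x 2 linear layer.
W = [[1, 0], [0, 1], [1, 0], [0, 1]]
projected = matmul(embeddings, W)  # 5 x 2 matrix
```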
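The notes on scaling before softmax and on queries, keys and values describe scaled dot-product attention, softmax(QKᵀ/√d_k)V. This is a minimal pure-Python sketch of that formula for a single head, with no batching or masking. The function names are hypothetical. Q, K and V are passed in directly here, whereas in a real model they come from learned linear projections of the embeddings.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by 1/sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights, sum to 1
        # Output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)

# Why scale: dot products grow with d_k, and large scores saturate softmax.
unscaled = softmax([8.0, 4.0, 0.0])           # nearly one-hot
scaled = softmax([8.0 / 8, 4.0 / 8, 0.0 / 8])  # divided by sqrt(64): smoother
```

Without the scaling, the weights collapse onto one key and the gradients through softmax become very small, which is the motivation given in the original Transformer paper.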