HW3

Due on May 1 (30 points)

Repeat HW2 with RNNs and attention networks. Please include your plots and code, and compare the results.
1. (10 points) Repeat HW2 with an RNN. You are free to design your own network architecture, but it must include at least one LSTM layer.
2. (10 points) Repeat HW2 with an attention network. You are again free to design your own architecture, but it must include at least one self-attention layer.
3. (10 points) Use knowledge distillation to try to shrink one of your models by half (half the number of layers, or half the number of parameters per layer).
4. (Extra credit) As in HW2, extra credit will be given when the test MSE is below 125. You can earn at most 10 extra points for each of your models.
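
For the knowledge distillation task, one common way to set up the student's training objective is sketched below. It is only an illustration, not a required approach. It assumes HW2 is a regression task (the assignment reports MSE), so the student is trained on a weighted sum of two terms: the MSE against the true targets and the MSE against the larger teacher model's predictions. The names mse and distillation_loss and the weight alpha are hypothetical and do not come from the assignment. Plain Python lists stand in for model outputs so that the example runs without any framework.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def distillation_loss(student_preds, teacher_preds, targets, alpha=0.5):
    """Hypothetical distillation objective for a regression student.

    alpha weights the error against the true targets;
    (1 - alpha) weights the error against the teacher's predictions.
    """
    hard_loss = mse(student_preds, targets)        # fit the ground truth
    soft_loss = mse(student_preds, teacher_preds)  # imitate the teacher
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


# Toy values, made up purely for illustration.
student = [1.0, 2.0]
teacher = [1.0, 4.0]
targets = [2.0, 2.0]
loss = distillation_loss(student, teacher, targets, alpha=0.5)
```

In a deep learning framework the same two terms would be computed on tensors, with the teacher's predictions treated as fixed values so that only the student's parameters are updated. Setting alpha to 1.0 reduces this to ordinary training without a teacher, which gives a baseline for the comparison.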