HW3

Due on Apr 27 (30 points)

  1. Repeat HW2 with RNNs and attention networks. Please include plots and code, and compare the results.

    1. (10 points) Repeat HW2 with an RNN. You are free to design your own network architecture, but it must include at least one LSTM layer (see the LSTM sketch after this list).

    2. (10 points) Repeat HW2 with an attention network. You are again free to design your own architecture, but it must include at least one self-attention layer (see the self-attention sketch after this list).

    3. (10 points) Use knowledge distillation to try to shrink one of your models by half (half the number of layers, or half the number of parameters per layer); a distillation-loss sketch follows this list.

    4. (Extra credit: 10 points) Repeat HW2 with a Mamba network. You are again free to design your own architecture, but it must include at least one Mamba layer (see the Mamba sketch after this list).
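
For part 1, here is a minimal PyTorch sketch: one LSTM layer with a linear readout. This assumes a sequence-regression setup (consistent with the MSE criterion below); `input_dim`, `hidden_dim`, and `output_dim` are placeholder values you should set for your HW2 data.

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """One LSTM layer followed by a linear readout of the last time step."""
    def __init__(self, input_dim=1, hidden_dim=64, output_dim=1):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):                # x: (batch, seq_len, input_dim)
        out, _ = self.lstm(x)            # out: (batch, seq_len, hidden_dim)
        return self.head(out[:, -1, :])  # predict from the last time step
```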
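
For part 2, a minimal self-attention sketch using `nn.MultiheadAttention`, where queries, keys, and values all come from the same sequence; again, all dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn

class AttentionRegressor(nn.Module):
    """A single self-attention layer (query = key = value) plus a linear head."""
    def __init__(self, input_dim=1, embed_dim=64, num_heads=4, output_dim=1):
        super().__init__()
        self.embed = nn.Linear(input_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, output_dim)

    def forward(self, x):                # x: (batch, seq_len, input_dim)
        h = self.embed(x)
        h, _ = self.attn(h, h, h)        # self-attention: q = k = v
        return self.head(h.mean(dim=1))  # average over time, then predict
```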
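
For part 3, one common distillation recipe blends the usual task loss with a term that pushes the smaller student toward the frozen teacher's outputs. This is a sketch assuming the regression setting implied by the MSE criterion; `alpha` is a hypothetical weighting you would tune on validation data.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Weighted sum of the task loss and a match-the-teacher loss."""
    task_loss = F.mse_loss(student_pred, target)           # fit the labels
    distill_loss = F.mse_loss(student_pred, teacher_pred)  # imitate the teacher
    return alpha * distill_loss + (1 - alpha) * task_loss

# Typical use inside a training step (teacher frozen, student training):
#     with torch.no_grad():
#         teacher_pred = teacher(x)
#     loss = distillation_loss(student(x), teacher_pred, y)
```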
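
For the Mamba extra credit, a minimal sketch assuming the third-party `mamba-ssm` package (github.com/state-spaces/mamba), which requires a CUDA GPU; the block hyperparameters shown are that package's defaults, not values tied to HW2.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm

class MambaRegressor(nn.Module):
    """One Mamba block followed by a linear readout of the last time step."""
    def __init__(self, input_dim=1, d_model=64, output_dim=1):
        super().__init__()
        self.embed = nn.Linear(input_dim, d_model)
        self.mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.head = nn.Linear(d_model, output_dim)

    def forward(self, x):              # x: (batch, seq_len, input_dim)
        h = self.mamba(self.embed(x))  # h: (batch, seq_len, d_model)
        return self.head(h[:, -1, :])
```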

Model Design Challenge (Extra credit: Maximum 20 points) 🚀

Try to design a network that uses no more parameters than the one you already constructed. Report the number of parameters for both your new network and the previously constructed one; a counting snippet is sketched below.
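
In PyTorch, the trainable-parameter count can be computed as follows (`new_model` and `baseline_model` are hypothetical names for your two networks):

```python
def count_parameters(model):
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# new_model and baseline_model are placeholders for your two networks:
# print(count_parameters(new_model), count_parameters(baseline_model))
```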

Scoring Criteria:

  • 🏆 Up to 10 points if any of your models achieves a test MSE below 120 (full 10 points if below 20).

  • 🏅 An additional 10 points for the top-performing model.

Submission Requirements:

  • Please share the training procedure, including logs, as a Jupyter Notebook on a platform such as GitHub.

  • You must share the model weights on a public platform such as the Hugging Face Model Hub or GitHub.

Important Rules:

  • Do not use test data for training or hyperparameter tuning.

  • Any violation will result in disqualification from the competition.