HW1

Due on Feb 27

  1. Consider a softmax regression classifier with only two classes (the first row of \(A\) corresponds to class 1 and the second row of \(A\) corresponds to class 2). Let \(A\) be initialized as \( \begin{pmatrix}1 & 2 & 3 \\4 & 5 &6 \end{pmatrix}\), and suppose we have a data batch with only two datapoints: \((X^{(1)},y^{(1)})=([1,0,0]^\top,1), (X^{(2)},y^{(2)})=([0,1,1]^\top,2)\).

    1. (10 points) Compute the updated \(A\) after a single epoch of batch gradient descent with learning rate \(\epsilon = 0.01\). That is, \(A \leftarrow A - \epsilon \frac{dL}{dA},\) where \(L\) is the cross-entropy loss.
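
The update rule above can be checked numerically. The following is a minimal sketch, not a reference solution. It assumes the logits are \(A X\) with no bias term and that \(L\) is the cross-entropy averaged over the batch (if your convention sums over the batch instead, the gradient is twice as large). Under those assumptions the per-sample gradient is \((p - e_y) X^\top\), where \(p\) is the softmax output and \(e_y\) is the one-hot label. The helper names `softmax`, `batch_gradient`, and `gd_step` are made up for illustration.

```python
import math


def softmax(z):
    # Subtract the max logit for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def batch_gradient(A, batch):
    """Gradient of the batch-averaged cross-entropy w.r.t. A (labels are 1-based)."""
    rows, cols = len(A), len(A[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for x, y in batch:
        logits = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        p = softmax(logits)
        for k in range(rows):
            # (p - one_hot(y))_k, assumption: class k+1 lives in row k of A.
            err = p[k] - (1.0 if k == y - 1 else 0.0)
            for j in range(cols):
                grad[k][j] += err * x[j] / len(batch)
    return grad


def gd_step(A, batch, lr):
    # One batch gradient descent step: A <- A - lr * dL/dA.
    g = batch_gradient(A, batch)
    return [[a - lr * gv for a, gv in zip(row, grow)] for row, grow in zip(A, g)]


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
batch = [([1.0, 0.0, 0.0], 1), ([0.0, 1.0, 1.0], 2)]
A_new = gd_step(A, batch, lr=0.01)
for row in A_new:
    print(row)
```

Because the entries of \(p - e_y\) sum to zero, each column of the gradient sums to zero as well, which is a quick sanity check on a hand computation.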

  2. (10 points) Build a linear regression model to predict the stock price of MSFT from the prices of several other stocks. You can download the Jupyter notebook for this question from here.
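
As a rough illustration of the linear regression setup, the sketch below fits ordinary least squares with an intercept using NumPy. It does not use the notebook's data: the feature matrix `X`, the target `y`, and the coefficients `true_w` are synthetic stand-ins invented here, and the notebook may well use a different library or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 60 days, 3 "other stock" prices per day.
X = rng.normal(100.0, 10.0, size=(60, 3))
true_w = np.array([0.5, 1.2, -0.3])
y = X @ true_w + 20.0 + rng.normal(0.0, 0.5, size=60)

# Append a column of ones so the last coefficient acts as the intercept.
X1 = np.column_stack([X, np.ones(len(X))])

# Least-squares solution of X1 @ coef ~= y.
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

pred = X1 @ coef
mse = float(np.mean((pred - y) ** 2))
print("weights:", coef[:3], "intercept:", coef[3], "train MSE:", mse)
```

With real price data, the same `coef` would be applied to the validation and test rows (with the ones column appended) to obtain predictions for comparison with the neural network in the next question.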

  3. Let's repeat the prediction from the last question using a neural network.

    1. (2 points) Split the MSFT stock price data into training (9/1/2021-11/30/2021), validation (12/1/2021-12/31/2021), and test (1/1/2022-1/31/2022) datasets.

    2. (10 points) Estimate the MSFT stock price again with a fully connected neural network with 5 hidden layers. Each hidden layer has 20 neurons. Use ReLU as the activation function.

    3. (2 points) Plot the training loss/validation loss vs. epoch.

    4. (4 points) Try different optimization algorithms: SGD, Momentum, and Adam. Plot the training loss/validation loss vs. epoch for each.

    5. (4 points) PyTorch offers learning rate schedulers that change the learning rate over time. Set up learning rate schedulers using OneCycleLR, CyclicLR, and ReduceLROnPlateau (check this tutorial for more instructions), and describe any differences you observe in training.

    6. (2 points) Add dropout with a drop probability of 0.4 to the network and train again. Also plot the training loss/validation loss vs. epoch.

    7. (4 points) Plot the predicted MSFT price (with and without dropout) for the training data. In the same plot, please include the ground truth and the predicted price by linear regression from Q2.

    8. (2 points) Create the same plot as above for the test data.
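
The network and training steps in question 3 could be organized roughly as follows. This is a sketch under stated assumptions, not the expected solution: the data is synthetic, training is full-batch, Adam and ReduceLROnPlateau are picked as one example each of an optimizer and a scheduler, and the helper names `make_mlp` and `fit`, the learning rate, and the epoch count are all made up for illustration.

```python
import torch
from torch import nn


def make_mlp(n_features, hidden=20, n_hidden_layers=5, p_drop=0.0):
    # 5 hidden layers of 20 ReLU units; dropout is added only if p_drop > 0.
    layers, width = [], n_features
    for _ in range(n_hidden_layers):
        layers += [nn.Linear(width, hidden), nn.ReLU()]
        if p_drop > 0:
            layers.append(nn.Dropout(p_drop))
        width = hidden
    layers.append(nn.Linear(width, 1))  # single regression output
    return nn.Sequential(*layers)


def fit(model, x_tr, y_tr, x_va, y_va, epochs=50, lr=1e-3):
    loss_fn = nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Halve the learning rate when validation loss stops improving.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.5, patience=5
    )
    history = {"train": [], "val": []}
    for _ in range(epochs):
        model.train()  # dropout active while training
        opt.zero_grad()
        loss = loss_fn(model(x_tr), y_tr)
        loss.backward()
        opt.step()

        model.eval()  # dropout disabled for validation
        with torch.no_grad():
            val = loss_fn(model(x_va), y_va).item()
        sched.step(val)

        history["train"].append(loss.item())
        history["val"].append(val)
    return history


# Hypothetical stand-in data: 4 input features, 90 training / 30 validation rows.
torch.manual_seed(0)
x = torch.randn(120, 4)
y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]]) + 0.1 * torch.randn(120, 1)
x_tr, y_tr, x_va, y_va = x[:90], y[:90], x[90:], y[90:]

plain = fit(make_mlp(4), x_tr, y_tr, x_va, y_va)
dropped = fit(make_mlp(4, p_drop=0.4), x_tr, y_tr, x_va, y_va)
print(plain["val"][-1], dropped["val"][-1])
```

The `history` lists hold one value per epoch, so they can be plotted directly against the epoch index. Swapping in `torch.optim.SGD` (with or without its `momentum` argument) only changes the `opt` line. OneCycleLR and CyclicLR are usually stepped without a metric argument, typically once per batch, so the `sched.step(val)` call would need to change for those.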