0:16:24: Q1. Is the approach ML/MAP/Bayesian? ANS: ML
0:23:00: Q2. What is \(p(x|z;\theta)\) in terms of \(\mathcal{N}\)(variable, mean, variance)? ANS: \(\mathcal{N}(x; \mu_z, \sigma_z^2)\)
0:27:13: Q3. What is \(p(M;\theta)\)? ANS: \(w_M\)
0:45:24: Q4. What is the second term? ANS: \(KL(q(z^N)||p(z^N|x^N;\theta))\)
0:48:29: Q5. What is the second term inside ELBO? ANS: \(H(q(z^N))\)
0:51:45: Q6. What should \(q(z^N)\) be to minimize the KL divergence? ANS: \(p(z^N|x^N;\theta)\)
1:00:46: Q7. What should \(\mu_z\) be updated to? ANS: \(\frac{\sum_i q_i(z)x_i}{\sum_i q_i(z)}\)
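Q4 to Q6 fit together through the standard decomposition of the log-likelihood. The following is a sketch in the notation used above, assuming \(q(z^N)\) is any distribution over the latent variables and that the terms were written in this order in the lecture:

```latex
\begin{align*}
\log p(x^N;\theta)
  &= \underbrace{\mathbb{E}_{q(z^N)}\big[\log p(x^N, z^N;\theta)\big]
     + H\big(q(z^N)\big)}_{\text{ELBO}}
   + KL\big(q(z^N)\,\|\,p(z^N|x^N;\theta)\big)
\end{align*}
```

Since the KL term is nonnegative and vanishes exactly when \(q(z^N) = p(z^N|x^N;\theta)\), that choice makes the ELBO equal to the log-likelihood, which is the answer to Q6.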
1:14:44: Q8. What do you think about the update of \(w_M\)? ANS: \(w_M \propto \sum_i q_i(M)\), i.e. \(\frac{1}{N}\sum_i q_i(M)\) after normalization
1:36:04: Q9. Why can't we set \(\frac{\partial\,\text{ELBO}}{\partial w_M} = 0\) to find \(w_M\)? ANS: because it is a constrained optimization problem with the additional constraint \(w_M + w_F = 1\)
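The E-step and M-step discussed in Q6 to Q9 can be sketched for a one-dimensional, two-component mixture. This is a minimal illustration rather than the lecture's code: the data values, the initial parameters, and the choice to hold the variances fixed are all assumptions made here, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(x; mean, var); var is the variance, not the standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(xs, w, mu, var):
    """Set q_i(z) = p(z | x_i; theta), the choice that makes the KL term zero (Q6)."""
    q = []
    for x in xs:
        joint = {z: w[z] * normal_pdf(x, mu[z], var[z]) for z in w}  # p(z) p(x|z)
        total = sum(joint.values())                                   # p(x)
        q.append({z: joint[z] / total for z in w})
    return q

def m_step(xs, q, components):
    """Update weights and means for fixed q (variances are held fixed in this sketch)."""
    n = len(xs)
    w, mu = {}, {}
    for z in components:
        resp = sum(qi[z] for qi in q)                           # sum_i q_i(z)
        w[z] = resp / n                                         # normalized: weights sum to 1
        mu[z] = sum(qi[z] * x for qi, x in zip(q, xs)) / resp   # weighted mean (Q7)
    return w, mu

def log_likelihood(xs, w, mu, var):
    return sum(math.log(sum(w[z] * normal_pdf(x, mu[z], var[z]) for z in w)) for x in xs)

# Made-up observations and starting point (hypothetical, for illustration only).
xs = [150.0, 155.0, 160.0, 158.0, 175.0, 180.0, 178.0, 185.0]
w = {"M": 0.5, "F": 0.5}
mu = {"M": 170.0, "F": 160.0}
var = {"M": 100.0, "F": 100.0}

history = [log_likelihood(xs, w, mu, var)]
for _ in range(20):
    q = e_step(xs, w, mu, var)
    w, mu = m_step(xs, q, list(w))
    history.append(log_likelihood(xs, w, mu, var))
```

Dividing by `n` in the weight update is what enforces \(w_M + w_F = 1\); the unnormalized sum \(\sum_i q_i(M)\) alone would not be a valid mixture weight. The `history` list can be inspected to see that the log-likelihood does not decrease from one iteration to the next.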
1:46:45: Q10. What is \(\tilde{f}(x)\) when \(g(x)=C\)? ANS: \(f(x)\)
1:48:16: Q11. What is \(\tilde{f}(x)\) when \(g(x) \neq C\)? ANS: \(-\infty\)
1:53:29: Q12. Should \(\lambda \ge 0\) or \(\lambda \le 0\)? ANS: \(\lambda \ge 0\)
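The answers to Q10 and Q11 correspond to the usual Lagrangian construction for maximizing \(f(x)\) subject to \(g(x)=C\). The block below is a sketch under that assumption; the exact definition of \(\tilde{f}\) used in the lecture is not recorded in these notes.

```latex
\begin{align*}
\tilde{f}(x) &= \min_{\lambda}\,\big[f(x) + \lambda\,(g(x) - C)\big]
  = \begin{cases} f(x), & g(x) = C \\ -\infty, & g(x) \neq C \end{cases} \\
\max_{x:\,g(x)=C} f(x) &= \max_x \tilde{f}(x)
  = \max_x \min_{\lambda}\,\big[f(x) + \lambda\,(g(x) - C)\big]
\end{align*}
```

For an inequality constraint such as \(g(x) \ge C\) (an assumed form), restricting the minimization to \(\lambda \ge 0\) gives the same behavior: the minimum is \(f(x)\) when \(x\) is feasible and \(-\infty\) when \(g(x) < C\). Applied to the mixture weights from Q9, with the constraint \(\sum_z w_z = 1\) and using \(\sum_z q_i(z) = 1\), the same idea yields the normalized update:

```latex
\begin{align*}
\mathcal{L}(w, \lambda) &= \sum_i \sum_z q_i(z) \log w_z + \lambda \Big(\sum_z w_z - 1\Big) \\
\frac{\partial \mathcal{L}}{\partial w_z} &= \frac{\sum_i q_i(z)}{w_z} + \lambda = 0
  \;\Rightarrow\; w_z = -\frac{1}{\lambda} \sum_i q_i(z) \\
\sum_z w_z = 1 &\;\Rightarrow\; \lambda = -N
  \;\Rightarrow\; w_z = \frac{1}{N} \sum_i q_i(z)
\end{align*}
```

Here \(\lambda\) comes out negative, which is allowed because the sign of the multiplier is unrestricted for an equality constraint.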
2:22:26: Q13. Why is \(H(M|C) \le H(M,K|C)\)? ANS: because \(H(M,K|C)-H(M|C)=H(K|M,C)\ge 0\)
2:26:44: Q14. Why is \(H(K|C) \le H(K)\)? ANS: because \(H(K)-H(K|C)=I(K;C)\ge 0\)
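The two inequalities in Q13 and Q14 can be checked numerically on a small example. The sketch below assumes \(M\), \(K\), and \(C\) denote a message, a key, and a ciphertext, and uses a hypothetical one-bit cipher \(C = M \oplus K\) with a uniform key; neither the cipher nor the probabilities come from the notes.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint distribution onto the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))

# Hypothetical toy cipher: one-bit message, uniform one-bit key, C = M XOR K.
p_m = {0: 0.7, 1: 0.3}
p_k = {0: 0.5, 1: 0.5}
joint = {(m, k, m ^ k): p_m[m] * p_k[k] for m, k in product(p_m, p_k)}

M, K, C = (0,), (1,), (2,)  # coordinate positions inside each outcome tuple
h_m_given_c = cond_entropy(joint, M, C)
h_mk_given_c = cond_entropy(joint, M + K, C)
h_k_given_mc = cond_entropy(joint, K, M + C)
h_k_given_c = cond_entropy(joint, K, C)
h_k = entropy(marginal(joint, K))
mutual_info_kc = h_k - h_k_given_c  # I(K;C)
```

The gap `h_mk_given_c - h_m_given_c` equals `h_k_given_mc`, matching the chain-rule argument in Q13, and `mutual_info_kc` is the nonnegative quantity from Q14.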
2:43:54: Q15. Which test should I pick first? ANS: It depends. Probably any one except accent.
2:47:13: Q16. What is the training accuracy for the shadow test? ANS: 6/8