ECE/TCOM-5583: Information Theory
This is a graduate-level introductory course on information theory. I have offered this course for almost ten years now, and I have decided to make some significant changes to the material for this offering. Past topics such as the Kraft inequality and the separation theorem are intellectually interesting. Unfortunately, I have found that most of my students (myself included) do not end up becoming information theorists, and thus most of them do not feel connected to these topics.
Therefore, I plan to follow the approach of Professor MacKay for this offering: incorporate core information theory material (for example, information measures, the AEP, and source and channel coding theory) into statistical learning. The main textbook will therefore be Professor MacKay's book, Information Theory, Inference, and Learning Algorithms. Even so, I probably will not follow the book very closely, and I am inclined to borrow heavily from materials on the Internet. The two other most important reference texts are
A nice reference on network information theory is
However, we won't go into network information theory for this introductory course.
Other Reference Materials
Shannon, C. E. (1948), "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656.
R. W. Yeung, "On entropy, information inequalities, and groups."
"The law of large numbers," blog post by Terry Tao.
A First Course in Information Theory, by Raymond W. Yeung, New York: Springer.
Information Theory and Reliable Communication, by R. Gallager, New York: Wiley.
Information Theory, by I. Csiszár and J. Körner, New York: Academic Press.
Entropy and Information Theory, by R. M. Gray, Springer-Verlag, 1990.
Probability, Random Processes, and Ergodic Properties, by R. M. Gray, Springer-Verlag, 1988.
J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Understanding belief propagation and its generalizations," in Exploring Artificial Intelligence in the New Millennium, Science & Technology Books, 2003.
S. Verdú, "Fifty years of Shannon theory," IEEE Transactions on Information Theory, vol. 44, pp. 2057-2078, 1998.
I. Csiszár, "The method of types," IEEE Transactions on Information Theory, vol. 44, pp. 2505-2523, 1998.
Information Theoretic Inequality Prover
Link to other course materials
Office Hours
There are no "regular" office hours, but you can very likely find me in my office any afternoon. You are welcome to drop by anytime or to contact me by email.
Course Syllabus (Tentative)
Probability review
Maximum likelihood (ML), maximum a posteriori (MAP), and Bayesian estimators (a short sketch follows this list)
Graphical models and message passing algorithms
Lossless source coding theory, Huffman coding, and introduction to arithmetic coding
Asymptotic equipartition property (AEP), typicality and joint typicality
Entropy, conditional entropy, mutual information, and their properties
Channel coding theory, capacity, and Fano’s inequality
Continuous random variables, differential entropy, Gaussian source, and Gaussian channel
Error correcting codes, linear codes, and introduction to low-density parity-check (LDPC) codes
Method of types, large deviation theory, maximum entropy principle
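As a preview of the estimation topics above, here is a minimal sketch (my own illustration, not course code) contrasting ML and MAP estimates of a Bernoulli parameter. The prior pseudo-counts and the true parameter value are arbitrary choices for the example.

```python
# A minimal sketch: ML vs. MAP estimation of a Bernoulli parameter
# from simulated coin flips. The numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7
flips = rng.random(20) < theta_true       # 20 Bernoulli(0.7) samples
heads, n = flips.sum(), flips.size

# ML estimate: maximizes the likelihood, theta_ML = heads / n.
theta_ml = heads / n

# MAP estimate under a conjugate Beta(a, b) prior:
# theta_MAP = (heads + a - 1) / (n + a + b - 2).
a, b = 5, 5                               # prior pseudo-counts (an assumption)
theta_map = (heads + a - 1) / (n + a + b - 2)

print(f"true = {theta_true}, ML = {theta_ml:.3f}, MAP = {theta_map:.3f}")
```

Note how the prior pulls the MAP estimate toward 1/2; with more data the two estimates converge.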
N.B. You should expect to be exposed to some Python and MATLAB. You won't become an expert in these tools after this class, but it is good to get your hands dirty and play with them early; the short sketch below shows the kind of computation involved.
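For a taste of what this looks like in practice, here is a minimal sketch (my own illustration, not part of the course materials) that computes entropy and mutual information for a discrete joint distribution; the joint pmf values are made up for the example.

```python
# A minimal sketch: entropy and mutual information of a discrete
# joint distribution, computed with NumPy. The pmf values are illustrative.
import numpy as np

def entropy(p):
    """Shannon entropy in bits; p is a 1-D array of probabilities summing to 1."""
    p = p[p > 0]                     # drop zero-probability outcomes (0 log 0 = 0)
    return -np.sum(p * np.log2(p))

# Joint pmf p(x, y) of two binary random variables.
p_xy = np.array([[0.40, 0.10],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)

H_X  = entropy(p_x)
H_Y  = entropy(p_y)
H_XY = entropy(p_xy.ravel())         # joint entropy H(X, Y)

# Mutual information via the identity I(X; Y) = H(X) + H(Y) - H(X, Y).
I_XY = H_X + H_Y - H_XY
print(f"H(X) = {H_X:.3f} bits, H(Y) = {H_Y:.3f} bits, I(X;Y) = {I_XY:.3f} bits")
```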
Projects
Grading
Homework: 20%.
"Mid-term": 30%. Closed book test, and no Internet and cell-phones, but you can bring calculator and two pieces of letter-size cheat sheet
Final Project: 50%.
Calendar
| Date | Topics | Materials |
| --- | --- | --- |
| 8/25 | Overview of IT, probability overview, ML, MAP, Bayesian inference | Slides from Berkeley CS188, Slides from Wash U CSE 473 |
| 9/1 | Conjugate priors, Bernoulli distribution, binomial distribution, Poisson distribution, beta distribution, summary | Jupyter notebook on distributions |
| 9/8 | Multinomial distribution, Dirichlet distribution, normal distribution, overview of the Source Coding Theorem, Huffman coding, Lagrange multipliers, summary | |
| 9/15 | Karush-Kuhn-Tucker conditions, Kraft's inequality, Shannon-Fano-Elias codes, proof of the Source Coding Theorem, summary | |
| 9/22 | Differential entropy, Jensen's inequality, joint entropy, conditional entropy, summary | |
| 9/29 | Kullback-Leibler divergence, mutual information, cross entropy, application example: TF-IDF, summary | TF-IDF notebook by Mike Bernico |
| 10/6 | Decision trees, random forests, law of large numbers, asymptotic equipartition property (AEP), typical sequences, summary | Random Forest notebook by Jason Sanchez |
| 10/13 | Jointly typical sequences, packing and covering lemmas, channel capacity, Channel Coding Theorem, summary | |
| 10/20 | Converse proof of the Channel Coding Theorem, rate-distortion problem, rate-distortion theorem, summary | |
| 10/27 | Method of types, universal source coding, large deviation theory, summary | |
| 11/3 | Review, summary | |
| 11/10 | Mid-term | |
| 11/17 | Review of mid-term solutions | |
| 11/24 | Thanksgiving holiday | |
| 12/1 | Overview of reinforcement learning, Markov decision processes, Q-learning | Slides from Berkeley AI course: MDP I, MDP II, Reinforcement Learning I, Reinforcement Learning II |
| 12/8 | Bayesian networks, belief propagation algorithm, IRA/LDPC codes, summary | Bayesian network examples, LDPC decoding in Python |