ECE/TCOM-5583: Information Theory
This is a graduate-level introductory course on information theory. I have offered this course for almost ten years now, and I have decided to make some significant changes to the material for this offering. Past topics such as the Kraft inequality and the separation theorem are intellectually interesting. Unfortunately, I have found that most of my students (myself included) do not end up becoming information theorists, and thus most of them do not feel connected to these topics.
Therefore, I plan to follow the approach of Professor MacKay for this offering: incorporate core information theory material (for example, information measures, the AEP, and source and channel coding theory) into statistical learning. The main textbook will therefore be Professor MacKay's book, Information Theory, Inference, and Learning Algorithms. Even so, I probably will not follow the book very closely, and I am inclined to borrow heavily from materials on the Internet. The two other most important reference texts are
A nice reference on network information theory is
However, we won't go into network information theory for this introductory course.
Other Reference Materials
Shannon, C. E. (1948), "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656.
R. W. Yeung, "On entropy, information inequalities, and groups."
"The law of large numbers," blog post by Terry Tao.
A First Course in Information Theory, by Raymond W. Yeung, New York: Springer.
Information Theory and Reliable Communication, by R. Gallager, New York: Wiley.
Information Theory, by I. Csiszár and J. Körner, New York: Academic Press.
Entropy and Information Theory, by R. M. Gray, Springer-Verlag, 1990.
Probability, Random Processes, and Ergodic Properties, by R. M. Gray, Springer-Verlag, 1988.
J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Understanding belief propagation and its generalizations," in Exploring Artificial Intelligence in the New Millennium, Science & Technology Books, 2003.
S. Verdú, "Fifty years of Shannon theory," IEEE Transactions on Information Theory, vol. 44, pp. 2057-2078, 1998.
I. Csiszár, "The method of types," IEEE Transactions on Information Theory, vol. 44, pp. 2505-2523, 1998.
Information Theoretic Inequality Prover
Link to other course materials
Office Hours
There are no "regular" office hours, but you can very likely find me in my office any afternoon. You are welcome to drop by anytime or to contact me by email.
Course Syllabus (Tentative)
Probability review
Maximum likelihood (ML), maximum a posteriori (MAP), and Bayesian estimators (a short sketch follows this list)
Graphical models and message passing algorithms
Lossless source coding theory, Huffman coding, and introduction to arithmetic coding
Asymptotic equipartition property (AEP), typicality and joint typicality
Entropy, conditional entropy, mutual information, and their properties
Channel coding theory, capacity, and Fano’s inequality
Continuous random variables, differential entropy, Gaussian source, and Gaussian channel
Error correcting codes, linear codes, and introduction to low-density parity-check (LDPC) codes
Method of types, large deviation theory, maximum entropy principle
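As a preview of the estimation topics above, here is a minimal sketch (my own illustration, not course code) contrasting ML and MAP estimates of a Bernoulli parameter. The prior pseudo-counts and the true parameter value are arbitrary choices for the example.

```python
# A minimal sketch: ML vs. MAP estimation of a Bernoulli parameter
# from simulated coin flips. The numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7
flips = rng.random(20) < theta_true       # 20 Bernoulli(0.7) samples
heads, n = flips.sum(), flips.size

# ML estimate: maximizes the likelihood, theta_ML = heads / n.
theta_ml = heads / n

# MAP estimate under a conjugate Beta(a, b) prior:
# theta_MAP = (heads + a - 1) / (n + a + b - 2).
a, b = 5, 5                               # prior pseudo-counts (an assumption)
theta_map = (heads + a - 1) / (n + a + b - 2)

print(f"true = {theta_true}, ML = {theta_ml:.3f}, MAP = {theta_map:.3f}")
```

Note how the prior pulls the MAP estimate toward 1/2; with more data the two estimates converge.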
N.B. You should expect to be exposed to some Python and MATLAB. You won't become an expert in these tools after this class, but it is good to get your hands dirty and play with them early; the short sketch below shows the kind of computation involved.
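For a taste of what this looks like in practice, here is a minimal sketch (my own illustration, not part of the course materials) that computes entropy and mutual information for a discrete joint distribution; the joint pmf values are made up for the example.

```python
# A minimal sketch: entropy and mutual information of a discrete
# joint distribution, computed with NumPy. The pmf values are illustrative.
import numpy as np

def entropy(p):
    """Shannon entropy in bits; p is a 1-D array of probabilities summing to 1."""
    p = p[p > 0]                     # drop zero-probability outcomes (0 log 0 = 0)
    return -np.sum(p * np.log2(p))

# Joint pmf p(x, y) of two binary random variables.
p_xy = np.array([[0.40, 0.10],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)

H_X  = entropy(p_x)
H_Y  = entropy(p_y)
H_XY = entropy(p_xy.ravel())         # joint entropy H(X, Y)

# Mutual information via the identity I(X; Y) = H(X) + H(Y) - H(X, Y).
I_XY = H_X + H_Y - H_XY
print(f"H(X) = {H_X:.3f} bits, H(Y) = {H_Y:.3f} bits, I(X;Y) = {I_XY:.3f} bits")
```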
Projects
Grading
Homework: 20%.
"Mid-term": 30%. Closed book test, and no Internet and cell-phones, but you can bring calculator and two pieces of letter-size cheat sheet
Final Project: 50%.
Calendar
| Date | Topics | Materials |
| --- | --- | --- |
| 8/25 | Overview of IT, probability overview, ML, MAP, Bayesian inference | Slides from Berkeley CS188, Slides from Wash U CSE 473 |
| 9/1 | Conjugate priors, Bernoulli distribution, binomial distribution, Poisson distribution, beta distribution, summary | Jupyter notebook on distributions |
| 9/8 | Multinomial distribution, Dirichlet distribution, normal distribution, overview of the Source Coding Theorem, Huffman coding, Lagrange multipliers, summary | |
| 9/15 | Karush-Kuhn-Tucker conditions, Kraft's inequality, Shannon-Fano-Elias codes, proof of the Source Coding Theorem, summary | |
| 9/22 | Differential entropy, Jensen's inequality, joint entropy, conditional entropy, summary | |
| 9/29 | Kullback-Leibler divergence, mutual information, cross entropy, application example: TF-IDF, summary | TF-IDF notebook by Mike Bernico |
| 10/6 | Decision trees, random forests, law of large numbers, asymptotic equipartition property (AEP), typical sequences, summary | Random Forest notebook by Jason Sanchez |
| 10/13 | Jointly typical sequences, packing and covering lemmas, channel capacity, Channel Coding Theorem, summary | |
| 10/20 | Converse proof of the Channel Coding Theorem, rate-distortion problem, rate-distortion theorem, summary | |
| 10/27 | Method of types, universal source coding, large deviation theory, summary | |
| 11/3 | Review, summary | |
| 11/10 | Mid-term | |
| 11/17 | Review of mid-term solutions | |
| 11/24 | Thanksgiving holiday | |
| 12/1 | Overview of reinforcement learning, Markov decision processes, Q-learning | Slides from Berkeley AI course: MDP I, MDP II, Reinforcement Learning I, Reinforcement Learning II |
| 12/8 | Bayesian networks, belief propagation algorithm, IRA/LDPC codes, summary | Bayesian network examples, LDPC decoding in Python |