ECE/TCOM-5583: Information Theory

This is a graduate introductory course on information theory. I have offered this course for almost ten years now, and I have decided to make some significant changes to the materials for this offering. Classical topics such as the Kraft inequality and the separation theorem are intellectually interesting. Unfortunately, I have found that most of my students (like myself) will not end up becoming information theorists, and thus most of them do not feel connected to those topics.

Therefore, I plan to follow the approach of Professor MacKay for this offering: that is, to incorporate core information theory material (for example, information measures, the AEP, and source and channel coding theorems) into statistical learning. The main textbook will therefore be Professor MacKay's Information Theory, Inference, and Learning Algorithms. Even so, I probably will not follow this book very closely, and I am inclined to borrow heavily from materials on the Internet. The two other most important reference texts are

  • Elements of Information Theory by Cover and Thomas

  • Pattern Recognition and Machine Learning by Bishop

And a nice reference on network information theory is

  • Network Information Theory by El Gamal and Kim

However, we won't go into network information theory for this introductory course.

Other Reference Materials

Link to other course materials

Office Hours

There are no “regular” office hours, but you can very likely find me every day in the afternoon. You are welcome to come catch me anytime or to contact me by email.

Course Syllabus (Tentative)

  • Probability review

  • Maximum likelihood estimator, MAP, Bayesian estimator

  • Graphical models and message passing algorithms

  • Lossless source coding theory, Huffman coding, and introduction to arithmetic coding

  • Asymptotic equipartition property (AEP), typicality and joint typicality

  • Entropy, conditional entropy, mutual information, and their properties

  • Channel coding theory, capacity, and Fano’s inequality

  • Continuous random variables, differential entropy, Gaussian source, and Gaussian channel

  • Error correcting codes, linear codes, and introduction to low-density parity check code

  • Method of types, large deviation theory, and the maximum entropy principle
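
Huffman coding, one of the topics listed above, can be illustrated with a minimal Python sketch (the symbol probabilities here are made up for the example, and this is an illustration rather than an optimized implementation):

```python
import heapq

def huffman_code(symbol_probs):
    """Build a binary Huffman code for a dict {symbol: probability}.

    Returns a dict {symbol: bitstring}. Minimal sketch for illustration.
    """
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword});
    # the tie-breaker keeps tuple comparison away from the dicts.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(symbol_probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prepending one bit to each
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Dyadic probabilities, so codeword lengths match -log2(p) exactly
code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

For this dyadic source the code is optimal with average length equal to the entropy, and the codeword lengths satisfy the Kraft inequality with equality.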

N.B. You should expect to be exposed to some Python and Matlab. You won't become an expert in these tools after this class, but it is good to get your hands dirty and play with them early.
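
To give a flavor of that hands-on side, here is a small Python sketch computing entropy and mutual information for an invented joint distribution of two binary variables (the numbers are made up for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) over two binary variables
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y)
px = [sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)]
py = [sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)]

# I(X;Y) = H(X) + H(Y) - H(X,Y)
h_xy = entropy(list(joint.values()))
mi = entropy(px) + entropy(py) - h_xy
```

Because the two variables agree 80% of the time, the mutual information comes out positive (about 0.28 bits), and it would be zero for an independent pair.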



Grading

Homework: 20%.

"Mid-term": 30%. Closed-book test; no Internet or cell phones, but you may bring a calculator and two letter-size cheat sheets.

Final Project: 50%.


Date | Topics | Materials
8/25 | Overview of IT, probability overview, ML, MAP, Bayesian inference | Slides from Berkeley CS188, Slides from Wash U CSE 473
9/1 | Conjugate prior, Bernoulli distribution, binomial distribution, Poisson distribution, Beta distribution | Summary, Jupyter notebook on distributions
9/8 | Multinomial distribution, Dirichlet distribution, normal distribution, overview of the Source Coding Theorem, Huffman coding, Lagrange multipliers | Summary
9/15 | Karush-Kuhn-Tucker conditions, Kraft's inequality, Shannon-Fano-Elias codes, proof of the Source Coding Theorem | Summary
9/22 | Entropy, Jensen's inequality, joint entropy, conditional entropy | Summary
9/29 | Kullback-Leibler divergence, mutual information, cross entropy, application example: TF-IDF | Summary, TF-IDF notebook by Mike Bernico
10/6 | Decision trees, random forests, law of large numbers, asymptotic equipartition property (AEP), typical sequences | Summary, Random Forest notebook by Jason Sanchez
10/13 | Jointly typical sequences, packing and covering lemmas, channel capacity, Channel Coding Theorem | Summary
10/20 | Converse proof of the Channel Coding Theorem, rate-distortion problem, rate-distortion theorem | Summary
10/27 | Method of types, universal source coding, large deviation theory | Summary
11/3 | Review | Summary
11/10 | Mid-term |
11/17 | Review of mid-term solutions |
11/24 | Thanksgiving holiday |
12/1 | Overview of reinforcement learning, Markov decision processes, Q-learning | Slides from Berkeley AI courses: MDP I, MDP II, Reinforcement Learning I, Reinforcement Learning II
12/8 | Bayesian networks, belief propagation algorithm, IRA/LDPC codes | Summary, Bayesian network examples, LDPC decoding in Python
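
The relative-entropy material in the schedule (the 9/29 session) can be checked numerically with a short Python sketch; the distributions p and q below are made up for the example:

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up distributions over the same two-symbol alphabet
p = [0.5, 0.5]
q = [0.9, 0.1]

kl = kl_divergence(p, q)

# Cross entropy decomposes as H(p, q) = H(p) + D(p || q)
h_p = -sum(pi * math.log2(pi) for pi in p if pi > 0)
cross = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
```

The decomposition H(p, q) = H(p) + D(p || q) is why minimizing cross entropy against a fixed p is the same as minimizing the KL divergence, which is the usual bridge from information measures to statistical learning.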