ECE-5583: Information Theory and Probabilistic Programming

There has been a strong resurgence of AI in recent years. An important core technology of AI is statistical learning, which aims to automatically “program” machines with data. While the idea can date back to the 50's of the last century, the plethora of data and inexpensive computational power allow the techniques to thrive and penetrate into every aspect of our daily lives — customer behavior prediction, financial market prediction, fully automatic surveillance, self-driving vehicles, autonomous robots, and beyond.

Information theory was first introduced and developed by the great communications engineer, Claude Shannon in the 50's of the last century. The theory was introduced in an attempt to explain the principle behind point-to-point communication and data storing. However, the technique has been incorporated into statistical learning and has inspire many of the underlying principles. In this graduate course, we would try to explore the exciting area of statistical learning from the perspectives of information theorists. It facilitates students to have a deeper understanding of the omnipresent field of statistical learning and to appreciate the wide-spread significance of information theory. Moreover, we will look into recent advance in probabilistic programming technology that facilitates users to tackle inference problems through computer programs.

The course will start by providing an overview of information theory and statistical learning. We will then aid students to establish a solid foundation on core information theory principles including information measures, AEP, source and channel coding theory. We will then introduce common and yet powerful statistical techniques such as Bayesian learning, decision forests, and belief propagation algorithms and discuss how these modern statistical learning techniques are connected to information theory.

Textbook

Information theory and probabilistic inference by Samuel Cheng

Other Reference Materials

Information Theory, Inference, and Learning Algorithms by David Mackay Learning from data by Abu-Mostafa, Magdon-Ismail, and Lin
Element of Information Theory by Cover and Thomas
Pattern Recognition and Machine Learning by Bishop
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
Practical probabilistic programming by Avi Pfefffer

Shannon, C. E. (1948), A mathematical theory of communication. Bell Sys. Tech. J. 27: 379-423, 623-656.
R. W. Yeung, On entropy, information inequalities, and Groups.
Law of Large Number by Terry Tao.
A First Course in Information Theory, by Raymond W. Yeung, New York: Springer.
Information Theory and Reliable Communication by R. Gallager, New York: Wiley.
Information Theory by Csiszar and Korner, New York: Academic Press.
Entropy and Information Theory by R. M. Gray, Springer-Verlag, 1990.
J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding Belief Propagation and its Generalizations in Exploring Artificial Intelligence in the New Millennium: Science and Technology Books, 2003.
S. Verdu, Fifty years of Shannon theory, Information Theory, IEEE Transactions on, vol. 44, pp. 2057-2078, 1998.
I. Csiszar, The method of types, Information Theory, IEEE Transactions on, vol. 44, pp. 2505-2523, 1998.
Information Theoretic Inequality Prover.
Network Information Theory by El Gamal and Kim.
Stochastic Processes: Theory for Applications by Robert Gallager, 2013.
T Hofmann, B Schölkopf, AJ Smola, Kernel Methods in Machine Learning.

Office Hours

There are no “regular” office hours. But you are welcome to come catch me anytime or contact me through emails.

Course Syllabus (Tentative)

Probability review
Maximum likelihood estimator, MAP, Bayesian estimator
Graphical models and message passing algorithms
Lossless source coding theory, Huffmann coding, and introduction to Arithmetic coding
Asymptotic equipartition property (AEP), typicality and joint typicality
Entropy, conditional entropy, mutual information, and their properties
Channel coding theory, capacity, and Fano’s inequality Continuous random variables, differential entropy, Gaussian source, and Gaussian channel
Error correcting codes, linear codes, and introduction to low-density parity check code
Methods of type, large deviation theory, maximum entropy principle

N.B. You will expect to expose to some Python and Matlab. You won't become an expert on these things after this class. But it is good to get your hands dirty and play with them early.

Projects

Final project report is due date Dec 14.

Late Policy

There will be 5% deduction per each late day for all submissions
The deduction will be saturated after 10 days. So you will get half of your marks even if you are super late

Grading

Quizzes (In class participation): 10% (extra credits).

Presentations: 20%.

Homework: 30%.

"Mid-term": 20%. take home but will only have half of a day to complete.

Final Project: 30%.

Final Grade:

A: \(\sim\) 90 and above
B: \(\sim\) between 80 and 90
C: \(\sim\) between 70 and 80
D: \(\sim\) between 60 and 70
F: Below 60

Calendar

	Topics	Materials	Quizzes
8/21	Overview of IT, probability overview, discrete and continuous random variables, Joint and conditional probabilities, independence and conditional independence, expectation (video)	probability review, slides1	link
8/28	Monty Hall problem, Monte Carlo, Lea (video)		link
9/04	Formal probability definition, ML, MAP, Bayesian inference (video)	lea examples	link
9/11	Conjugate prior, Beta distribution, Law of large number (LLN), proof of weak LLN, Kelly's criterion, Asymptotic equal partition (video)		link
9/18	Typical sequences, Source Coding Theorem, Huffman coding, SFE coding, Jensen's inequality (video)	slides2	link
9/25	Differential entropy, multivariate Gaussian entropy, Gaussian process (video)	Quantifying information, code	link
10/02	Conditional entropy, converse proof of source coding theorem, KL-divergence, Gaussian source maximizes entropy, Thiel index (video)		link
10/09	Cross-entropy, TF-IDF, introduction to EM-algorithm, mutual information, chain rule of mutual information, data processing inequality, MLE as minimization of cross-entropy and KL-divergence (video)		link
10/16	Shannon's perfect secrecy, Decision tree, channel capacity, binary symmetric channel(video)		link
10/23	Gaussian channel, Lagrange multiplier, color channel, joint typical sequences (video)		link
10/30	Mid-term
11/06	Presentations, packing lemma, covering lemma (video)	slides3	link
11/13	Presentations (video), last presentation (better audio)
11/20	Presentations, forward proof of channel coding theorem, Fisher information and Cramer-Rao lower bound Graphical models, Bayesian networks, undirected graphs (video)	slides2022c, (slides from Berkeley CS188)

Academic Integrity

Academic honesty is incredibly important within this course. Cheating is strictly prohibited at the University of Oklahoma, because it devalues the degree you are working hard to get. As a member of the OU community, it is your responsibility to protect your educational investment by knowing and following the rules. For specific definitions on what constitutes cheating, review the Student’s Guide to Academic Integrity.