ECE/TCOM-5583: Information Theory and Probabilistic Programming

There has been a strong resurgence of AI in recent years. An important core technology of AI is statistical learning, which aims to automatically “program” machines with data. While the idea can date back to the 50's of the last century, the plethora of data and inexpensive computational power allow the techniques to thrive and penetrate into every aspect of our daily lives — customer behavior prediction, financial market prediction, fully automatic surveillance, self-driving vehicles, autonomous robots, and beyond.

Information theory was first introduced and developed by the great communications engineer, Claude Shannon in the 50's of the last century. The theory was introduced in an attempt to explain the principle behind point-to-point communication and data storing. However, the technique has been incorporated into statistical learning and has inspire many of the underlying principles. In this graduate course, we would try to explore the exciting area of statistical learning from the perspectives of information theorists. It facilitates students to have a deeper understanding of the omnipresent field of statistical learning and to appreciate the wide-spread significance of information theory. Moreover, we will look into recent advance in probabilistic programming technology that facilitates users to tackle inference problems through computer programs.

The course will start by providing an overview of information theory and statistical learning. We will then aid students to establish a solid foundation on core information theory principles including information measures, AEP, source and channel coding theory. We will then introduce common and yet powerful statistical techniques such as Bayesian learning, decision forests, and belief propagation algorithms and discuss how these modern statistical learning techniques are connected to information theory. To summarize, we will skim through some probablistic programming tools. The main reference text is a book by Professor Mackay — Information Theory, Inference, and Learning Algorithms but we will also borrow heavily from materials available online. Other most important reference texts are

Other Reference Materials

Office Hours

There are no “regular” office hours. But you are welcome to come catch me anytime or contact me through emails.

Course Syllabus (Tentative)

  • Probability review

  • Maximum likelihood estimator, MAP, Bayesian estimator

  • Graphical models and message passing algorithms

  • Lossless source coding theory, Huffmann coding, and introduction to Arithmetic coding

  • Asymptotic equipartition property (AEP), typicality and joint typicality

  • Entropy, conditional entropy, mutual information, and their properties

  • Channel coding theory, capacity, and Fano’s inequality Continuous random variables, differential entropy, Gaussian source, and Gaussian channel

  • Error correcting codes, linear codes, and introduction to low-density parity check code

  • Methods of type, large deviation theory, maximum entropy principle

N.B. You will expect to expose to some Python and Matlab. You won't become an expert on these things after this class. But it is good to get your hands dirty and play with them early.

Adjustments for Pregnancy/Childbirth Related Issues

Should you need modifications or adjustments to your course requirements because of documented pregnancy-related or childbirth-related issues, please contact me as soon as possible to discuss. Generally, modifications will be made where medically necessary and similar in scope to accommodations based on temporary disability. Please see for commonly asked questions.

Title IX Resources

For any concerns regarding gender-based discrimination, sexual harassment, sexual misconduct, stalking, or intimate partner violence, the University offers a variety of resources, including advocates on-call 24.7, counseling services, mutual no contact orders, scheduling adjustments and disciplinary sanctions against the perpetrator. Please contact the Sexual Misconduct Office 405-325-2215 (8-5, M-F) or OU Advocates 405-615-0013 (24.7) to learn more or to report an incident.

Mask Policy

As outlined by the University of Oklahoma's Chief COVID Officer, until further notice, employees, students, and visitors of the OU community will be mandated to wear masks (1.) when they are inside University facilities and vehicles and (2.) when they are outdoors on campus and social distancing of at least six feet is not possible. For the well-being of the entire university community it is important that everyone demonstrate the appropriate health and safety behaviors outlined in the University Mandatory Masking Policy ( As this mandate includes all campus classrooms, please make sure you are wearing your mask while in class. If you do not have a mask or forgot yours, see the professor for available masks. If you have an exemption from the Mandatory Masking Policy, please see the professor to make accommodations before class begins. If and where possible, please make your professor aware of your exemption and/or accommodation prior to arriving in class.

If a student is unable or unwilling to wear a mask and has not made an accommodation request through the ADRC, they will be instructed to exit the classroom.


Final project report is due on 12/18.

Late Policy

  • There will be 5% deduction per each late day for all submissions

  • The deduction will be saturated after 10 days. So you will get half of your marks even if you are super late


Quizzes (In class participation): 20% (extra credits).

Presentations: 20%.

Homework: 20%.

"Mid-term": 20%. take home but will only have half of a day to complete.

Final Project: 40%.

Final Grade:

  • A: \(\sim\) 90 and above

  • B: \(\sim\) between 80 and 90

  • C: \(\sim\) between 70 and 80

  • D: \(\sim\) between 60 and 70

  • F: Below 60


Topics Materials
8/26 Overview of IT, probability overview, independence vs conditional independence, Monty Hall problem (screencast) slides_2020a
9/2 Solution of Monty Hall problem, Law of large number (LLN), Kelly's criterion, proof of weak LLN, sampling discrete distributions, asymptotic equal partition, typical sequences, Source Coding Theorem (screencast) Read Lecture 6
9/9 Formal definition of typical sequences, review of Source Coding Theorem, Jensen's inequality, mutual information, joint entropy, conditional entropy, Shannon's perfect secrecy (screencast) Read Lecture 5
9/16 More on conditional entropy, Huffman coding, Fano's inequality, converse proof of source coding theorem with side information (screencast) may read 24-34 as reference
9/23 Markov models, PageRank, data processing inequality, solution of HW2 (screencast)
9/30 Decision tree, KL-divergence, cross-entropy, TF-IDF (screencast) (slides from Berkeley CS188)
10/7 Differential entropy, entropy of Gaussian source, channel capacity, binary symmetric channel (screencast) Read Lecture 7
10/14 Pyro: Introduction to Core Concepts, Probabilistic Programming with Pyro: A tutorial guide, STAN: Probabilistic Programming Language using R, Expectation-Maximization algorithm, Lea - Discrete probability distributions in Python
10/21 An Introduction to Pyro, Edward (TensorFlow pacage for probabilistic programming), An Introduction to TensorFlow Probability, Bayesian Statistical Modelling using PyMC4
10/28 Infer.Net, Why have sex? Information acquisition and evolution, Edward2: Introduction and Tutorial, Machine Learning and Information Theory in MATLAB: An MNIST Case Study, Analyzing and Predicting the Spread of Infectious Diseases through Probablistic Modeling
11/4 Gaussian channel, packing lemma, covering lemma, forward proof of channel coding theorem (screencast)
11/11 Mid-term
11/18 ML, MAP, Bayesian inference, Conjugate prior, Beta distribution (screencast) Read Lecture 2 and Lecture 11
11/25 Thanksgiving
12/2 Graphical models, Bayesian networks, undirected graphs, factor graphs, MCMC (screencast) Markov chain convergence theorem