Machine Learning and Statistical Learning Theory
View the course on GitHub: tipthederiver/Math-7243-2020
This course introduces both the mathematical theory of learning and the implementation of modern machine-learning algorithms appropriate for data science. Modeling everything from social organization to financial predictions, machine-learning algorithms allow us to discover information about complex systems, even when the underlying probability distributions are unknown. Algorithms discussed include regression, decision trees, clustering, neural networks and dimensionality reduction techniques. The course offers students an opportunity to learn the implications of the mathematical choices underpinning the use of each algorithm, how the results can be interpreted in actionable ways, and how to apply their knowledge through the analysis of a variety of data sets and models.
This course consists of 20 Lectures and 6 in-class/take-home Labs, covering a wide range of topics in machine learning. The course is split into four pieces:
This course is designed to be both a practical introduction to the hands-on aspects of machine learning and an introduction to the theory. As part of the course, students will produce a novel end-to-end machine learning project. While I will help you with this project, you will need to struggle with every aspect of the pipeline, from data acquisition to model selection and training to the presentation of your results. It is hard to understand the purpose of the mathematical pieces of machine learning without struggling with the practicalities. Similarly, the theory allows us to perform careful model selection and construct new models in an ever-widening space of algorithms. The great results in machine learning come from the synergy of practical verification and theoretical robustness. This course strives to serve as a strong introduction to both.
Lectures are hosted on Google Drive due to size constraints. If you find a mis-attributed image or dataset discussed in these lectures, please do not hesitate to reach out. These lectures follow closely
Lecture Number | Title | Topics |
---|---|---|
Lecture 1 | Introduction to Machine Learning | What is machine learning, terminology and notation, first examples: linear regression and binary classification with k-nearest neighbors |
Lecture 2 | Matrix Differentiation and Optimization | Brief introduction to the bias-variance trade-off and the curse of dimensionality, matrix differentiation |
Lecture 3 | Linear Regression | Variance of linear parameters, confidence intervals and z-scores for linear parameters, feature selection via statistical significance, subset selection |
Lecture 4 | Parameter Shrinkage Methods | Gauss-Markov Theorem, Ridge Regression, Lasso Regression, Degrees of Freedom |
Lecture 5 | Linear Methods in Classification | Multilabel Classification, Regression on Categorical Variables, Linear Discriminant Analysis, Logistic Regression, Fitting Logistic Regression with Newton's Method, Extra: Bayes Classifier |
Lecture 6 | Iterative Methods | Gradient Descent, Stochastic Gradient Descent, Newton's Method, Example: Polynomial Fitting, Example: Fitting Nonpolynomial Functions (a short gradient descent sketch follows this table) |
Lecture 7 | Smoothing Methods | Piecewise Polynomials and Splines, Endpoint Selection and Smoothing Splines, Multidimensional Splines, Kernel Smoothing, Other Bases |
Lecture 8 | Artificial Neural Networks | Artificial Neural Networks, Linear Classifiers, Neural Networks and the Perceptron, Multilabel Perceptrons, Gradient Descent and Back Propagation |
Lecture 9 | Convolutional Neural Networks | Types of Artificial Neural Networks, Convolutional Neural Networks, History of CNNs, Using Pretrained CNNs |
Lecture 10 | Recurrent Neural Networks | Types of Artificial Neural Networks, Recurrent Networks, Recurrence Nodes, Applying RNNs to Natural Language Processing, Extra: Symmetrically Connected Networks |
Lecture 11 | Training Deep Networks | Vanishing Gradients and Activation Functions, Batch Normalization and Gradient Clipping, Faster Optimizers, Regularization |
Lecture 12 | Factor Analysis and PCA | Feature Construction and Dimensionality Reduction, Exploratory Factor Analysis, Principal Component Analysis, Nonlinear PCA |
Lecture 13 | Decision Trees and Support Vector Machines | Decision Trees, Support Vector Machines, SVMs and the Kernel Trick |
Lecture 14 | Cluster Analysis Part 1 | Overview of Clustering, K-Means Clusters, Example: Gene Expression Clustering, Density Based Clustering, Dissimilarity |
Lecture 15 | Cluster Analysis Part 2 | Dissimilarity, Agglomerative Clustering, Divisive Clustering, Spectral Clustering |
Lecture 16 | Cluster Analysis Part 3 | Spectral Clustering, Gaussian Mixture Models, Evaluating Clustering: Internal and External Measures, The Theoretical Problem of Clustering |
Lecture 17 | Boosting, Bagging and Bootstrapping | Boosting and Adaboost, Adaboost in the Loss Minimization Framework, Boosting Trees, Bootstrapping Confidence Intervals, Bagging and Bumping |
Lecture 18 | Mathematical Foundations of Machine Learning | Formal Model for Statistical Learning, PAC and APAC Learning, Uniform Convergence, Hoeffding’s Inequality, Finite Hypothesis Classes are APAC Learnable |
Lecture 19 | No Free Lunch | Proving Bounds in APAC, Finite Hypothesis Classes are APAC Learnable, No Free Lunch Theorem, The Bias-Variance Trade-off for RSS, Examples: k-Nearest Neighbors and Linear Predictors |
Lecture 20 | The Vapnik-Chervonenkis Dimension | The Class of All Functions Is Unlearnable, VC Dimension with Examples, The Fundamental Theorem of PAC Learning, Sauer's Lemma, Examples: Boosting and PAC Learning, VC Dimension of Neural Networks |
Fin | Survey Of Further Directions | What We Have Done, Topics We Missed, Where To Go From Here |
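If you want a taste of what these lectures look like in code, here is a minimal sketch (illustrative only, not course code) of the batch gradient descent of Lectures 2 and 6 applied to the least-squares regression problem of Lecture 3. The synthetic data, step size, and iteration count are assumptions made for the example.

```python
import numpy as np

# Synthetic data for illustration: y = 2 + 3x plus noise (assumed, not a course dataset).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])  # design matrix with intercept
y = X @ np.array([2.0, 3.0]) + 0.1 * rng.standard_normal(100)

# Batch gradient descent on the (averaged) residual sum of squares,
# RSS(beta) = ||y - X beta||^2, whose gradient is -2 X^T (y - X beta);
# see the matrix differentiation rules from Lecture 2.
beta = np.zeros(2)
eta = 0.1  # step size, chosen by hand for this toy problem
for _ in range(500):
    beta -= eta * (-2 / len(y)) * X.T @ (y - X @ beta)

# Compare against the closed-form least-squares solution from Lecture 3.
beta_exact = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta, beta_exact)  # the two estimates should agree closely
```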
Lecture Number | Title | Topics |
---|---|---|
Guest Lecture 12 Part 2 (Jorio Cocola) | PCA and Random Matrices | The theory of random matrices and the resulting statistical guarantee for PCA on high-dimensional spaces (a short PCA sketch follows this table) |
Lecture 21 | Validation and Model Selection | Hyperparameter Tuning Algorithms, Managing Accuracy For Large Numbers of Models, Comparing Model Performance Using Statistical Tests, Bayesian Optimization |
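Since both Lecture 12 and the guest lecture center on PCA, here is a minimal sketch of PCA computed via the singular value decomposition of a centered data matrix. The random data and the choice of two components are illustrative assumptions.

```python
import numpy as np

# Illustrative random data: 200 samples in 5 correlated dimensions
# (assumed, not a course dataset).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))

# PCA via the SVD of the centered data matrix: the right singular
# vectors are the principal directions, and the squared singular
# values (over n - 1) are the variances along those directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)

# Project onto the top two principal components.
scores = Xc @ Vt[:2].T
print(explained_variance, scores.shape)
```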
Lab Number | Title | Topics |
---|---|---|
Lab 1 | Exploratory Analysis | Loading CSV files, data frames, graphing, exploratory analysis, linear regression. |
Lab 2 | Linear Regression | Linear regression with linear algebra, sklearn and statsmodels.api; ridge and lasso regression; subset selection methods. |
Lab 3 | Linear Methods in Classification | Visualization of categorically labeled data, categorical linear regression, logistic regression, linear and quadratic discriminant analysis. |
Lab 4 | Artificial Neural Networks with TensorFlow and Keras | Building, training, using, and saving ANNs using Keras with a TensorFlow backend (a minimal Keras sketch follows this table). |
Lab 4.5 | Optimizing Deep Neural Networks | Selecting Activation Functions, Choosing Weight Initializations, Batch Normalization, Gradient Clipping, Choosing Optimizers and Regularization |
Lab 5 | Convolutional Neural Networks with Keras | Building and training CNNs, and using pretrained CNNs, in Keras. |
Lab 6 | Recurrent Neural Networks with Keras | Building and training RNNs for sequence generation and text generation. |
Lab 7 | Introduction to TensorFlow | Introduction to using TensorFlow 2.1 and subclassing to properly implement loss functions, layers, and models. |
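As a pointer to the workflow that Labs 4 through 6 build on, here is a minimal Keras sketch of building, training, and saving a small ANN, in the spirit of Lab 4. The toy dataset, architecture, and hyperparameters are illustrative assumptions, not the lab's own.

```python
import numpy as np
from tensorflow import keras

# Toy binary-classification data for illustration (assumed, not a lab dataset).
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4))
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully connected network; the layer sizes and optimizer
# settings are illustrative choices.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

# Saving and reloading a trained model, as covered in Lab 4.
model.save("toy_ann.h5")
reloaded = keras.models.load_model("toy_ann.h5")
```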
Dataset | Title | Lab |
---|---|---|
Ames Housing Prices | Ames | Lab 1 |
NYC AirBnB Prices | NYCAirBnB | Lab 1 |
Gendered Voice | GenderedVoice | Lab 3 |
Dataset | Description |
---|---|
Boston 311 Covid | Boston 311 has been collecting data on reported COVID cases. This is included in the standard Boston 311 dataset (this appears to no longer be the case; if you have more information, please reach out). |
COVID Lung Damage CT Segmentation | A segmentation dataset containing CT scans of the lungs of COVID patients. Open questions include automatic segmentation, classifying damage by location and extent, subtyping damage using clustering, and charting disease progression. |
New York Times By County Data | New York Times data on infection rates and deaths due to COVID by county in the US (a short loading sketch follows this table). |
Johns Hopkins Data Aggregator | This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). |
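If you want to start exploring right away, the sketch below loads the New York Times county-level data with pandas, in the style of Lab 1. The raw-file URL and column names are assumptions based on the public nytimes/covid-19-data GitHub repository and may have changed.

```python
import pandas as pd

# Load the NYT county-level COVID data; the URL below is an assumption
# based on the nytimes/covid-19-data repository and requires internet access.
url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
counties = pd.read_csv(url, parse_dates=["date"])

# Peek at the most recent totals for a single (assumed) county.
suffolk = counties[(counties["county"] == "Suffolk") & (counties["state"] == "Massachusetts")]
print(suffolk.tail())
```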