Machine Learning and Statistical Learning Theory
View the course on GitHub: tipthederiver/Math-7243-2020
This course introduces both the mathematical theory of learning and the implementation of modern machine-learning algorithms appropriate for data science. Modeling everything from social organization to financial predictions, machine-learning algorithms allow us to discover information about complex systems, even when the underlying probability distributions are unknown. Algorithms discussed include regression, decision trees, clustering, neural networks and dimensionality reduction techniques. The course offers students an opportunity to learn the implications of the mathematical choices underpinning the use of each algorithm, how the results can be interpreted in actionable ways, and how to apply their knowledge through the analysis of a variety of data sets and models.
This course consists of 20 Lectures and 6 in-class/take-home Labs, covering a wide range of topics in machine learning. The course is split into four pieces:
This course is designed to be both a practical introduction to the hands-on aspects of machine learning and an introduction to the theory. As part of the course, students will produce a novel end-to-end machine learning project. While I will help you with this project, you will need to struggle with every aspect of the pipeline, from data acquisition to model selection and training to the presentation of your results. It is hard to understand the purpose of the mathematical pieces of machine learning without struggling with the practicalities. Similarly, the theory allows us to perform careful model selection and construct new models in an ever-widening space of algorithms. The great results in machine learning come from the synergy of practical verification and theoretical robustness. This course strives to serve as a strong introduction to both.
Lectures are hosted on Google Drive due to size constraints. If you find a mis-attributed image or dataset discussed in these lectures, please do not hesitate to reach out. These lectures follow closely
Lecture Number | Title | Topics |
---|---|---|
Lecture 1 | Introduction to Machine Learning | What is machine learning, terminology and notation, first examples: linear regression and binary classification with k-nearest neighbors |
Lecture 2 | Matrix Differentiation and Optimization | Brief introduction to the bias-variance trade-off and the curse of dimensionality, matrix differentiation |
Lecture 3 | Linear Regression | Variance of linear parameters, confidence intervals and z-scores for linear parameters, feature selection via statistical significance, subset selection |
Lecture 4 | Parameter Shrinkage Methods | Gauss-Markov Theorem, Ridge Regression, Lasso Regression, Degrees of Freedom |
Lecture 5 | Linear Methods in Classification | Multilabel Classification, Regression on Categorical Variables, Linear Discriminant Analysis, Logistic Regression, Fitting Logistic Regression with Newton's Method, Extra: Bayes Classifier |
Lecture 6 | Iterative Methods | Gradient Descent, Stochastic Gradient Descent, Newton's Method, Example: Polynomial Fitting, Example: Fitting Nonpolynomial Functions (a short gradient descent sketch follows this table) |
Lecture 7 | Smoothing Methods | Piecewise Polynomials and Splines, Endpoint Selection and Smoothing Splines, Multidimensional Splines, Kernel Smoothing, Other Bases |
Lecture 8 | Artificial Neural Networks | Artificial Neural Networks, Linear Classifiers, Neural Networks and the Perceptron, Multilabel Perceptrons, Gradient Descent and Back Propagation |
Lecture 9 | Convolutional Neural Networks | Types of Artificial Neural Networks, Convolutional Neural Networks, History of CNNs, Using Pretrained CNNs |
Lecture 10 | Recurrent Neural Networks | Types of Artificial Neural Networks, Recurrent Networks, Recurrence Nodes, Applying RNNs to Natural Language Processing, Extra: Symmetrically Connected Networks |
Lecture 11 | Training Deep Networks | Vanishing Gradients and Activation Functions, Batch Normalization and Gradient Clipping, Faster Optimizers, Regularization |
Lecture 12 | Factor Analysis and PCA | Feature Construction and Dimensionality Reduction, Exploratory Factor Analysis, Principal Component Analysis, Nonlinear PCA |
Lecture 13 | Decision Trees and Support Vector Machines | Decision Trees, Support Vector Machines, SVMs and the Kernel Trick |
Lecture 14 | Cluster Analysis Part 1 | Overview of Clustering, K-Means Clusters, Example: Gene Expression Clustering, Density Based Clustering, Dissimilarity |
Lecture 15 | Cluster Analysis Part 2 | Dissimilarity, Agglomerative Clustering, Divisive Clustering, Spectral Clustering |
Lecture 16 | Cluster Analysis Part 3 | Spectral Clustering, Gaussian Mixture Models, Evaluating Clustering: Internal and External Measures, The Theoretical Problem of Clustering |
Lecture 17 | Boosting, Bagging and Bootstrapping | Boosting and Adaboost, Adaboost in the Loss Minimization Framework, Boosting Trees, Bootstrapping Confidence Intervals, Bagging and Bumping |
Lecture 18 | Mathematical Foundations of Machine Learning | Formal Model for Statistical Learning, PAC and APAC Learning, Uniform Convergence, Hoeffding’s Inequality, Finite Hypothesis Classes are APAC Learnable |
Lecture 19 | No Free Lunch | Proving Bounds in APAC, Finite Hypothesis Classes are APAC Learnable, No Free Lunch Theorem, The Bias-Variance Trade-off for RSS, Examples: k-Nearest Neighbors and Linear Predictors |
Lecture 20 | The Vapnik-Chervonenkis Dimension | The Class of All Functions Is Unlearnable, VC Dimension with Examples, The Fundamental Theorem of PAC Learning, Sauer's Lemma, Examples: Boosting and PAC Learning, VC Dimension of Neural Networks |
Fin | Survey Of Further Directions | What We Have Done, Topics We Missed, Where To Go From Here |
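If you want a taste of what these lectures look like in code, here is a minimal sketch (illustrative only, not course code) of the batch gradient descent of Lectures 2 and 6 applied to the least-squares regression problem of Lecture 3. The synthetic data, step size, and iteration count are assumptions made for the example.

```python
import numpy as np

# Synthetic data for illustration: y = 2 + 3x plus noise (assumed, not a course dataset).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])  # design matrix with intercept
y = X @ np.array([2.0, 3.0]) + 0.1 * rng.standard_normal(100)

# Batch gradient descent on the (averaged) residual sum of squares,
# RSS(beta) = ||y - X beta||^2, whose gradient is -2 X^T (y - X beta);
# see the matrix differentiation rules from Lecture 2.
beta = np.zeros(2)
eta = 0.1  # step size, chosen by hand for this toy problem
for _ in range(500):
    beta -= eta * (-2 / len(y)) * X.T @ (y - X @ beta)

# Compare against the closed-form least-squares solution from Lecture 3.
beta_exact = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta, beta_exact)  # the two estimates should agree closely
```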
Lecture Number | Title | Topics |
---|---|---|
Guest Lecture 12 Part 2 (Jorio Cocola) | PCA and Random Matrices | The theory of random matrices and the resulting statistical guarantee for PCA on high-dimensional spaces (a short PCA sketch follows this table) |
Lecture 21 | Validation and Model Selection | Hyperparameter Tuning Algorithms, Managing Accuracy For Large Numbers of Models, Comparing Model Performance Using Statistical Tests, Bayesian Optimization |
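Since both Lecture 12 and the guest lecture center on PCA, here is a minimal sketch of PCA computed via the singular value decomposition of a centered data matrix. The random data and the choice of two components are illustrative assumptions.

```python
import numpy as np

# Illustrative random data: 200 samples in 5 correlated dimensions
# (assumed, not a course dataset).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))

# PCA via the SVD of the centered data matrix: the right singular
# vectors are the principal directions, and the squared singular
# values (over n - 1) are the variances along those directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)

# Project onto the top two principal components.
scores = Xc @ Vt[:2].T
print(explained_variance, scores.shape)
```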
Lab Number | Title | Topics |
---|---|---|
Lab 1 | Exploratory Analysis | Loading CSV files, data frames, graphing, exploratory analysis, linear regression. |
Lab 2 | Linear Regression | Linear regression with linear algebra, sklearn and statsmodels.api; ridge and lasso regression; subset selection methods. |
Lab 3 | Linear Methods in Classification | Visualization of categorically labeled data, categorical linear regression, logistic regression, linear and quadratic discriminant analysis. |
Lab 4 | Artificial Neural Networks with TensorFlow and Keras | Building, training, using, and saving ANNs using Keras with a TensorFlow backend (a minimal Keras sketch follows this table). |
Lab 4.5 | Optimizing Deep Neural Networks | Selecting Activation Functions, Choosing Weight Initializations, Batch Normalization, Gradient Clipping, Choosing Optimizers and Regularization |
Lab 5 | Convolutional Neural Networks with Keras | Building and training CNNs, and using pretrained CNNs, in Keras. |
Lab 6 | Recurrent Neural Networks with Keras | Building and training RNNs for sequence generation and text generation. |
Lab 7 | Introduction to TensorFlow | Introduction to using TensorFlow 2.1 and subclassing to properly implement loss functions, layers, and models. |
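As a pointer to the workflow that Labs 4 through 6 build on, here is a minimal Keras sketch of building, training, and saving a small ANN, in the spirit of Lab 4. The toy dataset, architecture, and hyperparameters are illustrative assumptions, not the lab's own.

```python
import numpy as np
from tensorflow import keras

# Toy binary-classification data for illustration (assumed, not a lab dataset).
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4))
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully connected network; the layer sizes and optimizer
# settings are illustrative choices.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

# Saving and reloading a trained model, as covered in Lab 4.
model.save("toy_ann.h5")
reloaded = keras.models.load_model("toy_ann.h5")
```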
Dataset | Title | Lab |
---|---|---|
Ames Housing Prices | Ames | Lab 1 |
NYC AirBnB Prices | NYCAirBnB | Lab 1 |
Gendered Voice | GenderedVoice | Lab 3 |
Dataset | Description |
---|---|
Boston 311 Covid | Boston 311 has been collecting data on reported COVID cases. This is included in the standard Boston 311 dataset (this appears to no longer be the case; if you have more information, please reach out). |
COVID Lung Damage CT Segmentation | A segmentation dataset containing CT scans of the lungs of COVID patients. Open questions include automatic segmentation, classifying damage by location and extent, subtyping damage using clustering, and charting disease progression. |
New York Times By County Data | New York Times data on infection rates and deaths due to COVID by county in the US (a short loading sketch follows this table). |
Johns Hopkins Data Aggregator | This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). |
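If you want to start exploring right away, the sketch below loads the New York Times county-level data with pandas, in the style of Lab 1. The raw-file URL and column names are assumptions based on the public nytimes/covid-19-data GitHub repository and may have changed.

```python
import pandas as pd

# Load the NYT county-level COVID data; the URL below is an assumption
# based on the nytimes/covid-19-data repository and requires internet access.
url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
counties = pd.read_csv(url, parse_dates=["date"])

# Peek at the most recent totals for a single (assumed) county.
suffolk = counties[(counties["county"] == "Suffolk") & (counties["state"] == "Massachusetts")]
print(suffolk.tail())
```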