18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning (Spring 2018, MIT OCW): Lecture 22

18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning (Spring 2018, MIT OCW). Instructor: Prof. Gilbert Strang. Linear algebra concepts are key for understanding and creating machine learning algorithms, especially as applied to deep learning and neural networks. This course reviews linear algebra with applications to probability, statistics, and optimization, and above all gives a full explanation of deep learning. (from ocw.mit.edu)

Lecture 22 - Gradient Descent: Downhill to a Minimum

Gradient descent is the most common optimization algorithm in deep learning and machine learning. It uses only first derivatives (the gradient) to update the parameters, moving downhill step by step until it reaches a local minimum.
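The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `gradient_descent`, the fixed `step_size`, the stopping tolerance, and the test objective f(x, y) = 0.5 * (x^2 + b * y^2) are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, step_size=0.1, tol=1e-8, max_iters=10_000):
    """Repeat x <- x - step_size * grad(x) until the gradient norm drops below tol.

    Returns the final point and the number of steps taken.
    """
    x = list(x0)
    for k in range(max_iters):
        g = grad(x)
        # Stop when the gradient is (nearly) zero: we are at a stationary point.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            return x, k
        # Step downhill, using first-derivative information only.
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x, max_iters


# Hypothetical objective for illustration: f(x, y) = 0.5 * (x**2 + b * y**2),
# whose minimum is at (0, 0). A small b makes the valley narrow in one direction.
b = 0.1


def grad_f(p):
    x, y = p
    return [x, b * y]


minimizer, iterations = gradient_descent(grad_f, x0=[1.0, 1.0], step_size=0.5)
print(minimizer, iterations)
```

With a small `b`, the y-coordinate shrinks much more slowly than the x-coordinate, so most iterations are spent creeping along the flat direction. A fixed step size is the simplest choice; a line search or a momentum term are common alternatives.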

Other lectures in this course:

Lecture 01 - The Column Space of A Contains All Vectors Ax

Lecture 02 - Multiplying and Factoring Matrices

Lecture 03 - Orthogonal Columns in Q Give Q^TQ = I

Lecture 04 - Eigenvalues and Eigenvectors

Lecture 05 - Positive Definite and Semidefinite Matrices

Lecture 06 - Singular Value Decomposition (SVD)

Lecture 07 - Eckart-Young: The Closest Rank k Matrix to A

Lecture 08 - Norms of Vectors and Matrices

Lecture 09 - Four Ways to Solve Least Squares Problems

Lecture 10 - Survey of Difficulties with Ax = b

Lecture 11 - Minimizing ∥x∥ Subject to Ax = b