
18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning (Spring 2018, MIT OCW). Instructor: Prof. Gilbert Strang. Linear algebra concepts are key for understanding and creating machine learning algorithms, especially as applied to deep learning and neural networks. This course reviews linear algebra with applications to probability and statistics and optimization, and above all a full explanation of deep learning. (from ocw.mit.edu)

Lecture 22 - Gradient Descent: Downhill to a Minimum

Gradient descent is the most common optimization algorithm in deep learning and machine learning. It uses only the first derivative (the gradient) when updating the parameters: at each step it moves downhill along the negative gradient until it reaches a local minimum.
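
The short Python sketch below illustrates this stepwise process on a simple convex quadratic f(x) = ½ xᵀSx − bᵀx. The matrix S, the vector b, the starting point, and the step size are illustrative choices, not taken from the lecture.

import numpy as np

# Illustrative convex quadratic f(x) = 1/2 x^T S x - b^T x,
# whose gradient is S x - b. S is positive definite, so f has one minimum.
S = np.array([[2.0, 0.0],
              [0.0, 10.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return S @ x - b          # the first derivative is all the method uses

x = np.array([5.0, 5.0])      # starting point (arbitrary)
step = 0.15                   # fixed step size; must be below 2/lambda_max here

for k in range(100):
    x = x - step * grad(x)    # move downhill along the negative gradient

print("approximate minimizer:", x)
print("exact minimizer      :", np.linalg.solve(S, b))

Because the eigenvalues of S are spread apart (2 and 10), the iterates zig-zag down a narrow valley; this slow convergence is exactly the difficulty that the momentum method of Lecture 23 is meant to fix.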


Go to the Course Home or watch other lectures:

Lecture 01 - The Column Space of A Contains All Vectors Ax
Lecture 02 - Multiplying and Factoring Matrices
Lecture 03 - Orthonormal Columns in Q Give QᵀQ = I
Lecture 04 - Eigenvalues and Eigenvectors
Lecture 05 - Positive Definite and Semidefinite Matrices
Lecture 06 - Singular Value Decomposition (SVD)
Lecture 07 - Eckart-Young: The Closest Rank k Matrix to A
Lecture 08 - Norms of Vectors and Matrices
Lecture 09 - Four Ways to Solve Least Squares Problems
Lecture 10 - Survey of Difficulties with Ax = b
Lecture 11 - Minimizing ∥x∥ Subject to Ax = b
Lecture 12 - Computing Eigenvalues and Singular Values
Lecture 13 - Randomized Matrix Multiplication
Lecture 14 - Low Rank Changes in A and Its Inverse
Lecture 15 - Matrices A(t) Depending on t, Derivative = dA/dt
Lecture 16 - Derivatives of Inverse and Singular Values
Lecture 17 - Rapidly Decreasing Singular Values
Lecture 18 - Counting Parameters in SVD, LU, QR, Saddle Points
Lecture 19 - Saddle Points (cont.), Maxmin Principle
Lecture 20 - Definitions and Inequalities
Lecture 21 - Minimizing a Function Step by Step
Lecture 22 - Gradient Descent: Downhill to a Minimum
Lecture 23 - Accelerating Gradient Descent (Use Momentum)
Lecture 24 - Linear Programming and Two-Person Games
Lecture 25 - Stochastic Gradient Descent
Lecture 26 - Structure of Neural Nets for Deep Learning
Lecture 27 - Backpropagation: Find Partial Derivatives
Lecture 28
Lecture 29
Lecture 30 - Completing a Rank-One Matrix, Circulants
Lecture 31 - Eigenvectors of Circulant Matrices: Fourier Matrix
Lecture 32 - ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule
Lecture 33 - Neural Nets and the Learning Function
Lecture 34 - Distance Matrices, Procrustes Problem
Lecture 35 - Finding Clusters in Graphs
Lecture 36 - Alan Edelman and Julia Language