Machine Learning Training Loop Explained

Tutorial by Uniqtech Guide

In this tutorial, we start tackling two complex topics in deep learning: the training loop and autograd. These two topics open doors to a myriad of related topics: automatic differentiation, gradient descent, gradient tape, backpropagation, and more. All these concepts are related, yet there are plenty of differentials. No pun intended.

Learning goals | key takeaways from this page: It's important to understand what backpropagation, the update function, and gradient descent are, how they relate to each other, and how the different flavors of gradient descent (batch, mini-batch, and stochastic) differ.

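The gradient is a vector that points in the direction of steepest ascent, so gradient descent steps in the opposite direction, along the negative gradient. There are some differences between gradients and derivatives. The derivative of a single-variable function is a scalar (a real number), while the gradient of a multi-variable function is a vector that holds one partial derivative per input variable.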
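To make the scalar versus vector distinction concrete, here is a minimal sketch that estimates both numerically with central differences. The helper names (`derivative`, `gradient`) and the two example functions are assumptions made up for illustration; they are not from a library.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a single-variable function."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient(f, point, h=1e-6):
    """Central-difference estimate of the gradient of a multi-variable function."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # Partial derivative with respect to input i, all other inputs fixed.
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def square(x):
    return x ** 2  # one input, so the derivative is a single number


def bowl(p):
    x, y = p
    return x ** 2 + 3 * y ** 2  # two inputs, so the gradient has two entries


slope = derivative(square, 2.0)    # a scalar, close to 4.0
grad = gradient(bowl, [1.0, 2.0])  # a vector, close to [2.0, 12.0]
```

The derivative comes back as one float, while the gradient comes back as a list with one partial derivative per input.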

Training is an iterative process and may need to be repeated many times. Check the ML workflow chart to see where training fits in the overall workflow.

Gradient Descent. Stochastic Gradient Descent.

Define gradient descent in two sentences.
Gradient Descent Update Function, Update Equation.
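As a hedged illustration of the update equation, the rule is usually written as w_new = w - learning_rate * gradient. The sketch below applies it to a toy one-parameter loss, (w - 3)^2, chosen only for illustration; the function names and the learning rate of 0.1 are assumptions, not values from this page.

```python
def loss(w):
    return (w - 3.0) ** 2  # toy loss with its minimum at w = 3


def loss_gradient(w):
    return 2.0 * (w - 3.0)  # derivative of the toy loss with respect to w


def gradient_descent(w, learning_rate=0.1, steps=100):
    for _ in range(steps):
        # Update rule: step against the gradient, scaled by the learning rate.
        w = w - learning_rate * loss_gradient(w)
    return w


w_final = gradient_descent(w=0.0)  # moves from 0.0 toward the minimum at 3.0
```

Each step shrinks the distance to the minimum by a constant factor here because the loss is a simple quadratic; real losses rarely behave this neatly.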

Concepts

Introduction to machine learning training
Datasets:
The training dataset is used for training. The validation (cross validation) dataset is used for evaluating the model during development and for tuning its hyperparameters. The third dataset, aka the test dataset, is held out for the final evaluation of the selected model: testing the model's performance on unseen data (mimicking real world data) and its ability to generalize.
Train test split (a review): Train test split basics
More train test split
- Train test split helps detect overfitting
- Train test split with shuffle
- Train test split 02
- Data split: Shuffling and splitting can be done more than once. Once the dataset is split into two parts, such as training and validation, we can take the validation part and call train_test_split() on it again to split it further into validation and testing (holdout) datasets. This way we have three datasets. We can also implement custom indexing to sample data records.
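The split-twice idea can be sketched with custom indexing using only the Python standard library. This is a stand-in for calling scikit-learn's train_test_split() twice, not that function itself; `shuffle_split`, the 70/15/15 proportions, and the seeds are hypothetical choices for illustration.

```python
import random


def shuffle_split(records, second_fraction, seed):
    """Shuffle a copy of `records` and split it into two lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * second_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 data records

# First split: 70% for training, 30% held back.
train_set, held_back = shuffle_split(records, second_fraction=0.3, seed=0)

# Second split: divide the held-back part into validation and test halves.
validation_set, test_set = shuffle_split(held_back, second_fraction=0.5, seed=1)
```

Every record lands in exactly one of the three datasets, which is the property that keeps the test set truly unseen.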
Training loop:
Training in PyTorch requires us to write a more detailed, custom training loop. In scikit-learn and in TensorFlow (Keras), training can be a single call to the high-level API .fit(). Though each model has a different architecture under the hood, the high-level API abstracts training into .fit(). Here's a fully annotated note of the PyTorch training loop [important, high quality]
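For orientation, a minimal PyTorch training loop might look like the sketch below. The toy data (y = 2x + 1 plus noise), the single linear layer, the learning rate, and the epoch count are all assumptions for illustration, and the loop uses the full dataset at every step for brevity rather than mini-batches.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression data (assumed for illustration): y = 2x + 1 plus small noise.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1 + 0.05 * torch.randn_like(x)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
model.train()                    # put the model in training mode
for epoch in range(200):
    optimizer.zero_grad()        # clear gradients left over from the last step
    y_pred = model(x)            # forward pass
    loss = loss_fn(y_pred, y)    # compare predictions with targets
    loss.backward()              # backward pass: autograd fills in .grad
    optimizer.step()             # update the weight and bias
    losses.append(loss.item())

model.eval()                     # switch to evaluation mode
with torch.no_grad():            # no gradient tracking needed for evaluation
    final_loss = loss_fn(model(x), y).item()
```

These five steps inside the loop are roughly what a .fit() call hides in higher-level libraries.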
What is model.eval()? [pytorch]
What is a forward pass? What is a backward pass? [high quality, training loop, definition]
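For gradient descent to work well in the real world, the loss function should be smooth, continuous, and differentiable, because such functions are easy to compute and differentiate. Convex loss functions are the easiest case, since any local minimum is also the global minimum; neural network losses are generally not convex, so gradient descent may settle in a local minimum. If an activation function's slope is too steep, gradients can grow very large and updates can overshoot, so it is hard for gradient descent to perform well.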
Autograd, what is it? Pytorch Example. Auto differentiation. What is Autograd?
Back propagation (BP, backpropagation), what is it? What is Back Propagation? Explain BP in two sentences.
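The sigmoid activation function is easy to differentiate and has a known formula, sigmoid(x) = 1 / (1 + e^(-x)), with derivative sigmoid(x) * (1 - sigmoid(x)). Without the smooth sigmoid function, the output of a neuron (perceptron) with a step activation is a discrete 0 or 1, instead of spanning smoothly between 0 and 1.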
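A small sketch contrasting a step activation with the sigmoid, using only the standard library. The function names and the sample inputs are chosen here for illustration.

```python
import math


def step(x):
    return 1.0 if x >= 0 else 0.0  # discrete output: only 0 or 1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # smooth output strictly between 0 and 1


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # closed-form derivative, largest at x = 0


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
step_outputs = [step(x) for x in inputs]
sigmoid_outputs = [sigmoid(x) for x in inputs]
```

The step function has zero slope everywhere except at the jump, so it gives gradient descent nothing to follow, while the sigmoid has a usable slope at every input.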
Deep learning neural networks often use stochastic gradient descent and backpropagation to optimize the weights and biases (the model parameters).
Error calculation: a simple, basic error calculation is y - y_hat (also written y - y_pred), the true value minus the predicted value.
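As a small worked example with made-up numbers, the per-record error is computed below, followed by the mean squared error. The mean squared error is an addition not covered in the text above; it is shown as one common way to summarize the errors, because raw errors with opposite signs cancel each other out.

```python
y = [3.0, -0.5, 2.0, 7.0]      # true values (made up for illustration)
y_hat = [2.5, 0.0, 2.0, 8.0]   # predicted values (made up for illustration)

# Basic error per record: true value minus predicted value.
errors = [t - p for t, p in zip(y, y_hat)]

# Positive and negative errors partly cancel in a plain average.
mean_error = sum(errors) / len(errors)

# Squaring first makes every error count, regardless of sign.
mse = sum(e ** 2 for e in errors) / len(errors)
```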
Error analysis :
Optimization:
Extra: create and collect your own dataset. How to collect and generate data in real life.
Best practice: randomize (shuffle) the training data
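One way to combine shuffling with batching is sketched below: the records are reshuffled at the start of every epoch and then handed out in mini-batches. `iterate_minibatches` is a hypothetical helper written for this example, and the batch size of 4 is arbitrary.

```python
import random


def iterate_minibatches(records, batch_size, rng):
    """Yield shuffled mini-batches that cover every record exactly once."""
    indices = list(range(len(records)))
    rng.shuffle(indices)  # new random order on every call, i.e. every epoch
    for start in range(0, len(indices), batch_size):
        yield [records[i] for i in indices[start:start + batch_size]]


records = list(range(10))  # stand-in for 10 training records
rng = random.Random(0)

epoch_1 = list(iterate_minibatches(records, batch_size=4, rng=rng))
epoch_2 = list(iterate_minibatches(records, batch_size=4, rng=rng))
```

Setting batch_size to len(records) corresponds to batch gradient descent, setting it to 1 corresponds to stochastic gradient descent, and anything in between is mini-batch gradient descent.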
