Backpropagation and Gradient Descent for Training Neural Networks
CS 349-02
April 10, 2017

1 Overview

Given a neural network architecture and labeled training data, we want to find the weights that minimize the loss on the training data. The loss function varies depending on the output layer and labels. The total loss is the sum of two terms: the data loss and the regularization loss.

J = J_data

