neural networks

# Artificial Neural Networks: Matrix Form (Part 5)

To actually implement a multilayer perceptron learning algorithm, we do not want to hard code the update rules for each weight. Instead, we can formulate both feedforward propagation and backpropagation as a series of matrix multiplies, which leads to better usability. This is what leads to the impressive performance of neural nets - dumping matrix multiplies to a graphics card allows for massive parallelization and large amounts of data.

This tutorial will cover how to build a neural network that uses matrices. It builds heavily off of the theory in Part 4 of this series, so make sure you understand the math there!

# Artificial Neural Networks: Mathematics of Backpropagation (Part 4)

Up until now, we haven't utilized any of the expressive non-linear power of neural networks - all of our simple one layer models corresponded to a linear model such as multinomial logistic regression. These one-layer models had a simple derivative. We only had one set of weights the fed directly to our output, and it was easy to compute the derivative with respect to these weights. However, what happens when we want to use a deeper model? What happens when we start stacking layers?

# Artificial Neural Networks: Linear Multiclass Classification (Part 3)

In the last section, we went over how to use a linear neural network to perform classification. We covered using both the perceptron algorithm and gradient descent with a sigmoid activation function to learn the placement of the decision boundary in our feature space. However, we only covered binary classification. What if we instead want to classify a point belonging to one of $K$ classes?

# Artificial Neural Networks: Linear Classification (Part 2)

So far we've covered using neural networks to perform linear regression. What if we want to perform classification using a single-layer network?  In this post, I will cover two methods: the perceptron algorithm and using a sigmoid activation function to generate a likelihood. I will not cover the delta rule because it is a special case of the more general backpropagation algorithm, which will be covered in detail in Part 4.