A simple network to classify handwritten digits
OK, let’s get into the real problem: handwriting recognition.
First, we’d like a way of breaking an image containing many digits into a sequence of separate images, each containing a single digit. Second, we need to classify each of those individual digits.
We’ll focus on writing a program to solve the second problem.
We do this because it turns out that the segmentation problem is not so difficult to solve, once you have a good way of classifying individual digits.
The first layer is the input layer. For simplicity I’ve omitted most of the 784 input neurons in the diagram above.
The second layer is the hidden layer. We denote the number of neurons in this hidden layer by n.
The output layer contains 10 neurons, numbered 0 through 9; to read off the network’s answer, we figure out which output neuron has the highest activation value.
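The shape of this architecture can be sketched in code. This is a minimal, hypothetical sketch assuming sigmoid neurons and randomly initialized weights (no training yet); the choice of n = 15 hidden neurons is arbitrary.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n = 15  # number of hidden neurons (arbitrary choice)

# Weights and biases for the two weighted layers: 784 -> n -> 10.
W1, b1 = rng.normal(size=(n, 784)), rng.normal(size=n)
W2, b2 = rng.normal(size=(10, n)), rng.normal(size=10)

def feedforward(x):
    hidden = sigmoid(W1 @ x + b1)      # hidden-layer activations
    return sigmoid(W2 @ hidden + b2)   # 10 output activations

x = rng.random(784)             # a fake 28x28 grayscale image, flattened
output = feedforward(x)
digit = int(np.argmax(output))  # the neuron with the highest activation
```

With untrained random weights the “prediction” is meaningless, but the plumbing — 784 inputs in, 10 activations out, argmax to pick a digit — is exactly the structure described above.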
Why do we need three layers to recognize digits instead of two? What’s the role of the hidden layer and the output layer? Why do we need an output layer at all? And how do the weights come about?
To answer these questions, let’s think about what the neural network is doing from first principles. We have one input neuron for every pixel, and the output layer adds up all the evidence and decides whether a given digit is present or not.
That’s quite simple, but where does the evidence come from?
The hidden layer provides the evidence. Let’s concentrate on the first hidden neuron, which detects whether or not an image fragment like the following is present.
It can do this by heavily weighting input pixels which overlap with the fragment, and only lightly weighting the other inputs.
In the same way, the other hidden neurons each detect their own fragment; if they all fire, the output layer can combine their evidence and conclude that the image is a 0.
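As a toy illustration of that idea (a sketch, not a trained network): a hidden neuron can act as a template detector by putting heavy weight on the pixels of the fragment it looks for, zero weight elsewhere, and using a negative bias as a firing threshold. The fragment coordinates below are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fragment: a horizontal stroke near the top of a 28x28
# image, like the top arc of a handwritten 0.
template = np.zeros((28, 28))
template[4:8, 8:20] = 1.0

weights = template.flatten()   # heavy weight on overlapping pixels, 0 elsewhere
bias = -0.5 * weights.sum()    # fire only if at least ~half the stroke is lit

def detector(image):
    return sigmoid(weights @ image.flatten() + bias)

stroke_present = template.copy()   # an image that contains the fragment
blank = np.zeros((28, 28))

detector(stroke_present)  # close to 1: evidence found
detector(blank)           # close to 0: no evidence
```

An output neuron for “0” can then do the same trick one level up: weight the detectors for the fragments of a 0 heavily, and fire only when enough of them are active.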
Exercise
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below.
Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
Recall the setup: the input layer holds the pixels in grayscale, and through the hidden layer we get activations in the output layer; as said, the output neuron closest to 1 indicates the digit. Now we add a new output layer that converts the old output into a binary representation, following the map below.
For the first neuron in the new output layer, only digits 8 or 9 should activate it. This means that neurons 8 and 9 in the old output layer should have a greater (positive) weight influence on the first neuron in the new output layer.
The same principle applies to the other neurons.
But how do we determine the actual values of the biases and weights?
That doesn’t matter much: we don’t need to manually design one particular set of weights and biases. Instead, we should understand that a node in the old output layer that is close to 1 can be interpreted as positive evidence, and that evidence is mapped to the corresponding nodes in the new output layer.
For example, if neuron 8 is close to 1, we fire the first node in the new output layer.
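Putting this reasoning into numbers gives one valid answer to the exercise (one of many; any sufficiently large weight works). Give each bit neuron a weight of +10 from every old output neuron whose digit has that bit set, −10 from the others, and a bias of 0. With the correct activation at least 0.99 and incorrect ones below 0.01, each bit neuron’s weighted input lands far above or far below zero, so the sigmoid saturates near 1 or 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bit i of the new output layer (i = 0 is the most significant bit, matching
# the discussion above: the first neuron fires only for digits 8 and 9).
# Weight +10 from old neuron j if bit i of digit j is set, else -10; bias 0.
W = np.array([[10.0 if (j >> (3 - i)) & 1 else -10.0 for j in range(10)]
              for i in range(4)])

def to_bits(old_output):
    return sigmoid(W @ old_output)

for d in range(10):
    old = np.full(10, 0.005)   # incorrect outputs: below 0.01
    old[d] = 0.995             # correct output: above 0.99
    bits = (to_bits(old) > 0.5).astype(int)
    print(d, bits)
```

The correct digit contributes about ±9.95 to each bit neuron’s input, while the nine incorrect outputs contribute at most about ±0.45 combined, so the sign is always dominated by the correct digit’s bit pattern.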
In practice, the weights and biases are not designed by hand at all: they are learned by the network itself using an algorithm called gradient descent.
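The details of gradient descent come later; as a bare sketch of the idea, here it is minimizing a toy one-dimensional cost (the function and learning rate are made up for illustration):

```python
# Minimize C(w) = (w - 3)^2 by repeatedly stepping against the gradient.
def gradient(w):
    return 2.0 * (w - 3.0)  # dC/dw

w = 0.0      # arbitrary starting point
eta = 0.1    # learning rate (step size)
for _ in range(100):
    w -= eta * gradient(w)

# w has now converged very close to 3, the minimum of the cost.
```

Training a network is the same loop, except w is the whole collection of weights and biases, and the cost measures how badly the network classifies the training images.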