
Graphics and AI

Massey University Dr. Juncheng Liu

Under the instruction of Dr. Liu, I had an opportunity to start my very early academic exploration.


3/6 Lab: introductory work before formal academic preparation

1 - A simple network to classify handwritten digits

OK, let's get into the real problem: handwriting recognition.

First, we’d like a way of breaking an image containing many digits into a sequence of separate images, each containing a single digit.

image.png image.png

We'll focus on writing a program to solve the second problem: classifying individual digits.

We do this because it turns out that the segmentation problem is not so difficult to solve, once you have a good way of classifying individual digits.

image.png

The first layer is the input layer.

For simplicity I've omitted most of the 784 input neurons in the diagram above.

The second layer is the hidden layer. We denote the number of neurons in this hidden layer by n.

The output layer contains 10 neurons. We number the output neurons from 0 through 9, and figure out which neuron has the highest activation value.
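
To make these shapes concrete, here is a minimal sketch of such a network doing one forward pass. This is my own toy illustration: the hidden size n = 30 and the random, untrained weights are assumptions, and the sigmoid activation used here is only introduced properly later in these notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 30                                                  # hidden-layer size (my own choice)
W1, b1 = np.random.randn(n, 784), np.random.randn(n)    # input  -> hidden
W2, b2 = np.random.randn(10, n), np.random.randn(10)    # hidden -> output

x = np.random.rand(784)             # one image, flattened into 784 grayscale values
hidden = sigmoid(W1 @ x + b1)       # hidden-layer activations
output = sigmoid(W2 @ hidden + b2)  # 10 output activations, one per digit
print("predicted digit:", np.argmax(output))  # the neuron with the highest activation
```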

Why do we need three layers to recognize digits instead of two? What are the roles of the hidden layer and the output layer? And where do the weights come from?

To answer these questions, let's think about what the neural network is doing from first principles: we have one input neuron per pixel, and the output layer adds up all the evidence and decides true or false.

That's quite simple, but where does the evidence come from?

The hidden layer provides the evidence. Let's concentrate on the first hidden neuron, which detects whether or not an image fragment like the following is present:

image.png

It can do this by heavily weighting input pixels which overlap with the image, and only lightly weighting the other inputs.

In the same way, other hidden neurons fire when their own fragments are present. image.png

Combining their evidence, the output layer can conclude that the digit is a 0: image.png

Exercise

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below.

Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.

image.png

We have an input layer which holds the pixels in grayscale; passing through the hidden layer, we get the output from the output layer. As said, the output neuron closest to 1 indicates the digit. Now we add a new output layer which converts the old output into 0s and 1s. We have the mapping below:

image.png

For the first neuron (the highest-order bit), only digits 8 and 9 should activate it. This means that neurons 8 and 9 in the old output layer should have the greatest weight influence on the first neuron in the new output layer.

The same principle applies to other neurons.
But how do we determine the actual values of the biases and weights?

That doesn’t matter—we don’t need to manually design a set of weights or biases. Instead, we should understand that a node in the old output layer that is closer to 1 can be interpreted as having a positive influence. This influence is then mapped to the corresponding node in the new output layer.

For example, if neuron 8's activation is close to 1, we fire the first node in the new output layer.
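
To sanity-check this reasoning, here is one hand-built set of weights and biases that satisfies the exercise. This is a sketch of one possible choice, not the only one and not taken from the original text: weight 10 from old output neuron j to new bit-neuron i whenever bit i of digit j is 1, weight 0 otherwise, and bias −5 on every bit-neuron.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One possible (not unique) choice: connect old output neuron j to new
# bit-neuron i with weight 10 if bit i of digit j is 1, weight 0 otherwise;
# give every bit-neuron a bias of -5.
W = np.array([[10.0 if (j >> i) & 1 else 0.0 for j in range(10)] for i in range(4)])
b = np.full(4, -5.0)

# Simulate the worst case allowed by the exercise: the correct digit's old
# output is 0.99 and every wrong one is 0.01.
for digit in range(10):
    old_output = np.full(10, 0.01)
    old_output[digit] = 0.99
    bits = sigmoid(W @ old_output + b)
    print(digit, (bits > 0.5).astype(int))   # binary code, low-order bit first
```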

The weights and biases are actually optimized by the system itself using a methodology called gradient descent.

2 - Learning with gradient descent

Our goal in training a neural network is to find weights and biases which minimize the quadratic cost function C(w, b).

We already know how a neural network is structured: an input layer, an output layer, and hidden layers in between.

To let the network learn by itself, we need some data, like these digits:

image.png

Actually, these digits come from a well-known training dataset called the MNIST dataset, which contains tens of thousands of scanned images of handwritten digits along with their correct classifications.

The MNIST data is divided into two parts: Training and Test datasets. We primarily use the training dataset to improve the model’s accuracy and the test dataset to evaluate the model’s performance.

For simplicity, let's simplify the process. From linear algebra, we know that a collection of values defines a point in space. So we collect the input layer into a vector (which we can think of as a point), and then use a function to map it to the target value y.

image.png

In this case, y is the desired output. Since there are 10 possible digits, y is a 10-dimensional vector; for example, if the image shows a 6, the 7th entry of y is 1 and the rest are 0.

As for x, we simplify the entire input layer into a single notation: x. Let me explain further:

x is a 28×28 = 784-dimensional vector. We define each pixel as an input neuron, and each entry of the vector is the grayscale value of a single pixel in the image.

So, in this process, we map the 784-dimensional vector (input) into a 10-dimensional vector (output).

image.png

Note that T here is the transpose operation, turning a row vector into an ordinary (column) vector.

Actually, what we want is an algorithm that lets us find weights and biases so that the output from the network approximates y(x) for all training inputs x.

To achieve that, we introduce a cost function:

C(w, b) ≡ (1/2n) Σ_x ‖y(x) − a‖²

A quick explanation of the notation:

w denotes the collection of all weights in the network, b all the biases

n is the total number of training inputs

a is the vector of outputs from the network when x is input, and the sum is over all training inputs

We call C the quadratic cost function; it's also known as the mean squared error, or just MSE.

C(w, b) is non-negative, and we can say our training algorithm has done a good job if it finds weights and biases such that C(w, b) ≈ 0.
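
As a quick check of the formula, here is a minimal sketch that computes the quadratic cost on made-up outputs and targets (not MNIST data):

```python
import numpy as np

# C(w, b) = (1/2n) * sum_x ||y(x) - a||^2, where `targets` plays the role of
# y(x) and `outputs` plays the role of the network's actual output a.
def quadratic_cost(outputs, targets):
    n = len(targets)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, targets)) / (2 * n)

# Two fake training examples with 10-dimensional desired outputs.
targets = [np.eye(10)[3], np.eye(10)[7]]         # one-hot vectors for digits 3 and 7
good    = [t + 0.01 for t in targets]            # outputs close to the targets
bad     = [np.full(10, 0.1), np.full(10, 0.1)]   # outputs far from the targets
print(quadratic_cost(good, targets))   # close to 0
print(quadratic_cost(bad, targets))    # noticeably larger
```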

OK, but why introduce the quadratic cost at all? Why not directly maximize the number of images the network classifies correctly?

The problem with that is that the number of images correctly classified is not a smooth function of the weights and biases in the network.

image.png

We can barely extract information from that: it is difficult to figure out how to change the weights and biases to get improved performance. If we instead use a smooth cost function like the quadratic cost, it turns out to be easy to figure out how to make small changes in the weights and biases so as to get an improvement in the cost.

So suppose we're trying to minimize some function C(v). This could be any real-valued function of many variables, v = v1, v2, ….

To minimize C(v), it helps to imagine C as a function of just two variables, which we'll call v1 and v2: image.png

The function above is just an example; the main idea is to find where C reaches its minimum. However, doing this with calculus alone (solving for where the derivatives vanish) doesn't work in practice, because a real cost function has a huge number of variables, far more than two.

There is a beautiful analogy, a ball rolling down a valley, which suggests an algorithm that works pretty well. We can run this simulation simply by computing derivatives (and perhaps some second derivatives) of C; those derivatives tell us everything we need to know about the local "shape" of the valley. This has nothing to do with real physics: we are just trying to follow the slope downhill until the derivatives tell us to stop.

let’s think about what happens when we move the ball a small amount Δv1 in the v1 direction, and a small amount Δv2 in the v2 direction.

image.png

With a little calculus, let's relate a small change Δv = (Δv1, Δv2)ᵀ to the resulting change in C:

ΔC ≈ (∂C/∂v1) Δv1 + (∂C/∂v2) Δv2

Collecting the partial derivatives into the gradient vector ∇C ≡ (∂C/∂v1, ∂C/∂v2)ᵀ,

ΔC can be rewritten as

ΔC ≈ ∇C ⋅ Δv

What's really exciting about this equation is that it lets us see how to choose Δv so as to make ΔC negative; remember, we want to keep moving down the slope. In particular, suppose we choose

Δv = −η ∇C

where η is a small, positive parameter called the learning rate.

We then update the position repeatedly by

v → v′ = v − η ∇C

which, on the graph, looks like the ball "falling down" toward the minimum:

image.png

But we have one problem here: we need to choose the learning rate small enough that the linear approximation above stays good. ==If we don't, we might end up with ΔC > 0.== At the same time, we don't want η to be too small, since that would make the changes Δv tiny and the descent very slow.

image.png

With learning rate 0.03, it took 2495 steps, and the descent curve is relatively smooth.

image.png

With learning rate 3, the curve jumps around erratically, and reaching the minimum becomes a matter of luck.
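
Here is a small sketch of the idea, my own toy example rather than the experiment behind the figures above: gradient descent on C(v1, v2) = v1² + v2², run with a small learning rate and with an overly large one.

```python
import numpy as np

# Gradient descent on C(v1, v2) = v1^2 + v2^2, whose gradient is (2*v1, 2*v2).
def descend(eta, steps=50):
    v = np.array([3.0, 4.0])
    for _ in range(steps):
        grad = 2 * v          # gradient of C at the current position
        v = v - eta * grad    # the update rule v -> v' = v - eta * grad C
    return v, float(np.sum(v ** 2))

for eta in (0.03, 1.1):
    v, cost = descend(eta)
    print(f"eta={eta}: final v={v}, C(v)={cost:.4f}")
# A small eta converges smoothly toward the minimum at (0, 0);
# an eta that is too large overshoots on every step, so C grows instead.
```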

3 - Neural Networks

Introducing perceptrons and sigmoid neurons.

refer: Using neural nets to recognize handwritten digits

patience in the face of such frustration is the only way to truly understand and internalize a subject. FIND YOUR PROJECT

Overview

Recognize handwritten digits.

Let's start from a basic task: recognizing those digits. We don't usually appreciate how tough a problem our visual systems solve. If we try to write a computer program to recognize digits like those above by hand-crafting precise rules, we quickly get confused.

Neural networks

The idea is to take a large number of handwritten digits

 develop a system which can learn from those training examples

We first meet an artificial neuron called the perceptron; the main neuron model used later is the sigmoid neuron.

write a computer program implementing a neural network

We focus on handwriting recognition because it's an excellent prototype problem for learning about neural networks in general.

improve accuracy

Perceptrons

A perceptron is a device that makes decisions by weighing up evidence.

How do perceptrons work? In algebraic terms, a perceptron takes several binary inputs, x1, x2, …, and produces a single binary output:

image.png

Rosenblatt proposed weights, real numbers expressing the importance of the respective inputs to the output. The neuron's output, 0 or 1, is determined by whether the weighted sum Σ_j w_j x_j is less than or greater than some threshold value:

output = 0 if Σ_j w_j x_j ≤ threshold, and 1 if Σ_j w_j x_j > threshold.

A way you can think about the perceptron is that it’s a device that makes decisions by weighing up evidence.

By varying the weights and the threshold, we can get different models of decision-making.

image.png

The first layer of perceptrons is making three very simple decisions by weighing the input evidence.

 In this way a perceptron in the second layer can make a decision at a more complex and more abstract level than perceptrons in the first layer.

In this way, a many-layer network of perceptrons can engage in sophisticated decision making.

In fact, each perceptron still has a single output. The multiple output arrows are merely a useful way of indicating that the output from a perceptron is being used as the input to several other perceptrons.

Let's simplify the notation. The first change is to write Σ_j w_j x_j as a dot product,

w ⋅ x ≡ Σ_j w_j x_j

where w and x are vectors whose components are the weights and inputs, respectively.

The second change is to move the threshold to the other side of the inequality and replace it by what's known as the perceptron's bias, b ≡ −threshold. The perceptron rule then becomes:

output = 0 if w ⋅ x + b ≤ 0, and 1 if w ⋅ x + b > 0.

The bias can be seen as a measure of how easy it is to get the perceptron to output a 1, that is, how easy it is to get the perceptron to fire.

It also leads to further notational simplifications, so in the remainder of the book we won't use the threshold; we'll always use the bias.

Perceptrons can compute the elementary logical functions. Suppose we have a perceptron with two inputs, each with weight −2, and an overall bias of 3. image.png Then inputs 1 and 1 produce output 0 (since (−2)·1 + (−2)·1 + 3 = −1 is negative), while inputs 1 and 0 produce output 1, and similarly for the other input pairs. It's a NAND gate!

we can use networks of perceptrons to compute any logical function at all.
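
As a sanity check, here is a minimal sketch of the NAND perceptron just described (weights −2, −2 and bias 3):

```python
# A perceptron with two inputs, weights -2 and -2, and bias 3.
def perceptron(x1, x2, w1=-2, w2=-2, b=3):
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron(x1, x2))
# The output matches a NAND gate: only the input (1, 1) produces 0.
```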

Actually, in neural networks, a neuron has only one output, which is then broadcast to all its outgoing connections.

image.png

For encoding the input, we define an input layer. In this notation, input perceptrons have an output but no inputs; it's better to think of them as special units that are simply defined to output the desired values.

image.png

It turns out that we can devise learning algorithms which can automatically tune the weights and biases of a network of artificial neurons.

Sigmoid neurons

We want our neural networks to learn to solve problems by automatically tuning the weights and biases of the artificial neurons.

This tuning happens in response to external stimuli, without direct intervention by a programmer.

how can we devise such algorithms for a neural network?

Before designing such an algorithm ourselves, let's see how the network behaves when a small change happens.

image.png

We'd like a small change in a weight to cause only a small corresponding change in the output.

This property is what makes learning possible: we repeatedly make small changes that nudge the output closer to the desired value.

But the problem is that in a network of perceptrons, a small change can cause a perceptron's output to flip completely, say from 0 to 1. That flip may then cascade through the rest of the network in a complicated, unpredictable way.

That's where the sigmoid neuron comes in.

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron

Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output.

image.png

Instead of being just 0 or 1, the output is σ(w ⋅ x + b),

where σ is called the sigmoid function (sigmoid neurons are sometimes also called logistic neurons):

σ(z) ≡ 1 / (1 + e^(−z))

Written out explicitly in terms of the inputs, weights, and bias:

output = 1 / (1 + exp(−Σ_j w_j x_j − b))

We can treat the algebraic form as a technical detail rather than a barrier to understanding.

To see the similarity to the perceptron model, consider z ≡ w ⋅ x + b.

image.png

When z is large and positive, the output from the sigmoid neuron is approximately 1; when z is large and negative, the output is approximately 0.

In those regimes, the behaviour of a sigmoid neuron closely approximates a perceptron. Only when z is of modest size is there much deviation from the perceptron model.
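
A quick numerical look at this, my own sketch: σ(z) for a few values of z, showing the perceptron-like behaviour at large |z|.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z={z:6.1f}  sigmoid(z)={sigmoid(z):.6f}")
# For z = +/-10 the output is essentially 0 or 1, just like a perceptron;
# only around z = 0 (modest |z|) does the output differ noticeably.
```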

Indeed, it's the smoothness of the σ function that is the crucial fact, not its detailed form.

image.png

and we can compare it with the step function:

image.png

If σ were actually a step function, the sigmoid neuron would be a perceptron, since the output would be exactly 1 or 0 depending on the sign of w ⋅ x + b (ignoring the borderline case). So the sigmoid neuron is basically a smoothed-out version of the perceptron.

Indeed, it's the smoothness of the σ function that is the crucial fact.

The smoothness of σ means that small changes Δwj in the weights and Δb in the bias produce a small change Δoutput, well approximated by

Δoutput ≈ Σ_j (∂output/∂w_j) Δw_j + (∂output/∂b) Δb

Don't panic! Only the general shape of the function matters; in fact, later we'll talk about other activation functions,

where the output is f(w⋅x+b) for some other activation function f(⋅)

Δoutput is a linear function of the changes Δwj and Δb in the weights and bias

This linearity makes it easy to choose small changes in the weights and biases to achieve any desired small change in the output.

So the sigmoid neuron, while behaving much like a perceptron, makes it much easier to figure out how changing the weights and biases will change the output.

Anyway, one big difference between perceptrons and sigmoid neurons is that sigmoid neurons don’t just output 0 or 1.

Exercises

Sigmoid neurons simulating perceptrons, part I

Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, c>0. Show that the behaviour of the network doesn’t change.

20250308_003235.jpg

Sigmoid neurons simulating perceptrons, part II

Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won’t need the actual input value, we just need the input to have been fixed.

Suppose the weights and biases are such that w ⋅ x + b ≠ 0 for the input x to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant c > 0.

Show that in the limit as c→∞ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons.

How can this fail when w⋅x+b=0 for one of the perceptrons?

20250308_005335.jpg
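
Here is a small numerical illustration of part II, separate from the handwritten solution above: scaling the weights and biases by c amounts to evaluating σ(c·z), which approaches a 0/1 step as c grows, as long as z = w ⋅ x + b is not exactly 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.2                       # assume w.x + b = 0.2 for some perceptron
for c in (1, 10, 100, 1000):
    print(c, sigmoid(c * z))  # tends to 1, the perceptron's output for z > 0
print("z = 0:", sigmoid(1000 * 0.0))  # stuck at 0.5, which is why z = 0 breaks the argument
```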

4 - The architecture of neural networks

We will focus on feedforward networks, which contain input, hidden, and output layers, with no loops.

Such a network has three kinds of layers: the input layer, the output layer, and the hidden layers ("not an input or an output layer").

Again, here is an example network:

image.png

The leftmost layer contains the input neurons and the rightmost the output neurons. The middle layer is called a hidden layer, which really means nothing more than "not an input or an output".

The design of the input and output layers is often straightforward, while there is often an art to the design of the hidden layers.

There are design heuristics for the hidden layers, but we won't cover them here.

A network in which the output from one layer is used as input to the next layer is called a feedforward network: information is always fed forward, and no loops are allowed.

Recurrent neural networks do allow such loops; they're much closer in spirit to how our brains work than feedforward networks.

simpleclassifynet
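
A sketch of what such a simple classification net might look like in PyTorch. This is my own guess at the structure, with the 784-30-10 shape and sigmoid activation carried over from the earlier sections, not a definitive implementation:

```python
import torch
from torch import nn

# A feedforward (no loops) classifier: 784 inputs -> 30 hidden -> 10 outputs.
class SimpleClassifyNet(nn.Module):
    def __init__(self, hidden=30):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),              # 28x28 image -> 784-dimensional vector
            nn.Linear(784, hidden),
            nn.Sigmoid(),
            nn.Linear(hidden, 10),
        )

    def forward(self, x):
        return self.layers(x)

net = SimpleClassifyNet()
fake_image = torch.rand(1, 28, 28)     # one made-up grayscale image
print(net(fake_image).argmax(dim=1))   # index of the most activated output neuron
```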

6 - Pytorch

  • Pytorch basics tutorials:

Learn the Basics — PyTorch Tutorials 2.6.0+cu124 documentation pytorch.org Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models.

  • Training an image classifier using CNN

Deep Learning with PyTorch: A 60 Minute Blitz — PyTorch Tutorials 2.6.0+cu124 documentation pytorch.org

6.1 - Quick Start

runs through the API for common tasks in machine learning

PyTorch is an open-source deep learning framework that’s known for its flexibility and ease-of-use.

Python Grammar

Think of Python as a playground with some basic infrastructure. Functions are defined with def; a Python function can return multiple values, and no type annotations are required.

In Python, a list is mutable and can be reassigned, while a tuple is fixed and unchangeable.

import math

def move(x, y, step, angle=0):
    nx = x + step * math.cos(angle)
    ny = y - step * math.sin(angle)
    return nx, ny


>>> x, y = move(100, 100, 60, math.pi / 6)
>>> print(x, y)
151.96152422706632 70.0

A big convenience is that Python supports default parameters: when a call would otherwise need many arguments, we can leave the ones with sensible defaults alone and pass only the values that differ.

WARN: Python's default parameter values are evaluated outside the call, when the function is defined. Keep in mind that Python passes arguments by reference, so a default parameter must point to an immutable object!
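
A small illustration of that warning, my own example: default values are evaluated once, so a mutable default such as a list is shared across calls.

```python
# Default arguments are evaluated once, when the function is defined,
# so a mutable default such as [] is shared between calls.
def add_end_bad(L=[]):
    L.append('END')
    return L

print(add_end_bad())  # ['END']
print(add_end_bad())  # ['END', 'END']  <- the same list object again

# The usual fix: use an immutable sentinel such as None.
def add_end(L=None):
    if L is None:
        L = []
    L.append('END')
    return L

print(add_end())  # ['END']
print(add_end())  # ['END']
```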

Written in Python, it’s relatively easy for most machine learning developers to learn and use.

For data, we have two primitives in torch:

torch.utils.data.DataLoader torch.utils.data.Dataset

A Dataset contains a collection of samples and their corresponding labels.

Note: in Python, a map is called a dict.

DataLoader wraps a Dataset and makes it iterable.

We mainly use the FashionMNIST dataset from the TorchVision package; PyTorch offers domain-specific libraries like this for different kinds of data.

This Dataset takes two arguments, transform and target_transform, to modify the samples and labels respectively.
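
A minimal sketch of the two primitives on FashionMNIST, following the PyTorch Quickstart; the batch size of 64 and the download directory "data" are my own choices:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Dataset: downloads FashionMNIST and applies `transform` to each sample
# (ToTensor converts the PIL image to a float tensor); labels are left as-is.
training_data = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=ToTensor()
)

# DataLoader: wraps the Dataset in an iterable that yields mini-batches.
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)

images, labels = next(iter(train_dataloader))
print(images.shape, labels.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
```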

DataSet & DataLoaders
Tensors

PyTorch uses a data type called a tensor, similar to a multidimensional array, to store and compute the inputs and outputs of a model. An important feature is that

tensors can run on GPUs to accelerate computing.
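
A small sketch of that: create a tensor, do some arithmetic, and move the result to the GPU when one is available.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # a 2x2 tensor, much like a numpy array
y = x @ x.T + 1                               # matrix multiply plus broadcasting

# Move the computation to the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
y = y.to(device)
print(y.device, y)
```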

Graphs