csatblogspotdotcom

Thursday, January 12, 2017

(Repost) AI : Neural Network for beginners

Reposted from:
https://www.codeproject.com/Articles/16419/AI-Neural-Network-for-beginners-Part-of
https://www.codeproject.com/Articles/16508/AI-Neural-Network-for-beginners-Part-of
https://www.codeproject.com/Articles/16732/AI-Neural-Network-for-Beginners-Part-of
The following expression in the article is wrong:


The correct way to write it is with the bias outside the sigma, or with the bias replaced by a w0x0 term (with i starting from 0).
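Written out (my own rendering of the correction, using the symbols from the article's Figure 5, with f as the activation function):

y = f\left(\sum_{i=1}^{m} w_i x_i + b\right) \quad\text{or equivalently}\quad y = f\left(\sum_{i=0}^{m} w_i x_i\right),\ \text{with } x_0 = 1 \text{ and } w_0 = b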

Of the three parts, Part 1 and the first half of Part 2 explain the basic principles of biological neurons and of artificial neurons (called perceptrons in the text) very well. The later material, starting from the Learning Algorithm section, simply states the results without explaining the derivation, and I did not go on to read the code that follows (written in C#).

The full text follows:

AI : Neural Network for beginners (Part 1 of 3)


16 May 2007 CPOL
AI : An introduction into Neural Networks

Introduction

This article is Part 1 of a series of 3 articles that I am going to post. The proposed article content will be as follows:
  1. Part 1: This one, will be an introduction into Perceptron networks (single layer neural networks)
  2. Part 2: Will be about multi-layer neural networks, and the back propagation training method to solve a non-linear classification problem such as the logic of an XOR logic gate. This is something that a Perceptron can't do. This is explained further within this article.
  3. Part 3: Will be about how to use a genetic algorithm (GA) to train a multi layer neural network to solve some logic problem

Let's start with some biology

Nerve cells in the brain are called neurons. There are an estimated 10^10 to 10^13 neurons in the human brain. Each neuron can make contact with several thousand other neurons. Neurons are the units which the brain uses to process information.

So what does a neuron look like

A neuron consists of a cell body, with various extensions from it. Most of these are branches called dendrites. There is one much longer process (possibly also branching) called the axon. The dashed line shows the axon hillock, where transmission of signals starts
The following diagram illustrates this.
Figure 1 Neuron
The boundary of the neuron is known as the cell membrane. There is a voltage difference (the membrane potential) between the inside and outside of the membrane.
If the input is large enough, an action potential is then generated. The action potential (neuronal spike) then travels down the axon, away from the cell body.
Figure 2 Neuron Spiking

Synapses

The connections between one neuron and another are called synapses. Information always leaves a neuron via its axon (see Figure 1 above), and is then transmitted across a synapse to the receiving neuron.

Neuron Firing

Neurons only fire when input is bigger than some threshold. It should, however, be noted that firing doesn't get bigger as the stimulus increases; it's an all-or-nothing arrangement.
Figure 3 Neuron Firing
Spikes (signals) are important, since other neurons receive them. Neurons communicate with spikes. The information sent is coded by spikes.

The input to a Neuron

Synapses can be excitatory or inhibitory.
Spikes (signals) arriving at an excitatory synapse tend to cause the receiving neuron to fire. Spikes (signals) arriving at an inhibitory synapse tend to inhibit the receiving neuron from firing.
The cell body and synapses essentially compute (by a complicated chemical/electrical process) the difference between the incoming excitatory and inhibitory inputs (spatial and temporal summation).
When this difference is large enough (compared to the neuron's threshold) then the neuron will fire.
Roughly speaking, the faster excitatory spikes arrive at its synapses the faster it will fire (similarly for inhibitory spikes).

So how about artificial neural networks

Suppose that we have a firing rate at each neuron. Also suppose that a neuron connects with m other neurons and so receives m inputs "x1 … xm"; we could imagine this configuration looking something like:
Figure 4 Artificial Neuron configuration
This configuration is actually called a Perceptron. The perceptron (an invention of Rosenblatt [1962]) was one of the earliest neural network models. A perceptron models a neuron by taking a weighted sum of its inputs and sending the output 1 if the sum is greater than some adjustable threshold value (otherwise it sends 0 - this is the all-or-nothing spiking described in the biology; see the neuron firing section above). This thresholding step is also called an activation function.
The inputs (x1,x2,x3..xm) and connection weights (w1,w2,w3..wm) in Figure 4 are typically real values, both positive (+) and negative (-). If the feature of some xi tends to cause the perceptron to fire, the weight wi will be positive; if the feature xi inhibits the perceptron, the weight wi will be negative.
The perceptron itself consists of the weights, the summation processor, an activation function, and an adjustable threshold processor (called the bias hereafter).
For convenience, the normal practice is to treat the bias as just another input. The following diagram illustrates the revised configuration.
Figure 5 Artificial Neuron configuration, with bias as additional input
The bias can be thought of as the propensity (a tendency towards a particular way of behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration network shown in Figure 5 fires if the weighted sum > 0, or, if you're into math-type explanations, if the sum over i from 0 to m of wi·xi is greater than 0, where x0 = 1 and w0 is the bias (this is the corrected form of the expression noted at the top of this post).

Activation Function

The activation is usually computed using one of the following functions.

Sigmoid Function

The stronger the input, the faster the neuron fires (the higher the firing rate). The sigmoid is also very useful in multi-layer networks, as the sigmoid curve allows for differentiation (which is required in Back Propagation training of multi-layer networks).
or if you're into maths-type explanations
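The standard logistic sigmoid being referred to is presumably:

\sigma(x) = \frac{1}{1 + e^{-x}}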

Step Function

A basic on/off type function: if x < 0 then the output is 0, else (x >= 0) the output is 1.

or if you're into math-type explanations
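f(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}

And, as a quick illustration in code, here is a minimal C# sketch of both activation functions (my own illustrative code, not part of the article's download):

using System;

public static class ActivationFunctions
{
    // Sigmoid: a smooth curve between 0 and 1; the stronger the input,
    // the closer the output gets to 1
    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // Step: the basic all-or-nothing on/off function described above
    public static double Step(double x)
    {
        return x >= 0.0 ? 1.0 : 0.0;
    }
}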

Learning

A foreword on learning

Before we carry on to talk about perceptron learning lets consider a real world example :
How do you teach a child to recognize a chair? You show him examples, telling him, "This is a chair. That is not a chair," until the child learns the concept of what a chair is. In this stage, the child can look at the examples we have shown him and answer correctly when asked, "Is this object a chair?"
Furthermore, if we show to the child new objects that he hasn't seen before, we could expect him to recognize correctly whether the new object is a chair or not, providing that we've given him enough positive and negative examples.
This is exactly the idea behind the perceptron.

Learning in perceptrons

This is the process of modifying the weights and the bias. A perceptron computes a binary function of its input. Whatever a perceptron can compute, it can learn to compute.
"The perceptron is a program that learn concepts, i.e. it can learn to respond with True (1) or False (0) for inputs we present to it, by repeatedly "studying" examples presented to it.
The Perceptron is a single layer neural network whose weights and biases could be trained to produce a correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and work with randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification."
Professor Jianfeng Feng, Centre for Scientific Computing, Warwick University, England.

The Learning Rule

The perceptron is trained to respond to each input vector with a corresponding target output of either 0 or 1. The learning rule has been proven to converge on a solution in finite time if a solution exists.
The learning rule can be summarized in the following two equations:
b = b + [ T - A ]
For all inputs i:
W(i) = W(i) + [ T - A ] * P(i)

Where W is the vector of weights, P is the input vector presented to the network, T is the correct result that the neuron should have shown, A is the actual output of the neuron, and b is the bias.
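To make the rule concrete, here is a small hedged C# sketch of a perceptron trained with exactly these two equations (an illustrative class of my own, not code from the article's download; it assumes a step activation):

using System;

public class SimplePerceptron
{
    private readonly double[] weights;   // W
    private double bias;                 // b

    public SimplePerceptron(int numberOfInputs)
    {
        weights = new double[numberOfInputs];
        Random rand = new Random();
        for (int i = 0; i < weights.Length; i++)
        {
            weights[i] = rand.NextDouble() - 0.5;  // small random start
        }
        bias = rand.NextDouble() - 0.5;
    }

    // Step activation: fire (1) if the weighted sum plus bias is > 0
    public double Compute(double[] p)
    {
        double sum = bias;
        for (int i = 0; i < weights.Length; i++)
        {
            sum += weights[i] * p[i];
        }
        return sum > 0.0 ? 1.0 : 0.0;
    }

    // One application of the learning rule for a single training vector:
    //   b    = b + (T - A)
    //   W(i) = W(i) + (T - A) * P(i)
    public void Train(double[] p, double target)
    {
        double actual = Compute(p);       // A
        double error = target - actual;   // T - A
        bias += error;
        for (int i = 0; i < weights.Length; i++)
        {
            weights[i] += error * p[i];
        }
    }
}

Training (see the next section) is then just a matter of presenting each training vector to Train(..) over and over, epoch after epoch, until a whole epoch goes by with no weight change.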

Training

Vectors from a training set are presented to the network one after another.
If the network's output is correct, no change is made.
Otherwise, the weights and biases are updated using the perceptron learning rule (as shown above). When an entire pass through all of the input training vectors (called an epoch) has occurred without error, training is complete.
At this time any input training vector may be presented to the network and it will respond with the correct output vector. If a vector, P, not in the training set is presented to the network, the network will tend to exhibit generalization by responding with an output similar to target vectors for input vectors close to the previously unseen input vector P.

So what can we do with neural networks

Well, if we are going to stick to using a single layer neural network, the tasks that can be achieved are different from those that can be achieved by multi-layer neural networks. As this article is mainly geared towards dealing with single layer networks, let's discuss those further:

Single layer neural networks

Single-layer neural networks (perceptron networks) are networks in which the output unit is independent of the others - each weight affects only one output. Using perceptron networks it is possible to achieve linearly separable functions like the diagrams shown below (assuming we have a network with 2 inputs and 1 output).
It can be seen that this is equivalent to the AND / OR logic gates, shown below.
Figure 6 Classification tasks
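For instance (illustrative numbers of my own, not from the article): with two inputs, weights w1 = w2 = 1 and a bias of -1.5, the perceptron only fires when both inputs are 1, i.e. it computes AND; change the bias to -0.5 and it fires when either input is 1, i.e. OR.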
So that's a simple example of what we could do with one perceptron (single neuron essentially), but what if we were to chain several perceptrons together? We could build some quite complex functionality. Basically we would be constructing the equivalent of an electronic circuit.
Perceptron networks do however, have limitations. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. The most famous example of the perceptron's inability to solve problems with linearly nonseparable vectors is the boolean XOR problem.

Multi layer neural networks

With multi-layer neural networks we can solve non-linearly separable problems such as the XOR problem mentioned above, which is not achievable using single layer (perceptron) networks. The next part of this article series will show how to do this using multi-layer neural networks, using the back propagation training method.
Well that's about it for this article. I hope it's a nice introduction to neural networks. I will try and publish the other two articles when I have some spare time (in between MSc dissertation and other assignments). I want them to be pretty graphical so it may take me a while, but I'll get there soon, I promise.

What Do You Think ?

That's it, I would just like to ask, if you liked the article please vote for it.

Points of Interest

I think AI is fairly interesting, that's why I am taking the time to publish these articles. So I hope someone else finds it interesting, and that it might help further someone's knowledge, as it has my own.

History

v1.0 17/11/06

Bibliography

Artificial Intelligence 2nd edition, Elaine Rich / Kevin Knight. McGraw Hill Inc.
Artificial Intelligence, A Modern Approach, Stuart Russell / Peter Norvig. Prentice Hall.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Sacha Barber
Software Developer (Senior)
United Kingdom United Kingdom
I currently hold the following qualifications (amongst others, I also studied Music Technology and Electronics, for my sins)

- MSc (Passed with distinctions), in Information Technology for E-Commerce
- BSc Hons (1st class) in Computer Science & Artificial Intelligence

Both of these at Sussex University UK.

Award(s)
I am lucky enough to have won a few awards for Zany Crazy code articles over the years
  • Microsoft C# MVP 2016
  • Codeproject MVP 2016
  • Microsoft C# MVP 2015
  • Codeproject MVP 2015
  • Microsoft C# MVP 2014
  • Codeproject MVP 2014
  • Microsoft C# MVP 2013
  • Codeproject MVP 2013
  • Microsoft C# MVP 2012
  • Codeproject MVP 2012
  • Microsoft C# MVP 2011
  • Codeproject MVP 2011
  • Microsoft C# MVP 2010
  • Codeproject MVP 2010
  • Microsoft C# MVP 2009
  • Codeproject MVP 2009
  • Microsoft C# MVP 2008
  • Codeproject MVP 2008
  • And numerous codeproject awards which you can see over at my blog




AI : Neural Network for beginners (Part 2 of 3)


29 Jan 2007 CPOL
AI : An Introduction into Neural Networks (Multi-layer networks / Back Propagation)

Introduction

This article is part 2 of a series of 3 articles that I am going to post. The proposed article content will be as follows:
  1. Part 1 : Is an introduction into Perceptron networks (single layer neural networks).
  2. Part 2 : This one, is about multi layer neural networks, and the back propagation training method to solve a non linear classification problem such as the logic of an XOR logic gate. This is something that a Perceptron can't do. This is explained further within this article.
  3. Part 3 : Will be about how to use a genetic algorithm (GA) to train a multi layer neural network to solve some logic problem.

Summary

This article will show how to use a multi-layer neural network to solve the XOR logic problem.

A Brief Recap (From part 1 of 3)

Before we commence with the nitty gritty of this new article, which deals with multi-layer Neural Networks, let's just revisit a few key concepts. If you haven't read Part 1, perhaps you should start there.

Perceptron Configuration (Single Layer Network)

The inputs (x1,x2,x3..xm) and connection weights(w1,w2,w3..wm) shown below are typically real values, both positive (+) and negative (-).
The perceptron itself consists of the weights, the summation processor, an activation function, and an adjustable threshold processor (called the bias hereafter).
For convenience, the normal practice is to treat the bias as just another input. The following diagram illustrates the revised configuration.
The bias can be thought of as the propensity (a tendency towards a particular way of behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration network shown above fires if the weighted sum > 0, or, if you're into maths-type explanations, if the sum of wi·xi (with i from 0, where x0 = 1 is the bias input) is greater than 0.
So that's the basic operation of a perceptron. But we now want to build more layers of these, so let's carry on to the new stuff.

So Now The New Stuff (More layers)

From this point on, anything that is being discussed relates directly to this article's code.
In the summary at the top, the problem we are trying to solve was how to use a multi-layer neural network to solve the XOR logic problem. So how is this done? Well, it's really an incremental build on what Part 1 already discussed. So let's march on.
What does the XOR logic problem look like? Well, it looks like the following truth table:
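X1   X2   |   Output
---------------------
0    0    |   0
0    1    |   1
1    0    |   1
1    1    |   0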
Remember with a single layer (perceptron) we can't actually achieve the XOR functionality, as it is not linearly separable. But with a multi-layer network, this is achievable.

What Does The New Network Look Like

The new network that will solve the XOR problem will look similar to a single layer network. We are still dealing with inputs / weights / outputs. What is new is the addition of the hidden layer.
As already explained above, there is one input layer, one hidden layer and one output layer.
It is by using the inputs and weights that we are able to work out the activation for a given node. This is easily achieved for the hidden layer as it has direct links to the actual input layer.
The output layer, however, knows nothing about the input layer as it is not directly connected to it. So to work out the activation for an output node we need to make use of the output from the hidden layer nodes, which are used as inputs to the output layer nodes.
This entire process described above can be thought of as a pass forward from one layer to the next.
This still works like it did with a single layer network; the activation for any given node is still worked out as the weighted sum of its inputs, a = Σ wi·Ii (where wi is the weight(i), and Ii is the input(i) value), which is then squashed through the activation (sigmoid) function.
You see, it's the same old stuff, no demons, smoke or magic here. It's stuff we've already covered.
So that's how the network looks/works. So now I guess you want to know how to go about training it.
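Before moving on to training, here is a hedged C# sketch of that pass forward (my own simplified code, not the article's NeuralNetwork class):

using System;

public static class FeedForwardDemo
{
    private static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // inputToHidden is [numInputs + 1, numHidden]   (last row holds the hidden biases)
    // hiddenToOutput is [numHidden + 1, numOutputs] (last row holds the output biases)
    public static double[] Forward(double[] inputs,
                                   double[,] inputToHidden,
                                   double[,] hiddenToOutput)
    {
        int numHidden = inputToHidden.GetLength(1);
        int numOutputs = hiddenToOutput.GetLength(1);

        // Hidden layer: each node sums the weighted inputs (plus its bias)
        // and squashes the result through the sigmoid
        double[] hidden = new double[numHidden];
        for (int h = 0; h < numHidden; h++)
        {
            double sum = inputToHidden[inputs.Length, h];   // bias row
            for (int i = 0; i < inputs.Length; i++)
            {
                sum += inputToHidden[i, h] * inputs[i];
            }
            hidden[h] = Sigmoid(sum);
        }

        // Output layer: the hidden activations act as its inputs
        double[] outputs = new double[numOutputs];
        for (int o = 0; o < numOutputs; o++)
        {
            double sum = hiddenToOutput[numHidden, o];      // bias row
            for (int h = 0; h < numHidden; h++)
            {
                sum += hiddenToOutput[h, o] * hidden[h];
            }
            outputs[o] = Sigmoid(sum);
        }
        return outputs;
    }
}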

Types Of Learning

There are essentially 2 types of learning that may be applied to a Neural Network, which are "Reinforcement" and "Supervised".

Reinforcement

In Reinforcement learning, during training, a set of inputs is presented to the Neural Network; say the output is 0.75 when the target was expecting 1.0.
The error (1.0 - 0.75) is used for training ('wrong by 0.25').
What if there are 2 outputs? Then the total error is summed to give a single number (typically the sum of squared errors), e.g. "your total error on all outputs is 1.76".
Note that this just tells you how wrong you were, not in which direction you were wrong.
Using this method we may never get a result, or it could be a case of 'Hunt the needle'.
NOTE : Part 3 of this series will be using a GA to train a Neural Network, which is Reinforcement learning. The GA simply does what a GA does, and all the normal GA phases to select weights for the Neural Network. There is no back propagation of values. The Neural Network is just good or just bad. As one can imagine, this process takes a lot more steps to get to the same result.

Supervised

In Supervised Learning the Neural Network is given more information.
Not just 'how wrong' it was, but 'in what direction it was wrong' like 'Hunt the needle' but where you are told 'North a bit', 'West a bit'.
So you get, and use, far more information in Supervised Learning, and this is the normal form of Neural Network learning algorithm. Back Propagation (which is what this article uses) is Supervised Learning.

Learning Algorithm

In brief, to train a multi-layer Neural Network, the following steps are carried out:
  • Start off with random weights (and biases) in the Neural Network
  • Try one or more members of the training set, see how badly the output(s) are compared to what they should be (compared to the target output(s))
  • Jiggle weights a bit, aimed at getting improvement on outputs
  • Now try with a new lot of the training set, or repeat again,
    jiggling weights each time
  • Keep repeating until you get quite accurate outputs
This is what this article submission uses to solve the XOR problem. This is also called "Back Propagation" (normally called BP or BackProp)
Backprop allows you to use this error at output, to adjust the weights arriving at the output layer, but then also allows you to calculate the effective error 1 layer back, and use this to adjust the weights arriving there, and so on, back-propagating errors through any number of layers.
The trick is the use of a sigmoid as the non-linear transfer function (which was covered in Part 1). The sigmoid is used as it offers the ability to apply differentiation techniques.
Because this is nicely differentiable, it so happens that the derivative of the sigmoid can be written in terms of its own output: dσ/dx = σ(x) * (1 - σ(x)).
Which in the context of the article can be written as
delta_outputs[i] = outputs[i] * (1.0 - outputs[i]) * (targets[i] - outputs[i])
It is by using this calculation that the weight changes can be applied back through the network.

Things To Watch Out For

Valleys: Using the rolled ball metaphor, there may well be valleys like this, with steep sides and a gently sloping floor. Gradient descent tends to waste time swooshing up and down each side of the valley (think ball!)
So what can we do about this? Well, we add a momentum term that tends to cancel out the back-and-forth movements and emphasizes any consistent direction; this will then go down such valleys with gently sloping floors much more successfully (faster).
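As a hedged illustration of that idea (a sketch of my own with illustrative names, not the article's code; the article's actual train_network method is listed below), a momentum term is typically folded into a weight update like this:

// previousChange remembers the change applied to this weight on the last pass.
// momentum (e.g. 0.9) scales how much of that old change is carried forward.
static double UpdateWeight(ref double weight, ref double previousChange,
                           double learningRate, double momentum,
                           double delta, double activation)
{
    double change = learningRate * delta * activation   // gradient-style term
                  + momentum * previousChange;           // momentum term
    weight += change;
    previousChange = change;                             // remembered for the next pass
    return change;
}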

Starting The Training

This is probably best demonstrated with a code snippet from the article's actual code:
/// <summary>
/// The main training. The expected target values are passed in to this
/// method as parameters, and the <see cref="NeuralNetwork">NeuralNetwork</see>
/// is then updated with small weight changes, for this training iteration
/// This method also applies momentum, to ensure that the NeuralNetwork is
/// nurtured into proceeding in the correct direction. We are trying to avoid valleys.
/// If you don't know what valleys means, read the article's associated text
/// </summary>
/// <param name="target">A double[] array containing the target value(s)</param>
private void train_network(double[] target)
{
    //get momentum values (delta values from last pass)
    double[] delta_hidden = new double[nn.NumberOfHidden + 1];
    double[] delta_outputs = new double[nn.NumberOfOutputs];

    // Get the delta value for the output layer
    for (int i = 0; i < nn.NumberOfOutputs; i++)
    {
        delta_outputs[i] =
        nn.Outputs[i] * (1.0 - nn.Outputs[i]) * (target[i] - nn.Outputs[i]);
    }
    // Get the delta value for the hidden layer
    for (int i = 0; i < nn.NumberOfHidden + 1; i++)
    {
        double error = 0.0;
        for (int j = 0; j < nn.NumberOfOutputs; j++)
        {
            error += nn.HiddenToOutputWeights[i, j] * delta_outputs[j];
        }
        delta_hidden[i] = nn.Hidden[i] * (1.0 - nn.Hidden[i]) * error;
    }
    // Now update the weights between hidden & output layer
    for (int i = 0; i < nn.NumberOfOutputs; i++)
    {
        for (int j = 0; j < nn.NumberOfHidden + 1; j++)
        {
            //use momentum (delta values from last pass),
            //to ensure moved in correct direction
            nn.HiddenToOutputWeights[j, i] += nn.LearningRate * delta_outputs[i] * nn.Hidden[j];
        }
    }
    // Now update the weights between input & hidden layer
    for (int i = 0; i < nn.NumberOfHidden; i++)
    {
        for (int j = 0; j < nn.NumberOfInputs + 1; j++)
        {
            //use momentum (delta values from last pass),
            //to ensure moved in correct direction
            nn.InputToHiddenWeights[j, i] += nn.LearningRate * delta_hidden[i] * nn.Inputs[j];
        }
    }
}

So Finally The Code

Well, the code for this article looks like the following class diagram (It's Visual Studio 2005 C#, .NET v2.0)
The main classes that people should take the time to look at would be :
  • NN_Trainer_XOR : Trains a Neural Network to solve the XOR problem
  • TrainerEventArgs : Training event args, for use with a GUI
  • NeuralNetwork : A configurable Neural Network
  • NeuralNetworkEventArgs : Training event args, for use with a GUI
  • SigmoidActivationFunction : A static method to provide the sigmoid activation function
The rest are a GUI I constructed simply to show how it all fits together.
NOTE : the demo project contains all code, so I won't list it here.

Code Demos

The DEMO application attached has 3 main areas which are described below:

LIVE RESULTS Tab

It can be seen that this has very nearly solved the XOR problem (You will probably never get it 100% accurate)

TRAINING RESULTS Tab

Viewing the training phase target/outputs together
Viewing the training phase errors

TRAINED RESULTS Tab

Viewing the trained target/outputs together
Viewing the trained errors
It is also possible to view the Neural Network's final configuration using the "View Neural Network Config" button. If people are interested in what weights the Neural Network ended up with, this is the place to look.

What Do You Think ?

That's it. I would just like to ask, if you liked the article, please vote for it.

Points of Interest

I think AI is fairly interesting, that's why I am taking the time to publish these articles. So I hope someone else finds it interesting, and that it might help further someone's knowledge, as it has my own.
Anyone that wants to look further into AI type stuff, that finds the content of this article a bit basic, should check out Andrew Krillov's articles, at Andrew Krillov CP articles, as his are more advanced, and very good. In fact, anything Andrew seems to do is very good.

History

  • v1.0 24/11/06

Bibliography

  • Artificial Intelligence 2nd edition, Elaine Rich / Kevin Knight. McGraw Hill Inc.
  • Artificial Intelligence, A Modern Approach, Stuart Russell / Peter Norvig. Prentice Hall.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)






AI: Neural Network for Beginners (Part 3 of 3)


30 Jan 2007 CPOL
AI: An introduction into neural networks (multi-layer networks / trained by Microbial GA).

Introduction

This article is part 3 of a series of three articles that I am going to post. The proposed article content will be as follows:
  1. Part 1: This one will be an introduction into Perceptron networks (single layer neural networks)
  2. Part 2: Will be about multi-layer neural networks, and the back propagation training method to solve a non-linear classification problem such as the logic of an XOR logic gate. This is something that a Perceptron can't do. This is explained further within this article.
  3. Part 3: This one is about how to use a genetic algorithm (GA) to train a multi-layer neural network to solve some logic problem. If you have never come across genetic algorithms, perhaps my other article located here may be a good place to start to learn the basics.

Summary

This article will show how to use a Microbial Genetic Algorithm to train a multi-layer neural network to solve the XOR logic problem.

A Brief Recap (From Parts 1 and 2)

Before we commence with the nitty gritty of this new article, which deals with multi-layer neural networks, let's just revisit a few key concepts. If you haven't read Part 1 or Part 2, perhaps you should start there.

Part 1: Perceptron Configuration (Single Layer Network)

The inputs (x1,x2,x3..xm) and connection weights (w1,w2,w3..wm) in figure 4 are typically real values, both positive (+) and negative (-). If the feature of some xi tends to cause the perceptron to fire, the weight wi will be positive; if the feature xi inhibits the perceptron, the weight wi will be negative.
The perceptron itself consists of weights, the summation processor, and an activation function, and an adjustable threshold processor (called bias hereafter).
For convenience, the normal practice is to treat the bias as just another input. The following diagram illustrates the revised configuration:
The bias can be thought of as the propensity (a tendency towards a particular way of behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration network shown in Figure 5 fires if the weighted sum > 0, or if you are into math type explanations.

Part 2: Multi-Layer Configuration

The multi-layer network that will solve the XOR problem will look similar to a single layer network. We are still dealing with inputs / weights / outputs. What is new is the addition of the hidden layer.
As already explained above, there is one input layer, one hidden layer, and one output layer.
It is by using the inputs and weights that we are able to work out the activation for a given node. This is easily achieved for the hidden layer as it has direct links to the actual input layer.
The output layer, however, knows nothing about the input layer as it is not directly connected to it. So to work out the activation for an output node, we need to make use of the output from the hidden layer nodes, which are used as inputs to the output layer nodes.
This entire process described above can be thought of as a pass forward from one layer to the next.
This still works like it did with a single layer network; the activation for any given node is still worked out as the weighted sum of its inputs, a = Σ wi·Ii (where wi is the weight(i), and Ii is the input(i) value), which is then squashed through the activation (sigmoid) function. You see, it's the same old stuff, no demons, smoke, or magic here. It's stuff we've already covered.
So that's how the network looks. Now I guess you want to know how to go about training it.

Learning

There are essentially two types of learning that may be applied to a neural network, which are "Reinforcement" and "Supervised".

Reinforcement

In Reinforcement learning, during training, a set of inputs is presented to the neural network; say the output is 0.75 when the target was expecting 1.0. The error (1.0 - 0.75) is used for training ("wrong by 0.25"). What if there are two outputs? Then the total error is summed to give a single number (typically the sum of squared errors). E.g., "your total error on all outputs is 1.76". Note that this just tells you how wrong you were, not in which direction you were wrong. Using this method, we may never get a result, or it could be a case of 'Hunt the needle'.
Using a genetic algorithm to train a multi-layer neural network offers a Reinforcement type training arrangement, where the mutation is responsible for "jiggling the weights a bit". This is what this article is all about.

Supervised

In Supervised learning, the neural network is given more information. Not just "how wrong" it was, but "in what direction it was wrong", like "Hunt the needle", but where you are told "North a bit" "West a bit". So you get, and use, far more information in Supervised learning, and this is the normal form of neural network learning algorithm.
This training method is normally conducted using a Back Propagation training method, which I covered in Part 2, so if this is your first article of these three parts, and the back propagation method is of particular interest, then you should look there.

So Now the New Stuff

From this point on, anything that is being discussed relates directly to this article's code.
What is the problem we are trying to solve? Well, it's the same as it was for Part 2: the simple XOR logic problem. In fact, this article's content is really just an incremental build on knowledge that was covered in Part 1 and Part 2, so let's march on.
For the benefit of those that may have only read this one article, the XOR logic problem looks like the following truth table:
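X1   X2   |   Output
---------------------
0    0    |   0
0    1    |   1
1    0    |   1
1    1    |   0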
Remember with a single layer (perceptron), we can't actually achieve the XOR functionality as it's not linearly separable. But with a multi-layer network, this is achievable.
So with this in mind, how are we going to achieve this? Well, we are going to use a Genetic Algorithm (GA from this point on) to breed a population of neural networks that will hopefully evolve to provide a solution to the XOR logic problem; that's the basic idea anyway.
So what does this all look like?
As can be seen from the figure above, what we are going to do is have a GA which will actually contain a population of neural networks. The idea being that the GA will jiggle the weights of the neural networks, within the population, in the hope that the jiggling of the weights will push the neural network population towards a solution to the XOR problem.

So How Does This Translate Into an Algorithm

The basic operation of the Microbial GA training is as follows:
  • Pick two genotypes at random
  • Compare scores (fitness) to come up with a winner and loser
  • Go along the genotype, and at each locus (point):
    • With some probability, copy from winner to loser (overwrite)
    • With some probability, mutate that locus of the loser
  So only the loser gets changed, which gives a version of Elitism for free; this ensures the best in breed remains in the population.
That's it. That is the complete algorithm.
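A hedged C# sketch of one such tournament over a population of weight arrays might look like the following (illustrative code of my own; the article's GA_Trainer_XOR does the equivalent against its population of NeuralNetwork objects):

using System;

public class MicrobialTournament
{
    private readonly Random rand = new Random();
    private const double CrossoverRate = 0.5;  // chance of copying a locus
    private const double MutationRate = 0.1;   // chance of mutating a locus

    // population[i] is one genotype: here simply an array of weights.
    // fitness returns an error score, so lower = fitter (as in this article).
    public void RunOneTournament(double[][] population,
                                 Func<double[], double> fitness)
    {
        // 1. Pick two genotypes at random
        int a = rand.Next(population.Length);
        int b = rand.Next(population.Length);

        // 2. Compare scores (fitness) to come up with a winner and a loser
        int winner = fitness(population[a]) <= fitness(population[b]) ? a : b;
        int loser = (winner == a) ? b : a;

        // 3. Go along the genotype; only the loser ever gets changed
        for (int locus = 0; locus < population[loser].Length; locus++)
        {
            if (rand.NextDouble() < CrossoverRate)   // copy from winner to loser
            {
                population[loser][locus] = population[winner][locus];
            }
            if (rand.NextDouble() < MutationRate)    // mutate (jiggle) that locus
            {
                population[loser][locus] += rand.NextDouble() - 0.5;
            }
        }
    }
}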
But there are some essential issues to be aware of when playing with GAs:
  1. The genotype will be different for a different problem domain
  2. The fitness function will be different for a different problem domain
These two items must be developed again whenever a new problem is specified. For example, if we wanted to find a person's favourite pizza toppings, the genotype and fitness would be different from that which is used for this article's problem domain.
These two essential elements of a GA (for this article problem domain) are specified below.

1. The Genotype

For this article, the problem domain states that we had a population of neural networks. So I created a single dimension array of NeuralNetwork objects. This can be seen from the constructor code within the GA_Trainer_XOR object:
//ANN's
private NeuralNetwork[] networks;

public GA_Trainer_XOR()
{
    networks = new NeuralNetwork[POPULATION];
    //create new ANN objects, random weights applied at start
    for (int i = 0; i <= networks.GetUpperBound(0); i++)
    {
       networks[i] = new NeuralNetwork(2, 2, 1);
       networks[i].Change += 
         new NeuralNetwork.ChangeHandler(GA_Trainer_NN_Change);
    }
}

2. The Fitness Function

Remembering the problem domain description stated, the following truth table is what we are trying to achieve:
So how can we tell how fit (how close) the neural network is to this? It is fairly simple really. What we do is present the entire set of inputs to the Neural Network one at a time and keep an accumulated error value: for each training pattern the error is sqrt((target - output)^2), i.e. the absolute difference between target and output, and these errors are summed over the whole training set, as the getError(..) and evaluate(..) methods below show.
Within the NeuralNetwork class, there is a getError(..) method like this:
public double getError(double[] targets)
{
    //storage for error
    double error = 0.0;
    //this calculation is based on something I read about weight space in
    //Artificial Intelligence - A Modern Approach, 2nd edition. Prentice Hall
    //2003. Stuart Russell, Peter Norvig. Pg 741
    error = Math.Sqrt(Math.Pow((targets[0] - outputs[0]), 2));
    return error;
}
Then in the NN_Trainer_XOR class, there is an Evaluate method that accepts an int value which represents the member of the population to fetch and evaluate (get fitness for). This overall fitness is then returned to the GA training method to see which neural network should be the winner and which neural network should be the loser.
private double evaluate(int popMember)
{
    double error = 0.0;
    //loop through the entire training set
    for (int i = 0; i <= train_set.GetUpperBound(0); i++)
    {
        //forward these new values through network
        //forward weights through ANN
        forwardWeights(popMember, getTrainSet(i));
        double[] targetValues = getTargetValues(getTrainSet(i));
        error += networks[popMember].getError(targetValues);
    }
    //if the Error term is < acceptableNNError value we have found
    //a good configuration of weights for the NeuralNetwork, so tell
    //GA to stop looking
    if (error < acceptableNNError)
    {
        bestConfiguration = popMember;
        foundGoodANN = true;
    }
    //return error
    return error;
}
So how do we know when we have a trained neural network? In this article's code, what I have done is provide a fixed limit value within the NN_Trainer_XOR class that, when reached, indicates that the training has yielded a best configured neural network.
If, however, the entire training loop is done and there is still no well-configured neural network, I simply return the value of the winner (of the last training epoch) as the overall best configured neural network.
This is shown in the code snippet below; this should be read in conjunction with the evaluate(..) method shown above:
//check to see if there was a best configuration found, may not have done
//enough training to find a good NeuralNetwork configuration, so will simply
//have to return the WINNER
if (bestConfiguration == -1)
{
    bestConfiguration = WINNER;
}
//return the best Neural network
return networks[bestConfiguration];

So Finally the Code

Well, the code for this article looks like the following class diagram (it's Visual Studio 2005, C#, .NET v2.0):
The main classes that people should take the time to look at would be:
  • GA_Trainer_XOR: Trains a neural network to solve the XOR problem using a Microbial GA.
  • TrainerEventArgs: Training event args, for use with a GUI.
  • NeuralNetwork: A configurable neural network.
  • NeuralNetworkEventArgs: Training event args, for use with a GUI.
  • SigmoidActivationFunction: A static method to provide the sigmoid activation function.
The rest are the GUI I constructed simply to show how it all fits together.
Note: The demo project contains all code, so I won't list it here. Also note that most of these classes are quite similar to those included with the Part 2 article code. I wanted to keep the code similar so people who have already looked at Part 2 would recognize the common pattern.

Code Demos

The demo application attached has three main areas which are described below:

Live Results Tab

It can be seen that this has very nearly solved the XOR problem; it did, however, take nearly 45000 iterations (epochs) of the training loop. Remember that we also have to present the entire training set to the network, and do this twice: once to find a winner and once to find a loser. That is quite a lot of work, I am sure you would all agree. This is why neural networks are not normally trained by GAs; this article is really about how to apply a GA to a problem domain. The fact that the GA training took 45000 epochs to yield an acceptable result does not mean that GAs are useless. Far from it, GAs have their place, and can be used for many problems, such as:
  • Sudoku solver (the popular game)
  • Backpack problem (trying to optimize the use of a backpack of limited size, to get as many items in as will fit)
  • Favourite pizza toppings problem (try and find out what someone's favourite pizza is)
To name but a few, basically, if you can come up with the genotype and a Fitness function, you should be able to get a GA to work out a solution. GAs have also been used to grow entire syntax trees of grammar, in order to predict which grammar is more optimal. There is more research being done in this area as I write this article; in fact, there is a nice article on this topic (Gene Expression Programming) by Andrew Krillov, right here at the CodeProject, if anyone wants to read further.

Training Results Tab

Viewing the target/outputs together:
Viewing the errors:

Trained Results Tab

Viewing the target/outputs together:
It is also possible to view the neural network's final configuration using the "View Neural Network Config" button.

What Do You Think?

That is it; I would just like to ask, if you liked the article, please vote for it.

Points of Interest

I think AI is fairly interesting, that's why I am taking the time to publish these articles. So I hope someone else finds it interesting, and that it might help further someone's knowledge, as it has my own.
Anyone that wants to look further into AI type stuff, that finds the content of this article a bit basic, should check out Andrew Krillov's articles at Andrew Krillov CP articles as his are more advanced, and very good.

History

  • v1.1: 27/12/06: Modified the GA_Trainer_XOR class to have a random number seed of 5.
  • v1.0: 11/12/06: Initial article.

Bibliography

  • Artificial Intelligence 2nd edition, Elaine Rich / Kevin Knight. McGraw Hill Inc.
  • Artificial Intelligence, A Modern Approach, Stuart Russell / Peter Norvig. Prentice Hall.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
