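I have recently been studying basic multilayer neural networks, and I decided to try a 2-layer network with 100 neurons in the hidden layer on the MNIST database of handwritten digits.
My network has one hidden layer that uses ReLU, and the output layer uses the sigmoid function. The cost is computed as MSE and the weights are updated with SGD. The MATLAB code I wrote to train the network is below.
While training this network, I found that it somehow fails to recognize the 8s and 9s in the data set. It has some trouble recognizing the other digits too, but not nearly as much as with 8 or 9.
How can I fix this?
Edit: The results of training the network from a random set of weights are very inconsistent, and the accuracy is often low. Is there something wrong with the backpropagation step, or is the network structure simply not the best approach to digit recognition?
*dsigmoid and dReLU are the derivatives of their respective functions.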
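One way to probe the backpropagation question is a finite-difference gradient check. The sketch below is only an illustration under stated assumptions: the helper definitions (ReLU, dReLU, sigmoid, dsigmoid) are guesses at what the unlisted functions look like, and the tiny 3-4-2 network with fixed values stands in for the real MNIST setup. It compares the analytic gradients of the squared-error cost with central-difference estimates, with both gradients computed from the same, not-yet-updated weights.

```matlab
% Assumed helper definitions (the question does not list them)
ReLU     = @(z) max(z, 0);
dReLU    = @(z) double(z > 0);
sigmoid  = @(z) 1 ./ (1 + exp(-z));
dsigmoid = @(z) sigmoid(z) .* (1 - sigmoid(z));   % derivative with respect to z

% Tiny deterministic example: 3 inputs, 4 hidden nodes, 2 outputs (illustrative sizes)
W1 = reshape(sin(1:12), 4, 3);
W2 = reshape(cos(1:8), 2, 4);
x  = [0.5, -0.2, 0.8];   % one training input, row vector
y  = [1, 0];             % its target output, row vector

% Squared-error cost as a function of both weight matrices
cost = @(A, B) sum((sigmoid(B * ReLU(A * x'))' - y) .^ 2);

% Forward pass
Z1 = W1 * x';
h  = ReLU(Z1);
Z2 = W2 * h;
out = sigmoid(Z2)';

% Analytic gradients, both computed before any weight is changed
delta2 = 2 * (out' - y') .* dsigmoid(Z2);       % 2x1
gradW2 = delta2 * h';                           % 2x4
gradW1 = ((W2' * delta2) .* dReLU(Z1)) * x;     % 4x3

% Numeric gradients by central differences
step = 1e-6;
numW1 = zeros(size(W1));
for k = 1:numel(W1)
    E = zeros(size(W1));
    E(k) = step;
    numW1(k) = (cost(W1 + E, W2) - cost(W1 - E, W2)) / (2 * step);
end
numW2 = zeros(size(W2));
for k = 1:numel(W2)
    E = zeros(size(W2));
    E(k) = step;
    numW2(k) = (cost(W1, W2 + E) - cost(W1, W2 - E)) / (2 * step);
end

maxErrW1 = max(abs(gradW1(:) - numW1(:)));
maxErrW2 = max(abs(gradW2(:) - numW2(:)));
fprintf('Max gradient error: W1 %.2e, W2 %.2e\n', maxErrW1, maxErrW2);
```

If the two errors are tiny, the gradient formulas agree with the cost function, and the cause of poor accuracy is more likely to lie elsewhere (initialization, learning rate, or the choice of loss).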
function [trainedWeights1, trainedWeights2] = twoLayerSGD(weights1, weights2, inputs, outputs, alpha, shuffle)
% Trains a 2 layer perceptron using stochastic gradient descent and MSE
%
% function [trainedWeights1, trainedWeights2] = twoLayerSGD(weights1, weights2, inputs, outputs, alpha, shuffle)
% Inputs:
%   weights1 = weights between input layer and hidden layer as a matrix
%              where row i and column j represents the weight between node
%              j of the input layer and node i of the hidden layer
%   weights2 = weights between hidden layer and output layer as a matrix
%              where row i and column j represents the weight between node
%              j of the hidden layer and node i of the output layer
%   inputs   = inputs of the training data, where each set of inputs is
%              stored as a row vector
%   outputs  = the correct outputs of the training data, where each set of
%              outputs is stored as a row vector
%   alpha    = the learning rate, default is 0.005
%   shuffle  = boolean value, determines whether the training cases will be
%              randomly ordered, default is false
%
% Outputs:
%   trainedWeights1 = the trained weight matrix for the weights between the
%                     input layer and hidden layer
%   trainedWeights2 = the trained weight matrix for the weights between the
%                     hidden layer and output layer
if ~exist('alpha','var') || isempty(alpha)
    alpha = 0.005;
end
if ~exist('shuffle','var') || isempty(shuffle)
    shuffle = false;
end
% Determining the size of the input data set
N = size(inputs,1);
fprintf('Data set size: %.0f\n', N);
% Shuffle input data
if shuffle
    fprintf('Shuffling data...\n')
    order = randperm(size(inputs, 1));
    inputs = inputs(order, :);
    outputs = outputs(order, :);
    fprintf('Shuffle complete\n')
end
% Train Model
fprintf('Starting training process...\n')
trainedWeights1 = weights1;
trainedWeights2 = weights2;
for dataSet = 1:N
    % Get inputs of current data set
    input = inputs(dataSet,:); % row vector
    % Calculate activations of hidden nodes and the output
    Z1 = trainedWeights1*input'; % column vector
    hiddenActivation = ReLU(Z1); % column vector
    Z2 = trainedWeights2*hiddenActivation; % column vector
    output = sigmoid(Z2)'; % row vector
    % Calculate the error/cost of the current weights
    correctOutput = outputs(dataSet,:); % row vector
    cost = sum((output - correctOutput).^2);
    % Update Weights
    trainedWeights2 = trainedWeights2 - alpha * 2*(output'-correctOutput') .* hiddenActivation' .* dsigmoid(Z2);
    % S is computed from trainedWeights2 as already updated on the previous line
    S = sum(2*(output'-correctOutput') .* trainedWeights2 .* dsigmoid(Z2),1);
    trainedWeights1 = trainedWeights1 - alpha * S' .* input .* dReLU(Z1);
    % Display status
    if mod(dataSet, 1000) == 0
        fprintf('Data set %.0fk/%.0fk complete!\n',dataSet/1000,N/1000);
    end
end
end

Posted on 2020-12-17 20:20:25
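Sigmoid activation combined with MSE loss generally does not work well. As a first step, I would use the standard approach: softmax with cross-entropy loss.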
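As a rough illustration of that suggestion, here is a minimal sketch of a single SGD step with a softmax output and cross-entropy loss. The network sizes, the fixed weights, and the helper definitions are assumptions made for the example, not taken from the question. With this pairing, the gradient of the loss with respect to the output pre-activations reduces to the predicted probabilities minus the one-hot target, so no dsigmoid factor appears in the output layer.

```matlab
% Assumed helper definitions for this sketch
ReLU    = @(z) max(z, 0);
dReLU   = @(z) double(z > 0);
softmax = @(z) exp(z - max(z)) ./ sum(exp(z - max(z)));   % shifted for numerical stability

% Tiny deterministic example: 3 inputs, 4 hidden nodes, 3 classes (illustrative sizes)
W1 = reshape(sin(1:12), 4, 3);
W2 = reshape(cos(1:12), 3, 4);
x  = [0.5, -0.2, 0.8];   % one training input, row vector
y  = [0, 1, 0];          % one-hot target, row vector
alpha = 0.005;           % learning rate (the default used in the question)

% Forward pass
Z1 = W1 * x';
h  = ReLU(Z1);
Z2 = W2 * h;
p  = softmax(Z2);                 % class probabilities, column vector
loss = -sum(y' .* log(p));        % cross-entropy loss

% Backward pass: compute both deltas before changing any weights
delta2 = p - y';                          % gradient of the loss w.r.t. Z2
delta1 = (W2' * delta2) .* dReLU(Z1);     % gradient of the loss w.r.t. Z1

% SGD update
W2new = W2 - alpha * (delta2 * h');
W1new = W1 - alpha * (delta1 * x);

% Loss on the same example after the update
pNew = softmax(W2new * ReLU(W1new * x'));
newLoss = -sum(y' .* log(pNew));
fprintf('Loss before: %.6f, after: %.6f\n', loss, newLoss);
```

The predicted digit is the index of the largest entry of p, and the targets need to be one-hot row vectors for this loss to apply.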
https://datascience.stackexchange.com/questions/86652