
MNIST 2-layer neural network fails to recognize certain digits

Data Science user
Asked 2020-12-14 03:36:47

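Recently I have been studying basic multilayer neural networks, and I decided to try a 2-layer network with 100 neurons in the hidden layer on the MNIST database of handwritten digits.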

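My network has one hidden layer that uses ReLU, and the output layer uses the sigmoid function. The cost is computed with MSE, and the weights are updated with SGD. Below is the MATLAB code that trains the network.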

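When training this network, I found that it somehow fails to recognize the 8s and 9s in the data set. It has some trouble with other digits as well, but not nearly as much as with 8 or 9.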

How can I fix this?

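Edit: Training the network from a random set of weights gives very inconsistent results, and the accuracy is often low. Is there something wrong with the backpropagation step, or is this network architecture simply not the best approach to digit recognition?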
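The post does not say how the initial weights are drawn. One common choice, shown here only as an assumption and not as the asker's actual setup, is to scale normally distributed weights by the fan-in of each layer (He scaling for the ReLU layer), which tends to make runs less sensitive to the random start. The 100 hidden units come from the question; 784 inputs (28x28 images) and 10 outputs are assumed.

```matlab
% Assumed layer sizes: 28x28 MNIST images, 100 hidden units, 10 digit classes
numInputs = 784;
numHidden = 100;
numOutputs = 10;

rng(0); % fix the seed so repeated runs start from the same weights

% He scaling for the ReLU hidden layer: standard deviation sqrt(2 / fan-in)
weights1 = randn(numHidden, numInputs) * sqrt(2 / numInputs);

% Fan-in scaling for the output layer: standard deviation sqrt(1 / fan-in)
weights2 = randn(numOutputs, numHidden) * sqrt(1 / numHidden);
```

These matrices have the row/column layout that twoLayerSGD expects for weights1 and weights2.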

*dsigmoid and dReLU are the derivatives of their respective functions.
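The helper functions themselves are not shown in the post. A minimal sketch of what they might look like, assuming they take the pre-activation values (as the calls dsigmoid(Z2) and dReLU(Z1) suggest), written as anonymous functions so the snippet runs as a plain script:

```matlab
% Assumed definitions; the asker's actual helpers are not in the post
sigmoid  = @(z) 1 ./ (1 + exp(-z));               % logistic function, elementwise
dsigmoid = @(z) sigmoid(z) .* (1 - sigmoid(z));   % derivative with respect to z
ReLU     = @(z) max(z, 0);                        % rectified linear unit, elementwise
dReLU    = @(z) double(z > 0);                    % 1 where z > 0, otherwise 0
```

Note that dsigmoid here expects the pre-activation z, not the already-activated output; passing the activated value to a function written this way would give a wrong gradient.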

function [trainedWeights1, trainedWeights2] = twoLayerSGD(weights1, weights2, inputs, outputs, alpha, shuffle)
% Trains a 2 layer perceptron using stochastic gradient descent and MSE
%
% function [trainedWeights1, trainedWeights2] = twoLayerSGD(weights1, weights2, inputs, outputs, alpha, shuffle)
% Inputs:
%   weights1 = weights between input layer and hidden layer as a matrix
%              where row i and column j represents the weight between node
%              j of the input layer and node i of the hidden layer
%   weights2 = weights between hidden layer and output layer as a matrix
%              where row i and column j represents the weight between node j of the
%              hidden layer and node i of the output layer
%   inputs = inputs of the training data, where each set of inputs is
%            stored as a row vector
%   outputs = the correct outputs of the training data, where each set of
%             outputs is stored as a row vector
%   alpha = the learning rate, default is 0.005
%   shuffle = boolean value, determines whether the training cases will be
%             randomly ordered, default is false
%
% Outputs:
%   trainedWeights1 = the trained weight matrix for the weights between the
%                     input layer and hidden layer
%   trainedWeights2 = the trained weight matrix for the weights between the
%                     hidden layer and output layer

if ~exist('alpha','var') || isempty(alpha)
    alpha = 0.005;
end

if ~exist('shuffle','var') || isempty(shuffle)
    shuffle = false;
end

% Determining the size of the input data set
N = size(inputs,1);
fprintf('Data set size: %.0f\n', N);

% Shuffle input data
if shuffle
    fprintf('Shuffling data...\n')
    order = randperm(size(inputs, 1));
    inputs = inputs(order, :);
    outputs = outputs(order, :);
    fprintf('Shuffle complete\n')
end

% Train Model
fprintf('Starting training process...\n')
trainedWeights1 = weights1;
trainedWeights2 = weights2;

for dataSet = 1:N
    % Get inputs of current data set
    input = inputs(dataSet,:); % row vector
    
    % Calculate activations of hidden nodes and the output
    Z1 = trainedWeights1*input'; % column vector
    hiddenActivation = ReLU(Z1); % column vector
    Z2 = trainedWeights2*hiddenActivation; % column vector
    output = sigmoid(Z2)'; % row vector
    
    % Calculate the error/cost of the current weights
    correctOutput = outputs(dataSet,:); % row vector
    cost = sum((output - correctOutput).^2);
    
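    % Backpropagate: compute both gradients before changing any weights, so
    % the hidden-layer gradient uses the same weights2 as the forward pass
    delta2 = 2*(output'-correctOutput') .* dsigmoid(Z2); % column vector
    S = sum(delta2 .* trainedWeights2, 1); % row vector
    
    % Update Weights
    trainedWeights2 = trainedWeights2 - alpha * delta2 .* hiddenActivation';
    trainedWeights1 = trainedWeights1 - alpha * S' .* input .* dReLU(Z1);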
    
    % Display status
    if mod(dataSet, 1000) == 0
        fprintf('Data set %.0fk/%.0fk complete!\n',dataSet/1000,N/1000);
    end
end
end

1 Answer

Data Science user

Accepted answer

Posted 2020-12-17 20:20:25

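Sigmoid activation combined with MSE loss usually does not work well. As a first step, I would use the standard approach: softmax with cross-entropy loss.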
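As a rough illustration of that suggestion, here is a sketch of a softmax output layer with cross-entropy loss for a single training example. The numeric values and the 3-class, 4-hidden-unit sizes are made-up stand-ins chosen to keep the example small; the variable names mirror those in the question's code.

```matlab
% Hypothetical values for one training example (not from the original post)
Z2 = [2.0; 1.0; 0.1];                    % output-layer pre-activations, column vector
correctOutput = [1; 0; 0];               % one-hot target, column vector
hiddenActivation = [0.5; 0.0; 1.5; 2.0]; % hidden-layer output, column vector

% Softmax, shifted by max(Z2) for numerical stability
expZ = exp(Z2 - max(Z2));
output = expZ / sum(expZ);               % entries are positive and sum to 1

% Cross-entropy loss for a one-hot target
cost = -sum(correctOutput .* log(output));

% With softmax + cross-entropy, the output-layer error reduces to
% (output - target), so no dsigmoid factor appears
delta2 = output - correctOutput;             % column vector
gradWeights2 = delta2 * hiddenActivation';   % same shape as weights2 (3x4 here)
```

In the training loop from the question, delta2 would take the place of the 2*(output'-correctOutput') .* dsigmoid(Z2) term, and the rest of the backpropagation step would keep its shape.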

Original page content provided by Data Science.
Original link:

https://datascience.stackexchange.com/questions/86652
