The XOR problem is known to be solvable by a multi-layer perceptron (MLP): given all 4 Boolean input/output pairs, it trains and stores the weights needed to reproduce the I/O.
import numpy as np

np.random.seed(0)

def sigmoid(x):  # Squashes each value into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx):
    # Note: sx is expected to already be sigmoid(x).
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)

# Cost function.
def cost(predicted, truth):
    return truth - predicted

xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
xor_output = np.array([[0,1,1,0]]).T

X = xor_input
Y = xor_output

# Define the shape of the weight vector.
num_data, input_dim = X.shape
# Set the dimension of the intermediate (hidden) layer.
hidden_dim = 5
# Initialize weights between the input layer and the hidden layer.
W1 = np.random.random((input_dim, hidden_dim))
# Define the shape of the output vector.
output_dim = len(Y.T)
# Initialize weights between the hidden layer and the output layer.
W2 = np.random.random((hidden_dim, output_dim))

num_epochs = 10000
learning_rate = 1.0

for epoch_n in range(num_epochs):
    layer0 = X
    # Forward propagation.
    layer1 = sigmoid(np.dot(layer0, W1))
    layer2 = sigmoid(np.dot(layer1, W2))

    # Back propagation (Y -> layer2).
    # How much did we miss in the predictions?
    layer2_error = cost(layer2, Y)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_delta = layer2_error * sigmoid_derivative(layer2)

    # Back propagation (layer2 -> layer1).
    # How much did each layer1 value contribute to the layer2 error
    # (according to the weights)?
    layer1_error = np.dot(layer2_delta, W2.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # Update weights.
    W2 += learning_rate * np.dot(layer1.T, layer2_delta)
    W1 += learning_rate * np.dot(layer0.T, layer1_delta)
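A quick convergence check can be added after the loop (a minimal sketch, not in the original post; it reuses layer2 and Y from the final epoch):

# Sanity check: the mean absolute error should be close to 0 once the
# network has memorized all four input/output pairs.
print("Mean absolute error:", np.mean(np.abs(cost(layer2, Y))))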
We can see that the network has been trained well enough to memorize the XOR outputs:

# On the training data
[int(prediction > 0.5) for prediction in layer2]

Output:
[0, 1, 1, 0]

If we feed in the same inputs again, we get the same outputs:
for x, y in zip(X, Y):
    layer1_prediction = sigmoid(np.dot(W1.T, x))  # Feed the input through the trained W1.
    prediction = layer2_prediction = sigmoid(np.dot(W2.T, layer1_prediction))  # Then through the trained W2.
    print(int(prediction > 0.5), y)

Output:
0 [0]
1 [1]
1 [1]
0 [0]

But if we retrain the parameters (W1 and W2) without one of the data points, i.e. given
xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
xor_output = np.array([[0,1,1,0]]).T

let's drop the last row of data and use it as the unseen test:
X = xor_input[:-1]
Y = xor_output[:-1]

With the rest of the code unchanged, no matter how I change the hyperparameters, the network cannot learn the XOR function and reproduce the I/O.
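For concreteness, the retraining can be wrapped in a small helper; train_mlp is a name introduced here for illustration, not from the original post, and it reuses the functions and hyperparameters defined above:

def train_mlp(X, Y, hidden_dim=5, num_epochs=10000, learning_rate=1.0):
    # Hypothetical wrapper around the same training loop as above,
    # so it can be re-run on an arbitrary subset of the data.
    W1 = np.random.random((X.shape[1], hidden_dim))
    W2 = np.random.random((hidden_dim, Y.shape[1]))
    for _ in range(num_epochs):
        layer1 = sigmoid(np.dot(X, W1))
        layer2 = sigmoid(np.dot(layer1, W2))
        layer2_delta = cost(layer2, Y) * sigmoid_derivative(layer2)
        layer1_delta = np.dot(layer2_delta, W2.T) * sigmoid_derivative(layer1)
        W2 += learning_rate * np.dot(layer1.T, layer2_delta)
        W1 += learning_rate * np.dot(X.T, layer1_delta)
    return W1, W2

# Retrain on the first three rows only; [1, 1] is held out.
W1, W2 = train_mlp(xor_input[:-1], xor_output[:-1])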
Evaluating on all four inputs, including the held-out one:

for x, y in zip(xor_input, xor_output):
    layer1_prediction = sigmoid(np.dot(W1.T, x))  # Feed the (possibly unseen) input through the trained W1.
    prediction = layer2_prediction = sigmoid(np.dot(W2.T, layer1_prediction))  # Then through the trained W2.
    print(int(prediction > 0.5), y)

Output:
0 [0]
1 [1]
1 [1]
1 [0]

Even if we shuffle the inputs/outputs:
import random

# Shuffle the order of the inputs.
_temp = list(zip(X, Y))
random.shuffle(_temp)
xor_input_shuff, xor_output_shuff = map(np.array, zip(*_temp))

we still cannot fully train the XOR function:
for x, y in zip(xor_input, xor_output):
    layer1_prediction = sigmoid(np.dot(W1.T, x))  # Feed the (possibly unseen) input through the trained W1.
    prediction = layer2_prediction = sigmoid(np.dot(W2.T, layer1_prediction))  # Then through the trained W2.
    print(x, int(prediction > 0.5), y)

Output:
[0 0] 1 [0]
[0 1] 1 [1]
[1 0] 1 [1]
[1 1] 0 [0]

So when the literature says that the multi-layer perceptron (i.e., basic deep learning) solves XOR, does that simply mean it can fully learn and memorize the weights given the complete set of inputs/outputs, but cannot generalize the XOR problem when even one data point is missing?
Here is the link to the Kaggle dataset so that answerers can test the network themselves: https://www.kaggle.com/alvations/xor-with-mlp/
Answered on 2018-02-05 01:46:36
I think learning (generalizing) XOR and memorizing XOR are different things.

A two-layer perceptron can memorize XOR; that is, there exists a combination of weights for which the loss is at its minimum and equal to 0 (the absolute minimum).

If the weights are randomly initialized, you might end up actually learning XOR and not just memorizing it.
Note that a multi-layer perceptron is a non-convex function, so there can be multiple minima (even multiple global minima). When the data is missing one input, there are multiple minima (all equal in loss), and among them are minima in which the missing point would be classified correctly. Hence, an MLP can learn XOR (although finding such a weight combination may be hard with a point missing); a concrete example follows.
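To make the existence claim concrete, here is one hand-constructed weight combination that classifies all four points correctly (a minimal sketch, not from the original answer; it adds bias terms, which the question's network does not use, to make the classic OR/AND construction explicit):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hidden unit 1 approximates OR(x1, x2), hidden unit 2 approximates
# AND(x1, x2); the output computes h1 AND NOT h2, which is XOR.
W1 = np.array([[20., 20.],
               [20., 20.]])
b1 = np.array([-10., -30.])
W2 = np.array([[20.], [-20.]])
b2 = np.array([-10.])

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    h = sigmoid(np.dot(x, W1) + b1)
    y = sigmoid(np.dot(h, W2) + b2)
    print(x, int(y[0] > 0.5))  # Prints 0, 1, 1, 0.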
A neural network is a universal function approximator, and can even fit nonsensical (e.g., randomly assigned) labels. In that context, you may want to look at this work: https://arxiv.org/abs/1611.03530
https://stackoverflow.com/questions/48614723