I'm trying to add a loss term between two neural networks so that they stay as similar as possible while still performing different tasks. The closest thing I could find is the answer in this post: Pytorch: how to add L1 regularizer to activations?
But I couldn't get that solution to work. Both models train to good accuracy, yet the regularization is ignored (even when set to an absurdly high value), and the distance between the two networks seems to keep growing. Do I need to do something extra to keep the regularization loss term from being ignored?
My current best attempt is the following:
def train_combined(nets, dataset_train, dataset_test, num_epochs, alpha=0):
    criterion = nn.L1Loss()
    optimizers = [optim.SGD(net.parameters(), lr=0.01, momentum=0.9) for net in nets]
    trainloader = DataLoader(dataset_train, batch_size=32, shuffle=True)
    train_losses = []
    test_losses = []
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, *labels = data
            inputs = inputs
            # get the average of the parameters between the two networks
            with t.no_grad():
                params = t.stack([t.cat(tuple(t.flatten(p.data) for p in net.parameters())) for net in nets])
                avg = t.sum(params, dim=0) * 0.5
            # keep track of loss for both models
            all_losses = np.zeros(2)
            all_reg_losses = np.zeros(2)
            all_final_losses = np.zeros(2)
            # forward + backward + optimize
            for i, (net, optimizer, label) in enumerate(zip(nets, optimizers, labels)):
                optimizer.zero_grad()
                # calculate normal loss
                outputs = net(inputs)
                loss = criterion(outputs, label)
                # calculate regularization loss
                params = t.cat(tuple(t.flatten(p.data) for p in net.parameters()))
                regularization_loss = t.sum(t.abs(params - avg))
                regularization = regularization_loss * alpha
                # calculate total loss
                final_loss = loss + regularization
                final_loss.backward()
                optimizer.step()
                # keep track of losses
                all_losses[i] = float(loss.item())
                all_reg_losses[i] = float(0 if (regularization == 0) else regularization.item())
                all_final_losses[i] = float(final_loss.item())
        # keep track of performance
        train_losses.append(loss)
        with t.no_grad():
            for i in range(2):
                test_losses.append(light_eval(nets[i], dataset_test, index=i))
        # log performance each epoch
        for i in range(2):
            print("%3d" % (epoch+1), i, ':',
                  f' train loss = { ("%.4f "*3) % (all_losses[i], all_reg_losses[i], all_final_losses[i]) }',
                  f', test_losses = { "%.4f" % test_losses[-(2-i)] }')
    print('Finished Training')

models = [Net().to(device) for i in range(2)]
train_combined(models, dataset_train, dataset_test, 50, alpha=1e-2)

What am I doing wrong?
Posted on 2022-08-24 22:39:30
Solved. The problem appears to come from using p.data instead of p when collecting the parameters: p.data is detached from the autograd graph, so the regularization term never produced any gradients.
The working solution looks like this:
params = t.cat(tuple(t.flatten(p) for p in net.parameters()))
assert params.requires_grad
distance = criterion(params, avg)
regularization_loss = t.sum(distance)

https://stackoverflow.com/questions/73411720
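The difference between p and p.data can be checked directly: a vector built from p.data is detached from the autograd graph, while one built from p keeps requires_grad=True, so a penalty on it actually reaches the network's parameters. A minimal sketch using a throwaway nn.Linear in place of the real models:

```python
import torch as t
import torch.nn as nn

net = nn.Linear(4, 2)  # any small module works for the demonstration

# Using p.data: the result is detached, so a loss built on it
# has no gradient path back to the network
detached = t.cat(tuple(t.flatten(p.data) for p in net.parameters()))
print(detached.requires_grad)  # False

# Using p directly: the result stays in the autograd graph
attached = t.cat(tuple(t.flatten(p) for p in net.parameters()))
print(attached.requires_grad)  # True

# Backprop through an L1-style penalty now populates the grads
attached.abs().sum().backward()
print(net.weight.grad is not None)  # True
```

This is why the original training loop ignored the regularizer no matter how large alpha was: final_loss.backward() only saw gradients from the task loss.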