
Cross-regularization between two neural networks

Stack Overflow user
Asked 2022-08-19 04:06:24
1 answer · 32 views · 0 followers · 0 votes

I am trying to add a loss term between two neural networks to keep them as similar as possible while they still perform different tasks. The closest thing I could find is the answer to this post: Pytorch: how to add L1 regularizer to activations?

But trying that approach, I can't get it to work. Both models train to a good accuracy, yet the regularization is ignored (even when set to an absurdly high value), and the difference between the two networks just keeps growing. Is there something extra I need to do with the regularization loss term so that it isn't ignored?

My current best attempt is below:

import numpy as np
import torch as t
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

def train_combined(nets, dataset_train, dataset_test, num_epochs, alpha=0):
  criterion = nn.L1Loss()
  optimizers = [optim.SGD(net.parameters(), lr=0.01, momentum=0.9) for net in nets]
  trainloader = DataLoader(dataset_train, batch_size=32, shuffle=True)

  train_losses = []
  test_losses  = []

  for epoch in range(num_epochs):  # loop over the dataset multiple times

    for i, data in enumerate(trainloader, 0):
      # get the inputs; data is a list of [inputs, labels]
      inputs, *labels = data
      inputs = inputs

      # get the average of the parameters between the two networks
      with t.no_grad():
        params = t.stack([t.cat(tuple(t.flatten(p.data) for p in net.parameters())) for net in nets])
        avg = t.sum(params, dim=0)*0.5

      # keep track of loss for both models
      all_losses = np.zeros( 2 )
      all_reg_losses = np.zeros( 2 )
      all_final_losses = np.zeros( 2 )

      # forward + backward + optimize
      for i, (net, optimizer, label) in enumerate(zip(nets, optimizers, labels)):
        optimizer.zero_grad()

        # calculate normal loss
        outputs = net(inputs)
        loss = criterion(outputs, label)        
        
        # calculate regularization loss
        params = t.cat(tuple(t.flatten(p.data) for p in net.parameters()))
        regularization_loss = t.sum(t.abs( params - avg ))
        regularization =  regularization_loss * alpha
        
        # calculate total loss
        final_loss = loss + regularization
        final_loss.backward()
        optimizer.step()

        # keep track of losses
        all_losses[i] = float( loss.item() )
        all_reg_losses[i] = float( 0 if (regularization == 0) else regularization.item() )
        all_final_losses[i] = float( final_loss.item() )
      
      # keep track of performance
      train_losses.append( loss.item() )  # store a float, not the graph-attached tensor
      with t.no_grad():
        for i in range(2):
          test_losses.append( light_eval( nets[i], dataset_test, index=i ) )

    # log performance each epoch
    for i in range(2):
      print("%3d" % (epoch+1),  i, ':',
            f'  train loss  = { ("%.4f "*3) % (all_losses[i], all_reg_losses[i], all_final_losses[i]) }',
            f', test_losses = { "%.4f" % test_losses[-(2-i)] }')
    

  print('Finished Training')

models = [ Net().to(device) for i in range(2) ]
train_combined( models, dataset_train, dataset_test, 50, alpha=1e-2 )

What am I doing wrong?
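As a quick diagnostic (not part of the original post; a toy `nn.Linear` stands in for one of the two models), one can check whether the flattened parameter vector the regularizer is built from is still attached to the autograd graph:

```python
import torch as t
import torch.nn as nn

net = nn.Linear(4, 2)  # stand-in for one of the two models

# .data returns tensors detached from the autograd graph, so the
# concatenated vector carries no gradient history: the regularization
# term is effectively a constant and the optimizer never sees it
flat = t.cat(tuple(t.flatten(p.data) for p in net.parameters()))
print(flat.requires_grad)  # False
```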


1 Answer

Stack Overflow user

Answered 2022-08-24 22:39:30

Solved it. The problem seems to have come from using p.data instead of p when gathering the parameters: p.data is detached from the autograd graph, so the regularization term contributed no gradients.

The working solution looks like this:

# flatten the parameters WITHOUT .data, keeping the autograd graph intact
params = t.cat(tuple(t.flatten(p) for p in net.parameters()))
assert params.requires_grad

distance = criterion(params, avg)
regularization_loss = t.sum( distance )
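To confirm the fix behaves as intended, here is a minimal sketch (a toy `nn.Linear` stands in for the real networks, and a zero vector stands in for the averaged parameters) showing that gradients from the regularization term now reach the underlying parameters:

```python
import torch as t
import torch.nn as nn

net = nn.Linear(3, 1)  # toy stand-in for one of the two networks
criterion = nn.L1Loss()

# flatten WITHOUT .data so the autograd graph is preserved
params = t.cat(tuple(t.flatten(p) for p in net.parameters()))
avg = t.zeros_like(params)  # placeholder for the averaged parameter vector

regularization_loss = criterion(params, avg)
regularization_loss.backward()

# every parameter now receives a gradient from the regularizer
print(all(p.grad is not None for p in net.parameters()))  # True
```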
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73411720
