Weight Initialization in Neural Networks


In an ordinary neural network the weights are initialized randomly, while an RBM or an autoencoder can produce a better W1.
I have run related experiments myself and understand the point of this kind of pre-training: it speeds up convergence.
The question I want to ask is this:
since the W1 learned by a sparse autoencoder looks very much like local images of digits of various sizes,
why can't we simply draw some samples at random from the training set and use their pixel data as the W1 values?
For example, use the averaged bitmap of ten '0' images as the weights of one hidden neuron;
the average of ten '0's minus the average of ten '6's would look even more like a learned filter.
The values could also be multiplied by a scale factor to make them smaller.
I am not saying this would replace the autoencoder,
but at the very least, initializing the autoencoder this way might make it converge faster.
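
A minimal sketch of this averaged-digit idea might look like the following. It is purely illustrative: the one-hot layout of train_y and the 0.1 scale factor are assumptions, not something tested in this post.

% Illustrative sketch: build initial weights for two hidden units from
% class-averaged digit images, as described in the question above.
load mnist_uint8;                              % loads train_x, train_y (uint8)
train_x = double(train_x) / 255;               % scale pixels to [0,1]
train_y = double(train_y);                     % one-hot labels (assuming digit d is stored in column d+1)

zeros_idx = find(train_y(:,1) == 1, 10);       % first ten images of digit '0'
sixes_idx = find(train_y(:,7) == 1, 10);       % first ten images of digit '6'

avg0 = mean(train_x(zeros_idx,:), 1);          % 1x784 average '0' bitmap
avg6 = mean(train_x(sixes_idx,:), 1);          % 1x784 average '6' bitmap

scale = 0.1;                                   % shrink the values, as suggested above (assumed value)
w_unit1 = scale * (avg0 - 0.5);                % candidate weights for one hidden unit
w_unit2 = scale * (avg0 - avg6);               % '0' average minus '6' average
% rows like these could then be copied into nn.W{1}(i,2:end) before training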
 
Posted by think__123
 
Code

load mnist_uint8;                              % loads train_x, train_y, test_x, test_y (uint8)

train_x = double(train_x(1:10000,:)) / 255;    % first 10000 images, scaled to [0,1]
test_x  = double(test_x(1:10000,:))  / 255;
train_y = double(train_y(1:10000,:));          % one-hot labels
test_y  = double(test_y(1:10000,:));
 

rng('default')
nn = nnsetup([784 196 784]);                   % autoencoder: 784 -> 196 -> 784
nn.activation_function = 'sigm';               % Sigmoid activation function
nn.learningRate = 1;                           % Sigmoid requires a lower learning rate
nn.weightPenaltyL2 = 3e-3;                     % L2 weight decay

nn.nonSparsityPenalty = 0.5;                   % weight of the sparsity penalty term
nn.sparsityTarget = 0.1;                       % target average activation of hidden units

opts.numepochs = 20;                           % Number of full sweeps through data
opts.batchsize = 100;                          % Take a mean gradient step over this many samples

% Initialization from training images (uncomment to enable): each hidden unit's
% incoming weights are set to the pixels of one training image, shifted by -0.5;
% column 1 of W{1} is the bias term and is skipped.
% for i = 1:196
%     for j = 1:784
%         nn.W{1}(i,j+1) = train_x(i,j) - 0.5;
%     end
% end

nn = nntrain(nn, train_x, train_x, opts);      % train as an autoencoder (target = input)
visualize(nn.W{1}(:,2:end)');                  % show the learned filters (bias column dropped)

rng('default')
nn1 = nnsetup([784 196 30 10]);                % classifier: 784 -> 196 -> 30 -> 10
nn1.W{1} = nn.W{1};                            % reuse the autoencoder's first-layer weights
nn1.activation_function = 'sigm';              % Sigmoid activation function
nn1.learningRate = 1;                          % Sigmoid requires a lower learning rate
opts1.numepochs = 10;                          % Number of full sweeps through data
opts1.batchsize = 100;                         % Take a mean gradient step over this many samples
nn1 = nntrain(nn1, train_x, train_y, opts1);   % supervised training on the labels
[er, bad] = nntest(nn1, test_x, test_y);       % er = classification error rate on the test set
fprintf('ex1: %f\n', er);

The following block (shown commented out in the listing above) initializes the weights to the pixel values of the first 196 training images:

% for i = 1:196
%     for j = 1:784
%         nn.W{1}(i,j+1) = train_x(i,j) - 0.5;
%     end
% end
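
Incidentally, the same assignment can be written in vectorized form; this sketch assumes W{1} is 196x785 with the bias term in column 1, which is what the j+1 index in the loop implies:

% vectorized equivalent of the commented-out double loop
nn.W{1}(:,2:end) = train_x(1:196,:) - 0.5;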

 

With the block above commented out (a plain sparse autoencoder), the results are:
epoch 1/20. Took 6.2646 seconds. Mini-batch mean squared error on training set is 45.8097; Full-batch train err = 44.682200
epoch 2/20. Took 6.2589 seconds. Mini-batch mean squared error on training set is 44.8034; Full-batch train err = 45.405862
epoch 3/20. Took 6.2565 seconds. Mini-batch mean squared error on training set is 45.4828; Full-batch train err = 45.003434
epoch 4/20. Took 6.2676 seconds. Mini-batch mean squared error on training set is 44.4783; Full-batch train err = 43.699532
epoch 5/20. Took 6.3393 seconds. Mini-batch mean squared error on training set is 40.6535; Full-batch train err = 31.140229
epoch 6/20. Took 6.3261 seconds. Mini-batch mean squared error on training set is 24.6434; Full-batch train err = 20.231511
epoch 7/20. Took 6.2473 seconds. Mini-batch mean squared error on training set is 17.6681; Full-batch train err = 15.764289
epoch 8/20. Took 6.2586 seconds. Mini-batch mean squared error on training set is 14.7982; Full-batch train err = 13.898265
epoch 9/20. Took 6.271 seconds. Mini-batch mean squared error on training set is 13.3432; Full-batch train err = 12.654397
epoch 10/20. Took 6.2101 seconds. Mini-batch mean squared error on training set is 12.3963; Full-batch train err = 12.161691
epoch 11/20. Took 6.2599 seconds. Mini-batch mean squared error on training set is 12.0695; Full-batch train err = 11.748013
epoch 12/20. Took 6.2371 seconds. Mini-batch mean squared error on training set is 11.1609; Full-batch train err = 10.662571
epoch 13/20. Took 6.3086 seconds. Mini-batch mean squared error on training set is 10.5359; Full-batch train err = 10.301135
epoch 14/20. Took 6.275 seconds. Mini-batch mean squared error on training set is 10.0407; Full-batch train err = 9.749924
epoch 15/20. Took 6.299 seconds. Mini-batch mean squared error on training set is 9.6841; Full-batch train err = 9.444252
epoch 16/20. Took 6.2976 seconds. Mini-batch mean squared error on training set is 9.3998; Full-batch train err = 9.239191
epoch 17/20. Took 6.1581 seconds. Mini-batch mean squared error on training set is 9.1589; Full-batch train err = 9.040671
epoch 18/20. Took 6.2882 seconds. Mini-batch mean squared error on training set is 8.9649; Full-batch train err = 8.822645
epoch 19/20. Took 6.2051 seconds. Mini-batch mean squared error on training set is 8.7814; Full-batch train err = 8.647376
epoch 20/20. Took 6.2032 seconds. Mini-batch mean squared error on training set is 8.6312; Full-batch train err = 8.455579
epoch 1/10. Took 2.7435 seconds. Mini-batch mean squared error on training set is 0.40187; Full-batch train err = 0.208848
epoch 2/10. Took 2.7355 seconds. Mini-batch mean squared error on training set is 0.16113; Full-batch train err = 0.127411
epoch 3/10. Took 2.8344 seconds. Mini-batch mean squared error on training set is 0.11294; Full-batch train err = 0.098367
epoch 4/10. Took 2.87 seconds. Mini-batch mean squared error on training set is 0.092322; Full-batch train err = 0.083290
epoch 5/10. Took 2.7638 seconds. Mini-batch mean squared error on training set is 0.079848; Full-batch train err = 0.073627
epoch 6/10. Took 2.8281 seconds. Mini-batch mean squared error on training set is 0.071357; Full-batch train err = 0.066496
epoch 7/10. Took 2.8073 seconds. Mini-batch mean squared error on training set is 0.064648; Full-batch train err = 0.059970
epoch 8/10. Took 2.7639 seconds. Mini-batch mean squared error on training set is 0.059448; Full-batch train err = 0.055559
epoch 9/10. Took 2.7906 seconds. Mini-batch mean squared error on training set is 0.055195; Full-batch train err = 0.051803
epoch 10/10. Took 2.7447 seconds. Mini-batch mean squared error on training set is 0.051309; Full-batch train err = 0.048974
ex1: 0.063600

With the initialization above enabled, the results are:
epoch 1/20. Took 6.2769 seconds. Mini-batch mean squared error on training set is 45.038; Full-batch train err = 43.455181
epoch 2/20. Took 6.7989 seconds. Mini-batch mean squared error on training set is 39.5563; Full-batch train err = 31.081169
epoch 3/20. Took 6.4376 seconds. Mini-batch mean squared error on training set is 22.8837; Full-batch train err = 16.479621
epoch 4/20. Took 6.1272 seconds. Mini-batch mean squared error on training set is 14.3783; Full-batch train err = 12.360571
epoch 5/20. Took 6.0906 seconds. Mini-batch mean squared error on training set is 11.248; Full-batch train err = 10.533576
epoch 6/20. Took 6.1384 seconds. Mini-batch mean squared error on training set is 9.9598; Full-batch train err = 9.628474
epoch 7/20. Took 6.4556 seconds. Mini-batch mean squared error on training set is 9.355; Full-batch train err = 9.418593
epoch 8/20. Took 6.5009 seconds. Mini-batch mean squared error on training set is 8.8441; Full-batch train err = 8.640105
epoch 9/20. Took 6.3617 seconds. Mini-batch mean squared error on training set is 8.5479; Full-batch train err = 8.390937
epoch 10/20. Took 6.3585 seconds. Mini-batch mean squared error on training set is 8.3111; Full-batch train err = 8.079984
epoch 11/20. Took 6.3012 seconds. Mini-batch mean squared error on training set is 8.1607; Full-batch train err = 8.024266
epoch 12/20. Took 6.3925 seconds. Mini-batch mean squared error on training set is 8.0145; Full-batch train err = 7.874561
epoch 13/20. Took 6.4009 seconds. Mini-batch mean squared error on training set is 7.906; Full-batch train err = 7.834201
epoch 14/20. Took 6.3638 seconds. Mini-batch mean squared error on training set is 7.8116; Full-batch train err = 7.675205
epoch 15/20. Took 6.333 seconds. Mini-batch mean squared error on training set is 7.7107; Full-batch train err = 7.637490
epoch 16/20. Took 6.2938 seconds. Mini-batch mean squared error on training set is 7.6476; Full-batch train err = 7.514303
epoch 17/20. Took 6.3722 seconds. Mini-batch mean squared error on training set is 7.5598; Full-batch train err = 7.553050
epoch 18/20. Took 6.3658 seconds. Mini-batch mean squared error on training set is 7.4918; Full-batch train err = 7.385632
epoch 19/20. Took 6.3587 seconds. Mini-batch mean squared error on training set is 7.428; Full-batch train err = 7.337475
epoch 20/20. Took 6.2882 seconds. Mini-batch mean squared error on training set is 7.3654; Full-batch train err = 7.221459
epoch 1/10. Took 2.8028 seconds. Mini-batch mean squared error on training set is 0.40361; Full-batch train err = 0.179936
epoch 2/10. Took 2.848 seconds. Mini-batch mean squared error on training set is 0.13758; Full-batch train err = 0.108879
epoch 3/10. Took 2.8721 seconds. Mini-batch mean squared error on training set is 0.097708; Full-batch train err = 0.085454
epoch 4/10. Took 2.8105 seconds. Mini-batch mean squared error on training set is 0.080341; Full-batch train err = 0.072266
epoch 5/10. Took 2.7828 seconds. Mini-batch mean squared error on training set is 0.069557; Full-batch train err = 0.063750
epoch 6/10. Took 2.8505 seconds. Mini-batch mean squared error on training set is 0.062013; Full-batch train err = 0.057262
epoch 7/10. Took 2.8524 seconds. Mini-batch mean squared error on training set is 0.056058; Full-batch train err = 0.052150
epoch 8/10. Took 2.8469 seconds. Mini-batch mean squared error on training set is 0.05162; Full-batch train err = 0.048101
epoch 9/10. Took 2.7587 seconds. Mini-batch mean squared error on training set is 0.04765; Full-batch train err = 0.044479
epoch 10/10. Took 2.7913 seconds. Mini-batch mean squared error on training set is 0.044136; Full-batch train err = 0.041629
ex1: 0.053800

Summary

Drawing some samples at random from the training set and using their pixel data as W1
speeds up the convergence of the sparse autoencoder and yields a better W1,
which in turn gives more accurate results in the subsequent supervised training.
In this run the autoencoder's final training error drops from about 8.63 to 7.37 after 20 epochs, and the test error of the subsequent classifier from 6.36% to 5.38%.
---
I am not sure this conclusion holds in general; the experiments here are far too few.
 

Replies

Uh! This is so hard to understand!

Uh! It looks really hard to follow.

Impressive!

Uh! This is so involved, can anyone actually follow it?

I honestly don't get this.

Uh! An explanation would be nice.