2013/01/30

[Python] theano.tensor.grad

A function written with theano.tensor can be differentiated automatically.
The catch is that the value the function computes must be a scalar;
if it is a vector or a matrix, grad raises an error.

Here is an example using grad.

import numpy
import theano
import theano.tensor as T

x=T.dscalar('x')
v=T.dvector('v')
M=T.dmatrix('M')

y1=x**3
y2=T.dot(v,v)*x
y3=T.sum(T.dot(M,M))

gy1=T.grad(y1,x)
gy2=T.grad(y2,v)
gy3=T.grad(y3,M)

f1=theano.function([x],gy1)
f2=theano.function([x,v],gy2)
f3=theano.function([M],gy3)

print "f1  :",theano.pp(f1.maker.fgraph.outputs[0])
print "f2  :",theano.pp(f2.maker.fgraph.outputs[0])
print "f3  :",theano.pp(f3.maker.fgraph.outputs[0])

print "f1(2) =",f1(2)
print "f2(2,[3,5]) =",f2(2,[3,5])
print "f3([[1,2],[3,4]]) =",f3([[1,2],[3,4]])

Running this prints:
f1  : Elemwise{Composite{[mul(i0, sqr(i1))]}}(TensorConstant{3.0}, x)
f2  : Elemwise{Composite{[add(*1 -> mul(i0, i1), *1)]}}(x, v)
f3  : gemm_inplace(_dot22(alloc(TensorConstant{(1, 1) of 1.0}, Shape_i{0}(M), Shape_i{1}(M)), M.T), TensorConstant{1.0},
 M.T, alloc(TensorConstant{(1, 1) of 1.0}, Shape_i{0}(M), Shape_i{1}(M)), TensorConstant{1.0})
f1(2) = 12.0
f2(2,[3,5]) = [ 12.  20.]
f3([[1,2],[3,4]]) = [[  7.  11.]
 [  9.  13.]]
Looking at the output for f1: mul is multiplication, sqr is squaring, i0=TensorConstant{3.0},
and i1=x, so the graph computes 3*x^2, which is exactly the derivative of y1 with
respect to x. The graphs for f2 and f3 are too convoluted for me to read, but the
values they return for concrete inputs are correct.
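In fact those values can be verified by hand: d((v.v)*x)/dv = 2*x*v, and for
y3 = sum(M.M) the (i,j) entry of the gradient is (sum of row j of M) + (sum of
column i of M), i.e. M^T.J + J.M^T with J the all-ones matrix. A plain-NumPy
sketch of that check (no Theano involved; the variable names are mine):

import numpy as np

x=2.0
v=np.array([3.0,5.0])
M=np.array([[1.0,2.0],[3.0,4.0]])

print 2*x*v                     # [ 12.  20.]  -- matches f2(2,[3,5])
J=np.ones_like(M)
print M.T.dot(J)+J.dot(M.T)     # [[  7.  11.] [  9.  13.]]  -- matches f3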



As an example that triggers the error, let's try a function whose return value is a vector.
import numpy
import theano
import theano.tensor as T

x=T.dscalar('x')
v=T.dvector('v')
y=v*x
gy=T.grad(y,x)
This fails with
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    gy=T.grad(y,x)
  File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 411, in grad
    raise TypeError("cost must be a scalar.")
TypeError: cost must be a scalar.
so the expression cannot be differentiated as is.
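The usual workaround is to reduce the vector to a scalar first, e.g. with T.sum(),
and differentiate that. A minimal sketch continuing the snippet above (since
sum(v*x) = x*sum(v), the derivative with respect to x is sum(v)):

gy=T.grad(T.sum(y),x)
f=theano.function([x,v],gy)
print "f(2,[3,5]) =",f(2,[3,5])   # -> 8.0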

2013/01/24

[Python] theano.tensor.fill

The behavior of fill was unclear to me, so I experimented with it.

import numpy
import theano
import theano.tensor as T

x=T.dscalar('x')
xc=T.dscalar('xc')
y1=3*x
y2=xc*x
y3=T.fill(xc,3)*x

f1=theano.function([x],y1)
f2=theano.function([x,xc],y2)
f3=theano.function([x,xc],y3)

print "x=",theano.pp(x)
print "y1=",theano.pp(y1)
print "y2=",theano.pp(y2)
print "y3=",theano.pp(y3)
print "f1=",theano.pp(f1.maker.fgraph.outputs[0])
print "f2=",theano.pp(f2.maker.fgraph.outputs[0])
print "f3=",theano.pp(f3.maker.fgraph.outputs[0])
print "f1(2)=",f1(2)
print "f2(2,3)=",f2(2,3)
print "f3(2,3)=",f3(2,3)
print "f3(2,100)=",f3(2,100)

Running this prints:
x= x
y1= (TensorConstant{3} * x)
y2= (xc * x)
y3= (fill(xc, TensorConstant{3}) * x)
f1= (TensorConstant{3.0} * x)
f2= (xc * x)
f3= (TensorConstant{3.0} * x)
f1(2)= 6.0
f2(2,3)= 6.0
f3(2,3)= 6.0
f3(2,100)= 6.0
Since f3(2,3) and f3(2,100) return the same value, fill appears to pin its first
argument to the value of its second argument. What would you use that for?
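Digging a little further: fill(a, b) returns a tensor with the shape and dtype
of a in which every element is b, so the first argument is just a shape template;
with two scalars the templating is invisible. A small sketch with a vector
(variable names are mine) makes it visible:

v=T.dvector('v')
w=T.fill(v,3)               # same shape as v, every element set to 3
g=theano.function([v],w)
print g([1,2,3,4])          # -> [ 3.  3.  3.  3.]

Apparently fill also appears in the unoptimized gradient graphs that T.grad
builds (e.g. fill(cost, 1.0) as the seed gradient), which seems to be its main
customer.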

Incidentally, replacing
f3=theano.function([x,xc],y3)
with
f3=theano.function([x],y3)
fails with
theano.gof.fg.MissingInputError: ('An input of the graph, used to compute Elemwise{second,no_inplace}(xc, TensorConstant{3}), was not provided and not given a value', xc)
So as long as some variable in the graph has no value supplied, the graph
apparently cannot be compiled into a function, even though fill looks like it
assigns the variable a value.
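This is consistent with fill treating xc purely as a shape/dtype template: xc is
still an input of the graph even though its value never matters. One way to avoid
having to pass it (a sketch, not the only option) is to build the graph on a
shared variable, which carries its own value and is not listed among the inputs:

xc=theano.shared(0.0,name='xc')   # the stored value is irrelevant to fill
y3=T.fill(xc,3)*x
f3=theano.function([x],y3)
print "f3(2)=",f3(2)              # -> 6.0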

2013/01/20

The DNN boom

Apparently DNNs are all the rage these days.
A quick search turned up overviews in Japanese:
http://www.slideshare.net/mokemokechicken/pythondeep-learning
http://www.slideshare.net/tushuhei/121227deep-learning-iitsuka

Supposedly it is easy to do with Python + Theano.
Example implementations are at
http://deeplearning.net/tutorial/
so, as a first step, fetch the source from github with git clone,
as described at
http://deeplearning.net/tutorial/gettingstarted.html

As in the tutorial, download the MNIST dataset of handwritten digit images for
the experiments: running
data/download.sh
inside the cloned repository fetches the data.

Also, running

$ cd doc
$ make

builds a PDF version of the tutorial at html/deeplearning.pdf.

I haven't looked into how to use it all yet, but the following seems to run
a quick smoke test:
$ python
>>> import test
>>> test.speed()


The results of the run are below (parts omitted).
-- logistic_sgd log excerpt (float64) --
epoch 30, minibatch 83/83, validation error 8.031250 %
     epoch 30, minibatch 83/83, test error of best model 7.843750 %
Optimization complete with best validation score of 8.031250 %,with test performance 7.843750 %
The code run for 30 epochs, with 1.299264 epochs/sec
The code for file logistic_sgd.pyc ran for 23.1s

-- logistic_cg log excerpt (float64) --
validation error 7.927083 %
Optimization complete with best validation score of 7.927083 %, with test performance 8.041667 %
The code for file logistic_cg.pyc ran for 53.4s

-- mlp log excerpt (float64) --
epoch 5, minibatch 2500/2500, validation error 7.300000 %
     epoch 5, minibatch 2500/2500, test error of best model 7.590000 %
Optimization complete. Best validation score of 7.300000 % obtained at iteration 14999, with test performance 7.590000 %
The code for file mlp.pyc ran for 2.88m

-- convolutional_mlp log excerpt (float64) --
epoch 5, minibatch 100/100, validation error 6.430000 %
     epoch 5, minibatch 100/100, test error of best model 6.920000 %
Optimization complete.
Best validation score of 6.430000 % obtained at iteration 599,with test performance 6.920000 %
The code for file convolutional_mlp.pyc ran for 2.26m

-- dA log (float64) --
... loading data
Training epoch 0, cost  63.2891694201
Training epoch 1, cost  55.7866565443
The no corruption code for file dA.pyc ran for 2.05m
Training epoch 0, cost  81.7714190632
Training epoch 1, cost  73.4285756365
The 30% corruption code for file dA.pyc ran for 2.06m

-- SdA log (float64) --
... loading data
... building the model
... getting the pretraining functions
... pre-training the model
Pre-training layer 0, epoch 0, cost  194.503144937
Pre-training layer 1, epoch 0, cost  695.507788509
Pre-training layer 2, epoch 0, cost  529.03645135
The pretraining code for file SdA.pyc ran for 6.89m
... getting the finetuning functions
... finetunning the model
epoch 0, minibatch 166/166, validation error 14.868687 %
     epoch 0, minibatch 166/166, test error of best model 15.727273 %
epoch 1, minibatch 166/166, validation error 11.595960 %
     epoch 1, minibatch 166/166, test error of best model 11.717172 %
Optimization complete with best validation score of 11.595960 %,with test performance 11.717172 %
The training code for file SdA.pyc ran for 6.70m

-- DBN log (float64) --
... loading data
... building the model
... getting the pretraining functions
... pre-training the model
Pre-training layer 0, epoch 0, cost  -165.566548661
Pre-training layer 1, epoch 0, cost  -620.030667461
Pre-training layer 2, epoch 0, cost  -133.876169806
The pretraining code for file DBN.pyc ran for 7.84m
... getting the finetuning functions
... finetunning the model
epoch 1, minibatch 166/166, validation error 28.373737 %
     epoch 1, minibatch 166/166, test error of best model 29.848485 %
epoch 2, minibatch 166/166, validation error 20.272727 %
     epoch 2, minibatch 166/166, test error of best model 21.424242 %
Optimization complete with best validation score of 20.272727 %,with test performance 21.424242 %
The fine tuning code for file DBN.pyc ran for -64.90m

-- rbm log (float64) --
... loading data
WARNING (theano.tensor.opt): Your current code is fine, but Theano versions prior to 0.5rc2 might have given an incorrect result. To disable this warning, set the Theano flag warn.subtensor_merge_bug to False.

Training epoch 0, cost is  -53.1045703345
Training took 17.727500 minutes
WARNING (theano.tensor.opt): Your current code is fine, but Theano versions prior to 0.5rc2 might have given an incorrect result. To disable this warning, set the Theano flag warn.subtensor_merge_bug to False.
 ... plotting sample  0
['logistic_sgd', 'logistic_cg', 'mlp', 'convolutional_mlp', 'dA', 'SdA', 'DBN', 'rbm']
float64 times [   24.00578618    55.60583591   175.75342607   136.53151321   248.71974111
   823.29348183   874.02775431  1079.12984514]
float64 expected [10.300000000000001, 23.699999999999999, 78.099999999999994, 73.700000000000003, 116.40000000000001, 346.89999999999998, 381.89999999999998, 558.10000000000002]
float64 % expected/get [ 0.42906322  0.42621426  0.44437256  0.53980212  0.46799663  0.42135643
  0.43694265  0.51717595]

-- tail of the log for the float32 run --
 ... plotting sample  0
['logistic_sgd', 'logistic_cg', 'mlp', 'convolutional_mlp', 'dA', 'SdA', 'DBN', 'rbm']
float32 times [   21.94568563    49.98527789   177.91128731   136.48723054   291.68081236
  1048.11308694  1128.05673623  1302.78659558]
float32 expected [11.6, 29.600000000000001, 47.200000000000003, 66.5, 71.0, 191.19999999999999, 226.80000000000001, 432.80000000000001]
float32 % expected/get [ 0.5285777   0.59217436  0.26530076  0.48722507  0.24341677  0.18242306
  0.20105372  0.33221097]
float64/float32 [ 1.09387269  1.11244427  0.98787114  1.00032445  0.85271204  0.78550062
  0.77480833  0.82832434]

Duplicate the timing to have everything in one place
['logistic_sgd', 'logistic_cg', 'mlp', 'convolutional_mlp', 'dA', 'SdA', 'DBN', 'rbm']
float64 times [   24.00578618    55.60583591   175.75342607   136.53151321   248.71974111
   823.29348183   874.02775431  1079.12984514]
float64 expected [10.300000000000001, 23.699999999999999, 78.099999999999994, 73.700000000000003, 116.40000000000001, 346.89999999999998, 381.89999999999998, 558.10000000000002]
float64 % expected/get [ 0.42906322  0.42621426  0.44437256  0.53980212  0.46799663  0.42135643
  0.43694265  0.51717595]
float32 times [   21.94568563    49.98527789   177.91128731   136.48723054   291.68081236
  1048.11308694  1128.05673623  1302.78659558]
float32 expected [11.6, 29.600000000000001, 47.200000000000003, 66.5, 71.0, 191.19999999999999, 226.80000000000001, 432.80000000000001]
float32 % expected/get [ 0.5285777   0.59217436  0.26530076  0.48722507  0.24341677  0.18242306
  0.20105372  0.33221097]
float64/float32 [ 1.09387269  1.11244427  0.98787114  1.00032445  0.85271204  0.78550062
  0.77480833  0.82832434]
expected float64/float32 [ 0.46934054  0.47413961  0.43898283  0.53997725  0.39906636  0.33097574
  0.3385468   0.42838942]
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.

 ... plotting sample  0
['logistic_sgd', 'logistic_cg', 'mlp', 'convolutional_mlp', 'dA', 'SdA', 'DBN', 'rbm']
gpu times [   23.64237881    50.57395744   185.60482621   139.12935448   289.82814217
  1050.09374905  1123.013515    1268.43072391]
gpu expected [3.0766348799999998, 7.5552349100000002, 18.992267850000001, 9.5999999999999996, 24.130070450000002, 20.399999999999999, 56.0, 302.60000000000002]
gpu % expected/get [ 0.1301322   0.14938983  0.10232637  0.06900054  0.08325648  0.01942684
  0.04986583  0.2385625 ]
float64/gpu [ 1.01537102  1.09949545  0.94692272  0.98132787  0.85816284  0.78401903
  0.77828783  0.85075978]

Duplicate the timing to have everything in one place
['logistic_sgd', 'logistic_cg', 'mlp', 'convolutional_mlp', 'dA', 'SdA', 'DBN', 'rbm']
float64 times [   24.00578618    55.60583591   175.75342607   136.53151321   248.71974111
   823.29348183   874.02775431  1079.12984514]
float64 expected [10.300000000000001, 23.699999999999999, 78.099999999999994, 73.700000000000003, 116.40000000000001, 346.89999999999998, 381.89999999999998, 558.10000000000002]
float64 % expected/get [ 0.42906322  0.42621426  0.44437256  0.53980212  0.46799663  0.42135643
  0.43694265  0.51717595]
float32 times [   21.94568563    49.98527789   177.91128731   136.48723054   291.68081236
  1048.11308694  1128.05673623  1302.78659558]
float32 expected [11.6, 29.600000000000001, 47.200000000000003, 66.5, 71.0, 191.19999999999999, 226.80000000000001, 432.80000000000001]
float32 % expected/get [ 0.5285777   0.59217436  0.26530076  0.48722507  0.24341677  0.18242306
  0.20105372  0.33221097]
gpu times [   23.64237881    50.57395744   185.60482621   139.12935448   289.82814217
  1050.09374905  1123.013515    1268.43072391]
gpu expected [3.0766348799999998, 7.5552349100000002, 18.992267850000001, 9.5999999999999996, 24.130070450000002, 20.399999999999999, 56.0, 302.60000000000002]
gpu % expected/get [ 0.1301322   0.14938983  0.10232637  0.06900054  0.08325648  0.01942684
  0.04986583  0.2385625 ]
float64/float32 [ 1.09387269  1.11244427  0.98787114  1.00032445  0.85271204  0.78550062
  0.77480833  0.82832434]
expected float64/float32 [ 0.46934054  0.47413961  0.43898283  0.53997725  0.39906636  0.33097574
  0.3385468   0.42838942]
float64/gpu [ 1.01537102  1.09949545  0.94692272  0.98132787  0.85816284  0.78401903
  0.77828783  0.85075978]
expected float64/gpu [ 0.43565836  0.46862063  0.42078647  0.52972286  0.40161731  0.33035146
  0.34006715  0.4399925 ]
float32/gpu [ 0.92823509  0.98836003  0.95854882  0.98100959  1.00639231  0.99811382
  1.00449079  1.02708534]
expected float32/gpu [ 0.49064437  0.58528147  0.25430373  0.47797246  0.24497276  0.18207898
  0.20195661  0.34120902]
speed_failure_float64=8
speed_failure_float32=8
speed_failure_gpu=8



Since this is Linux on a virtual machine, CUDA is not available...
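For the record, Theano picks the compute device via the THEANO_FLAGS environment
variable, roughly as below; but that still needs a working CUDA toolkit with nvcc
on $PATH, which this VM cannot provide.

$ THEANO_FLAGS='device=gpu,floatX=float32' python
>>> import test
>>> test.speed()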