Introduction

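This post computes the Inception score [1], a metric that is sometimes used to evaluate images generated by a GAN (Generative Adversarial Network). The authors of [1] provide a TensorFlow implementation at [2], and a Chainer implementation is available at [3]. Here I try it with Keras.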

Inception score

The Inception score is designed to be higher the more easily the Inception model can classify the images, and the more varied the predicted labels are.

The score is computed as follows. Let \(x_i\) be the \(i\)-th image, \(y\) a label, and \(p(y|x_i)\) the probability of label \(y\) obtained by feeding the \(i\)-th image to the Inception model. Writing \(X\) for the set of all images used to compute the score, the marginal probability is \[ p(y) = \frac{1}{|X|} \sum_{x_i \in X} p(y|x_i) \] The KL divergence [4] between \(p(y|x_i)\) and \(p(y)\) is \[ D_{\rm KL}(p(y|x_i) || p(y)) = \sum_{y \in Y} p(y|x_i) \log \frac{p(y|x_i)}{p(y)} \] Averaging this over all \(x_i \in X\) and taking the exponential gives the Inception score: \[ \exp\left(\frac{1}{|X|}\sum_{x_i \in X} D_{\rm KL}(p(y|x_i) || p(y))\right) \]
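To make the three formulas concrete, here is a minimal NumPy sketch that evaluates them on a small hand-made probability matrix rather than on real Inception outputs. The function name `inception_score_from_probs` and the 3-class toy matrix are illustrative assumptions and are not part of the code in this post.

```python
import numpy as np

def inception_score_from_probs(p_yx):
    """Inception score from a matrix whose row i holds p(y|x_i).

    Hypothetical helper for illustration: it takes probabilities directly,
    so no Inception model is involved.
    """
    p_yx = np.asarray(p_yx, dtype=np.float64)
    p_y = p_yx.mean(axis=0)  # marginal p(y), averaged over all images
    # Ratio p(y|x_i)/p(y); entries with p(y|x_i) == 0 are set to 1 so that
    # their contribution p*log(ratio) is 0 (the usual 0*log(0) := 0 rule).
    ratio = np.where(p_yx > 0, p_yx / np.where(p_y > 0, p_y, 1.0), 1.0)
    kl = np.sum(p_yx * np.log(ratio), axis=1)  # D_KL(p(y|x_i) || p(y)) per image
    return float(np.exp(kl.mean()))            # exp of the average KL

# Toy input: 3 images, 3 classes, each image fairly confidently a different class.
toy = [[0.90, 0.05, 0.05],
       [0.05, 0.90, 0.05],
       [0.05, 0.05, 0.90]]
toy_score = inception_score_from_probs(toy)
print("toy Inception score:", toy_score)
```

For this toy input the score comes out slightly above 2. With perfectly confident predictions (an identity matrix) it would reach 3, the number of classes, and with uniform rows it would drop to 1.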

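The KL divergence measures how much two distributions differ. Therefore, the more peaked \(p(y|x_i)\) is with respect to \(y\) and the flatter \(p(y)\) is, the larger the difference, and the larger the Inception score. Peaked and flat are relative notions, so what matters is the gap between the two distributions: the more they differ, the larger the Inception score.

(Figure: sketch of the two distributions \(p(y|x_i)\) and \(p(y)\).)

If the images in \(X\) are concentrated in one particular class and are all classified with high probability, \(p(y|x_i)\) is peaked, but in that case \(p(y)\) is a peaked distribution of much the same shape, so the Inception score is small. Conversely, even if the images in \(X\) are concentrated in one particular class, \(p(y|x_i)\) is flat when the Inception model assigns roughly the same probability to every class. Averaging flat distributions again gives a flat distribution, so \(p(y)\) is flat as well and the Inception score is small.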
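The three situations described above can be checked numerically. The sketch below uses an equivalent rewriting of the formula that does not appear in the original post: the logarithm of the score equals the entropy of \(p(y)\) minus the average entropy of \(p(y|x_i)\), so a flat marginal and peaked conditionals both push the score up. The 4-class setup, the helper names, and the peak value 0.97 are assumptions made for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) along the last axis, with 0*log(0) := 0."""
    p = np.asarray(p, dtype=np.float64)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0 where p == 0
    return -np.sum(p * np.log(safe), axis=-1)

def log_inception_score(p_yx):
    """log(IS) = H(p(y)) - mean_i H(p(y|x_i)), equivalent to the mean KL."""
    p_yx = np.asarray(p_yx, dtype=np.float64)
    marginal_entropy = entropy(p_yx.mean(axis=0))   # large when p(y) is flat
    mean_conditional_entropy = entropy(p_yx).mean() # small when p(y|x_i) is peaked
    return marginal_entropy - mean_conditional_entropy

def confident_rows(labels, n_classes=4, peak=0.97):
    """Hypothetical classifier output: `peak` on the given label, rest shared evenly."""
    rest = (1.0 - peak) / (n_classes - 1)
    rows = np.full((len(labels), n_classes), rest)
    rows[np.arange(len(labels)), labels] = peak
    return rows

cases = {
    "confident, varied labels": confident_rows([0, 1, 2, 3, 0, 1, 2, 3]),
    "confident, single label": confident_rows([0] * 8),
    "flat predictions": np.full((8, 4), 0.25),
}
scores = {name: float(np.exp(log_inception_score(p))) for name, p in cases.items()}
for name, score in scores.items():
    print(name, "->", score)
```

Only the first case scores well above 1. The other two collapse to 1 because \(p(y|x_i)\) and \(p(y)\) coincide, whether both are peaked or both are flat.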

Code

The Inception score itself is easy to compute, as the function inception_score shows. The pretrained Inception model has a fixed input size of (299, 299), so the input images have to be resized.

# -*- coding: utf-8 -*-
import glob
import numpy as np
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.applications.imagenet_utils import decode_predictions
from keras.preprocessing import image
from keras.datasets import mnist
from PIL import Image as pil_image

model = InceptionV3() # Load a model and its weights

def resize_mnist(x):
    x_list = []
    for i in range(x.shape[0]):
        img = image.array_to_img(x[i, :, :, :].reshape(28, 28, -1))
        #img.save("mnist-{0:03d}.png".format(i))
        img = img.resize(size=(299, 299), resample=pil_image.LANCZOS)
        x_list.append(image.img_to_array(img))
    return np.array(x_list)

def resize_do_nothing(x):
    return x

def inception_score(x, resizer, batch_size=32, eps=1e-16):
    r_list = []
    n_batch = (x.shape[0]+batch_size-1) // batch_size
    for j in range(n_batch):
        x_batch = resizer(x[j*batch_size:(j+1)*batch_size, :, :, :])
        r_list.append(model.predict(preprocess_input(x_batch))) # p(y|x) for all classes of this batch
    r = np.concatenate(r_list, axis=0) # shape=(n_images, n_classes)
    p_y = np.mean(r, axis=0) # p(y)
    e = r*(np.log(r+eps)-np.log(p_y+eps)) # p(y|x)log(p(y|x)/p(y)); eps guards against log(0)
    e = np.sum(e, axis=1) # KL(x) = Σ_y p(y|x)log(p(y|x)/p(y))
    e = np.mean(e, axis=0) # Average of KL(x) over all images
    return np.exp(e) # Inception score

def mnist_inception_score(n_train):
    (x_train, y_train), (x_val, y_val) = mnist.load_data()
    x_train = np.expand_dims(x_train, axis=3) # shape=(60000, 28, 28) --> (60000, 28, 28, 1)
    x_train = np.tile(x_train, (1, 1, 1, 3)) # shape=(60000, 28, 28, 1) --> (60000, 28, 28, 3)
    return inception_score(x_train[0:n_train, :, :, :], resize_mnist)

def image_inception_score(globfile):
    files = sorted(glob.glob(globfile))
    if not files:
        raise ValueError("No image files match: " + globfile)
    x_list = []
    for f in files:
        img = image.load_img(f, target_size=(299, 299))
        x_list.append(image.img_to_array(img)) # shape=(299, 299, 3)
    xs = np.stack(x_list, axis=0) # Add an axis of batch-size. xs.shape=(n_images, 299, 299, 3)
    return inception_score(xs, resize_do_nothing)

print("Inception score (MNIST, 32):", mnist_inception_score(32))
print("Inception score (MNIST, 320):", mnist_inception_score(320))
print("Inception score (Imagenet, n02066245):", image_inception_score("imagenet/*.jpg"))
print("Inception score (Imagenet, 9 categories):", image_inception_score("imagenet2/*.jpg"))

Results

The Inception score was computed on MNIST and on a small subset of Imagenet images.

Image set	# of images	Inception score
MNIST train	32	1.97646
MNIST train	320	2.30575
Imagenet (n02066245)	32	1.57115
Imagenet (9 classes)	32	8.03765

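The table above shows the results. The MNIST train rows use the first 32 or 320 images of the MNIST training data. Because the Inception model was trained on images of animals, man-made objects, and the like, feeding it handwritten digits gives a low Inception score in itself. Even so, the score rises when more images are used, perhaps because the labels \(y\) predicted for the handwritten digits become more varied.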
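One property that may help when comparing scores across sample sizes (an addition here, not something measured in this post): the average KL divergence in the formula is the mutual information between the image index and the label, so the score can never exceed the number of images \(|X|\), nor the number of classes. With 32 images the score is therefore capped at 32 no matter what the model predicts. The sketch below checks this on synthetic predictions; the helper name and the random softmax outputs are hypothetical stand-ins for real Inception outputs.

```python
import numpy as np

def inception_score_from_probs(p_yx):
    """Inception score from a matrix whose row i holds p(y|x_i) (illustrative helper)."""
    p_yx = np.asarray(p_yx, dtype=np.float64)
    p_y = p_yx.mean(axis=0)
    # Treat 0*log(0) as 0 by forcing the ratio to 1 where p(y|x_i) == 0.
    ratio = np.where(p_yx > 0, p_yx / np.where(p_y > 0, p_y, 1.0), 1.0)
    kl = np.sum(p_yx * np.log(ratio), axis=1)
    return float(np.exp(kl.mean()))

n_classes = 1000  # InceptionV3 with Imagenet weights outputs 1000 classes
rng = np.random.default_rng(0)
results = {}
for n_images in (32, 320):
    # Best case: each image is assigned, with certainty, to a different class.
    best_case = np.zeros((n_images, n_classes))
    best_case[np.arange(n_images), np.arange(n_images)] = 1.0
    # Arbitrary case: softmax of random logits (hypothetical predictions).
    logits = 5.0 * rng.normal(size=(n_images, n_classes))
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    random_case = np.exp(logits)
    random_case /= random_case.sum(axis=1, keepdims=True)
    results[n_images] = (inception_score_from_probs(best_case),
                         inception_score_from_probs(random_case))
    print(n_images, "images: best case", results[n_images][0],
          "random case", results[n_images][1])
```

The best case reaches exactly the number of images, and the random case stays below it. This alone does not explain the MNIST numbers in the table, which are far below either cap, but it is a reason to compare scores only between sets of the same size.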

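Varying the class diversity with Imagenet images confirms that feeding images from several classes (9 in this case) gives a higher Inception score than feeding only images that belong to a single class (n02066245).

(Figures: the images used for n02066245, and the images used for the 9 classes.)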
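As a rough reference point for the two Imagenet rows, consider an idealized classifier that assigns every image to its true class with probability 1. Each KL term then reduces to \(-\log p(y_i)\), so the score becomes the exponential of the entropy of the class frequencies: 1 for a single class and at most 9 for nine classes. The sketch below computes this; the function name and the per-class image counts are hypothetical, since the post does not state how the 32 images were split across the 9 classes.

```python
import numpy as np

def ideal_inception_score(class_counts):
    """Score of a perfectly confident, always-correct classifier (idealization).

    class_counts[k] is the number of images of class k; the result is
    exp(H(p(y))) where p(y) are the class frequencies.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    p_y = counts / counts.sum()
    p_y = p_y[p_y > 0]  # drop empty classes; 0*log(0) contributes nothing
    return float(np.exp(-np.sum(p_y * np.log(p_y))))

single_class = ideal_inception_score([32])                          # 32 images, 1 class
nine_balanced = ideal_inception_score([1] * 9)                      # perfectly balanced
nine_uneven = ideal_inception_score([4, 4, 4, 4, 4, 3, 3, 3, 3])    # hypothetical split of 32
print("single class:", single_class)
print("nine classes, balanced:", nine_balanced)
print("nine classes, hypothetical 32-image split:", nine_uneven)
```

Against these idealized values, the measured 1.57 and 8.04 are close to what the class structure of the two image sets allows.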

Summary

I have not tested this thoroughly, but the general behavior of the Inception score is clear. It is limited to evaluating images, but it looks usable for evaluating GANs.

References

[1] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford and Xi Chen, "Improved Techniques for Training GANs," NIPS2016, pp. 2234-2242, 2016, http://papers.nips.cc/paper/6124-improved-techniques-for-training-gans
[2] https://github.com/openai/improved-gan
[3] https://github.com/hvy/chainer-inception-score
[4] https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

bluewidz nota

2017/12/10

Inception score
