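Introduction

In this post I try out the Fréchet Inception Distance (FID), a metric for evaluating the quality of images generated by a Generative Adversarial Network (GAN) [1]. FID measures the distance between two sets of images. The Inception Score I tried previously rates a set of images on its own, so it can be computed from a single set; FID cannot. Instead, it measures the distance between a set of images drawn from the true distribution that the GAN is meant to reproduce and a set of images drawn from the distribution the GAN actually learned. The smaller the distance, the better the generated images are judged to be. FID is also one of the metrics used in Google Brain's large-scale evaluation of GANs [2].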

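How it is computed

FID is obtained by computing the Fréchet distance [3, 4] on vectors \(h\) taken from the output of an intermediate layer of the Inception model. The Fréchet distance used here is a distance between probability distributions (not to be confused with the Fréchet distance between curves), so it cannot be applied directly to a finite set of vectors \(h\). We therefore assume that the vectors \(h\) obtained from the images follow a multivariate normal distribution. Once a multivariate normal distribution has been fitted to each of the two image sets, the Fréchet distance between the two distributions can be computed. For multivariate normal distributions with known mean vectors and covariance matrices, the Fréchet distance has the closed form given in [5].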

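The authors of [1] have published a concrete implementation in [6]; it is based on TensorFlow. Let \(A\) be a set of images with elements \(a \in A\), and let \(f_{\rm inception} : A \rightarrow H\) be the function that evaluates the Inception model up to the intermediate layer, where \(H\) is the set of vectors \(h\). First, compute the mean vector \(\mu\) and the covariance matrix \(\Sigma\), assuming every element of \(H\) has already been computed with \(f_{\rm inception}\):

\[ \mu = \frac{1}{|A|} \sum_{h \in H} h \]

\[ \Sigma = \frac{1}{|A|-1} \sum_{h \in H} (h-\mu)(h-\mu)^{T} \]

Because we are measuring the distance between distributions estimated from \(H\), \(\Sigma\) is computed as the unbiased covariance matrix, following [6]. Using the sample covariance matrix (dividing by \(|A|\)) instead makes the distance slightly smaller, but that should not matter much when the value is only used as a metric for GAN-generated images.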
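As a small check of the two formulas above, here is a minimal NumPy sketch. The array `H` is a hypothetical stand-in for the Inception features (5 vectors of dimension 3 instead of thousands of 2048-dimensional ones); it only shows that `np.cov(..., rowvar=False)` uses the same \(1/(|A|-1)\) factor, and that the sample covariance differs by the factor \((|A|-1)/|A|\).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for H: 5 feature vectors of dimension 3
H = rng.normal(size=(5, 3))
n = H.shape[0]

# Mean vector and unbiased covariance, written out as in the formulas
mu = H.sum(axis=0) / n
centered = H - mu
sigma = centered.T @ centered / (n - 1)

# np.cov with rowvar=False treats each row as one sample and divides by n - 1
sigma_np = np.cov(H, rowvar=False)
# bias=True gives the sample covariance (divides by n), which is slightly smaller
sigma_biased = np.cov(H, rowvar=False, bias=True)

print(np.allclose(sigma, sigma_np))
print(np.allclose(sigma_biased, sigma * (n - 1) / n))
```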

We want the distance between two image sets \(A_1\) and \(A_2\). With mean vectors \(\mu_1, \mu_2\) and covariance matrices \(\Sigma_1, \Sigma_2\), the squared Fréchet distance is given by

\[ d^2 = \|\mu_1-\mu_2\|^2 + {\rm tr}\left (\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{\frac{1}{2}}\right )\]
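To make the formula concrete, the sketch below evaluates \(d^2\) with NumPy only. It is an illustration rather than the implementation in [6]: instead of forming the matrix square root, it uses the fact that \({\rm tr}\,(\Sigma_1\Sigma_2)^{1/2}\) equals the sum of the square roots of the eigenvalues of \(\Sigma_1\Sigma_2\). The function name and the toy inputs are my own assumptions. For diagonal covariances the trace term reduces to \(\sum_i (\sqrt{a_i} - \sqrt{b_i})^2\), which gives an easy hand check.

```python
import numpy as np

def frechet_distance_sq(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # The eigenvalues of sigma1 @ sigma2 are real and non-negative for
    # positive semi-definite inputs; clip to guard against rounding error.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt

# Toy inputs (hypothetical): 2-dimensional Gaussians with diagonal covariances
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
sigma1 = np.diag([1.0, 4.0])
sigma2 = np.diag([9.0, 16.0])

d2 = frechet_distance_sq(mu1, sigma1, mu2, sigma2)
# Hand check: |mu1 - mu2|^2 + (1 - 3)^2 + (2 - 4)^2 = 5 + 4 + 4
expected = 13.0
print(d2, expected)
```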

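According to p. 7, lines 13-14 of [1], \(h\) is taken from the last pooling layer. This is the pool layer in the third row from the bottom of Table 1 in the Inception-v3 paper [7]. Its output is 1x1x2048, so \(h\) is a 2048-dimensional vector. As [8] also points out, \(d^2\) cannot be computed reliably unless the number of image samples exceeds 2048, because the estimated covariance matrix is otherwise rank deficient, so take care. (Needing a little over 2,000 images just to try the metric out makes it quite a hassle!)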
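The sample-count requirement can be seen on a small scale. In the sketch below, a feature dimension of 8 stands in for 2048 (an assumption made only to keep the example fast), and random vectors stand in for Inception features. The covariance matrix estimated from \(n\) samples has rank at most \(n-1\), so it only reaches full rank once the number of samples exceeds the dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # small stand-in for the 2048-dimensional feature vector

ranks = {}
for n_samples in (5, 8, 9, 50):
    h = rng.normal(size=(n_samples, dim))
    # Rank of the unbiased covariance estimate: min(n_samples - 1, dim)
    ranks[n_samples] = int(np.linalg.matrix_rank(np.cov(h, rowvar=False)))

print(ranks)
```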

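Experimental results

FID is computed on MNIST and on a subset of ImageNet. For MNIST, the first 3000 images of the training set and of the validation set are used. For ImageNet, two sets are used: 2956 images chosen at random from 10 classes, and images taken in order from 6 classes until 2956 images are reached.

The 10 classes are {n02066245, n02096294, n02100735, n02119789, n02123394, n02124075, n02125311, n02417914, n02423022, n02509815}. The 6 classes are {n02066245, n02096294, n02100735, n02119789, n02123394, n02124075}.

The results are shown in the table below.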

\(A_1\)	\(A_2\)	FID (\(d^2\))
MNIST train	MNIST train	6.8959054997e-11
MNIST train	MNIST val	7.05136659744
MNIST val	MNIST train	7.0513665973
MNIST train	ImageNet 10 classes	338.586062646
MNIST train	ImageNet 6 classes	346.218188602
ImageNet 10 classes	ImageNet 6 classes	67.1025959331

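The distance between identical image sets is 0 up to numerical error (row 1). The distance between MNIST train and val is 7.05 (rows 2 and 3). The distance between MNIST and ImageNet is about 340, much larger than the distance within MNIST (rows 4 and 5). The distance between the two ImageNet sets with different numbers of classes is 67.1, larger than the distance between MNIST train and val (row 6).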

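Setting aside the absolute magnitudes, the ordering of the distances is as expected. Specifically:

The distance between identical sets is 0.
The distance between image sets with the same classes is smaller than the distance between image sets whose classes differ somewhat.
The distance between completely different image sets is larger than the distance between image sets whose classes differ somewhat.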
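The same ordering can be reproduced without Inception features. The sketch below is only an illustration under strong assumptions: random 4-dimensional Gaussian vectors stand in for \(H\), and the helper `fid` (my own, NumPy only) computes the trace term from eigenvalues instead of calling `sqrtm`. It checks that the distance of a set to itself is about 0, that the distance is symmetric, and that a larger shift in the distribution gives a larger distance.

```python
import numpy as np

def fid(h1, h2):
    """Squared Frechet distance between Gaussians fitted to two feature sets."""
    mu1, mu2 = h1.mean(axis=0), h2.mean(axis=0)
    s1, s2 = np.cov(h1, rowvar=False), np.cov(h2, rowvar=False)
    eigvals = np.linalg.eigvals(s1 @ s2).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return np.sum((mu1 - mu2) ** 2) + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt

rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(500, 4))       # reference set
same_dist = rng.normal(0.0, 1.0, size=(500, 4))  # same distribution, new samples
near = rng.normal(0.5, 1.0, size=(500, 4))       # slightly shifted distribution
far = rng.normal(3.0, 1.5, size=(500, 4))        # very different distribution

d_self = fid(base, base)
d_same = fid(base, same_dist)
d_near = fid(base, near)
d_far = fid(base, far)
print(d_self, d_same, d_near, d_far)
```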

Code

The code used for this experiment is shown below. It uses Keras. For convenience during the experiment, the computed \(H\) is saved to files first.

# -*- coding: utf-8 -*-
import glob
import numpy as np
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.preprocessing import image
from keras.datasets import mnist
from keras.models import Model
from PIL import Image as pil_image
from scipy.linalg import sqrtm

model = InceptionV3() # Load a model and its weights
# Truncate the network at the last pooling layer; its output is the 2048-dim vector h
model4fid = Model(inputs=model.input, outputs=model.get_layer("avg_pool").output)

def resize_mnist(x):
    x_list = []
    for i in range(x.shape[0]):
        img = image.array_to_img(x[i, :, :, :].reshape(28, 28, -1))
        #img.save("mnist-{0:03d}.png".format(i))
        img = img.resize(size=(299, 299), resample=pil_image.LANCZOS)
        x_list.append(image.img_to_array(img))
    return np.array(x_list)

def resize_do_nothing(x):
    return x

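def frechet_distance(m1, c1, m2, c2):
    # sqrtm can return a complex matrix with a tiny imaginary part caused by
    # numerical error, so keep only the real part
    covmean = sqrtm(np.dot(c1, c2))
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return np.sum((m1 - m2)**2) + np.trace(c1 + c2 - 2*covmean)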

def mean_cov(x):
    mean = np.mean(x, axis=0)
    sigma = np.cov(x, rowvar=False)
    return mean, sigma

def fid(h1, h2):
    m1, c1 = mean_cov(h1)
    m2, c2 = mean_cov(h2)
    return frechet_distance(m1, c1, m2, c2)

def calc_h(x, resizer, batch_size=8):
    r = None
    n_batch = (x.shape[0]+batch_size-1) // batch_size
    for j in range(n_batch):
        x_batch = resizer(x[j*batch_size:(j+1)*batch_size, :, :, :])
        r_batch = model4fid.predict(preprocess_input(x_batch))
        r = r_batch if r is None else np.concatenate([r, r_batch], axis=0)
        if j % 10 == 0:
            print("i =", j)
    return r

def mnist_h(n_train, n_val):
    x = [0, 0]; h = [0, 0]; n = [n_train, n_val]
    (x[0], _), (x[1], _) = mnist.load_data()
    for i in range(2):
        x[i] = np.expand_dims(x[i], axis=3) # shape=(60000, 28, 28) --> (60000, 28, 28, 1)
        x[i] = np.tile(x[i], (1, 1, 1, 3)) # shape=(60000, 28, 28, 1) --> (60000, 28, 28, 3)
        h[i] = calc_h(x[i][0:n[i], :, :, :], resize_mnist)
    return h[0], h[1]

def imagenet_h(files, batch_size=8):
    xs = []; hs = []
    for f in files:
        img = image.load_img(f, target_size=(299, 299))
        x = image.img_to_array(img) # x.shape=(299, 299, 3)
        xs.append(x)
        if len(xs) == batch_size:
            hs.append(calc_h(np.array(xs), resize_do_nothing))
            xs = []
    if len(xs) > 0:
        hs.append(calc_h(np.array(xs), resize_do_nothing))
    return np.concatenate(hs, axis=0)

# Calculate and save H of MNIST
h_train, h_val = mnist_h(3000, 3000)
np.save("mnist_h_train.npy", h_train)
np.save("mnist_h_val.npy", h_val)

# Calculate and save H of the part of Imagenet 
h_imagenet = imagenet_h(glob.glob("from_imagenet/*.jpg")) # 10 classes
h_imagenet_seq = imagenet_h(sorted(glob.glob("from_imagenet_seq/*.jpg"))[0:2956]) # 6 classes
np.save("imagenet_h.npy", h_imagenet)
np.save("imagenet_h_seq.npy", h_imagenet_seq)

# Load H and calculate FID
h_train = np.load("mnist_h_train.npy")
h_val = np.load("mnist_h_val.npy")
h_imagenet = np.load("imagenet_h.npy")
h_imagenet_seq = np.load("imagenet_h_seq.npy")
print("FID between MNIST train and val :", fid(h_train, h_val))
print("FID between MNIST val and train :", fid(h_val, h_train))
print("FID between MNIST train and train :", fid(h_train, h_train))
print("FID between MNIST train and imagenet :", fid(h_train, h_imagenet))
print("FID between MNIST train and imagenet_seq :", fid(h_train, h_imagenet_seq))
print("FID between imagenet and imagenet_seq :", fid(h_imagenet, h_imagenet_seq))

References

[1] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter, "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium," arXiv:1706.08500, 2017, https://arxiv.org/abs/1706.08500
[2] Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet, "Are GANs Created Equal? A Large-Scale Study," arXiv:1711.10337, 2017, https://arxiv.org/abs/1711.10337
[3] Fréchet, M. "Sur la distance de deux lois de probabilité," C. R. Acad. Sci. Paris, 244, 689-692, 1957 (I have not checked the contents myself)
[4] http://www.thothchildren.com/chapter/59b4f81975704408bd430061 (an article explaining the Fréchet distance)
[5] D. C. Dowson and B. V. Landau, "The Fréchet Distance between Multivariate Normal Distributions," Journal of multivariate analysis, 12, 450-455, 1982, http://www.sciencedirect.com/science/article/pii/0047259X8290077X
[6] https://github.com/bioinf-jku/TTUR/blob/master/fid.py (code written by the authors of [1])
[7] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, "Rethinking the Inception Architecture for Computer Vision," arXiv:1512.00567, 2015, https://arxiv.org/abs/1512.00567

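Introduction

I finally got around to trying XGBoost [2], which was apparently popular more than two years ago [1]. It implements the method known as gradient boosting decision trees. Explanations of the method itself are easy to find, so please refer to those. Here I try it on simple data for a regression problem and a classification problem.

Installation

Installation basically follows [2]. I used a Windows environment with the VC2015 command-line tools installed. Assuming the source code has already been obtained, first open the VS2015 x64 Native Tools Command Prompt and run the following commands to build xgboost.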

vcvarsall amd64
cd (path to xgboost)
mkdir build64
cd build64
cmake .. -G "Visual Studio 14 2015 Win64"
cmake --build . --config Release

To use it from Python, run

cd (path to xgboost)
cd python-package
python setup.py install

After this, import xgboost can be used.

Regression

Regression uses XGBRegressor. The experimental data is a sinusoid (a cosine wave in the code below) with added noise.

Code

The code is as follows.

%matplotlib inline
import xgboost as xgb
import numpy as np
import matplotlib.pyplot as plt
from IPython import display
from IPython.display import HTML
# Build two noisy cosine datasets: index 0 is used for training, index 1 for testing
x = []
y = []
for i in range(2):
    xi = np.linspace(0,1,1000) + np.random.normal(0,0.1,1000)
    yi = np.cos(xi*10) + np.random.normal(0,0.1,1000)
    xi = xi.reshape(-1, 1)
    yi = yi.reshape(-1, 1)
    x.append(xi)
    y.append(yi)

mod = xgb.XGBRegressor(learning_rate=0.1, max_depth=2, n_estimators=100)
mod.fit(x[0], y[0])
y_train_pred = mod.predict(x[0])
y_test_pred = mod.predict(x[1])
from sklearn.metrics import mean_squared_error
print('MSE train : {0:.3f}, test : {1:.3f}'.
      format(mean_squared_error(y[0], y_train_pred),
             mean_squared_error(y[1], y_test_pred)))

x_pred = np.linspace(-0.5,1.5,10000).reshape(-1, 1)
y_pred = mod.predict(x_pred)
#plt.scatter(x[0], y[0], s=1)
plt.scatter(x[1], y[1], s=1)
plt.plot(x_pred, y_pred, 'C3')
plt.show()
plt.close()

Results

Running it in a Jupyter notebook gives the following result.

MSE train : 0.009, test : 0.013

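The regression curve for the test data is shown below. It is jagged (strictly speaking a piecewise-constant step function, since the model is a sum of decision trees), but the model has learned to pass through roughly the center of the data.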
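The jagged shape follows from how boosted trees build their prediction. The sketch below is a deliberately simplified gradient boosting loop for squared error using depth-1 trees (stumps), written with NumPy only. It is not XGBoost's actual algorithm, which among other things uses second-order gradients and regularization; all function names here are my own. Each round fits a stump to the current residuals and adds a shrunken copy of it, so the final prediction is a sum of step functions.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split of 1-D x under squared error."""
    order = np.argsort(x)
    xs, rs = x[order], residual[order]
    n = len(xs)
    csum = np.cumsum(rs)
    best_gain, best = -np.inf, (xs[0], rs.mean(), rs.mean())
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        left = csum[i - 1] / i
        right = (csum[-1] - csum[i - 1]) / (n - i)
        # Maximizing this quantity is equivalent to minimizing the squared error
        gain = i * left ** 2 + (n - i) * right ** 2
        if gain > best_gain:
            best_gain = gain
            best = (0.5 * (xs[i - 1] + xs[i]), left, right)
    return best

def fit_boosting(x, y, n_estimators=100, learning_rate=0.1):
    base = y.mean()
    pred = np.full(y.shape, base)
    stumps = []
    for _ in range(n_estimators):
        # For squared error the negative gradient is the residual y - pred
        thr, left, right = fit_stump(x, y - pred)
        pred = pred + learning_rate * np.where(x <= thr, left, right)
        stumps.append((thr, left, right))
    return base, stumps

def predict(model, x, learning_rate=0.1):
    base, stumps = model
    out = np.full(x.shape, base)
    for thr, left, right in stumps:
        out = out + learning_rate * np.where(x <= thr, left, right)
    return out

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 300)
y_train = np.cos(10.0 * x_train) + rng.normal(0.0, 0.1, 300)

model = fit_boosting(x_train, y_train)
train_mse = np.mean((y_train - predict(model, x_train)) ** 2)

# With 100 stumps there are at most 101 distinct prediction levels
x_grid = np.linspace(-0.5, 1.5, 10000)
n_levels = len(np.unique(predict(model, x_grid)))
print(train_mse, n_levels)
```

Outside the range of the training data the prediction stays constant, which is also why tree-based regressors do not extrapolate.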

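Classification

Classification uses XGBClassifier. The data consists of 1000 points sampled from a single Gaussian distribution as class -1, and 500 points sampled from each of two Gaussian distributions as class 1.

Code

The code is as follows.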

%matplotlib inline
import xgboost as xgb
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from IPython import display

# Gaussian distribution for class -1
mu = [0, 0]
sigma = [[30, 0], [0, 50]]
x1 = np.random.multivariate_normal(mu, sigma, 1000) # Train
x1t = np.random.multivariate_normal(mu, sigma, 1000) # Test

# 2 Gaussian distribution for class 1
mu = [5, 5]
sigma = [[5, 0], [0, 3]]
x2 = np.random.multivariate_normal(mu, sigma, 500) # Train
x2t = np.random.multivariate_normal(mu, sigma, 500) # Test

mu = [-10, 15]
sigma = [[3, 2], [2, 3]]
x2 = np.concatenate([x2, np.random.multivariate_normal(mu, sigma, 500)]) # Train
x2t = np.concatenate([x2t, np.random.multivariate_normal(mu, sigma, 500)]) # Test

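# Make labels; train and test share the same layout (1000 of class -1, then 1000 of class 1)
# Note: recent xgboost releases expect labels 0..K-1, so -1/1 may need remapping there
y = np.concatenate([np.ones(1000)*-1, np.ones(1000)*1])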

# classify by xgboost
mod = xgb.XGBClassifier(
        learning_rate=0.1,
        max_depth=2, # Maximum depth of each tree
        n_estimators=100) # The number of trees
x = np.concatenate([x1, x2])
xt = np.concatenate([x1t, x2t])
mod.fit(x, y)
y_pred = mod.predict(x)
yt_pred = mod.predict(xt)

from sklearn.metrics import accuracy_score
# These values are accuracies, not mean squared errors
print('Accuracy train : {0:.3f}, test : {1:.3f}'.
      format(accuracy_score(y, y_pred), accuracy_score(y, yt_pred)))

# Show boundaries between class -1 and 1
gx1 = np.linspace(-20, 20, 1000)
gx2 = np.linspace(-20, 20, 1000)
gx1, gx2 = np.meshgrid(gx1, gx2)
# Predict on every grid point; use a separate name so the labels in y are not overwritten
grid_pred = mod.predict(np.array([gx1, gx2]).T.reshape(-1, 2))
plt.scatter(x1t[:,0], x1t[:,1], s=0.2)
plt.scatter(x2t[:,0], x2t[:,1], s=0.2)
cm = LinearSegmentedColormap.from_list('cm', [(0, 'blue'),(1, 'blue')])
# Level 0 lies between the two labels -1 and 1, so this contour is the decision boundary
plt.contour(gx1, gx2, grid_pred.reshape(1000, -1).T, [0], alpha=1, cmap=cm)
plt.show()

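Results

Running it in a Jupyter notebook gives the result below. The two values are classification accuracies; the "MSE" label in the recorded output is a leftover from the regression example.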

MSE train : 0.926, test : 0.913

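The decision boundary between the classes (blue line) and the test data for class -1 (blue points) and class 1 (orange points) are shown in the figure below. The boundary is jagged because the model is based on decision trees, but visually the classes appear to be separated.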
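Since the score reported above is an accuracy rather than an MSE, it may help to see how the two relate for labels in {-1, 1}. In the small sketch below (the label arrays are made up for illustration), each misclassified point contributes \((\pm 2)^2 = 4\) to the squared error, so MSE = 4 × (1 − accuracy). Accuracy is better when higher, MSE when lower.

```python
import numpy as np

# Hypothetical true and predicted labels in {-1, 1}
y_true = np.array([-1, -1, -1, 1, 1, 1, 1, -1])
y_hat = np.array([-1, 1, -1, 1, 1, -1, 1, -1])

accuracy = np.mean(y_true == y_hat)   # fraction of correct predictions
mse = np.mean((y_true - y_hat) ** 2)  # each error contributes 4

print(accuracy, mse, 4 * (1 - accuracy))
```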

References

[1] http://smrmkt.hatenablog.jp/entry/2015/04/28/210039
[2] http://xgboost.readthedocs.io/en/latest/build.html

bluewidz nota

2017/12/24

Fréchet Inception Distance

2017/12/17

XGBoost
