在20.1 節中,我們介紹了 GAN 工作原理背后的基本思想。我們展示了他們可以從一些簡單的、易于采樣的分布中抽取樣本,比如均勻分布或正態分布,并將它們轉換成看起來與某些數據集的分布相匹配的樣本。雖然我們匹配 2D 高斯分布的示例說明了要點,但它并不是特別令人興奮。
在本節中,我們將演示如何使用 GAN 生成逼真的圖像。我們的模型將基于 Radford等人介紹的深度卷積 GAN (DCGAN)。(2015 年)。我們將借用已經證明在判別計算機視覺問題上非常成功的卷積架構,并展示如何通過 GAN 來利用它們來生成逼真的圖像。
import tensorflow as tf
from d2l import tensorflow as d2l
20.2.1。口袋妖怪數據集
我們將使用的數據集是從pokemondb獲得的 Pokemon 精靈的集合 。首先下載、提取和加載此數據集。
Downloading ../data/pokemon.zip from http://d2l-data.s3-accelerate.amazonaws.com/pokemon.zip...
Downloading ../data/pokemon.zip from http://d2l-data.s3-accelerate.amazonaws.com/pokemon.zip...
Downloading ../data/pokemon.zip from http://d2l-data.s3-accelerate.amazonaws.com/pokemon.zip...
Found 40597 files belonging to 721 classes.
我們將每個圖像調整為64×64. 變換ToTensor
會將像素值投影到[0,1],而我們的生成器將使用 tanh 函數獲取輸出 [?1,1]. 因此我們用0.5意味著和0.5標準偏差以匹配值范圍。
batch_size = 256
transformer = torchvision.transforms.Compose([
torchvision.transforms.Resize((64, 64)),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(0.5, 0.5)
])
pokemon.transform = transformer
data_iter = torch.utils.data.DataLoader(
pokemon, batch_size=batch_size,
shuffle=True, num_workers=d2l.get_dataloader_workers())
batch_size = 256
transformer = gluon.data.vision.transforms.Compose([
gluon.data.vision.transforms.Resize(64),
gluon.data.vision.transforms.ToTensor(),
gluon.data.vision.transforms.Normalize(0.5, 0.5)
])
data_iter = gluon.data.DataLoader(
pokemon.transform_first(transformer), batch_size=batch_size,
shuffle=True, num_workers=d2l.get_dataloader_workers())
def transform_func(X):
X = X / 255.
X = (X - 0.5) / (0.5)
return X
# For TF>=2.4 use `num_parallel_calls = tf.data.AUTOTUNE`
data_iter = pokemon.map(lambda x, y: (transform_func(x), y),
num_parallel_calls=tf.data.experimental.AUTOTUNE)
data_iter = data_iter.cache().shuffle(buffer_size=1000).prefetch(
buffer_size=tf.data.experimental.AUTOTUNE)
WARNING:tensorflow:From /home/d2l-worker/miniconda3/envs/d2l-en-release-1/lib/python3.9/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
讓我們想象一下前 20 張圖像。
20.2.2。發電機
生成器需要映射噪聲變量 z∈Rd, 長度-d向量,到具有寬度和高度的 RGB 圖像64×64. 在 14.11 節中我們介紹了全卷積網絡,它使用轉置卷積層(參考 14.10 節)來擴大輸入尺寸。生成器的基本塊包含一個轉置卷積層,然后是批量歸一化和 ReLU 激活。
class G_block(nn.Module):
def __init__(self, out_channels, in_channels=3, kernel_size=4, strides=2,
padding=1, **kwargs):
super(G_block, self).__init__(**kwargs)
self.conv2d_trans = nn.ConvTranspose2d(in_channels, out_channels,
kernel_size, strides, padding, bias=False)
self.batch_norm = nn.BatchNorm2d(out_channels)
self.activation = nn.ReLU()
def forward(self, X):
return self.activation(self.batch_norm(self.conv2d_trans(X)))
class G_block(nn.Block):
def __init__(self, channels, kernel_size=4,
strides=2, padding=1, **kwargs):
super(G_block, self).__init__(**kwargs)
self.conv2d_trans = nn.Conv2DTranspose(
channels, kernel_size, strides, padding, use_bias=False)
self.batch_norm = nn.BatchNorm()
self.activation = nn.Activation('relu')
def forward(self, X):
return self.activation(self.batch_norm(self.conv2d_trans(X)))
class G_block(tf.keras.layers.Layer):
def __init__(self, out_channels, kernel_size=4, strides=2, padding="same",
**kwargs):
super().__init__(**kwargs)
self.conv2d_trans = tf.keras.layers.Conv2DTranspose(
out_channels, kernel_size, strides, padding, use_bias=False)
self.batch_norm = tf.keras.layers.BatchNormalization()
self.activation = tf.keras.layers.ReLU()
def call(self, X):
return self.activation(self.batch_norm(self.conv2d_trans(X)))
默認情況下,轉置卷積層使用 kh=kw=4內核,一個sh=sw=2大步前進,一個 ph=pw=1填充。輸入形狀為 nh′×nw′=16×16,生成器塊將使輸入的寬度和高度加倍。
x = torch.zeros((2, 3, 16, 16))
g_blk = G_block(20)
g_blk(x).shape
torch.Size([2, 20, 32, 32])
如果將轉置卷積層更改為4×4 核心,1×1步幅和零填充。輸入大小為 1×1,輸出的寬度和高度將分別增加 3。
torch.Size([2, 20, 4, 4])
(2, 20, 4, 4)
生成器由四個基本塊組成,將輸入的寬度和高度從 1 增加到 32。同時,它首先將潛在變量投影到64×8通道,然后每次將通道減半。最后,使用轉置卷積層生成輸出。它進一步加倍寬度和高度以匹配所需的64×64形狀,并將通道尺寸減小到 3. tanh 激活函數用于將輸出值投影到(?1,1)范圍。
n_G = 64
net_G = nn.Sequential(
G_block(in_channels=100, out_channels=n_G*8,
strides=1, padding=0), # Output: (64 * 8, 4, 4)
G_block(in_channels=n_G*8, out_channels=n_G*4), # Output: (64 * 4, 8, 8)
G_block(in_channels=n_G*4, out_channels=n_G*2), # Output: (64 * 2, 16, 16)
G_block(in_channels=n_G*2, out_channels=n_G), # Output: (64, 32, 32)
nn.ConvTranspose2d(in_channels=n_G, out_channels=3,
kernel_size=4, stride=2, padding=1, bias=False),
nn.Tanh()) # Output: (3, 64, 64)
n_G = 64
net_G = nn.Sequential()
net_G.add(G_block(n_G*8, strides=1, padding=0), # Output: (64 * 8, 4, 4)
G_block(n_G*4), # Output: (64 * 4, 8, 8)
G_block(n_G*2), # Output: (64 * 2, 16, 16)
G_block(n_G), # Output: (64, 32, 32)
nn.Conv2DTranspose(
3, kernel_size=4, strides=2, padding=1, use_bias=False,
activation='tanh')) # Output: (3, 64, 64)
n_G = 64
net_G = tf.keras.Sequential([
# Output: (4, 4, 64 * 8)
G_block(out_channels=n_G*8, strides=1, padding="valid"),
G_block(out_channels=n_G*4), # Output: (8, 8, 64 * 4)
G_block(out_channels=n_G*2), # Output: (16, 16, 64 * 2)
G_block(out_channels=n_G), # Output: (32, 32, 64)
# Output: (64, 64, 3)
tf.keras.layers.Conv2DTranspose(
3, kernel_size=4, strides=2, padding="same", use_bias=False,
activation="tanh")
])
生成一個 100 維的潛在變量來驗證生成器的輸出形狀。
20.2.3。判別器
判別器是一個普通的卷積網絡,除了它使用一個 leaky ReLU 作為它的激活函數。鑒于 α∈[0,1], 它的定義是
可以看出,如果α=0,以及一個身份函數,如果α=1. 為了α∈(0,1),leaky ReLU 是一個非線性函數,它為負輸入提供非零輸出。它旨在解決“垂死的 ReLU”問題,即神經元可能始終輸出負值,因此無法取得任何進展,因為 ReLU 的梯度為 0。
判別器的基本塊是一個卷積層,然后是一個批量歸一化層和一個 leaky ReLU 激活。卷積層的超參數類似于生成器塊中的轉置卷積層。
class D_block(nn.Module):
def __init__(self, out_channels, in_channels=3, kernel_size=4, strides=2,
padding=1, alpha=0.2, **kwargs):
super(D_block, self).__init__(**kwargs)
self.conv2d = nn.Conv2d(in_channels, out_channels, kernel_size,
strides, padding, bias=False)
self.batch_norm = nn.BatchNorm2d(out_channels)
self.activation = nn.LeakyReLU(alpha, inplace=True)
def forward(self, X):
return self.activation(self.batch_norm(self.conv2d(X)))
class D_block(nn.Block):
def __init__(self, channels, kernel_size=4, strides=2,
padding=1, alpha=0.2, **kwargs):
super(D_block, self).__init__(**kwargs)
self.conv2d = nn.Conv2D(
channels, kernel_size, strides, padding, use_bias=False)
self.batch_norm = nn.BatchNorm()
self.activation = nn.LeakyReLU(alpha)
def forward(self, X):
return self.activation(self.batch_norm(self.conv2d(X)))
class D_block(tf.keras.layers.Layer):
def __init__(self, out_channels, kernel_size=4, strides=2, padding="same",
alpha=0.2, **kwargs):
super().__init__(**kwargs)
self.conv2d = tf.keras.layers.Conv2D(out_channels, kernel_size,
strides, padding, use_bias=False)
self.batch_norm = tf.keras.layers.BatchNormalization()
self.activation = tf.keras.layers.LeakyReLU(alpha)
def call(self, X):
return self.activation(self.batch_norm(self.conv2d(X)))
正如我們在第 7.3 節中演示的那樣,具有默認設置的基本塊會將輸入的寬度和高度減半。例如,給定一個輸入形狀nh=nw=16, 具有內核形狀 kh=kw=4, 步幅sh=sw=2和填充形狀ph=pw=1,輸出形狀將是:
鑒別器是生成器的鏡像。
n_D = 64
net_D = nn.Sequential(
D_block(n_D), # Output: (64, 32, 32)
D_block(in_channels=n_D, out_channels=n_D*2), # Output: (64 * 2, 16, 16)
D_block(in_channels=n_D*2, out_channels=n_D*4), # Output: (64 * 4, 8, 8)
D_block(in_channels=n_D*4, out_channels=n_D*8), # Output: (64 * 8, 4, 4)
nn.Conv2d(in_channels=n_D*8, out_channels=1,
kernel_size=4, bias=False)) # Output: (1, 1, 1)
n_D = 64
net_D = tf.keras.Sequential([
D_block(n_D), # Output: (32, 32, 64)
D_block(out_channels=n_D*2), # Output: (16, 16, 64 * 2)
D_block(out_channels=n_D*4), # Output: (8, 8, 64 * 4)
D_block(out_channels=n_D*8), # Outupt: (4, 4, 64 * 64)
# Output: (1, 1, 1)
tf.keras.layers.Conv2D(1, kernel_size=4, use_bias=False)
])
它使用帶有輸出通道的卷積層1作為獲得單個預測值的最后一層。
20.2.4。訓練
與第 20.1 節中的基本 GAN 相比,我們對生成器和鑒別器使用相同的學習率,因為它們彼此相似。此外,我們改變β1在 Adam 中(第 12.10 節)來自0.9到0.5. 它降低了動量的平滑度,即過去梯度的指數加權移動平均值,以處理快速變化的梯度,因為生成器和鑒別器相互爭斗。此外,隨機生成的噪聲Z
是一個 4-D 張量,我們正在使用 GPU 來加速計算。
def train(net_D, net_G, data_iter, num_epochs, lr, latent_dim,
device=d2l.try_gpu()):
loss = nn.BCEWithLogitsLoss(reduction='sum')
for w in net_D.parameters():
nn.init.normal_(w, 0, 0.02)
for w in net_G.parameters():
nn.init.normal_(w, 0, 0.02)
net_D, net_G = net_D.to(device), net_G.to(device)
trainer_hp = {'lr': lr, 'betas': [0.5,0.999]}
trainer_D = torch.optim.Adam(net_D.parameters(), **trainer_hp)
trainer_G = torch.optim.Adam(net_G.parameters(), **trainer_hp)
animator = d2l.Animator(xlabel='epoch', ylabel='loss',
xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
legend=['discriminator', 'generator'])
animator.fig.subplots_adjust(hspace=0.3)
for epoch in range(1, num_epochs + 1):
# Train one epoch
timer = d2l.Timer()
metric = d2l.Accumulator(3) # loss_D, loss_G, num_examples
for X, _ in data_iter:
batch_size = X.shape[0]
Z = torch.normal(0, 1, size=(batch_size, latent_dim, 1, 1))
X, Z = X.to(device), Z.to(device)
metric.add(d2l.update_D(X, Z, net_D, net_G, loss, trainer_D),
d2l.update_G(Z, net_D, net_G, loss, trainer_G),
batch_size)
# Show generated examples
Z = torch.normal(0, 1, size=(21, latent_dim, 1, 1), device=device)
# Normalize the synthetic data to N(0, 1)
fake_x = net_G(Z).permute(0, 2, 3, 1) / 2 + 0.5
imgs = torch.cat(
[torch.cat([
fake_x[i * 7 + j].cpu().detach() for j in range(7)], dim=1)
for i in range(len(fake_x)//7)], dim=0)
animator.axes[1].cla()
animator.axes[1].imshow(imgs)
# Show the losses
loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
animator.add(epoch, (loss_D, loss_G))
print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
f'{metric[2] / timer.stop():.1f} examples/sec on {str(device)}')
def train(net_D, net_G, data_iter, num_epochs, lr, latent_dim,
device=d2l.try_gpu()):
loss = gluon.loss.SigmoidBCELoss()
net_D.initialize(init=init.Normal(0.02), force_reinit=True, ctx=device)
net_G.initialize(init=init.Normal(0.02), force_reinit=True, ctx=device)
trainer_hp = {'learning_rate': lr, 'beta1': 0.5}
trainer_D = gluon.Trainer(net_D.collect_params(), 'adam', trainer_hp)
trainer_G = gluon.Trainer(net_G.collect_params(), 'adam', trainer_hp)
animator = d2l.Animator(xlabel='epoch', ylabel='loss',
xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
legend=['discriminator', 'generator'])
animator.fig.subplots_adjust(hspace=0.3)
for epoch in range(1, num_epochs + 1):
# Train one epoch
timer = d2l.Timer()
metric = d2l.Accumulator(3) # loss_D, loss_G, num_examples
for X, _ in data_iter:
batch_size = X.shape[0]
Z = np.random.normal(0, 1, size=(batch_size, latent_dim, 1, 1))
X, Z = X.as_in_ctx(device), Z.as_in_ctx(device),
metric.add(d2l.update_D(X, Z, net_D, net_G, loss, trainer_D),
d2l.update_G(Z, net_D, net_G, loss, trainer_G),
batch_size)
# Show generated examples
Z = np.random.normal(0, 1, size=(21, latent_dim, 1, 1), ctx=device)
# Normalize the synthetic data to N(0, 1)
fake_x = net_G(Z).transpose(0, 2, 3, 1) / 2 + 0.5
imgs = np.concatenate(
[np.concatenate([fake_x[i * 7 + j] for j in range(7)], axis=1)
for i in range(len(fake_x)//7)], axis=0)
animator.axes[1].cla()
animator.axes[1].imshow(imgs.asnumpy())
# Show the losses
loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
animator.add(epoch, (loss_D, loss_G))
print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
f'{metric[2] / timer.stop():.1f} examples/sec on {str(device)}')
def train(net_D, net_G, data_iter, num_epochs, lr, latent_dim,
device=d2l.try_gpu()):
loss = tf.keras.losses.BinaryCrossentropy(
from_logits=True, reduction=tf.keras.losses.Reduction.SUM)
for w in net_D.trainable_variables:
w.assign(tf.random.normal(mean=0, stddev=0.02, shape=w.shape))
for w in net_G.trainable_variables:
w.assign(tf.random.normal(mean=0, stddev=0.02, shape=w.shape))
optimizer_hp = {"lr": lr, "beta_1": 0.5, "beta_2": 0.999}
optimizer_D = tf.keras.optimizers.Adam(**optimizer_hp)
optimizer_G = tf.keras.optimizers.Adam(**optimizer_hp)
animator = d2l.Animator(xlabel='epoch', ylabel='loss',
xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
legend=['discriminator', 'generator'])
animator.fig.subplots_adjust(hspace=0.3)
for epoch in range(1, num_epochs + 1):
# Train one epoch
timer = d2l.Timer()
metric = d2l.Accumulator(3) # loss_D, loss_G, num_examples
for X, _ in data_iter:
batch_size = X.shape[0]
Z = tf.random.normal(mean=0, stddev=1,
shape=(batch_size, 1, 1, latent_dim))
metric.add(d2l.update_D(X, Z, net_D, net_G, loss, optimizer_D),
d2l.update_G(Z, net_D, net_G, loss, optimizer_G),
batch_size)
# Show generated examples
Z = tf.random.normal(mean=0, stddev=1, shape=(21, 1, 1, latent_dim))
# Normalize the synthetic data to N(0, 1)
fake_x = net_G(Z) / 2 + 0.5
imgs = tf.concat([tf.concat([fake_x[i * 7 + j] for j in range(7)],
axis=1)
for i in range(len(fake_x) // 7)], axis=0)
animator.axes[1].cla()
animator.axes[1].imshow(imgs)
# Show the losses
loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
animator.add(epoch, (loss_D, loss_G))
print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
f'{metric[2] / timer.stop():.1f} examples/sec on {str(device._device_name)}')
我們用少量的 epochs 訓練模型只是為了演示。為了獲得更好的性能,可以將變量num_epochs
設置為更大的數字。
loss_D 0.030, loss_G 7.203, 1026.4 examples/sec on cuda:0
loss_D 0.224, loss_G 6.386, 2260.7 examples/sec on gpu(0)
20.2.5。概括
-
DCGAN 架構有四個用于鑒別器的卷積層和四個用于生成器的“分數步”卷積層。
-
鑒別器是一個 4 層跨步卷積,具有批量歸一化(除了它的輸入層)和 leaky ReLU 激活。
-
Leaky ReLU 是一種非線性函數,可為負輸入提供非零輸出。它旨在解決“垂死的 ReLU”問題,并幫助梯度更容易地通過架構。
20.2.6. 練習
-
如果我們使用標準 ReLU 激活而不是 leaky ReLU 會發生什么?
-
在 Fashion-MNIST 上應用 DCGAN,看看哪個類別效果好,哪個效果不好。
評論