Optimizing Global Contrastive Loss with Small Batch Size (SogCLR)
Author: Zhuoning Yuan, Tianbao Yang
Introduction
In this tutorial, you will learn how to train a self-supervised model by optimizing the Global Contrastive Loss (GCLoss) on CIFAR10/CIFAR100. This version is implemented in PyTorch based on MoCo's codebase. It is recommended to run this notebook in a GPU-enabled environment, e.g., Google Colab. For training on ImageNet-1K, please refer to this GitHub repo.
Reference
If you find this tutorial helpful in your work, please cite our library paper and the following paper:
@inproceedings{yuan2022provable,
  title={Provable stochastic optimization for global contrastive learning: Small batch does not harm performance},
  author={Yuan, Zhuoning and Wu, Yuexin and Qiu, Zi-Hao and Du, Xianzhi and Zhang, Lijun and Zhou, Denny and Yang, Tianbao},
  booktitle={International Conference on Machine Learning},
  pages={25760--25782},
  year={2022},
  organization={PMLR}
}
Install LibAUC
Let's start by installing our library. In this tutorial, we will use the latest version of LibAUC, installed via pip install -U.
!pip install -U libauc
Importing LibAUC
Importing related packages
import libauc
from libauc.models import resnet50, resnet18
from libauc.datasets import CIFAR100
from libauc.optimizers import SogCLR
from libauc.losses import GCLoss
import torch
import torchvision.transforms as transforms
import torch.nn as nn
import numpy as np
import os, math, shutil
Reproducibility
The following function set_all_seeds limits the sources of randomness, such as model initialization and data shuffling. However, completely reproducible results are not guaranteed across PyTorch releases [Ref].
def set_all_seeds(SEED):
    # REPRODUCIBILITY
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.cuda.manual_seed(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
Global Contrastive Loss
Global Contrastive Loss (GCLoss) aims to maximize the similarity between an anchor image \(\mathbf{x}_i\) and its corresponding positive image \(\mathbf{x}_i^{+}\), while minimizing the similarity between the anchor and a set of negative samples \(\mathbf{S}_i^{-}\). Unlike the standard mini-batch contrastive loss, the negative samples in GCLoss are drawn from the full dataset rather than from the current mini-batch. For more details about the formulation of GCLoss, please refer to the SogCLR paper.
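As a rough sketch (see the paper for the exact definition and constants), the loss for a single anchor has the form
\[
L_i(\mathbf{w}) \;=\; \tau \log \Big( \tfrac{1}{|\mathbf{S}_i^{-}|} \sum_{\mathbf{x}_j \in \mathbf{S}_i^{-}} \exp\big( (h_i^{\top} h_j - h_i^{\top} h_i^{+}) / \tau \big) \Big),
\]
where \(h_i\), \(h_i^{+}\) and \(h_j\) denote the (normalized) representations of the anchor, its positive and a negative sample, and \(\tau\) is the temperature; the global objective averages \(L_i\) over all training images. Because the average inside the logarithm runs over the full dataset, SogCLR maintains a per-sample moving-average estimate of it, controlled by the gamma hyper-parameter below, which is also why GCLoss takes the sample index as an input.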
Hyper-parameters
# model: non-linear projection layer
num_proj_layers = 2
dim = 256
mlp_dim = 2048
# dataset: cifar100
data_name = 'cifar100'
batch_size = 256
# optimizer
weight_decay = 1e-6
init_lr = 0.075
epochs = 200
warmup_epochs = 10
# dynamic loss
gamma = 0.9
temperature = 0.5
# path
logdir = './logs/'
logname = 'resnet18_cifar100'
os.makedirs(os.path.join(logdir, logname), exist_ok=True)
Dataset Pipeline for Contrastive Learning
The dataset pipeline presented here differs from the standard pipeline in three ways. First, TwoCropsTransform generates two randomly augmented crops of a single image to construct pairwise samples, as opposed to a single random crop in the standard pipeline. Second, the augmentation follows SimCLR's implementation. Lastly, libauc.datasets.CIFAR100 returns the index of each image along with the image and its label.
class TwoCropsTransform:
    """Take two random crops of one image."""
    def __init__(self, base_transform1, base_transform2):
        self.base_transform1 = base_transform1
        self.base_transform2 = base_transform2

    def __call__(self, x):
        im1 = self.base_transform1(x)
        im2 = self.base_transform2(x)
        return [im1, im2]
image_size = 32
mean = [0.4914, 0.4822, 0.4465]
std = [0.2470, 0.2435, 0.2616]
normalize = transforms.Normalize(mean=mean, std=std)
# SimCLR augmentations
augmentation = [
    transforms.RandomResizedCrop(image_size, scale=(0.08, 1.)),
    transforms.RandomApply([
        transforms.ColorJitter(0.4, 0.4, 0.2, 0.1)  # not strengthened
    ], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize
]
DATA_ROOT = './'
train_dataset = libauc.datasets.CIFAR100(
    root=DATA_ROOT, train=True, download=True, return_index=True,
    transform=TwoCropsTransform(
        transforms.Compose(augmentation),
        transforms.Compose(augmentation)
    )
)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=4,
    drop_last=True
)
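Optionally, you can check the batch structure that the pretraining loop below expects: a list with the two augmented views, the class labels (unused during pretraining), and the sample indices. A minimal sanity check, assuming the loader defined above:
views, targets, index = next(iter(train_loader))
print(len(views), views[0].shape, views[1].shape)   # expected: 2 torch.Size([256, 3, 32, 32]) torch.Size([256, 3, 32, 32])
print(index.shape)                                  # expected: torch.Size([256])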
Helper functions
def build_mlp(num_layers, input_dim, mlp_dim, output_dim, last_bn=True):
    mlp = []
    for l in range(num_layers):
        dim1 = input_dim if l == 0 else mlp_dim
        dim2 = output_dim if l == num_layers - 1 else mlp_dim
        mlp.append(nn.Linear(dim1, dim2, bias=False))
        if l < num_layers - 1:
            mlp.append(nn.BatchNorm1d(dim2))
            mlp.append(nn.ReLU(inplace=True))
        elif last_bn:
            # Follow SimCLR's design:
            # https://github.com/google-research/simclr/blob/master/model_util.py#L157
            # For simplicity, we further removed gamma in BN
            mlp.append(nn.BatchNorm1d(dim2, affine=False))
    return nn.Sequential(*mlp)
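# Optional illustration: with the hyper-parameters above and ResNet-18's 512-d features
# (the 512 here is an assumption about the backbone, confirmed when the encoder is built below),
# build_mlp yields Linear(512, 2048) -> BN -> ReLU -> Linear(2048, 256) -> BN(affine=False).
proj_head = build_mlp(num_proj_layers, 512, mlp_dim, dim)
print(proj_head(torch.randn(8, 512)).shape)   # expected: torch.Size([8, 256])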
def adjust_learning_rate(optimizer, epoch, init_lr=0.075):
    """Decays the learning rate with half-cycle cosine after warmup."""
    if epoch < warmup_epochs:
        lr = init_lr * epoch / warmup_epochs
    else:
        lr = init_lr * 0.5 * (1. + math.cos(math.pi * (epoch - warmup_epochs) / (epochs - warmup_epochs)))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
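# Optional illustration of the schedule: linear warmup for the first warmup_epochs epochs,
# then half-cycle cosine decay. A throwaway optimizer is used here only to print a few values;
# it is not part of the training code.
dummy_opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=init_lr)
for e in [0, 5, warmup_epochs, epochs // 2, epochs - 1]:
    print(e, round(adjust_learning_rate(dummy_opt, e), 4))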
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'model_best.pth.tar')
def train(train_loader, model, loss_fn, optimizer, epoch):
    model.train()
    iters_per_epoch = len(train_loader)
    for i, (images, _, index) in enumerate(train_loader):
        lr = adjust_learning_rate(optimizer, epoch + i / iters_per_epoch)
        images[0] = images[0].cuda()
        images[1] = images[1].cuda()
        with torch.cuda.amp.autocast(True):
            hidden1 = model(images[0])
            hidden2 = model(images[1])
            loss = loss_fn(hidden1, hidden2, index)
        optimizer.zero_grad()
        # `scaler` is the global GradScaler created before pretraining starts
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    print(f'Epoch: {epoch}, Dynamic Loss: {loss:.3f}')
Creating Model & Optimizer
set_all_seeds(123)
# ResNet-18 + 2-layer non-linear projection head
base_encoder = resnet18(
    pretrained=False, last_activation=None, num_classes=128
)
hidden_dim = base_encoder.fc.weight.shape[1]
del base_encoder.fc # Remove original fc layer
base_encoder.fc = build_mlp(num_proj_layers, hidden_dim, mlp_dim, dim)
base_encoder.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
base_encoder.maxpool = nn.Identity()
model = base_encoder.cuda()
# square root lr scaling
lr = init_lr * math.sqrt(batch_size)
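# with batch_size = 256, this is 0.075 * sqrt(256) = 1.2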
# LARS optimizer
optimizer = libauc.optimizers.SogCLR(
    base_encoder.parameters(),
    mode='lars',
    lr=lr,
    weight_decay=weight_decay,
    momentum=0.9
)
# Global Contrastive Loss
loss_fn = GCLoss('unimodal', N=50000, tau=temperature, gamma=gamma, distributed=False)
Pretraining
# mixed precision training
scaler = torch.cuda.amp.GradScaler()
print ('Pretraining')
for epoch in range(epochs):
    if epoch in [int(epochs * 0.5), int(epochs * 0.75)]:
        optimizer.update_regularizer()
    # train for one epoch
    train(train_loader, model, loss_fn, optimizer, epoch)
    # save checkpoint
    if epoch % 10 == 0 or epochs - epoch < 3:
        save_checkpoint(
            {'epoch': epoch + 1,
             'arch': 'resnet18',
             'state_dict': model.state_dict(),
             'optimizer': optimizer.state_dict(),
             'scaler': scaler.state_dict(),
             },
            is_best=False,
            filename=os.path.join(logdir, logname, f'checkpoint_{epoch:04d}.pth.tar'))
Pretraining
Epoch: 0, Dynamic Loss: -0.236
Epoch: 1, Dynamic Loss: -0.328
Epoch: 2, Dynamic Loss: -0.477
Epoch: 3, Dynamic Loss: -0.501
Epoch: 4, Dynamic Loss: -0.598
Epoch: 5, Dynamic Loss: -0.656
Epoch: 6, Dynamic Loss: -0.659
Epoch: 7, Dynamic Loss: -0.710
Epoch: 8, Dynamic Loss: -0.763
Epoch: 9, Dynamic Loss: -0.810
Epoch: 10, Dynamic Loss: -0.825
Epoch: 11, Dynamic Loss: -0.877
Epoch: 12, Dynamic Loss: -0.829
Epoch: 13, Dynamic Loss: -0.858
Epoch: 14, Dynamic Loss: -0.872
Epoch: 15, Dynamic Loss: -0.889
Epoch: 16, Dynamic Loss: -0.877
Epoch: 17, Dynamic Loss: -0.911
Epoch: 18, Dynamic Loss: -0.953
Epoch: 19, Dynamic Loss: -0.950
Epoch: 20, Dynamic Loss: -0.966
Epoch: 21, Dynamic Loss: -0.917
Epoch: 22, Dynamic Loss: -0.961
Epoch: 23, Dynamic Loss: -1.001
Epoch: 24, Dynamic Loss: -1.060
Epoch: 25, Dynamic Loss: -0.989
Epoch: 26, Dynamic Loss: -0.971
Epoch: 27, Dynamic Loss: -1.061
Epoch: 28, Dynamic Loss: -0.985
Epoch: 29, Dynamic Loss: -1.012
Epoch: 30, Dynamic Loss: -1.064
Epoch: 31, Dynamic Loss: -1.031
Epoch: 32, Dynamic Loss: -1.025
Epoch: 33, Dynamic Loss: -1.081
Epoch: 34, Dynamic Loss: -1.091
Epoch: 35, Dynamic Loss: -1.084
Epoch: 36, Dynamic Loss: -1.017
Epoch: 37, Dynamic Loss: -1.061
Epoch: 38, Dynamic Loss: -1.066
Epoch: 39, Dynamic Loss: -1.018
Epoch: 40, Dynamic Loss: -1.064
Epoch: 41, Dynamic Loss: -1.041
Epoch: 42, Dynamic Loss: -1.106
Epoch: 43, Dynamic Loss: -1.067
Epoch: 44, Dynamic Loss: -1.114
Epoch: 45, Dynamic Loss: -1.066
Epoch: 46, Dynamic Loss: -1.067
Epoch: 47, Dynamic Loss: -1.136
Epoch: 48, Dynamic Loss: -1.113
Epoch: 49, Dynamic Loss: -1.116
Epoch: 50, Dynamic Loss: -1.144
Epoch: 51, Dynamic Loss: -1.170
Epoch: 52, Dynamic Loss: -1.145
Epoch: 53, Dynamic Loss: -1.157
Epoch: 54, Dynamic Loss: -1.157
Epoch: 55, Dynamic Loss: -1.162
Epoch: 56, Dynamic Loss: -1.167
Epoch: 57, Dynamic Loss: -1.158
Epoch: 58, Dynamic Loss: -1.141
Epoch: 59, Dynamic Loss: -1.218
Epoch: 60, Dynamic Loss: -1.214
Epoch: 61, Dynamic Loss: -1.173
Epoch: 62, Dynamic Loss: -1.168
Epoch: 63, Dynamic Loss: -1.124
Epoch: 64, Dynamic Loss: -1.151
Epoch: 65, Dynamic Loss: -1.123
Epoch: 66, Dynamic Loss: -1.177
Epoch: 67, Dynamic Loss: -1.158
Epoch: 68, Dynamic Loss: -1.166
Epoch: 69, Dynamic Loss: -1.160
Epoch: 70, Dynamic Loss: -1.147
Epoch: 71, Dynamic Loss: -1.140
Epoch: 72, Dynamic Loss: -1.168
Epoch: 73, Dynamic Loss: -1.207
Epoch: 74, Dynamic Loss: -1.193
Epoch: 75, Dynamic Loss: -1.238
Epoch: 76, Dynamic Loss: -1.136
Epoch: 77, Dynamic Loss: -1.173
Epoch: 78, Dynamic Loss: -1.178
Epoch: 79, Dynamic Loss: -1.169
Epoch: 80, Dynamic Loss: -1.148
Epoch: 81, Dynamic Loss: -1.258
Epoch: 82, Dynamic Loss: -1.156
Epoch: 83, Dynamic Loss: -1.141
Epoch: 84, Dynamic Loss: -1.219
Epoch: 85, Dynamic Loss: -1.215
Epoch: 86, Dynamic Loss: -1.223
Epoch: 87, Dynamic Loss: -1.153
Epoch: 88, Dynamic Loss: -1.196
Epoch: 89, Dynamic Loss: -1.201
Epoch: 90, Dynamic Loss: -1.198
Epoch: 91, Dynamic Loss: -1.200
Epoch: 92, Dynamic Loss: -1.219
Epoch: 93, Dynamic Loss: -1.243
Epoch: 94, Dynamic Loss: -1.207
Epoch: 95, Dynamic Loss: -1.255
Epoch: 96, Dynamic Loss: -1.235
Epoch: 97, Dynamic Loss: -1.231
Epoch: 98, Dynamic Loss: -1.188
Epoch: 99, Dynamic Loss: -1.223
Epoch: 100, Dynamic Loss: -1.147
Epoch: 101, Dynamic Loss: -1.275
Epoch: 102, Dynamic Loss: -1.248
Epoch: 103, Dynamic Loss: -1.203
Epoch: 104, Dynamic Loss: -1.259
Epoch: 105, Dynamic Loss: -1.168
Epoch: 106, Dynamic Loss: -1.216
Epoch: 107, Dynamic Loss: -1.274
Epoch: 108, Dynamic Loss: -1.196
Epoch: 109, Dynamic Loss: -1.253
Epoch: 110, Dynamic Loss: -1.249
Epoch: 111, Dynamic Loss: -1.230
Epoch: 112, Dynamic Loss: -1.183
Epoch: 113, Dynamic Loss: -1.267
Epoch: 114, Dynamic Loss: -1.194
Epoch: 115, Dynamic Loss: -1.223
Epoch: 116, Dynamic Loss: -1.209
Epoch: 117, Dynamic Loss: -1.214
Epoch: 118, Dynamic Loss: -1.197
Epoch: 119, Dynamic Loss: -1.265
Epoch: 120, Dynamic Loss: -1.245
Epoch: 121, Dynamic Loss: -1.196
Epoch: 122, Dynamic Loss: -1.228
Epoch: 123, Dynamic Loss: -1.262
Epoch: 124, Dynamic Loss: -1.247
Epoch: 125, Dynamic Loss: -1.224
Epoch: 126, Dynamic Loss: -1.242
Epoch: 127, Dynamic Loss: -1.261
Epoch: 128, Dynamic Loss: -1.268
Epoch: 129, Dynamic Loss: -1.240
Epoch: 130, Dynamic Loss: -1.272
Epoch: 131, Dynamic Loss: -1.245
Epoch: 132, Dynamic Loss: -1.259
Epoch: 133, Dynamic Loss: -1.245
Epoch: 134, Dynamic Loss: -1.292
Epoch: 135, Dynamic Loss: -1.231
Epoch: 136, Dynamic Loss: -1.212
Epoch: 137, Dynamic Loss: -1.250
Epoch: 138, Dynamic Loss: -1.246
Epoch: 139, Dynamic Loss: -1.209
Epoch: 140, Dynamic Loss: -1.250
Epoch: 141, Dynamic Loss: -1.269
Epoch: 142, Dynamic Loss: -1.281
Epoch: 143, Dynamic Loss: -1.270
Epoch: 144, Dynamic Loss: -1.310
Epoch: 145, Dynamic Loss: -1.258
Epoch: 146, Dynamic Loss: -1.290
Epoch: 147, Dynamic Loss: -1.287
Epoch: 148, Dynamic Loss: -1.276
Epoch: 149, Dynamic Loss: -1.226
Epoch: 150, Dynamic Loss: -1.221
Epoch: 151, Dynamic Loss: -1.225
Epoch: 152, Dynamic Loss: -1.251
Epoch: 153, Dynamic Loss: -1.234
Epoch: 154, Dynamic Loss: -1.233
Epoch: 155, Dynamic Loss: -1.251
Epoch: 156, Dynamic Loss: -1.234
Epoch: 157, Dynamic Loss: -1.202
Epoch: 158, Dynamic Loss: -1.307
Epoch: 159, Dynamic Loss: -1.267
Epoch: 160, Dynamic Loss: -1.275
Epoch: 161, Dynamic Loss: -1.284
Epoch: 162, Dynamic Loss: -1.263
Epoch: 163, Dynamic Loss: -1.291
Epoch: 164, Dynamic Loss: -1.232
Epoch: 165, Dynamic Loss: -1.262
Epoch: 166, Dynamic Loss: -1.263
Epoch: 167, Dynamic Loss: -1.247
Epoch: 168, Dynamic Loss: -1.294
Epoch: 169, Dynamic Loss: -1.249
Epoch: 170, Dynamic Loss: -1.223
Epoch: 171, Dynamic Loss: -1.269
Epoch: 172, Dynamic Loss: -1.246
Epoch: 173, Dynamic Loss: -1.248
Epoch: 174, Dynamic Loss: -1.237
Epoch: 175, Dynamic Loss: -1.269
Epoch: 176, Dynamic Loss: -1.277
Epoch: 177, Dynamic Loss: -1.276
Epoch: 178, Dynamic Loss: -1.280
Epoch: 179, Dynamic Loss: -1.272
Epoch: 180, Dynamic Loss: -1.239
Epoch: 181, Dynamic Loss: -1.265
Epoch: 182, Dynamic Loss: -1.249
Epoch: 183, Dynamic Loss: -1.231
Epoch: 184, Dynamic Loss: -1.245
Epoch: 185, Dynamic Loss: -1.295
Epoch: 186, Dynamic Loss: -1.210
Epoch: 187, Dynamic Loss: -1.250
Epoch: 188, Dynamic Loss: -1.253
Epoch: 189, Dynamic Loss: -1.295
Epoch: 190, Dynamic Loss: -1.254
Epoch: 191, Dynamic Loss: -1.270
Epoch: 192, Dynamic Loss: -1.280
Epoch: 193, Dynamic Loss: -1.245
Epoch: 194, Dynamic Loss: -1.272
Epoch: 195, Dynamic Loss: -1.272
Epoch: 196, Dynamic Loss: -1.255
Epoch: 197, Dynamic Loss: -1.214
Epoch: 198, Dynamic Loss: -1.223
Linear Evaluation
By default, we use momentum SGD without weight decay and a batch size of 1024 for linear classification on the frozen features. This stage runs for 90 epochs.
Configurations
# dataset
image_size = 32
batch_size = 1024
num_classes = 100 # cifar100
# optimizer
epochs = 90
init_lr = 0.075
weight_decay = 0
# checkpoint
checkpoint_dir = '/content/logs/resnet18_cifar100/checkpoint_0199.pth.tar'
Dataset pipeline
mean = [0.4914, 0.4822, 0.4465]
std = [0.2470, 0.2435, 0.2616]
normalize = transforms.Normalize(mean=mean, std=std)
train_dataset = libauc.datasets.CIFAR100(
    root=DATA_ROOT, train=True, download=True,
    transform=transforms.Compose([transforms.RandomResizedCrop(32),
                                  transforms.RandomHorizontalFlip(),
                                  transforms.ToTensor(),
                                  normalize]))
val_dataset = libauc.datasets.CIFAR100(
    root=DATA_ROOT, train=False, download=True,
    transform=transforms.Compose([transforms.ToTensor(),
                                  normalize]))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
Helper functions
def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k."""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)
        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))
        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
def train(train_loader, model, criterion, optimizer, epoch):
    # The backbone is frozen, so keep it in eval mode (fixed BatchNorm statistics);
    # only the linear classifier receives gradient updates.
    model.eval()
    for i, (images, target) in enumerate(train_loader):
        images = images.float().cuda()
        target = target.long().cuda()
        output = model(images)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
def validate(val_loader, model, criterion):
    model.eval()
    acc1_list = []
    acc5_list = []
    with torch.no_grad():
        for i, (images, target) in enumerate(val_loader):
            images = images.float().cuda()
            target = target.long().cuda()
            output = model(images)
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            acc1_list.append(acc1)
            acc5_list.append(acc5)
    acc1_array = torch.stack(acc1_list)
    acc5_array = torch.stack(acc5_list)
    return torch.mean(acc1_array), torch.mean(acc5_array)
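As a quick, self-contained check of the accuracy helper (toy logits and targets, not part of the evaluation pipeline):
logits = torch.tensor([[0.1, 0.9, 0.0, 0.0, 0.0],
                       [0.8, 0.1, 0.0, 0.0, 0.1],
                       [0.0, 0.0, 0.7, 0.2, 0.1],
                       [0.3, 0.3, 0.2, 0.1, 0.1]])
targets = torch.tensor([1, 0, 3, 4])
top1, top5 = accuracy(logits, targets, topk=(1, 5))
print(top1.item(), top5.item())   # expected: 50.0 100.0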
Define model
set_all_seeds(123)
# ResNet-18 + classification layer
model = resnet18(pretrained=False, last_activation=None, num_classes=128)
hidden_dim = model.fc.weight.shape[1]
del model.fc
model.fc = nn.Linear(hidden_dim, num_classes, bias=True)
# cifar head for resnet18
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.maxpool = nn.Identity()
# load pretrained checkpoint excluding non-linear layers
linear_keyword = 'fc'
checkpoint = torch.load(checkpoint_dir, map_location="cpu")
state_dict = checkpoint['state_dict']
for k in list(state_dict.keys()):
    if linear_keyword in k:
        del state_dict[k]
msg = model.load_state_dict(state_dict, strict=False)
print ('Linear Classifier Variables: %s'%(msg.missing_keys))
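# expected: the missing keys are just the new classifier's parameters, i.e. ['fc.weight', 'fc.bias']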
# cuda
model = model.cuda()
# freeze all layers but the last fc
for name, param in model.named_parameters():
    if name not in ['%s.weight' % linear_keyword, '%s.bias' % linear_keyword]:
        param.requires_grad = False
# init the fc layer
getattr(model, linear_keyword).weight.data.normal_(mean=0.0, std=0.01)
getattr(model, linear_keyword).bias.data.zero_()
# optimize only the linear classifier
parameters = list(filter(lambda p: p.requires_grad, model.parameters()))
assert len(parameters) == 2 # weight, bias
Define loss & optimizer
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda()
# linear lr scaling
lr = init_lr * batch_size / 256
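# with batch_size = 1024, this is 0.075 * 1024 / 256 = 0.3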
optimizer = torch.optim.SGD(parameters,
                            lr=lr,
                            momentum=0.9,
                            weight_decay=weight_decay)
Training
# linear evaluation
print ('Linear Evaluation')
for epoch in range(epochs):
    adjust_learning_rate(optimizer, epoch)
    # train for one epoch
    train(train_loader, model, criterion, optimizer, epoch)
    # evaluate on validation set
    acc1, acc5 = validate(val_loader, model, criterion)
    # log
    print('Epoch: %s, Top1: %.2f, Top5: %.2f' % (epoch, acc1, acc5))
Linear Evaluation
Epoch: 0, Top1: 0.96, Top5: 4.97
Epoch: 1, Top1: 20.99, Top5: 49.68
Epoch: 2, Top1: 24.33, Top5: 55.29
Epoch: 3, Top1: 27.16, Top5: 57.75
Epoch: 4, Top1: 27.86, Top5: 58.76
Epoch: 5, Top1: 28.48, Top5: 59.93
Epoch: 6, Top1: 30.20, Top5: 60.94
Epoch: 7, Top1: 30.52, Top5: 61.14
Epoch: 8, Top1: 30.98, Top5: 62.21
Epoch: 9, Top1: 30.89, Top5: 61.89
Epoch: 10, Top1: 29.54, Top5: 61.77
Epoch: 11, Top1: 32.11, Top5: 62.98
Epoch: 12, Top1: 32.00, Top5: 63.58
Epoch: 13, Top1: 32.50, Top5: 63.85
Epoch: 14, Top1: 33.26, Top5: 64.69
Epoch: 15, Top1: 33.51, Top5: 63.77
Epoch: 16, Top1: 34.15, Top5: 65.22
Epoch: 17, Top1: 33.86, Top5: 65.36
Epoch: 18, Top1: 34.44, Top5: 65.39
Epoch: 19, Top1: 34.77, Top5: 65.29
Epoch: 20, Top1: 34.73, Top5: 65.28
Epoch: 21, Top1: 34.36, Top5: 65.83
Epoch: 22, Top1: 34.38, Top5: 65.33
Epoch: 23, Top1: 35.31, Top5: 66.06
Epoch: 24, Top1: 35.66, Top5: 66.81
Epoch: 25, Top1: 34.97, Top5: 66.09
Epoch: 26, Top1: 35.37, Top5: 65.72
Epoch: 27, Top1: 35.36, Top5: 66.16
Epoch: 28, Top1: 34.84, Top5: 66.54
Epoch: 29, Top1: 36.12, Top5: 67.74
Epoch: 30, Top1: 35.96, Top5: 66.93
Epoch: 31, Top1: 36.52, Top5: 67.68
Epoch: 32, Top1: 35.97, Top5: 67.71
Epoch: 33, Top1: 36.28, Top5: 67.32
Epoch: 34, Top1: 36.76, Top5: 68.12
Epoch: 35, Top1: 37.07, Top5: 67.98
Epoch: 36, Top1: 36.24, Top5: 67.98
Epoch: 37, Top1: 36.96, Top5: 68.05
Epoch: 38, Top1: 37.35, Top5: 68.35
Epoch: 39, Top1: 36.74, Top5: 67.76
Epoch: 40, Top1: 36.66, Top5: 68.42
Epoch: 41, Top1: 37.10, Top5: 68.27
Epoch: 42, Top1: 37.07, Top5: 68.10
Epoch: 43, Top1: 36.88, Top5: 68.06
Epoch: 44, Top1: 37.02, Top5: 68.37
Epoch: 45, Top1: 37.77, Top5: 68.60
Epoch: 46, Top1: 36.60, Top5: 68.25
Epoch: 47, Top1: 38.02, Top5: 69.02
Epoch: 48, Top1: 37.70, Top5: 68.90
Epoch: 49, Top1: 37.63, Top5: 68.79
Epoch: 50, Top1: 38.32, Top5: 69.08
Epoch: 51, Top1: 37.73, Top5: 68.67
Epoch: 52, Top1: 37.85, Top5: 68.89
Epoch: 53, Top1: 38.23, Top5: 68.94
Epoch: 54, Top1: 38.30, Top5: 69.64
Epoch: 55, Top1: 38.25, Top5: 69.08
Epoch: 56, Top1: 37.90, Top5: 68.93
Epoch: 57, Top1: 38.46, Top5: 69.23
Epoch: 58, Top1: 37.52, Top5: 68.67
Epoch: 59, Top1: 38.27, Top5: 69.14
Epoch: 60, Top1: 38.44, Top5: 69.21
Epoch: 61, Top1: 38.41, Top5: 69.04
Epoch: 62, Top1: 38.12, Top5: 69.31
Epoch: 63, Top1: 39.28, Top5: 70.19
Epoch: 64, Top1: 38.58, Top5: 69.84
Epoch: 65, Top1: 38.83, Top5: 69.46
Epoch: 66, Top1: 38.70, Top5: 69.76
Epoch: 67, Top1: 38.69, Top5: 69.71
Epoch: 68, Top1: 39.02, Top5: 70.00
Epoch: 69, Top1: 39.21, Top5: 69.78
Epoch: 70, Top1: 39.24, Top5: 69.89
Epoch: 71, Top1: 39.31, Top5: 69.94
Epoch: 72, Top1: 38.97, Top5: 69.87
Epoch: 73, Top1: 38.99, Top5: 69.83
Epoch: 74, Top1: 38.97, Top5: 69.98
Epoch: 75, Top1: 39.32, Top5: 70.14
Epoch: 76, Top1: 39.36, Top5: 69.80
Epoch: 77, Top1: 39.42, Top5: 70.01
Epoch: 78, Top1: 39.29, Top5: 69.92
Epoch: 79, Top1: 39.24, Top5: 69.95
Epoch: 80, Top1: 39.24, Top5: 70.05
Epoch: 81, Top1: 39.27, Top5: 70.05
Epoch: 82, Top1: 39.26, Top5: 70.04
Epoch: 83, Top1: 39.22, Top5: 70.15
Epoch: 84, Top1: 39.44, Top5: 70.21
Epoch: 85, Top1: 39.28, Top5: 70.24
Epoch: 86, Top1: 39.33, Top5: 70.19
Epoch: 87, Top1: 39.29, Top5: 70.18
Epoch: 88, Top1: 39.26, Top5: 70.19
Epoch: 89, Top1: 39.25, Top5: 70.20