Optimizing Global Contrastive Loss with Automatic Temperature Individualization (iSogCLR) 
================================================================================================================================

.. raw:: html

    <div style="display: flex; justify-content: space-between;">
      <div style="display: flex; align-items: center;">
        <a href="https://colab.research.google.com/drive/1KVrhkTCLaI-K4_6XEjgZ0hYjzczTxJIX" style="margin-right: 10px;">
          <img src="https://upload.wikimedia.org/wikipedia/commons/d/d0/Google_Colaboratory_SVG_Logo.svg" width="40" height="40" />
          <span>Run on Colab</span>
        </a>
      </div>
      <div style="display: flex; align-items: center;">
        <a href="https://drive.google.com/drive/folders/1WKBO-Phlrrutq3157Pzosv-jDVH62MY6?usp=sharing" style="margin-right: 10px;">
          <img src="https://upload.wikimedia.org/wikipedia/commons/8/8d/Download_alt_font_awesome.svg" width="25" height="25" />
          <span>Download Notebook</span>
        </a>
      </div>
      <div style="display: flex; align-items: center;">
        <a href="https://github.com/Optimization-AI/LibAUC" style="margin-right: 10px;">
            <img src="https://upload.wikimedia.org/wikipedia/commons/c/c2/GitHub_Invertocat_Logo.svg" width="25" height="25" />
            <span>View on Github</span>
        </a>
      </div>
    </div>

------------------------------------------------------------------------------------

.. container:: cell markdown

    | **Author**: Zi-Hao Qiu
    | **Edited by**: Zhuoning Yuan, Tianbao Yang
    \

Introduction
------------------------------------------------------------------------------------

In this tutorial, we show how to apply the iSogCLR algorithm to a
typical bimodal contrastive learning task. In the pretraining stage, we
use a subset of the widely used `CC3M
dataset <https://ai.google.com/research/ConceptualCaptions/download>`__,
which contains about 3,000,000 image-text pairs in total. We then evaluate
the pretrained models via zero-shot image-text retrieval on the
`MS-COCO <https://github.com/tylin/coco-caption>`__ dataset.

For convenience of reproduction, we provide a subset of CC3M
`here <https://drive.google.com/drive/folders/1IDnFIJW3FIENffgPuZcDep1_UIyzKY-D?usp=drive_link>`__,
which contains 300,000 image-text pairs. We also provide the MS-COCO
dataset and its JSON annotation files
`here <https://drive.google.com/drive/folders/1uwtnunMNgc_E7f8bPzs5KQnCMRDPPFB9?usp=drive_link>`__.
The experiments in this tutorial were run on 4 NVIDIA 3090 GPUs; you
can adjust the **CUDA_VISIBLE_DEVICES** and **batch_size_train**
options to match your hardware.

**References**

If you find this tutorial helpful in your work, please cite our `library paper <https://arxiv.org/abs/2306.03065>`__ and the following paper:

.. code-block:: bibtex

    @inproceedings{qiu2023isogclr,
         title={Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization},
         author={Qiu, Zi-Hao and Hu, Quanqi and Yuan, Zhuoning and Zhou, Denny and Zhang, Lijun and Yang, Tianbao},
         booktitle={International Conference on Machine Learning},
         year={2023},
         organization={PMLR}
    }

Install Latest LibAUC and Other Required Libs
------------------------------------------------------------------------------------

.. code:: python

    !pip install -U libauc  

We use the `timm
library <https://github.com/huggingface/pytorch-image-models>`__ to
build the image encoder and the `transformers
library <https://github.com/huggingface/transformers>`__ to build the text
encoder.

.. code:: python

    !pip install timm
    !pip install transformers  

We compare iSogCLR with CLIP, using the CLIP loss implementation from
`OpenCLIP <https://github.com/mlfoundations/open_clip>`__.

.. code:: python

    !pip install open_clip_torch

Import required libs
------------------------------------------------------------------------------------

.. code:: python

    import os
    os.environ["TOKENIZERS_PARALLELISM"] = "true"
    os.environ["CUDA_VISIBLE_DEVICES"] = '0' # multi-GPU training with nn.DataParallel: '0,1,2,3'

    import re
    import argparse
    from pathlib import Path
    import json
    import random
    import math
    from functools import partial
    
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.backends.cudnn as cudnn
    from torch import optim
    import torchvision
    from torchvision import transforms
    
    from torch.utils.data import Dataset, Subset, DataLoader
    
    from PIL import Image
    from PIL import ImageFile
    ImageFile.LOAD_TRUNCATED_IMAGES = True
    Image.MAX_IMAGE_PIXELS = None
    
    import cv2
    import numpy as np
    
    import timm
    from transformers import AutoModel, AutoTokenizer
    
    import open_clip
    from open_clip.loss import ClipLoss
    
    
    import libauc
    from libauc.losses.contrastive import GCLoss_v2
    from libauc.optimizers import iSogCLR
    from libauc.utils import CosineLRScheduler


Arguments for experiments
------------------------------------------------------------------------------------

.. code:: python

    # path to data folder
    data_path = 'cc3m_subset'
    train_file = 'cc3m_subset.json'
    
    # model config
    image_encoder = 'resnet50'
    text_encoder = 'distilbert-base-uncased'
    image_res = 256
    vision_width = 768
    embed_dim = 256
    seed = 42
    
    # optimizer and scheduler
    opt = 'adamW'
    lr = 3e-4
    min_lr = 1e-5
    warmup = True
    warmup_lr = 1e-5
    weight_decay = 0.02
    decay_rate = 1
    epochs = 30
    warmup_epochs = 20   # number of LR warmup steps; epoch_train advances one step every 100 iterations of epoch 0
    cooldown_epochs = 0
    
    # training & test settings
    batch_size_train = 256
    batch_size_test = 512
    k_test = 256
    
    # output path
    output_dir = './output/' 
    
    # AMP training
    use_amp = True
    
    # loss config
    temp = 0.01       # the temperature parameter for clip or sogclr
    gamma = 0.8       # the parameter for the moving average estimator in sogclr/isogclr
    rho = 8.0         # the rho parameter for isogclr
    eta = 1e-4        # learning rate for the learnable temperature variables in isogclr
    tau_init = 0.01   # the initial value of the learnable temperature variables in isogclr
    beta_u = 0.9      # the momentum parameter for the gradients of the learnable temperature variables
    
    n_gpus = torch.cuda.device_count()
    
    val_coco_file = 'coco_val_new.json'
    test_coco_file = 'coco_test_new.json'
    coco_image_root = 'coco'
    
    Path(output_dir).mkdir(parents=True, exist_ok=True)

Define helper functions
------------------------------------------------------------------------------------

.. code:: python

    # we employ this function to preprocess the captions
    def pre_caption(caption, max_words):
        caption = re.sub(
            r"([,.'!?\"()*#:;~])",
            '',
            caption.lower(),
        ).replace('-', ' ').replace('/', ' ').replace('<person>', 'person')
    
        caption = re.sub(
            r"\s{2,}",
            ' ',
            caption,
        )
        caption = caption.rstrip('\n') 
        caption = caption.strip(' ')
    
        #truncate caption
        caption_words = caption.split(' ')
        if len(caption_words)>max_words:
            caption = ' '.join(caption_words[:max_words])
                
        return caption

.. code:: python

    class train_set(Dataset):
        def __init__(self, ann_file, transform, image_root, max_words=30):        
            self.ann = []
            for f in ann_file:
                self.ann += json.load(open(f,'r'))
            self.transform = transform
            self.image_root = image_root
            self.max_words = max_words
            self.img_ids = {}   
            
            n = 0
            for ann in self.ann:
                img_id = ann['image_id']
                if img_id not in self.img_ids.keys():
                    self.img_ids[img_id] = n
                    n += 1    
            
        def __len__(self):
            return len(self.ann)
        
        def __getitem__(self, index):    
            ann = self.ann[index]
            image_path = os.path.join(self.image_root, ann['image'])
    
            image = Image.open(image_path).convert('RGB')   
            image = self.transform(image)
            
            caption = pre_caption(ann['caption'], self.max_words) 
    
            return image, caption, self.img_ids[ann['image_id']], index
        
        
    class eval_set(Dataset):
        def __init__(self, ann_file, transform, image_root, max_words=30):        
            self.ann = json.load(open(ann_file,'r'))
            self.transform = transform
            self.image_root = image_root
            self.max_words = max_words 
            
            self.text = []
            self.image = []
            self.txt2img = {}
            self.img2txt = {}
            
            txt_id = 0
            for img_id, ann in enumerate(self.ann):
                self.image.append(ann['image'])
                self.img2txt[img_id] = []
                for i, caption in enumerate(ann['caption']):
                    self.text.append(pre_caption(caption,self.max_words))
                    self.img2txt[img_id].append(txt_id)
                    self.txt2img[txt_id] = img_id
                    txt_id += 1
                                        
        def __len__(self):
            return len(self.image)
        
        def __getitem__(self, index):    
            image_path = os.path.join(self.image_root, self.ann[index]['image'])        
            image = Image.open(image_path).convert('RGB')    
            image = self.transform(image)  
    
            return image, index
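
The annotation formats are implicit in the two dataset classes above: ``train_set`` reads a list of entries with ``'image'``, ``'caption'`` (a single string) and ``'image_id'``, while ``eval_set`` reads entries with ``'image'`` and a list of ``'caption'`` strings. To show the index mappings that ``eval_set`` builds for retrieval, the sketch below runs the same indexing loop on two toy entries; ``build_retrieval_index`` and the data are hypothetical, and caption preprocessing is omitted.

.. code:: python

    def build_retrieval_index(annotations):
        """Flatten captions and build img2txt / txt2img maps like eval_set (sketch)."""
        texts, img2txt, txt2img = [], {}, {}
        txt_id = 0
        for img_id, ann in enumerate(annotations):
            img2txt[img_id] = []
            for caption in ann['caption']:
                texts.append(caption)
                img2txt[img_id].append(txt_id)   # one image -> several caption ids
                txt2img[txt_id] = img_id         # each caption id -> exactly one image
                txt_id += 1
        return texts, img2txt, txt2img


    toy_ann = [{'image': 'a.jpg', 'caption': ['a dog on grass', 'a puppy outside']},
               {'image': 'b.jpg', 'caption': ['a cat on a sofa']}]
    texts, img2txt, txt2img = build_retrieval_index(toy_ann)
    print(img2txt, txt2img)   # {0: [0, 1], 1: [2]} {0: 0, 1: 0, 2: 1}

``itm_eval`` relies on exactly these two dictionaries: ``img2txt`` because an image counts as correctly retrieved if any of its captions ranks high, and ``txt2img`` because each caption has a single matching image.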

.. code:: python

    def add_weight_decay(model, weight_decay=1e-5, skip_list=()):
        decay = []
        no_decay = []
        for name, param in model.named_parameters():
            if not param.requires_grad:
                continue  # frozen weights
            if len(param.shape) == 1 or name.endswith(".bias") or name in skip_list:
                no_decay.append(param)
            else:
                decay.append(param)
        return [
            {'params': no_decay, 'weight_decay': 0.},
            {'params': decay, 'weight_decay': weight_decay}]
    
    
    def create_optimizer(model, opt, weight_decay=1e-5, filter_bias_and_bn=True):
        if weight_decay and filter_bias_and_bn:
            skip = {}
            if hasattr(model, 'no_weight_decay'):
                skip = model.no_weight_decay()
            parameters = add_weight_decay(model, weight_decay, skip)
            weight_decay = 0.
        else:
            parameters = model.parameters()
    
        opt_args = dict(lr=lr, weight_decay=weight_decay)
        optimizer = iSogCLR(parameters, mode=opt, **opt_args)
    
        return optimizer

.. code:: python

    def create_scheduler(optimizer):
        num_epochs = epochs
        
        lr_scheduler = CosineLRScheduler(
            optimizer,
            t_initial = num_epochs,
            t_mul = 1.0,
            lr_min = min_lr,
            decay_rate = decay_rate,
            warmup_lr_init = warmup_lr,
            warmup_t = warmup_epochs,
            cycle_limit = 1,
            t_in_epochs = True,
            noise_range_t = None,
            noise_pct = 0.67,
            noise_std = 1.0,
            noise_seed = 42,
        )
      
        return lr_scheduler
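
The learning rates printed in the CLIP log can be checked by hand. The sketch below assumes a linear warmup from ``warmup_lr`` to ``lr`` over ``warmup_epochs`` scheduler steps, followed by a cosine decay from ``lr`` to ``min_lr`` over ``epochs`` steps whose phase starts at the end of the warmup. We inferred this form from the logged values rather than from the source of ``CosineLRScheduler``, so treat it as an approximation; ``lr_at`` is a hypothetical helper.

.. code:: python

    import math


    def lr_at(t, base_lr=3e-4, min_lr=1e-5, warmup_lr=1e-5, warmup_t=20, t_initial=30):
        """Learning rate at scheduler step t: linear warmup, then cosine decay (sketch)."""
        if t < warmup_t:
            # linear ramp from warmup_lr towards base_lr
            return warmup_lr + t * (base_lr - warmup_lr) / warmup_t
        # cosine decay; the phase is measured from the end of warmup (assumption)
        t_curr = t - warmup_t
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t_curr / t_initial))


    for t in (0, 1, 2, 21, 22):
        print(t, lr_at(t))

Steps 0, 1 and 2 give 1e-05, 2.45e-05 and 3.9e-05, the rates logged during epoch 0, and step 21, which ``lr_scheduler.step(epoch+warmup_epochs+1)`` requests at the end of epoch 0, gives about 2.992e-04, the rate logged for epoch 1.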

Fix random seed
------------------------------------------------------------------------------------

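The following code fixes the random seeds to limit sources of
randomness, such as model initialization and data shuffling.

.. code:: python

    # fix the seed for reproducibility
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    # cudnn.benchmark=True speeds up convolutions but may select nondeterministic algorithms;
    # set it to False (and cudnn.deterministic=True) for stricter reproducibility
    cudnn.benchmark = True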

Objectives
------------------------------------------------------------------------------------

Here, we mainly introduce the Robust Global Contrastive Loss (RGCL) for
learning representations from bimodal data (e.g., image-text pairs). For the detailed formulation, please refer to the `paper <https://arxiv.org/abs/2305.11965>`__.
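
As a rough illustration of the per-anchor term, consider one anchor (an image or a text) with positive similarity :math:`s^+`, negative similarities :math:`s_1, \dots, s_k`, and its own temperature :math:`\tau_i`. Our simplified reading of the KL-constrained robust form in the paper is

.. math::

    \ell_i(\tau_i) = \tau_i \log\Big(\frac{1}{k}\sum_{j=1}^{k} \exp\big((s_j - s^+)/\tau_i\big)\Big) + \tau_i \rho,

which is minimized jointly over the model and the temperatures; a larger :math:`\tau_i` spreads the weight more evenly over the negatives. In the image-text setting, images and texts each act as anchors with their own temperatures, which the ``tau_img``/``tau_txt`` averages printed by ``epoch_train`` summarize. The NumPy sketch below evaluates this term and its derivative with respect to :math:`\tau_i`, then takes one projected gradient step on :math:`\tau_i` with the ``eta``, ``tau_min`` and ``tau_max`` values used in this tutorial. It is not LibAUC's ``GCLoss_v2``: it ignores the moving-average estimator (``gamma``) and the momentum on the temperature gradients (``beta_u``), and both function names are hypothetical.

.. code:: python

    import numpy as np


    def rgcl_anchor_loss(sim_pos, sim_neg, tau, rho):
        """Robust contrastive term for one anchor and its derivative w.r.t. tau (sketch)."""
        ell = (np.asarray(sim_neg, dtype=float) - sim_pos) / tau   # (s_j - s^+) / tau
        shift = ell.max()                                           # for a stable log-sum-exp
        w = np.exp(ell - shift)
        log_mean = shift + np.log(w.sum()) - np.log(ell.size)      # log((1/k) * sum_j exp(...))
        loss = tau * log_mean + tau * rho
        w = w / w.sum()                                             # softmax weights over negatives
        grad_tau = log_mean + rho - np.dot(w, ell)                  # d loss / d tau
        return loss, grad_tau


    def update_tau(tau, grad_tau, eta=1e-4, tau_min=0.005, tau_max=0.07):
        """One projected gradient step on the individual temperature (sketch)."""
        return float(np.clip(tau - eta * grad_tau, tau_min, tau_max))


    loss, grad = rgcl_anchor_loss(sim_pos=0.4, sim_neg=[0.3, 0.1, -0.2], tau=0.05, rho=8.0)
    new_tau = update_tau(0.05, grad)

The derivative follows from differentiating :math:`\tau \log m(\tau) + \tau\rho`: it equals :math:`\log m(\tau) + \rho - \sum_j w_j (s_j - s^+)/\tau`, where :math:`w_j` are the softmax weights over the negatives. The clipping keeps each temperature inside the range passed to ``GCLoss_v2`` in the model definition.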


Define the model
------------------------------------------------------------------------------------

.. code:: python

    # The following class includes the image encoder, text encoder and several objectives
    class Model(nn.Module):
        def __init__(self, image_encoder = None, text_encoder = None,
                     embed_dim = 256, init_model = True, bsz = 128,
                     loss_type = 'clip',  # objective type: 'clip' or 'isogclr'
                     gamma = 0.9,         # the coefficient for moving average estimator
                     temp = 0.01,         # temperature for clip or sogclr
                     rho = 8.0, eta = 0.01, tau_init = 0.01, beta_u = 0.9,  # params for isogclr
                     use_temp_net = True):    # True if you want to use temperature network for isogclr
            super().__init__()
    
            self.temp = temp
        
            self.visual_encoder = timm.create_model(image_encoder, pretrained=init_model)
            self.visual_encoder.reset_classifier(0)
    
            self.text_encoder = AutoModel.from_pretrained(text_encoder, local_files_only=False)
    
            if not init_model:
                self.text_encoder.init_weights()
    
            self.vision_proj = nn.Linear(self.visual_encoder.num_features, embed_dim)
            self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, embed_dim)   # 768 for distilbert-base-uncased
    
            self.loss_type = loss_type
            
            if self.loss_type == 'clip':
                self.criterion = ClipLoss()        # here we employ the implementation from open-clip
                self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / temp))
            elif self.loss_type == 'isogclr':
                self.criterion = GCLoss_v2(tau=temp, gamma=gamma, tau_min=0.005, tau_max=0.07,
                                           rho=rho, eta=eta, enable_isogclr=True)
            else:
                raise NotImplementedError
           
        def forward(self, image, text_ids, text_att_masks, idx, text_idx, epoch):
            image_embeds = self.visual_encoder(image)
            image_embeds = self.vision_proj(image_embeds)
            image_feat = F.normalize(image_embeds, dim=-1) 
    
            text_output = self.text_encoder(text_ids, attention_mask=text_att_masks, output_hidden_states=False)
            text_embeds = self.text_proj(text_output.last_hidden_state[:,0,:])
            text_feat = F.normalize(text_embeds, dim=-1)
            
            if self.loss_type == 'clip':
                loss = self.criterion(image_feat, text_feat, self.logit_scale.exp())
                info = None
            elif self.loss_type == 'isogclr':
                loss, info = self.criterion(image_feat, text_feat, idx)
    
            return loss, info
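
For the CLIP baseline, the model passes ``self.logit_scale.exp()`` to open_clip's ``ClipLoss``, which equals ``1 / temp = 100`` at initialization. To our understanding, with a single process (no feature gathering) that loss is the symmetric cross-entropy over the in-batch image-text similarity matrix, with matching pairs on the diagonal. The NumPy sketch below illustrates that computation; it is not the open_clip source, and ``clip_loss`` is a hypothetical name.

.. code:: python

    import numpy as np


    def clip_loss(image_feat, text_feat, logit_scale):
        """Symmetric cross-entropy over L2-normalized image/text features (sketch)."""
        logits = logit_scale * image_feat @ text_feat.T      # [B, B] image-to-text logits
        labels = np.arange(logits.shape[0])                  # matching pairs lie on the diagonal

        def cross_entropy(z):
            z = z - z.max(axis=1, keepdims=True)             # stabilize the softmax
            log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
            return -log_prob[labels, labels].mean()

        # average the image->text and text->image directions
        return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


    feats = np.eye(4)                      # perfectly aligned, mutually orthogonal pairs
    print(clip_loss(feats, feats, 100.0))  # close to 0

Under ``nn.DataParallel``, each replica evaluates the loss on its own slice of the batch (``batch_size_train`` pairs), so negatives come only from that slice; the scalar losses of the replicas are gathered into a vector, which is why ``epoch_train`` calls ``loss.mean()`` before the backward pass.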

Training function
------------------------------------------------------------------------------------

.. code:: python

    def epoch_train(model, data_loader, optimizer, tokenizer, epoch, max_epoch, warmup_steps, device, scheduler, grad_scaler):
        # train
        model.train()  
        
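        print_freq = 50
        step_size = 100    # during epoch 0, the warmup scheduler advances once every step_size iterations
        warmup_iterations = warmup_steps * step_size  # no more warmup steps after this many iterations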
        
        for i,(image, text, idx, text_idx) in enumerate(data_loader):
            optimizer.zero_grad()
    
            image = image.to(device, non_blocking=True)   
            idx = idx.to(device, non_blocking=True)
            text_idx = text_idx.to(device, non_blocking=True)   
            text_input = tokenizer(text, padding='max_length', truncation=True, max_length=30, return_tensors="pt").to(device)  
                
            if grad_scaler is None:
                loss, info = model(image, text_input.input_ids, text_input.attention_mask, idx=idx, text_idx=text_idx, epoch=epoch)
                loss.mean().backward()
                optimizer.step()
            else:
                with torch.cuda.amp.autocast():
                    loss, info = model(image, text_input.input_ids, text_input.attention_mask, idx=idx, text_idx=text_idx, epoch=epoch)
                grad_scaler.scale(loss.mean()).backward()
                grad_scaler.step(optimizer)
                grad_scaler.update()
           
            if epoch==0 and i%step_size==0 and i<=warmup_iterations: 
                scheduler.step(i//step_size)
                
            if i%print_freq == 0:
                lr = optimizer.param_groups[0]["lr"]
                print("Epoch:", epoch, "iteration:", i, "lr:", lr, "loss:", loss.mean().item())
                if info is not None:
                    print("tau_img: %.4f, tau_txt: %.4f" % (info[0].mean(), info[1].mean()))
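
In ``epoch_train``, the warmup only happens inside epoch 0: the scheduler advances one step every ``step_size = 100`` iterations as long as the iteration index is at most ``warmup_steps * step_size``. With 300,000 training pairs, ``batch_size_train = 256`` on 4 GPUs and ``drop_last=True``, an epoch has far fewer iterations than the 2,000 needed for 20 warmup steps, so the warmup is cut short and the schedule jumps to its cosine phase after epoch 0, which is what the CLIP log shows. The helper below (a hypothetical name) enumerates the warmup steps that actually run.

.. code:: python

    def warmup_schedule_steps(num_samples, batch_size, n_gpus, warmup_steps, step_size=100):
        """Return (iterations per epoch, [(iteration, scheduler step), ...]) for epoch 0 (sketch)."""
        iters_per_epoch = num_samples // (batch_size * n_gpus)   # drop_last=True in train_loader
        warmup_iterations = warmup_steps * step_size
        steps = [(i, i // step_size) for i in range(iters_per_epoch)
                 if i % step_size == 0 and i <= warmup_iterations]
        return iters_per_epoch, steps


    iters, steps = warmup_schedule_steps(300_000, batch_size=256, n_gpus=4, warmup_steps=20)
    print(iters, steps)   # 292 [(0, 0), (100, 1), (200, 2)]

With a single GPU, the same call with ``n_gpus=1`` gives 1,171 iterations per epoch and 12 warmup steps (scheduler steps 0 to 11).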

Evaluation function
-------------------

.. code:: python

    @torch.no_grad()
    def evaluation(model, data_loader, tokenizer, device):
        # test
        model.eval() 
    
        print('Computing features for evaluation...')
        texts = data_loader.dataset.text   
        num_text = len(texts)
        text_bs = 256
        text_embeds = []
        for i in range(0, num_text, text_bs):
            text = texts[i: min(num_text, i+text_bs)]
            text_input = tokenizer(text, padding='max_length', truncation=True, max_length=30, return_tensors="pt").to(device) 
            text_output = model.text_encoder(text_input.input_ids, attention_mask=text_input.attention_mask, output_hidden_states=False)  
            text_embed = F.normalize(model.text_proj(text_output.last_hidden_state[:,0,:]), dim=-1)
            text_embeds.append(text_embed)
        text_embeds = torch.cat(text_embeds,dim=0)
        
        image_embeds = []
        for image, img_id in data_loader: 
            image = image.to(device) 
            image_feat = model.visual_encoder(image)        
            image_embed = model.vision_proj(image_feat)            
            image_embed = F.normalize(image_embed, dim=-1)      
            image_embeds.append(image_embed)
        image_embeds = torch.cat(image_embeds,dim=0)
        
        sims_matrix = image_embeds @ text_embeds.t()
        score_matrix_i2t = torch.full((len(data_loader.dataset.image),len(texts)),-100.0).to(device)
    
        for i,sims in enumerate(sims_matrix): 
            topk_sim, topk_idx = sims.topk(k=k_test, dim=0)
            score_matrix_i2t[i, topk_idx] = topk_sim
            
        sims_matrix = sims_matrix.t()
        score_matrix_t2i = torch.full((len(texts),len(data_loader.dataset.image)),-100.0).to(device)
        
        for i,sims in enumerate(sims_matrix): 
            topk_sim, topk_idx = sims.topk(k=k_test, dim=0)
            score_matrix_t2i[i, topk_idx] = topk_sim
    
        return score_matrix_i2t.cpu().numpy(), score_matrix_t2i.cpu().numpy()
    
    
    @torch.no_grad()
    def itm_eval(scores_i2t, scores_t2i, txt2img, img2txt):
        
        #Images->Text 
        ranks = np.zeros(scores_i2t.shape[0])
        for index,score in enumerate(scores_i2t):
            inds = np.argsort(score)[::-1]
            # Score
            rank = 1e20
            for i in img2txt[index]:
                tmp = np.where(inds == i)[0][0]
                if tmp < rank:
                    rank = tmp
            ranks[index] = rank
    
        # Compute metrics
        tr1 = 100.0 * len(np.where(ranks < 1)[0]) / len(ranks)
        tr5 = 100.0 * len(np.where(ranks < 5)[0]) / len(ranks)
        tr10 = 100.0 * len(np.where(ranks < 10)[0]) / len(ranks)
      
        #Text->Images 
        ranks = np.zeros(scores_t2i.shape[0])
        
        for index,score in enumerate(scores_t2i):
            inds = np.argsort(score)[::-1]
            ranks[index] = np.where(inds == txt2img[index])[0][0]
    
        # Compute metrics
        ir1 = 100.0 * len(np.where(ranks < 1)[0]) / len(ranks)
        ir5 = 100.0 * len(np.where(ranks < 5)[0]) / len(ranks)
        ir10 = 100.0 * len(np.where(ranks < 10)[0]) / len(ranks)        
    
        tr_mean = (tr1 + tr5 + tr10) / 3
        ir_mean = (ir1 + ir5 + ir10) / 3
        r_mean = (tr_mean + ir_mean) / 2
    
        eval_result =  {'txt_r1': tr1,
                        'txt_r5': tr5,
                        'txt_r10': tr10,
                        'txt_r_mean': tr_mean,
                        'img_r1': ir1,
                        'img_r5': ir5,
                        'img_r10': ir10,
                        'img_r_mean': ir_mean,
                        'r_mean': r_mean}
        return eval_result
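
``itm_eval`` converts each score matrix into the rank of the best-ranked ground truth for every query, and Recall@K is the percentage of queries whose rank is below K. The sketch below is a vectorized NumPy variant for the text-to-image direction, with a hypothetical helper name. Instead of sorting, it counts how many images score strictly higher than the ground-truth image; this gives the same rank as the argsort loop except when scores are tied.

.. code:: python

    import numpy as np


    def recall_at_k_t2i(scores_t2i, txt2img, ks=(1, 5, 10)):
        """Text-to-image Recall@K (in %) from a [num_texts, num_images] score matrix (sketch)."""
        scores = np.asarray(scores_t2i, dtype=float)
        rows = np.arange(scores.shape[0])
        gt = np.array([txt2img[t] for t in range(scores.shape[0])])   # ground-truth image per text
        gt_scores = scores[rows, gt]
        ranks = (scores > gt_scores[:, None]).sum(axis=1)              # images scored above the ground truth
        return {k: 100.0 * np.mean(ranks < k) for k in ks}


    toy_scores = np.array([[0.9, 0.1, 0.2],
                           [0.3, 0.2, 0.8],
                           [0.5, 0.4, 0.1]])
    print(recall_at_k_t2i(toy_scores, {0: 0, 1: 1, 2: 2}, ks=(1, 3)))

For the toy matrix, only the first text ranks its image first, so Recall@1 is about 33.3, while all three ground-truth images fall within the top 3, so Recall@3 is 100.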

Create datasets and dataloaders
------------------------------------------------------------------------------------

.. code:: python

    # set up the transformation, datasets and dataloaders
    train_transform = transforms.Compose([                        
            transforms.RandomResizedCrop(image_res, scale=(0.5, 1.0), interpolation=transforms.InterpolationMode.BICUBIC),
            transforms.RandomHorizontalFlip(),
            transforms.RandAugment(),     
            transforms.ToTensor(),
            transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
        ]) 
    
    test_transform = transforms.Compose([
        transforms.Resize((image_res, image_res), interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.ToTensor(),
        transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
        ])
    
    train_dataset = train_set([train_file], train_transform, data_path)
    val_coco_dataset = eval_set(val_coco_file, test_transform, coco_image_root)
    test_coco_dataset = eval_set(test_coco_file, test_transform, coco_image_root)
    
    print("len of train_dataset:", len(train_dataset))
    print("len of coco val/test:", len(val_coco_dataset), len(test_coco_dataset))
    
    train_loader = DataLoader(train_dataset, batch_size=batch_size_train * n_gpus, num_workers=16, pin_memory=True,
                             shuffle=True, drop_last=True, prefetch_factor=4)
    val_loader = DataLoader(val_coco_dataset, batch_size=batch_size_test, num_workers=16, pin_memory=True,
                           shuffle=False, drop_last=False, prefetch_factor=12)
    test_loader = DataLoader(test_coco_dataset, batch_size=batch_size_test, num_workers=16, pin_memory=True,
                           shuffle=False, drop_last=False, prefetch_factor=12)


.. parsed-literal::

    len of train_dataset: 300000
    len of coco val/test: 5000 5000


Launch training and evaluation for CLIP
---------------------------------------

.. code:: python

    # create the model
    tokenizer = AutoTokenizer.from_pretrained(text_encoder, local_files_only=False)
    model = Model(image_encoder=image_encoder, text_encoder=text_encoder, embed_dim=embed_dim, 
                  init_model=True, bsz=batch_size_train, loss_type='clip', 
                  gamma=gamma, temp=temp, rho=rho, eta=eta, tau_init=tau_init, beta_u=beta_u)
    
    model = model.cuda()


.. code:: python

    if n_gpus > 1:
        print("Using", n_gpus, "GPUs")
        model = nn.DataParallel(model)


.. code:: python

    # set up the optimizer and objective function
    optimizer = create_optimizer(model, opt, weight_decay)
    lr_scheduler = create_scheduler(optimizer)
    
    if use_amp:
        grad_scaler = torch.cuda.amp.GradScaler()
    else:
        grad_scaler = None
    
    # training loop
    for epoch in range(0, epochs):
        train_stats = epoch_train(model, train_loader, optimizer, tokenizer, epoch, epochs, 
                                  warmup_epochs, torch.device('cuda'), lr_scheduler, grad_scaler)
    
        # evaluate the model on ms-coco data (unwrap nn.DataParallel if it is used)
        eval_model = model.module if isinstance(model, nn.DataParallel) else model
        score_val_i2t_coco, score_val_t2i_coco = evaluation(eval_model, val_loader, tokenizer, torch.device('cuda'))
        score_test_i2t_coco, score_test_t2i_coco = evaluation(eval_model, test_loader, tokenizer, torch.device('cuda'))
    
        print("Epoch:", epoch)
        val_result_coco = itm_eval(score_val_i2t_coco, score_val_t2i_coco, val_loader.dataset.txt2img, val_loader.dataset.img2txt)  
        print("coco val:", val_result_coco)
        test_result_coco = itm_eval(score_test_i2t_coco, score_test_t2i_coco, test_loader.dataset.txt2img, test_loader.dataset.img2txt)    
        print("coco test:", test_result_coco)
        
        lr_scheduler.step(epoch+warmup_epochs+1)


.. parsed-literal::

    Epoch: 0 iteration: 0 lr: 1e-05 loss: 11.74642562866211
    Epoch: 0 iteration: 50 lr: 1e-05 loss: 7.507866859436035
    Epoch: 0 iteration: 100 lr: 2.45e-05 loss: 5.759531497955322
    Epoch: 0 iteration: 150 lr: 2.45e-05 loss: 4.457749843597412
    Epoch: 0 iteration: 200 lr: 3.899999999999999e-05 loss: 3.844197988510132
    Epoch: 0 iteration: 250 lr: 3.899999999999999e-05 loss: 3.469355583190918
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 0
    coco val: {'txt_r1': 3.86, 'txt_r5': 12.8, 'txt_r10': 19.8, 'txt_r_mean': 12.153333333333334, 'img_r1': 1.8872451019592162, 'img_r5': 7.0171931227509, 'img_r10': 12.043182726909237, 'img_r_mean': 6.982540317206451, 'r_mean': 9.567936825269893}
    coco test: {'txt_r1': 3.6, 'txt_r5': 12.38, 'txt_r10': 18.84, 'txt_r_mean': 11.606666666666667, 'img_r1': 1.8032786885245902, 'img_r5': 7.005197920831668, 'img_r10': 11.943222710915634, 'img_r_mean': 6.917233106757298, 'r_mean': 9.261949886711982}
    Epoch: 1 iteration: 0 lr: 0.0002992056748283996 loss: 3.105051279067993
    Epoch: 1 iteration: 50 lr: 0.0002992056748283996 loss: 2.4103074073791504
    Epoch: 1 iteration: 100 lr: 0.0002992056748283996 loss: 2.2818379402160645
    Epoch: 1 iteration: 150 lr: 0.0002992056748283996 loss: 2.118741989135742
    Epoch: 1 iteration: 200 lr: 0.0002992056748283996 loss: 1.9152384996414185
    Epoch: 1 iteration: 250 lr: 0.0002992056748283996 loss: 1.8800408840179443
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 1
    coco val: {'txt_r1': 15.04, 'txt_r5': 33.92, 'txt_r10': 45.58, 'txt_r_mean': 31.513333333333332, 'img_r1': 8.06077568972411, 'img_r5': 22.718912435025988, 'img_r10': 33.10275889644142, 'img_r_mean': 21.29414900706384, 'r_mean': 26.403741170198586}
    coco test: {'txt_r1': 14.64, 'txt_r5': 34.1, 'txt_r10': 45.68, 'txt_r_mean': 31.473333333333333, 'img_r1': 7.804878048780488, 'img_r5': 22.82686925229908, 'img_r10': 33.88644542183127, 'img_r_mean': 21.50606424097028, 'r_mean': 26.489698787151806}
    Epoch: 2 iteration: 0 lr: 0.0002968314021064018 loss: 1.5531284809112549
    Epoch: 2 iteration: 50 lr: 0.0002968314021064018 loss: 1.5267637968063354
    Epoch: 2 iteration: 100 lr: 0.0002968314021064018 loss: 1.4859260320663452
    Epoch: 2 iteration: 150 lr: 0.0002968314021064018 loss: 1.552567958831787
    Epoch: 2 iteration: 200 lr: 0.0002968314021064018 loss: 1.4763367176055908
    Epoch: 2 iteration: 250 lr: 0.0002968314021064018 loss: 1.501932978630066
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 2
    coco val: {'txt_r1': 15.7, 'txt_r5': 36.7, 'txt_r10': 48.82, 'txt_r_mean': 33.74, 'img_r1': 10.163934426229508, 'img_r5': 26.62934826069572, 'img_r10': 37.99280287884846, 'img_r_mean': 24.92869518859123, 'r_mean': 29.334347594295615}
    coco test: {'txt_r1': 15.74, 'txt_r5': 36.68, 'txt_r10': 48.24, 'txt_r_mean': 33.553333333333335, 'img_r1': 9.75609756097561, 'img_r5': 26.965213914434226, 'img_r10': 38.74050379848061, 'img_r_mean': 25.153938424630145, 'r_mean': 29.35363587898174}
    Epoch: 3 iteration: 0 lr: 0.00029290319486279724 loss: 1.2079691886901855
    Epoch: 3 iteration: 50 lr: 0.00029290319486279724 loss: 1.2061635255813599
    Epoch: 3 iteration: 100 lr: 0.00029290319486279724 loss: 1.181814432144165
    Epoch: 3 iteration: 150 lr: 0.00029290319486279724 loss: 1.235809564590454
    Epoch: 3 iteration: 200 lr: 0.00029290319486279724 loss: 1.2041468620300293
    Epoch: 3 iteration: 250 lr: 0.00029290319486279724 loss: 1.2037649154663086
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 3
    coco val: {'txt_r1': 16.04, 'txt_r5': 36.82, 'txt_r10': 49.36, 'txt_r_mean': 34.07333333333333, 'img_r1': 10.44782087165134, 'img_r5': 27.457017193122752, 'img_r10': 38.48460615753699, 'img_r_mean': 25.463148074103696, 'r_mean': 29.76824070371851}
    coco test: {'txt_r1': 15.7, 'txt_r5': 36.5, 'txt_r10': 48.58, 'txt_r_mean': 33.593333333333334, 'img_r1': 10.343862455017993, 'img_r5': 27.988804478208717, 'img_r10': 39.36825269892043, 'img_r_mean': 25.900306544049045, 'r_mean': 29.74681993869119}
    Epoch: 4 iteration: 0 lr: 0.00028746409135817707 loss: 1.0229318141937256
    Epoch: 4 iteration: 50 lr: 0.00028746409135817707 loss: 0.8746964931488037
    Epoch: 4 iteration: 100 lr: 0.00028746409135817707 loss: 1.064015507698059
    Epoch: 4 iteration: 150 lr: 0.00028746409135817707 loss: 1.0923449993133545
    Epoch: 4 iteration: 200 lr: 0.00028746409135817707 loss: 0.979778528213501
    Epoch: 4 iteration: 250 lr: 0.00028746409135817707 loss: 1.0288567543029785
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 4
    coco val: {'txt_r1': 16.2, 'txt_r5': 37.3, 'txt_r10': 49.6, 'txt_r_mean': 34.36666666666667, 'img_r1': 11.407437025189925, 'img_r5': 28.64454218312675, 'img_r10': 40.199920031987205, 'img_r_mean': 26.750633080101295, 'r_mean': 30.55864987338398}
    coco test: {'txt_r1': 15.46, 'txt_r5': 37.26, 'txt_r10': 48.44, 'txt_r_mean': 33.72, 'img_r1': 10.71171531387445, 'img_r5': 28.95641743302679, 'img_r10': 40.863654538184726, 'img_r_mean': 26.843929095028653, 'r_mean': 30.281964547514328}
    Epoch: 5 iteration: 0 lr: 0.0002805736835487436 loss: 0.748623251914978
    Epoch: 5 iteration: 50 lr: 0.0002805736835487436 loss: 0.8048175573348999
    Epoch: 5 iteration: 100 lr: 0.0002805736835487436 loss: 0.8324432969093323
    Epoch: 5 iteration: 150 lr: 0.0002805736835487436 loss: 0.8187351822853088
    Epoch: 5 iteration: 200 lr: 0.0002805736835487436 loss: 0.8561583757400513
    Epoch: 5 iteration: 250 lr: 0.0002805736835487436 loss: 0.7616273164749146
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 5
    coco val: {'txt_r1': 15.6, 'txt_r5': 37.12, 'txt_r10': 49.98, 'txt_r_mean': 34.23333333333333, 'img_r1': 11.463414634146341, 'img_r5': 29.52019192323071, 'img_r10': 41.35545781687325, 'img_r_mean': 27.446354791416766, 'r_mean': 30.839844062375047}
    coco test: {'txt_r1': 14.72, 'txt_r5': 35.68, 'txt_r10': 48.34, 'txt_r_mean': 32.913333333333334, 'img_r1': 11.523390643742504, 'img_r5': 30.143942423030786, 'img_r10': 41.67932826869252, 'img_r_mean': 27.78222044515527, 'r_mean': 30.3477768892443}
    Epoch: 6 iteration: 0 lr: 0.0002723074641843674 loss: 0.5856387615203857
    Epoch: 6 iteration: 50 lr: 0.0002723074641843674 loss: 0.7076289057731628
    Epoch: 6 iteration: 100 lr: 0.0002723074641843674 loss: 0.6565060615539551
    Epoch: 6 iteration: 150 lr: 0.0002723074641843674 loss: 0.6765242218971252
    Epoch: 6 iteration: 200 lr: 0.0002723074641843674 loss: 0.7100015878677368
    Epoch: 6 iteration: 250 lr: 0.0002723074641843674 loss: 0.6650581955909729
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 6
    coco val: {'txt_r1': 15.54, 'txt_r5': 37.94, 'txt_r10': 50.16, 'txt_r_mean': 34.54666666666666, 'img_r1': 11.243502598960417, 'img_r5': 29.432227109156337, 'img_r10': 41.04758096761295, 'img_r_mean': 27.24110355857657, 'r_mean': 30.893885112621614}
    coco test: {'txt_r1': 15.78, 'txt_r5': 36.78, 'txt_r10': 49.24, 'txt_r_mean': 33.93333333333334, 'img_r1': 11.379448220711716, 'img_r5': 29.956017592962816, 'img_r10': 41.45541783286685, 'img_r_mean': 27.59696121551379, 'r_mean': 30.765147274423562}
    Epoch: 7 iteration: 0 lr: 0.00026275599969422214 loss: 0.5822378396987915
    Epoch: 7 iteration: 50 lr: 0.00026275599969422214 loss: 0.5452847480773926
    Epoch: 7 iteration: 100 lr: 0.00026275599969422214 loss: 0.5890320539474487
    Epoch: 7 iteration: 150 lr: 0.00026275599969422214 loss: 0.558639645576477
    Epoch: 7 iteration: 200 lr: 0.00026275599969422214 loss: 0.6335784196853638
    Epoch: 7 iteration: 250 lr: 0.00026275599969422214 loss: 0.6401098370552063
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 7
    coco val: {'txt_r1': 16.54, 'txt_r5': 38.52, 'txt_r10': 51.04, 'txt_r_mean': 35.36666666666667, 'img_r1': 11.795281887245102, 'img_r5': 29.89204318272691, 'img_r10': 41.58736505397841, 'img_r_mean': 27.758230041316807, 'r_mean': 31.562448353991737}
    coco test: {'txt_r1': 16.28, 'txt_r5': 37.32, 'txt_r10': 49.64, 'txt_r_mean': 34.413333333333334, 'img_r1': 11.47141143542583, 'img_r5': 30.275889644142342, 'img_r10': 42.147141143542584, 'img_r_mean': 27.964814074370253, 'r_mean': 31.189073703851793}
    Epoch: 8 iteration: 0 lr: 0.0002520239379220344 loss: 0.5210278034210205
    Epoch: 8 iteration: 50 lr: 0.0002520239379220344 loss: 0.4082544445991516
    Epoch: 8 iteration: 100 lr: 0.0002520239379220344 loss: 0.4823477864265442
    Epoch: 8 iteration: 150 lr: 0.0002520239379220344 loss: 0.49092692136764526
    Epoch: 8 iteration: 200 lr: 0.0002520239379220344 loss: 0.5032364130020142
    Epoch: 8 iteration: 250 lr: 0.0002520239379220344 loss: 0.4627079963684082
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 8
    coco val: {'txt_r1': 15.3, 'txt_r5': 36.62, 'txt_r10': 49.02, 'txt_r_mean': 33.64666666666667, 'img_r1': 11.615353858456617, 'img_r5': 29.78808476609356, 'img_r10': 41.175529788084766, 'img_r_mean': 27.526322804211645, 'r_mean': 30.586494735439157}
    coco test: {'txt_r1': 14.7, 'txt_r5': 34.02, 'txt_r10': 47.42, 'txt_r_mean': 32.04666666666667, 'img_r1': 11.411435425829668, 'img_r5': 29.76809276289484, 'img_r10': 41.44342263094762, 'img_r_mean': 27.540983606557376, 'r_mean': 29.79382513661202}
    Epoch: 9 iteration: 0 lr: 0.00024022886158240857 loss: 0.34958702325820923
    Epoch: 9 iteration: 50 lr: 0.00024022886158240857 loss: 0.4485335350036621
    Epoch: 9 iteration: 100 lr: 0.00024022886158240857 loss: 0.41256430745124817
    Epoch: 9 iteration: 150 lr: 0.00024022886158240857 loss: 0.3847663998603821
    Epoch: 9 iteration: 200 lr: 0.00024022886158240857 loss: 0.434209942817688
    Epoch: 9 iteration: 250 lr: 0.00024022886158240857 loss: 0.4179908037185669
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 9
    coco val: {'txt_r1': 15.56, 'txt_r5': 37.96, 'txt_r10': 50.06, 'txt_r_mean': 34.52666666666667, 'img_r1': 11.611355457816874, 'img_r5': 29.848060775689724, 'img_r10': 41.5953618552579, 'img_r_mean': 27.684926029588166, 'r_mean': 31.10579634812742}
    coco test: {'txt_r1': 15.74, 'txt_r5': 36.64, 'txt_r10': 48.5, 'txt_r_mean': 33.626666666666665, 'img_r1': 11.80327868852459, 'img_r5': 29.772091163534586, 'img_r10': 41.583366653338665, 'img_r_mean': 27.719578835132612, 'r_mean': 30.673122750899637}
    Epoch: 10 iteration: 0 lr: 0.00022749999999999997 loss: 0.33992326259613037
    Epoch: 10 iteration: 50 lr: 0.00022749999999999997 loss: 0.3966507911682129
    Epoch: 10 iteration: 100 lr: 0.00022749999999999997 loss: 0.3801310360431671
    Epoch: 10 iteration: 150 lr: 0.00022749999999999997 loss: 0.342434823513031
    Epoch: 10 iteration: 200 lr: 0.00022749999999999997 loss: 0.3833215832710266
    Epoch: 10 iteration: 250 lr: 0.00022749999999999997 loss: 0.43105077743530273
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 10
    coco val: {'txt_r1': 16.44, 'txt_r5': 39.08, 'txt_r10': 51.28, 'txt_r_mean': 35.6, 'img_r1': 11.915233906437425, 'img_r5': 30.091963214714113, 'img_r10': 41.84326269492203, 'img_r_mean': 27.950153272024522, 'r_mean': 31.775076636012262}
    coco test: {'txt_r1': 15.32, 'txt_r5': 37.7, 'txt_r10': 50.18, 'txt_r_mean': 34.4, 'img_r1': 11.859256297481007, 'img_r5': 30.403838464614154, 'img_r10': 41.911235505797684, 'img_r_mean': 28.058110089297617, 'r_mean': 31.22905504464881}
    Epoch: 11 iteration: 0 lr: 0.00021397681324599103 loss: 0.31117188930511475
    Epoch: 11 iteration: 50 lr: 0.00021397681324599103 loss: 0.33558982610702515
    Epoch: 11 iteration: 100 lr: 0.00021397681324599103 loss: 0.36867523193359375
    Epoch: 11 iteration: 150 lr: 0.00021397681324599103 loss: 0.28263527154922485
    Epoch: 11 iteration: 200 lr: 0.00021397681324599103 loss: 0.3501768112182617
    Epoch: 11 iteration: 250 lr: 0.00021397681324599103 loss: 0.36479008197784424
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 11
    coco val: {'txt_r1': 15.28, 'txt_r5': 37.84, 'txt_r10': 49.2, 'txt_r_mean': 34.10666666666667, 'img_r1': 11.979208316673331, 'img_r5': 30.23190723710516, 'img_r10': 42.05517792882847, 'img_r_mean': 28.08876449420232, 'r_mean': 31.097715580434496}
    coco test: {'txt_r1': 15.22, 'txt_r5': 36.18, 'txt_r10': 48.02, 'txt_r_mean': 33.14, 'img_r1': 11.955217912834867, 'img_r5': 30.979608156737307, 'img_r10': 42.60295881647341, 'img_r_mean': 28.512594962015196, 'r_mean': 30.8262974810076}
    Epoch: 12 iteration: 0 lr: 0.00019980746418436736 loss: 0.27429062128067017
    Epoch: 12 iteration: 50 lr: 0.00019980746418436736 loss: 0.3097416162490845
    Epoch: 12 iteration: 100 lr: 0.00019980746418436736 loss: 0.30445027351379395
    Epoch: 12 iteration: 150 lr: 0.00019980746418436736 loss: 0.3258894681930542
    Epoch: 12 iteration: 200 lr: 0.00019980746418436736 loss: 0.27619031071662903
    Epoch: 12 iteration: 250 lr: 0.00019980746418436736 loss: 0.30364763736724854
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 12
    coco val: {'txt_r1': 16.56, 'txt_r5': 38.14, 'txt_r10': 50.62, 'txt_r_mean': 35.10666666666666, 'img_r1': 12.35905637744902, 'img_r5': 31.139544182327068, 'img_r10': 42.51899240303879, 'img_r_mean': 28.672530987604958, 'r_mean': 31.88959882713581}
    coco test: {'txt_r1': 15.18, 'txt_r5': 36.3, 'txt_r10': 49.56, 'txt_r_mean': 33.68, 'img_r1': 12.295081967213115, 'img_r5': 31.211515393842465, 'img_r10': 42.998800479808075, 'img_r_mean': 28.835132613621216, 'r_mean': 31.25756630681061}
    Epoch: 13 iteration: 0 lr: 0.00018514719516857505 loss: 0.2100810557603836
    Epoch: 13 iteration: 50 lr: 0.00018514719516857505 loss: 0.2885628938674927
    Epoch: 13 iteration: 100 lr: 0.00018514719516857505 loss: 0.2615102529525757
    Epoch: 13 iteration: 150 lr: 0.00018514719516857505 loss: 0.30048686265945435
    Epoch: 13 iteration: 200 lr: 0.00018514719516857505 loss: 0.30662938952445984
    Epoch: 13 iteration: 250 lr: 0.00018514719516857505 loss: 0.3095318377017975
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 13
    coco val: {'txt_r1': 16.22, 'txt_r5': 37.66, 'txt_r10': 50.22, 'txt_r_mean': 34.699999999999996, 'img_r1': 11.511395441823272, 'img_r5': 29.488204718112755, 'img_r10': 40.95961615353858, 'img_r_mean': 27.3197387711582, 'r_mean': 31.009869385579098}
    coco test: {'txt_r1': 16.1, 'txt_r5': 37.5, 'txt_r10': 49.88, 'txt_r_mean': 34.49333333333333, 'img_r1': 11.923230707716913, 'img_r5': 30.22391043582567, 'img_r10': 41.63934426229508, 'img_r_mean': 27.928828468612554, 'r_mean': 31.211080900972945}
    Epoch: 14 iteration: 0 lr: 0.00017015662717380974 loss: 0.22490891814231873
    Epoch: 14 iteration: 50 lr: 0.00017015662717380974 loss: 0.24104690551757812
    Epoch: 14 iteration: 100 lr: 0.00017015662717380974 loss: 0.27677229046821594
    Epoch: 14 iteration: 150 lr: 0.00017015662717380974 loss: 0.25092434883117676
    Epoch: 14 iteration: 200 lr: 0.00017015662717380974 loss: 0.23248010873794556
    Epoch: 14 iteration: 250 lr: 0.00017015662717380974 loss: 0.2669617235660553
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 14
    coco val: {'txt_r1': 15.82, 'txt_r5': 36.82, 'txt_r10': 49.34, 'txt_r_mean': 33.99333333333333, 'img_r1': 12.047181127548981, 'img_r5': 30.635745701719312, 'img_r10': 42.6109556177529, 'img_r_mean': 28.431294149007062, 'r_mean': 31.2123137411702}
    coco test: {'txt_r1': 14.78, 'txt_r5': 35.76, 'txt_r10': 48.42, 'txt_r_mean': 32.98666666666667, 'img_r1': 12.243102758896441, 'img_r5': 30.695721711315475, 'img_r10': 42.36705317872851, 'img_r_mean': 28.435292549646807, 'r_mean': 30.71097960815674}
    Epoch: 15 iteration: 0 lr: 0.000155 loss: 0.1818775236606598
    Epoch: 15 iteration: 50 lr: 0.000155 loss: 0.21323110163211823
    Epoch: 15 iteration: 100 lr: 0.000155 loss: 0.2310401201248169
    Epoch: 15 iteration: 150 lr: 0.000155 loss: 0.2086959332227707
    Epoch: 15 iteration: 200 lr: 0.000155 loss: 0.22357095777988434
    Epoch: 15 iteration: 250 lr: 0.000155 loss: 0.24121759831905365
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 15
    coco val: {'txt_r1': 16.8, 'txt_r5': 38.16, 'txt_r10': 51.14, 'txt_r_mean': 35.36666666666667, 'img_r1': 11.935225909636145, 'img_r5': 30.103958416633347, 'img_r10': 42.33106757297081, 'img_r_mean': 28.123417299746766, 'r_mean': 31.74504198320672}
    coco test: {'txt_r1': 15.2, 'txt_r5': 37.24, 'txt_r10': 50.12, 'txt_r_mean': 34.18666666666667, 'img_r1': 11.611355457816874, 'img_r5': 30.403838464614154, 'img_r10': 42.12714914034386, 'img_r_mean': 28.047447687591628, 'r_mean': 31.117057177129148}
    Epoch: 16 iteration: 0 lr: 0.00013984337282619026 loss: 0.20621338486671448
    Epoch: 16 iteration: 50 lr: 0.00013984337282619026 loss: 0.20322853326797485
    Epoch: 16 iteration: 100 lr: 0.00013984337282619026 loss: 0.2034672498703003
    Epoch: 16 iteration: 150 lr: 0.00013984337282619026 loss: 0.2079382836818695
    Epoch: 16 iteration: 200 lr: 0.00013984337282619026 loss: 0.21095183491706848
    Epoch: 16 iteration: 250 lr: 0.00013984337282619026 loss: 0.20369692146778107
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 16
    coco val: {'txt_r1': 16.8, 'txt_r5': 37.92, 'txt_r10': 50.68, 'txt_r_mean': 35.13333333333333, 'img_r1': 12.231107556977209, 'img_r5': 30.69972011195522, 'img_r10': 42.54698120751699, 'img_r_mean': 28.492602958816473, 'r_mean': 31.8129681460749}
    coco test: {'txt_r1': 15.04, 'txt_r5': 36.64, 'txt_r10': 49.7, 'txt_r_mean': 33.79333333333333, 'img_r1': 12.29908036785286, 'img_r5': 30.86765293882447, 'img_r10': 42.5109956017593, 'img_r_mean': 28.559242969478873, 'r_mean': 31.1762881514061}
    Epoch: 17 iteration: 0 lr: 0.00012485280483142487 loss: 0.16787829995155334
    Epoch: 17 iteration: 50 lr: 0.00012485280483142487 loss: 0.16973815858364105
    Epoch: 17 iteration: 100 lr: 0.00012485280483142487 loss: 0.17559704184532166
    Epoch: 17 iteration: 150 lr: 0.00012485280483142487 loss: 0.19280369579792023
    Epoch: 17 iteration: 200 lr: 0.00012485280483142487 loss: 0.18810811638832092
    Epoch: 17 iteration: 250 lr: 0.00012485280483142487 loss: 0.1578725427389145
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 17
    coco val: {'txt_r1': 17.0, 'txt_r5': 39.04, 'txt_r10': 51.44, 'txt_r_mean': 35.82666666666666, 'img_r1': 12.343062774890043, 'img_r5': 30.735705717712914, 'img_r10': 42.03118752499, 'img_r_mean': 28.369985339197655, 'r_mean': 32.098326002932154}
    coco test: {'txt_r1': 15.72, 'txt_r5': 36.94, 'txt_r10': 50.26, 'txt_r_mean': 34.306666666666665, 'img_r1': 12.02718912435026, 'img_r5': 30.567772890843663, 'img_r10': 42.139144342263094, 'img_r_mean': 28.24470211915234, 'r_mean': 31.275684392909504}
    Epoch: 18 iteration: 0 lr: 0.00011019253581563262 loss: 0.17534607648849487
    Epoch: 18 iteration: 50 lr: 0.00011019253581563262 loss: 0.19806219637393951
    Epoch: 18 iteration: 100 lr: 0.00011019253581563262 loss: 0.16321659088134766
    Epoch: 18 iteration: 150 lr: 0.00011019253581563262 loss: 0.15023337304592133
    Epoch: 18 iteration: 200 lr: 0.00011019253581563262 loss: 0.14350810647010803
    Epoch: 18 iteration: 250 lr: 0.00011019253581563262 loss: 0.1909620463848114
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 18
    coco val: {'txt_r1': 15.72, 'txt_r5': 37.14, 'txt_r10': 50.2, 'txt_r_mean': 34.35333333333333, 'img_r1': 12.00719712115154, 'img_r5': 30.091963214714113, 'img_r10': 41.819272291083564, 'img_r_mean': 27.972810875649742, 'r_mean': 31.163072104491537}
    coco test: {'txt_r1': 14.98, 'txt_r5': 35.98, 'txt_r10': 48.94, 'txt_r_mean': 33.3, 'img_r1': 11.603358656537385, 'img_r5': 30.307876849260296, 'img_r10': 41.92323070771691, 'img_r_mean': 27.94482207117153, 'r_mean': 30.622411035585763}
    Epoch: 19 iteration: 0 lr: 9.602318675400897e-05 loss: 0.17000208795070648
    Epoch: 19 iteration: 50 lr: 9.602318675400897e-05 loss: 0.14290763437747955
    Epoch: 19 iteration: 100 lr: 9.602318675400897e-05 loss: 0.1349085569381714
    Epoch: 19 iteration: 150 lr: 9.602318675400897e-05 loss: 0.15767492353916168
    Epoch: 19 iteration: 200 lr: 9.602318675400897e-05 loss: 0.15036305785179138
    Epoch: 19 iteration: 250 lr: 9.602318675400897e-05 loss: 0.17334865033626556
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 19
    coco val: {'txt_r1': 16.64, 'txt_r5': 38.02, 'txt_r10': 50.84, 'txt_r_mean': 35.166666666666664, 'img_r1': 12.287085165933627, 'img_r5': 30.635745701719312, 'img_r10': 42.091163534586165, 'img_r_mean': 28.3379981340797, 'r_mean': 31.752332400373184}
    coco test: {'txt_r1': 15.64, 'txt_r5': 37.7, 'txt_r10': 49.62, 'txt_r_mean': 34.32, 'img_r1': 12.059176329468213, 'img_r5': 30.743702518992404, 'img_r10': 42.21111555377849, 'img_r_mean': 28.3379981340797, 'r_mean': 31.328999067039852}
    Epoch: 20 iteration: 0 lr: 8.250000000000001e-05 loss: 0.14072063565254211
    Epoch: 20 iteration: 50 lr: 8.250000000000001e-05 loss: 0.12933437526226044
    Epoch: 20 iteration: 100 lr: 8.250000000000001e-05 loss: 0.20693959295749664
    Epoch: 20 iteration: 150 lr: 8.250000000000001e-05 loss: 0.15231087803840637
    Epoch: 20 iteration: 200 lr: 8.250000000000001e-05 loss: 0.15985363721847534
    Epoch: 20 iteration: 250 lr: 8.250000000000001e-05 loss: 0.14119693636894226
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 20
    coco val: {'txt_r1': 16.54, 'txt_r5': 39.04, 'txt_r10': 50.56, 'txt_r_mean': 35.38, 'img_r1': 12.774890043982406, 'img_r5': 31.20751699320272, 'img_r10': 43.06277489004398, 'img_r_mean': 29.015060642409704, 'r_mean': 32.19753032120485}
    coco test: {'txt_r1': 15.62, 'txt_r5': 38.28, 'txt_r10': 51.2, 'txt_r_mean': 35.03333333333333, 'img_r1': 12.263094762095163, 'img_r5': 31.47141143542583, 'img_r10': 43.0187924830068, 'img_r_mean': 28.9177662268426, 'r_mean': 31.975549780087967}
    Epoch: 21 iteration: 0 lr: 6.97711384175914e-05 loss: 0.12423430383205414
    Epoch: 21 iteration: 50 lr: 6.97711384175914e-05 loss: 0.13421592116355896
    Epoch: 21 iteration: 100 lr: 6.97711384175914e-05 loss: 0.09904897212982178
    Epoch: 21 iteration: 150 lr: 6.97711384175914e-05 loss: 0.11255185306072235
    Epoch: 21 iteration: 200 lr: 6.97711384175914e-05 loss: 0.14298436045646667
    Epoch: 21 iteration: 250 lr: 6.97711384175914e-05 loss: 0.13077646493911743
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 21
    coco val: {'txt_r1': 17.24, 'txt_r5': 38.74, 'txt_r10': 50.88, 'txt_r_mean': 35.620000000000005, 'img_r1': 12.558976409436225, 'img_r5': 31.331467413034787, 'img_r10': 42.630947620951616, 'img_r_mean': 28.84046381447421, 'r_mean': 32.230231907237105}
    coco test: {'txt_r1': 15.84, 'txt_r5': 38.24, 'txt_r10': 51.44, 'txt_r_mean': 35.17333333333333, 'img_r1': 12.526989204318273, 'img_r5': 31.36745301879248, 'img_r10': 42.998800479808075, 'img_r_mean': 28.964414234306275, 'r_mean': 32.068873783819804}
    Epoch: 22 iteration: 0 lr: 5.797606207796559e-05 loss: 0.09781420230865479
    Epoch: 22 iteration: 50 lr: 5.797606207796559e-05 loss: 0.10436877608299255
    Epoch: 22 iteration: 100 lr: 5.797606207796559e-05 loss: 0.09954556077718735
    Epoch: 22 iteration: 150 lr: 5.797606207796559e-05 loss: 0.10239797830581665
    Epoch: 22 iteration: 200 lr: 5.797606207796559e-05 loss: 0.15317881107330322
    Epoch: 22 iteration: 250 lr: 5.797606207796559e-05 loss: 0.13270767033100128
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 22
    coco val: {'txt_r1': 17.24, 'txt_r5': 38.46, 'txt_r10': 52.06, 'txt_r_mean': 35.92, 'img_r1': 12.782886845261896, 'img_r5': 31.6953218712515, 'img_r10': 43.02678928428629, 'img_r_mean': 29.168332666933225, 'r_mean': 32.54416633346661}
    coco test: {'txt_r1': 15.6, 'txt_r5': 39.1, 'txt_r10': 50.8, 'txt_r_mean': 35.166666666666664, 'img_r1': 12.566973210715714, 'img_r5': 31.62734906037585, 'img_r10': 43.278688524590166, 'img_r_mean': 29.157670265227242, 'r_mean': 32.162168465946955}
    Epoch: 23 iteration: 0 lr: 4.724400030577786e-05 loss: 0.09777984768152237
    Epoch: 23 iteration: 50 lr: 4.724400030577786e-05 loss: 0.12258177995681763
    Epoch: 23 iteration: 100 lr: 4.724400030577786e-05 loss: 0.1060154139995575
    Epoch: 23 iteration: 150 lr: 4.724400030577786e-05 loss: 0.13091956079006195
    Epoch: 23 iteration: 200 lr: 4.724400030577786e-05 loss: 0.10514585673809052
    Epoch: 23 iteration: 250 lr: 4.724400030577786e-05 loss: 0.12769201397895813
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 23
    coco val: {'txt_r1': 16.48, 'txt_r5': 38.66, 'txt_r10': 51.64, 'txt_r_mean': 35.593333333333334, 'img_r1': 12.686925229908036, 'img_r5': 31.591363454618154, 'img_r10': 43.114754098360656, 'img_r_mean': 29.131014260962285, 'r_mean': 32.36217379714781}
    coco test: {'txt_r1': 15.62, 'txt_r5': 38.06, 'txt_r10': 51.1, 'txt_r_mean': 34.92666666666667, 'img_r1': 12.538984406237505, 'img_r5': 31.74330267892843, 'img_r10': 43.442622950819676, 'img_r_mean': 29.241636678661866, 'r_mean': 32.08415167266427}
    Epoch: 24 iteration: 0 lr: 3.769253581563263e-05 loss: 0.08650655299425125
    Epoch: 24 iteration: 50 lr: 3.769253581563263e-05 loss: 0.10609667003154755
    Epoch: 24 iteration: 100 lr: 3.769253581563263e-05 loss: 0.10544316470623016
    Epoch: 24 iteration: 150 lr: 3.769253581563263e-05 loss: 0.08425739407539368
    Epoch: 24 iteration: 200 lr: 3.769253581563263e-05 loss: 0.11596322059631348
    Epoch: 24 iteration: 250 lr: 3.769253581563263e-05 loss: 0.12456141412258148
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 24
    coco val: {'txt_r1': 16.9, 'txt_r5': 39.34, 'txt_r10': 52.24, 'txt_r_mean': 36.160000000000004, 'img_r1': 12.730907636945222, 'img_r5': 31.463414634146343, 'img_r10': 43.2906837265094, 'img_r_mean': 29.161668665866987, 'r_mean': 32.6608343329335}
    coco test: {'txt_r1': 16.0, 'txt_r5': 38.64, 'txt_r10': 51.36, 'txt_r_mean': 35.333333333333336, 'img_r1': 12.566973210715714, 'img_r5': 31.815273890443823, 'img_r10': 43.59456217512995, 'img_r_mean': 29.325603092096497, 'r_mean': 32.329468212714914}
    Epoch: 25 iteration: 0 lr: 2.9426316451256386e-05 loss: 0.11349457502365112
    Epoch: 25 iteration: 50 lr: 2.9426316451256386e-05 loss: 0.08233440667390823
    Epoch: 25 iteration: 100 lr: 2.9426316451256386e-05 loss: 0.09436212480068207
    Epoch: 25 iteration: 150 lr: 2.9426316451256386e-05 loss: 0.0920330286026001
    Epoch: 25 iteration: 200 lr: 2.9426316451256386e-05 loss: 0.08613620698451996
    Epoch: 25 iteration: 250 lr: 2.9426316451256386e-05 loss: 0.0929696261882782
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 25
    coco val: {'txt_r1': 17.26, 'txt_r5': 39.76, 'txt_r10': 52.34, 'txt_r_mean': 36.45333333333333, 'img_r1': 12.69092363054778, 'img_r5': 31.43142742902839, 'img_r10': 43.04278288684526, 'img_r_mean': 29.055044648807144, 'r_mean': 32.75418899107024}
    coco test: {'txt_r1': 15.84, 'txt_r5': 39.3, 'txt_r10': 51.54, 'txt_r_mean': 35.56, 'img_r1': 12.670931627349061, 'img_r5': 31.54338264694122, 'img_r10': 43.006797281087564, 'img_r_mean': 29.073703851792615, 'r_mean': 32.31685192589631}
    Epoch: 26 iteration: 0 lr: 2.2535908641822855e-05 loss: 0.07479941099882126
    Epoch: 26 iteration: 50 lr: 2.2535908641822855e-05 loss: 0.08746127784252167
    Epoch: 26 iteration: 100 lr: 2.2535908641822855e-05 loss: 0.10455113649368286
    Epoch: 26 iteration: 150 lr: 2.2535908641822855e-05 loss: 0.09784542769193649
    Epoch: 26 iteration: 200 lr: 2.2535908641822855e-05 loss: 0.06572966277599335
    Epoch: 26 iteration: 250 lr: 2.2535908641822855e-05 loss: 0.09240047633647919
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 26
    coco val: {'txt_r1': 16.88, 'txt_r5': 39.52, 'txt_r10': 52.32, 'txt_r_mean': 36.24, 'img_r1': 12.794882047181128, 'img_r5': 31.70731707317073, 'img_r10': 43.29468212714914, 'img_r_mean': 29.265627082500334, 'r_mean': 32.75281354125017}
    coco test: {'txt_r1': 15.76, 'txt_r5': 38.96, 'txt_r10': 51.74, 'txt_r_mean': 35.48666666666667, 'img_r1': 12.794882047181128, 'img_r5': 31.851259496201518, 'img_r10': 43.63454618152739, 'img_r_mean': 29.426895908303347, 'r_mean': 32.45678128748501}
    Epoch: 27 iteration: 0 lr: 1.7096805137202738e-05 loss: 0.07049550861120224
    Epoch: 27 iteration: 50 lr: 1.7096805137202738e-05 loss: 0.08527995645999908
    Epoch: 27 iteration: 100 lr: 1.7096805137202738e-05 loss: 0.07916025817394257
    Epoch: 27 iteration: 150 lr: 1.7096805137202738e-05 loss: 0.0926615446805954
    Epoch: 27 iteration: 200 lr: 1.7096805137202738e-05 loss: 0.062070801854133606
    Epoch: 27 iteration: 250 lr: 1.7096805137202738e-05 loss: 0.06778311729431152
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 27
    coco val: {'txt_r1': 16.62, 'txt_r5': 39.08, 'txt_r10': 51.5, 'txt_r_mean': 35.733333333333334, 'img_r1': 12.854858056777289, 'img_r5': 31.679328268692522, 'img_r10': 43.238704518192726, 'img_r_mean': 29.257630281220845, 'r_mean': 32.49548180727709}
    coco test: {'txt_r1': 15.62, 'txt_r5': 38.3, 'txt_r10': 51.6, 'txt_r_mean': 35.17333333333333, 'img_r1': 12.870851659336266, 'img_r5': 31.835265893642543, 'img_r10': 43.57856857257097, 'img_r_mean': 29.428228708516595, 'r_mean': 32.300781020924966}
    Epoch: 28 iteration: 0 lr: 1.3168597893598175e-05 loss: 0.08952151238918304
    Epoch: 28 iteration: 50 lr: 1.3168597893598175e-05 loss: 0.08497560024261475
    Epoch: 28 iteration: 100 lr: 1.3168597893598175e-05 loss: 0.09802306443452835
    Epoch: 28 iteration: 150 lr: 1.3168597893598175e-05 loss: 0.10137701034545898
    Epoch: 28 iteration: 200 lr: 1.3168597893598175e-05 loss: 0.08434905111789703
    Epoch: 28 iteration: 250 lr: 1.3168597893598175e-05 loss: 0.07585834711790085
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 28
    coco val: {'txt_r1': 17.42, 'txt_r5': 39.7, 'txt_r10': 52.34, 'txt_r_mean': 36.48666666666667, 'img_r1': 13.066773290683727, 'img_r5': 31.999200319872052, 'img_r10': 43.47061175529788, 'img_r_mean': 29.51219512195122, 'r_mean': 32.999430894308944}
    coco test: {'txt_r1': 16.02, 'txt_r5': 39.16, 'txt_r10': 52.3, 'txt_r_mean': 35.82666666666666, 'img_r1': 12.938824470211916, 'img_r5': 32.11515393842463, 'img_r10': 43.874450219912035, 'img_r_mean': 29.64280954284953, 'r_mean': 32.734738104758094}
    Epoch: 29 iteration: 0 lr: 1.0794325171600358e-05 loss: 0.0904349684715271
    Epoch: 29 iteration: 50 lr: 1.0794325171600358e-05 loss: 0.0633661150932312
    Epoch: 29 iteration: 100 lr: 1.0794325171600358e-05 loss: 0.06782661378383636
    Epoch: 29 iteration: 150 lr: 1.0794325171600358e-05 loss: 0.0833449587225914
    Epoch: 29 iteration: 200 lr: 1.0794325171600358e-05 loss: 0.09229975193738937
    Epoch: 29 iteration: 250 lr: 1.0794325171600358e-05 loss: 0.08226582407951355
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 29
    coco val: {'txt_r1': 16.82, 'txt_r5': 39.46, 'txt_r10': 52.16, 'txt_r_mean': 36.14666666666667, 'img_r1': 12.922830867652939, 'img_r5': 31.747301079568174, 'img_r10': 43.45061975209916, 'img_r_mean': 29.373583899773422, 'r_mean': 32.76012528322005}
    coco test: {'txt_r1': 15.9, 'txt_r5': 38.66, 'txt_r10': 51.54, 'txt_r_mean': 35.36666666666667, 'img_r1': 12.802878848460615, 'img_r5': 31.847261095561777, 'img_r10': 43.514594162335065, 'img_r_mean': 29.388244702119152, 'r_mean': 32.37745568439291}
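
Each evaluation line reports Recall@1/5/10 (in percent) for text retrieval (``txt_r*``) and image retrieval (``img_r*``). The logged values are consistent with ``txt_r_mean`` and ``img_r_mean`` being the average of R@1, R@5 and R@10 in each direction, and with ``r_mean`` being the average of those two means. The sketch below illustrates that reading. It is not the implementation of ``itm_eval``. The helpers ``summarize_recalls`` and ``best_epoch`` are hypothetical, and the numbers are copied from the log above. Choosing a checkpoint by validation ``r_mean`` and reporting its test score is assumed here only as one common convention.

.. code:: python

    # Hypothetical helpers: recompute the aggregate recall fields of a logged
    # result dict and pick the epoch with the best validation r_mean.

    def summarize_recalls(r):
        """Return (txt_r_mean, img_r_mean, r_mean) from per-K recalls (percent)."""
        txt_mean = (r['txt_r1'] + r['txt_r5'] + r['txt_r10']) / 3
        img_mean = (r['img_r1'] + r['img_r5'] + r['img_r10']) / 3
        return txt_mean, img_mean, (txt_mean + img_mean) / 2


    def best_epoch(val_r_mean_by_epoch):
        """Return the epoch whose validation r_mean is highest."""
        return max(val_r_mean_by_epoch, key=val_r_mean_by_epoch.get)


    # Per-K recalls copied from the epoch-29 "coco test" line of the log.
    epoch29_test = {'txt_r1': 15.9, 'txt_r5': 38.66, 'txt_r10': 51.54,
                    'img_r1': 12.802878848460615, 'img_r5': 31.847261095561777,
                    'img_r10': 43.514594162335065}
    t, i, r = summarize_recalls(epoch29_test)
    print(round(t, 4), round(i, 4), round(r, 4))  # 35.3667 29.3882 32.3775

    # Validation r_mean of the last five epochs of the log.
    last_val = {25: 32.75418899107024, 26: 32.75281354125017,
                27: 32.49548180727709, 28: 32.999430894308944,
                29: 32.76012528322005}
    print(best_epoch(last_val))  # 28

Among these five epochs the highest validation ``r_mean`` occurs at epoch 28. The log reports a test ``r_mean`` of about 32.73 for that epoch.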


Launch training and evaluation for iSogCLR
------------------------------------------

.. code:: python

    # create the tokenizer and the model with the iSogCLR loss, then move it to the GPU
    tokenizer = AutoTokenizer.from_pretrained(text_encoder, local_files_only=False)
    model = Model(image_encoder=image_encoder, text_encoder=text_encoder, embed_dim=embed_dim,
                  init_model=True, bsz=batch_size_train, loss_type='isogclr',
                  gamma=gamma, temp=temp, rho=rho, eta=eta, tau_init=tau_init, beta_u=beta_u)
    
    model = model.cuda()


.. code:: python

    # replicate the model across GPUs with nn.DataParallel (single process) if several are available
    if n_gpus > 1:
        print("Using", n_gpus, "GPUs")
        model = nn.DataParallel(model)


.. code:: python

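    # set up the optimizer, the learning-rate scheduler and (with AMP) the gradient scaler;
    # the iSogCLR objective itself is computed inside the model (loss_type='isogclr')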
    optimizer = create_optimizer(model, opt, weight_decay)
    lr_scheduler = create_scheduler(optimizer)
    
    if use_amp:
        grad_scaler = torch.cuda.amp.GradScaler()
    else:
        grad_scaler = None
    
    # training loop
    for epoch in range(0, epochs):
        train_stats = epoch_train(model, train_loader, optimizer, tokenizer, epoch, epochs, 
                                  warmup_epochs, torch.device('cuda'), lr_scheduler, grad_scaler)
    
        # evaluate the model on MS-COCO; unwrap nn.DataParallel if it was used
        eval_model = model.module if isinstance(model, nn.DataParallel) else model
        score_val_i2t_coco, score_val_t2i_coco = evaluation(eval_model, val_loader, tokenizer, torch.device('cuda'))
        score_test_i2t_coco, score_test_t2i_coco = evaluation(eval_model, test_loader, tokenizer, torch.device('cuda'))
        print("Epoch:", epoch)
        val_result_coco = itm_eval(score_val_i2t_coco, score_val_t2i_coco, val_loader.dataset.txt2img, val_loader.dataset.img2txt)  
        print("coco val:", val_result_coco)
        test_result_coco = itm_eval(score_test_i2t_coco, score_test_t2i_coco, test_loader.dataset.txt2img, test_loader.dataset.img2txt)    
        print("coco test:", test_result_coco)
        
        lr_scheduler.step(epoch+warmup_epochs+1)


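From epoch 1 onward, the per-epoch learning rates printed in the training logs match a cosine-annealing schedule from 3e-4 down to 1e-5 over 30 epochs. This is consistent with the scheduler being stepped with the next epoch index after every epoch. The sketch below reproduces those values. The constants ``LR_MAX``, ``LR_MIN`` and ``T_TOTAL`` are inferred from the logs rather than read from ``create_scheduler``. The warmup during epoch 0, where the logged lr rises from 1e-05, is not modeled.

.. code:: python

    import math

    # Assumed constants, inferred from the logged lr values (not from create_scheduler).
    LR_MAX = 3e-4
    LR_MIN = 1e-5
    T_TOTAL = 30  # number of training epochs in the logged runs


    def cosine_lr(epoch, lr_max=LR_MAX, lr_min=LR_MIN, total=T_TOTAL):
        """Cosine-annealed learning rate used during `epoch` (epoch >= 1)."""
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total))


    for e in (1, 10, 15, 20, 29):
        print(e, f"{cosine_lr(e):.6e}")

At epoch 15, the midpoint of the schedule, the rate equals the mean of ``LR_MAX`` and ``LR_MIN`` (1.55e-04). This matches the ``lr: 0.000155`` entries in the logs.
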
.. parsed-literal::

    Epoch: 0 iteration: 0 lr: 1e-05 loss: 24.701007843017578
    tau_img: 0.0100, tau_txt: 0.0100
    Epoch: 0 iteration: 50 lr: 1e-05 loss: 10.574981689453125
    tau_img: 0.0100, tau_txt: 0.0100
    Epoch: 0 iteration: 100 lr: 2.45e-05 loss: 4.697925567626953
    tau_img: 0.0100, tau_txt: 0.0100
    Epoch: 0 iteration: 150 lr: 2.45e-05 loss: 1.9576847553253174
    tau_img: 0.0100, tau_txt: 0.0100
    Epoch: 0 iteration: 200 lr: 3.899999999999999e-05 loss: 1.0460829734802246
    tau_img: 0.0100, tau_txt: 0.0100
    Epoch: 0 iteration: 250 lr: 3.899999999999999e-05 loss: 0.5043810606002808
    tau_img: 0.0100, tau_txt: 0.0100
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 0
    coco val: {'txt_r1': 4.1, 'txt_r5': 13.8, 'txt_r10': 21.34, 'txt_r_mean': 13.079999999999998, 'img_r1': 2.0591763294682126, 'img_r5': 7.860855657736905, 'img_r10': 13.13874450219912, 'img_r_mean': 7.686258829801413, 'r_mean': 10.383129414900706}
    coco test: {'txt_r1': 4.2, 'txt_r5': 12.7, 'txt_r10': 20.2, 'txt_r_mean': 12.366666666666665, 'img_r1': 1.9832067173130747, 'img_r5': 7.493002798880448, 'img_r10': 12.950819672131148, 'img_r_mean': 7.4756763961082235, 'r_mean': 9.921171531387444}
    Epoch: 1 iteration: 0 lr: 0.0002992056748283996 loss: 1.3195196390151978
    tau_img: 0.0094, tau_txt: 0.0095
    Epoch: 1 iteration: 50 lr: 0.0002992056748283996 loss: 0.075884610414505
    tau_img: 0.0094, tau_txt: 0.0095
    Epoch: 1 iteration: 100 lr: 0.0002992056748283996 loss: 0.3162369430065155
    tau_img: 0.0094, tau_txt: 0.0095
    Epoch: 1 iteration: 150 lr: 0.0002992056748283996 loss: 0.1882624328136444
    tau_img: 0.0094, tau_txt: 0.0095
    Epoch: 1 iteration: 200 lr: 0.0002992056748283996 loss: -0.10296255350112915
    tau_img: 0.0094, tau_txt: 0.0095
    Epoch: 1 iteration: 250 lr: 0.0002992056748283996 loss: 0.15444990992546082
    tau_img: 0.0094, tau_txt: 0.0095
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 1
    coco val: {'txt_r1': 12.22, 'txt_r5': 28.74, 'txt_r10': 40.32, 'txt_r_mean': 27.093333333333334, 'img_r1': 5.881647341063575, 'img_r5': 18.10075969612155, 'img_r10': 27.608956417433028, 'img_r_mean': 17.197121151539385, 'r_mean': 22.14522724243636}
    coco test: {'txt_r1': 11.34, 'txt_r5': 29.4, 'txt_r10': 40.32, 'txt_r_mean': 27.02, 'img_r1': 5.593762495001999, 'img_r5': 18.376649340263896, 'img_r10': 27.984806077568972, 'img_r_mean': 17.318405970944955, 'r_mean': 22.169202985472477}
    Epoch: 2 iteration: 0 lr: 0.0002968314021064018 loss: -0.0604383647441864
    tau_img: 0.0088, tau_txt: 0.0088
    Epoch: 2 iteration: 50 lr: 0.0002968314021064018 loss: 0.23243539035320282
    tau_img: 0.0088, tau_txt: 0.0088
    Epoch: 2 iteration: 100 lr: 0.0002968314021064018 loss: 0.04821205139160156
    tau_img: 0.0088, tau_txt: 0.0088
    Epoch: 2 iteration: 150 lr: 0.0002968314021064018 loss: 0.21965868771076202
    tau_img: 0.0088, tau_txt: 0.0088
    Epoch: 2 iteration: 200 lr: 0.0002968314021064018 loss: 0.05134771019220352
    tau_img: 0.0088, tau_txt: 0.0088
    Epoch: 2 iteration: 250 lr: 0.0002968314021064018 loss: 0.1536252200603485
    tau_img: 0.0088, tau_txt: 0.0088
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 2
    coco val: {'txt_r1': 14.64, 'txt_r5': 35.0, 'txt_r10': 46.5, 'txt_r_mean': 32.04666666666667, 'img_r1': 7.97281087564974, 'img_r5': 22.898840463814473, 'img_r10': 33.77049180327869, 'img_r_mean': 21.547381047580966, 'r_mean': 26.79702385712382}
    coco test: {'txt_r1': 15.14, 'txt_r5': 34.42, 'txt_r10': 46.54, 'txt_r_mean': 32.03333333333333, 'img_r1': 8.388644542183126, 'img_r5': 23.594562175129948, 'img_r10': 34.406237504998, 'img_r_mean': 22.12981474077036, 'r_mean': 27.081574037051844}
    Epoch: 3 iteration: 0 lr: 0.00029290319486279724 loss: -0.29481595754623413
    tau_img: 0.0083, tau_txt: 0.0081
    Epoch: 3 iteration: 50 lr: 0.00029290319486279724 loss: 0.06638230383396149
    tau_img: 0.0083, tau_txt: 0.0081
    Epoch: 3 iteration: 100 lr: 0.00029290319486279724 loss: 0.03567551076412201
    tau_img: 0.0083, tau_txt: 0.0082
    Epoch: 3 iteration: 150 lr: 0.00029290319486279724 loss: 0.05767179653048515
    tau_img: 0.0083, tau_txt: 0.0081
    Epoch: 3 iteration: 200 lr: 0.00029290319486279724 loss: 0.056682661175727844
    tau_img: 0.0083, tau_txt: 0.0082
    Epoch: 3 iteration: 250 lr: 0.00029290319486279724 loss: 0.28257113695144653
    tau_img: 0.0083, tau_txt: 0.0082
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 3
    coco val: {'txt_r1': 15.9, 'txt_r5': 37.2, 'txt_r10': 49.18, 'txt_r_mean': 34.093333333333334, 'img_r1': 9.70811675329868, 'img_r5': 26.3734506197521, 'img_r10': 37.31707317073171, 'img_r_mean': 24.466213514594163, 'r_mean': 29.279773423963746}
    coco test: {'txt_r1': 15.52, 'txt_r5': 37.28, 'txt_r10': 48.94, 'txt_r_mean': 33.913333333333334, 'img_r1': 9.660135945621752, 'img_r5': 26.66533386645342, 'img_r10': 37.49300279888045, 'img_r_mean': 24.606157536985204, 'r_mean': 29.259745435159267}
    Epoch: 4 iteration: 0 lr: 0.00028746409135817707 loss: -0.2583860158920288
    tau_img: 0.0079, tau_txt: 0.0077
    Epoch: 4 iteration: 50 lr: 0.00028746409135817707 loss: 0.04029808193445206
    tau_img: 0.0079, tau_txt: 0.0076
    Epoch: 4 iteration: 100 lr: 0.00028746409135817707 loss: 0.11739009618759155
    tau_img: 0.0079, tau_txt: 0.0076
    Epoch: 4 iteration: 150 lr: 0.00028746409135817707 loss: 0.32731348276138306
    tau_img: 0.0079, tau_txt: 0.0076
    Epoch: 4 iteration: 200 lr: 0.00028746409135817707 loss: -0.00629810243844986
    tau_img: 0.0079, tau_txt: 0.0076
    Epoch: 4 iteration: 250 lr: 0.00028746409135817707 loss: 0.15173837542533875
    tau_img: 0.0079, tau_txt: 0.0076
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 4
    coco val: {'txt_r1': 17.16, 'txt_r5': 38.44, 'txt_r10': 50.34, 'txt_r_mean': 35.31333333333333, 'img_r1': 10.903638544582167, 'img_r5': 27.86485405837665, 'img_r10': 39.40423830467813, 'img_r_mean': 26.057576969212317, 'r_mean': 30.685455151272826}
    coco test: {'txt_r1': 17.0, 'txt_r5': 37.84, 'txt_r10': 50.16, 'txt_r_mean': 35.0, 'img_r1': 10.415833666533386, 'img_r5': 28.58856457417033, 'img_r10': 40.26789284286286, 'img_r_mean': 26.424097027855524, 'r_mean': 30.712048513927762}
    Epoch: 5 iteration: 0 lr: 0.0002805736835487436 loss: -0.4848897457122803
    tau_img: 0.0075, tau_txt: 0.0072
    Epoch: 5 iteration: 50 lr: 0.0002805736835487436 loss: 0.06531377136707306
    tau_img: 0.0075, tau_txt: 0.0072
    Epoch: 5 iteration: 100 lr: 0.0002805736835487436 loss: 0.09321524202823639
    tau_img: 0.0075, tau_txt: 0.0072
    Epoch: 5 iteration: 150 lr: 0.0002805736835487436 loss: 0.218039870262146
    tau_img: 0.0075, tau_txt: 0.0073
    Epoch: 5 iteration: 200 lr: 0.0002805736835487436 loss: 0.1558637171983719
    tau_img: 0.0075, tau_txt: 0.0072
    Epoch: 5 iteration: 250 lr: 0.0002805736835487436 loss: -0.09588228911161423
    tau_img: 0.0075, tau_txt: 0.0072
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 5
    coco val: {'txt_r1': 18.54, 'txt_r5': 40.0, 'txt_r10': 51.6, 'txt_r_mean': 36.71333333333333, 'img_r1': 11.015593762495001, 'img_r5': 28.984406237505, 'img_r10': 40.42782886845262, 'img_r_mean': 26.809276289484206, 'r_mean': 31.76130481140877}
    coco test: {'txt_r1': 16.56, 'txt_r5': 38.8, 'txt_r10': 51.22, 'txt_r_mean': 35.526666666666664, 'img_r1': 11.107556977209116, 'img_r5': 29.072371051579367, 'img_r10': 40.77169132347061, 'img_r_mean': 26.983873117419694, 'r_mean': 31.25526989204318}
    Epoch: 6 iteration: 0 lr: 0.0002723074641843674 loss: -0.5769622325897217
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 6 iteration: 50 lr: 0.0002723074641843674 loss: 0.37227633595466614
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 6 iteration: 100 lr: 0.0002723074641843674 loss: 0.06294765323400497
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 6 iteration: 150 lr: 0.0002723074641843674 loss: -0.028086403384804726
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 6 iteration: 200 lr: 0.0002723074641843674 loss: 0.08182275295257568
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 6 iteration: 250 lr: 0.0002723074641843674 loss: 0.16375750303268433
    tau_img: 0.0072, tau_txt: 0.0069
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 6
    coco val: {'txt_r1': 18.02, 'txt_r5': 40.82, 'txt_r10': 53.12, 'txt_r_mean': 37.32, 'img_r1': 11.431427429028389, 'img_r5': 29.748100759696122, 'img_r10': 41.47940823670532, 'img_r_mean': 27.55297880847661, 'r_mean': 32.4364894042383}
    coco test: {'txt_r1': 17.68, 'txt_r5': 40.18, 'txt_r10': 52.56, 'txt_r_mean': 36.806666666666665, 'img_r1': 11.75529788084766, 'img_r5': 30.151939224310276, 'img_r10': 41.89924030387845, 'img_r_mean': 27.935492469678792, 'r_mean': 32.37107956817273}
    Epoch: 7 iteration: 0 lr: 0.00026275599969422214 loss: -0.4518427550792694
    tau_img: 0.0070, tau_txt: 0.0067
    Epoch: 7 iteration: 50 lr: 0.00026275599969422214 loss: 0.2819710075855255
    tau_img: 0.0070, tau_txt: 0.0067
    Epoch: 7 iteration: 100 lr: 0.00026275599969422214 loss: 0.05290326103568077
    tau_img: 0.0070, tau_txt: 0.0067
    Epoch: 7 iteration: 150 lr: 0.00026275599969422214 loss: -0.008920110762119293
    tau_img: 0.0070, tau_txt: 0.0067
    Epoch: 7 iteration: 200 lr: 0.00026275599969422214 loss: 0.2930781841278076
    tau_img: 0.0070, tau_txt: 0.0067
    Epoch: 7 iteration: 250 lr: 0.00026275599969422214 loss: 0.14736725389957428
    tau_img: 0.0070, tau_txt: 0.0067
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 7
    coco val: {'txt_r1': 17.88, 'txt_r5': 40.54, 'txt_r10': 52.78, 'txt_r_mean': 37.06666666666667, 'img_r1': 11.571371451419433, 'img_r5': 30.023990403838464, 'img_r10': 41.543382646941225, 'img_r_mean': 27.71291483406638, 'r_mean': 32.38979075036652}
    coco test: {'txt_r1': 18.14, 'txt_r5': 39.58, 'txt_r10': 51.58, 'txt_r_mean': 36.43333333333333, 'img_r1': 12.167133146741303, 'img_r5': 30.851659336265495, 'img_r10': 42.4390243902439, 'img_r_mean': 28.485938957750232, 'r_mean': 32.45963614554178}
    Epoch: 8 iteration: 0 lr: 0.0002520239379220344 loss: -0.36706972122192383
    tau_img: 0.0068, tau_txt: 0.0065
    Epoch: 8 iteration: 50 lr: 0.0002520239379220344 loss: -0.229108527302742
    tau_img: 0.0068, tau_txt: 0.0065
    Epoch: 8 iteration: 100 lr: 0.0002520239379220344 loss: 0.31043940782546997
    tau_img: 0.0068, tau_txt: 0.0065
    Epoch: 8 iteration: 150 lr: 0.0002520239379220344 loss: 0.00404047966003418
    tau_img: 0.0069, tau_txt: 0.0066
    Epoch: 8 iteration: 200 lr: 0.0002520239379220344 loss: -0.24809685349464417
    tau_img: 0.0069, tau_txt: 0.0066
    Epoch: 8 iteration: 250 lr: 0.0002520239379220344 loss: -0.2770186960697174
    tau_img: 0.0068, tau_txt: 0.0065
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 8
    coco val: {'txt_r1': 16.92, 'txt_r5': 38.66, 'txt_r10': 51.2, 'txt_r_mean': 35.593333333333334, 'img_r1': 11.38344662135146, 'img_r5': 29.760095961615352, 'img_r10': 41.63934426229508, 'img_r_mean': 27.5942956150873, 'r_mean': 31.59381447421032}
    coco test: {'txt_r1': 17.36, 'txt_r5': 38.22, 'txt_r10': 50.44, 'txt_r_mean': 35.339999999999996, 'img_r1': 11.82327069172331, 'img_r5': 30.3718512594962, 'img_r10': 41.74330267892843, 'img_r_mean': 27.979474876715983, 'r_mean': 31.65973743835799}
    Epoch: 9 iteration: 0 lr: 0.00024022886158240857 loss: -0.7354167699813843
    tau_img: 0.0067, tau_txt: 0.0064
    Epoch: 9 iteration: 50 lr: 0.00024022886158240857 loss: -0.14618906378746033
    tau_img: 0.0067, tau_txt: 0.0064
    Epoch: 9 iteration: 100 lr: 0.00024022886158240857 loss: 0.12334905564785004
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 9 iteration: 150 lr: 0.00024022886158240857 loss: -0.45143190026283264
    tau_img: 0.0067, tau_txt: 0.0065
    Epoch: 9 iteration: 200 lr: 0.00024022886158240857 loss: 0.06901969015598297
    tau_img: 0.0067, tau_txt: 0.0065
    Epoch: 9 iteration: 250 lr: 0.00024022886158240857 loss: 0.02915862947702408
    tau_img: 0.0067, tau_txt: 0.0064
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 9
    coco val: {'txt_r1': 17.24, 'txt_r5': 39.68, 'txt_r10': 52.52, 'txt_r_mean': 36.48, 'img_r1': 11.943222710915634, 'img_r5': 30.279888044782087, 'img_r10': 42.059176329468215, 'img_r_mean': 28.094095695055312, 'r_mean': 32.28704784752765}
    coco test: {'txt_r1': 17.64, 'txt_r5': 39.44, 'txt_r10': 50.9, 'txt_r_mean': 35.99333333333333, 'img_r1': 11.975209916033586, 'img_r5': 30.463814474210317, 'img_r10': 41.97920831667333, 'img_r_mean': 28.13941090230574, 'r_mean': 32.06637211781954}
    Epoch: 10 iteration: 0 lr: 0.00022749999999999997 loss: -0.9465005993843079
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 10 iteration: 50 lr: 0.00022749999999999997 loss: -0.1919674426317215
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 10 iteration: 100 lr: 0.00022749999999999997 loss: 0.0656488761305809
    tau_img: 0.0066, tau_txt: 0.0063
    Epoch: 10 iteration: 150 lr: 0.00022749999999999997 loss: 0.15473569929599762
    tau_img: 0.0066, tau_txt: 0.0063
    Epoch: 10 iteration: 200 lr: 0.00022749999999999997 loss: 0.048671215772628784
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 10 iteration: 250 lr: 0.00022749999999999997 loss: 0.05919775739312172
    tau_img: 0.0066, tau_txt: 0.0063
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 10
    coco val: {'txt_r1': 17.54, 'txt_r5': 39.96, 'txt_r10': 52.46, 'txt_r_mean': 36.653333333333336, 'img_r1': 12.039184326269492, 'img_r5': 30.89564174330268, 'img_r10': 42.55897640943623, 'img_r_mean': 28.497934159669466, 'r_mean': 32.5756337465014}
    coco test: {'txt_r1': 17.24, 'txt_r5': 38.94, 'txt_r10': 51.24, 'txt_r_mean': 35.806666666666665, 'img_r1': 12.191123550579768, 'img_r5': 30.947620951619353, 'img_r10': 42.958816473410636, 'img_r_mean': 28.69918699186992, 'r_mean': 32.25292682926829}
    Epoch: 11 iteration: 0 lr: 0.00021397681324599103 loss: -0.8527200222015381
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 11 iteration: 50 lr: 0.00021397681324599103 loss: -0.310724675655365
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 11 iteration: 100 lr: 0.00021397681324599103 loss: -0.18071337044239044
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 11 iteration: 150 lr: 0.00021397681324599103 loss: -0.15896828472614288
    tau_img: 0.0067, tau_txt: 0.0064
    Epoch: 11 iteration: 200 lr: 0.00021397681324599103 loss: 0.125459223985672
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 11 iteration: 250 lr: 0.00021397681324599103 loss: 0.005948394536972046
    tau_img: 0.0066, tau_txt: 0.0064
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 11
    coco val: {'txt_r1': 18.22, 'txt_r5': 40.72, 'txt_r10': 53.08, 'txt_r_mean': 37.339999999999996, 'img_r1': 12.367053178728508, 'img_r5': 31.231507397041185, 'img_r10': 42.890843662534984, 'img_r_mean': 28.829801412768223, 'r_mean': 33.08490070638411}
    coco test: {'txt_r1': 19.12, 'txt_r5': 40.38, 'txt_r10': 52.52, 'txt_r_mean': 37.34, 'img_r1': 12.29908036785286, 'img_r5': 31.215513794482206, 'img_r10': 43.082766893242706, 'img_r_mean': 28.865787018525925, 'r_mean': 33.10289350926296}
    Epoch: 12 iteration: 0 lr: 0.00019980746418436736 loss: -0.8759943246841431
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 12 iteration: 50 lr: 0.00019980746418436736 loss: -0.6733912229537964
    tau_img: 0.0067, tau_txt: 0.0064
    Epoch: 12 iteration: 100 lr: 0.00019980746418436736 loss: 0.007951691746711731
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 12 iteration: 150 lr: 0.00019980746418436736 loss: -0.27293896675109863
    tau_img: 0.0066, tau_txt: 0.0064
    Epoch: 12 iteration: 200 lr: 0.00019980746418436736 loss: -0.604184627532959
    tau_img: 0.0067, tau_txt: 0.0065
    Epoch: 12 iteration: 250 lr: 0.00019980746418436736 loss: -0.08432623744010925
    tau_img: 0.0066, tau_txt: 0.0064
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 12
    coco val: {'txt_r1': 18.26, 'txt_r5': 40.38, 'txt_r10': 53.12, 'txt_r_mean': 37.25333333333333, 'img_r1': 12.522990803678528, 'img_r5': 31.70731707317073, 'img_r10': 43.122750899640145, 'img_r_mean': 29.117686258829803, 'r_mean': 33.18550979608157}
    coco test: {'txt_r1': 17.34, 'txt_r5': 39.08, 'txt_r10': 52.32, 'txt_r_mean': 36.24666666666667, 'img_r1': 12.798880447820872, 'img_r5': 31.759296281487405, 'img_r10': 43.05477808876449, 'img_r_mean': 29.204318272690927, 'r_mean': 32.7254924696788}
    Epoch: 13 iteration: 0 lr: 0.00018514719516857505 loss: -1.3101189136505127
    tau_img: 0.0069, tau_txt: 0.0066
    Epoch: 13 iteration: 50 lr: 0.00018514719516857505 loss: -0.5373433828353882
    tau_img: 0.0068, tau_txt: 0.0065
    Epoch: 13 iteration: 100 lr: 0.00018514719516857505 loss: -0.2286771833896637
    tau_img: 0.0068, tau_txt: 0.0065
    Epoch: 13 iteration: 150 lr: 0.00018514719516857505 loss: -0.17678964138031006
    tau_img: 0.0067, tau_txt: 0.0064
    Epoch: 13 iteration: 200 lr: 0.00018514719516857505 loss: -0.24495404958724976
    tau_img: 0.0068, tau_txt: 0.0066
    Epoch: 13 iteration: 250 lr: 0.00018514719516857505 loss: -0.5934573411941528
    tau_img: 0.0068, tau_txt: 0.0066
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 13
    coco val: {'txt_r1': 19.1, 'txt_r5': 40.84, 'txt_r10': 53.04, 'txt_r_mean': 37.660000000000004, 'img_r1': 12.538984406237505, 'img_r5': 31.36345461815274, 'img_r10': 42.94282287085166, 'img_r_mean': 28.9484206317473, 'r_mean': 33.304210315873654}
    coco test: {'txt_r1': 18.26, 'txt_r5': 40.74, 'txt_r10': 53.12, 'txt_r_mean': 37.373333333333335, 'img_r1': 12.810875649740105, 'img_r5': 31.955217912834865, 'img_r10': 43.398640543782484, 'img_r_mean': 29.388244702119152, 'r_mean': 33.380789017726244}
    Epoch: 14 iteration: 0 lr: 0.00017015662717380974 loss: -1.136932611465454
    tau_img: 0.0069, tau_txt: 0.0067
    Epoch: 14 iteration: 50 lr: 0.00017015662717380974 loss: -1.2352209091186523
    tau_img: 0.0071, tau_txt: 0.0068
    Epoch: 14 iteration: 100 lr: 0.00017015662717380974 loss: -0.3656700551509857
    tau_img: 0.0069, tau_txt: 0.0067
    Epoch: 14 iteration: 150 lr: 0.00017015662717380974 loss: -0.7482412457466125
    tau_img: 0.0068, tau_txt: 0.0066
    Epoch: 14 iteration: 200 lr: 0.00017015662717380974 loss: -0.6269024014472961
    tau_img: 0.0070, tau_txt: 0.0068
    Epoch: 14 iteration: 250 lr: 0.00017015662717380974 loss: -0.8550422191619873
    tau_img: 0.0070, tau_txt: 0.0067
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 14
    coco val: {'txt_r1': 18.52, 'txt_r5': 39.8, 'txt_r10': 52.78, 'txt_r_mean': 37.03333333333333, 'img_r1': 12.758896441423431, 'img_r5': 32.279088364654136, 'img_r10': 44.17433026789284, 'img_r_mean': 29.73743835799014, 'r_mean': 33.38538584566174}
    coco test: {'txt_r1': 17.52, 'txt_r5': 39.96, 'txt_r10': 51.94, 'txt_r_mean': 36.473333333333336, 'img_r1': 12.902838864454218, 'img_r5': 31.887245101959216, 'img_r10': 43.750499800079965, 'img_r_mean': 29.513527922164467, 'r_mean': 32.9934306277489}
    Epoch: 15 iteration: 0 lr: 0.000155 loss: -1.8559613227844238
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 15 iteration: 50 lr: 0.000155 loss: -1.2427170276641846
    tau_img: 0.0073, tau_txt: 0.0070
    Epoch: 15 iteration: 100 lr: 0.000155 loss: -1.1395246982574463
    tau_img: 0.0072, tau_txt: 0.0070
    Epoch: 15 iteration: 150 lr: 0.000155 loss: -1.4752817153930664
    tau_img: 0.0072, tau_txt: 0.0069
    Epoch: 15 iteration: 200 lr: 0.000155 loss: -1.8828952312469482
    tau_img: 0.0072, tau_txt: 0.0070
    Epoch: 15 iteration: 250 lr: 0.000155 loss: -1.181127905845642
    tau_img: 0.0072, tau_txt: 0.0070
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 15
    coco val: {'txt_r1': 19.9, 'txt_r5': 43.36, 'txt_r10': 55.22, 'txt_r_mean': 39.49333333333333, 'img_r1': 13.478608556577369, 'img_r5': 32.810875649740105, 'img_r10': 44.40223910435826, 'img_r_mean': 30.230574436891914, 'r_mean': 34.86195388511263}
    coco test: {'txt_r1': 19.58, 'txt_r5': 43.1, 'txt_r10': 54.84, 'txt_r_mean': 39.17333333333334, 'img_r1': 13.642542982806876, 'img_r5': 33.218712514994, 'img_r10': 44.718112754898044, 'img_r_mean': 30.526456084232976, 'r_mean': 34.849894708783154}
    Epoch: 16 iteration: 0 lr: 0.00013984337282619026 loss: -2.054107189178467
    tau_img: 0.0073, tau_txt: 0.0072
    Epoch: 16 iteration: 50 lr: 0.00013984337282619026 loss: -1.3603992462158203
    tau_img: 0.0073, tau_txt: 0.0071
    Epoch: 16 iteration: 100 lr: 0.00013984337282619026 loss: -1.8992851972579956
    tau_img: 0.0074, tau_txt: 0.0071
    Epoch: 16 iteration: 150 lr: 0.00013984337282619026 loss: -1.8692710399627686
    tau_img: 0.0074, tau_txt: 0.0072
    Epoch: 16 iteration: 200 lr: 0.00013984337282619026 loss: -1.7104038000106812
    tau_img: 0.0075, tau_txt: 0.0072
    Epoch: 16 iteration: 250 lr: 0.00013984337282619026 loss: -1.380126953125
    tau_img: 0.0073, tau_txt: 0.0071
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 16
    coco val: {'txt_r1': 20.58, 'txt_r5': 43.24, 'txt_r10': 55.3, 'txt_r_mean': 39.70666666666667, 'img_r1': 13.15873650539784, 'img_r5': 32.99480207916833, 'img_r10': 44.586165533786485, 'img_r_mean': 30.24656803945088, 'r_mean': 34.97661735305878}
    coco test: {'txt_r1': 19.36, 'txt_r5': 42.4, 'txt_r10': 54.48, 'txt_r_mean': 38.74666666666666, 'img_r1': 13.666533386645343, 'img_r5': 33.2546981207517, 'img_r10': 44.65413834466214, 'img_r_mean': 30.525123284019724, 'r_mean': 34.63589497534319}
    Epoch: 17 iteration: 0 lr: 0.00012485280483142487 loss: -2.5637669563293457
    tau_img: 0.0075, tau_txt: 0.0073
    Epoch: 17 iteration: 50 lr: 0.00012485280483142487 loss: -2.191415309906006
    tau_img: 0.0078, tau_txt: 0.0075
    Epoch: 17 iteration: 100 lr: 0.00012485280483142487 loss: -2.321763515472412
    tau_img: 0.0077, tau_txt: 0.0074
    Epoch: 17 iteration: 150 lr: 0.00012485280483142487 loss: -1.8449326753616333
    tau_img: 0.0075, tau_txt: 0.0073
    Epoch: 17 iteration: 200 lr: 0.00012485280483142487 loss: -2.31805157661438
    tau_img: 0.0077, tau_txt: 0.0075
    Epoch: 17 iteration: 250 lr: 0.00012485280483142487 loss: -2.372451066970825
    tau_img: 0.0075, tau_txt: 0.0073
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 17
    coco val: {'txt_r1': 19.98, 'txt_r5': 42.34, 'txt_r10': 54.76, 'txt_r_mean': 39.02666666666667, 'img_r1': 13.554578168732506, 'img_r5': 33.04278288684526, 'img_r10': 44.470211915233904, 'img_r_mean': 30.35585765693722, 'r_mean': 34.691262161801944}
    coco test: {'txt_r1': 19.42, 'txt_r5': 42.38, 'txt_r10': 54.96, 'txt_r_mean': 38.92, 'img_r1': 13.838464614154338, 'img_r5': 33.334666133546584, 'img_r10': 44.92602958816473, 'img_r_mean': 30.69972011195522, 'r_mean': 34.80986005597761}
    Epoch: 18 iteration: 0 lr: 0.00011019253581563262 loss: -4.260552406311035
    tau_img: 0.0081, tau_txt: 0.0079
    Epoch: 18 iteration: 50 lr: 0.00011019253581563262 loss: -2.9299917221069336
    tau_img: 0.0081, tau_txt: 0.0078
    Epoch: 18 iteration: 100 lr: 0.00011019253581563262 loss: -3.3400635719299316
    tau_img: 0.0080, tau_txt: 0.0077
    Epoch: 18 iteration: 150 lr: 0.00011019253581563262 loss: -3.453747510910034
    tau_img: 0.0079, tau_txt: 0.0077
    Epoch: 18 iteration: 200 lr: 0.00011019253581563262 loss: -3.1733462810516357
    tau_img: 0.0081, tau_txt: 0.0078
    Epoch: 18 iteration: 250 lr: 0.00011019253581563262 loss: -2.6329762935638428
    tau_img: 0.0079, tau_txt: 0.0076
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 18
    coco val: {'txt_r1': 20.76, 'txt_r5': 43.36, 'txt_r10': 55.6, 'txt_r_mean': 39.906666666666666, 'img_r1': 14.226309476209517, 'img_r5': 33.80647740903638, 'img_r10': 45.27788884446222, 'img_r_mean': 31.103558576569373, 'r_mean': 35.50511262161802}
    coco test: {'txt_r1': 20.6, 'txt_r5': 43.26, 'txt_r10': 55.16, 'txt_r_mean': 39.67333333333333, 'img_r1': 14.406237504998002, 'img_r5': 34.25029988004798, 'img_r10': 45.76169532187125, 'img_r_mean': 31.47274423563908, 'r_mean': 35.57303878448621}
    Epoch: 19 iteration: 0 lr: 9.602318675400897e-05 loss: -4.915426254272461
    tau_img: 0.0085, tau_txt: 0.0082
    Epoch: 19 iteration: 50 lr: 9.602318675400897e-05 loss: -3.8118224143981934
    tau_img: 0.0083, tau_txt: 0.0082
    Epoch: 19 iteration: 100 lr: 9.602318675400897e-05 loss: -3.6978960037231445
    tau_img: 0.0083, tau_txt: 0.0080
    Epoch: 19 iteration: 150 lr: 9.602318675400897e-05 loss: -3.7106001377105713
    tau_img: 0.0082, tau_txt: 0.0080
    Epoch: 19 iteration: 200 lr: 9.602318675400897e-05 loss: -4.195495128631592
    tau_img: 0.0083, tau_txt: 0.0080
    Epoch: 19 iteration: 250 lr: 9.602318675400897e-05 loss: -4.262701034545898
    tau_img: 0.0083, tau_txt: 0.0081
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 19
    coco val: {'txt_r1': 19.98, 'txt_r5': 43.22, 'txt_r10': 55.22, 'txt_r_mean': 39.473333333333336, 'img_r1': 14.058376649340264, 'img_r5': 33.310675729708116, 'img_r10': 44.96601359456218, 'img_r_mean': 30.778355324536857, 'r_mean': 35.125844328935095}
    coco test: {'txt_r1': 19.94, 'txt_r5': 43.16, 'txt_r10': 55.72, 'txt_r_mean': 39.60666666666666, 'img_r1': 13.94642143142743, 'img_r5': 33.65053978408636, 'img_r10': 45.33386645341863, 'img_r_mean': 30.976942556310807, 'r_mean': 35.291804611488736}
    Epoch: 20 iteration: 0 lr: 8.250000000000001e-05 loss: -4.490512371063232
    tau_img: 0.0085, tau_txt: 0.0084
    Epoch: 20 iteration: 50 lr: 8.250000000000001e-05 loss: -5.540229320526123
    tau_img: 0.0088, tau_txt: 0.0085
    Epoch: 20 iteration: 100 lr: 8.250000000000001e-05 loss: -5.427042484283447
    tau_img: 0.0088, tau_txt: 0.0085
    Epoch: 20 iteration: 150 lr: 8.250000000000001e-05 loss: -5.009304046630859
    tau_img: 0.0087, tau_txt: 0.0085
    Epoch: 20 iteration: 200 lr: 8.250000000000001e-05 loss: -5.154559135437012
    tau_img: 0.0088, tau_txt: 0.0084
    Epoch: 20 iteration: 250 lr: 8.250000000000001e-05 loss: -5.245851993560791
    tau_img: 0.0087, tau_txt: 0.0085
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 20
    coco val: {'txt_r1': 21.16, 'txt_r5': 43.64, 'txt_r10': 55.96, 'txt_r_mean': 40.25333333333333, 'img_r1': 13.914434226309476, 'img_r5': 33.954418232706914, 'img_r10': 45.64574170331867, 'img_r_mean': 31.171531387445018, 'r_mean': 35.71243236038917}
    coco test: {'txt_r1': 20.46, 'txt_r5': 43.9, 'txt_r10': 55.6, 'txt_r_mean': 39.98666666666667, 'img_r1': 14.166333466613354, 'img_r5': 34.44622151139544, 'img_r10': 45.7936825269892, 'img_r_mean': 31.46874583499933, 'r_mean': 35.727706250833}
    Epoch: 21 iteration: 0 lr: 6.97711384175914e-05 loss: -6.665648460388184
    tau_img: 0.0093, tau_txt: 0.0089
    Epoch: 21 iteration: 50 lr: 6.97711384175914e-05 loss: -5.873527526855469
    tau_img: 0.0089, tau_txt: 0.0088
    Epoch: 21 iteration: 100 lr: 6.97711384175914e-05 loss: -6.627588272094727
    tau_img: 0.0091, tau_txt: 0.0090
    Epoch: 21 iteration: 150 lr: 6.97711384175914e-05 loss: -6.532419204711914
    tau_img: 0.0093, tau_txt: 0.0091
    Epoch: 21 iteration: 200 lr: 6.97711384175914e-05 loss: -6.612300395965576
    tau_img: 0.0092, tau_txt: 0.0090
    Epoch: 21 iteration: 250 lr: 6.97711384175914e-05 loss: -5.026062965393066
    tau_img: 0.0088, tau_txt: 0.0085
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 21
    coco val: {'txt_r1': 21.2, 'txt_r5': 42.88, 'txt_r10': 55.18, 'txt_r_mean': 39.75333333333333, 'img_r1': 13.858456617353058, 'img_r5': 33.5265893642543, 'img_r10': 45.12994802079168, 'img_r_mean': 30.838331334133013, 'r_mean': 35.29583233373317}
    coco test: {'txt_r1': 19.56, 'txt_r5': 42.92, 'txt_r10': 54.92, 'txt_r_mean': 39.13333333333333, 'img_r1': 14.082367053178729, 'img_r5': 33.506597361055576, 'img_r10': 45.16993202718913, 'img_r_mean': 30.919632147141144, 'r_mean': 35.02648274023724}
    Epoch: 22 iteration: 0 lr: 5.797606207796559e-05 loss: -7.0506157875061035
    tau_img: 0.0095, tau_txt: 0.0091
    Epoch: 22 iteration: 50 lr: 5.797606207796559e-05 loss: -7.07581901550293
    tau_img: 0.0093, tau_txt: 0.0090
    Epoch: 22 iteration: 100 lr: 5.797606207796559e-05 loss: -7.153095245361328
    tau_img: 0.0096, tau_txt: 0.0093
    Epoch: 22 iteration: 150 lr: 5.797606207796559e-05 loss: -7.888920307159424
    tau_img: 0.0096, tau_txt: 0.0094
    Epoch: 22 iteration: 200 lr: 5.797606207796559e-05 loss: -6.130715847015381
    tau_img: 0.0092, tau_txt: 0.0090
    Epoch: 22 iteration: 250 lr: 5.797606207796559e-05 loss: -6.484936714172363
    tau_img: 0.0093, tau_txt: 0.0089
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 22
    coco val: {'txt_r1': 20.78, 'txt_r5': 43.78, 'txt_r10': 55.14, 'txt_r_mean': 39.9, 'img_r1': 14.338264694122351, 'img_r5': 34.038384646141544, 'img_r10': 45.71771291483407, 'img_r_mean': 31.36478741836599, 'r_mean': 35.63239370918299}
    coco test: {'txt_r1': 20.3, 'txt_r5': 42.74, 'txt_r10': 55.12, 'txt_r_mean': 39.38666666666666, 'img_r1': 14.326269492203119, 'img_r5': 34.3062774890044, 'img_r10': 45.649740103958415, 'img_r_mean': 31.427429028388644, 'r_mean': 35.40704784752765}
    Epoch: 23 iteration: 0 lr: 4.724400030577786e-05 loss: -9.242505073547363
    tau_img: 0.0100, tau_txt: 0.0099
    Epoch: 23 iteration: 50 lr: 4.724400030577786e-05 loss: -8.627782821655273
    tau_img: 0.0097, tau_txt: 0.0094
    Epoch: 23 iteration: 100 lr: 4.724400030577786e-05 loss: -8.229507446289062
    tau_img: 0.0098, tau_txt: 0.0095
    Epoch: 23 iteration: 150 lr: 4.724400030577786e-05 loss: -8.095161437988281
    tau_img: 0.0101, tau_txt: 0.0099
    Epoch: 23 iteration: 200 lr: 4.724400030577786e-05 loss: -7.361606597900391
    tau_img: 0.0099, tau_txt: 0.0096
    Epoch: 23 iteration: 250 lr: 4.724400030577786e-05 loss: -8.183349609375
    tau_img: 0.0096, tau_txt: 0.0095
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 23
    coco val: {'txt_r1': 20.56, 'txt_r5': 43.82, 'txt_r10': 55.32, 'txt_r_mean': 39.9, 'img_r1': 14.066373450619752, 'img_r5': 33.7984806077569, 'img_r10': 45.86965213914434, 'img_r_mean': 31.244835399173667, 'r_mean': 35.572417699586836}
    coco test: {'txt_r1': 19.68, 'txt_r5': 43.02, 'txt_r10': 54.9, 'txt_r_mean': 39.199999999999996, 'img_r1': 14.374250299880048, 'img_r5': 34.16233506597361, 'img_r10': 45.71771291483407, 'img_r_mean': 31.418099426895907, 'r_mean': 35.30904971344795}
    Epoch: 24 iteration: 0 lr: 3.769253581563263e-05 loss: -10.245454788208008
    tau_img: 0.0102, tau_txt: 0.0099
    Epoch: 24 iteration: 50 lr: 3.769253581563263e-05 loss: -9.013447761535645
    tau_img: 0.0102, tau_txt: 0.0100
    Epoch: 24 iteration: 100 lr: 3.769253581563263e-05 loss: -10.611595153808594
    tau_img: 0.0104, tau_txt: 0.0101
    Epoch: 24 iteration: 150 lr: 3.769253581563263e-05 loss: -8.743675231933594
    tau_img: 0.0102, tau_txt: 0.0102
    Epoch: 24 iteration: 200 lr: 3.769253581563263e-05 loss: -8.715897560119629
    tau_img: 0.0102, tau_txt: 0.0099
    Epoch: 24 iteration: 250 lr: 3.769253581563263e-05 loss: -10.123720169067383
    tau_img: 0.0102, tau_txt: 0.0101
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 24
    coco val: {'txt_r1': 20.3, 'txt_r5': 43.44, 'txt_r10': 54.48, 'txt_r_mean': 39.406666666666666, 'img_r1': 14.110355857656938, 'img_r5': 33.662534986005596, 'img_r10': 45.59776089564174, 'img_r_mean': 31.123550579768093, 'r_mean': 35.26510862321738}
    coco test: {'txt_r1': 19.24, 'txt_r5': 42.34, 'txt_r10': 55.02, 'txt_r_mean': 38.86666666666667, 'img_r1': 14.466213514594163, 'img_r5': 33.75449820071971, 'img_r10': 45.529788084766096, 'img_r_mean': 31.250166600026656, 'r_mean': 35.05841663334666}
    Epoch: 25 iteration: 0 lr: 2.9426316451256386e-05 loss: -11.852662086486816
    tau_img: 0.0108, tau_txt: 0.0105
    Epoch: 25 iteration: 50 lr: 2.9426316451256386e-05 loss: -11.105792045593262
    tau_img: 0.0108, tau_txt: 0.0105
    Epoch: 25 iteration: 100 lr: 2.9426316451256386e-05 loss: -9.328715324401855
    tau_img: 0.0103, tau_txt: 0.0100
    Epoch: 25 iteration: 150 lr: 2.9426316451256386e-05 loss: -10.47180461883545
    tau_img: 0.0105, tau_txt: 0.0101
    Epoch: 25 iteration: 200 lr: 2.9426316451256386e-05 loss: -9.260772705078125
    tau_img: 0.0104, tau_txt: 0.0103
    Epoch: 25 iteration: 250 lr: 2.9426316451256386e-05 loss: -10.207618713378906
    tau_img: 0.0103, tau_txt: 0.0102
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 25
    coco val: {'txt_r1': 20.44, 'txt_r5': 43.78, 'txt_r10': 55.58, 'txt_r_mean': 39.93333333333333, 'img_r1': 14.146341463414634, 'img_r5': 33.81447421031587, 'img_r10': 46.00559776089564, 'img_r_mean': 31.32213781154205, 'r_mean': 35.627735572437686}
    coco test: {'txt_r1': 19.66, 'txt_r5': 42.9, 'txt_r10': 55.24, 'txt_r_mean': 39.26666666666667, 'img_r1': 14.47421031587365, 'img_r5': 34.27828868452619, 'img_r10': 45.725709716113556, 'img_r_mean': 31.492736238837796, 'r_mean': 35.379701452752236}
    Epoch: 26 iteration: 0 lr: 2.2535908641822855e-05 loss: -10.570426940917969
    tau_img: 0.0106, tau_txt: 0.0105
    Epoch: 26 iteration: 50 lr: 2.2535908641822855e-05 loss: -11.204402923583984
    tau_img: 0.0110, tau_txt: 0.0107
    Epoch: 26 iteration: 100 lr: 2.2535908641822855e-05 loss: -12.513148307800293
    tau_img: 0.0110, tau_txt: 0.0108
    Epoch: 26 iteration: 150 lr: 2.2535908641822855e-05 loss: -11.783784866333008
    tau_img: 0.0110, tau_txt: 0.0108
    Epoch: 26 iteration: 200 lr: 2.2535908641822855e-05 loss: -11.702966690063477
    tau_img: 0.0111, tau_txt: 0.0107
    Epoch: 26 iteration: 250 lr: 2.2535908641822855e-05 loss: -11.340032577514648
    tau_img: 0.0111, tau_txt: 0.0110
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 26
    coco val: {'txt_r1': 20.7, 'txt_r5': 43.74, 'txt_r10': 55.58, 'txt_r_mean': 40.00666666666667, 'img_r1': 14.134346261495402, 'img_r5': 33.78248700519792, 'img_r10': 45.657736905237904, 'img_r_mean': 31.19152339064374, 'r_mean': 35.599095028655206}
    coco test: {'txt_r1': 19.48, 'txt_r5': 42.86, 'txt_r10': 55.32, 'txt_r_mean': 39.22, 'img_r1': 14.29828068772491, 'img_r5': 33.98640543782487, 'img_r10': 45.55377848860456, 'img_r_mean': 31.27948820471811, 'r_mean': 35.24974410235905}
    Epoch: 27 iteration: 0 lr: 1.7096805137202738e-05 loss: -12.180134773254395
    tau_img: 0.0114, tau_txt: 0.0113
    Epoch: 27 iteration: 50 lr: 1.7096805137202738e-05 loss: -12.57005500793457
    tau_img: 0.0112, tau_txt: 0.0110
    Epoch: 27 iteration: 100 lr: 1.7096805137202738e-05 loss: -12.195676803588867
    tau_img: 0.0115, tau_txt: 0.0113
    Epoch: 27 iteration: 150 lr: 1.7096805137202738e-05 loss: -13.575706481933594
    tau_img: 0.0116, tau_txt: 0.0113
    Epoch: 27 iteration: 200 lr: 1.7096805137202738e-05 loss: -14.225406646728516
    tau_img: 0.0115, tau_txt: 0.0113
    Epoch: 27 iteration: 250 lr: 1.7096805137202738e-05 loss: -11.519415855407715
    tau_img: 0.0113, tau_txt: 0.0111
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 27
    coco val: {'txt_r1': 20.72, 'txt_r5': 44.04, 'txt_r10': 55.4, 'txt_r_mean': 40.053333333333335, 'img_r1': 14.186325469812076, 'img_r5': 33.71451419432227, 'img_r10': 45.63374650139944, 'img_r_mean': 31.178195388511266, 'r_mean': 35.6157643609223}
    coco test: {'txt_r1': 19.42, 'txt_r5': 42.88, 'txt_r10': 55.08, 'txt_r_mean': 39.126666666666665, 'img_r1': 14.50219912035186, 'img_r5': 33.982407037185126, 'img_r10': 45.569772091163536, 'img_r_mean': 31.351459416233507, 'r_mean': 35.23906304145009}
    Epoch: 28 iteration: 0 lr: 1.3168597893598175e-05 loss: -14.22984790802002
    tau_img: 0.0116, tau_txt: 0.0115
    Epoch: 28 iteration: 50 lr: 1.3168597893598175e-05 loss: -12.658186912536621
    tau_img: 0.0117, tau_txt: 0.0115
    Epoch: 28 iteration: 100 lr: 1.3168597893598175e-05 loss: -14.149580001831055
    tau_img: 0.0117, tau_txt: 0.0114
    Epoch: 28 iteration: 150 lr: 1.3168597893598175e-05 loss: -14.180305480957031
    tau_img: 0.0119, tau_txt: 0.0115
    Epoch: 28 iteration: 200 lr: 1.3168597893598175e-05 loss: -14.528634071350098
    tau_img: 0.0121, tau_txt: 0.0118
    Epoch: 28 iteration: 250 lr: 1.3168597893598175e-05 loss: -14.142889022827148
    tau_img: 0.0120, tau_txt: 0.0116
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 28
    coco val: {'txt_r1': 20.56, 'txt_r5': 43.92, 'txt_r10': 55.18, 'txt_r_mean': 39.88666666666666, 'img_r1': 14.378248700519793, 'img_r5': 33.990403838464616, 'img_r10': 45.81767293082767, 'img_r_mean': 31.39544182327069, 'r_mean': 35.64105424496868}
    coco test: {'txt_r1': 19.56, 'txt_r5': 42.92, 'txt_r10': 55.0, 'txt_r_mean': 39.160000000000004, 'img_r1': 14.550179928028788, 'img_r5': 34.11435425829668, 'img_r10': 45.765693722510996, 'img_r_mean': 31.476742636278818, 'r_mean': 35.31837131813941}
    Epoch: 29 iteration: 0 lr: 1.0794325171600358e-05 loss: -14.580052375793457
    tau_img: 0.0120, tau_txt: 0.0117
    Epoch: 29 iteration: 50 lr: 1.0794325171600358e-05 loss: -14.782979965209961
    tau_img: 0.0124, tau_txt: 0.0122
    Epoch: 29 iteration: 100 lr: 1.0794325171600358e-05 loss: -13.903106689453125
    tau_img: 0.0121, tau_txt: 0.0118
    Epoch: 29 iteration: 150 lr: 1.0794325171600358e-05 loss: -15.160087585449219
    tau_img: 0.0125, tau_txt: 0.0121
    Epoch: 29 iteration: 200 lr: 1.0794325171600358e-05 loss: -14.430315017700195
    tau_img: 0.0118, tau_txt: 0.0117
    Epoch: 29 iteration: 250 lr: 1.0794325171600358e-05 loss: -14.369138717651367
    tau_img: 0.0120, tau_txt: 0.0118
    Computing features for evaluation...
    Computing features for evaluation...
    Epoch: 29
    coco val: {'txt_r1': 20.42, 'txt_r5': 43.82, 'txt_r10': 55.34, 'txt_r_mean': 39.86000000000001, 'img_r1': 14.234306277489004, 'img_r5': 33.750499800079965, 'img_r10': 45.48180727708917, 'img_r_mean': 31.155537784886047, 'r_mean': 35.507768892443025}
    coco test: {'txt_r1': 19.4, 'txt_r5': 42.76, 'txt_r10': 55.08, 'txt_r_mean': 39.08, 'img_r1': 14.434226309476209, 'img_r5': 33.8984406237505, 'img_r10': 45.577768892443025, 'img_r_mean': 31.303478608556578, 'r_mean': 35.191739304278286}


Visualization
-------------

Below we plot the mean validation recall of CLIP and iSogCLR after each
training epoch. For iSogCLR, these are the ``r_mean`` values reported on
the ``coco val`` lines of the training log.

.. code:: python

    # Mean validation recall (r_mean on COCO val) after each of the 30 epochs
    clip_recall_vals = [9.56793, 26.4037, 29.3343, 29.7682, 30.5586, 30.8398, 30.8938, 31.5624, 30.5864, 31.1057, 31.775, 31.0977, 31.8895, 31.0098, 31.2123, 31.745, 31.8129, 32.0983, 31.163, 31.7523, 32.1975, 32.2302, 32.5441, 32.3621, 32.6608, 32.7541, 32.7528, 32.4954, 32.9994, 32.7601]
    isogclr_recall_vals = [10.3831, 22.1452, 26.797, 29.2797, 30.6854, 31.7613, 32.4364, 32.3897, 31.5938, 32.287, 32.5756, 33.0849, 33.1855, 33.3042, 33.3853, 34.8619, 34.9766, 34.6912, 35.5051, 35.1258, 35.7124, 35.2958, 35.6323, 35.5724, 35.2651, 35.6277, 35.599, 35.6157, 35.641, 35.5077]
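
The iSogCLR values above agree, to the first four decimal places, with the
``r_mean`` entries on the ``coco val`` lines of the training log (for
example, 35.6157, 35.641 and 35.5077 for the last three epochs). A minimal
sketch for reading them back out of a saved log, assuming each evaluation
line has the ``coco val: {...}`` form printed above; ``extract_val_recalls``
is a hypothetical helper, not part of LibAUC:

.. code:: python

    import ast


    def extract_val_recalls(log_text):
        """Return the r_mean value of every 'coco val:' line, in log order.

        Hypothetical helper: assumes the log was saved as plain text and that
        each evaluation line looks like "coco val: {...}" with a dict literal.
        """
        prefix = "coco val:"
        recalls = []
        for line in log_text.splitlines():
            line = line.strip()
            if line.startswith(prefix):
                # The remainder of the line is a Python dict literal.
                metrics = ast.literal_eval(line[len(prefix):].strip())
                recalls.append(metrics["r_mean"])
        return recalls

For instance, ``extract_val_recalls(open('train_log.txt').read())`` would
return one value per epoch, where ``train_log.txt`` is a placeholder file
name. ``ast.literal_eval`` is used instead of ``eval`` so that the log text
is parsed as data only.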

.. code:: python

    import matplotlib.pyplot as plt
    import numpy as np
    
    # x-axis: epochs numbered from 1 (the training log numbers epochs from 0)
    epochs = np.arange(1, len(clip_recall_vals) + 1)
    
    # One curve per method
    plt.plot(epochs, clip_recall_vals, label='CLIP', ls=':', marker='+', color='blue')
    plt.plot(epochs, isogclr_recall_vals, label='iSogCLR', marker='*', color='orange')
    
    plt.ylabel('Mean Validation Recall', fontsize=18)
    plt.xlabel('Epoch', fontsize=18)
    
    plt.title('CLIP vs. iSogCLR', fontsize=20)
    plt.legend(fontsize=20)
    
    plt.show()


.. image:: ./imgs/Bimodal_iSogCLR_Tutorial.png
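
To read off the comparison numerically, the sketch below reports each
curve's best value, the epoch at which it occurs, and the final value.
``summarize_curve`` is a hypothetical helper written for this illustration;
epochs are counted from 1 to match the x-axis of the plot.

.. code:: python

    def summarize_curve(recall_vals):
        """Return (best_epoch, best_value, final_value) for a per-epoch curve.

        Hypothetical helper; best_epoch is 1-based, as on the plot's x-axis.
        If several epochs tie for the best value, the earliest one is returned.
        """
        best_idx = max(range(len(recall_vals)), key=lambda i: recall_vals[i])
        return best_idx + 1, recall_vals[best_idx], recall_vals[-1]

Applied to the lists above, ``summarize_curve(clip_recall_vals)`` gives
``(29, 32.9994, 32.7601)`` and ``summarize_curve(isogclr_recall_vals)``
gives ``(21, 35.7124, 35.5077)``: iSogCLR peaks earlier and finishes about
2.75 points higher in mean validation recall.
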