skydiving94/comp_sci496-sml_homework

COMP_SCI496-SML Project - Image Retrieval

Problem Statement

Just as text search finds information expressed in words, image retrieval finds images that convey a concept the user has in mind. That concept can be expressed through text alone, through a similar image, or through a combination of both. A good image retrieval system is useful in many scenarios. For example, a customer shopping for clothes can upload an image resembling what they want and then describe some adjustments to the uploaded image. The system then finds the image that most closely matches the garment they have in mind.

[Image: pipeline]

Input/Output

Input

  • An image
  • An adjustment text

Output

  • A set of retrieved images

[Image: io]
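The input/output contract above can be sketched as a function that takes a query vector and returns the closest gallery images. This is only an illustration under stated assumptions: the names `cosine` and `retrieve`, the toy two-dimensional embeddings, and the use of cosine similarity for ranking are hypothetical and are not taken from the project.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, gallery, k=3):
    """Return the ids of the k gallery images most similar to the query.

    query_vec stands in for the combined (image + adjustment text) query;
    gallery maps a hypothetical image id to its embedding vector.
    """
    ranked = sorted(
        gallery,
        key=lambda img_id: cosine(query_vec, gallery[img_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy gallery with made-up embeddings (assumption, for illustration only).
gallery = {
    "red_dress": [1.0, 0.0],
    "blue_dress": [0.0, 1.0],
    "purple_dress": [1.0, 1.0],
}
top = retrieve([1.0, 0.1], gallery, k=2)
print(top)
```

In a real system the vectors would come from a trained model rather than being written by hand.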

Core Model

The front-end demo app and back-end server are built on top of tirg, whose idea is to form a query representation by composing vectorized image and text information. The vectorized image features are filtered using the text information so that only the relevant image information is kept, whereas all text features are retained. This model has been shown to be relatively accurate on the CSS3D dataset, but on more realistic datasets such as Fashion200K, which is also used in this project, there is still room for improvement.
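The filtering idea described above can be illustrated with a toy gate. This is a rough sketch of one possible reading of that description, not the tirg implementation: the function `compose_query`, the `gate_weights` matrix, and the choice to concatenate the gated image features with the text features are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def compose_query(img_feat, txt_feat, gate_weights):
    """Toy composition: gate each image feature by the text, keep the text as-is.

    gate_weights[i][j] is a hypothetical weight linking text dimension j
    to image dimension i. A gate near 1 keeps the image feature; a gate
    near 0 suppresses it.
    """
    gated = []
    for i, value in enumerate(img_feat):
        score = sum(w * t for w, t in zip(gate_weights[i], txt_feat))
        gated.append(sigmoid(score) * value)
    # All text features are retained unchanged.
    return gated + list(txt_feat)


# Made-up numbers: the text keeps image dimension 0 and suppresses dimension 1.
query = compose_query(
    img_feat=[1.0, 2.0],
    txt_feat=[1.0, 0.0],
    gate_weights=[[10.0, 0.0], [-10.0, 0.0]],
)
print(query)
```

In a trained model the gate weights would be learned from data; here they are fixed by hand to make the effect visible.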

Deliverables

Running through Docker Containers

This project is most easily run through Docker containers. Simply follow the steps below.

1. Pull the Docker images for the frontend and backend

docker pull timoderbeste/tirg-backend:v3
docker pull timoderbeste/tirg-frontend:v1

2. Download and unzip the prepared data/model file

Click here to download datamodel.zip. Then unzip it to a preferred directory, such as /Users/timowang/Desktop/datamodel/. Lastly, set an environment variable as follows:

export DATAMODEL=/path/to/datamodel

Using the example directory, the export statement looks as follows:

export DATAMODEL=/Users/timowang/Desktop/datamodel/
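Before starting the containers, it can help to confirm that the variable is set and points at a real directory. The helper below is a small sketch for that check; the function name `resolve_datamodel` is hypothetical and not part of the project.

```python
import os


def resolve_datamodel(environ=os.environ):
    """Return the DATAMODEL directory, or raise if it is unset or missing."""
    path = environ.get("DATAMODEL")
    if not path:
        raise RuntimeError(
            "DATAMODEL is not set; run: export DATAMODEL=/path/to/datamodel"
        )
    if not os.path.isdir(path):
        raise RuntimeError(f"DATAMODEL does not point to a directory: {path}")
    return path
```

Calling `resolve_datamodel()` in the same shell session where the export was run should return the unzipped directory.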

3. Run docker containers and start servers

First, run the tirg-backend container. The command forwards container port 80 to host port 80 and mounts the datamodel directory into the container. Note that the v3 tag is used.

docker run -dit -p 80:80 --name tirg-backend -v $DATAMODEL:/datamodel timoderbeste/tirg-backend:v3 /bin/bash

Then, run the tirg-frontend container. The command forwards container port 3000 to host port 3000.

docker run -dit -p 3000:3000 --name tirg-frontend timoderbeste/tirg-frontend:v1 /bin/bash

Lastly, execute run_backend.sh and run_frontend.sh in the tirg-backend and tirg-frontend containers, respectively.

docker exec tirg-backend /bin/bash /run_backend.sh
docker exec tirg-frontend /bin/bash /run_frontend.sh
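For scripting, the four commands from this step can be assembled as argument lists. The sketch below only builds and prints them; it does not run Docker. The function name `docker_commands` is hypothetical, while the image names, ports, and script paths are copied from the commands above.

```python
import shlex

BACKEND_IMAGE = "timoderbeste/tirg-backend:v3"
FRONTEND_IMAGE = "timoderbeste/tirg-frontend:v1"


def docker_commands(datamodel_dir):
    """Return the four docker invocations from the steps above as argv lists."""
    return [
        # Start the backend container with the datamodel directory mounted.
        ["docker", "run", "-dit", "-p", "80:80", "--name", "tirg-backend",
         "-v", f"{datamodel_dir}:/datamodel", BACKEND_IMAGE, "/bin/bash"],
        # Start the frontend container.
        ["docker", "run", "-dit", "-p", "3000:3000", "--name", "tirg-frontend",
         FRONTEND_IMAGE, "/bin/bash"],
        # Launch the servers inside the running containers.
        ["docker", "exec", "tirg-backend", "/bin/bash", "/run_backend.sh"],
        ["docker", "exec", "tirg-frontend", "/bin/bash", "/run_frontend.sh"],
    ]


# Example path from step 2; replace it with your own directory.
for cmd in docker_commands("/Users/timowang/Desktop/datamodel/"):
    print(shlex.join(cmd))
```

Passing the path as a single list element keeps directories that contain spaces intact if the lists are later handed to `subprocess.run`.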

4. Launch a browser and start exploring

Enter localhost:3000 in your browser's address bar, and you will be shown an interface similar to the one below. You can upload an image of a garment and then enter a description (the description field CANNOT be empty). After you hit Compile and Retrieve, a few similar images of clothes should be retrieved and displayed.

Currently, the Fashion200K dataset only contains women's clothing. However, it is also possible to build your own dataset and train a model on it.

[Image: image-20200310183237176]

[Image: image-20200310192315896]
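If the page does not load, a quick first check is whether anything is listening on the two forwarded ports. The helper below is a minimal sketch using only the standard library; the function names `is_listening` and `check_demo_ports` are hypothetical, and the ports 80 and 3000 are the ones forwarded in step 3.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_demo_ports(host="localhost"):
    """Report whether the backend (80) and frontend (3000) ports accept connections."""
    return {
        "backend": is_listening(host, 80),
        "frontend": is_listening(host, 3000),
    }
```

Calling `check_demo_ports()` returns a dict such as `{"backend": True, "frontend": False}`. A successful TCP connection only shows that a process is listening; it does not confirm that the server behind it is healthy.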

Model Training

No training was done for this project because a pretrained model for the Fashion200K dataset was available. However, it is possible to construct a new dataset and train a new model on it. Brief instructions are given below.

Dataset Construction

If you only want to train a model using the standard datasets, you can skip this step.

1. Implement a derived dataset class

The tirg project is designed with dataset extension in mind. In particular, the abstract class BaseDataset defined in datasets.py, which can be found here, specifies a set of methods that must be implemented by its derived classes, such as the Fashion200k class.

The methods that must be implemented are:

  • get_all_texts: returns a list of str, where the $i^{th}$ text is the description of the $i^{th}$ image in imgs.
  • __getitem__: returns an example, which is a dict object with the information listed below. You can either hand-build examples conforming to this content or implement a function to build them automatically. The key is to create the mod text for each example efficiently. mod is the text describing the adjustment to apply to the source image.
    • source_img_id (int)
    • source_img_data (PIL.Image)
    • source_caption (str)
    • target_img_id (int)
    • target_img_data (PIL.Image)
    • target_caption (str)
    • mod (str): you can either hand-build it or write a function that compares source_caption and target_caption to generate one automatically. You can refer here for an example.
  • generate_random_query_target: returns a random example with the same content as the one listed for __getitem__. The difference is that __getitem__ returns the example determined by its idx, whereas generate_random_query_target can return any example.
  • get_img: returns either a raw image, i.e. a PIL.Image object, or the transformed image (typically a tensor) produced by a transform function composed with torchvision.transforms.Compose.
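The automatic option for mod can be sketched as a word-level comparison of the two captions. This is a minimal sketch, not the tirg implementation: the function name `make_mod_text` and the "replace X with Y" wording are assumptions chosen for illustration, and the approach only works when each caption has at least one word the other lacks.

```python
def make_mod_text(source_caption, target_caption):
    """Build an adjustment text from the first word unique to each caption.

    Returns None when either caption has no word missing from the other,
    since no single-word replacement describes the change in that case.
    """
    source_words = source_caption.split()
    target_words = target_caption.split()
    removed = [w for w in source_words if w not in target_words]
    added = [w for w in target_words if w not in source_words]
    if not removed or not added:
        return None
    return f"replace {removed[0]} with {added[0]}"


print(make_mod_text("red sleeveless dress", "blue sleeveless dress"))
# prints: replace red with blue
```

Pairing captions that differ by exactly one word keeps the generated text unambiguous; captions that differ in several words would need a richer template.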

You will also likely need to modify the __init__ function so that you can populate two essential list objects, imgs and test_queries.
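The required methods can be sketched with a toy dataset. To stay runnable without tirg, PyTorch, or PIL, the sketch defines its own stand-in `BaseDataset` and uses placeholder strings where real code would hold PIL.Image objects. The class `ToyClothes`, its constructor arguments, and the `raw_img` parameter are assumptions for illustration; check the real datasets.py for the exact signatures.

```python
import random


class BaseDataset:
    """Stand-in for the abstract class in tirg's datasets.py (not the real one)."""

    def __init__(self):
        self.imgs = []
        self.test_queries = []  # left empty in this sketch


class ToyClothes(BaseDataset):
    """Hypothetical dataset built from hand-written (source, target, mod) triples."""

    def __init__(self, images, pairs, transform=None):
        super().__init__()
        # images: list of dicts with "data" (image object) and "caption" (str)
        self.imgs = images
        # pairs: list of (source_idx, target_idx, mod) tuples
        self.pairs = pairs
        self.transform = transform

    def __len__(self):
        return len(self.pairs)

    def get_all_texts(self):
        # The i-th text describes the i-th image in imgs.
        return [img["caption"] for img in self.imgs]

    def get_img(self, idx, raw_img=False):
        data = self.imgs[idx]["data"]
        if raw_img or self.transform is None:
            return data
        return self.transform(data)

    def __getitem__(self, idx):
        src, tgt, mod = self.pairs[idx]
        return {
            "source_img_id": src,
            "source_img_data": self.get_img(src),
            "source_caption": self.imgs[src]["caption"],
            "target_img_id": tgt,
            "target_img_data": self.get_img(tgt),
            "target_caption": self.imgs[tgt]["caption"],
            "mod": mod,
        }

    def generate_random_query_target(self):
        # Same content as __getitem__, but for a randomly chosen pair.
        return self[random.randrange(len(self.pairs))]


# Placeholder strings stand in for PIL.Image objects.
images = [
    {"data": "<image 0>", "caption": "red dress"},
    {"data": "<image 1>", "caption": "blue dress"},
]
dataset = ToyClothes(images, pairs=[(0, 1, "replace red with blue")])
print(dataset[0]["mod"])
```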

2. Make your dataset available

After you have implemented a dataset class derived from BaseDataset, you need to make sure it can be used in main.py.

The load_dataset function loads the dataset specified by the main.py input arguments dataset and dataset_path. It does so with an if/elif chain. Simply follow the existing format: add another elif branch for your dataset and load the train and test sets with the correct transform functions.
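The dispatch pattern can be sketched as follows. This is a simplified stand-in, not the code in main.py: the stub classes, the dataset name "myclothes", the class `MyClothes`, and the function signature are hypothetical, so the real load_dataset may take its arguments differently.

```python
class _StubDataset:
    """Stand-in for a BaseDataset subclass; it only stores its constructor arguments."""

    def __init__(self, path, split, transform=None):
        self.path = path
        self.split = split
        self.transform = transform


class Fashion200k(_StubDataset):
    """Stub with the same name as the existing dataset class."""


class MyClothes(_StubDataset):
    """Hypothetical new dataset class."""


def load_dataset(dataset, dataset_path, train_transform=None, test_transform=None):
    """Pick the dataset class by name and build its train and test splits."""
    if dataset == "fashion200k":
        cls = Fashion200k
    elif dataset == "myclothes":  # the branch added for the new dataset
        cls = MyClothes
    else:
        raise ValueError(f"Invalid dataset: {dataset}")
    trainset = cls(path=dataset_path, split="train", transform=train_transform)
    testset = cls(path=dataset_path, split="test", transform=test_transform)
    return trainset, testset


trainset, testset = load_dataset("myclothes", "/path/to/myclothes")
print(type(trainset).__name__, trainset.split, testset.split)
```

Train and test splits usually need different transforms (for example, random augmentation only at training time), which is why they are passed separately here.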

Training and Evaluation

To run training and evaluation on the originally provided datasets, follow the instructions in the GitHub repo of the original project.

To run training and evaluation with your own dataset, the steps are similar.

python main.py \
--dataset=[your-dataset-name] \
--dataset_path=[/path/to/your/dataset] \
--num_iters=[desired-number-of-iterations] \
--model=[concat/tirg] \
--loss=[soft_triplet/batch_based_classification] \
--learning_rate_decay_frequency=[desired-lr-decay-freq] \
--comment=[dataset-name_model-name]
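When launching several runs, it can be convenient to fill in the template programmatically. The sketch below builds the argument list and checks the model and loss names first. The function name `training_command` and the example values are hypothetical; the accepted values concat, tirg, soft_triplet, and batch_based_classification are read from the template above and may not be the full set that main.py accepts.

```python
import shlex

MODELS = ("concat", "tirg")
LOSSES = ("soft_triplet", "batch_based_classification")


def training_command(dataset, dataset_path, num_iters, model, loss, lr_decay_frequency):
    """Return the main.py invocation from the template above as an argv list."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    if loss not in LOSSES:
        raise ValueError(f"loss must be one of {LOSSES}, got {loss!r}")
    return [
        "python", "main.py",
        f"--dataset={dataset}",
        f"--dataset_path={dataset_path}",
        f"--num_iters={num_iters}",
        f"--model={model}",
        f"--loss={loss}",
        f"--learning_rate_decay_frequency={lr_decay_frequency}",
        f"--comment={dataset}_{model}",
    ]


# Placeholder values; choose your own iteration count and decay frequency.
cmd = training_command("myclothes", "/path/to/myclothes", 1000, "tirg", "soft_triplet", 500)
print(shlex.join(cmd))
```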

Reference
