Similar to searching for information in the form of text, the task of image retrieval is to find information in the form of images that convey a certain concept a user has in mind. Such a concept can be expressed purely through text, through a similar image, or through a combination of both. A good image retrieval system is useful in various scenarios. For example, a customer shopping for clothes can upload an image resembling what they want, and then describe some adjustments to the uploaded image. The system will find the images that most closely match the clothing they desire.
Input
- An image
- An adjustment text
Output
- A set of retrieved images
The front-end demo app and backend server are built on top of tirg, whose core idea is to form a query representation by composing vectorized image and text features. The image features are gated (filtered) using the text features so that only text-relevant image information is kept, while all of the text features are retained. This model has proven relatively accurate on the CSS3D dataset, but on more realistic datasets such as Fashion200K, which is also used in this project, there is still room for improvement.
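For intuition, here is a minimal PyTorch sketch of this gated composition. The layer sizes, feature dimension, and weighting are illustrative assumptions, not the project's exact architecture:

```python
import torch
import torch.nn as nn

class TirgComposition(nn.Module):
    """Gated residual composition of image and text features.

    A minimal sketch of the tirg idea; dimensions and layers are
    illustrative, not the original project's exact architecture.
    """

    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.Sigmoid())
        self.residual = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim))
        # learned scalars balancing the gated and residual terms
        self.w = nn.Parameter(torch.tensor([1.0, 0.1]))

    def forward(self, img_feat, text_feat):
        x = torch.cat([img_feat, text_feat], dim=-1)
        # the gate keeps only the image information relevant to the text
        gated = self.gate(x) * img_feat
        # the residual injects the modification described by the text
        return self.w[0] * gated + self.w[1] * self.residual(x)
```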
- Code for pre-trained model loading
- Code for dataset loading and preparation
- Code for pretrained model evaluation
- Code for single batch query execution
- Commandline interface for querying image indices
- RESTful API for querying implemented with Flask
- Simple Demo app for image retrieval using React
- Docker image for backend service (Use tag v3)
- Docker image for frontend demo (Use tag v1)
- A set of instructions for training with a custom dataset (not included in the original project)
This project is most easily run through Docker containers. Simply follow the steps listed below.
docker pull timoderbeste/tirg-backend:v3
docker pull timoderbeste/tirg-frontend:v1
Click here to download datamodel.zip. Then unzip it to a preferred directory such as /Users/timowang/Desktop/datamodel/. Lastly, set an environment variable as follows:
export DATAMODEL=/path/to/datamodel
Using the example directory above, the export statement looks as follows:
export DATAMODEL=/Users/timowang/Desktop/datamodel/
First, run the tirg-backend container. Here, we forward container port 80 to host port 80 and also mount the datamodel directory onto the container. Notice that tag v3 is used.
docker run -dit -p 80:80 --name tirg-backend -v $DATAMODEL:/datamodel timoderbeste/tirg-backend:v3 /bin/bash
Then, run the tirg-frontend container. Here, we forward container port 3000 to host port 3000.
docker run -dit -p 3000:3000 --name tirg-frontend timoderbeste/tirg-frontend:v1 /bin/bash
Lastly, execute run_backend.sh and run_frontend.sh in the tirg-backend and tirg-frontend containers, respectively.
docker exec tirg-backend /bin/bash /run_backend.sh
docker exec tirg-frontend /bin/bash /run_frontend.sh
Enter localhost:3000 in your browser's address bar, and you will be shown an interface similar to the one below. You can upload an image of a clothing item and then enter a description (the description field CANNOT be empty). After you hit Compile and Retrieve, a few similar clothing images should be retrieved and displayed.
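The Flask backend can also be queried directly, without the demo frontend. Below is a hypothetical sketch using Python's requests library; the route and field names here are assumptions for illustration, so check the backend code for the actual API:

```python
import requests

# NOTE: '/query', 'image', and 'text' are assumed names for
# illustration; the real route and fields are defined by the
# Flask backend and may differ.
with open('dress.jpg', 'rb') as f:
    resp = requests.post(
        'http://localhost:80/query',
        files={'image': f},
        data={'text': 'change color to red'},
    )
print(resp.json())  # e.g. indices of the retrieved images
```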
Currently, the Fashion200K dataset only contains women's clothing. However, it is also possible to build your own dataset and train a model on it.
No training was done for this project, as a pretrained model for the Fashion200K dataset was available. However, it is certainly possible to construct a new dataset and train a new model on it. A brief set of instructions is given here.
If you only want to train a model using the standard datasets, you could skip this step.
The tirg project is implemented with dataset extension in mind. In particular, the abstract class BaseDataset defined in datasets.py, which can be found here, specifies a set of methods that must be implemented by its derived classes, such as the Fashion200k class.
Those methods that must be implemented are:
- get_all_texts: returns a list of str, where the $i^{th}$ text corresponds to the description of the $i^{th}$ image in imgs.
- __getitem__: returns an example, which is a dict object with the following fields. You could either hand-build examples conforming to the content below, or implement a function to do it automatically. The key question is how to efficiently create the mod text for each example; mod is the text representing the adjustment to be applied to the source image.
  - source_img_id (int)
  - source_img_data (PIL.Image)
  - source_caption (str)
  - target_img_id (int)
  - target_img_data (PIL.Image)
  - target_caption (str)
  - mod (str): you could either hand-build it, or create a function that compares source_caption and target_caption to generate one automatically. You could refer here for an example.
- generate_random_query_target: returns a random example, whose content should be the same as the one listed for __getitem__. The difference is that in __getitem__ the example is determined by its idx, whereas generate_random_query_target can return any example.
- get_img: returns either a raw image, i.e. a PIL.Image object, or a 2D array obtained through a transform function composed with torchvision.transforms.Compose.
You will also likely need to modify the __init__ function so that you can populate two essential list objects, imgs and test_queries. A minimal skeleton combining these pieces is sketched below.
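In the sketch, MyFashionDataset, _pick_target, _generate_mod, and the annotation layout are placeholder names invented for illustration; only BaseDataset and the required methods come from the tirg code base:

```python
import random

from PIL import Image

from datasets import BaseDataset  # datasets.py from the tirg code base


class MyFashionDataset(BaseDataset):
    """Skeleton only; this class and its helpers are placeholders."""

    def __init__(self, path, split='train', transform=None):
        super(MyFashionDataset, self).__init__()
        self.transform = transform
        # the two essential lists mentioned above
        self.imgs = []          # one dict per image, e.g. {'file_path': ..., 'captions': [...]}
        self.test_queries = []  # evaluation queries (source image + mod -> target)
        # ... read your annotation files under `path` and fill both lists ...

    def get_all_texts(self):
        # the i-th text describes the i-th image in self.imgs
        return [img['captions'][0] for img in self.imgs]

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        target_id = self._pick_target(idx)
        return {
            'source_img_id': idx,
            'source_img_data': self.get_img(idx),
            'source_caption': self.imgs[idx]['captions'][0],
            'target_img_id': target_id,
            'target_img_data': self.get_img(target_id),
            'target_caption': self.imgs[target_id]['captions'][0],
            'mod': self._generate_mod(self.imgs[idx]['captions'][0],
                                      self.imgs[target_id]['captions'][0]),
        }

    def generate_random_query_target(self):
        # same content as __getitem__, but for an arbitrary example
        return self[random.randint(0, len(self.imgs) - 1)]

    def get_img(self, img_id, raw_img=False):
        img = Image.open(self.imgs[img_id]['file_path']).convert('RGB')
        if raw_img or self.transform is None:
            return img
        return self.transform(img)

    def _pick_target(self, idx):
        # placeholder pairing; a real dataset should pair images whose
        # captions differ by exactly one attribute
        return (idx + 1) % len(self.imgs)

    def _generate_mod(self, source_caption, target_caption):
        # naive automatic mod text: state which words change
        src, tgt = set(source_caption.split()), set(target_caption.split())
        return 'replace {} with {}'.format(' '.join(src - tgt),
                                           ' '.join(tgt - src))
```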
After you have implemented a dataset class derived from BaseDataset, you need to make sure it can be used in main.py.
The load_dataset function loads the dataset specified by the input arguments of main.py, dataset and dataset_path. This is done in an if-else fashion. Simply follow the existing format and add another elif branch for your dataset, loading the train and test sets with the correct transform functions, and you are all set.
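For example, assuming the placeholder class sketched above, the new branch could look like this. This is a fragment to merge into load_dataset; the surrounding variable names (opt, train_transform, test_transform) may differ in the actual main.py:

```python
# extra branch inside load_dataset() in main.py; 'my_fashion' and
# MyFashionDataset are the placeholder names from the sketch above
elif opt.dataset == 'my_fashion':
    trainset = MyFashionDataset(
        path=opt.dataset_path, split='train', transform=train_transform)
    testset = MyFashionDataset(
        path=opt.dataset_path, split='test', transform=test_transform)
```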
To run training and evaluation on the originally provided datasets, you can simply follow the instructions on the GitHub repo for the original project. To run training and evaluation with your own dataset, the steps are similar.
python main.py \
--dataset=[your-dataset-name] \
--dataset_path=[/path/to/your/dataset] \
--num_iters=[desired-number-of-iterations] \
--model=[concat/tirg] \
--loss=[soft_triplet/batch_based_classification] \
--learning_rate_decay_frequency=[desired-lr-decay-freq] \
--comment=[dataset-name_model-name]
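For instance, with the placeholder dataset name used in the sketches above, a concrete invocation might look like this (all values are examples to adapt to your setup):
python main.py \
--dataset=my_fashion \
--dataset_path=/path/to/my_fashion \
--num_iters=160000 \
--model=tirg \
--loss=batch_based_classification \
--learning_rate_decay_frequency=50000 \
--comment=my_fashion_tirg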