CLIP + GPT transcribe images: GPT proposes how to continue, CLIP decides which proposal to use, continue

nielsrolf/ImageTranscription

Image Transcription using CLIP + GPT

We transcribe an image by alternating between two steps:

  • GPT suggests a number of candidate next words
  • CLIP decides which of the suggested texts best fits the image
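
The alternation above can be sketched as a greedy loop. This is only an illustration under stated assumptions, not the repository's code: `transcribe`, `propose` and `score` are hypothetical names, the seed prompt "This photo shows" is assumed, and the two toy functions stand in for GPT (proposing next words) and CLIP (scoring a text against the image) so that the sketch runs without model weights.

```python
from typing import Callable, List


def transcribe(
    propose: Callable[[str], List[str]],  # stand-in for GPT: text -> next-word proposals
    score: Callable[[str], float],        # stand-in for CLIP: text -> fit to the image
    prompt: str = "This photo shows",     # assumed seed prompt
    steps: int = 5,
) -> str:
    """Greedily extend `prompt` with whichever proposal scores best at each step."""
    text = prompt
    for _ in range(steps):
        candidates = [f"{text} {word}" for word in propose(text)]
        if not candidates:
            break
        # Keep the single best-scoring continuation and carry on from there.
        text = max(candidates, key=score)
    return text


# Toy stand-ins (hypothetical): a fixed vocabulary and a word-overlap score.
def toy_propose(text: str) -> List[str]:
    return ["a", "cat", "dog", "knife"]


IMAGE_WORDS = {"cat", "knife"}  # pretend these words describe the image


def toy_score(text: str) -> float:
    return float(len(set(text.split()) & IMAGE_WORDS))


result = transcribe(toy_propose, toy_score, steps=2)
print(result)
```

In a real setup, `propose` would presumably sample continuations from a GPT model and `score` would compare CLIP's text embedding of each candidate with its embedding of the image.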

Example: This image

produces the following search tree (after prefiltering the 10000 proposals down to 10 at each decision step): search tree

After a few iterations, it comes up with:

This photo shows two bodies of a cat (seen on Flickr, April 2011) carrying a knife
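
The prefiltering step mentioned in the example (keeping 10 of 10000 proposals) could look roughly like the following. This is a sketch under an assumption the text does not confirm: that proposals are ranked by the language model's log-probability before CLIP sees them. The name `prefilter` and the input data are hypothetical.

```python
import heapq
from typing import Dict, List


def prefilter(log_probs: Dict[str, float], k: int = 10) -> List[str]:
    """Return the k proposals with the highest log-probability, best first."""
    return heapq.nlargest(k, log_probs, key=log_probs.get)


# Hypothetical data: 10000 proposals with decreasing log-probabilities.
log_probs = {f"w{i}": -float(i) for i in range(10000)}
kept = prefilter(log_probs, k=10)
print(kept)
```

Only the surviving proposals would then be scored against the image, which keeps the number of CLIP evaluations per decision step small.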
