Should we build a remote interface for UForm? #13
-
Guys, should we build and maintain an API for embedding inference?
Replies: 2 comments
-
Of course! The question should be: how exactly do we build the remote interface?

At this point, all of the checkpoints pre-packaged into UForm are tiny. They are easy to deploy in any embedded setup. But some of the networks we are currently baking may require a custom setup for efficient inference at scale.

```python
from ujrpc.rich_posix import Server
import uform
import numpy
import PIL.Image

server = Server()
model = uform.get_model('unum-cloud/uform-vl-multilingual')

@server
def vectorize(description: str, photo: PIL.Image.Image) -> numpy.ndarray:
    # Preprocess both modalities and return the joint multimodal embedding
    image = model.preprocess_image(photo)
    tokens = model.preprocess_text(description)
    joint_embedding = model.encode_multimodal(image=image, text=tokens)
    return joint_embedding.cpu().detach().numpy()
```

Let's use UJRPC for those Remote Procedure Calls. On DGX-A100 servers with the current models, we can squeeze over 300K inferences per second. FastAPI, and even gRPC, can't sustain that; only UJRPC can. Luckily, there is now support for Pillow images, so the whole deployment script fits in roughly 10 lines of Python.
-
Great, let's do that!