diff --git a/README.md b/README.md
index 6b2ac0ca4026c..622bf24c60027 100644
--- a/README.md
+++ b/README.md
@@ -1,56 +1,39 @@
-Build multimodal AI applications with cloud-native technologies
+Build and deploy multimodal AI services at scale
-
+Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.
-Jina lets you build multimodal [**AI services**](#build-ai-models) and [**pipelines**](#build-a-pipeline) that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production. You can focus on your logic and algorithms, without worrying about the infrastructure complexity.
+## Key Features
-![](./.github/images/build-deploy.png)
-
-Jina provides a smooth Pythonic experience for serving ML models transitioning from local deployment to advanced orchestration frameworks like Docker-Compose, Kubernetes, or Jina AI Cloud. Jina makes advanced solution engineering and cloud-native technologies accessible to every developer.
-
-- Build and serve models for any [data type](https://docs.docarray.org/data_types/first_steps/) and any mainstream [deep learning framework](https://docarray.org/docarray/how_to/multimodal_training_and_serving/).
-- Design high-performance services, with [easy scaling](https://docs.jina.ai/concepts/orchestration/scale-out/), duplex client-server streaming, batching, [dynamic batching](https://docs.jina.ai/concepts/serving/executor/dynamic-batching/), async/non-blocking data processing and any [protocol](https://docs.jina.ai/concepts/serving/gateway/#set-protocol-in-python).
-- Serve [LLM models while streaming their output](https://github.com/jina-ai/jina#streaming-for-llms).
-- Docker container integration via [Executor Hub](https://cloud.jina.ai), OpenTelemetry/Prometheus observability.
-- Streamlined CPU/GPU hosting via [Jina AI Cloud](https://cloud.jina.ai).
-- Deploy to your own cloud or system with our [Kubernetes](https://docs.jina.ai/cloud-nativeness/k8s/) and [Docker Compose](https://docs.jina.ai/cloud-nativeness/docker-compose/) integration.
+- Native support for all major ML frameworks and data types
+- High-performance service design with scaling, streaming, and dynamic batching
+- LLM serving with streaming output
+- Built-in Docker integration and Executor Hub
+- One-click deployment to Jina AI Cloud
+- Enterprise-ready with Kubernetes and Docker Compose support
-| executor.py |
-|---|
+Let's create a gRPC-based AI service using StableLM:

```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
-
from transformers import pipeline
-

class Prompt(BaseDoc):
-    text: str
-
+    text: str

class Generation(BaseDoc):
-    prompt: str
-    text: str
-
+    prompt: str
+    text: str

class StableLM(Executor):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
-        self.generator = pipeline(
-            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
-        )
-
-    @requests
-    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
-        generations = DocList[Generation]()
-        prompts = docs.text
-        llm_outputs = self.generator(prompts)
-        for prompt, output in zip(prompts, llm_outputs):
-            generations.append(Generation(prompt=prompt, text=output))
-        return generations
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        self.generator = pipeline(
+            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
+        )
+
+    @requests
+    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
+        generations = DocList[Generation]()
+        prompts = docs.text
+        llm_outputs = self.generator(prompts)
+        for prompt, output in zip(prompts, llm_outputs):
+            generations.append(Generation(prompt=prompt, text=output))
+        return generations
```
-| Python API: deployment.py | YAML: deployment.yml |
-|---|---|
@@ -145,503 +92,188 @@
+Deploy with Python or YAML:

```python
from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
-    dep.block()
+    dep.block()
```

```yaml
jtype: Deployment
with:
-  uses: StableLM
-  py_modules:
-    - executor.py
-  timeout_ready: -1
-  port: 12345
+  uses: StableLM
+  py_modules:
+    - executor.py
+  timeout_ready: -1
+  port: 12345
```
-And run the YAML Deployment with the CLI: `jina deployment --uses deployment.yml`
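The intro above stresses gRPC, HTTP and WebSocket support. As a minimal sketch (not part of the diff), the same Deployment can be exposed over a different protocol; this assumes the `protocol` argument described in the Jina gateway documentation linked in the removed feature list:

```python
from jina import Deployment
from executor import StableLM

# Hedged sketch: serve the same Executor over HTTP instead of the default gRPC.
# `protocol` and `port` are assumed to behave as documented for the Jina gateway.
dep = Deployment(uses=StableLM, protocol='http', port=12345)

with dep:
    dep.block()
```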
-| text_to_image.py |
-|---|
-
-```python
-import numpy as np
-from jina import Executor, requests
-from docarray import BaseDoc, DocList
-from docarray.documents import ImageDoc
-
-
-class Generation(BaseDoc):
-    prompt: str
-    text: str
-
-
-class TextToImage(Executor):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
-        from diffusers import StableDiffusionPipeline
-        import torch
-
-        self.pipe = StableDiffusionPipeline.from_pretrained(
-            "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
-        ).to("cuda")
-
-    @requests
-    def generate_image(self, docs: DocList[Generation], **kwargs) -> DocList[ImageDoc]:
-        result = DocList[ImageDoc]()
-        images = self.pipe(
-            docs.text
-        ).images  # image here is in [PIL format](https://pillow.readthedocs.io/en/stable/)
-        result.tensor = np.array(images)
-        return result
-```
+```python
+prompt = Prompt(text='suggest an interesting image generation prompt')
+client = Client(port=12345)
+response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
+```
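The added client snippet uses `Client`, `DocList`, `Prompt` and `Generation` without showing their imports. A minimal, self-contained sketch, assuming the schemas live in `executor.py` as defined earlier in the diff:

```python
from jina import Client
from docarray import DocList

from executor import Prompt, Generation  # schemas defined in executor.py above

# Query the StableLM Deployment listening on port 12345.
prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
print(response[0].text)
```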
-| Python API: flow.py | YAML: flow.yml |
-|---|---|
+Chain services into a Flow:

```python
from jina import Flow
-from executor import StableLM
-from text_to_image import TextToImage

flow = (
-    Flow(port=12345)
-    .add(uses=StableLM, timeout_ready=-1)
-    .add(uses=TextToImage, timeout_ready=-1)
+    Flow(port=12345)
+    .add(uses=StableLM)
+    .add(uses=TextToImage)
)

with flow:
-    flow.block()
+    flow.block()
```
-
-```yaml
-jtype: Flow
-with:
-  port: 12345
-executors:
-  - uses: StableLM
-    timeout_ready: -1
-    py_modules:
-      - executor.py
-  - uses: TextToImage
-    timeout_ready: -1
-    py_modules:
-      - text_to_image.py
-```
-
-Then run the YAML Flow with the CLI: `jina flow --uses flow.yml`
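For completeness, a hedged sketch of querying the chained Flow end to end (prompt in, generated image out), assuming the `Prompt` schema from `executor.py` and the `ImageDoc` output produced by `TextToImage`:

```python
from jina import Client
from docarray import DocList
from docarray.documents import ImageDoc

from executor import Prompt  # input schema of the first Executor in the chain

# Send a prompt through StableLM -> TextToImage and collect the image documents.
client = Client(port=12345)
images = client.post(
    '/',
    inputs=[Prompt(text='a rainbow over a mountain lake, oil painting')],
    return_type=DocList[ImageDoc],
)
print(images[0].tensor.shape)  # TextToImage stores generated images as tensors
```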
-| Normal Deployment | Scaled Deployment |
-|---|---|
+Example scaling a Stable Diffusion deployment:

```yaml
jtype: Deployment
with:
-  uses: TextToImage
-  timeout_ready: -1
-  py_modules:
-    - text_to_image.py
+  uses: TextToImage
+  timeout_ready: -1
+  py_modules:
+    - text_to_image.py
+  env:
+    CUDA_VISIBLE_DEVICES: RR
+  replicas: 2
+  uses_dynamic_batching:
+    /default:
+      preferred_batch_size: 10
+      timeout: 200
```
-
-```yaml
-jtype: Deployment
-with:
-  uses: TextToImage
-  timeout_ready: -1
-  py_modules:
-    - text_to_image.py
-  env:
-    CUDA_VISIBLE_DEVICES: RR
-  replicas: 2
-  uses_dynamic_batching: # configure dynamic batching
-    /default:
-      preferred_batch_size: 10
-      timeout: 200
-```
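The same scaled configuration can also be expressed in Python. A rough sketch, assuming `Deployment` accepts `replicas`, `env` and `uses_dynamic_batching` keyword arguments mirroring the YAML keys above:

```python
from jina import Deployment
from text_to_image import TextToImage

# Hedged sketch mirroring the scaled YAML: two replicas, round-robin GPU
# assignment via CUDA_VISIBLE_DEVICES=RR, and dynamic batching on the
# default endpoint.
dep = Deployment(
    uses=TextToImage,
    timeout_ready=-1,
    replicas=2,
    env={'CUDA_VISIBLE_DEVICES': 'RR'},
    uses_dynamic_batching={'/default': {'preferred_batch_size': 10, 'timeout': 200}},
)

with dep:
    dep.block()
```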
-| config.yml | requirements.txt |
-|---|---|
+2. Configure:

```yaml
+# config.yml
jtype: TextToImage
py_modules:
-  - executor.py
+  - executor.py
metas:
-  name: TextToImage
-  description: Text to Image generation Executor based on StableDiffusion
-  url:
-  keywords: []
+  name: TextToImage
+  description: Text to Image generation Executor
```
-
-```requirements.txt
-diffusers
-accelerate
-transformers
-```
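Once configured, the Executor can be referenced by its YAML config locally (or by its Hub identifier after pushing with `jina hub push`). A minimal local sketch, assuming `config.yml` sits next to the Executor code:

```python
from jina import Deployment

# Load the Executor from its YAML config instead of importing the class directly.
dep = Deployment(uses='config.yml', timeout_ready=-1, port=12345)

with dep:
    dep.block()
```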