Commit

add note on chunked prefill
omrishiv committed Sep 20, 2024
1 parent 879047a commit ea2dd0c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/getting_started/neuron-installation.rst
@@ -4,7 +4,7 @@ Installation with Neuron
========================

vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK with continuous batching.
-Paged Attention is currently in development and will be available soon.
+Paged Attention and Chunked Prefill are currently in development and will be available soon.
Data types currently supported in Neuron SDK are FP16 and BF16.

Requirements