
Improved benchmark utils #1679

Merged
merged 1 commit into from
Aug 19, 2024
Conversation

@rasbt rasbt (Collaborator) commented Aug 19, 2024

Improves the benchmark utils, since reporting tokens/sec for new PRs will be important information moving forward.

Speed and resource estimates

Use the .benchmark() method to compare the computational performance of different settings. The .benchmark() method takes the same arguments as the .generate() method. For example, we can estimate the speed and GPU memory consumption as follows (the resulting numbers were obtained on an A10G GPU):

from litgpt.api import LLM
from pprint import pprint

llm = LLM.load(
    model="microsoft/phi-2",
    distribute=None
)

llm.distribute(fixed_kv_cache_size=500)

text, bench_d = llm.benchmark(prompt="What do llamas eat?", top_k=1, stream=True)
print(text)
pprint(bench_d)


# Llamas are herbivores and primarily eat grass, leaves, and shrubs. They have a specialized 
# digestive system that allows them to efficiently extract nutrients from plant material.

# Using 1 device(s)
#  Llamas are herbivores and primarily eat grass, leaves, and shrubs. They have a unique digestive system that allows them to efficiently extract nutrients from tough plant material.

# {'Inference speed in tokens/sec': [17.617540650112936],
#  'Seconds to first token': [0.6533610639999097],
#  'Seconds total': [1.4758019020000575],
#  'Tokens generated': [26],
#  'Total GPU memory allocated in GB': [5.923729408]}
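As a sanity check on these numbers, the reported inference speed appears consistent with the tokens generated divided by the total generation time (a back-of-the-envelope check, not something the API requires you to compute):

```python
# Values taken from the benchmark output above
tokens_generated = 26
seconds_total = 1.4758019020000575

# Tokens per second = tokens generated / total generation time
print(tokens_generated / seconds_total)  # ~17.62 tokens/sec
```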

To get more reliable estimates, it's recommended to repeat the benchmark for multiple iterations, e.g., via num_iterations=10:

text, bench_d = llm.benchmark(num_iterations=10, prompt="What do llamas eat?", top_k=1, stream=True)
print(text)
pprint(bench_d)

# Using 1 device(s)
#  Llamas are herbivores and primarily eat grass, leaves, and shrubs. They have a unique digestive system that allows them to efficiently extract nutrients from tough plant material.

# {'Inference speed in tokens/sec': [17.08638672485105,
#                                    31.79908547222976,
#                                    32.83646959864293,
#                                    32.95994240022436,
#                                    33.01563039816964,
#                                    32.85263413816648,
#                                    32.82712094713627,
#                                    32.69216141907453,
#                                    31.52431714347663,
#                                    32.56752130561681],
#  'Seconds to first token': [0.7278506560005553,
#                             0.022963577999689733,
#                             0.02399449199947412,
#                             0.022921959999621322,
# ...

As one can see, the first iteration may take longer due to warmup. It is therefore recommended to discard the first iteration:

for key in bench_d:
    bench_d[key] = bench_d[key][1:]
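With the warmup iteration removed, summary statistics can be computed directly with the standard library. A minimal sketch (the sample values below are illustrative, not taken from the run above):

```python
from statistics import mean, stdev

# Illustrative benchmark results after dropping the warmup iteration
bench_d = {
    "Inference speed in tokens/sec": [31.80, 32.84, 32.96, 33.02, 32.85],
    "Tokens generated": [26, 26, 26, 26, 26],
}

# Report mean and sample standard deviation per metric
for key, values in bench_d.items():
    print(f"{key}: mean={mean(values):.2f}, std={stdev(values):.2f}")
```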

For better visualization, you can use the benchmark_dict_to_markdown_table function:

from litgpt.api import benchmark_dict_to_markdown_table

print(benchmark_dict_to_markdown_table(bench_d))
| Metric                           | Mean  | Std Dev |
|----------------------------------|-------|---------|
| Seconds total                    | 0.80  | 0.01    |
| Seconds to first token           | 0.02  | 0.00    |
| Tokens generated                 | 26.00 | 0.00    |
| Inference speed in tokens/sec    | 32.56 | 0.50    |
| Total GPU memory allocated in GB | 5.92  | 0.00    |
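Conceptually, the helper reduces each metric's list of per-iteration values to a mean and standard deviation and formats them as Markdown rows. A hypothetical re-implementation to illustrate the idea (this is a sketch under assumed behavior, not litgpt's actual code):

```python
from statistics import mean, stdev


def dict_to_markdown_table(bench_d):
    # Assumes bench_d maps metric names to lists of numeric values,
    # one entry per benchmark iteration.
    lines = ["| Metric | Mean | Std Dev |", "|--------|------|---------|"]
    for key, values in bench_d.items():
        std = stdev(values) if len(values) > 1 else 0.0
        lines.append(f"| {key} | {mean(values):.2f} | {std:.2f} |")
    return "\n".join(lines)


print(dict_to_markdown_table({"Tokens generated": [26, 26, 26]}))
```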

@rasbt rasbt merged commit 7581313 into main Aug 19, 2024
8 of 9 checks passed
@rasbt rasbt deleted the benchmark-utils branch August 19, 2024 18:29