Fix cost_estimate.py #1810

Open · wants to merge 1 commit into main
Conversation

@xksteven (Contributor) commented May 8, 2024

Fixes several bugs to get cost_estimate.py working again. Bugs include:

```
python scripts/cost_estimate.py
...
  File "/data/steven_test/workspace/lm-evaluation-harness/scripts/cost_estimate.py", line 91, in main
    task_dict={taskname: tasks.get_task(taskname)()},
AttributeError: module 'lm_eval.tasks' has no attribute 'get_task'
```

```
Traceback (most recent call last):
...
  File "/data/steven_test/workspace/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
TypeError: simple_evaluate() got an unexpected keyword argument 'lm'
```

```
Traceback (most recent call last):
...
  File "/data/steven_test/workspace/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
TypeError: simple_evaluate() got an unexpected keyword argument 'task_dict'
```

```
  File "/data/steven_test/workspace/lm-evaluation-harness/scripts/cost_estimate.py", line 24, in loglikelihood
    for ctx, cont in requests:
TypeError: cannot unpack non-iterable Instance object
```

Needed to add

```
res.append(-random.random())
```

Otherwise wikitext evaluation fails and crashes the program.
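
For context, the shape of change the last two items point at looks roughly like the sketch below. It assumes the newer `Instance` API where each request exposes its arguments via `.args`; the class name `DryrunLM` follows the existing script, but the bodies here are only illustrative (with a GPT-2 tokenizer as a stand-in), not the exact patch.

```python
import random

import transformers
from lm_eval.api.model import LM


class DryrunLM(LM):
    """Counts tokens instead of calling a real model (illustrative sketch only)."""

    def __init__(self):
        super().__init__()
        self.tokencost = 0
        self.tokenizer = transformers.AutoTokenizer.from_pretrained("gpt2")

    def loglikelihood(self, requests):
        res = []
        for request in requests:
            # Newer lm-eval passes Instance objects, so the (context, continuation)
            # pair comes from request.args instead of unpacking the request itself.
            ctx, cont = request.args
            self.tokencost += len(self.tokenizer.encode(ctx + cont))
            res.append((-random.random(), False))  # dummy (logprob, is_greedy)
        return res

    def loglikelihood_rolling(self, requests):
        res = []
        for request in requests:
            (string,) = request.args
            self.tokencost += len(self.tokenizer.encode(string))
            # Perplexity-style tasks such as wikitext consume these values,
            # so an empty result crashes the run; return a dummy logprob.
            res.append(-random.random())
        return res

    def generate_until(self, requests):
        res = []
        for request in requests:
            ctx, _gen_kwargs = request.args
            self.tokencost += len(self.tokenizer.encode(ctx))
            res.append("")  # no actual generation in a dry run
        return res
```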

@CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.

@sepiatone (Contributor)

@xksteven @haileyschoelkopf @lintangsutawika I wonder whether lm-eval can usefully provide this costing functionality.

  1. Models do not all use the same tokenizer. OpenAI models use the tiktoken library, Anthropic has not yet publicly provided a tokenization library for Claude 3, Cohere uses its own, etc.
  2. Pricing information is frequently updated. To date these have been downward revisions, so that may be less of a problem (unless it stops someone from running an eval :)). Amusingly, the costs the estimator reports for some models are two or three times their final cost at the time of their deprecation.
  3. Even within the same organization, every model has a slightly different cost structure. For example, LangChain has to manage this just for OpenAI.

Perhaps it would be better to treat this functionality as a token count estimator (rather than a token cost estimator); users could then determine their cost from the current pricing of the model they intend to use. Yes, this still runs up against point 1, but as an estimate it may still be useful.
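
The user-side arithmetic for that is small. Here's a rough sketch assuming tiktoken and an OpenAI-style model name; the price figure is a placeholder, not a real rate.

```python
import tiktoken

# Placeholder rate in USD per million input tokens; this figure is made up
# for the example, so look up current pricing before relying on it.
USD_PER_MILLION_INPUT_TOKENS = 5.00


def estimate_cost(prompts, model_name="gpt-4o"):
    """Count prompt tokens with the model's tokenizer and apply a user-supplied rate."""
    enc = tiktoken.encoding_for_model(model_name)
    n_tokens = sum(len(enc.encode(p)) for p in prompts)
    return n_tokens, n_tokens / 1_000_000 * USD_PER_MILLION_INPUT_TOKENS


tokens, usd = estimate_cost(["Question: What is the capital of France?\nAnswer:"])
print(f"{tokens} tokens, about ${usd:.6f} at the assumed rate")
```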

@xksteven (Contributor, Author) commented May 9, 2024

@sepiatone I am not sure I have enough time to make a proper abstraction from this into what you're suggesting.

I wanted the base code to work so that I could then edit the model pricing myself and get an estimate of the cost. After doing that, I made this PR just to get the previous code working again. I'm happy to have someone else take over and turn it into a general cost estimator for running different models.

Also, as a personal preference, I like small PRs as they're easier to review.
As an aside, once the code is working it isn't difficult to swap out the pricing or the tokenizer for whatever custom one you'd like.
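
As a hypothetical illustration of that point (none of the names below are part of the script or this PR): anything with an `encode()` method works as the tokenizer, and the per-token rate is just a number the user supplies.

```python
import transformers


class TokenCounter:
    """Minimal stand-in for the dry-run model: only a tokenizer and a running count."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer  # any object with an encode() method
        self.tokencost = 0

    def add(self, text: str) -> None:
        self.tokencost += len(self.tokenizer.encode(text))


# Swap in whichever tokenizer matches the API you actually plan to call.
counter = TokenCounter(transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b"))
counter.add("Question: What is 2 + 2?\nAnswer:")

usd_per_token = 1e-6  # made-up rate; substitute whatever your provider charges today
print(counter.tokencost, counter.tokencost * usd_per_token)
```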

@sepiatone (Contributor)

> @sepiatone I am not sure I have enough time to make a proper abstraction from this into what you're suggesting.

Perhaps I should've been clearer - I'm not suggesting anything specific with this PR other than what has already been done. What is being done in this PR is required whatever further direction is taken.


Perhaps it's better to take this discussion (having a token count estimator) to a separate issue.

@StellaAthena (Member) commented May 16, 2024

@xksteven @sepiatone At the time we wrote this code, there were just the OpenAI ada/babbage/curie/davinci APIs and computing the costs was quite easy. Now this is a much bigger lift, and I think it would probably make sense to delete it, because anything that works would need to be very bespoke and could be broken by policy changes at companies.

Maybe we can add a warning somewhere that says "this might be really expensive, enter y if you are sure you want to do it" whenever someone runs a sufficiently large eval task.
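
That warning could be as small as the sketch below; the function name, threshold, and wording are made up for illustration, and nothing like it is implied to exist in the harness today.

```python
def confirm_expensive_run(estimated_tokens: int, threshold: int = 10_000_000) -> None:
    """Hypothetical guard: ask before launching an eval above some token budget."""
    if estimated_tokens <= threshold:
        return
    answer = input(
        f"This run is estimated at {estimated_tokens:,} tokens and might be really "
        "expensive. Enter y if you are sure you want to do it: "
    )
    if answer.strip().lower() != "y":
        raise SystemExit("Aborted by user.")
```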
