Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #143: Add data creation CLI #144

Merged
merged 27 commits into from
Jan 4, 2024
Merged

Conversation

rchan26
Copy link
Collaborator

@rchan26 rchan26 commented Jan 2, 2024

For #143:

  • Create reginald_create_index CLI to create an index
  • Create Dockerfile for reginald_create_index
  • Split up data index creation from LlamaIndex to make a DataIndexCreator class
  • Also add a setup_service_context helper function which gets called in both setting up LlamaIndex model and when creating the index on its own
  • Some more logging with the parser
    • Created get_env_var helper function which just logs when it's trying to read a environment variable and says if it's able to find it. If not able to, default arguments are used
    • I made the default arguments in argparses lambda functions so that we only try to get environment variables if they are needed (a bit of a hack because otherwise, it'd log that it's trying to find an environment variable even if we pass in a value for the CLI argument)
    • get_args loops through the arguments and calls the arguments and calls any lambda functions to get the default arguments
  • k, chunk_size, chunk_overlap_ratio, num_output are now able to be passed as CLI arguments in reginald_run, reginald_run_api_llm and reginald_create_index

@rchan26 rchan26 requested a review from rwood-97 January 2, 2024 17:11
@rchan26 rchan26 marked this pull request as draft January 3, 2024 14:51
@rchan26
Copy link
Collaborator Author

rchan26 commented Jan 3, 2024

converting to draft as a few things to fix regarding the types of the default arguments. trying to fix this now

@rchan26 rchan26 changed the base branch from pulumi-gh-actions to main January 3, 2024 17:02
@rchan26 rchan26 marked this pull request as ready for review January 3, 2024 17:28
@rwood-97
Copy link
Contributor

rwood-97 commented Jan 4, 2024

Looks good - I changed the instances of os.getenv to get_env_var which means it prints the "getting SLACK_APP_TOKEN " bit twice (since it does actually get it twice) but I think that is okay?

@rchan26 rchan26 merged commit 308d9f3 into main Jan 4, 2024
1 check passed
@rchan26 rchan26 deleted the data-index-creation-container branch January 4, 2024 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants