Add download files to example Python Large Output Files #503

Open. Wants to merge 5 commits into main.

saimanikant (Collaborator)

Description

Please provide a brief description of the changes in this pull request.

Checklist

  • I have tested these changes locally.
  • I have added unit tests (if appropriate).
  • I have added necessary documentation or updated existing documentation.
  • I have linked the issue(s) addressed by this PR if any.


codecov bot commented Nov 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.45%. Comparing base (b7f1e48) to head (cd35015).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #503   +/-   ##
=======================================
  Coverage   92.45%   92.45%           
=======================================
  Files          64       64           
  Lines        2599     2599           
=======================================
  Hits         2403     2403           
  Misses        196      196           


Comment on lines 23 to 25
"""
Example to query resources from a project.

- Query values from evaluated jobs, computing some simple statistics on parameter values.
- Download files from the project

"""

Contributor:

Please update the docstring; there's no querying of statistics here.

log.info(
    f"=== Example 1: Downloading output files of {num} jobs using ProjectApi.download_file()"
)
for job in jobs[0:num]:

Contributor:

Suggested change
- for job in jobs[0:num]:
+ for job in jobs:
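Assuming `num = len(jobs)`, as in the quoted code, the slice `jobs[0:num]` spans the whole list, so the suggestion does not change which jobs are visited; it only drops a redundant copy. A minimal check with a placeholder list (the job names are stand-ins, not real job objects):

```python
# Placeholder job list; in the example script these would be the job
# objects returned by the project query.
jobs = ["job-1", "job-2", "job-3"]
num = len(jobs)

# With num == len(jobs), the slice covers every element: equal content,
# but a new list object is allocated for no benefit.
sliced = jobs[0:num]
print(sliced == jobs, sliced is jobs)  # True False

# Iterating over `jobs` directly visits exactly the same items.
visited = [job for job in jobs]
print(visited == sliced)  # True
```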

num = len(jobs)

log.info(
    f"=== Example 1: Downloading output files of {num} jobs using ProjectApi.download_file()"

Contributor:

There's no Example 2; please clean up the message.

for f in files:
    fpath = os.path.join(out_path, f"task_{task.id}")
    log.info(f"Download output file {f.evaluation_path} to {fpath}")
    start = time.process_time()

Contributor:

Best to use the same timing function you use two lines later; otherwise, you get meaningless numbers.

Suggested change
- start = time.process_time()
+ start = time.time()
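Why the mix is meaningless: `time.process_time()` returns the CPU time of the current process, while `time.time()` returns wall-clock time, so subtracting one from the other is not an elapsed duration. A download mostly waits on I/O, so wall-clock time is the relevant measure. The sketch below reads the same clock at both ends; it uses `time.perf_counter()` (the standard library's monotonic, high-resolution timer) instead of the `time.time()` in the suggestion, and `fake_download` is a hypothetical stand-in for the real download call.

```python
import time


def fake_download(path: str) -> None:
    """Hypothetical stand-in for the real download call; just sleeps briefly."""
    time.sleep(0.01)


def timed_download(path: str) -> float:
    """Download one file and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()  # same clock for the start...
    fake_download(path)
    return time.perf_counter() - start  # ...and for the end


elapsed = timed_download("output.txt")
print(f"Downloaded in {elapsed:.3f} s")
```

`time.time()` also works if used at both ends, but it can jump if the system clock is adjusted, which `perf_counter()` avoids.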

args = parser.parse_args()

logger = logging.getLogger()
logging.basicConfig(format="%(message)s", level=logging.DEBUG)

Contributor:

Maybe default the log level to INFO to avoid the many DT client debug messages.
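A minimal sketch of defaulting to INFO while still allowing DEBUG on request. The `--verbose` flag and both helper functions are hypothetical additions, not part of the example script; only `argparse` and `logging` from the standard library are used.

```python
import argparse
import logging


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download example (sketch)")
    # Hypothetical flag: opt in to DEBUG output, including the client's messages.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable DEBUG logging")
    return parser.parse_args(argv)


def configure_logging(args: argparse.Namespace) -> int:
    # Default to INFO so DEBUG messages from other libraries stay hidden.
    level = logging.DEBUG if args.verbose else logging.INFO
    logging.basicConfig(format="%(message)s", level=level)
    return level


level = configure_logging(parse_args([]))
print(logging.getLevelName(level))  # INFO
```

Note that `logging.basicConfig()` does nothing if the root logger already has handlers, so it should run once, early in `main`.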


def download_files(client, project_name):
    """Download files."""
    out_path = os.path.join(os.path.dirname(__file__), "downloads")

Contributor:

I'd make the download directory a configurable CLI argument, and even default it to a temporary directory if unset; those downloaded files are of no use.
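A minimal sketch of that suggestion using `argparse`, `os`, and `tempfile` from the standard library. The `--download-dir` option name and the `resolve_download_dir` helper are hypothetical. When the option is omitted, files go to a fresh directory created by `tempfile.mkdtemp()`, which is not deleted automatically.

```python
import argparse
import os
import tempfile


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download output files (sketch)")
    # Hypothetical option; when omitted, a temporary directory is used instead.
    parser.add_argument("--download-dir", default=None,
                        help="where to store downloaded files (default: a new temp dir)")
    return parser.parse_args(argv)


def resolve_download_dir(download_dir=None) -> str:
    """Return an existing directory to download into, creating it if needed."""
    if download_dir is None:
        # mkdtemp() creates a unique, empty directory and returns its path.
        return tempfile.mkdtemp(prefix="downloads_")
    os.makedirs(download_dir, exist_ok=True)
    return download_dir


args = parse_args([])
out_path = resolve_download_dir(args.download_dir)
print(os.path.isdir(out_path))  # True
```

If the files really are throwaway, `tempfile.TemporaryDirectory()` used as a context manager around the download loop would also remove them when the block exits.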

for f in files:
    fpath = os.path.join(out_path, f"task_{task.id}")
    log.info(f"Download output file {f.evaluation_path} to {fpath}")
    start = time.process_time()

Contributor:

The first timing won't be accurate because it will include initialization of the DT client. Is there a way to force that to happen outside of the loop?
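One generic way to keep one-time setup out of the measurements is a warm-up call before the timed loop. The sketch assumes nothing about the DT client's API: `SlowStartClient` is a hypothetical stand-in whose first call pays a simulated initialization cost, mimicking the behavior described here. Whether the real client offers an explicit initialization step is not shown in this PR.

```python
import time


class SlowStartClient:
    """Hypothetical client that initializes lazily on its first download."""

    def __init__(self) -> None:
        self._ready = False

    def download(self, path: str) -> None:
        if not self._ready:
            time.sleep(0.1)  # simulated one-time initialization cost
            self._ready = True
        time.sleep(0.01)  # simulated per-file transfer


def timed_downloads(client, paths, warm_up: bool = True) -> list:
    """Return wall-clock durations in seconds, one per path."""
    if warm_up and paths:
        # Trigger initialization outside the timed loop.
        client.download(paths[0])
    durations = []
    for path in paths:
        start = time.perf_counter()
        client.download(path)
        durations.append(time.perf_counter() - start)
    return durations


files = ["a.txt", "b.txt", "c.txt"]
cold = timed_downloads(SlowStartClient(), files, warm_up=False)
warm = timed_downloads(SlowStartClient(), files, warm_up=True)
print(f"cold first: {cold[0]:.3f} s, warm first: {warm[0]:.3f} s")
```

The warm-up here repeats the first download, which costs an extra transfer; a cheaper client call that forces initialization would be preferable if one exists. Alternatively, keep every duration and report the first one separately.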


2 participants