
Export dataset in CVAT format misses frames in tasks with non-default frame step #8662

Merged Nov 8, 2024 (6 commits)

bsekachev (Member) commented Nov 7, 2024

Motivation and context

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • Bug Fixes

    • Resolved an issue where dataset exports in CVAT format were missing frames when using non-default frame steps.
  • Improvements

    • Made dataset export more efficient by retrieving only the updated_date timestamps (via values_list) instead of full task instances.
    • Made error handling in export cache cleanup (clear_export_cache) more specific by introducing a dedicated exception.
  • Changes

    • Changed the frame iteration logic in iterate_frames so that it no longer skips frames according to the frame step.

Contributor

coderabbitai bot commented Nov 7, 2024


Walkthrough

This update to CVAT (Computer Vision Annotation Tool) fixes dataset exports so that all frames are included when a task uses a non-default frame step. Key modifications include more efficient timestamp retrieval in the export function, more specific error handling in the clear_export_cache function, and a change to the frame iteration logic in the iterate_frames method so that it no longer skips frames by the frame step. A new custom exception class has also been introduced for better error specificity.

Changes

  • changelog.d/20241107_165701_sekachev.bs_fixed_export.md: Documents the fix for missing frames in dataset exports from tasks with a non-default frame step.
  • cvat/apps/dataset_manager/views.py: Modified the export function to retrieve timestamps efficiently using values_list. Updated error handling in clear_export_cache to use a custom FileIsBeingUsedError exception, and added the new exception class.
  • cvat/apps/engine/frame_provider.py: Changed the iterate_frames method to initialize frame_range with itertools.count(start_frame) instead of stepping by the frame step, so iteration visits every integer from start_frame. No changes to error handling or public entities.
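To make the frame_provider.py change concrete, here is a minimal sketch comparing the two ways of building frame_range. It assumes, as a simplification, that start_frame and stop_frame are relative task frame indices and that stop_frame is inclusive; old_relative_indices and new_relative_indices are hypothetical helpers, not CVAT functions.

```python
import itertools


def old_relative_indices(start_frame: int, stop_frame: int, frame_step: int) -> list[int]:
    # Pre-fix behaviour (as modelled here): relative indices advance by the frame step,
    # so with frame_step > 1 some task frames are never visited.
    return list(itertools.takewhile(lambda i: i <= stop_frame,
                                    itertools.count(start_frame, frame_step)))


def new_relative_indices(start_frame: int, stop_frame: int) -> list[int]:
    # Post-fix behaviour (as modelled here): every relative index from start_frame is visited.
    return list(itertools.takewhile(lambda i: i <= stop_frame,
                                    itertools.count(start_frame)))


print(old_relative_indices(0, 5, 2))
print(new_relative_indices(0, 5))
```

For a six-frame task with a frame step of 2, the old expression visits only relative indices 0, 2 and 4, which under these assumptions matches the reported symptom of exports missing frames.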

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ExportFunction
    participant CacheManager
    participant FrameProvider

    User->>ExportFunction: Request dataset export
    ExportFunction->>FrameProvider: Retrieve frames
    FrameProvider->>FrameProvider: Iterate frames without skipping
    FrameProvider-->>ExportFunction: Return frames
    ExportFunction->>CacheManager: Manage export cache
    CacheManager-->>ExportFunction: Handle cache errors
    ExportFunction-->>User: Return exported dataset




Contributor

coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
cvat/apps/dataset_manager/views.py (2)

Line range hint 213-214: Enhance the FileIsBeingUsedError exception.

While introducing a custom exception is good practice, consider making it more informative by:

  1. Adding a docstring explaining when this exception is raised
  2. Including file details in the error message

Consider this enhancement:

 class FileIsBeingUsedError(Exception):
-    pass
+    """Raised when attempting to clear an export cache file that is still in use."""
+    def __init__(self, file_path: str, ttl: timedelta):
+        self.file_path = file_path
+        self.ttl = ttl
+        super().__init__(
+            f"Export cache file '{file_path}' is still in use. "
+            f"Will be available for cleanup after {ttl}"
+        )
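A sketch of how the suggested exception could be raised and handled. clear_cache_file and its "recently used within the TTL" rule are hypothetical; the real clear_export_cache logic in views.py is not shown in this review.

```python
from datetime import datetime, timedelta


class FileIsBeingUsedError(Exception):
    """Raised when attempting to clear an export cache file that is still in use."""

    def __init__(self, file_path: str, ttl: timedelta):
        self.file_path = file_path
        self.ttl = ttl
        super().__init__(
            f"Export cache file '{file_path}' is still in use. "
            f"Will be available for cleanup after {ttl}"
        )


def clear_cache_file(file_path: str, last_used: datetime, ttl: timedelta, now: datetime) -> bool:
    # Hypothetical policy: a file used within the last `ttl` counts as still in use.
    if now - last_used < ttl:
        raise FileIsBeingUsedError(file_path, ttl)
    # A real implementation would delete the file here; the sketch only reports success.
    return True


now = datetime(2024, 11, 7, 12, 0)
try:
    clear_cache_file("/tmp/export.zip", now - timedelta(minutes=5), timedelta(hours=1), now)
except FileIsBeingUsedError as e:
    print(e)
```

Carrying file_path and ttl as attributes lets a caller log or reschedule the cleanup without parsing the message string.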

Line range hint 31-33: Consider addressing the TODO comment regarding RQ scheduler.

The comment indicates a potential race condition where two jobs with the same name could run in parallel during retries. Consider:

  1. Implementing proper retries using RQ's built-in scheduler
  2. Adding a unique suffix to retried job IDs to prevent parallel execution

Would you like me to help design a more robust retry mechanism that prevents this race condition?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 58775bf and 50de904.

📒 Files selected for processing (3)
  • changelog.d/20241107_165701_sekachev.bs_fixed_export.md (1 hunks)
  • cvat/apps/dataset_manager/views.py (1 hunks)
  • cvat/apps/engine/frame_provider.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • changelog.d/20241107_165701_sekachev.bs_fixed_export.md
🔇 Additional comments (2)
cvat/apps/dataset_manager/views.py (1)

125-126: Efficient database query optimization.

The change to use values_list('updated_date', flat=True) instead of fetching full task instances is a good optimization that reduces memory usage and database load.
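At the SQL level, the values_list optimization roughly amounts to selecting one column instead of whole rows: in Django, QuerySet.values_list('updated_date', flat=True) yields the bare values rather than model instances. The sketch below uses the standard-library sqlite3 module and a hypothetical task table (not CVAT's real schema) to illustrate the difference; it does not reproduce CVAT's actual query.

```python
import sqlite3

# Hypothetical table standing in for a task model; only the column names matter here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT, updated_date TEXT)"
)
conn.executemany(
    "INSERT INTO task (project_id, name, updated_date) VALUES (?, ?, ?)",
    [
        (1, "a", "2024-11-01T10:00:00"),
        (1, "b", "2024-11-05T09:30:00"),
        (2, "c", "2024-11-06T08:00:00"),
    ],
)

# Roughly what loading full instances costs: every column of every matching row.
full_rows = conn.execute(
    "SELECT * FROM task WHERE project_id = ? ORDER BY id", (1,)
).fetchall()

# Roughly what values_list('updated_date', flat=True) asks for: one column, flattened.
updated_dates = [
    row[0]
    for row in conn.execute(
        "SELECT updated_date FROM task WHERE project_id = ? ORDER BY id", (1,)
    )
]

print(updated_dates)
```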

Let's verify if this change fully addresses the frame step issue mentioned in the PR objectives:

cvat/apps/engine/frame_provider.py (1)

372-372: LGTM! The change fixes frame skipping during iteration.

The modification ensures that all frames are processed during iteration, addressing the issue of missing frames in exports with non-default frame steps.

Let's verify that this change doesn't affect other functionality that might depend on frame steps:

✅ Verification successful

Frame step is still properly handled in other code paths

The verification shows that frame steps are correctly maintained throughout the codebase:

  • Frame step is used for absolute/relative frame number conversions
  • Frame ranges in segments and jobs respect the step
  • Frame step is properly applied in data import/export paths
  • The change in TaskFrameProvider.iterate_frames() only affects which relative frame indices are visited, while the frame mapping logic remains intact

The removal of frame step from itertools.count() is safe as it only changes how frames are iterated internally within the provider, while all frame number translations still respect the configured step.
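The conversion the verification refers to can be summarized as absolute = start + relative * step. The helpers below are hypothetical free-function versions written under that assumption; CVAT's own get_rel_frame_number() (mentioned later in the review) is a method with its own signature, which is not reproduced here.

```python
def get_abs_frame_number(rel_frame: int, start_frame: int, frame_step: int) -> int:
    # Assumed mapping: relative task index -> absolute frame number in the source data.
    return start_frame + rel_frame * frame_step


def get_rel_frame_number(abs_frame: int, start_frame: int, frame_step: int) -> int:
    # Inverse mapping; only absolute frames that lie on the step grid belong to the task.
    offset = abs_frame - start_frame
    if offset < 0 or offset % frame_step != 0:
        raise ValueError(f"frame {abs_frame} is not part of the task")
    return offset // frame_step


# A task starting at source frame 10 with a frame step of 3.
print([get_abs_frame_number(r, 10, 3) for r in range(5)])
print(get_rel_frame_number(16, 10, 3))
```

Because the step lives only in this mapping, iterating over consecutive relative indices still produces correctly stepped absolute frames.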

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other references to frame steps to ensure they're not affected
rg "get_frame_step" -A 5

Length of output: 16932

@@ -369,7 +369,7 @@ def iterate_frames(
         quality: FrameQuality = FrameQuality.ORIGINAL,
         out_type: FrameOutputType = FrameOutputType.BUFFER,
     ) -> Iterator[DataWithMeta[AnyFrame]]:
-        frame_range = itertools.count(start_frame, self._db_task.data.get_frame_step())
+        frame_range = itertools.count(start_frame)
Contributor

zhiltsov-max commented Nov 7, 2024


Probably, db_segment_frame_set below has to be converted to relative IDs. There is the get_rel_frame_number() method for this.

Member Author


I converted idx to an absolute frame number instead, as is done in the other iterate_frames.
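The point of this exchange is that the segment frame sets store absolute frame numbers while the loop counter idx is relative, so one side must be converted before the membership test. The simplified model below follows the approach described in the reply and converts idx to an absolute number; iterate_task_frames, its parameters, and the list-of-sets segment layout are hypothetical, not CVAT's TaskFrameProvider code.

```python
import itertools
from typing import Iterator


def iterate_task_frames(
    start_frame: int,
    stop_frame: int,
    data_start: int,
    frame_step: int,
    segments: list[set[int]],
) -> Iterator[tuple[int, int, int]]:
    """Yield (relative index, absolute frame, segment number) for every task frame."""
    segment_no = None
    segment_frames: set[int] = set()
    for idx in itertools.count(start_frame):  # no step: visit every relative index
        if idx > stop_frame:
            break
        abs_frame = data_start + idx * frame_step
        # Segment frame sets hold absolute numbers, so compare abs_frame, not idx.
        if abs_frame not in segment_frames:
            match = next(
                ((i, s) for i, s in enumerate(segments) if abs_frame in s), None
            )
            if match is None:
                raise ValueError(f"frame {abs_frame} is not in any segment")
            segment_no, segment_frames = match
        yield idx, abs_frame, segment_no


print(list(iterate_task_frames(0, 4, 10, 3, [{10, 13, 16}, {19, 22}])))
```

With data starting at frame 10 and a step of 3, comparing the raw idx (0, 1, 2, ...) against {10, 13, 16} would never match, which is why the conversion is needed.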

Contributor


OK, please run dev/format_python_code.sh.

Member Author


Why not add it to a git pre-commit hook?

Contributor


There is a problem with determining the right interpreter in a cross-platform manner.

@zhiltsov-max
Contributor

It would be nice to add a test for this to avoid regressions in future.

@bsekachev
Member Author

Yes, it would. However, I do not have enough time. Let's keep the card on the agile board.


sonarqubecloud bot commented Nov 7, 2024

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.25%. Comparing base (58775bf) to head (df81f93).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8662   +/-   ##
========================================
  Coverage    74.24%   74.25%           
========================================
  Files          401      401           
  Lines        43465    43465           
  Branches      3950     3950           
========================================
+ Hits         32270    32273    +3     
+ Misses       11195    11192    -3     
Components    Coverage            Δ
cvat-ui       78.53% <ø>          (+0.01%) ⬆️
cvat-server   70.58% <100.00%>    (ø)

@bsekachev bsekachev merged commit a6fd1e5 into develop Nov 8, 2024
34 checks passed
@bsekachev bsekachev deleted the bs/fixed_export branch November 8, 2024 13:40
@cvat-bot cvat-bot bot mentioned this pull request Nov 11, 2024
3 participants