NIFI-14005 - GetFileResource processor #9519

Open: 2 commits to merge into base: main

pvillard31
Contributor

@pvillard31 pvillard31 commented Nov 14, 2024

Summary

NIFI-14005 - GetFileResource processor

The goal of this new GetFileResource processor is to let a user inject a File Resource as a FlowFile with custom attributes: the processor is configured with a path to a file, and that file's content becomes the FlowFile's content. Combined with the asset feature, this makes it easy to load a test dataset as a FlowFile in a cloud-native (container-based) environment.

Alternatives that have been considered:

  • GenerateFlowFile with a "Custom File/Content" property. However, a processor that can access the local file system must be associated with a specific set of permissions, which would be a significant breaking change for everyone already using the GenerateFlowFile processor in environments where such permissions are enforced.
  • GetFile with some specific improvements. However, in the longer term we would likely want to remove this processor completely and only have ListFile/FetchFile. Besides, its configuration is overly complex for the intended goal here.

Testing:

  • tested with local file reference
  • tested with file referenced via URL
  • tested using Asset feature and Parameter reference

For information:

$ ./bin/cli.sh nifi create-asset -p nifi-cli.properties -pcid dc39db55-1eb2-3d26-9a8a-caaf097cfcc2 -af /Users/pierre/dev/myFile.txt

b099141f-2bc4-3073-8dc0-243dac470f08

$ ./bin/cli.sh nifi add-asset-reference -p nifi-cli.properties -aid b099141f-2bc4-3073-8dc0-243dac470f08 -pcid dc39db55-1eb2-3d26-9a8a-caaf097cfcc2 -pn MyAsset

The parameter #{MyAsset} can then be used in the processor (assuming proper Parameter Context binding on the Process Group).
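
The two commands above can be chained so that the asset ID does not have to be copied by hand. The following is a minimal sketch, not part of the pull request: the `register_asset` function and the `NIFI_CLI` variable are hypothetical names, and it assumes that `create-asset` prints the new asset ID on stdout, as in the output shown above. Only the CLI flags that appear in the commands above are used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Path to the NiFi CLI launcher (hypothetical variable; override as needed).
NIFI_CLI="${NIFI_CLI:-./bin/cli.sh}"

# register_asset PROPS_FILE CONTEXT_ID ASSET_FILE PARAM_NAME
# Uploads ASSET_FILE as an asset, then references it from the parameter
# PARAM_NAME in the Parameter Context CONTEXT_ID. Prints the asset ID.
register_asset() {
  local props="$1" context_id="$2" asset_file="$3" param_name="$4"
  local asset_id

  # Assumption: create-asset writes the asset ID to stdout. Keep the last
  # non-empty line in case blank lines are printed around it.
  asset_id="$("$NIFI_CLI" nifi create-asset -p "$props" -pcid "$context_id" -af "$asset_file" \
    | awk 'NF { last = $0 } END { print last }')"

  if [ -z "$asset_id" ]; then
    echo "create-asset returned no asset ID" >&2
    return 1
  fi

  # Bind the asset to the parameter; discard this command's own output.
  "$NIFI_CLI" nifi add-asset-reference -p "$props" -aid "$asset_id" \
    -pcid "$context_id" -pn "$param_name" >/dev/null

  echo "$asset_id"
}
```

With the values from the example above, the call would be `register_asset nifi-cli.properties dc39db55-1eb2-3d26-9a8a-caaf097cfcc2 /Users/pierre/dev/myFile.txt MyAsset`, after which #{MyAsset} can be referenced in the processor.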

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for putting this together @pvillard31. It looks like a good approach in general; I noted a handful of minor adjustments.

@pvillard31
Contributor Author

Thanks for the review @exceptionfactory - I pushed a commit to address your comments.

@EndzeitBegins
Contributor

Thank you for the addition @pvillard31.
I just glanced over the code, so I might've missed something, but the overall code looked good to me.

I just wondered: is there a particular reason to have a distinct property for adding a mime.type attribute?
From what I've gathered users can add arbitrary attributes to the resulting FlowFile using dynamic properties. There are other Core attributes used quite regularly by other processors, such as filename.

Just trying to understand, is there a reason we explicitly want / need a property for this particular attribute?
If not, could we skip the separate property and simplify the implementation even further?

3 participants