Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResourceCollection should not have a path #8711

Merged
merged 21 commits into from
Nov 1, 2022

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Oct 13, 2022

A ResourceCollection should have a null path and URI. It is by definition a collection of different paths and URIs, so it cannot just pick one of them and return it. This is hiding lots of bugs in the code where resource.getPath() is called and assumes it will always get a non-null value that holds all the required resources. However, when scanning for descriptors, annotations, TLDs etc. the underlying resource is often a collection, thus code that uses getPath is currently only scanning the first resource in the collection.

This PR fixes ResourceCollection, but will almost certainly break some other tests. I think we want to use other PRs to fix those uses of getPath first before considering completing this PR.

Nor name, nor filename unless all resources agree on it.
Nor name, nor filename unless all resources agree on it.
@joakime
Copy link
Contributor

joakime commented Oct 14, 2022

If we go this path, then I think ResourceCollection should also not have ...

  • lastModified()
  • length()
  • getURI()
  • getFileName()
  • getName()

The changes you are doing here also apply to MemoryResource btw

@joakime
Copy link
Contributor

joakime commented Oct 14, 2022

The code overall would be so much easier with a List<> instead of hiding this collection behind an interface that is for a single entry.

@gregw
Copy link
Contributor Author

gregw commented Oct 17, 2022

If we go this path, then I think ResourceCollection should also not have ...

  • lastModified()

lastModified has a sensible and meaningful interpretation for composite directories: the most recently modified time of all the composites

  • length()

Agreed.

  • getURI()

Yep, should return null... I think this PR already does.

  • getFileName()

I'm on the fence about this one.... if for example you have a composite base resource and resolve "WEB-INF", then you may get back a composite that contains WEB-INF directories from multiple resources in the base composite. In that case, since they are all called WEB-INF, I think it is reasonable to return "WEB-INF" for the filename. If they don't agree, then return null. That's how I have implemented it for now.

Flip side, is that if it always returned null, I don't think it would be a big problem... except perhaps for debugging.... maybe when we are naming contexts etc. So I think on balance going with the impl in this PR is OK.

  • getName()

Yep.

The changes you are doing here also apply to MemoryResource btw

Oh I thought we got rid of that class? Is it still used? If not, let's kill it. If we are still using it, then I'll also review.

@gregw
Copy link
Contributor Author

gregw commented Oct 17, 2022

The code overall would be so much easier with a List<> instead of hiding this collection behind an interface that is for a single entry.

Absolutely not!

We have a bunch of methods that can apply to a composite collection of resources that we have to think carefully about how to interpret them against a collection: is it the first; is it a combination; is it null; is it allowed at all? With a ResourceCollection class we can put all those difficult decisions in one place and make them consistent. We can police the creation of the collection to only be non-null directory resources.

If we pass around a List<Resource> instead, then

  • every usage of that list will have to make the difficult choices: is it the first; is it a combination; is it null; is it allowed at all? Inevitably different choices will be made in different places so we will have inconsistency... a file might be visible for annotation scanning but not for TLD scanning!
  • We will have duplicated code iterating over the lists in multiple places... or perhaps we decide to have static utility methods into which we pass the list? In which case we have just re-invented object oriented programming.
  • Since we could not enforce the construction of the List<Resource> to contained only non-null directory resources, then we would need to spread sanity and null checking code everywhere. We'd need to check on every usage of the collection rather than just check on the creation of the collection. So instead of doing those checks once, we could end up doing them hundreds of thousands of times per second on every request.
  • A directory is itself just a collection of other resources and we have many places where a single such collection is simple to deal with (e.g. scanning WEB-INF). Turning that simple abstraction of a collection of files into a collection of collections of files is needlessly complex.
  • The readability of the code will be reduced. Instead of web-inf.lastModified() we will have Instance last = null; for (Resource r : list) { Instant l = r.lastModified(); if (last == null || l.isAfter(last)) last = l; } return last;... repeated.... in many places.... ?!?!?!?

Note that there are valid usages for List<Resource> and we have many examples in the code. We might have multiple descriptors, or multiple jar files or multiple webapps etc. In these cases, the list is not trying to represent a single resource, but is genuinely representing multiple resources that the caller needs to explicitly handle in whatever capacity it is doing. These are not use-cases for ResourceCollection because they are not abstracting a single resource.

@joakime
Copy link
Contributor

joakime commented Oct 17, 2022

If we go this path, then I think ResourceCollection should also not have ...

  • lastModified()

lastModified has a sensible and meaningful interpretation for composite directories: the most recently modified time of all the composites

I'm not sure that's a valid use case, where do you see this being used like this?

  • length()

Agreed.

  • getURI()

Yep, should return null... I think this PR already does.

  • getFileName()

I'm on the fence about this one.... if for example you have a composite base resource and resolve "WEB-INF", then you may get back a composite that contains WEB-INF directories from multiple resources in the base composite. In that case, since they are all called WEB-INF, I think it is reasonable to return "WEB-INF" for the filename. If they don't agree, then return null. That's how I have implemented it for now.

Interesting, I'll have to see if we use this from a Directory based Resource.

Flip side, is that if it always returned null, I don't think it would be a big problem... except perhaps for debugging.... maybe when we are naming contexts etc. So I think on balance going with the impl in this PR is OK.

Most debugging should be using either getName() or getURI() as it's more clear.

  • getName()

Yep.

The changes you are doing here also apply to MemoryResource btw

Oh I thought we got rid of that class? Is it still used? If not, let's kill it. If we are still using it, then I'll also review.

The original proposal was in (the now closed, un-merged) PR #8470

Now, the work Lachlan is doing for HttpContent.Factory can easily get rid of MemoryResource, as it can serve a static HttpContent for jetty-dir.css as a fallback.
Lets wait for that work to complete and it'll be easy to remove MemoryResource.

gregw added 3 commits October 18, 2022 13:21
revert combine and related methods to return Resource and not explicitly a ResourceCollection, as if there is only 1, then a collection is not needed
@@ -112,7 +113,7 @@ public interface Context extends ClassVisibilityChecker
*/
boolean isParentLoaderPriority();

ResourceCollection getExtraClasspath();
Resource getExtraClasspath();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lorban @joakime @sbordet I believe it was a miss-use of ResourceCollection when this was changed from a List<Resource>. We should very VERY rarely ever explicitly use a ResourceCollection, because the whole point of that class is to look like a singe resource. If anything needs to iterate over the contained resources, then it probably should be using a List<Resources> rather than a ResourceCollection.

However, for not, I have only partially reverted to try how it looks if extra classpath is just a Resource, which may or may not be a collection. This requires the addClassPath method below to be somewhat ResourceCollection away, as it needs to extract URLs. I've done this for now with a static stream method.... but not sure I like. Reverting all the way back to List<Resource> may be the best thing to do here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe ResourceCollection should be renamed to MultiResource to emphasise that it is-a Resource and should not be thought of as a Collection (which is an implementation detail).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better name options include:

  • MultiResource
  • CompositeDirectoryResource
  • CombinedResource
  • CombinedDirectoryResource

{
for (Resource resource: resources)
ResourceCollection.stream(resource).forEach(r ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since any Resource might be a collection, if we need URLs, URIs or Paths from a resource, then we should use this stream abstraction to get back the one or more values. The stream can then be used to pick the first, collect into a list or do forEach handling like I am below:

* @see ResourceCollection
*/
static ResourceCollection combine(List<Resource> resources)
static Resource combine(List<Resource> resources)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combine should return a Resource and not a ResourceCollection, since if the passed list has only a single entry, then a collection is not needed. The caller should not need to know if it is a collection or not.

gregw added 6 commits October 18, 2022 17:20
Fixed QuickStart handling to handle collections
cleanup ResourceCollection creation. Avoid sanity checks on resolved resources.
update after merge to head
@gregw gregw requested a review from lachlan-roberts October 24, 2022 10:04
@gregw gregw marked this pull request as ready for review October 24, 2022 10:26
@gregw
Copy link
Contributor Author

gregw commented Oct 24, 2022

@joakime nudge

Copy link
Contributor

@joakime joakime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My not-yet-fully-reviewed review

return null;
}

@Override
public String getFileName()
{
String filename = null;
// return a non-null filename only if all resources agree on the same name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should make getFileName() not be the last segment name in case of Directory? Is there a use case were in we care about, and use getFileName() in a directory case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the fileName to generate a context name. I'm not 100% sure we still use this method for that, but it is a reasonable use-case to ask a directory for the name of it's last segment.

This method exists and I think this is a reasonable interpretation of it for a combined resource.

@gregw gregw requested a review from joakime October 26, 2022 21:58
Copy link
Contributor

@lachlan-roberts lachlan-roberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but there are some test failures.

*/
ResourceCollection(List<Resource> resources)
static Resource combine(List<Resource> resources)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also have a Resource combine(Resource... resources) same as the ResourceFactory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a public API. It is just there to move the implementation into the correct file. So if the ... is not needed, no need to add it.

@gregw gregw requested a review from lachlan-roberts October 27, 2022 07:59
@gregw gregw merged commit 976ab3d into jetty-12.0.x Nov 1, 2022
@gregw gregw deleted the jetty-12.0.x-resourceCollect-no-path branch November 1, 2022 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants