
projection segment merge fixes #17460

Conversation

@clintropolis (Member) commented Nov 7, 2024

changes:

  • fix an issue when merging projections from multiple incremental persists, which assumed that some 'dim conversion' buffers were still open when they had in fact already been closed (by the merging iterator). The fix selectively persists these conversion buffers to temp files in the segment write-out directory, maps them, and ties them to the segment-level closer so that they remain available beyond the lifetime of the parent merger
  • modify 'auto' column serializers to use the segment write-out directory for temp files instead of java.io.tmpdir
  • fix queryable index projections to not add the time-like column as a dimension, instead adding it only as __time

This is an alternative to #17388 that takes a nicer approach: persisting temporary files containing the id-mapping buffers of parent mergers for re-use when merging projections, similar to what 'auto' columns were already doing with their dictionaries.

I made a conservative but somewhat odd change to the DimensionHandler interface: a default implementation of the new makeMerger that calls the now-deprecated existing version of makeMerger. This makes the signature change a bit less disruptive in the event we want to do a 31.1 patch release (I would like to do this to fix another issue with the 31 release; I will start a thread soon). DimensionHandler is not officially an extension point, but it is the mechanism by which custom dimensions can be defined, so I wanted to be careful with this change. After this PR, I will open another one to delete the deprecated method and remove the default implementation. The interface was changed to add a parameter for the path where we are writing out the segment, in case a merger or serializer needs to create any temporary files; prior to this, the task-defined storage location was not directly available to the mergers, so this lets them write such files to a more correct place.
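For reference, a minimal sketch of that compatibility shim, with signatures abridged from the diff below; this illustrates the pattern rather than quoting the merged code verbatim:

  // Inside DimensionHandler (sketch; generics and javadoc elided).
  // The old overload is kept, deprecated, so existing extension impls keep compiling:
  @Deprecated
  DimensionMergerV9 makeMerger(
      String outputName,
      IndexSpec indexSpec,
      SegmentWriteOutMedium segmentWriteOutMedium,
      ColumnCapabilities capabilities,
      ProgressIndicator progress,
      Closer closer
  );

  // The new overload adds the segment write-out directory and, by default,
  // ignores it and falls through to the deprecated overload:
  default DimensionMergerV9 makeMerger(
      String outputName,
      IndexSpec indexSpec,
      SegmentWriteOutMedium segmentWriteOutMedium,
      ColumnCapabilities capabilities,
      ProgressIndicator progress,
      File segmentBaseDir,
      Closer closer
  )
  {
    return makeMerger(outputName, indexSpec, segmentWriteOutMedium, capabilities, progress, closer);
  }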

While I was here, I modified the 'auto' column merger to use this new segment write-out path argument for its temp files instead of java.io.tmpdir.

This PR also fixes an issue with projections on a QueryableIndex so that they no longer add the time-like column as a dimension, instead adding it only as __time, to ensure it doesn't get handled incorrectly.

Comment on lines +154 to +161
return makeMerger(
outputName,
indexSpec,
segmentWriteOutMedium,
capabilities,
progress,
closer
);

Check notice — Code scanning / CodeQL: Deprecated method or constructor invocation. Invoking DimensionHandler.makeMerger should be avoided because it has been deprecated.
@clintropolis mentioned this pull request Nov 12, 2024
/**
* Write a {@link Serializer} to a {@link Path} and then map it as a read-only {@link MappedByteBuffer}
*/
@SuppressFBWarnings("NP_NONNULL_PARAM_VIOLATION")
Contributor commented:

Serializer is in a package where EverythingIsNonnullByDefault applies. It seems the second arg to writeTo can actually be null, though. It would be nice to clean this up.

Please consider one of these things:

  • Add a @Nullable annotation to the second arg, update the javadoc describing what happens when the smoosher is null, and remove these warning suppressions (this one, and the DataFlowIssue one).
  • Or, replace Serializer here with a more specific subclass or sub-interface that already has @Nullable on the second arg; I see that some do.
  • Or, update the javadoc for mapSerializer to say that not all Serializer implementations are supported: the caller must take care to pass one whose writeTo supports a null smoosher.

I'm not sure which approach is best. It depends on whether all Serializer implementations are meant to support a null smoosher or not.
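To illustrate the first option, a sketch of what the annotated interface might look like, assuming Serializer's current two-method shape (getSerializedSize / writeTo); the javadoc wording here is hypothetical:

  public interface Serializer
  {
    long getSerializedSize() throws IOException;

    /**
     * Writes this serializer's data to the channel. When {@code smoosher} is null,
     * the serializer must write everything to the channel directly; implementations
     * that spill secondary "large" (>2GB) files require a non-null smoosher.
     */
    void writeTo(WritableByteChannel channel, @Nullable FileSmoosher smoosher) throws IOException;
  }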

Member Author (@clintropolis) replied:

The answer to this is a bit complicated. Yes, some serializers do not require a FileSmoosher (basically anything that does not support "large" files greater than 2GB does not need a smoosher). Technically none of the current callers need a smoosher, since the new caller is just taking an IntBuffer and persisting/mapping it, and the other callers are using FixedIndexed, which does not support large files yet; but accepting a Serializer here is kind of risky for future misuse.

There are two options here that I like.

One option is that mapSerializer makes its own FileSmoosher/SmooshedWriter, writes the serializer into a temporary smoosh under the same name as the temp file, and then returns the loaded smoosh instead of a mapped byte buffer directly. This is actually what DictionaryIdLookup (where I refactored this code from) is already doing for string dictionaries (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/nested/DictionaryIdLookup.java#L235), and it would let this work with any serializer implementation; presumably the thing we want to read from the mapped file knows how to load other stuff from the smoosher.

The other option worth considering is, as you said, to pass in some other interface instead of Serializer: one that writes with just a WritableByteChannel and a length argument, with the expectation that it only writes to the channel, the result fits in a single buffer, and we then map the byte buffer. This solves the immediate use case, I think, but is somewhat limited as long as we are stuck with ByteBuffer.

I'm leaning towards the 'make a new smoosh' option since it seems the most flexible: it's "make a smoosh file that contains only the contents of this serializer". It does have a little extra overhead for things that fit in a single buffer, but that doesn't seem too bad. Thoughts?

Contributor replied:

IMO, a good approach is to adjust this to pass an actual FileSmoosher, and to return a SmooshedFileMapper rather than a MappedByteBuffer.
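For concreteness, a rough sketch of that direction — writing the serializer as the sole entry of a throwaway smoosh and handing back the mapper — assuming the existing FileSmoosher / SmooshedWriter / SmooshedFileMapper APIs; this is not the merged code:

  static SmooshedFileMapper writeToTempSmoosh(File tempDir, String fileName, Serializer serializer)
      throws IOException
  {
    // the smoosher transparently chunks "large" (>2GB) serializers across containers
    try (FileSmoosher smoosher = new FileSmoosher(tempDir)) {
      try (SmooshedWriter writer = smoosher.addWithSmooshedWriter(fileName, serializer.getSerializedSize())) {
        serializer.writeTo(writer, smoosher);
      }
    }
    // map the finished smoosh read-only; callers then use mapper.mapFile(fileName)
    return SmooshedFileMapper.load(tempDir);
  }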

@@ -101,6 +116,9 @@ public abstract class DictionaryEncodedColumnMerger<T extends Comparable<T>> imp
@Nullable
protected T firstDictionaryValue;

protected File segmentBaseDir;
@MonotonicNonNull
protected PersistedIdConversions persistedIdConversions;
Contributor commented:

Please add Javadoc noting that this is used for projections.

@@ -100,6 +102,25 @@ default MultiValueHandling getMultivalueHandling()
*/
DimensionIndexer<EncodedType, EncodedKeyComponentType, ActualType> makeIndexer(boolean useMaxMemoryEstimates);

/**
* @deprecated use {@link #makeMerger(String, IndexSpec, SegmentWriteOutMedium, ColumnCapabilities, ProgressIndicator, File, Closer)}
Contributor commented:

Please add Javadoc about why this still exists. I think it's to avoid breaking DimensionHandler impls that exist in extensions.

Member Author (@clintropolis) replied:

Yea, I'll add that. I do plan to remove this shortly after this PR is merged; I'm just trying not to be disruptive in the event we do a 31.1 (which I'd like to do, as mentioned in the PR description).

* state which might be required for the projection mergers to do their thing. This method MUST be called prior to
* performing any merge work
*/
default void markAsParent()
Contributor commented:

As to types that may exist in extensions: is this default impl ok? Let's say some type exists in an extension with its own dictionaries, and inherits this default impl. What happens then?

Member Author (@clintropolis) replied:

I found a missing piece needed for this to "just work" with default implementations: this method is fine doing nothing, but attachParent actually needs to be modified to something like

  default void attachParent(DimensionMergerV9 parent, List<IndexableAdapter> projectionAdapters) throws IOException
  {
    // by default fall through to writing merged dictionary
    writeMergedValueDictionary(projectionAdapters);
  }

since it is called as an alternative to writeMergedValueDictionary (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java#L395). Will fix in this PR.

ColumnSerializerUtils.mapSerializer(longDictionaryFile, longDictionaryWriter, fileName)
);
try {
final ByteBuffer buffer = longBufferMapper.mapFile(fileName);
Contributor commented:

All of these seem to be assuming that there will be no secondary smoosh components. Is that a good assumption? Should it be checked? If this is a common pattern, maybe add a mapOnlyFile to SmooshedFileMapper that verifies that the mapped file is the only one?

Member Author (@clintropolis) replied Nov 15, 2024:

I think this is a fine assumption for now; FixedIndexed currently only writes to a single chunk and doesn't support "large" parts that are spread across multiple containers like GenericIndexed does. Once this changes, we would still map the entry-point file container the same way (assuming we are still stuck with ByteBuffer and have to chunk big stuff) and pass the smoosh mapper into FixedIndexed.read, similar to how we do for the string case:

...
        final ByteBuffer stringBuffer = stringBufferMapper.mapFile(fileName);
        stringDictionary = StringEncodingStrategies.getStringDictionarySupplier(
            stringBufferMapper,
            stringBuffer,
            ByteOrder.nativeOrder()
        ).get();
...

I could add some checks in case someone updates FixedIndexedWriter to allow spreading across chunks but forgets to update these read call sites, though I imagine that if that is done, all of the calls to FixedIndexed.read would be updated to take a mapper, if we do it in a similar way to GenericIndexed.
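For illustration, a hypothetical sketch of the suggested guard — a mapOnlyFile helper does not exist yet, and this assumes SmooshedFileMapper exposes its entry names (e.g. via getInternalFilenames):

  static ByteBuffer mapOnlyFile(SmooshedFileMapper mapper, String fileName) throws IOException
  {
    // fail fast if a future FixedIndexedWriter starts spilling secondary chunks
    // that this call site would otherwise silently ignore
    if (!mapper.getInternalFilenames().equals(Collections.singleton(fileName))) {
      throw new ISE("Expected [%s] to be the only smooshed file, found %s", fileName, mapper.getInternalFilenames());
    }
    return mapper.mapFile(fileName);
  }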

@clintropolis merged commit 24a1faf into apache:master Nov 16, 2024 (89 checks passed)
@clintropolis deleted the fix-projection-merge-persistent-idmapping branch Nov 16, 2024
@clintropolis added this to the 31.0.1 milestone Nov 22, 2024
clintropolis added a commit to clintropolis/druid that referenced this pull request Nov 22, 2024

changes:
* fix an issue when merging projections from multiple incremental persists, which assumed that some 'dim conversion' buffers were still open when they had already been closed (by the merging iterator); selectively persist these conversion buffers to temp files in the segment write-out directory, map them, and tie them to the segment-level closer so they remain available beyond the lifetime of the parent merger
* modify 'auto' column serializers to use the segment write-out directory for temp files instead of java.io.tmpdir
* fix queryable index projections to not add the time-like column as a dimension, instead adding it only as __time
* use smoosh for temp files so any Serializer can safely be written to a temp smoosh
clintropolis added a second commit to clintropolis/druid that referenced this pull request Nov 22, 2024, with the same message
clintropolis added a commit that referenced this pull request Nov 22, 2024, with the same message