add High-Level Design to handle dependency.
thetumbled committed Nov 26, 2024
1 parent dd32203 commit e08f5ea
Showing 1 changed file with 39 additions and 25 deletions.
64 changes: 39 additions & 25 deletions pip/pip-393.md
@@ -95,8 +95,26 @@ ledgerMap.put(ledgerId, entrySet);
```
map.put(timestamp, ledgerMap);
```

### Configuration

Add a new configuration `negativeAckPrecisionBitCnt` to control the precision of the redelivery time.
```
@ApiModelProperty(
    name = "negativeAckPrecisionBitCnt",
    value = "The redelivery time precision bit count. The lower bits of the redelivery time will be\n"
        + "trimmed to reduce the memory occupation. The default value is 8, which means the redelivery time\n"
        + "will be bucketed by 256ms. In worst cases, the redelivery time will be 512ms earlier (no later)\n"
        + "than the expected time. If the value is 0, the redelivery time will be accurate to ms."
)
private long negativeAckPrecisionBitCnt = 8;
```
A higher value groups more entries into the same bucket, which reduces memory occupation but makes the redelivery time less accurate.
The default value is 8, which means the redelivery time is bucketed by 256ms. In the worst case, the redelivery time will be 512ms earlier (never later)
than the expected time.
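
The bucketing described above can be illustrated with a minimal sketch. It assumes that "trimming the lower bits" means clearing the lowest `negativeAckPrecisionBitCnt` bits of the millisecond timestamp; the class and method names are hypothetical and are not part of the Pulsar client API.

```java
// Hypothetical helper for illustration only; not part of the Pulsar client.
class RedeliveryTimeBucketing {

    /**
     * Clears the lowest precisionBitCnt bits of the timestamp, which rounds it
     * down to the start of its bucket (so the result is never later than the input).
     */
    static long trimLowerBits(long timestampMs, int precisionBitCnt) {
        if (precisionBitCnt < 0 || precisionBitCnt > 63) {
            throw new IllegalArgumentException("precisionBitCnt must be in [0, 63]");
        }
        return timestampMs & (-1L << precisionBitCnt);
    }

    public static void main(String[] args) {
        long redeliverAtMs = 5_000L;
        System.out.println(trimLowerBits(redeliverAtMs, 0));  // 5000 (accurate to ms)
        System.out.println(trimLowerBits(redeliverAtMs, 8));  // 4864 (256ms buckets)
        System.out.println(trimLowerBits(redeliverAtMs, 10)); // 4096 (1024ms buckets)
    }
}
```

Because the mask only rounds down, every timestamp that falls in the same bucket maps to the same key, which is what allows many entries to share one bucket.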


## Space complexity analysis
### Space complexity of `ConcurrentLongLongPairHashMap`
Before analyzing the new data structure, we first need to know how much space the existing implementation takes.

For each entry we need to store four `long` fields, `(ledgerId, entryId, partitionIndex, timestamp)`, which takes `4*8=32` bytes.
@@ -113,7 +131,7 @@ In the experiment showed in the PR, there are 100w entries in the map, taking up
the space utilization rate is 30/64≈0.47, within the range `[0.15, 0.66]`.
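
As a worked check of these figures, the sketch below assumes that the effective cost per stored entry is the 32-byte payload divided by the space utilization rate. That assumption is a reading of the numbers in this section, not something taken from the `ConcurrentLongLongPairHashMap` code, and the class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical calculator for illustration only.
class HashMapSpaceEstimate {

    // Four long fields per entry: ledgerId, entryId, partitionIndex, timestamp.
    static final int PAYLOAD_BYTES = 4 * Long.BYTES; // 32

    /** Bytes of backing array consumed per stored entry at a given utilization rate. */
    static double bytesPerStoredEntry(double utilization) {
        if (utilization <= 0.0 || utilization > 1.0) {
            throw new IllegalArgumentException("utilization must be in (0, 1]");
        }
        return PAYLOAD_BYTES / utilization;
    }

    public static void main(String[] args) {
        // Upper and lower ends of the utilization range quoted above.
        System.out.println(String.format(Locale.ROOT, "%.1f", bytesPerStoredEntry(0.66)));
        System.out.println(String.format(Locale.ROOT, "%.1f", bytesPerStoredEntry(0.15)));
        // The experiment: 30MB of payload in a 64MB array.
        System.out.println(String.format(Locale.ROOT, "%.1f", bytesPerStoredEntry(30.0 / 64.0)));
    }
}
```

Under this assumption the two ends of the range give roughly 48 and 213 bytes per entry.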


### Space complexity of the new data structure
The space used by the new data structure depends on several factors: the message rate, the time deviation the user accepts,
and the maximum number of entries written in one ledger.
- Pulsar conf `managedLedgerMaxEntriesPerLedger=50000` determines the maximum number of entries that can be written into one ledger,
@@ -145,7 +163,7 @@ each entry takes `0.163byte` by average.
- when 4096<M<50000, `D = ceil(M/50000)*size(bitmap) = 1 * 8KB = 8KB`, each entry takes `8*1024/M=8192/M byte` on average.
- when M<4096, `D = ceil(M/50000)*size(bitmap) = 1 * 4*M byte = 4*M byte`, each entry takes `4 byte` on average.
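
The cases listed above can be combined into one small estimator. This is a sketch under stated assumptions: the class and constant names are hypothetical, the boundary value `M = 4096` is treated as the 8KB bitmap case, and `M >= 50000` is assumed to follow the same `ceil(M/50000)*8KB` formula (which matches the `0.163byte` per-entry figure, since `8192/50000 = 0.16384`).

```java
// Hypothetical estimator for illustration only.
class BucketSpaceEstimate {

    static final long MAX_ENTRIES_PER_LEDGER = 50_000L; // managedLedgerMaxEntriesPerLedger
    static final long BITMAP_BYTES = 8L * 1024L;        // 8KB per bitmap
    static final long SMALL_BUCKET_THRESHOLD = 4_096L;  // below this, 4 bytes per entry
    static final long SMALL_BUCKET_BYTES_PER_ENTRY = 4L;

    /** Estimated bytes D used by one bucket holding m entries. */
    static long estimateBucketBytes(long m) {
        if (m < 0) {
            throw new IllegalArgumentException("m must be non-negative");
        }
        if (m < SMALL_BUCKET_THRESHOLD) {
            return SMALL_BUCKET_BYTES_PER_ENTRY * m;
        }
        // ceil(m / 50000) ledgers, each assumed to need one 8KB bitmap.
        long ledgers = (m + MAX_ENTRIES_PER_LEDGER - 1) / MAX_ENTRIES_PER_LEDGER;
        return ledgers * BITMAP_BYTES;
    }

    /** Average bytes per entry for a bucket of m entries (m must be positive). */
    static double bytesPerEntry(long m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        return (double) estimateBucketBytes(m) / m;
    }

    public static void main(String[] args) {
        for (long m : new long[] {1_000L, 10_000L, 50_000L, 100_000L}) {
            System.out.println(m + " entries: " + estimateBucketBytes(m)
                    + " bytes, " + bytesPerEntry(m) + " bytes per entry");
        }
    }
}
```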

### Conclusion
Assuming N is the number of entries and M is the number of messages in one bucket:
- `ConcurrentLongLongPairHashMap`: `48N` bytes in the best case, `213N` bytes in the worst case.
- New data structure:
@@ -155,42 +173,38 @@ Assuming N is the number of entries, M is the number of messages in one bucket.

Some experimental results are shown in the PR; the configuration can be fine-tuned to get the best performance.

## Effect

### Memory occupation is high
With this design, we can reduce the memory occupation of `NegativeAcksTracker` to less than 1% of that of the current implementation.
The detailed test results will be provided in the PR.

### Code execution efficiency is low
With the new design, we avoid iterating over all entries in `NegativeAcksTracker.nackedMessages` and iterate only over the entries
that need to be redelivered.
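
One way to picture this is a map sorted by bucket timestamp, where only the leading buckets whose time has passed are visited. The sketch below uses `java.util.TreeMap` and string message ids as stand-ins; the PIP itself proposes `fastutil` collections, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch; TreeMap stands in for the sorted map used by the real tracker.
class DueBucketScan {

    /**
     * Removes and returns the contents of every bucket whose timestamp is <= nowMs.
     * Buckets scheduled for later are never visited.
     */
    static List<String> drainDue(TreeMap<Long, List<String>> buckets, long nowMs) {
        List<String> due = new ArrayList<>();
        while (!buckets.isEmpty() && buckets.firstKey() <= nowMs) {
            due.addAll(buckets.pollFirstEntry().getValue());
        }
        return due;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> buckets = new TreeMap<>();
        buckets.put(768L, Arrays.asList("msg-a", "msg-b"));
        buckets.put(1024L, Arrays.asList("msg-c"));
        buckets.put(2048L, Arrays.asList("msg-d"));

        System.out.println(drainDue(buckets, 1024L)); // [msg-a, msg-b, msg-c]
        System.out.println(buckets.keySet());         // [2048]
    }
}
```

The cost of each check is proportional to the number of due buckets rather than to the total number of tracked entries.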

### Redelivery time is not accurate
With the new design, we avoid the fixed interval of the redelivery check time. We can control the precision of the redelivery time
by trimming the lower bits of the timestamp. If the user can accept a 1024ms deviation in the redelivery time, we can trim the lower
10 bits of the timestamp, which groups many entries into the same bucket.

### Multiple negative ack for messages in the same entry(batch) will interfere with each other
With the new design, if msg1 is scheduled for redelivery 10s later and msg2 for redelivery 20s later, the two nacks will not interfere
with each other, because they are stored in different buckets.
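
A minimal sketch of why the two nacks stay separate, assuming a nested `timestamp -> ledgerId -> entry set` layout like the `map.put(timestamp, ledgerMap)` snippet earlier in this document. It uses `java.util` collections instead of the `fastutil` and bitmap types, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; java.util collections stand in for the real data structures.
class NackBucketsSketch {

    // bucket timestamp -> ledgerId -> entryIds
    private final TreeMap<Long, Map<Long, Set<Long>>> buckets = new TreeMap<>();
    private final int precisionBitCnt;

    NackBucketsSketch(int precisionBitCnt) {
        if (precisionBitCnt < 0 || precisionBitCnt > 63) {
            throw new IllegalArgumentException("precisionBitCnt must be in [0, 63]");
        }
        this.precisionBitCnt = precisionBitCnt;
    }

    /** Records that (ledgerId, entryId) should be redelivered at redeliverAtMs. */
    void add(long ledgerId, long entryId, long redeliverAtMs) {
        long bucket = redeliverAtMs & (-1L << precisionBitCnt);
        buckets.computeIfAbsent(bucket, k -> new HashMap<>())
               .computeIfAbsent(ledgerId, k -> new HashSet<>())
               .add(entryId);
    }

    Set<Long> bucketKeys() {
        return buckets.keySet();
    }

    public static void main(String[] args) {
        NackBucketsSketch tracker = new NackBucketsSketch(8);
        long nowMs = 0L;
        // Two messages of the same batch entry, nacked with different delays.
        tracker.add(1L, 7L, nowMs + 10_000L);
        tracker.add(1L, 7L, nowMs + 20_000L);
        System.out.println(tracker.bucketKeys()); // [9984, 19968]
    }
}
```

Since the redelivery time is part of the key, the second nack adds a new bucket instead of overwriting the first one.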


## High-Level Design
This PIP introduces a new dependency, `fastutil`, into the client. It is very large (23MB) while only a few of its classes are used,
so we need to reduce the size of the dependency.

There is an alternative dependency, `fastutil-core`, which is smaller (6MB), but it is still
relatively large, and using `fastutil-core` would introduce another problem on the broker side, since the broker already has the `fastutil` jar,
which also includes the `fastutil-core` classes.

The optimal solution would be to include in the shaded pulsar-client and pulsar-client-all only the fastutil classes
that are really used and needed. This could be achieved in many ways. One possible solution is to introduce an intermediate
module for shaded pulsar-client and pulsar-client-all that isn't published to Maven Central at all.
It would be used to minimize fastutil and include only the classes that are required by pulsar-client shading.


# Backward & Forward Compatibility