[feature][pulsar-io-mongo] Add support for full message synchronization #16003

Merged
merged 16 commits into from
Sep 20, 2022

@shink shink commented Jun 10, 2022

Motivation

Currently, the MongoDB source connector supports only incremental message synchronization.
This PR adds support for full message synchronization.

Since MongoDB 4.0, the starting point of a change stream can be set with the startAtOperationTime field.
Setting it to 0 makes the change stream start from the earliest point.
See https://www.mongodb.com/docs/v4.2/reference/method/db.collection.watch/ for more information.
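
For readers unfamiliar with the value being passed: a BSON timestamp is a 64-bit value whose high 32 bits hold seconds since the Unix epoch and whose low 32 bits hold an ordinal increment, so 0 decodes to second 0, increment 0. The sketch below illustrates that packing in plain Java. The class BsonTimestampSketch and its helper methods are hypothetical and only mimic the layout; they are not the driver's BsonTimestamp class.

```java
// Illustrative only: mimics how a BSON timestamp packs two 32-bit fields into a long.
// BsonTimestampSketch and its methods are hypothetical, not part of the MongoDB driver.
final class BsonTimestampSketch {

    // High 32 bits: seconds since the Unix epoch. Low 32 bits: ordinal increment.
    static long pack(int seconds, int increment) {
        return ((long) seconds << 32) | (increment & 0xFFFFFFFFL);
    }

    static int seconds(long value) {
        return (int) (value >> 32);
    }

    static int increment(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        // The raw value used for full sync in this PR: new BsonTimestamp(0L)
        long earliest = 0L;
        System.out.println("seconds=" + seconds(earliest) + ", increment=" + increment(earliest));
        // prints "seconds=0, increment=0"

        // An arbitrary later operation time, for comparison
        long sample = pack(1654855320, 1);
        System.out.println("seconds=" + seconds(sample) + ", increment=" + increment(sample));
        // prints "seconds=1654855320, increment=1"
    }
}
```

Any operation time recorded by the server compares greater than 0, which is why 0 is used here to mean "from the earliest point".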

Modifications

  1. Improve the config objects.
    The sink and source configurations share some settings and differ in others.
    I created an abstract class, MongoAbstractConfig, that holds the shared settings.
    MongoSourceConfig and MongoSinkConfig hold the settings unique to each connector.

  2. Add support for full message synchronization in the source connector.

if (mongoSourceConfig.getSyncType() == SyncType.FULL_SYNC) {
    // sync currently existing messages
    // startAtOperationTime is the starting point for the change stream
    // setting startAtOperationTime to 0 means the start point is the earliest
    // see https://www.mongodb.com/docs/v4.2/reference/method/db.collection.watch/ for more information
    stream.startAtOperationTime(new BsonTimestamp(0L));
}
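
As a rough illustration of modification 1, the split could look like the sketch below. Only the names MongoAbstractConfig, MongoSourceConfig, MongoSinkConfig, SyncType.FULL_SYNC, getSyncType() and getBatchSize() come from this PR. The wrapper class MongoConfigSketch, the mongoUri and batchTimeMs fields, the INCR_SYNC constant, the placement of batchSize in the base class, the default to incremental sync, and the startsFromEarliest helper are all assumptions made for the example.

```java
// Hypothetical sketch of the config split; not the code merged in this PR.
final class MongoConfigSketch {

    // FULL_SYNC appears in the PR; INCR_SYNC is an assumed name for the existing behavior.
    enum SyncType { FULL_SYNC, INCR_SYNC }

    // Settings assumed to be shared by the source and the sink.
    abstract static class MongoAbstractConfig {
        private final String mongoUri;
        private final int batchSize;

        protected MongoAbstractConfig(String mongoUri, int batchSize) {
            if (mongoUri == null || mongoUri.isEmpty()) {
                throw new IllegalArgumentException("mongoUri is required");
            }
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.mongoUri = mongoUri;
            this.batchSize = batchSize;
        }

        String getMongoUri() { return mongoUri; }
        int getBatchSize() { return batchSize; }
    }

    // Source-only setting: how the change stream should start.
    static final class MongoSourceConfig extends MongoAbstractConfig {
        private final SyncType syncType;

        MongoSourceConfig(String mongoUri, int batchSize, SyncType syncType) {
            super(mongoUri, batchSize);
            // Assumption: fall back to incremental sync when nothing is configured.
            this.syncType = (syncType == null) ? SyncType.INCR_SYNC : syncType;
        }

        SyncType getSyncType() { return syncType; }
    }

    // Sink-only setting (assumed for illustration).
    static final class MongoSinkConfig extends MongoAbstractConfig {
        private final long batchTimeMs;

        MongoSinkConfig(String mongoUri, int batchSize, long batchTimeMs) {
            super(mongoUri, batchSize);
            this.batchTimeMs = batchTimeMs;
        }

        long getBatchTimeMs() { return batchTimeMs; }
    }

    // Mirrors the check in the source connector: only FULL_SYNC starts from the earliest point.
    static boolean startsFromEarliest(MongoSourceConfig config) {
        return SyncType.FULL_SYNC.equals(config.getSyncType());
    }

    public static void main(String[] args) {
        MongoSourceConfig full = new MongoSourceConfig("mongodb://localhost:27017", 100, SyncType.FULL_SYNC);
        MongoSourceConfig byDefault = new MongoSourceConfig("mongodb://localhost:27017", 100, null);
        System.out.println("full sync starts from earliest: " + startsFromEarliest(full));
        System.out.println("default starts from earliest: " + startsFromEarliest(byDefault));
    }
}
```

Keeping syncType out of the base class means the sink never sees a setting that has no meaning for it.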

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

No.

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

Comment on lines 113 to 120
-        stream.batchSize(mongoConfig.getBatchSize()).fullDocument(FullDocument.UPDATE_LOOKUP);
+        stream.batchSize(mongoSourceConfig.getBatchSize())
+                .fullDocument(FullDocument.UPDATE_LOOKUP);
+
+        if (SyncType.FULL_SYNC.equals(mongoSourceConfig.getSyncType())) {
+            // sync currently existing messages
+            // startAtOperationTime is the starting point for the change stream
+            // setting startAtOperationTime to 0 means the start point is the earliest
+            // see https://www.mongodb.com/docs/v4.2/reference/method/db.collection.watch/ for more information
+            stream.startAtOperationTime(new BsonTimestamp(0L));
+        }
Member Author


Here is the key.

@shink
Member Author

shink commented Jun 10, 2022

I will finish the unit tests later. Converting this PR to a draft for now.

@github-actions bot added the doc-required label (Your PR changes impact docs and you will update later.) Jun 10, 2022
@shink shink marked this pull request as draft June 10, 2022 10:22
@shink shink marked this pull request as ready for review June 12, 2022 10:58
@shink
Member Author

shink commented Jun 12, 2022

@315157973 PTAL 🎉

@315157973
Contributor

Please add some docs and make sure that the change passes the CI checks.

@shink
Member Author

shink commented Jun 21, 2022

@315157973 OK. Where should I add the docs?

@github-actions

github-actions bot commented Aug 4, 2022

The PR has had no activity for 30 days; marking it with the Stale label.

@codelipenghui
Contributor

/pulsarbot run-failure-checks

@shink
Member Author

shink commented Aug 10, 2022

@codelipenghui Thank you very much! But why are there three failed checks after the rerun, when there were four before? Also, I didn't make any changes to the broker.

@github-actions github-actions bot removed the Stale label Aug 11, 2022
@codelipenghui codelipenghui added this to the 2.12.0 milestone Aug 13, 2022
@codelipenghui added the type/feature (The PR added a new feature or issue requested a new feature) and area/connector labels Aug 13, 2022
@shink
Member Author

shink commented Aug 21, 2022

@codelipenghui Hi, it seems that all the failed checks are flaky tests. What should I do? Thank you!

@eolivelli
Contributor

/pulsar-bot rerun-failure-checks

@poorbarcode
Contributor

/pulsarbot rerun-failure-checks

@shink
Member Author

shink commented Sep 17, 2022

/pulsarbot rerun-failure-checks

@poorbarcode
Contributor

Hi @shink

https://github.com/apache/pulsar/actions/runs/3071925540/jobs/4963093961

This is a flaky test. It should pass if it is rerun a few times.

https://github.com/apache/pulsar/actions/runs/3071925517/jobs/4963032830

I read the log and found that the instability was caused by Mockito. It should pass if it is rerun a few times.

[Screenshot 2022-09-17 12 51 34]

Many tests in this group (Pulsar CI Flaky) are unstable, but this does not block the merge. These problems are being fixed.

@shink
Member Author

shink commented Sep 17, 2022

@poorbarcode Thank you so much for your help!

@shink shink marked this pull request as draft September 17, 2022 05:43
@shink
Member Author

shink commented Sep 18, 2022

/pulsarbot rerun-failure-checks

@shink
Member Author

shink commented Sep 18, 2022

/pulsarbot rerun-failure-checks

@shink shink marked this pull request as ready for review September 18, 2022 07:49
@codelipenghui codelipenghui merged commit cda2ea7 into apache:master Sep 20, 2022
@momo-jun
Contributor

@shink It seems you've already included docs in this PR, so I will update the doc-required label to doc. Thank you for adding the docs!

@momo-jun added the doc label (Your PR contains doc changes, no matter whether the changes are in markdown or code files.) and removed the doc-required label (Your PR changes impact docs and you will update later.) Sep 23, 2022