Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queues snapshot #1387

Merged
merged 21 commits into from
Sep 2, 2021
Merged

queues snapshot #1387

merged 21 commits into from
Sep 2, 2021

Conversation

nassiharel
Copy link
Contributor

@nassiharel nassiharel commented Aug 17, 2021

This change is Reviewable

@nassiharel
Copy link
Contributor Author

/deploy

@nassiharel nassiharel requested a review from yehiyam August 17, 2021 13:17
@hkube-ci hkube-ci temporarily deployed to dev August 17, 2021 13:18 Inactive
Copy link
Contributor

@yehiyam yehiyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 22 files at r1, 2 of 2 files at r3.
Reviewable status: 20 of 24 files reviewed, 7 unresolved discussions (waiting on @nassiharel and @yehiyam)


core/algorithm-operator/lib/deployments/algorithm-queue.js, line 60 at r3 (raw file):

    spec = applyJaeger(spec, CONTAINERS.ALGORITHM_QUEUE, options);
    spec = applyNodeSelector(spec, clusterOptions);
    spec = applyStorage(spec, options.defaultStorage, CONTAINERS.PIPELINE_DRIVER, 'algorithm-operator-configmap');

pipeline-driver? shouldn't it be algorithm-queue?


core/algorithm-queue/processed.txt, line 1 at r2 (raw file):

Statistical profiling result from isolate-0x6184770-19866-v8.log, (15921 ticks, 10408 unaccounted, 0 excluded).

no need to commit this


core/algorithm-queue/config/main/config.base.js, line 69 at r2 (raw file):

    [heuristicsNames.ENTRANCE_TIME]: 0.2,
    [heuristicsNames.BATCH]: 0.2,
    // [heuristicsNames.CURRENT_BATCH_PLACE]: 0.1

why did you change that?


core/algorithm-queue/lib/queue.js, line 183 at r2 (raw file):

    _orderQueue() {
        this.queue = orderBy(this.queue, j => j.calculated.score, 'desc');

the calculated was never used?


core/algorithm-queue/lib/queue.js, line 211 at r2 (raw file):

        }
        const pendingAmount = await this._producer.getWaitingCount();
        this.enrichmentRunner(this.queue);

not needed?


core/algorithm-queue/lib/queue-runner.js, line 34 at r2 (raw file):

    _taskRemoved(task) {
        aggregationMetricFactory.getMetric(metricsName.TIME_IN_QUEUE)(task, metricsTypes.HISTOGRAM_OPERATION.end);

did you. move the prometheus metrics someplace else?


core/resource-manager/lib/store/algorithms-flat/store.js, line 9 at r2 (raw file):

    async setData(data) {
        this._log(data);
        // do we want to limit our scoring array?

I think we can do it because the task executor will cut it anyway

@yehiyam yehiyam requested review from yehiyam and removed request for yehiyam August 17, 2021 14:08
Copy link
Contributor Author

@nassiharel nassiharel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 25 files reviewed, 7 unresolved discussions (waiting on @yehiyam)


core/algorithm-operator/lib/deployments/algorithm-queue.js, line 60 at r3 (raw file):

Previously, yehiyam wrote…

pipeline-driver? shouldn't it be algorithm-queue?

Done.


core/algorithm-queue/config/main/config.base.js, line 69 at r2 (raw file):

Previously, yehiyam wrote…

why did you change that?

Done.


core/algorithm-queue/lib/queue.js, line 183 at r2 (raw file):

Previously, yehiyam wrote…

the calculated was never used?

Done.


core/algorithm-queue/lib/queue.js, line 211 at r2 (raw file):

Previously, yehiyam wrote…

not needed?

Done.


core/algorithm-queue/lib/queue-runner.js, line 34 at r2 (raw file):

Previously, yehiyam wrote…

did you. move the prometheus metrics someplace else?

Done.


core/algorithm-queue/processed.txt, line 1 at r2 (raw file):

Previously, yehiyam wrote…

no need to commit this

Done.


core/resource-manager/lib/store/algorithms-flat/store.js, line 9 at r2 (raw file):

Previously, yehiyam wrote…

I think we can do it because the task executor will cut it anyway

Done.

@nassiharel
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev August 18, 2021 12:27 Inactive
@nassiharel
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev August 19, 2021 08:56 Inactive
@nassiharel nassiharel changed the title algorithm-queue snapshot queues snapshot Aug 22, 2021
@nassiharel
Copy link
Contributor Author

/deploy

Copy link
Contributor

@yehiyam yehiyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 22 files at r1, 1 of 1 files at r4, 13 of 18 files at r5, 3 of 10 files at r7, 25 of 27 files at r8, 8 of 8 files at r9, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @nassiharel)


core/algorithm-queue/lib/persistency/snapshot.js, line 10 at r9 (raw file):

            const start = now();
            onStart({ key, length: data.length });
            await storageManager.hkubePersistency.put({ type: TYPE, name: key, data });

you need to add volume and volumeMount to the algorithm-queue (in code) and pipeline-driver-queue (in helm)

Copy link
Contributor Author

@nassiharel nassiharel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @yehiyam)


core/algorithm-queue/lib/persistency/snapshot.js, line 10 at r9 (raw file):

Previously, yehiyam wrote…

you need to add volume and volumeMount to the algorithm-queue (in code) and pipeline-driver-queue (in helm)

Already did :)

@nassiharel nassiharel requested a review from yehiyam September 1, 2021 07:25
Copy link
Contributor

@yehiyam yehiyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @nassiharel)

@yehiyam yehiyam merged commit 383394d into master Sep 2, 2021
@yehiyam yehiyam deleted the alg-queue-snapshot branch September 2, 2021 06:10
hkube-ci pushed a commit that referenced this pull request Sep 2, 2021
* fix: redis persist

* feat: improve snapshot

* feat: improve snapshot

* feat: improve snapshot

* feat: improve snapshot

* Merge branch 'master' into alg-queue-snapshot

* feat: improve snapshot

* Merge branch 'alg-queue-snapshot' of github.com:kube-HPC/hkube into alg-queue-snapshot

* feat: add storage envs

* feat: PR changes

* feat: PR changes

* feat: improve persistency

* feat: queues snapshots

* feat: queues snapshot

* feat: queues snapshot

* feat: queues snapshot

* feat: queues snapshot

* Merge branch 'master' into alg-queue-snapshot

* Merge branch 'master' into alg-queue-snapshot

* Merge branch 'master' into alg-queue-snapshot

* Merge branch 'master' into alg-queue-snapshot .... bump version [skip ci]
hkube-ci pushed a commit that referenced this pull request Sep 2, 2021
* fix: redis persist

* feat: improve snapshot

* feat: improve snapshot

* feat: improve snapshot

* feat: improve snapshot

* Merge branch 'master' into alg-queue-snapshot

* feat: improve snapshot

* Merge branch 'alg-queue-snapshot' of github.com:kube-HPC/hkube into alg-queue-snapshot

* feat: add storage envs

* feat: PR changes

* feat: PR changes

* feat: improve persistency

* feat: queues snapshots

* feat: queues snapshot

* feat: queues snapshot

* feat: queues snapshot

* feat: queues snapshot

* Merge branch 'master' into alg-queue-snapshot

* Merge branch 'master' into alg-queue-snapshot

* Merge branch 'master' into alg-queue-snapshot

* Merge branch 'master' into alg-queue-snapshot .... bump version [skip ci]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

possible inconsistency upon algorithm queue crashes
3 participants