Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove-empty-parents and convert-bad-dob-format migrations only apply to first 25 results #9004

Open
jkuester opened this issue Apr 11, 2024 · 4 comments
Labels
Type: Bug Fix something that isn't working as intended

Comments

@jkuester
Copy link
Contributor

jkuester commented Apr 11, 2024

Describe the bug
I am pretty sure that both the convert-bad-dob-format and the remove-emtpy-parents migrations have never fully worked since the migration will only be executed for the first 25 results of their find query.

Under the hood, the Pouch.find function just passes the request through to Couch when you are running with the http adapter (aka when the server is just using Pouch to connect to a remote Couch instance).

However, the Couch Docs are clear that when no explicit limit value is provided for the _find call, it will default the limit to 25.

(To make things even more confusing, the Pouch Docs do not mention any default value for the optional limit parameter. This is probably because, when running with the local adapter, Pouch does not seem to have any default limit on how many results it will return from find. I have logged a separate Pouch issue: pouchdb/pouchdb#8927 about this inconsistency.)

Also, for the extra sharp folks out there who will note that we did not use Pouch for these migrations back when they were originally written, just know that the _find call in the initial implementation also did not include a limit.... (Maybe the behavior of Couch changed in the last 8 years and _find used to be unlimited by default, but that does not really affect my conclusion below.)

To Reproduce
You can demonstrate the issue by adding this test case to remove-empty-parents.spec.js integration test file:

  it.only('should migrate all queried records', async () => {
    const docsWithNoParents = Array.from({ length: 30 }, (_, i) => ({
      _id: `abc${i}`,
      type: 'district_hospital',
      name: 'myfacility',
      parent: {},
    }));

    await utils.initDb(docsWithNoParents);

    await utils.runMigration('remove-empty-parents');

    const { rows } = await db.get('medic-test').allDocs({ include_docs: true });
    const docsWithNoParent = rows
      .map(({ doc }) => doc)
      .filter(({ type }) => type === 'district_hospital')
      .filter(({ parent }) => !parent);

    expect(docsWithNoParent.length).to.equal(30);
  });

All 30 of the contacts should be migrated, but if you run the test it will fail because the length is only 25.

Expected behavior

Honestly, at this point I am not sure what value there is in fixing these really old migrations. How many instances do we have running with large amounts of data from < 2016? If the non-migrated data has not caused a problem until now, do we need to try to fix it?

My main concern with this code is that it seems to be the only place in our code-base currently that is using the Pouch.find function and if anyone is referencing this code (e.g. like #8928), they are going to miss the fact that they need a limit. Seems like we should either just delete these migrations or maybe add a comment in this code pointing out they are bugged.

@jkuester jkuester added the Type: Bug Fix something that isn't working as intended label Apr 11, 2024
@dianabarsan
Copy link
Member

Great find @jkuester .

I believe you are right, there's no value to urgently fix this old migration code.
I think we should find some way to document this. PouchDb.find requires additional plugins (https://pouchdb.com/guides/mango-queries.html), might not behave as expected, and I believe find is not even implemented in PouchDb indexeddb adapter (or it's severely low performance). Maybe we should just openly discourage using find for now.

@jkuester
Copy link
Contributor Author

jkuester commented Apr 15, 2024

So, I ended up going down the rabbit-hole last week digging through this stuff trying to figure out where things sat for Mango queries. In the CHT code, using Mango queries seemed to kinda get blocked up by switching to IDBNext. But the status of "IDBNext" within Pouch seems a lot more complicated... @dianabarsan, I expect you and/or Gareth probably have the inside scoop on this, but for anyone else reading this later, it seems like the IDBNext stuff was "done" (and I guess became pouchdb-adapter-indexeddb???). Stefan even got pouchdb-find working with it, but then it seems like no progress was made after 2019. The old pouchdb-adapter-idb is still what is getting used by pouchdb-browser (which is the package that cht-core/webapp depends on). From reading the old threads, it seems there was lingering performance concerns about IDBNext. The most recent development is that towards the end of 2022, Alex Anderson did some informal perf tests.

All this is to say, I agree that more testing/verification is needed before I would be comfortable using Mango queries in the webapp (at least some clarity on whats up with "IDBNext").

This is probably a subject for a different thread, but the Pouch Docs are pretty optimistic about Mango queries:

Finally, it's important to understand that Mango queries are much easier to use than map/reduce queries, and they can usually satisfy 99% of use cases. The point of map/reduce is to provide an extremely advanced API for building secondary indexes, suitable for those with specific querying needs.

That statement does contextualize a lot of the developer frustration I have seen (and personally experienced) around Couch views. I do wonder if it would be possible to lower development/maintenance costs by using Mango indexes/queries without incurring too much of a performance cost...

@garethbowen
Copy link
Member

The performance improvements on idbnext has just wrapped up and it's starting to go through the release process. There is no way to migrate data from idb -> idbnext so it won't be a drop in replacement for us but my plan is to get to a point where new users use idbnext and old ones just work. Once it's proven in production we can talk about how to do the migration. The first step is to get pouch released, then run our own performance tests over it to ensure idbnext performs better for our workload, and then go from there.

Regarding mango queries, my understanding is they're designed to improve the developer experience, but there's some additional landmines, for example, the query may work without an index which is fine in dev and terrible in prod. This is especially tricky if we support multiple idb adapters.

@dianabarsan
Copy link
Member

I added an issue to just remove these migrations: #9639 along with all other migrations that have been added before 3.x.
I think these should only end up running on an existent data set if the migration_log doc was tampered with somehow (any cause).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Fix something that isn't working as intended
Projects
None yet
Development

No branches or pull requests

3 participants