Use Mango to reduce the number of indexes needed client side #5592

Open
SCdF opened this issue Apr 17, 2019 · 4 comments
Labels
Priority: 2 - Medium Normal priority Type: Performance Make something faster

Comments

@SCdF
Contributor

SCdF commented Apr 17, 2019

We currently query for data records in PouchDB / CouchDB via mapreduce queries.

Each map-reduce defined query creates its own index that needs to be kept up to date as data changes.

Mango splits the creation of the index from the use of it in a query. So, if we can convert many of our mapreduce queries to Mango, we can reduce how many indexes need to be generated.
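To illustrate the split between building an index and querying it, here is a minimal sketch, assuming a PouchDB database with the pouchdb-find plugin loaded. The helper name `findByTypes` and the idea of querying several `type` values are hypothetical and not taken from the codebase.

```javascript
// Hypothetical helper: `db` is assumed to be a PouchDB instance with pouchdb-find loaded.
async function findByTypes(db, types) {
  // A single Mango index on `type`. createIndex is safe to call repeatedly;
  // it reports 'exists' if the index is already there.
  await db.createIndex({ index: { fields: ['type'] } });

  const docsByType = {};
  for (const type of types) {
    // Every query in this loop can be served by the same index.
    const { docs } = await db.find({ selector: { type } });
    docsByType[type] = docs;
  }
  return docsByType;
}
```

With mapreduce, each of these lookups would typically be its own view with its own index to keep up to date.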

An example of looking at this for startup is here: #5264 (along with some other changes)

We need to be careful and measure performance in both offline and online situations: query speed, individual index generation time, and network costs in the online scenario. All of that has to be balanced against the fact that Mango allows for fewer indexes. It's complicated!

One core difference is that Mango queries are always equivalent to include_docs: true in relation to how CouchDB pulls data off disk into memory. One example of how this affects things is that a mapreduce view which doesn't include_docs will query much faster than the equivalent Mango query, when running in CouchDB (PouchDB should be equivalent).
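A rough sketch of the two query shapes being compared, assuming a `db` handle that supports both `query` and `find`. The helper names are hypothetical.

```javascript
// Hypothetical helpers; `db` is assumed to be a PouchDB (or CouchDB-backed) database handle.

// MapReduce: without include_docs, each row carries only id, key and value.
async function idsFromView(db, viewName) {
  const result = await db.query(viewName);
  return result.rows.map(row => row.id);
}

// Mango: `fields` trims what is returned to the caller, but per the point
// above the documents are still loaded in order to answer the query.
async function idsFromMango(db, selector) {
  const result = await db.find({ selector, fields: ['_id'] });
  return result.docs.map(doc => doc._id);
}
```

Both helpers return the same list of ids, so they are a reasonable pair to time against each other when measuring this difference.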

The core goal is to reduce the count of indexes generated locally, while making sure we do not accidentally degrade performance elsewhere.

@SCdF SCdF added Type: Performance Make something faster Priority: 2 - Medium Normal priority labels Apr 17, 2019
@SCdF SCdF mentioned this issue Apr 17, 2019
@kennsippell kennsippell self-assigned this Jul 2, 2019
@kennsippell
Member

kennsippell commented Aug 7, 2019

The impact of converting a set of indexes to Mango was evaluated for feasibility and performance. In the first round of evaluations, the six MapReduce views which are warmed during bootstrap were evaluated.

Feasibility

| MapReduce View | Feasibility |
| --- | --- |
| medic-user/read | We use this query with a reducer. Converting this to Mango has blocking bandwidth impacts for online users. This would be a good index to lazy load (#5859). |
| medic-client/contacts_by_type | The sorting logic in this index cannot be reproduced in Mango. Users would need to fetch all documents and perform an in-memory sort. This has blocking bandwidth impacts for online users and poor (unmeasured) characteristics for offline users. |
| medic-client/data_records_by_type | We use this query with a reducer. Converting this to Mango has blocking bandwidth impacts for online users. This would be a good index to lazy load (#5859). |
| medic-client/reports_by_validity | I believe it is a bug that we build this index during startup (#5866). |
| medic-client/forms | Feasible |
| medic-client/docs_by_id_lineage | This MapReduce view allows for selecting a document based on the content of another document. This is not possible via Mango queries. |
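The bandwidth concern for the views used with a reducer comes from Mango having no reduce step: the client has to fetch the matching documents and aggregate them itself. A minimal sketch of that client-side aggregation, assuming a `_count`-style reducer (the evaluation does not say which reducer these views use) and a hypothetical helper name:

```javascript
// Hypothetical helper: counts documents per key on the client,
// standing in for what a `_count` reduce would return from the server.
function countByKey(docs, keyOf) {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```

The `docs` argument would be the full result of a Mango `find`, which is exactly the payload an online user would have to download before any counting can happen.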

Performance Metrics for medic-client/forms

Execution Times - Measured via 100x tight loop

| Index | Device | MapReduce (ms) | Mango (ms) | Delta |
| --- | --- | --- | --- | --- |
| 'type' field only | Tecno F1 | 12,403 | 14,158 | +1,755 (+14.1%) |
| 'type' field only | Desktop | 872 | 944 | +72 (+8.3%) |
| 'type' + '_attachments.xml' | Desktop | 872 | 959 | +83 (+9.5%) |

Scripts:

// MapReduce: run the view query 100 times in sequence and log the total elapsed time (ms).
(() => {
  const start = performance.now();
  let chain = Promise.resolve();
  for (let i = 0; i < 100; i++) {
    chain = chain.then(() => PouchDB('medic-user-ac1').query('medic-client/forms', { include_docs: true }));
  }
  chain.then(() => console.log('MapReduce Execution Time', performance.now() - start));
})();

// Mango: the same loop, using find() with the equivalent selector.
(() => {
  const start = performance.now();
  let chain = Promise.resolve();
  for (let i = 0; i < 100; i++) {
    chain = chain.then(() => PouchDB('medic-user-ac1').find({
      selector: { type: 'form', '_attachments.xml': { $exists: true } },
    }));
  }
  chain.then(() => console.log('Mango Execution Time', performance.now() - start));
})();
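The two loops above share the same timing pattern, which could be factored into one helper. This is only a sketch; `timeSequential` is a hypothetical name and was not part of the original measurements.

```javascript
// Hypothetical helper: runs `task` sequentially `iterations` times and
// returns the total and mean elapsed time in milliseconds.
async function timeSequential(task, iterations = 100) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    await task(i);
  }
  const totalMs = performance.now() - start;
  return { totalMs, meanMs: totalMs / iterations };
}
```

For example, `timeSequential(() => PouchDB('medic-user-ac1').query('medic-client/forms', { include_docs: true }))` would reproduce the MapReduce measurement and also report a per-query mean.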

Build PouchDB Index - Based on sample of 3 measures

| Device | MapReduce (ms) | Mango (ms) | Delta |
| --- | --- | --- | --- |
| Tecno F1 | 13,895 | 18,022 | +4,127 (+29.7%) |
| Desktop | 1,124 | 2,096 | +972 (+86%) |

Scripts:

// Mango: time index creation, then delete the index so the measurement can be repeated.
(() => {
  const start = performance.now();
  PouchDB('medic-user-ac1').createIndex({ index: { fields: ['type'] } })
    .then(idx => {
      console.log('Index', idx, performance.now() - start);
      return PouchDB('medic-user-ac1').deleteIndex({ ddoc: idx.id, name: idx.name });
    })
    .then(console.log);
})();

// MapReduce: querying with limit: 0 builds the view index without returning rows.
// Afterwards, delete the view's IndexedDB database so the next run rebuilds it.
(() => {
  const start = performance.now();
  PouchDB('medic-user-ac1').query('medic-client/forms', { limit: 0 }).then(() => {
    console.log('MapReduce', performance.now() - start);
    window.indexedDB.deleteDatabase('_pouch_medic-user-ac1-mrview-bc4e9efc3baf76a2da15c82a700c0908');
  });
})();

Other Metrics

| Metric | MapReduce | Mango | Delta |
| --- | --- | --- | --- |
| Index disk use PouchDB | 712 | 324 | -388 (-54%) |
| Index heap use PouchDB | 0 | 0 | 0 |
| Bandwidth for online users CouchDB | 11469 | 10058 | -1411 (-12%) |
| Inbox.js script size with pouchdb-find | 2,962,350 bytes | 3,013,512 bytes | +51.1 kB (+1.7%) |

Measure IndexedDB Disk Use (Chrome 70 only): await window.navigator.storage.estimate()
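One way to turn that estimate into a per-index figure is to sample it before and after building the index. This is a sketch under assumptions: `storageUsedBy` is a hypothetical helper, and `storage` is expected to behave like `navigator.storage`. It is not necessarily how the numbers above were gathered.

```javascript
// Hypothetical helper: `storage` is assumed to expose estimate() like navigator.storage.
// Returns the change in reported usage (bytes) caused by running `action`.
async function storageUsedBy(storage, action) {
  const before = await storage.estimate();
  await action();
  const after = await storage.estimate();
  return after.usage - before.usage;
}
```

In a browser console this could be called as `storageUsedBy(navigator.storage, () => PouchDB('medic-user-ac1').createIndex({ index: { fields: ['type'] } }))`. The reported usage is an estimate, so small differences should be treated with caution.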

Bandwidth Scripts:

curl http://admin:pass@localhost:5984/medic/_design/medic-client/_view/forms?include_docs=true -w '%{size_download}' 
curl http://admin:[email protected]:5984/medic/_find -X POST -H 'Content-Type: application/json' --data '{"selector": {"type": "form", "_attachments.xml":{"$exists":true}}}' -w '%{size_download}'

Conclusions

Based on these findings, Mango is not particularly well suited to help with the WebApp's bootstrapping. It is likely better suited for use outside of the WebApp's performance hot paths. Some potentially fruitful options are to use it in API/Sentinel or, within WebApp, to investigate the filtered search indexes and less hot code paths like editing user settings (doc_by_type).

To help scope down 3.7, and because IDBNext is believed likely to impact the performance characteristics of Mango and indexing, it was recommended that this investigation be continued after the IDBNext work.

@kennsippell kennsippell removed their assignment Aug 12, 2019
@SCdF
Contributor Author

SCdF commented Aug 12, 2019

Chatted to Kenn. Things to try in the future:

  • Slow index building is really weird, so we should try it again with another query (doc_by_type maybe) and see if this is an aberration or not
  • Repeat tests with idbnext

@garethbowen
Member

Deferring to 3.9.0

@garethbowen
Member

Let's wait for IDBNext to land and then run this again.
