Improve performance with better algorithms and caching of expensive computed values #1594
Conversation
I have some old wallets, but this is a different league! :)
Did benchmarking with a three-year-old wallet; here are my results. Before:
After:
I haven't tried the
One thing to be aware of is that you won't see the full benefit of the cache until after you've run a command that rewrites your wallet on disk. For my testing, I actually added a dummy
There are a bunch of references to removed
Otherwise, I think the concept of the first three commits could also be split into a separate PR for easier review and testing. P.S. See also the two review comments above; I added type hints, as that's what we now try to do with new code or code we touch, and in this case they are simple ones.
Look more carefully. I didn't remove either of those functions from the wallet class. I only removed them from
Ohh, right, I forgot that there are two of them. They are called by the UTXO manager tests. It seems simple to rewrite those few tests with the new method. https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/master/test/jmclient/test_utxomanager.py
Ahh, okay. Typically you wouldn't want to have unit tests asserting private implementation details; you would test for correct functionality of the public interfaces, allowing for the underlying implementation to be changed out freely.
Yes,
Their implementations were identical to those in the superclass.
    utxo_d = []
    for k, v in disabled.items():
        utxo_d.append(k)

    {'frozen': True if u in utxo_d else False}

The above was inefficient. Replace with:

    {'frozen': u in disabled}

Checking for existence of a key in a dict takes O(1) time, whereas checking for existence of an element in a list takes O(n) time.
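The difference can be illustrated with a small self-contained sketch (the variable names `disabled`, `utxo_d`, and the `'frozen'` flag follow the commit message above; the outpoint values are made up for the example):

```python
# `disabled` maps disabled-UTXO outpoints to metadata; `utxos` is what we
# are rendering. Building an intermediate list and testing membership on it
# is O(n) per lookup; testing membership on the dict itself is O(1) average.
disabled = {("txid1", 0): None, ("txid2", 1): None}
utxos = [("txid1", 0), ("txid3", 2)]

# Before: copy the keys into a list, then scan the list for every UTXO.
utxo_d = []
for k, v in disabled.items():
    utxo_d.append(k)
before = [{"frozen": True if u in utxo_d else False} for u in utxos]

# After: ask the dict directly via its hash table.
after = [{"frozen": u in disabled} for u in utxos]

assert before == after == [{"frozen": True}, {"frozen": False}]
```

Both versions produce identical output; only the lookup cost changes, which matters once a wallet has thousands of UTXOs.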
Sometimes calling code is only interested in the balance or UTXOs at a single mixdepth. In these cases, it is wasteful to get the balance or UTXOs at all mixdepths, only to throw away the returned information about all but the single mixdepth of interest. Implement new methods in BaseWallet to get the balance or UTXOs at a single mixdepth. Also, correct an apparent oversight due to misplaced indentation: the maxheight parameter of get_balance_by_mixdepth was ignored unless the include_disabled parameter was passed as False. It appears that the intention was for include_disabled and maxheight to be independent filters on the returned information.
Rather than evaluating wallet_service.get_utxos_by_mixdepth()[md], instead evaluate wallet_service.get_utxos_at_mixdepth(md). This way we're not computing a bunch of data that we'll immediately discard.
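The shape of that change can be sketched with a toy model (ToyWallet and its fixed data are hypothetical stand-ins; the real methods live on the wallet/wallet-service classes in jmclient):

```python
class ToyWallet:
    """Toy stand-in showing why an at-mixdepth getter beats by-mixdepth[md]."""

    def __init__(self, utxos_by_md):
        self._utxos = utxos_by_md  # {mixdepth: {outpoint: info}}
        self.mixdepth = max(utxos_by_md) if utxos_by_md else 0

    def get_utxos_at_mixdepth(self, md):
        # Touches only the single mixdepth the caller asked about.
        return dict(self._utxos.get(md, {}))

    def get_utxos_by_mixdepth(self):
        # Computes data for every mixdepth, even if the caller wants one.
        return {md: self.get_utxos_at_mixdepth(md)
                for md in range(self.mixdepth + 1)}

w = ToyWallet({0: {"a": 1}, 1: {"b": 2}})
# Same answer, but the first form computes (and discards) mixdepth 0 too:
assert w.get_utxos_by_mixdepth()[1] == w.get_utxos_at_mixdepth(1)
```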
The algorithm in get_imported_privkey_branch was O(m*n): for each imported path, it was iterating over the entire set of UTXOs. Rewrite the algorithm to make one pass over the set of UTXOs up front to compute the balance of each script (O(m)) and then, separately, one pass over the set of imported paths to pluck out the balance for each path (O(n)).
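A minimal sketch of the two-pass rewrite, under the assumption that UTXOs carry a script and a value and that each imported path maps to a script (the function name and data shapes here are illustrative, not the actual jmclient signatures):

```python
from collections import defaultdict

def balances_for_imported_paths(utxos, imported):
    """O(m+n) sketch: one pass over UTXOs, one pass over imported paths.

    utxos: iterable of (script, value) pairs (m items)
    imported: mapping of path -> script (n items)
    """
    balance_by_script = defaultdict(int)
    for script, value in utxos:                      # O(m) pass up front
        balance_by_script[script] += value
    return {path: balance_by_script.get(script, 0)   # O(n) pluck per path
            for path, script in imported.items()}

# The O(m*n) version would rescan every UTXO for every imported path.
utxos = [("s1", 10), ("s2", 5), ("s1", 3)]
imported = {"p1": "s1", "p2": "s3"}
assert balances_for_imported_paths(utxos, imported) == {"p1": 13, "p2": 0}
```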
Hoist _populate_script_map from BIP32Wallet into BaseWallet, rename it to _populate_maps, and have it populate the new _addr_map in addition to the existing _script_map. Have the constructor of each concrete wallet subclass pass to _populate_maps the paths it contributes. Additionally, do not implement yield_known_paths by iterating over _script_map, but rather have each wallet subclass contribute its own paths to the generator returned by yield_known_paths.
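The "each subclass contributes its own paths" pattern for yield_known_paths can be sketched like this (class names follow the commit message, but the bodies are toy stand-ins, with strings in place of real path tuples):

```python
import itertools

class BaseWallet:
    def yield_known_paths(self):
        # Base contributes nothing; subclasses chain their own paths on.
        return iter(())

class BIP32Wallet(BaseWallet):
    def __init__(self):
        self._bip32_paths = ["m/0/0", "m/0/1"]  # stand-ins for real paths

    def yield_known_paths(self):
        return itertools.chain(super().yield_known_paths(),
                               self._bip32_paths)

class ImportWalletMixin:
    def __init__(self):
        super().__init__()
        self._imported_paths = ["imported/0"]

    def yield_known_paths(self):
        return itertools.chain(super().yield_known_paths(),
                               self._imported_paths)

class ImportedBIP32Wallet(ImportWalletMixin, BIP32Wallet):
    pass

w = ImportedBIP32Wallet()
assert list(w.yield_known_paths()) == ["m/0/0", "m/0/1", "imported/0"]
```

This avoids treating _script_map as the source of truth for known paths, which matters once paths can exist that have not yet had a script derived.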
Deriving private keys from BIP32 paths, public keys from private keys, scripts from public keys, and addresses from scripts are some of the most CPU-intensive tasks the wallet performs. Once the wallet inevitably accumulates thousands of used paths, startup times become painful due to needing to re-derive these data items for every used path in the wallet upon every startup. Introduce a persistent cache to avoid the need to re-derive these items every time the wallet is opened. Introduce _get_keypair_from_path and _get_pubkey_from_path methods to allow cached public keys to be used rather than always deriving them on the fly. Change many code paths that were calling CPU-intensive methods of BTCEngine so that instead they call _get_key_from_path, _get_keypair_from_path, _get_pubkey_from_path, get_script_from_path, and/or get_address_from_path, all of which can take advantage of the new cache.
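The derive-on-miss caching idea can be reduced to a few lines (CachingWallet and its string "derivations" are hypothetical; the real cache holds actual keys/scripts/addresses and is persisted inside the encrypted wallet file):

```python
class CachingWallet:
    """Sketch of a path -> (priv, pub, script, addr) derive-on-miss cache."""

    def __init__(self):
        self._cache = {}
        self.derivations = 0  # instrumentation for the example only

    def _derive(self, path):
        # Stand-in for the expensive BIP32/EC/script/address derivations.
        self.derivations += 1
        priv = f"priv({path})"
        pub = f"pub({priv})"
        script = f"script({pub})"
        addr = f"addr({script})"
        return (priv, pub, script, addr)

    def get_address_from_path(self, path):
        entry = self._cache.get(path)
        if entry is None:
            entry = self._cache[path] = self._derive(path)
        return entry[3]

w = CachingWallet()
w.get_address_from_path("m/0/0")
w.get_address_from_path("m/0/0")  # cache hit: no second derivation
assert w.derivations == 1
```

With thousands of used paths, skipping re-derivation on every startup is where the bulk of the reported speedup comes from.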
src/jmclient/wallet.py
Outdated
    @@ -2116,7 +2110,7 @@ def _get_supported_address_types(cls):

        def get_script_from_path(self, path):
            if not self._is_my_bip32_path(path):
                raise WalletError("unable to get script for unknown key path")
            return super().get_script_from_path(path)
As far as I understand it, I do like this logic better.
Starting from a current hierarchy of (ImportWallet), (BIP32Wallet), (BaseWallet), you're making it so that we always try to get the script as part of the BIP32 hierarchy for this wallet (we always have that BIP32 structure); if we fail, we fall back on a generic "type discovery" routine (get_key_from_path), which then gives us a 'cryptoengine' (never liked that name but whatever) that will be able to output a script for a key. This process can of course fail, but we just see the failure occur in a different place than before, and more importantly, now if we have new Mixins (or superclasses) and support new script types/engines it should naturally slot in, I think.
Side note: I have always strongly disliked the idea of importing keys in the wallet (even since literally the first few weeks of the project), so I have paid very little attention to the import functions, never using them myself. I did have to spend many painful hours figuring out details about private key formats back in the day though, to ensure nothing went catastrophically wrong. All this to say: I would really much prefer to remove any such functionality, but I know that people have used it quite a bit, so I'm rather torn on the subject.
I think you have the right understanding of it. The overarching intention is to have the four principal getters (_get_keypair_from_path, _get_pubkey_from_path, get_script_from_path, and get_address_from_path) implemented by default (in BaseWallet) in terms of the fundamental getter _get_key_from_path, whose default implementation raises NotImplementedError. Then each subclass or mix-in overrides _get_key_from_path with an implementation that handles only its own recognized form of paths and defers all other paths to the superclass. All of the convenience getters (get_addr, get_script, addr_to_path, script_to_path, addr_to_script, script_to_addr, pubkey_to_script, and get_key_from_addr) are defined (only in BaseWallet) in terms of the four principal getters and the fundamental getter.
_get_key_from_path
- Default implementation in BaseWallet raises NotImplementedError. The expectation is that all wallet subclasses and mixins will override it.
- ImportWalletMixin overrides it to return imported private keys only for imported paths and defers all other paths to the superclass.
- BIP32Wallet overrides it to return BIP32-derived private keys only for BIP32 paths and raises WalletError for all other paths. It could alternatively defer all other paths to the superclass, but since BIP32Wallet is not a mixin, we know that the superclass would just raise NotImplementedError anyway, so we raise a more meaningful error instead.
- FidelityBondMixin overrides it to return (privkey, locktime) tuples only for timelocked paths and defers all other paths to the superclass.
- FidelityBondWatchonlyWallet overrides it to raise WalletError since watch-only wallets can't provide private keys.
_get_keypair_from_path
- Default implementation in BaseWallet calls _get_key_from_path to get the engine and private key for the path and then passes the private key to engine.privkey_to_pubkey to derive the public key for the path.
- BIP32Wallet overrides it really for no good reason, since the default implementation would work, but I put it in there for parallelism with FidelityBondMixin.
- FidelityBondMixin overrides it to return (pubkey, timelock) tuples only for timelocked paths and defers all other paths to the superclass.
- FidelityBondWatchonlyWallet overrides it to raise WalletError since watch-only wallets can't provide private keys.
_get_pubkey_from_path
- Default implementation in BaseWallet calls _get_keypair_from_path and returns only the public key and engine.
- FidelityBondWatchonlyWallet overrides it to return (pubkey, timelock) tuples only for timelocked paths or plain public keys (derived from the master public key) for all other BIP32 paths and defers all other paths to the superclass.
get_script_from_path
- Default implementation in BaseWallet calls _get_pubkey_from_path to get the engine and public key for the path and then passes the public key to engine.pubkey_to_script to derive the script for the path.
- BIP32Wallet overrides it to increment the index cache and populate the script-to-path and address-to-path maps as needed before deferring to the superclass.
get_address_from_path
- Default implementation in BaseWallet calls _get_pubkey_from_path to get the engine for the path and get_script_from_path to get the script for the path and then passes the script to engine.script_to_address to derive the address for the path.
- BIP32Wallet overrides it to increment the index cache and populate the script-to-path and address-to-path maps as needed before deferring to the superclass.
Good summary, very helpful, thanks, on the connections between these calls, but: _get_pubkey_from_path doesn't exist though? Either before or after this patch? Or I'm being very dumb here :) There is only _get_key_from_path, which is used for get_script_from_path and get_address_from_path. I could certainly see why it might be useful though!

Edit: Oh, I see, this is a change happening in a later commit.
Reviewed b58ac67, fc1e000, 184d76f. b58ac67 - see my comment, agreed. fc1e000 and 184d76f seem like very good changes. The first improves the performance of show_utxos for large wallets, and I take note of that detail on list vs dict; I will remember it in future! For the second, I was careful to audit (at least, by eye) that none of the details of the return formats of utxos and balances have changed. I did not review (nor run, yet) any of the tests.
64f18bc - it's in fact a simple change to an, in hindsight, pretty extreme case of redundant calculation. This solves a mystery (for me) over the years: why did I occasionally hear reports of people having a wallet take an hour to sync, when most people (including myself) saw slowdowns, but nothing too crazy, with bigger wallets? The difference is I never used imported keys. (And I still don't like it, even after this improvement! :) ) As to the change itself, it is very simple (in particular, after the previous 2 commits 184d76f and 77f0194) and I have no comments or questions.
Just a brief additional anecdotal note of testing a "medium used" wallet (350 used addresses), which has no imported keys. Testing (I have a manual password entry in there, so drop a few seconds). I'm not sure if it was mentioned elsewhere in the thread or code comments, but an important property: the additional cache, being just a different key in the dict, does not affect old code (i.e. pre-this-PR code): old code can sync etc. fine, it just ignores the cached data.
Right, that was an intentional property. Also, a wallet file that has had a cache added to it can be used and modified by an older JoinMarket version that doesn't know about the cache, and then subsequently a newer JoinMarket version that does know about the cache will not barf on the wallet file that was modified by the older version. In other words, this change is both forward- and backward-compatible.
I'm sure we're close to merging this, modulo actually testing a few things that the test suite doesn't cover, but: on my 'paranoia' point: I rather wish I hadn't used that term, as it focused our minds, I think, on the more extreme aspects: hardware errors, which is always a tricky one; there are multiple layers of checks (I'm thinking about magic bytes and AES encryption, error checking in hardware, error checking in addresses, etc.), which doesn't dismiss the concern, but still. Meanwhile, probably the reason I originally had the gut reaction "hmm, not entirely safe" is more the thing that you can never really anticipate: some stupid error in software, triggered by an unexpected circumstance.

Imagine if addresses were stored cleanly/correctly, but for some bizarre reason they were just the wrong address. Or script. A reasonable counter is: 'the storage process is dumb; if you stored a wrong mapping of path/address or similar, then that error existed in the running code before you persisted it, so the risk existed to start with'. So at worst you could only imagine this being relevant if the code that actually effects the storage is, itself, bugged. But, again, who knows what failure of imagination we might have here.

Overall I think it just makes sense to add something that says 'if you are requesting a destination address then, even though this should not hit the cache (but: gap limit?), if it does, double check that it fits what we currently store as our master secret', simply because that won't affect performance meaningfully. Do people agree with me on that? It just seems like the right level of carefulness.
Add a validate_cache parameter to the five principal caching methods:

- _get_key_from_path
- _get_keypair_from_path
- _get_pubkey_from_path
- get_script_from_path
- get_address_from_path

and to the five convenience methods that wrap the above:

- get_script
- get_addr
- script_to_addr
- get_new_script
- get_new_addr

The value of this new parameter defaults to False in all but the last two methods, where we are willing to sacrifice speed for the sake of extra confidence in the correctness of *new* scripts and addresses to be used for new deposits and new transactions.
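The validate_cache logic ("if the entry is None or validation is requested, reconstruct; if the entry was not None, compare the two and raise on disagreement") can be sketched as follows (ValidatingWallet and its string "derivation" are hypothetical, and RuntimeError stands in for whatever error the real wallet raises on a cache mismatch):

```python
class ValidatingWallet:
    """Sketch of cache validation: re-derive on demand, raise on mismatch."""

    def __init__(self):
        self._cache = {}

    def _derive_addr(self, path):
        # Stand-in for the expensive address derivation.
        return f"addr({path})"

    def get_addr(self, path, validate_cache=False):
        cached = self._cache.get(path)
        if cached is None or validate_cache:
            fresh = self._derive_addr(path)
            if cached is not None and cached != fresh:
                raise RuntimeError(f"cache validation failed for {path}")
            self._cache[path] = fresh
            return fresh
        return cached

w = ValidatingWallet()
assert w.get_addr("m/0/0") == "addr(m/0/0)"
w._cache["m/0/0"] = "corrupted"
assert w.get_addr("m/0/0") == "corrupted"        # unvalidated hit slips by
try:
    w.get_addr("m/0/0", validate_cache=True)     # validation catches it
    raise AssertionError("expected RuntimeError")
except RuntimeError:
    pass
```

Defaulting validate_cache to True only on the new-address entry points matches the trade-off described in the commit message: those calls are one-offs and not performance-sensitive.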
Thank you so much for working on this.
Tested c3c10f1 a little and I have skimmed through the code.
Test suite passes locally. I've done some manual testing, including going back and forth between master and this PR. All worked well.
My timings (I'm using a small wallet, the same as per #1349 (comment)). I've repeated this a few times and the results seem consistent:
Master
$ time python3 scripts/wallet-tool.py wallet.jmdat
real 0m11.842s
user 0m8.335s
sys 0m0.222s
This PR
$ time python3 scripts/wallet-tool.py wallet.jmdat
real 0m4.063s
user 0m1.273s
sys 0m0.215s
Left some minor comments/nits.
I didn't review carefully the low level logic.
Re cache validation:
Do people agree with me on that? It just seems like the right level of carefulness.
I'm not sure what's the correct balance between performance and security here. The current validation checks seem sensible to me.
I guess it would be good to have lot of tests to ensure, amongst everything else, that a wrong cache is always caught when validation is requested by the caller.
A more general comment I'd make is that this code has been slow basically since the beginning, so I think it can stay slow just a little longer if that increases the chance of someone reviewing this PR or anyway weighing in.
    balances = collections.defaultdict(int)
    for md in range(self.mixdepth + 1):
        balances[md] = self.get_balance_at_mixdepth(md, verbose=verbose,
            include_disabled=include_disabled, maxheight=maxheight)
    return balances
In 184d76f:

Is there a reason to use a defaultdict? I see we are assigning to it anyway. Can it just be

    return {md: self.get_balance_at_mixdepth(md, verbose=verbose,
        include_disabled=include_disabled, maxheight=maxheight)
        for md in range(self.mixdepth + 1)}

or anyway a normal dict?
I believe this exists to handle mixdepth changes/increases of the wallet. But it's been a long time since I wrote the code.
But since we are assigning to it, not accessing the value, there should be no difference? Maybe there are some cases I'm missing.
The defaultdict is returned by the function. In order to reason about it you'd have to check the calling code.
Okay. If it's the case that callers need this, it might be good to spell it out in the docstring that this intentionally returns a defaultdict, to save the caller a check when searching for missing mixdepths.
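The behavioral difference the reviewers are debating is easy to demonstrate; callers probing a mixdepth absent from the returned mapping see different behavior depending on the return type (the balances here are made-up example values):

```python
import collections

# A result as a defaultdict (what get_balance_by_mixdepth returns) vs a
# plain dict snapshot of the same data.
balances_dd = collections.defaultdict(int, {0: 100, 1: 50})
balances_plain = dict(balances_dd)

# A caller probing a mixdepth beyond what the wallet populated:
assert balances_dd[7] == 0        # defaultdict silently yields 0
try:
    balances_plain[7]
    raise AssertionError("expected KeyError")
except KeyError:
    pass                          # a plain dict forces the caller to handle it
```

So switching to a plain dict (or a dict comprehension) is only safe if no caller relies on missing mixdepths reading as zero, which is why treating the defaultdict as part of the function's contract is the conservative choice.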
    script_utxos = collections.defaultdict(dict)
    -for md, data in mix_utxos.items():
    -    if md > self.mixdepth:
    -        continue
    +for md in range(self.mixdepth + 1):
    +    script_utxos[md] = self.get_utxos_at_mixdepth(md,
    +        include_disabled=include_disabled, includeheight=includeheight)
    return script_utxos
In 184d76f:

Same question about the defaultdict.
I assumed that returning a defaultdict is part of the contract of the function and didn't want to break that contract. If there had been a -> dict type hint, then I would have been free to return a normal dict instead.
    assert isinstance(addr, str)
    return addr in self._addr_map
In 01ec2a4:
Is this "type-check" assert necessary? I see we use them a lot in JM already, and we probably should move away from them. In general, I think a better way to do this in Python is by using type hints and enforcing them in the test suite.

Alternatively, if we want to keep a stronger check, I think it might be better to use something like:

    if not isinstance(addr, str):
        raise TypeError("Address should be a string")
Is this "type-check" assert necessary?

Well, no, it's not. That's the reason why it's an assert. When running the Python interpreter in optimization mode (python -O), it will strip out all asserts, but during development you still get the benefit of additional sanity checks.
Yeah, but by that logic we should then add assert in almost every function and method. Also, if we really want to do this, any reason to prefer assert over TypeError? I doubt the performance is at all noticeable (and most people don't run in optimized mode anyway).
True, with the introduction of type hints this kind of assert can be expressed better in that way.
Type hints are ignored at runtime. In order for the method to return a correct result, the address given must be of the expected type, or else the method will quietly return False even when the address is present when encoded in the expected type. Add a type hint, sure, but don't remove the runtime check.

Generally I don't support hard checks for preconditions that should always be true if the code is correct. Exceptions are for exceptional conditions. Incorrect code is not an exceptional condition; it's a bug. That's exactly what assertions are meant to catch.
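The "quietly return False" failure mode is concrete: with a str-keyed map, a bytes query can never match, so the method lies rather than failing. A small sketch (is_known_addr and its strict flag are hypothetical illustrations, not the jmclient signature):

```python
def is_known_addr(addr, addr_map, *, strict=True):
    # With a str-keyed map, a bytes query can never match, so a wrong type
    # silently yields False; the assert surfaces the bug during development
    # (and is stripped entirely under `python -O`).
    if strict:
        assert isinstance(addr, str)
    return addr in addr_map

addr_map = {"bc1qexample": ("some", "path")}
assert is_known_addr("bc1qexample", addr_map)
# Without the check, the bytes form quietly misses:
assert is_known_addr(b"bc1qexample", addr_map, strict=False) is False
```

This is the crux of the assert-vs-TypeError debate above: both catch the bug, but an assert documents "this can only fail if our own code is wrong," while a TypeError treats it as a condition callers might legitimately trigger and handle.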
Type hints are ignored at runtime

Yeah, but this is Python. I just don't think adding assert/Exception everywhere is a scalable approach. I'm sure there are plenty of other cases where a wrong type would be a problem like here.

Exceptions are for exceptional conditions. Incorrect code is not an exceptional condition; it's a bug.

That's reasonable, tho in this case we don't know what the caller is gonna provide. It doesn't seem impossible to me that a caller somewhere might use a try/except.
Notwithstanding C++'s std::logic_error, exception handling really isn't supposed to be used for foreseeable errors. A foreseeable error, by definition, is not exceptional. A caller shouldn't be wrapping a call in try/except to guard against its own misapplication of a called function.
I'm not sure I follow. Aren't all exceptions foreseeable, or we wouldn't write an exception for it? In any case, terminology aside, there are plenty of (built-in) functions in Python that raise TypeError, and using try/except is very common.
Aren't all exceptions foreseeable, or we wouldn't write an exception for it?

Fair point. I should have said "avoidable." In the case of an error that will only arise if the code is incorrect, that is a case for an assertion. In the case of an error that may arise due to conditions outside of the programmer's control, that is a case for an exception. Pedantic ideals aside, I will grant you that there are many languages that (ab)use exceptions to signal programming errors, and I'm not at all surprised that Python is one of them.

I wouldn't raise a blocking objection if someone wanted to upgrade type-checking assertions to exceptions.
    assert isinstance(script, bytes)
    path = self._script_map.get(script)
    assert path is not None
    return path
In 01ec2a4:
Same thing with the assert.

I would personally also replace the assert path is not None with a ValueError.
    def script_to_addr(self, script,
                       validate_cache: bool = False):
Since we are adding type hints to validate_cache, we could maybe add type hints to the rest too?
See #1571. The main issue is to figure out how to handle custom gaplimits.
Finished code review of c3c10f1. I like the logic and structure of this; it seems very clean. First, the commit comment explains it clearly, and the logic of 'if is None or validate_cache, reconstruct, then if was not None, compare the two and raise if disagreement' is exactly what I think we need. Also double-checked 'entry points'; it's as I remember: we just use

Hopefully we don't need more edits at this point, just more testing.
A list of tests, I will edit as I get through them. Regtest tests, using a persistent file of course (not the memory-only regtest wallet!):

CLI actions:
[✔️] tumbler run on regtest
[✔️] sendpayment no-coinjoin with PSBT output
[✔️] import a single private key into the wallet after activity above
[✔️] displayall

(doing these last because dealing with FB times is janky in test env):
[✔️] get and fund fidelity bond address

wallet re-sync:
[✔️] run
[✔️] do one further coinjoin tx to check sync still functioning correctly.

using POSTMAN for manual tests of wallet rpc calls to jmwalletd backend:
[✔️] unlock, display, session (sanity check, displayed balances match)
[✔️] payjoin as sender

At the end of this set of tests, there were 35 used addresses in the wallet (including 2 FB addresses and 1 imported address) and the size of the wallet file is 203kB.
I realise that much of the manual testing above is of marginal, or no, relevance to the code here, but my philosophy was to try to address any possible "unknown unknowns", basically to see if there was some unexpected interaction with the main existing workflows. So far the report is very positive; the one thing that happened that was unexpected to me was unrelated to this PR (the thing about fast sync with imported keys, e.g.), and I didn't see anything like a bug from this code. The only testing thing that remains outstanding is to ensure that the
tACK c3c10f1. Some extra commentary that I think may be helpful:
@AdamISZ: I definitely agree on not squashing. Each of these commits embodies a logically independent change, and I was very careful not to break the build or the unit tests at each commit, so bisecting across/through them will work.
@PulpCattel @kristapsk I'm going to leave it in your guys' hands when to merge this, though I do think it should be sooner rather than later! (I would not like to redo the testing work because of substantial rebases (unlikely, I admit, but still), and obviously, it's such a big change in user experience, that's already a good reason.)
@AdamISZ I want to do some manual testing and benchmarking myself, then I'm ok with merging.
        else:
            assert 0
        return self.script_to_addr(script)

    def get_external_addr(self, mixdepth):
    -def get_external_addr(self, mixdepth):
    +def get_external_addr(self, mixdepth: int) -> str:
    def get_external_addr(self, mixdepth):
        """
        Return an address suitable for external distribution, including funding
        the wallet from other sources, or receiving payments or donations.
        JoinMarket will never generate these addresses for internal use.
        """
    -   return self._get_addr_int_ext(self.ADDRESS_TYPE_EXTERNAL, mixdepth)
    +   return self.get_new_addr(mixdepth, self.ADDRESS_TYPE_EXTERNAL)

    def get_internal_addr(self, mixdepth):
    -def get_internal_addr(self, mixdepth):
    +def get_internal_addr(self, mixdepth: int) -> str:
            include_disabled=include_disabled, maxheight=maxheight)
        return balances

    def get_balance_at_mixdepth(self, mixdepth,
    -def get_balance_at_mixdepth(self, mixdepth,
    +def get_balance_at_mixdepth(self, mixdepth: int,
    @@ -1054,8 +1154,8 @@ def is_known_script(self, script):
        return script in self._script_map

    def get_addr_mixdepth(self, addr):
    -def get_addr_mixdepth(self, addr):
    +def get_addr_mixdepth(self, addr: str) -> int:
    def _populate_maps(self, paths):
        for path in paths:
            self._script_map[self.get_script_from_path(path)] = path
            self._addr_map[self.get_address_from_path(path)] = path

    def addr_to_path(self, addr):
    -def addr_to_path(self, addr):
    +def addr_to_path(self, addr: str):
        assert isinstance(addr, str)
        path = self._addr_map.get(addr)
        assert path is not None
        return path

    def script_to_path(self, script):
    -def script_to_path(self, script):
    +def script_to_path(self, script: bytes):
@PulpCattel I know you don't like the idea of rushing, but I am very keen not to leave this one sitting around and then have to rebase and then recheck it in painstaking detail. In particular, I think it is way better to have big changes like this merged well before a release, so that inevitable but hopefully very minor issues will be discovered naturally by those few users/devs who work on master. @kristapsk obviously no issue with those type hint changes; they can be included here, or after. @whitslack do you have any further thoughts or want to change anything, or presumably you think this is ready?
@AdamISZ: I am very happy with this as it stands. I have no objections to the suggested type hints. I didn't add them myself because I hadn't touched any of those lines and preferred to minimize my changeset to maximize its chances of being merged. I'll leave it to one of you to commit the new type hints and would humbly recommend not squashing them into any of my commits, since they're not logically related but are more like drive-by, opportunistic nitpicks.
I think we should merge this as-is. Does anyone disagree? Once this is merged I'll be happy to do the minor task as described in point 3 of this comment, and probably also point 4, there, at the same time.
I'm ok with merging this as-is.
OK. Given the need with such (larger) changes to be tested by users well before release, it makes sense to merge it now then. Sorry for contradicting slightly what I said here!
6ec6308 Deduplicate wallet error messages (Kristaps Kaupe) Pull request description: Already proposed something similar while reviewing #1594. Also, type hints and f-strings. ACKs for top commit: AdamISZ: concept ACK 6ec6308 Tree-SHA512: 987638b5ca74214d56f64288bae5c13b53a4ab1d310dc4df03c086e6a81530ec84ec49d5925edc92631a781cba4857fa8e12704d842ca74a8caaa83c1c2cf8b0
f2ae8ab Don't validate cache during initial sync. (Adam Gibson) Pull request description: Prior to this commit, the calls to get_new_addr in the functions in the initial sync algo used for recovery, used the default value of the argument validate_cache, which is True (because in normal running, get_new_addr is used to derive addresses as destinations, for which it's safer to not use the cache, and as one-off calls, are not performance-sensitive). This caused initial sync to be very slow in recovery, especially if using large gap limits (which is common). After this commit, we set the argument validate_cache to False, as is intended during initial sync. This allows the optimised performance from caching to be in effect. See earlier PRs #1594 and #1614 for context. Top commit has no ACKs. Tree-SHA512: 2e16642dbb071f3f4e8c3bcfc6cfb71b63865acfb576be6f31b2a8945795b9e9a5de5c93bc2ed534db8ee9ac12cbddef180c303ed6e3c30c89f6f67d49a2d834
Note: Reviewing each commit individually will make more sense than trying to review the combined diff.

This PR implements several performance enhancements that take the CPU time to run wallet-tool.py display on my wallet down from ~44 minutes to ~11 seconds. The most significant gains come from replacing an O(m*n) algorithm in get_imported_privkey_branch with a semantically equivalent O(m+n) algorithm and from adding a persistent cache for computed private keys, public keys, scripts, and addresses.

Below are some actual benchmarks on my wallet, which has 5 mixdepths, each having path indices reaching into the 4000s, and almost 700 imported private keys.

- origin/master (baseline)
- wallet: remove a dead store in get_index_cache_and_increment
- wallet: add get_{balance,utxos}_at_mixdepth methods
- wallet_utils: use new get_utxos_at_mixdepth method
- wallet_showutxos: use O(1) check for frozen instead of O(n)
- get_imported_privkey_branch: use O(m+n) algorithm instead of O(m*n)
- wallet: add _addr_map, paralleling _script_map
- wallet: add persistent cache, mapping path->(priv, pub, script, addr)

wallet-tool.py display now runs in: