Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

Closed
adamup928 opened this issue Jan 22, 2019 · 15 comments
Closed

Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

adamup928 opened this issue Jan 22, 2019 · 15 comments
Assignees
Labels
bug Something isn't working closing-soon-if-no-response This issue will be closed in 7 days unless further comments are made.

Comments

@adamup928
Copy link

Describe the bug
When returning from the background, we experience intermittent crashes related to the AppSync SDK.

To Reproduce
Steps to reproduce the behavior:

  1. Initialize an AppSync instance.
  2. Minimize the app (while not connected to a debugger).
  3. When the app returns to an active state, occasionally it will crash with the AWS AppSync module referenced in the stack trace.

Expected behavior
Returning from the background should not result in a crash.

Environment(please complete the following information):

  • SDK Version: AWS Core 2.8.4, AWSAppSync 2.9.1
  • Dependency Manager: Cocoapods
  • Swift Version : 4.2

Device Information (please complete the following information):

  • Device: Multiple (iPhone X, iPhone 7 Plus)
  • iOS Version: [e.g. iOS 11.4] 12.1 and 12.1.2
  • Specific to simulators: n/a

Additional context

Here is a symbolicated stack trace:

Cause: Crash due to signal: SIGTRAP(TRAP_BRKPT) at 10bf4a1a4


0    libswiftCore.dylib                       0x10bf4a1a4     $Ss17_assertionFailure__4file4line5flagss5NeverOs12StaticStringV_SSAHSus6UInt32VtFTf4nxnnn_n + 164
1    libswiftCore.dylib                       0x10bf9fc48     swift_unexpectedError + 280
2    SQLite                                   0x10a334464     (null)
3    SQLite                                   0x10a33eea0     (null)
4    libswiftCore.dylib                       0x10bf2e320     $Ss21_ClosureBasedIteratorV4nextxSgyF + 48
5    libswiftCore.dylib                       0x10bf2e4ac     $Ss12_IteratorBoxC4next7ElementQzSgyF + 36
6    libswiftCore.dylib                       0x10bdf0050     $Ss11AnyIteratorV4nextxSgyF + 28
7    libswiftCore.dylib                       0x10bd5e9c8     $SSTsE3mapySayqd__Gqd__7ElementQzKXEKlF + 816
8    libswiftCore.dylib                       0x10bf39390     $Ss21_ClosureBasedSequenceVyxGSTsST3mapySayqd__Gqd__7ElementQzKXEKlFTW + 84
9    libswiftCore.dylib                       0x10bf2f068     $Ss12_SequenceBoxC4_mapySayqd__Gqd__7ElementQzKXEKlF + 140
10   AWSAppSync                               0x105a779b8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC13selectRecords031_399C1C9730F07D9D275A1626795051K0LL7forKeysSayAA6RecordVGSaySSG_tKF (<compiler-generated>:0)
11   AWSAppSync                               0x105a7669c     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC12mergeRecords031_399C1C9730F07D9D275A1626795051K0LL7recordsShySSGAA9RecordSetV_tKF (AWSSQLLiteNormalizedCache.swift:201)
12   AWSAppSync                               0x105a765d8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tF (AWSSQLLiteNormalizedCache.swift:161)
13   AWSAppSync                               0x105a787a4     $S10AWSAppSync25AWSSQLLiteNormalizedCacheCAA0dE0A2aDP5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tFTW (<compiler-generated>:0)
14   AWSAppSync                               0x105a3042c     $S10AWSAppSync11ApolloStoreC7publish7records7contextAA7PromiseCyytGAA9RecordSetV_SvSgtFyyyt_tc_ys5Error_pctXEfU_yycfU_ (ApolloStore.swift:40)
15   AWSAppSync                               0x105a2fd10     $SIeg_IeyB_TR (<compiler-generated>:0)
16   libdispatch.dylib                        0x19ac696c8     _dispatch_call_block_and_release + 24
17   libdispatch.dylib                        0x19ac6a484     _dispatch_client_callout + 16
18   libdispatch.dylib                        0x19ac189c8     _dispatch_lane_concurrent_drain + 924
19   libdispatch.dylib                        0x19ac127a0     _dispatch_lane_invoke$VARIANT$mp + 568
20   libdispatch.dylib                        0x19ac0d360     _dispatch_queue_override_invoke + 376
21   libdispatch.dylib                        0x19ac19a18     _dispatch_root_queue_drain + 360
22   libdispatch.dylib                        0x19ac1a2c0     _dispatch_worker_thread2 + 112
23   libsystem_pthread.dylib                  0x19ae4d17c     _pthread_wqthread + 472
24   libsystem_pthread.dylib                  0x19ae4fcec     start_wqthread + 4```
@adamup928 adamup928 changed the title AppSync SDK crash when returning from background - SIGTRAP(TRAP_BRKPT) Crash when returning from background - SIGTRAP(TRAP_BRKPT) Jan 22, 2019
@palpatim
Copy link
Contributor

Thanks for reporting this issue @adamup928. I'm working on cleaning up the persistence layer for the mutation queue (which @larryonoff is also working on), including better SQLite connection discipline. I'll make sure to include this issue in the list of things I'm considering.

@palpatim palpatim self-assigned this Jan 22, 2019
@palpatim palpatim added bug Something isn't working AppSync labels Jan 22, 2019
@adamup928
Copy link
Author

Thanks, @palpatim, I appreciate the update.

@palpatim
Copy link
Contributor

@adamup928

tl;dr: I can't repro, but have a hypothesis.

Discussion

Unfortunately I'm not able to repro this in any of my tests or test environments, but we can make some progress just looking at the stack trace & the code.

It looks like the crash is happening during the delta sync upon resuming from background. The delta query response handler, attempts to call publish on the ApolloStore. Examining the stack trace, it looks like the error is being thrown down in the iterator of SQLite.Swift's Statement.map implementation. That iterator will throw if it encounters a SQLite result code of anything other than SQLITE_OK, SQLITE_ROW, or SQLITE_END. Because SQLite.Swift implements the failable iterator with a try!, that will end up crashing the app if we encounter a non-success code during the read.

My hypothesis is that we're hitting an error condition returning from background, possibly something like SQLITE_BUSY.

You may already know, but to provide context for future readers: SQLite allows multiple connections to a given database file. It coordinates access at the OS level so that any number of processes may read the database concurrently, but only one may write. In order to write, we have to obtain an exclusive lock on the database, during which time no other process can access, even to read. If the lock lasts longer than the busy timeout (which we set to 100ms), then the update fails and the busy handler is invoked. We don't specify a busy handler, so that means the update simply fails.

Because our single database file is used for all three of mutation queue, subscription metadata, and record caching, it is possible that those three largely unrelated processes may be executing updates that take longer than the busy timeout, thus causing the error.

If that's the case, one mitigation would be to isolate each database-accessing system to use a separate database file. That would also allow us to provide granularity for people who want to use, say, a persistent record cache without persisting a mutation queue. That wouldn't solve the problem if the issue were related to activity in a single module, but it would help overall contention.

I will put together a change that causes these modules to use separate DB files, and we'll see if it helps your case.

@larryonoff
Copy link
Contributor

@palpatim our team also have deadlock quite often. As far as I've investigated it occurs when offline mutations are being loaded from cache and someone performs mutation so deadlock happens on SQLite.swift

@adamup928
Copy link
Author

@palpatim - Thanks for the detailed update. Your solution to split the database makes sense. If I can reproduce the issue consistently, I'll post a sample project.

@palpatim
Copy link
Contributor

palpatim commented Feb 1, 2019

@larryonoff Thanks for calling out the deadlock. Part of my work to split the databases will be to audit our mutation queue's database writes to ensure we're serializing them appropriately.

@adamup928 A sample project would be great if you're able to get a consistent repro, thanks!

@larryonoff
Copy link
Contributor

@palpatim I investigated a bit the issue. SQLite.swift works in serial manner. So I thought about putting all its work in async DispatchQueue, but not sure that this's correct.

@adamup928
Copy link
Author

Wanted to add this additional stack trace that has the SQLite lines symbolicated. Still can't reproduce reliably.

# Cause: Crash due to signal: SIGTRAP(TRAP_BRKPT) at 10bf4a1a4

Thread 16 [Crashed]:

0    libswiftCore.dylib                       0x10bf4a1a4     $Ss17_assertionFailure__4file4line5flagss5NeverOs12StaticStringV_SSAHSus6UInt32VtFTf4nxnnn_n + 164
1    libswiftCore.dylib                       0x10bf9fc48     swift_unexpectedError + 280
2    SQLite                                   0x10a334464     $S6SQLite10ConnectionC7prepareys11AnySequenceVyAA3RowVGAA9QueryType_pKFs0D8IteratorVyAHGycfU_AHSgycfU_ (Statement.swift:246)
3    SQLite                                   0x10a33eea0     $S6SQLite3RowVSgIego_ADIegr_TRTA (<compiler-generated>:0)
4    libswiftCore.dylib                       0x10bf2e320     $Ss21_ClosureBasedIteratorV4nextxSgyF + 48
5    libswiftCore.dylib                       0x10bf2e4ac     $Ss12_IteratorBoxC4next7ElementQzSgyF + 36
6    libswiftCore.dylib                       0x10bdf0050     $Ss11AnyIteratorV4nextxSgyF + 28
7    libswiftCore.dylib                       0x10bd5e9c8     $SSTsE3mapySayqd__Gqd__7ElementQzKXEKlF + 816
8    libswiftCore.dylib                       0x10bf39390     $Ss21_ClosureBasedSequenceVyxGSTsST3mapySayqd__Gqd__7ElementQzKXEKlFTW + 84
9    libswiftCore.dylib                       0x10bf2f068     $Ss12_SequenceBoxC4_mapySayqd__Gqd__7ElementQzKXEKlF + 140
10   AWSAppSync                               0x105a779b8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC13selectRecords031_399C1C9730F07D9D275A1626795051K0LL7forKeysSayAA6RecordVGSaySSG_tKF (<compiler-generated>:0)
11   AWSAppSync                               0x105a7669c     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC12mergeRecords031_399C1C9730F07D9D275A1626795051K0LL7recordsShySSGAA9RecordSetV_tKF (AWSSQLLiteNormalizedCache.swift:201)
12   AWSAppSync                               0x105a765d8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tF (AWSSQLLiteNormalizedCache.swift:161)
13   AWSAppSync                               0x105a787a4     $S10AWSAppSync25AWSSQLLiteNormalizedCacheCAA0dE0A2aDP5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tFTW (<compiler-generated>:0)
14   AWSAppSync                               0x105a3042c     $S10AWSAppSync11ApolloStoreC7publish7records7contextAA7PromiseCyytGAA9RecordSetV_SvSgtFyyyt_tc_ys5Error_pctXEfU_yycfU_ (ApolloStore.swift:40)
15   AWSAppSync                               0x105a2fd10     $SIeg_IeyB_TR (<compiler-generated>:0)
16   libdispatch.dylib                        0x19ac696c8     _dispatch_call_block_and_release + 24
17   libdispatch.dylib                        0x19ac6a484     _dispatch_client_callout + 16
18   libdispatch.dylib                        0x19ac189c8     _dispatch_lane_concurrent_drain + 924
19   libdispatch.dylib                        0x19ac127a0     _dispatch_lane_invoke$VARIANT$mp + 568
20   libdispatch.dylib                        0x19ac0d360     _dispatch_queue_override_invoke + 376
21   libdispatch.dylib                        0x19ac19a18     _dispatch_root_queue_drain + 360
22   libdispatch.dylib                        0x19ac1a2c0     _dispatch_worker_thread2 + 112
23   libsystem_pthread.dylib                  0x19ae4d17c     _pthread_wqthread + 472
24   libsystem_pthread.dylib                  0x19ae4fcec     start_wqthread + 4

@palpatim
Copy link
Contributor

palpatim commented Feb 5, 2019

@adamup928

Thanks for the extra symbolication. This does point to a different area of the code--I had assumed this was hitting the error attempting to force-unwrap in the FailableIterator, but this points to the force-unwrap of the Statement's handle in Cursor.init(_:).

I'll dig into this a bit to see if this changes my thinking about whether separating the DBs will help.

@palpatim
Copy link
Contributor

palpatim commented Feb 5, 2019

I think this ends up being basically the same path--the force unwrap in question is the one attempting to get the statement handle (rather than force-trying the iterator as I mentioned above). I'm somewhat concerned that the error isn't being thrown from prepare or the iterator's next itself, but rather from the Cursor constructor. That implies that the statement handle is being cleared between the time the prepare returns and the time iterator.next is first called, which doesn't make sense to me.

Some other questions, which are basically grasping at straws at this point:

  • In your app code, do you clear release/nil out your appSyncClient? That might cause the Statement to be deinitialized while the mergeRecords function is iterating over the cursor (although the Statement, in its role as Iterator should be retained by map if my understanding is right).
  • Does your app maintain any handles to the database separately from the ones instantiated by AppSync?
  • Does your app attempt to do any filesystem cleanup of the database file?
  • Along that path I'm noticing something else: our filtering uses keys.contains(_), which translates to a SQL statement using host parameters like SELECT xxx FROM foo WHERE key IN (?,?,...). SQLite has a limit of 999 host parameters in a given clause--is it possible that one of your AppSync query results is returning more than 999 results in the set? (I don't really think this is the issue, since I'd expect the error to be thrown during the initial prepare rather than the iteration, but I wanted to call it out.)

@adamup928
Copy link
Author

Thanks for your investigation and fast response, @palpatim.

To answer your questions:

  • No, the appSyncClient is retained and is not being released or cleared
  • No, we don't maintain any handles to the database separately
  • No, we don't attempt to cleanup the filesystem or database file
  • Our app sync result count is in the single digits.

Hope this helps!

@palpatim
Copy link
Contributor

palpatim commented Feb 6, 2019

@adamup928

Thanks for the clarifications. At this point, I think separating the DB files is our best bet since the stack traces aren't displaying anything that looks like a deadlock or invalid SQLite access pattern.

@palpatim
Copy link
Contributor

palpatim commented Feb 6, 2019

PR #171 is merged to master and should go out with the next release (which I'm working on getting done by end of week)

@adamup928
Copy link
Author

Thanks, @palpatim. That's a lot of work, and it's greatly appreciated!

@palpatim palpatim added pending-release Work is done, but issue can't be closed since the changes were not released yet and removed Reviewing labels Feb 8, 2019
@palpatim
Copy link
Contributor

palpatim commented Feb 8, 2019

This is released in 2.10.0. Please let us know if you see any issues, otherwise, we'll let this issue close out automatically.

@palpatim palpatim added closing-soon-if-no-response This issue will be closed in 7 days unless further comments are made. and removed pending-release Work is done, but issue can't be closed since the changes were not released yet labels Feb 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working closing-soon-if-no-response This issue will be closed in 7 days unless further comments are made.
Projects
None yet
Development

No branches or pull requests

3 participants