Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

adamup928 · 2019-01-22T19:53:03Z

Describe the bug
When returning from the background, we experience intermittent crashes related to the AppSync SDK.

To Reproduce
Steps to reproduce the behavior:

Initialize an AppSync instance.
Minimize the app (while not connected to a debugger).
When the app returns to an active state, occasionally it will crash with the AWS AppSync module referenced in the stack trace.

Expected behavior
Returning from the background should not result in a crash.

Environment(please complete the following information):

SDK Version: AWS Core 2.8.4, AWSAppSync 2.9.1
Dependency Manager: Cocoapods
Swift Version : 4.2

Device Information (please complete the following information):

Device: Multiple (iPhone X, iPhone 7 Plus)
iOS Version: [e.g. iOS 11.4] 12.1 and 12.1.2
Specific to simulators: n/a

Additional context

Here is a symbolicated stack trace:

Cause: Crash due to signal: SIGTRAP(TRAP_BRKPT) at 10bf4a1a4


0    libswiftCore.dylib                       0x10bf4a1a4     $Ss17_assertionFailure__4file4line5flagss5NeverOs12StaticStringV_SSAHSus6UInt32VtFTf4nxnnn_n + 164
1    libswiftCore.dylib                       0x10bf9fc48     swift_unexpectedError + 280
2    SQLite                                   0x10a334464     (null)
3    SQLite                                   0x10a33eea0     (null)
4    libswiftCore.dylib                       0x10bf2e320     $Ss21_ClosureBasedIteratorV4nextxSgyF + 48
5    libswiftCore.dylib                       0x10bf2e4ac     $Ss12_IteratorBoxC4next7ElementQzSgyF + 36
6    libswiftCore.dylib                       0x10bdf0050     $Ss11AnyIteratorV4nextxSgyF + 28
7    libswiftCore.dylib                       0x10bd5e9c8     $SSTsE3mapySayqd__Gqd__7ElementQzKXEKlF + 816
8    libswiftCore.dylib                       0x10bf39390     $Ss21_ClosureBasedSequenceVyxGSTsST3mapySayqd__Gqd__7ElementQzKXEKlFTW + 84
9    libswiftCore.dylib                       0x10bf2f068     $Ss12_SequenceBoxC4_mapySayqd__Gqd__7ElementQzKXEKlF + 140
10   AWSAppSync                               0x105a779b8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC13selectRecords031_399C1C9730F07D9D275A1626795051K0LL7forKeysSayAA6RecordVGSaySSG_tKF (<compiler-generated>:0)
11   AWSAppSync                               0x105a7669c     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC12mergeRecords031_399C1C9730F07D9D275A1626795051K0LL7recordsShySSGAA9RecordSetV_tKF (AWSSQLLiteNormalizedCache.swift:201)
12   AWSAppSync                               0x105a765d8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tF (AWSSQLLiteNormalizedCache.swift:161)
13   AWSAppSync                               0x105a787a4     $S10AWSAppSync25AWSSQLLiteNormalizedCacheCAA0dE0A2aDP5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tFTW (<compiler-generated>:0)
14   AWSAppSync                               0x105a3042c     $S10AWSAppSync11ApolloStoreC7publish7records7contextAA7PromiseCyytGAA9RecordSetV_SvSgtFyyyt_tc_ys5Error_pctXEfU_yycfU_ (ApolloStore.swift:40)
15   AWSAppSync                               0x105a2fd10     $SIeg_IeyB_TR (<compiler-generated>:0)
16   libdispatch.dylib                        0x19ac696c8     _dispatch_call_block_and_release + 24
17   libdispatch.dylib                        0x19ac6a484     _dispatch_client_callout + 16
18   libdispatch.dylib                        0x19ac189c8     _dispatch_lane_concurrent_drain + 924
19   libdispatch.dylib                        0x19ac127a0     _dispatch_lane_invoke$VARIANT$mp + 568
20   libdispatch.dylib                        0x19ac0d360     _dispatch_queue_override_invoke + 376
21   libdispatch.dylib                        0x19ac19a18     _dispatch_root_queue_drain + 360
22   libdispatch.dylib                        0x19ac1a2c0     _dispatch_worker_thread2 + 112
23   libsystem_pthread.dylib                  0x19ae4d17c     _pthread_wqthread + 472
24   libsystem_pthread.dylib                  0x19ae4fcec     start_wqthread + 4```

The text was updated successfully, but these errors were encountered:

palpatim · 2019-01-22T21:06:24Z

Thanks for reporting this issue @adamup928. I'm working on cleaning up the persistence layer for the mutation queue (which @larryonoff is also working on), including better SQLite connection discipline. I'll make sure to include this issue in the list of things I'm considering.

adamup928 · 2019-01-22T21:08:05Z

Thanks, @palpatim, I appreciate the update.

palpatim · 2019-01-31T19:31:04Z

@adamup928

tl;dr: I can't repro, but have a hypothesis.

Discussion

Unfortunately I'm not able to repro this in any of my tests or test environments, but we can make some progress just looking at the stack trace & the code.

It looks like the crash is happening during the delta sync upon resuming from background. The delta query response handler, attempts to call publish on the ApolloStore. Examining the stack trace, it looks like the error is being thrown down in the iterator of SQLite.Swift's Statement.map implementation. That iterator will throw if it encounters a SQLite result code of anything other than SQLITE_OK, SQLITE_ROW, or SQLITE_END. Because SQLite.Swift implements the failable iterator with a try!, that will end up crashing the app if we encounter a non-success code during the read.

My hypothesis is that we're hitting an error condition returning from background, possibly something like SQLITE_BUSY.

You may already know, but to provide context for future readers: SQLite allows multiple connections to a given database file. It coordinates access at the OS level so that any number of processes may read the database concurrently, but only one may write. In order to write, we have to obtain an exclusive lock on the database, during which time no other process can access, even to read. If the lock lasts longer than the busy timeout (which we set to 100ms), then the update fails and the busy handler is invoked. We don't specify a busy handler, so that means the update simply fails.

Because our single database file is used for all three of mutation queue, subscription metadata, and record caching, it is possible that those three largely unrelated processes may be executing updates that take longer than the busy timeout, thus causing the error.

If that's the case, one mitigation would be to isolate each database-accessing system to use a separate database file. That would also allow us to provide granularity for people who want to use, say, a persistent record cache without persisting a mutation queue. That wouldn't solve the problem if the issue were related to activity in a single module, but it would help overall contention.

I will put together a change that causes these modules to use separate DB files, and we'll see if it helps your case.

larryonoff · 2019-02-01T08:16:47Z

@palpatim our team also have deadlock quite often. As far as I've investigated it occurs when offline mutations are being loaded from cache and someone performs mutation so deadlock happens on SQLite.swift

adamup928 · 2019-02-01T15:23:14Z

@palpatim - Thanks for the detailed update. Your solution to split the database makes sense. If I can reproduce the issue consistently, I'll post a sample project.

palpatim · 2019-02-01T15:43:10Z

@larryonoff Thanks for calling out the deadlock. Part of my work to split the databases will be to audit our mutation queue's database writes to ensure we're serializing them appropriately.

@adamup928 A sample project would be great if you're able to get a consistent repro, thanks!

larryonoff · 2019-02-01T15:58:16Z

@palpatim I investigated a bit the issue. SQLite.swift works in serial manner. So I thought about putting all its work in async DispatchQueue, but not sure that this's correct.

adamup928 · 2019-02-05T18:21:25Z

Wanted to add this additional stack trace that has the SQLite lines symbolicated. Still can't reproduce reliably.

# Cause: Crash due to signal: SIGTRAP(TRAP_BRKPT) at 10bf4a1a4

Thread 16 [Crashed]:

0    libswiftCore.dylib                       0x10bf4a1a4     $Ss17_assertionFailure__4file4line5flagss5NeverOs12StaticStringV_SSAHSus6UInt32VtFTf4nxnnn_n + 164
1    libswiftCore.dylib                       0x10bf9fc48     swift_unexpectedError + 280
2    SQLite                                   0x10a334464     $S6SQLite10ConnectionC7prepareys11AnySequenceVyAA3RowVGAA9QueryType_pKFs0D8IteratorVyAHGycfU_AHSgycfU_ (Statement.swift:246)
3    SQLite                                   0x10a33eea0     $S6SQLite3RowVSgIego_ADIegr_TRTA (<compiler-generated>:0)
4    libswiftCore.dylib                       0x10bf2e320     $Ss21_ClosureBasedIteratorV4nextxSgyF + 48
5    libswiftCore.dylib                       0x10bf2e4ac     $Ss12_IteratorBoxC4next7ElementQzSgyF + 36
6    libswiftCore.dylib                       0x10bdf0050     $Ss11AnyIteratorV4nextxSgyF + 28
7    libswiftCore.dylib                       0x10bd5e9c8     $SSTsE3mapySayqd__Gqd__7ElementQzKXEKlF + 816
8    libswiftCore.dylib                       0x10bf39390     $Ss21_ClosureBasedSequenceVyxGSTsST3mapySayqd__Gqd__7ElementQzKXEKlFTW + 84
9    libswiftCore.dylib                       0x10bf2f068     $Ss12_SequenceBoxC4_mapySayqd__Gqd__7ElementQzKXEKlF + 140
10   AWSAppSync                               0x105a779b8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC13selectRecords031_399C1C9730F07D9D275A1626795051K0LL7forKeysSayAA6RecordVGSaySSG_tKF (<compiler-generated>:0)
11   AWSAppSync                               0x105a7669c     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC12mergeRecords031_399C1C9730F07D9D275A1626795051K0LL7recordsShySSGAA9RecordSetV_tKF (AWSSQLLiteNormalizedCache.swift:201)
12   AWSAppSync                               0x105a765d8     $S10AWSAppSync25AWSSQLLiteNormalizedCacheC5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tF (AWSSQLLiteNormalizedCache.swift:161)
13   AWSAppSync                               0x105a787a4     $S10AWSAppSync25AWSSQLLiteNormalizedCacheCAA0dE0A2aDP5merge7recordsAA7PromiseCyShySSGGAA9RecordSetV_tFTW (<compiler-generated>:0)
14   AWSAppSync                               0x105a3042c     $S10AWSAppSync11ApolloStoreC7publish7records7contextAA7PromiseCyytGAA9RecordSetV_SvSgtFyyyt_tc_ys5Error_pctXEfU_yycfU_ (ApolloStore.swift:40)
15   AWSAppSync                               0x105a2fd10     $SIeg_IeyB_TR (<compiler-generated>:0)
16   libdispatch.dylib                        0x19ac696c8     _dispatch_call_block_and_release + 24
17   libdispatch.dylib                        0x19ac6a484     _dispatch_client_callout + 16
18   libdispatch.dylib                        0x19ac189c8     _dispatch_lane_concurrent_drain + 924
19   libdispatch.dylib                        0x19ac127a0     _dispatch_lane_invoke$VARIANT$mp + 568
20   libdispatch.dylib                        0x19ac0d360     _dispatch_queue_override_invoke + 376
21   libdispatch.dylib                        0x19ac19a18     _dispatch_root_queue_drain + 360
22   libdispatch.dylib                        0x19ac1a2c0     _dispatch_worker_thread2 + 112
23   libsystem_pthread.dylib                  0x19ae4d17c     _pthread_wqthread + 472
24   libsystem_pthread.dylib                  0x19ae4fcec     start_wqthread + 4

palpatim · 2019-02-05T22:13:53Z

@adamup928

Thanks for the extra symbolication. This does point to a different area of the code--I had assumed this was hitting the error attempting to force-unwrap in the FailableIterator, but this points to the force-unwrap of the Statement's handle in Cursor.init(_:).

I'll dig into this a bit to see if this changes my thinking about whether separating the DBs will help.

palpatim · 2019-02-05T23:51:36Z

I think this ends up being basically the same path--the force unwrap in question is the one attempting to get the statement handle (rather than force-trying the iterator as I mentioned above). I'm somewhat concerned that the error isn't being thrown from prepare or the iterator's next itself, but rather from the Cursor constructor. That implies that the statement handle is being cleared between the time the prepare returns and the time iterator.next is first called, which doesn't make sense to me.

Some other questions, which are basically grasping at straws at this point:

In your app code, do you clear release/nil out your appSyncClient? That might cause the Statement to be deinitialized while the mergeRecords function is iterating over the cursor (although the Statement, in its role as Iterator should be retained by map if my understanding is right).
Does your app maintain any handles to the database separately from the ones instantiated by AppSync?
Does your app attempt to do any filesystem cleanup of the database file?
Along that path I'm noticing something else: our filtering uses keys.contains(_), which translates to a SQL statement using host parameters like SELECT xxx FROM foo WHERE key IN (?,?,...). SQLite has a limit of 999 host parameters in a given clause--is it possible that one of your AppSync query results is returning more than 999 results in the set? (I don't really think this is the issue, since I'd expect the error to be thrown during the initial prepare rather than the iteration, but I wanted to call it out.)

adamup928 · 2019-02-05T23:54:16Z

Thanks for your investigation and fast response, @palpatim.

To answer your questions:

No, the appSyncClient is retained and is not being released or cleared
No, we don't maintain any handles to the database separately
No, we don't attempt to cleanup the filesystem or database file
Our app sync result count is in the single digits.

Hope this helps!

palpatim · 2019-02-06T15:33:56Z

@adamup928

Thanks for the clarifications. At this point, I think separating the DB files is our best bet since the stack traces aren't displaying anything that looks like a deadlock or invalid SQLite access pattern.

palpatim · 2019-02-06T22:28:03Z

PR #171 is merged to master and should go out with the next release (which I'm working on getting done by end of week)

adamup928 · 2019-02-07T00:40:56Z

Thanks, @palpatim. That's a lot of work, and it's greatly appreciated!

palpatim · 2019-02-08T23:13:25Z

This is released in 2.10.0. Please let us know if you see any issues, otherwise, we'll let this issue close out automatically.

adamup928 changed the title ~~AppSync SDK crash when returning from background - SIGTRAP(TRAP_BRKPT)~~ Crash when returning from background - SIGTRAP(TRAP_BRKPT) Jan 22, 2019

palpatim self-assigned this Jan 22, 2019

palpatim added bug Something isn't working AppSync labels Jan 22, 2019

palpatim mentioned this issue Jan 31, 2019

Make AWSMutationCache thread safe #155

Merged

palpatim mentioned this issue Feb 5, 2019

Palpatim/crash resuming from background #171

Merged

palpatim added the Reviewing label Feb 6, 2019

palpatim added pending-release Work is done, but issue can't be closed since the changes were not released yet and removed Reviewing labels Feb 8, 2019

palpatim added closing-soon-if-no-response This issue will be closed in 7 days unless further comments are made. and removed pending-release Work is done, but issue can't be closed since the changes were not released yet labels Feb 8, 2019

palpatim mentioned this issue Feb 12, 2019

Audit mutation queue threading/operation model #175

Closed

palpatim closed this as completed Feb 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

adamup928 commented Jan 22, 2019

palpatim commented Jan 22, 2019

adamup928 commented Jan 22, 2019

palpatim commented Jan 31, 2019

larryonoff commented Feb 1, 2019

adamup928 commented Feb 1, 2019

palpatim commented Feb 1, 2019

larryonoff commented Feb 1, 2019

adamup928 commented Feb 5, 2019

palpatim commented Feb 5, 2019

palpatim commented Feb 5, 2019 •

edited

Loading

adamup928 commented Feb 5, 2019

palpatim commented Feb 6, 2019

palpatim commented Feb 6, 2019

adamup928 commented Feb 7, 2019

palpatim commented Feb 8, 2019

Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

Crash when returning from background - SIGTRAP(TRAP_BRKPT) #160

Comments

adamup928 commented Jan 22, 2019

palpatim commented Jan 22, 2019

adamup928 commented Jan 22, 2019

palpatim commented Jan 31, 2019

larryonoff commented Feb 1, 2019

adamup928 commented Feb 1, 2019

palpatim commented Feb 1, 2019

larryonoff commented Feb 1, 2019

adamup928 commented Feb 5, 2019

palpatim commented Feb 5, 2019

palpatim commented Feb 5, 2019 • edited Loading

adamup928 commented Feb 5, 2019

palpatim commented Feb 6, 2019

palpatim commented Feb 6, 2019

adamup928 commented Feb 7, 2019

palpatim commented Feb 8, 2019

palpatim commented Feb 5, 2019 •

edited

Loading