harmony crash vmState=0x00020015 or vmState=0x00020011 #20567

Closed
pshipton opened this issue Nov 11, 2024 · 18 comments
Labels: blocker, comp:jit, segfault, test failure

Comments

@pshipton
Member

pshipton commented Nov 11, 2024

This also occurred on AIX with a 6/30 failure rate.
#20536 (comment)

Internal build
[Linux PPC] 80 Load_Level_2.harmony.5mins.Mode112 -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs

30x grinder failed 4/30, one with vmstate=0x0005ffff #20546

j> 16:22:09 #INFO:  No threads were activated following a resume all compilation threads call - FYI this also shows up in passing tests
j> 16:22:10 Unhandled exception
j> 16:22:10 Type=Segmentation error vmState=0x00020015
j> 16:22:10 J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
j> 16:22:10 Handler1=0F860060 Handler2=0F698FA0
j> 16:22:10 R0=0E20CAE0 R1=AC7FC940 R2=AC806930 R3=9DFFA280
j> 16:22:10 R4=FFFFFFFD R5=B19B111C R6=F74DFC98 R7=00000001
j> 16:22:10 R8=FFFF963D R9=AC7FF4F4 R10=AD7F39E4 R11=0E205D40
j> 16:22:10 R12=84004882 R13=AC804D00 R14=B1940FCF R15=0FB3CC98
j> 16:22:10 R16=000000B4 R17=00000000 R18=0DFF3AA0 R19=F744336C
j> 16:22:10 R20=F744336C R21=F7442EF0 R22=F7442EEC R23=A0EDE270
j> 16:22:10 R24=FFFFFFFD R25=9D6F77C8 R26=00000054 R27=00000015
j> 16:22:10 R28=0F42C3A8 R29=00000000 R30=0F4BA804 R31=9DFFA280
j> 16:22:10 NIP=0E205D74 MSR=0280F032 ORIG_GPR3=0E20CADC CTR=0E205D40
j> 16:22:10 LINK=0E205D64 XER=20000000 CCR=42004822 MQ=00000001
j> 16:22:10 TRAP=00000300 DAR=FFFFFFF9 dsisr=40000000 RESULT=00000000
j> 16:22:10 Module=/bluebird/builds/bld_81426/sdk/xp3280/jre/lib/ppc/default/libj9jit29.so
j> 16:22:10 Module_base_address=0DE00000
j> 16:22:10 Target=2_90_20241111_81426 (Linux 3.10.0-1160.119.1.el7.ppc64)
j> 16:22:10 CPU=ppc (16 logical CPUs) (0x17c2e0000 RAM)
j> 16:22:10 ----------- Stack Backtrace -----------
j> 16:22:12 protectedBacktrace+0x2c (0x0F6CF64C [libj9prt29.so+0x6f64c])
j> 16:22:12 omrsig_protect+0x5bc (0x0F696FBC [libj9prt29.so+0x36fbc])
j> 16:22:12 omrintrospect_backtrace_thread_raw+0x120 (0x0F6CFDC0 [libj9prt29.so+0x6fdc0])
j> 16:22:12 protectedIntrospectBacktraceThread+0x30 (0x0F6CF150 [libj9prt29.so+0x6f150])
j> 16:22:12 omrsig_protect+0x5bc (0x0F696FBC [libj9prt29.so+0x36fbc])
j> 16:22:12 omrintrospect_backtrace_thread+0x98 (0x0F6CF218 [libj9prt29.so+0x6f218])
j> 16:22:12 generateDiagnosticFiles+0xec (0x0F860A4C [libj9vm29.so+0xd0a4c])
j> 16:22:12 omrsig_protect+0x5bc (0x0F696FBC [libj9prt29.so+0x36fbc])
j> 16:22:12 structuredSignalHandler+0x134 (0x0F860194 [libj9vm29.so+0xd0194])
j> 16:22:12 mainSynchSignalHandler+0x1f0 (0x0F699190 [libj9prt29.so+0x39190])
j> 16:22:12 __kernel_sigtramp_rt32+0x0 (0x001003C0)
j> 16:22:12 _Z25ppcCreateMethodTrampolinePvS_S_+0xc4 (0x0E205E04 [libj9jit29.so+0x405e04])
j> 16:22:12 _ZN3OMR9CodeCache19syncTempTrampolinesEv+0x240 (0x0E20CAE0 [libj9jit29.so+0x40cae0])
j> 16:22:12 _ZN3OMR16CodeCacheManager22synchronizeTrampolinesEv+0xa4 (0x0E210924 [libj9jit29.so+0x410924])
j> 16:22:12 _Z18jitHookGlobalGCEndPP15J9HookInterfacejPvS2_+0x80 (0x0DFF3B20 [libj9jit29.so+0x1f3b20])
j> 16:22:12 _Z14J9HookDispatchPP15J9HookInterfacejPv+0x20c (0x0F720D8C [libj9hookable29.so+0xd8c])
j> 16:22:12 _ZN19MM_ParallelGlobalGC11reportGCEndEP18MM_EnvironmentBase+0x21c (0x0DAF107C [libj9gc29.so+0x21107c])
j> 16:22:12 _ZN19MM_ParallelGlobalGC19internalPostCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpace+0x1d0 (0x0DAEFF50 [libj9gc29.so+0x20ff50])
j> 16:22:12 _ZN15MM_ConcurrentGC19internalPostCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpace+0x138 (0x0DB2C758 [libj9gc29.so+0x24c758])
j> 16:22:12 _ZN32MM_ConcurrentGCIncrementalUpdate19internalPostCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpace+0x24 (0x0DB3AAA4 [libj9gc29.so+0x25aaa4])
j> 16:22:12 _ZN12MM_Collector11postCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpace+0xb0 (0x0DA60AB0 [libj9gc29.so+0x180ab0])
j> 16:22:12 _ZN12MM_Collector14garbageCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpaceP22MM_AllocateDescriptionjP28MM_ObjectAllocationInterfaceS3_P20MM_AllocationContext+0x4a0 (0x0DA612C0 [libj9gc29.so+0x1812c0])
j> 16:22:12 _ZN17MM_MemorySubSpace14garbageCollectEP18MM_EnvironmentBaseP22MM_AllocateDescriptionj+0x1a0 (0x0DA894C0 [libj9gc29.so+0x1a94c0])
j> 16:22:12 _ZN17MM_MemorySubSpace23percolateGarbageCollectEP18MM_EnvironmentBaseP22MM_AllocateDescriptionj+0xb0 (0x0DA89110 [libj9gc29.so+0x1a9110])
j> 16:22:12 _ZN12MM_Scavenger23percolateGarbageCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpaceP22MM_AllocateDescription15PercolateReasonj+0x94 (0x0DB18754 [libj9gc29.so+0x238754])
j> 16:22:12 _ZN12MM_Scavenger22internalGarbageCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpaceP22MM_AllocateDescription+0x117c (0x0DB1845C [libj9gc29.so+0x23845c])
j> 16:22:12 _ZN12MM_Collector14garbageCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpaceP22MM_AllocateDescriptionjP28MM_ObjectAllocationInterfaceS3_P20MM_AllocationContext+0x368 (0x0DA61188 [libj9gc29.so+0x181188])
j> 16:22:12 _ZN26MM_MemorySubSpaceSemiSpace23allocationRequestFailedEP18MM_EnvironmentBaseP22MM_AllocateDescriptionN17MM_MemorySubSpace14AllocationTypeEP28MM_ObjectAllocationInterfacePS4_S8_+0x4d8 (0x0DB949B8 [libj9gc29.so+0x2b49b8])
j> 16:22:12 _ZN24MM_MemorySubSpaceGeneric11allocateTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP28MM_ObjectAllocationInterfaceP17MM_MemorySubSpaceS7_b+0x21c (0x0DB922BC [libj9gc29.so+0x2b22bc])
j> 16:22:12 _ZN23MM_TLHAllocationSupport7refreshEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x880 (0x0DA9E840 [libj9gc29.so+0x1be840])
j> 16:22:12 _ZN23MM_TLHAllocationSupport15allocateFromTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x80 (0x0DA9EFA0 [libj9gc29.so+0x1befa0])
j> 16:22:12 _ZN25MM_TLHAllocationInterface14allocateObjectEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP14MM_MemorySpaceb+0x450 (0x0DA9D330 [libj9gc29.so+0x1bd330])
j> 16:22:12 _ZN25MM_TLHAllocationInterface21allocateArrayletSpineEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP14MM_MemorySpaceb+0x34 (0x0DA9D3D4 [libj9gc29.so+0x1bd3d4])
j> 16:22:12 _Z21OMR_GC_AllocateObjectP12OMR_VMThreadP25MM_AllocateInitialization+0x274 (0x0DAE60F4 [libj9gc29.so+0x2060f4])
j> 16:22:12 J9AllocateIndexableObject+0xf80 (0x0D946820 [libj9gc29.so+0x66820])
j> 16:22:12 newBaseTypeArray+0x74 (0x0F877CF4 [libj9vm29.so+0xe7cf4])
j> 16:22:12 _Z12newByteArrayP7JNIEnv_i+0x28 (0x0F86A528 [libj9vm29.so+0xda528])
j> 16:22:12 Java_com_ibm_crypto_plus_provider_icc_NativeInterface_PBE_1doFinal+0x324 (0x0D2D8F00 [libjgskit.so+0x18f00])
j> 16:22:12  (0x9B2E7A6C [<unknown>+0x0])
j> 16:22:12 runJavaThread+0x1d0 (0x0F841E70 [libj9vm29.so+0xb1e70])
j> 16:22:12 _Z23javaProtectedThreadProcP13J9PortLibraryPv+0x100 (0x0F8B3940 [libj9vm29.so+0x123940])
j> 16:22:12 omrsig_protect+0x5bc (0x0F696FBC [libj9prt29.so+0x36fbc])
j> 16:22:12 javaThreadProc+0x70 (0x0F8B37F0 [libj9vm29.so+0x1237f0])
j> 16:22:12 thread_wrapper+0x314 (0x0F757534 [libj9thr29.so+0x7534])
j> 16:22:12 start_thread+0x10c (0x0FFB7AEC [libpthread.so.0+0x7aec])
j> 16:22:12 clone+0x84 (0x0FE31610 [libc.so.6+0x111610])
j> 16:22:12 ---------------------------------------
@pshipton added the test failure and segfault labels Nov 11, 2024

Issue Number: 20567
Status: Open
Recommended Components: comp:gc, comp:vm, comp:test
Recommended Assignees: pshipton, linhu2016, chengjin01

@pshipton
Member Author

@dmitripivkine @hzongaro another problem which seems recently introduced, and is repeatable. Unfortunately we're not getting diagnostic upload on AIX, but only on Linux PPC (automation/issues/116).

@dmitripivkine
Contributor

dmitripivkine commented Nov 11, 2024

I will check what I can see, but this one is most likely JIT related. The crash occurs in the GC End hook event.

@vij-singh

@zl-wang FYI

@IBMJimmyk
Contributor

I tried looking at the core file for http://vmfarm.rtp.raleigh.ibm.com/job_output.php?id=95773976 (internal link)

https://github.com/eclipse-openj9/openj9-omr/blob/3da49aa3aaed9217fb02d5ab3d069aa7142f89ba/compiler/runtime/OMRCodeCache.cpp#L711-L716

At that point startPC returns -3, which is treated as a real start PC and leads to a crash.

The J9Method passed into the startPC looks like this:

(kca) j9m 0xb19b111c
Method   {ClassPath/Name.MethodName}: {javax/crypto/EncryptedPrivateKeyInfo.<init>}
                           Signature: (Ljava/lang/String;[B)V
                              Access: Public
                    J9Class/J9Method: 0xb19b1000 / 0xb19b111c
               Compiled Method Start: Not Compiled! (count=-3)
                      ByteCode Start: 0xb19c11a4 (88 bytes)
                   ROM Constant Pool: 0xb19c0b80 (98 entries)
                       Constant Pool: 0xb19b0bf0 (98 entries)
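
To make the failure mode concrete, here is a minimal sketch of how a value like the one in R4 of the first register dump (FFFFFFFD) can be told apart from a real start PC. It assumes the convention that a method without a compiled body stores a tagged value with the low bit set in place of a code address. The constant and function names are illustrative and are not taken from the OMR or OpenJ9 sources.

```cpp
#include <cstdint>

// Hypothetical tag: assume a set low bit marks "not a code address".
constexpr uint32_t kNotTranslatedTag = 0x1;

// Reinterpret a 32-bit register value as the signed count it encodes.
inline int32_t asSignedCount(uint32_t raw) {
    return static_cast<int32_t>(raw);
}

// True when the value could be a real start PC under the assumed tagging.
inline bool looksCompiled(uint32_t raw) {
    return (raw & kNotTranslatedTag) == 0;
}

// Address of the 32-bit word immediately preceding a given PC.
inline uint32_t wordBefore(uint32_t pc) {
    return pc - 4;
}
```

With these helpers, `asSignedCount(0xFFFFFFFD)` is -3 and `looksCompiled(0xFFFFFFFD)` is false. `wordBefore(0xFFFFFFFD)` gives 0xFFFFFFF9, which matches the DAR in the Linux PPC log; that would be consistent with a load from just before the bogus start PC, although this is an inference from the numbers and not something confirmed in the core file analysis.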

void ppcCreateMethodTrampoline(void *trampPtr, void *startPC, void *method)

The trampPtr passed into ppcCreateMethodTrampoline looks to be 0x9dffa280.

The code starting from 0x9dffa280 looks like this:

0x9dffa280      480022c0 b         0x9dffc540 Trampoline ^{javax/.../EncryptedPrivateKeyInfo.<init>} +3

0x9dffc540      3d609638 lis       r11, -0x69c8 CONST 0x9637ee80 {javax/.../EncryptedPrivateKeyInfo.<init>} +3
0x9dffc544      396bee80 addi      r11, r11, -0x1180
0x9dffc548      556b003e slwi      r11, r11, 0
0x9dffc54c      7d6903a6 mtctr     r11
0x9dffc550      4e800420 bctr

0x9637ee80 {javax/.../EncryptedPrivateKeyInfo.<init>} +3                        4bffffc0 b         0x9637ee40 U>> ^-13

0x9637ee40 {javax/.../EncryptedPrivateKeyInfo.<init>} -13                  >    7c0802a6 mflr      r0 <<< ^+3
0x9637ee44 {javax/.../EncryptedPrivateKeyInfo.<init>} -12                  |    906e0008 stw       r3, 8(r14)
0x9637ee48 {javax/.../EncryptedPrivateKeyInfo.<init>} -11                  |    908e0004 stw       r4, 4(r14)
0x9637ee4c {javax/.../EncryptedPrivateKeyInfo.<init>} -10                  |    90ae0000 stw       r5, 0(r14)
0x9637ee50 {javax/.../EncryptedPrivateKeyInfo.<init>} -9                   |    48180d71 bl        0x964ffbc0
0x9637ee54 {javax/.../EncryptedPrivateKeyInfo.<init>} -8                        b19b111c sth       r12, 0x111c(r27) -> J9Method - {javax/crypto/EncryptedPrivateKeyInfo.<init>}
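
The addresses in the disassembly above can be re-derived from the raw instruction words. The following is a small sketch, with hypothetical helper names, that decodes a relative PowerPC I-form branch and the value built by a `lis`/`addi` pair on a 32-bit target. It ignores absolute branches (AA bit set), which do not appear in this listing.

```cpp
#include <cstdint>

// Target of a relative PowerPC I-form branch (b/bl) located at address pc.
// The 24-bit LI field, already shifted left by 2 in the encoding, is
// sign-extended from 26 bits and added to pc.
inline uint32_t branchTarget(uint32_t pc, uint32_t instr) {
    int32_t disp = static_cast<int32_t>(instr & 0x03FFFFFCu);
    if (disp & 0x02000000) {
        disp -= 0x04000000;
    }
    return pc + static_cast<uint32_t>(disp);
}

// Value left in a register by "lis rX, hi" followed by "addi rX, rX, lo".
// Both immediates are the low 16 bits of their instruction words;
// the addi immediate is signed.
inline uint32_t lisAddiValue(uint32_t lisInstr, uint32_t addiInstr) {
    uint32_t hi = (lisInstr & 0xFFFFu) << 16;
    int32_t lo = static_cast<int16_t>(addiInstr & 0xFFFFu);
    return hi + static_cast<uint32_t>(lo);
}
```

Applied to the listing: `branchTarget(0x9dffa280, 0x480022c0)` is 0x9dffc540, `lisAddiValue(0x3d609638, 0x396bee80)` is 0x9637ee80, and `branchTarget(0x9637ee80, 0x4bffffc0)` is 0x9637ee40, which matches the chain from the temporary trampoline to the patched method entry.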

EncryptedPrivateKeyInfo.<init> was compiled at warm but looks like it was invalidated.

(kca) m 0x9637ee80
         Method Signature: {javax/crypto/EncryptedPrivateKeyInfo.<init>(Ljava/lang/String;[B)V}
                 MetaData: 0x9b46f4e8 (optLevel: warm)
               Frame Size: 44 bytes
                   Access: Public
         J9Class/J9Method: 0xb19b1000 / 0xb19b111c
               MethodInfo: 0xb4926568
                 BodyInfo: 0x9bf9f068 (Flags: Loops_Many_Iterations)
Compiled Method Start/End: 0x9637edc4 / 0x9637f2e8 (329 instructions)

@zl-wang
Contributor

zl-wang commented Nov 13, 2024

A reasonable fix: at the point where the permanent trampoline is re-created, test newPC. If the method is not compiled (meaning all compilation attempts failed), re-create the trampoline with oldPC (the code at oldPC is already patched to go back to the interpreter).
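
The suggestion can be sketched roughly as below. This is only an illustration of the idea and not the merged fix: `isCompiledPC` and `trampolineTarget` are hypothetical names, and the test for "not compiled" assumes that such a method reports a tagged value with the low bit set (like the -3 seen in the core file) in place of a code address.

```cpp
#include <cstdint>

// Hypothetical stand-in for "this value is a usable compiled entry point".
// Assumption: a method with no compiled body reports a tagged value
// (low bit set), for example -3, where a code address is expected.
inline bool isCompiledPC(uintptr_t pc) {
    return pc != 0 && (pc & 0x1) == 0;
}

// Choose the address a re-created permanent trampoline should branch to.
// If every recompilation attempt failed, newPC is not a code address, so
// fall back to oldPC, whose entry is already patched to return to the
// interpreter.
inline uintptr_t trampolineTarget(uintptr_t oldPC, uintptr_t newPC) {
    return isCompiledPC(newPC) ? newPC : oldPC;
}
```

For example, `trampolineTarget(0x9637ee80, static_cast<uintptr_t>(-3))` returns 0x9637ee80, so the trampoline keeps pointing at the old body and never dereferences the bogus value.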

@zl-wang
Contributor

zl-wang commented Nov 13, 2024

This should happen very rarely; we have never run into this situation before.

@pshipton
Member Author

http://vmfarm.rtp.raleigh.ibm.com/job_output.php?id=95916557
[Linux PPC] 80 Load_Level_2.harmony.5mins.Mode112

@pshipton
Member Author

http://vmfarm.rtp.raleigh.ibm.com/job_output.php?id=95953865
[AIX] 80 Load_Level_2.harmony.5mins.Mode112

@pshipton
Member Author

pshipton commented Nov 15, 2024

Different state vmState=0x00020012
vmState [0x20012]: not a valid vmState

http://vmfarm.rtp.raleigh.ibm.com/job_output.php?id=96052043
[AIX] 80 Load_Level_2.harmony.5mins.Mode112

j> 13:14:52 #INFO:  No threads were activated following a resume all compilation threads call
j> 13:14:52 Unhandled exception
j> 13:14:52 Type=Segmentation error vmState=0x00020012
j> 13:14:52 J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000032
j> 13:14:52 Handler1=F184CC94 Handler2=F18374D0
j> 13:14:52 R0=DD12806C R1=3768E080 R2=303FA338 R3=4D38DB80
j> 13:14:52 R4=FFFFFFFD R5=355934CC R6=303F0000 R7=00000001
j> 13:14:52 R8=00004842 R9=1532011F R10=00000000 R11=00000000
j> 13:14:52 R12=DCF3973C R13=8B0C9F1D R14=F181FC80 R15=3026B464
j> 13:14:52 R16=303FA188 R17=00000000 R18=000000E4 R19=30267F54
j> 13:14:52 R20=30267F54 R21=30267AD8 R22=30267AD4 R23=000004B0
j> 13:14:52 R24=4BCD28C0 R25=FFFFFFFD R26=4BCF3674 R27=00000130
j> 13:14:52 R28=0000004C R29=00000000 R30=302E0878 R31=4D38DB80
j> 13:14:52 IAR=DCF39698 LR=DD12806C MSR=0200D032 CTR=DCF39680
j> 13:14:52 CR=4200022B FPSCR=BE224000 XER=00000000 TID=00000000
j> 13:14:52 MQ=00000000
j> 13:14:52 FPR0=00000000be224000 (f: 3189915648.000000, d: 1.576028e-314)
j> 13:14:52 FPR1=4330080000000000 (f: 0.000000, d: 4.512396e+15)
j> 13:14:52 FPR2=41f0000000000000 (f: 0.000000, d: 4.294967e+09)
j> 13:14:52 FPR3=4330080000000000 (f: 0.000000, d: 4.512396e+15)
j> 13:14:52 FPR4=3df0000000000000 (f: 0.000000, d: 2.328306e-10)
j> 13:14:52 FPR5=4530000000000000 (f: 0.000000, d: 1.934281e+25)
j> 13:14:52 FPR6=0000000000000001 (f: 1.000000, d: 4.940656e-324)
j> 13:14:52 FPR7=4530000000000000 (f: 0.000000, d: 1.934281e+25)
j> 13:14:52 FPR8=be96f1fe76c94e23 (f: 1992904192.000000, d: -3.419115e-07)
j> 13:14:52 FPR9=bf532915b968f500 (f: 3110663424.000000, d: -1.169463e-03)
j> 13:14:52 FPR10=402e082d34307e21 (f: 875593280.000000, d: 1.501597e+01)
j> 13:14:52 FPR11=3fd55555555450ef (f: 1431589120.000000, d: 3.333333e-01)
j> 13:14:52 FPR12=3c7abc9e3b39803f (f: 993624128.000000, d: 2.319047e-17)
j> 13:14:52 FPR13=4028000000000000 (f: 0.000000, d: 1.200000e+01)
j> 13:14:52 FPR14=0000000000000100 (f: 256.000000, d: 1.264808e-321)
j> 13:14:52 FPR15=3ff000003ff00000 (f: 1072693248.000000, d: 1.000000e+00)
j> 13:14:52 FPR16=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR17=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR18=00000000000a5c5c (f: 679004.000000, d: 3.354725e-318)
j> 13:14:52 FPR19=00000000000a5c5c (f: 679004.000000, d: 3.354725e-318)
j> 13:14:52 FPR20=00000000000a445c (f: 672860.000000, d: 3.324370e-318)
j> 13:14:52 FPR21=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR22=00000000000a5c5c (f: 679004.000000, d: 3.354725e-318)
j> 13:14:52 FPR23=00000000000a5c5c (f: 679004.000000, d: 3.354725e-318)
j> 13:14:52 FPR24=00000000000a445c (f: 672860.000000, d: 3.324370e-318)
j> 13:14:52 FPR25=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR26=00000000000a445c (f: 672860.000000, d: 3.324370e-318)
j> 13:14:52 FPR27=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR28=0000000004000000 (f: 67108864.000000, d: 3.315618e-316)
j> 13:14:52 FPR29=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR30=0000000000000000 (f: 0.000000, d: 0.000000e+00)
j> 13:14:52 FPR31=7ff0000000000000 (f: 0.000000, d: INF)
j> 13:14:52 Target=2_90_20241115_81702 (AIX 7.2)
j> 13:14:52 CPU=ppc (32 logical CPUs) (0x100000000 RAM)
j> 13:14:52 ----------- Stack Backtrace -----------
j> 13:14:52 (0xDD127D24 [libj9jit29.so+0x120ed24])
j> 13:14:52 (0xDD133CE8 [libj9jit29.so+0x121ace8])
j> 13:14:52 (0xDBDA3384 [libj9hookable29.so+0x384])
j> 13:14:52 (0xDD592288 [libj9gc29.so+0x91288])
j> 13:14:52 (0xDD591020 [libj9gc29.so+0x90020])
j> 13:14:52 (0xDD5F4B14 [libj9gc29.so+0xf3b14])
j> 13:14:52 (0xDD5EAAD0 [libj9gc29.so+0xe9ad0])
j> 13:14:52 (0xDD57D0D0 [libj9gc29.so+0x7c0d0])
j> 13:14:52 (0xDD57D8D8 [libj9gc29.so+0x7c8d8])
j> 13:14:52 (0xDD536A0C [libj9gc29.so+0x35a0c])
j> 13:14:52 (0xDD5365A0 [libj9gc29.so+0x355a0])
j> 13:14:52 (0xDD6DF49C [libj9gc29.so+0x1de49c])
j> 13:14:52 (0xDD6DEF68 [libj9gc29.so+0x1ddf68])
j> 13:14:52 (0xDD57D83C [libj9gc29.so+0x7c83c])
j> 13:14:52 (0xDD6BC320 [libj9gc29.so+0x1bb320])
j> 13:14:52 (0xDD698BAC [libj9gc29.so+0x197bac])
j> 13:14:52 (0xDD6A1D0C [libj9gc29.so+0x1a0d0c])
j> 13:14:52 (0xDD6A25C4 [libj9gc29.so+0x1a15c4])
j> 13:14:52 (0xDD6A0434 [libj9gc29.so+0x19f434])
j> 13:14:52 (0xDD6A0518 [libj9gc29.so+0x19f518])
j> 13:14:52 (0xDD584624 [libj9gc29.so+0x83624])
j> 13:14:52 (0xDD5836C0 [libj9gc29.so+0x826c0])
j> 13:14:52 (0xDBD1CDD8 [libj9vm29.so+0x205dd8])
j> 13:14:52 (0xDBBD26E4 [libj9vm29.so+0xbb6e4])
j> 13:14:52 (0xDBB4FC50 [libj9vm29.so+0x38c50])
j> 13:14:52 (0xDBB38AE4 [libj9vm29.so+0x21ae4])
j> 13:14:52 (0xDBE01434 [libj9prt29.so+0x56434])
j> 13:14:52 (0xDBB38920 [libj9vm29.so+0x21920])
j> 13:14:52 (0xDBD8B5CC [libj9thr29.so+0x45cc])
j> 13:14:52 _pthread_body+0xe4 (0xD0579FC8 [libpthreads.a+0x3fc8])
j> 13:14:52 ---------------------------------------

@dmitripivkine
Contributor

Just curious why vmState=0x00020012 is not recognized as a valid vmState. It is clearly defined in the code:

#define OMRVMSTATE_GC_COLLECTOR_CONCURRENTGC (J9VMSTATE_GC | 0x0012)

@pshipton
Member Author

The JIT doesn't print GC states. I shouldn't have tried to look it up like I do for JIT states.
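
For reference, the three vmState values reported in this issue can be split into a component part and a state part. The sketch below assumes the upper 16 bits identify the component and the lower 16 bits the state within it; the GC component value 0x00020000 follows from the define quoted above, since `(J9VMSTATE_GC | 0x0012)` equals 0x00020012. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Implied by (J9VMSTATE_GC | 0x0012) == 0x00020012.
constexpr uint32_t kVmStateGC = 0x00020000;

// Assumed layout: upper 16 bits = component, lower 16 bits = state.
inline uint32_t vmStateMajor(uint32_t state) { return state & 0xFFFF0000u; }
inline uint32_t vmStateMinor(uint32_t state) { return state & 0x0000FFFFu; }

// True when the vmState belongs to the GC component.
inline bool isGCState(uint32_t state) {
    return vmStateMajor(state) == kVmStateGC;
}
```

Under that assumption, 0x00020011, 0x00020012 and 0x00020015 are all GC states that differ only in the lower half, while the 0x0005ffff value from #20546 belongs to a different component.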

@pshipton
Member Author

AIX core files can be found under http://vmfarm.rtp.raleigh.ibm.com/etc/cores/tmp/

@IBMJimmyk
Contributor

I created a PR for a fix here:
eclipse-omr/omr#7550

I was able to create a small test case to reproduce the failure. The verbose log output looks like this:

+ (warm) RedirectTrampoline.callB(I)I @ 00007DB37A8700F8-00007DB37A8701C4 OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=13 j9m=00007DB398890FA8 bcsz=40 sync compThreadID=0 CpuLoad=5%(0%avg) JvmCpu=0%
+ (warm) RedirectTrampoline.callA(I)I @ 00007DB37A870278-00007DB37A8702F0 OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=7 j9m=00007DB398890F88 bcsz=13 sync compThreadID=0 CpuLoad=5%(0%avg) JvmCpu=0%
+ (profiled very-hot) RedirectTrampoline.callB(I)I @ 00007DB37A870378-00007DB37A870668 OrdinaryMethod 3000.00% T Q_SZ=0 Q_SZI=0 QW=100 j9m=00007DB398890FA8 bcsz=40 sync JPROF compThreadID=0 CpuLoad=5%(0%avg) JvmCpu=0%
! (scorching) RedirectTrampoline.callB(I)I Q_SZ=0 Q_SZI=0 QW=100 j9m=00007DB398890FA8 time=755us <TRANSLATION FAILURE: Compilation Exception> compThreadID=0
! (warm) RedirectTrampoline.callB(I)I Q_SZ=0 Q_SZI=0 QW=12 j9m=00007DB398890FA8 time=584us <TRANSLATION FAILURE: Compilation Exception> compThreadID=0

callB is compiled.
callA is compiled. stressTrampolines is set so it uses a trampoline to call callB.
callB is re-compiled at profiled very-hot.
A re-compile of callB at scorching is attempted but intentionally fails.
A re-compile of callB at warm is attempted but intentionally fails, and no further attempts will be made.

After that, I force a global GC, which triggers syncTempTrampolines. Without my fix a crash occurs; with my fix, everything is okay.

Crash example:

----------- Stack Backtrace -----------
_Z25ppcCreateMethodTrampolinePvS_S_+0x30 (0x00007DFEC277FA40 [libj9jit29.so+0x5efa40])
_ZN3OMR9CodeCache16createTrampolineEPvS1_P20TR_OpaqueMethodBlock+0x34 (0x00007DFEC2D32764 [libj9jit29.so+0xba2764])
_ZN3OMR9CodeCache19syncTempTrampolinesEv+0x220 (0x00007DFEC2D32B20 [libj9jit29.so+0xba2b20])
_ZN3OMR16CodeCacheManager22synchronizeTrampolinesEv+0xb0 (0x00007DFEC2D35C20 [libj9jit29.so+0xba5c20])
jitHookGlobalGCEnd+0x78 (0x00007DFEC23258C8 [libj9jit29.so+0x1958c8])
J9HookDispatch+0x1a0 (0x00007DFEC3691920 [libj9hookable29.so+0x1920])
_ZN19MM_ParallelGlobalGC11reportGCEndEP18MM_EnvironmentBase+0x250 (0x00007DFEC1FF6530 [libj9gc_full29.so+0x1b6530])
_ZN19MM_ParallelGlobalGC19internalPostCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpace+0x90 (0x00007DFEC1FF7E30 [libj9gc_full29.so+0x1b7e30])
_ZN12MM_Collector11postCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpace+0x80 (0x00007DFEC1F9E480 [libj9gc_full29.so+0x15e480])
_ZN12MM_Collector14garbageCollectEP18MM_EnvironmentBaseP17MM_MemorySubSpaceP22MM_AllocateDescriptionjP28MM_ObjectAllocationInterfaceS3_P20MM_AllocationContext+0x1ec (0x00007DFEC1F9E83C [libj9gc_full29.so+0x15e83c])
_ZN21MM_MemorySubSpaceFlat23allocationRequestFailedEP18MM_EnvironmentBaseP22MM_AllocateDescriptionN17MM_MemorySubSpace14AllocationTypeEP28MM_ObjectAllocationInterfacePS4_S8_+0x32c (0x00007DFEC208D41C [libj9gc_full29.so+0x24d41c])
_ZN24MM_MemorySubSpaceGeneric11allocateTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP28MM_ObjectAllocationInterfaceP17MM_MemorySubSpaceS7_b+0x444 (0x00007DFEC208F704 [libj9gc_full29.so+0x24f704])
_ZN23MM_TLHAllocationSupport7refreshEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x590 (0x00007DFEC1FDD1A0 [libj9gc_full29.so+0x19d1a0])
_ZN23MM_TLHAllocationSupport15allocateFromTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x124 (0x00007DFEC1FDD414 [libj9gc_full29.so+0x19d414])
_ZN25MM_TLHAllocationInterface15allocateFromTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x28 (0x00007DFEC1FDB3F8 [libj9gc_full29.so+0x19b3f8])
_ZN25MM_TLHAllocationInterface14allocateObjectEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP14MM_MemorySpaceb+0x1e8 (0x00007DFEC1FDB628 [libj9gc_full29.so+0x19b628])
_Z21OMR_GC_AllocateObjectP12OMR_VMThreadP25MM_AllocateInitialization+0x100 (0x00007DFEC1FE4500 [libj9gc_full29.so+0x1a4500])
J9AllocateObject+0x470 (0x00007DFEC1EA4FB0 [libj9gc_full29.so+0x64fb0])
bytecodeLoopFull+0x13f30 (0x00007DFEC38B7A70 [libj9vm29.so+0x107a70])
 (0x00007DFEC393B440 [libj9vm29.so+0x18b440])
runCallInMethod+0x27c (0x00007DFEC37CE71C [libj9vm29.so+0x1e71c])
gpProtectedRunCallInMethod+0x50 (0x00007DFEC37FC540 [libj9vm29.so+0x4c540])
signalProtectAndRunGlue+0x28 (0x00007DFEC3950668 [libj9vm29.so+0x1a0668])
omrsig_protect+0x358 (0x00007DFEC373D488 [libj9prt29.so+0x3d488])
gpProtectAndRun+0xac (0x00007DFEC395073C [libj9vm29.so+0x1a073c])
gpCheckCallin+0xc4 (0x00007DFEC37FF1C4 [libj9vm29.so+0x4f1c4])
callStaticVoidMethod+0x48 (0x00007DFEC37FBB48 [libj9vm29.so+0x4bb48])
JavaMain+0x14ec (0x00007DFEC427D9DC [libjli.so+0xd9dc])
ThreadJavaMain+0x18 (0x00007DFEC4282BD8 [libjli.so+0x12bd8])
start_thread+0xe8 (0x00007DFEC4228838 [libpthread.so.0+0x8838])
clone+0x74 (0x00007DFEC40FBA44 [libc.so.6+0x14ba44])

@IBMJimmyk
Contributor

Fixes for the syncTempTrampolines version of this problem have been merged in:
#20657
eclipse-omr/omr#7550

This is not expected to fix #20546 but I am moving on to looking at that one next since the problem looks similar.

@IBMJimmyk
Contributor

0.49.0 version of the above PRs:
#20720
eclipse-openj9/openj9-omr#217


Issue Number: 20567
Status: Closed
Actual Components: comp:jit, test failure, blocker, segfault
Actual Assignees: No one :(
PR Assignees: IBMJimmyk
