Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: enable JIT on Linux #1591

Merged
merged 6 commits into from
Oct 3, 2016
Merged

JIT: enable JIT on Linux #1591

merged 6 commits into from
Oct 3, 2016

Conversation

jianchun
Copy link

@jianchun jianchun commented Sep 16, 2016

Splitted into 4 commits.

JIT: enable JIT on Linux

Primary changes to JIT on Linux.

Custom calling convention
-------------------------

Calls to native helpers conform to Sys V AMD64 ABI.
    - Change all code to use correct arg registers.

Calls to JsMethod and native helpers that depend on
ARGUMENTS/RUNTIME_ARGUMENTS use custom calling convention:
    - Caller home arg registers onto stack.

Implemented all needed assembly thunks.

Unwind data
-----------

Emit .eh_frame data for dynamic interpreter thunks and all jit functions.
    - For less noise, the new code hides behind existing
        PrologEncoder/PData/XData.

Misc
----

ARGUMENTS/RUNTIME_ARGUMENTS:
    - Extended to support all runtime usages.
    - Limitation: known arg list must end with "callInfo".

Int32Math:
    - Use builtins to test signed int overflow. Original code does not work
        due to clang optimization.

TODO
    - AsmJs: missing ArrayBuffer out of bound access recovery.
    - Simd.

JIT: to compile on Linux

Noisy changes to compile native code gen code on Linux with clang.
Most changes are:
    - Fix missing parentheses around `&&` within `||` expressions
    - Fix `char16/_u` issues
    - Fix incorrect `inline`s

JIT: PAL related changes

PAL related changes to build JIT on Linux.
    - pal.h: bug fixes in BitScan.
    - Add missing APIs, enable Event.

JIT: build and test changes

Build:
    - Add backend files.
    - Enable --no-jit option, and as default on OSX (temporarily).

Test:
    - Add `dynapogo` test variant. Change `interpreted` test variant to
        create dynamic profiles and input to `dynapogo` test variant.
        To ensure the same profile file is used for these 2 variants,
        TestLoader (new) assigns each test case a unique test id, and uses
        the id to generate dynamic profile file name.
    - Add 'disable_jit` test variant and as default on OSX (temporarily).
    - Exclude 2 failing tests that need further work.

@jianchun
Copy link
Author

jianchun commented Sep 16, 2016

@LouisLaf @curtisman @digitalinfinity @obastemur Please CR, thanks!

Update: Please hold on CR. OOP JIT is merged and has some significant conflict.

Update: Completed merging with OOP JIT. Please CR.

@jianchun jianchun changed the title jit: enable jit on Linux JIT: enable jit on Linux Sep 20, 2016
@jianchun jianchun changed the title JIT: enable jit on Linux JIT: enable JIT on Linux Sep 20, 2016
@@ -184,6 +186,10 @@ while [[ $# -gt 0 ]]; do
ICU_PATH="-DNO_ICU_PATH_GIVEN_SH=1"
;;

--no-jit)
NO_JIT="-DNO_JIT=1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make it like -DNO_JIT_SH=1 and under the CMake file,

if(NO_JIT_SH)
    unset(NO_JIT_SH Cache)
    set(NO_JIT 1)
else()
    unset(NO_JIT Cache)
endif()

This is needed to switch between interpreter / jit builds without dealing with folder cleanup.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try this, thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, thanks! I'm using NO_JIT in place of NO_JIT_SH as NO_JIT is not needed afterwards.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:_SH suffix is useful for remembering the source of the variable though. The rest of the definitions from build.sh use the same suffix.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obastemur Agreed, changed 3 variables (all added by me) to _sh suffix in latest update.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianchun Thanks

}

// update length record
uword length = writer->Count() - beginOffset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert(sizeof(length) + beginOffset <= writer->Count()) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this assert to be very useful. Once Begin() was called this would hardly fail as the same uword type is written again at beginOffset. Maybe I can assert beginOffset != -1 to ensure Begin() was called?



#ifndef __APPLE__
// BailOutRecord::BailOut(BailOutRecord const * bailOutRecord)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally left?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 symbols are used as JMP targets in this file. Don't know why it works with these commented out. Not sure if OSX needs them. Can cleanup when enabling this for OSX.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that part is ifndef __APPLE__ I don't think it has anything to do with OSX

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, originally I thought I needed them on Linux that's why they are under ifndef __APPLE__. I can remove ifndef and only keep the commented .externs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed ifndef __APPLE__.

static const RegNum s_argRegs[IntArgRegsCount] = {
#define REG_INT_ARG(Index, Name) Reg ## Name,
#include "RegList.h"
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undef REG_INT_ARG ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this file is included many times and the file has default #defines, I added #undefs in the file itself.

reg = Reg ## Name; \
break;
#include "RegList.h"
default:
Copy link
Collaborator

@obastemur obastemur Sep 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undef REG_INT_ARG?

*pResult = val + 1;
return val == INT32_MAX; // Overflow if val was int max
#else
return __builtin_add_overflow(val, 1, pResult);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we haven't use this for __APPLE__ too ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppleClang does not have this builtin (reported by CI).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually AppleClang has that. Don't know why CI complained.

Instead of making it ifdef __APPLE__ we better feature detect from clang.

#ifdef __has_builtin
#if __has_builtin(__builtin_add_overflow)

Same applies to other builtin comments below

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do that. Does this work on gcc?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on gcc?

Hopefully for GCC 6.0. We don't support GCC though. There are so many problems with GCC.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obastemur what are those problems?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DemiMarie not a small list. i.e. Clang has some useful flags for C/C++/ASM code that is initially developed for VC/VC++

*pResult = val - 1;
return val == INT32_MIN; // Overflow if val was int min
#else
return __builtin_sub_overflow(val, 1, pResult);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we haven't use this for APPLE too ?

#define VA_LIST_TO_VARARRAY(vl, va, callInfo) Js::Var* va = (Js::Var*) vl;
#define DECLARE_ARGS_VARARRAY(va, ...) \
va_list _vl; \
va_start(_vl, callInfo); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We better call va_end once we are done with it. Previously, I had cleaned up the va leaks. See #1507

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Windows x64 only and the macro does not know when we are done with it (we cast _vl to Var* and using it).


//extrn ?GetStackSizeForAsmJsUnboxing@Js@@YAHPEAVScriptFunction@1@@Z: PROC
//extrn ?UnboxAsmJsArguments@Js@@YAPEAXPEAVScriptFunction@1@PEAPEAXPEADUCallInfo@1@@Z : PROC
// extrn ?BoxAsmJsReturnValue@Js@@YAPEAXPEAVScriptFunction@1@HNM@Z : PROC
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space

#define swprintf_s _snwprintf
#define vsprintf_s _vsnprintf
#define vswprintf_s _vsnwprintf
// #define sprintf_s _snprintf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might better remove them?

@jianchun jianchun force-pushed the jit_pr branch 5 times, most recently from 2b8be73 to d326567 Compare September 24, 2016 03:27
@jianchun
Copy link
Author

@rajatd @obastemur @curtisman Updated, please review, thanks!

@@ -3141,7 +3141,7 @@ Opnd::DumpOpndKindMemRef(bool AsmDumpMode, Func *func)
void
Opnd::WriteToBuffer(_Outptr_result_buffer_(*count) char16 **buffer, size_t *count, const char16 *fmt, ...)
{
va_list argptr = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove the initialization?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

va_list is unnecessarily a pointer type (don't recall exactly, likely can't be initialized to nullptr). It is initialized by va_start.

@@ -183,6 +183,12 @@
#endif
#endif

#if ENABLE_NATIVE_CODEGEN
#ifdef _WIN32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|| with _WIN64?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_WIN32 contains _WIN64 (https://msdn.microsoft.com/en-us/library/b0084kay.aspx):

_WIN32 Defined as 1 when the compilation target is 32-bit ARM, 64-bit ARM, x86, or x64. Otherwise, undefined.

_WIN64 Defined as 1 when the compilation target is 64-bit ARM or x64. Otherwise, undefined. 

};

const int callArgCount = static_cast<int>(argCount) + this->helperCallArgsCount;
const int argRegs = min(callArgCount, static_cast<int>(IntArgRegsCount));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min(callArgCount, static_cast(IntArgRegsCount)); [](start = 28, length = 53)

XmmArgRegsCount is 8. If there were, in fact, 8 float arguments, wouldn't we be missing to home 2 of them in the below loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually only homes the int registers, corresponding to our runtime usage (mostly function, callInfo, Js::Var... which don't use float registers). I'm not sure of AsmJs/Simd requirements. This PR has some basic thunk assembly implementations targeting AsmJs unit tests. Simd not looked at yet.

@rajatd
Copy link
Contributor

rajatd commented Sep 28, 2016

#ifdef _WIN64

change this to TARGET_64?


Refers to: lib/Backend/Encoder.cpp:689 in ded0397. [](commit_id = ded0397, deletion_comment = False)

@rajatd
Copy link
Contributor

rajatd commented Sep 28, 2016

#if _WIN64

change this to TARGET_64?


Refers to: lib/Backend/BackwardPass.cpp:163 in ded0397. [](commit_id = ded0397, deletion_comment = False)

@jianchun
Copy link
Author

change this to TARGET_64?

Sure. Maybe I can search for all _WIN64 usage -- likely they should all be TARGET_64?

#include "RegList.h"
};

const int callArgCount = static_cast<int>(argCount) + this->helperCallArgsCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helperCallArgsCount [](start = 68, length = 19)

Can helperCallArgsCount be non-zero here? If yes, then the loop below could potentially overwrite some of the stack slots written to in the above while(argsLeft>0) loop. Please verify this and add asserts to make sure that only one of these loops run. Some comments about which loop is used for what kind of call would be nice too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will test that. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatd I looked at this issue. Yes helperCallArgsCount can be non-zero. The reason is that many lowering code uses LowerCallArgs() to lower variadic JS runtime args, and LoadHelperArgument() to lower other "known args". So the full args length is actually their sum, while argCount > 0 is an indicator that we need variadic args support so we need to manually home args.

Your concern of manual homing args here overwriting above helperCallArgs stack slots could potentially happen... e.g. helperCallArgs == 7 so it has an arg at stack[0]. I can fix this by changing above /*isHelper*/true to argCount == 0 so that their assumptions are consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using "argCount == 0" as the argument to GetArgSlotOpnd should ensure correctness.


In reply to: 81197219 [](ancestors = 81197219)

@jianchun
Copy link
Author

Maybe I can search for all _WIN64 usage -- likely they should all be TARGET_64?

To avoid more noise in this PR, I'll follow up afterwards. We already have #963.


inline Js::Var* _get_va(void* addrOfReturnAddress, int n)
{
Js::Var* pArgs = reinterpret_cast<Js::Var*>(addrOfReturnAddress) + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 [](start = 71, length = 1)

A comment about what this '1' is for, would be good. (saved frame pointer, right?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, RA (ReturnAddress). Args are after RA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, my bad.


In reply to: 81249310 [](ancestors = 81249310)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a comment, thanks!


inline int _count_args(Js::CallInfo callInfo)
{
return 2; // for typical JsMethod with 2 known args "function, callInfo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to somehow enforce/assert here that the other argument is a JavascriptFunction?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to support the majority usages of ARGUMENTS(args, callInfo). I can't think of a good way to check that. If it is used incorrectly runtime likely to fail. Hope tests can guard that.

@rajatd
Copy link
Contributor

rajatd commented Sep 30, 2016

Reviewed most of the backend changes and left relevant comments. Haven't reviewed PDataManager, PrologEncoder, EhFrame and XDataAllocator as I dont know much about that area. Maybe @curtisman @pleath or @LouisLaf can help with that.

@curtisman
Copy link
Contributor

:shipit:

@jianchun jianchun force-pushed the jit_pr branch 2 times, most recently from 03de797 to d126342 Compare October 3, 2016 19:29
Jianchun Xu added 6 commits October 3, 2016 12:35
Primary changes to JIT on Linux.

Custom calling convention
-------------------------

Calls to native helpers conform to Sys V AMD64 ABI.
    - Change all code to use correct arg registers.

Calls to JsMethod and native helpers that depend on
ARGUMENTS/RUNTIME_ARGUMENTS use custom calling convention:
    - Caller home arg registers onto stack.

Implemented all needed assembly thunks.

Unwind data
-----------

Emit .eh_frame data for dynamic interpreter thunks and all jit functions.
    - For less noise, the new code hides behind existing
        PrologEncoder/PData/XData.

Misc
----

ARGUMENTS/RUNTIME_ARGUMENTS:
    - Extended to support all runtime usages.
    - Limitation: known arg list must end with "callInfo".

Int32Math:
    - Use builtins to test signed int overflow. Original code does not work
        due to clang optimization.

TODO
    - AsmJs: missing ArrayBuffer out of bound access recovery.
    - Simd.
Noisy changes to compile native code gen code on Linux with clang.
Most changes are:
    - Fix missing parentheses around `&&` within `||` expressions
    - Fix `char16/_u` issues
    - Fix incorrect `inline`s
PAL related changes to build JIT on Linux.
    - pal.h: bug fixes in BitScan.
    - Add missing APIs, enable Event.
Build:
    - Add backend files.
    - Enable --no-jit option, and as default on OSX (temporarily).

Test:
    - Add `dynapogo` test variant. Change `interpreted` test variant to
        create dynamic profiles and input to `dynapogo` test variant.
        To ensure the same profile file is used for these 2 variants,
        TestLoader (new) assigns each test case a unique test id, and uses
        the id to generate dynamic profile file name.
    - Add 'disable_jit` test variant and as default on OSX (temporarily).
    - Exclude 2 failing tests that need further work.
Address CR comments.

Refactored IntMath classes to share common code and use builtins when
available.

Cleaned up signed integer overflow cases. Added build.sh --sanity=...
support to help future diagnostics. Turned off CLANG signed integer
overflow optimization with `-fwrapv` switch.

Fixed a ICU call bug (found in running Octane/typescript.js).

Implemented another AsmJs thunk. (called by Octane/zlib.js).

Synced to latest and fix merge errors.
also merge to latest.
@chakrabot chakrabot merged commit 68e819f into chakra-core:master Oct 3, 2016
chakrabot pushed a commit that referenced this pull request Oct 3, 2016
Merge pull request #1591 from jianchun:jit_pr

Splitted into 4 commits.

JIT: enable JIT on Linux

    Primary changes to JIT on Linux.

    Custom calling convention
    -------------------------

    Calls to native helpers conform to Sys V AMD64 ABI.
        - Change all code to use correct arg registers.

    Calls to JsMethod and native helpers that depend on
    ARGUMENTS/RUNTIME_ARGUMENTS use custom calling convention:
        - Caller home arg registers onto stack.

    Implemented all needed assembly thunks.

    Unwind data
    -----------

    Emit .eh_frame data for dynamic interpreter thunks and all jit functions.
        - For less noise, the new code hides behind existing
            PrologEncoder/PData/XData.

    Misc
    ----

    ARGUMENTS/RUNTIME_ARGUMENTS:
        - Extended to support all runtime usages.
        - Limitation: known arg list must end with "callInfo".

    Int32Math:
        - Use builtins to test signed int overflow. Original code does not work
            due to clang optimization.

    TODO
        - AsmJs: missing ArrayBuffer out of bound access recovery.
        - Simd.

JIT: to compile on Linux

    Noisy changes to compile native code gen code on Linux with clang.
    Most changes are:
        - Fix missing parentheses around `&&` within `||` expressions
        - Fix `char16/_u` issues
        - Fix incorrect `inline`s

JIT: PAL related changes

    PAL related changes to build JIT on Linux.
        - pal.h: bug fixes in BitScan.
        - Add missing APIs, enable Event.

JIT: build and test changes

    Build:
        - Add backend files.
        - Enable --no-jit option, and as default on OSX (temporarily).

    Test:
        - Add `dynapogo` test variant. Change `interpreted` test variant to
            create dynamic profiles and input to `dynapogo` test variant.
            To ensure the same profile file is used for these 2 variants,
            TestLoader (new) assigns each test case a unique test id, and uses
            the id to generate dynamic profile file name.
        - Add 'disable_jit` test variant and as default on OSX (temporarily).
        - Exclude 2 failing tests that need further work.
@jianchun
Copy link
Author

jianchun commented Oct 3, 2016

Many thanks to @rajatd @obastemur @curtisman for reviewing this huge PR!
@pleath Please let me know if you have comments, I'll address later.

@jianchun jianchun deleted the jit_pr branch October 3, 2016 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants