cwe119_cgd.txt and cwe399_cgd.txt with right format #8
Comments
Can you explain the details of the problem?
For example, the second line of the file cwe119_cgd.txt:
Thanks. We use the commercial product Checkmarx to extract the program slices, which are then assembled into code gadgets. The problem you mentioned in some code gadgets is caused by the imperfect output of Checkmarx. For example, "ZIP_FILENAME_LEN, NULL, 0, NULL, 0 )" comes from the following code, of which Checkmarx can extract only line 440, but not lines 439 and 441.
439 if( unzGetCurrentFileInfo( file, p_fileInfo, psz_fileName,
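One rough way to spot gadget lines that were cut off like this is to count unmatched parentheses. The sketch below is only an illustration: the helper name `unmatched_parens` is hypothetical, it is not part of the VulDeePecker tooling, and it ignores parentheses inside string literals and comments.

```python
from typing import Tuple


def unmatched_parens(fragment: str) -> Tuple[int, int]:
    """Return (unmatched '(' count, unmatched ')' count) for one gadget line.

    Hypothetical helper: a naive character scan that does not understand
    C string literals or comments.
    """
    open_count = 0       # '(' seen so far that have not been closed
    unmatched_close = 0  # ')' seen with no '(' before them
    for ch in fragment:
        if ch == "(":
            open_count += 1
        elif ch == ")":
            if open_count:
                open_count -= 1
            else:
                unmatched_close += 1
    return open_count, unmatched_close


# The fragment quoted in this thread closes a call whose opening line is missing.
fragment = "ZIP_FILENAME_LEN, NULL, 0, NULL, 0 )"
opens, closes = unmatched_parens(fragment)
print(opens, closes)
```

A non-zero count on either side suggests that the slice is missing a neighbouring source line, as with lines 439 and 441 in the example above.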
@hungryfoolou each token is converted to a vector, which means that for each code gadget you get a vector of vectors. The size of each token-vector is fixed (these come from the word2vec tool), while the number of token-vectors in each gadget naturally depends on how many tokens the gadget contains. These vectors are padded or truncated as described in the paper.
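A minimal sketch of that padding/truncation step, under stated assumptions: the function name `pad_or_truncate`, the zero-vector padding at the end, the truncation from the end, and the sizes used are all illustrative choices, not values confirmed by the paper or the authors.

```python
from typing import List

Vector = List[float]


def pad_or_truncate(token_vectors: List[Vector], max_tokens: int, dim: int) -> List[Vector]:
    """Force a gadget to exactly `max_tokens` token-vectors of length `dim`.

    Assumption: extra vectors are dropped from the end and missing ones
    are filled with zero vectors at the end.
    """
    fixed = [list(v) for v in token_vectors[:max_tokens]]  # truncate if too long
    missing = max_tokens - len(fixed)
    fixed.extend([0.0] * dim for _ in range(missing))      # pad if too short
    return fixed


# A gadget with two tokens, each embedded as a 3-dimensional vector (toy values).
gadget = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(pad_or_truncate(gadget, max_tokens=4, dim=3))
print(pad_or_truncate(gadget, max_tokens=1, dim=3))
```

After this step every gadget has the same shape (`max_tokens` by `dim`), which is what a fixed-size network input needs.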
@dengelt I highly appreciate your help, thanks.
What is the length of each token-vector? According to the paper, a token is mapped to an integer that is then converted to a fixed-length vector. Does this mean that each token-vector is represented by its index in the token list?
I don't think this is specified in the paper, or at least I couldn't find it. It would be interesting if @VulDeePecker could tell us which value they used.
If you have a list of token-vectors and you know the index of each token, then you can convert each token to its vector representation by looking up the vector stored at that index.
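The index-based lookup can be sketched as follows. This is one possible reading, not the authors' confirmed method: `build_vocab` and `tokens_to_vectors` are hypothetical names, and the table holds placeholder numbers where real word2vec vectors would go.

```python
from typing import Dict, List

Vector = List[float]


def build_vocab(gadgets: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in gadgets:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def tokens_to_vectors(tokens: List[str], vocab: Dict[str, int], table: List[Vector]) -> List[Vector]:
    """Map token -> index -> row of the vector table."""
    return [table[vocab[tok]] for tok in tokens]


gadgets = [["if", "(", "x", ")"], ["x", "=", "0"]]
vocab = build_vocab(gadgets)

# Placeholder table: row i stands in for the learned vector of token i.
table = [[float(i), float(i) + 0.5] for i in range(len(vocab))]

vectors = tokens_to_vectors(gadgets[1], vocab, table)
print(vocab["x"], vectors[0])
```

In this reading the integer is only a row number. The fixed-length vector still has to come from a trained embedding, so the index does not replace word2vec.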
I totally agree with you. It would be interesting if @VulDeePecker could tell us which value they used. If only the index were used, there would be no need for word2vec. Still, I guess it is close to the answer, because I cannot see another way to map a token to an integer as mentioned in 3-E2 STEP3.2 (encoding the symbolic representations into vectors).
Due to the imperfect output of Checkmarx, the extracted code gadgets are in the wrong format. @VulDeePecker, may I ask whether you used these wrongly formatted code gadgets (as provided in cwe119_cgd.txt) to generate the final results? If so, can the final results be trusted?
I found some mistakes in the files cwe119_cgd.txt and cwe399_cgd.txt. Could you provide cwe119_cgd.txt and cwe399_cgd.txt in the right format?