-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Callback on include #727
Callback on include #727
Conversation
…e callback's result to import rules files.
Any comments @plusvic ? Anything I should improve to have it merged? |
libyara/compiler.c
Outdated
void* user_data) | ||
{ | ||
compiler->include_callback = include_callback; | ||
compiler->user_data = user_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiler->user_data
is being used by the error callback set with yr_compiler_set_callback
, you shouldn't modify its value because both callbacks could be used at the same time.
docs/capi.rst
Outdated
*``include_name``: name of the requested file. | ||
*``calling_rule_filename``: the requesting file name (NULL if not a file). | ||
*``calling_rule_namespace``: namespace (NULL if undefined). | ||
And should return the requested file as a string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no mention to who's responsible for freeing the memory pointed to by the callback's result. I assume that the external code is responsible for doing that, but it's probably a good idea to clearly state it in the documentation.
libyara/lexer.l
Outdated
} | ||
// Workaround for flex issue: https://github.com/westes/flex/issues/58 | ||
yypush_buffer_state(YY_CURRENT_BUFFER, yyscanner); | ||
yy_scan_string(res, yyscanner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you are passing the pointer received from the callback to yy_scan_string
, this is fine as long as it points to memory allocated in the heap (if the callback returns a pointer to it's own stack the memory will be invalid as soon as the callback returns), but this introduces again the question of who's the responsible for freeing the memory and when it's safe to do it. Those question should be addressed in the documentation.
I'm not sure if the right approach here is making the external code responsible for freeing the memory, forcing the callback to allocate the returned buffer with a specific function provide by YARA (it could be simply yr_malloc
), and letting YARA free that buffer when the parsing is done seems like a safer alternative.
With the current design is trickier to get everything right. As an example, I think you have a memory leak in the yara-python code that uses this feature (more details in VirusTotal/yara-python#58)
libyara/lexer.l
Outdated
snprintf(buffer, sizeof(buffer), | ||
"can't open include file: %s", yyextra->lex_buf); | ||
yyerror(yyscanner, compiler, buffer); | ||
const char* res = compiler->include_callback(yyextra->lex_buf, current_file_name, compiler->current_namespace->name, compiler->user_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep line length under 80 characters.
* Limited lines to 80 chars * include callback is now using its own user_data and does not conflict with other callbacks * Copying include callback result to the stack before pushing it into the flex scanner * yara's memory management functions are now exposed through the API * Making the lexer reponsible for freeing the memory allocated by the include callback * Updating docs accordingly
Thanks for the feedback @plusvic , I believe I fixed everything accordingly. Could you please tell me if you see any other issues? If everything is OK, I will proceed with the corresponding fixes in yara-python. |
libyara/lexer.c
Outdated
if (res != NULL) | ||
{ | ||
int error_code = _yr_compiler_push_file_name(compiler, yyextra->lex_buf); | ||
const char* res_dup = yr_strdup(res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to duplicate the string, now it's safe to pass the pointer returned by the callback to yara_yy_scan_string
and free it after that function returns.
docs/capi.rst
Outdated
@@ -695,6 +699,31 @@ Functions | |||
Enables the specified rule. After being disabled with :c:func:`yr_rule_disable` | |||
a rule can be enabled again by using this function. | |||
|
|||
.. c:function:: void* yr_calloc(size_t count, size_t size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of exposing all these functions as part of the API I would do something simpler but even more flexible. Add an argument to yr_compiler_set_include_callback
which is a pointer to the free
function that YARA should use to deallocate the memory returned by the callback. This way the program can use whatever allocator it wants, as long as the program provides the appropriate free
function. The program could even use a statically allocated buffer and pass YARA a NULL instead of a pointer to the free
function, indicating the buffer don't need to be de-allocated at all. For example:
typedef void (*YR_COMPILER_INCLUDE_FREE_FUNC) (void*)
YR_API void yr_compiler_set_include_callback(
YR_COMPILER* compiler,
YR_COMPILER_INCLUDE_CALLBACK_FUNC include_callback,
YR_COMPILER_INCLUDE_FREE_FUNC include_free,
void* user_data)
libyara/lexer.l
Outdated
#else | ||
if (s != NULL) | ||
#endif | ||
if (compiler->include_callback == NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elegant way of implementing this is handling all includes with the callback, and providing a default callback that implements the standard includes. That way you don't need this if
statement here. For the usual includes you can implement a default callback that YARA uses unless some other callback is specified by the user.
* adding a callback to allow external code to free the memory used to return the include callback's result * API: un-exposing yara's memory management functions * making include callbacks the only way ton include files * adding a default include callback to open files from the disk (mimicking original behavior) * updating docs accordingly
@plusvic I followed your suggestion but I changed the free function signature a little bit: |
The patch is quite neat and nifty for https://github.com/CIRCL/yara-validator. Will you plan to merge it soon? Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's getting better. Just two memory leaks to be fixed.
{ | ||
if(file_length != fread(file_buffer, 1, file_length, fh)) | ||
{ | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a memory leak here. If the file can't be fully read for some reason file_buffer
is never freed. Add a yr_free(file_buffer)
before the return
statement. A call to fclose
is also required to prevent leaking file handles.
base_err_msg = "callback failed to provide include resource: %s"; | ||
} | ||
size_t err_buff_len = sizeof(base_err_msg) + sizeof(yyextra->lex_buf); | ||
char* error_buffer = (char*) yr_malloc(err_buff_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a memory leak here, error_buffer
is never freed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof(base_err_msg)
returns the pointer's size, not the buffer's size.
Already merged in c0fa0bf |
Adds the possibility to fetch 'include' rules files from external data sources (eg: database or remote server) by allowing developers to set a custom callback for the 'include' directive.
This update allows developers to create interfaces between yara and other services without requiring write access on the disk to store rules source files.
If the callback function is not set, yara behaves as in previous versions and tries to get the requested file from the disk. If it is set, yara will not try to read the file from disk.
Setting a callback is done via:
void yr_compiler_set_include_callback(YR_COMPILER* compiler, YR_COMPILER_INCLUDE_CALLBACK_FUNC callback, void* user_data)
The callback itself must have the following signature:
Parameters
include_name
: name of the requested file as stated by the 'import' directive.calling_rule_filename
: the requesting file name (NULL if not a file, ie: compiling from string).calling_rule_namespace
: namespace (NULL if undefined).user_data
: additional user data.Returns: