Add Python bindings #2

Open
4 tasks
joachimmetz opened this issue Oct 2, 2014 · 14 comments

@joachimmetz
Member

joachimmetz commented Oct 2, 2014

Complete work on Python bindings

  • complete multi value
  • add name to id map entry object
  • add message store object
  • add tests
@kevthehermit

I appreciate this is still WIP.
I'm writing a native PST parser for a DFIR platform. I have most of the library working the way I need it to.
See Gist here - https://gist.github.com/kevthehermit/40ff03e3adc524ece04f735e68be43c2

The one exception is extracting attachments.
I see this partially exists:

/* TODO create attachment item

Is there any timeline on this feature, and is there anything I can do to help?

Just trying to avoid OS calls.

Appreciate all the work you have done.

@kujenga

kujenga commented Dec 12, 2016

I'm also curious about support for attachments. I'd be happy to take a look at implementing it and opening a PR if you could give me some pointers as to what needs to be done, @joachimmetz.

@joachimmetz
Member Author

Sorry for the slow response, quite busy at the moment. What needs to be done is adding attachment support to the Python bindings, which are largely a wrapper around the API. Unless you're very familiar with writing Python bindings, it is likely faster for me to implement it than to explain it. But if you want, you can have a look at the current code.

@kujenga

kujenga commented Dec 23, 2016

No worries, and fair enough! I don't have any prior experience with writing Python bindings unfortunately. Do you have a sense of when you may have the time to tackle this?

I see this commented-out code:

/* TODO create attachment item
PyObject *pypff_message_get_attachment_by_index(
	pypff_item_t *pypff_item,
	int attachment_index );

PyObject *pypff_message_get_attachment(
	pypff_item_t *pypff_item,
	PyObject *arguments,
	PyObject *keywords );

PyObject *pypff_message_get_attachments(
	pypff_item_t *pypff_item,
	PyObject *arguments );
*/

And, at first glance, what look like the necessary implementations are here:

libpff/pypff/pypff_message.c

Lines 2089 to 2274 in 3222929

/* Retrieves a specific attachment by index
 * Returns a Python object if successful or NULL on error
 */
PyObject *pypff_message_get_attachment_by_index(
	pypff_item_t *pypff_item,
	int attachment_index )
{
	libcerror_error_t *error = NULL;
	libpff_item_t *sub_item = NULL;
	PyObject *sub_item_object = NULL;
	static char *function = "pypff_message_get_attachment_by_index";
	uint8_t sub_item_type = 0;
	int result = 0;

	if( pypff_item == NULL )
	{
		PyErr_Format(
			PyExc_TypeError,
			"%s: invalid item.",
			function );

		return( NULL );
	}
	Py_BEGIN_ALLOW_THREADS

	result = libpff_message_get_attachment(
		pypff_item->item,
		attachment_index,
		&sub_item,
		&error );

	Py_END_ALLOW_THREADS

	if( result != 1 )
	{
		pypff_error_raise(
			error,
			PyExc_IOError,
			"%s: unable to retrieve attachment: %d.",
			function,
			attachment_index );

		libcerror_error_free(
			&error );

		goto on_error;
	}
	Py_BEGIN_ALLOW_THREADS

	result = libpff_item_get_type(
		sub_item,
		&sub_item_type,
		&error );

	Py_END_ALLOW_THREADS

	if( result != 1 )
	{
		pypff_error_raise(
			error,
			PyExc_IOError,
			"%s: unable to retrieve attachment: %d type.",
			function,
			attachment_index );

		libcerror_error_free(
			&error );

		goto on_error;
	}
	sub_item_object = pypff_item_new(
		&pypff_message_type_object,
		sub_item,
		pypff_item->file_object );

	if( sub_item_object == NULL )
	{
		PyErr_Format(
			PyExc_MemoryError,
			"%s: unable to create attachment object.",
			function );

		goto on_error;
	}
	return( sub_item_object );

on_error:
	if( sub_item != NULL )
	{
		libpff_item_free(
			&sub_item,
			NULL );
	}
	return( NULL );
}

/* Retrieves a specific attachment
 * Returns a Python object if successful or NULL on error
 */
PyObject *pypff_message_get_attachment(
	pypff_item_t *pypff_item,
	PyObject *arguments,
	PyObject *keywords )
{
	PyObject *sub_item_object = NULL;
	static char *keyword_list[] = { "attachment_index", NULL };
	int attachment_index = 0;

	if( PyArg_ParseTupleAndKeywords(
		arguments,
		keywords,
		"i",
		keyword_list,
		&attachment_index ) == 0 )
	{
		return( NULL );
	}
	sub_item_object = pypff_message_get_attachment_by_index(
		pypff_item,
		attachment_index );

	return( sub_item_object );
}

/* Retrieves an items sequence and iterator object for the attachments
 * Returns a Python object if successful or NULL on error
 */
PyObject *pypff_message_get_attachments(
	pypff_item_t *pypff_item,
	PyObject *arguments PYPFF_ATTRIBUTE_UNUSED )
{
	libcerror_error_t *error = NULL;
	PyObject *sub_items_object = NULL;
	static char *function = "pypff_message_get_attachments";
	int number_of_attachments = 0;
	int result = 0;

	PYPFF_UNREFERENCED_PARAMETER( arguments )

	if( pypff_item == NULL )
	{
		PyErr_Format(
			PyExc_TypeError,
			"%s: invalid item.",
			function );

		return( NULL );
	}
	Py_BEGIN_ALLOW_THREADS

	result = libpff_message_get_number_of_attachments(
		pypff_item->item,
		&number_of_attachments,
		&error );

	Py_END_ALLOW_THREADS

	if( result != 1 )
	{
		pypff_error_raise(
			error,
			PyExc_IOError,
			"%s: unable to retrieve number of attachments.",
			function );

		libcerror_error_free(
			&error );

		return( NULL );
	}
	sub_items_object = pypff_items_new(
		pypff_item,
		&pypff_message_get_attachment_by_index,
		number_of_attachments );

	if( sub_items_object == NULL )
	{
		PyErr_Format(
			PyExc_MemoryError,
			"%s: unable to create sub items object.",
			function );

		return( NULL );
	}
	return( sub_items_object );
}

Is there much left to do beyond uncommenting the headers and other definitions?
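
Since the C wrapper above parses a single `attachment_index` keyword, a Python caller could probe for the method before relying on it. This is only a minimal sketch: it assumes the wrapper would be exposed on the message object as a method named `get_attachment` (a name guessed from the C function name, not confirmed anywhere in this thread), and the helper names `has_attachment_support` and `iter_attachments` are hypothetical.

```python
def has_attachment_support(message):
    """Return True if this message object exposes a callable get_attachment.

    ASSUMPTION: the method name "get_attachment" is inferred from the C
    function pypff_message_get_attachment; the thread does not confirm it.
    """
    return callable(getattr(message, "get_attachment", None))


def iter_attachments(message):
    """Yield attachment objects, or nothing if the bindings lack support."""
    if not has_attachment_support(message):
        return
    # number_of_attachments is the attribute shown later in this thread.
    for attachment_index in range(message.number_of_attachments):
        yield message.get_attachment(attachment_index)
```

Probing with `getattr` keeps the same script usable on builds where the attachment functions are still commented out.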

@0x6e69636f

0x6e69636f commented Jan 9, 2017

I'm also interested in this binding, but have no experience writing this kind of thing.
Should I write a program in C instead, or is the C library meant to be used with Python?

@joachimmetz
Member Author

@NicolasBD the Python binding uses the library, either compiled into it or as a separate shared library.
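
From the Python side, either build variant is reached through the same import, so a quick check is whether the module loads at all. A small sketch, assuming the module is importable as `pypff` and that it offers a `get_version()` function (treated here as optional, since this thread does not show it); `describe_pypff` is a hypothetical helper name.

```python
import importlib


def describe_pypff():
    """Return the pypff version string, or None if the module cannot load.

    An ImportError here can mean the module is not installed, or that a
    separately built shared library could not be found at import time.
    """
    try:
        pypff = importlib.import_module("pypff")
    except ImportError:
        return None
    # ASSUMPTION: get_version() may not exist in every build, so probe for it.
    get_version = getattr(pypff, "get_version", None)
    return str(get_version()) if callable(get_version) else "unknown"
```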

@xvolte

xvolte commented Aug 15, 2018

Hello @joachimmetz,

Did you have the time to implement the Python bindings for message attachments?

It seems that @kujenga saw the code implemented somewhere, but I can't use it so far.

I'm using the pip package (pip install pypff), which works well for parsing PST files and reading each message one by one, but I also need access to attachments.

Thanks in advance,
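
For context, reading each message one by one with the pip package can be sketched as a recursive folder walk. This is a hedged sketch rather than the library's documented usage: it assumes the folder objects expose `number_of_sub_messages`, `get_sub_message()`, `number_of_sub_folders` and `get_sub_folder()`, and that a file is opened with `pypff.file().open(path)`; `iter_messages` and `open_pst` are hypothetical helper names.

```python
def iter_messages(folder):
    """Recursively yield every message below a pypff folder-like object."""
    # Messages stored directly in this folder come first.
    for index in range(folder.number_of_sub_messages):
        yield folder.get_sub_message(index)
    # Then descend into each sub folder.
    for index in range(folder.number_of_sub_folders):
        yield from iter_messages(folder.get_sub_folder(index))


def open_pst(path):
    """Open a PST file; pypff is imported lazily so the walker works without it."""
    import pypff  # ASSUMPTION: installed, e.g. via the pip package above

    pst_file = pypff.file()
    pst_file.open(path)
    return pst_file
```

With a real file this could be used as `for message in iter_messages(open_pst("mailbox.pst").get_root_folder()): print(message.number_of_attachments)`, where `mailbox.pst` is a placeholder path.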

@joachimmetz
Member Author

Did you have the time to implement the Python bindings for message attachments?

Unfortunately no, I have a very busy schedule, and numerous things consume the little available time I have.

It seems that @kujenga saw the code implemented somewhere, but I can't use it so far.

I'm unfamiliar with this work, and there has been no PR from this person. There was #56, however, which was closed by the author without any explanation.

@xvolte

xvolte commented Aug 15, 2018

Looks like it is already written in your code?

libpff/pypff/pypff_message.c

Lines 2089 to 2274 in 3222929


Thanks in advance, and thanks already for all the great work!

@joachimmetz
Member Author

Looks like it is already written in your code?

I made a start, but this needs to be finalized and tested.

@cdeil

cdeil commented May 17, 2020

Is there a way to get at the attachment by now?

I'm working with Outlook PST files for the first time, and thanks to your library I got to a list of messages and see this:

>>> msg.number_of_attachments
1

Is there any other way to extract all the attachments (Excel files in my case) from the emails in the PST?

@joachimmetz
Member Author

Is there a way to get at the attachment by now?

Yes, but currently not with the Python bindings. Use pffexport.
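
For anyone scripting this from Python, pffexport can be driven as an external process. The sketch below assumes that `pffexport` is on the PATH, that `-t` sets the base name of the output directories, that items land in a `<target>.export` directory, and that attachments are written into per-message `Attachments` subdirectories. The directory layout in particular is an assumption to verify against an actual export, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pffexport_command(pst_path, target):
    """Build the argument list; -t names the base of the output directories."""
    return ["pffexport", "-t", str(target), str(pst_path)]


def export_pst(pst_path, target):
    """Run pffexport and return the directory assumed to hold exported items."""
    subprocess.run(build_pffexport_command(pst_path, target), check=True)
    # ASSUMPTION: exported items are written to "<target>.export".
    return Path(f"{target}.export")


def find_attachments(export_dir):
    """List files whose parent directory is named "Attachments".

    ASSUMPTION: this directory name reflects the export layout and should
    be checked against real pffexport output.
    """
    return sorted(
        path
        for path in Path(export_dir).rglob("*")
        if path.is_file() and path.parent.name == "Attachments"
    )
```

Filtering the result by suffix, for example `path.suffix == ".xlsx"`, would narrow the list to the Excel files mentioned above.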

@RaulArtigues

Hello,

It would be nice to be able to get the categorization of the emails that are in the .PST file.

@joachimmetz
Member Author

It would be nice to be able to get the categorization of the emails that are in the .PST file.

Raul, first of all, don't hijack this issue. Second, per the side conversation, please read up on what a PST file is (a MAPI database).
