Fix saving unicode MAM messages. #1748
src/mod_mam_muc_odbc_arch.erl
@@ -138,6 +138,8 @@ archive_message(_Result, Host, MessID, RoomID,
    try
        archive_message_unsafe(Host, MessID, RoomID, FromNick, Packet)
    catch _Type:Reason ->
        ?ERROR_MSG("event=archive_message_failed reason='~p' stacktrace=~p",
We also want to see MessID, RoomID and FromNick in the logs for debugging.
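A rough sketch of what such a log line could look like. The module name, function name and the `mess_id`/`room_id`/`from_nick` keys are hypothetical, and plain `io_lib:format/2` stands in for the project's `?ERROR_MSG` macro so that the snippet is self-contained:

```erlang
-module(archive_log_sketch).
-export([format_failure/5]).

%% Hypothetical helper: builds the message that would be passed to the
%% logger when archiving fails. Only the event name and the reason and
%% stacktrace fields come from the diff; the extra keys are assumptions.
format_failure(MessID, RoomID, FromNick, Reason, Stacktrace) ->
    lists:flatten(
      io_lib:format("event=archive_message_failed mess_id=~p room_id=~p "
                    "from_nick=~p reason='~p' stacktrace=~p",
                    [MessID, RoomID, FromNick, Reason, Stacktrace])).
```

With the identifiers in the message, a failed write can be matched against a specific room and sender without reproducing the problem.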
src/mod_mam_odbc_arch.erl
@@ -212,6 +212,8 @@ archive_message(Result, Host, MessID, UserID,
        do_archive_message(Result, Host, MessID, UserID,
                           LocJID, RemJID, SrcJID, Dir, Packet)
    catch _Type:Reason ->
        ?ERROR_MSG("event=archive_message_failed reason='~p' stacktrace=~p",
Same here; the extra fields are good to have.
save_unicode_messages(Config) ->
    P = ?config(props, Config),
    F = fun(Alice, Bob) ->
        escalus:send(Alice, escalus_stanza:chat_to(Bob, <<"Hi! this is an unicode character ȥ"/utf8>>)),
There should be a comment naming the character, with a link to its code point:
LATIN SMALL LETTER Z WITH HOOK
http://www.fileformat.info/info/unicode/char/0225/index.htm
    P = ?config(props, Config),
    F = fun(Alice, Bob) ->
        escalus:send(Alice, escalus_stanza:chat_to(Bob, <<"Hi! this is an unicode character ȥ"/utf8>>)),
        escalus:send(Alice, escalus_stanza:chat_to(Bob, <<"this is another one ȸ"/utf8>>)),
LATIN SMALL LETTER DB DIGRAPH
        [Msg3] = respond_messages(Res2),
        #forwarded_message{message_body = Body3} = parse_forwarded_message(Msg3),
        ?assert_equal(<<"this is another one ȸ"/utf8>>, Body3),
Don't forget to test characters that require surrogates when encoded in UTF-16.
For example:
𐀀 (Unicode Linear B Syllable B008 A)
unicode:characters_to_binary([65536]).
<<240,144,128,128>>
And smileys:
https://en.wikipedia.org/wiki/Emoticons_(Unicode_block)
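To illustrate the point about surrogates, here is a minimal sketch using only the standard `unicode` module. The module and function names are hypothetical and not part of this PR:

```erlang
-module(astral_sketch).
-export([utf8/1, utf16/1, needs_surrogates/1]).

%% Encode a single code point as UTF-8.
utf8(CodePoint) ->
    unicode:characters_to_binary([CodePoint]).

%% Encode a single code point as UTF-16 (big-endian, which is what the
%% bare atom utf16 means for unicode:characters_to_binary/3).
utf16(CodePoint) ->
    unicode:characters_to_binary([CodePoint], unicode, utf16).

%% Code points above U+FFFF lie outside the Basic Multilingual Plane and
%% take a surrogate pair in UTF-16 (and four bytes in UTF-8).
needs_surrogates(CodePoint) ->
    CodePoint > 16#FFFF.
```

For example, `astral_sketch:utf16(16#10000)` gives `<<216,0,220,0>>` (the pair D800 DC00), while U+0225 from the existing test encodes to a single 16-bit unit, so the current test never exercises that path.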
The tested characters are already multibyte, and the fix and the test are about storing Unicode characters, not about search capabilities (which are currently not defined by our implementation).
At least in MySQL, emoticons were broken without the utf8mb4 encoding (while lower multibyte code points worked just fine), so it's a real case.
If we have a test with them, then we can be sure that it works with all of our backends.
Search is a completely different topic...
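A small sketch of why the existing test characters do not cover this case: they fit in at most three UTF-8 bytes, while emoticons need four. The module and function names are hypothetical, and the three-byte limit is the assumption being checked here, not something taken from this PR:

```erlang
-module(mb3_sketch).
-export([max_utf8_bytes/1, fits_three_bytes/1]).

%% Largest number of bytes any single character of a UTF-8 binary
%% occupies. The binary generator stops silently at the first invalid
%% sequence, so the input is assumed to be valid UTF-8.
max_utf8_bytes(Bin) when is_binary(Bin) ->
    lists:max([0 | [byte_size(<<C/utf8>>) || <<C/utf8>> <= Bin]]).

%% True when every character fits in three bytes or fewer.
fits_three_bytes(Bin) ->
    max_utf8_bytes(Bin) =< 3.
```

`mb3_sketch:fits_three_bytes(<<16#225/utf8>>)` is `true`, whereas the same call with `<<16#1F600/utf8>>` is `false`, so a message containing an emoticon would exercise the four-byte path on every backend.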
2cea6a8 to 9624cdc
Codecov Report
@@            Coverage Diff             @@
##           master    #1748      +/-   ##
==========================================
- Coverage    74.6%   74.57%   -0.04%
==========================================
  Files         283      283
  Lines       26578    26569       -9
==========================================
- Hits        19829    19814      -15
- Misses       6749     6755       +6
good
This PR allows storing and retrieving Unicode messages in MAM. It fixes Unicode handling in a few places and sidesteps the escaping that some databases need by using prepared queries for synchronous writes as well (previously they were used only for asynchronous writes).