Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix saving unicode MAM messages. #1748

Merged
merged 1 commit into from
Mar 13, 2018
Merged

Fix saving unicode MAM messages. #1748

merged 1 commit into from
Mar 13, 2018

Conversation

kzemek
Copy link
Contributor

@kzemek kzemek commented Mar 2, 2018

This PR allows storing and retrieving unicode messages in MAM. It changes a few places to properly handle unicode, as well as sidesteps escaping needed for some databases by using prepared queries also in synchronous writes (previously only in asynchronous)

@kzemek kzemek force-pushed the fix-unicode-mam branch from 2075fee to 25388fb Compare March 2, 2018 14:10
@@ -138,6 +138,8 @@ archive_message(_Result, Host, MessID, RoomID,
try
archive_message_unsafe(Host, MessID, RoomID, FromNick, Packet)
catch _Type:Reason ->
?ERROR_MSG("event=archive_message_failed reason='~p' stacktrace=~p",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want also see MessID, RoomID and FromNick in logs for debugging.

@@ -212,6 +212,8 @@ archive_message(Result, Host, MessID, UserID,
do_archive_message(Result, Host, MessID, UserID,
LocJID, RemJID, SrcJID, Dir, Packet)
catch _Type:Reason ->
?ERROR_MSG("event=archive_message_failed reason='~p' stacktrace=~p",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, extra fields are good to have.

save_unicode_messages(Config) ->
P = ?config(props, Config),
F = fun(Alice, Bob) ->
escalus:send(Alice, escalus_stanza:chat_to(Bob, <<"Hi! this is an unicode character ȥ"/utf8>>)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be information with a link to codepoints.

LATIN SMALL LETTER Z WITH HOOK
http://www.fileformat.info/info/unicode/char/0225/index.htm

P = ?config(props, Config),
F = fun(Alice, Bob) ->
escalus:send(Alice, escalus_stanza:chat_to(Bob, <<"Hi! this is an unicode character ȥ"/utf8>>)),
escalus:send(Alice, escalus_stanza:chat_to(Bob, <<"this is another one ȸ"/utf8>>)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LATIN SMALL LETTER DB DIGRAPH

@kzemek kzemek force-pushed the fix-unicode-mam branch from 25388fb to 0d20253 Compare March 2, 2018 14:59
[Msg3] = respond_messages(Res2),
#forwarded_message{message_body = Body3} = parse_forwarded_message(Msg3),
?assert_equal(<<"this is another one ȸ"/utf8>>, Body3),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to test stuff that requires surrogates when encoded in UTF16.
For example:
𐀀 (Unicode Linear B Syllable B008 A)

unicode:characters_to_binary([65536]).
<<240,144,128,128>>

And smiles
https://en.wikipedia.org/wiki/Emoticons_(Unicode_block)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tested characters are already multibyte, and the fix/test are about unicode characters storage not about (currently not-our-implementation-defined) search capabilities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in mysql emoticons were broken without mb4 encoding (while lower multibyte code points were working just fine), so it's a real case.
It we have a test with them, than we would be sure that it works with all of our backends.

Search is completely different topic...

@fenek fenek added the WIP 🚧 label Mar 5, 2018
@kzemek kzemek force-pushed the fix-unicode-mam branch 3 times, most recently from 2cea6a8 to 9624cdc Compare March 6, 2018 16:42
@kzemek kzemek force-pushed the fix-unicode-mam branch from 9624cdc to 161ea50 Compare March 7, 2018 10:42
@codecov-io
Copy link

Codecov Report

Merging #1748 into master will decrease coverage by 0.03%.
The diff coverage is 75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1748      +/-   ##
==========================================
- Coverage    74.6%   74.57%   -0.04%     
==========================================
  Files         283      283              
  Lines       26578    26569       -9     
==========================================
- Hits        19829    19814      -15     
- Misses       6749     6755       +6
Impacted Files Coverage Δ
src/mod_mam_riak_timed_arch_yz.erl 87.96% <100%> (ø) ⬆️
src/rdbms/mongoose_rdbms_pgsql.erl 90.9% <100%> (ø) ⬆️
src/rdbms/mongoose_rdbms_odbc.erl 84.78% <40%> (-8.08%) ⬇️
src/mod_mam_odbc_arch.erl 89.34% <75%> (-0.82%) ⬇️
src/mod_mam_muc_odbc_arch.erl 82.12% <80%> (-0.94%) ⬇️
src/mod_mam_utils.erl 87.41% <83.33%> (+0.04%) ⬆️
src/rdbms/mongoose_rdbms_mysql.erl 86.95% <0%> (-4.35%) ⬇️
src/rdbms/mongoose_rdbms.erl 71.22% <0%> (-1.44%) ⬇️
src/ejabberd_c2s.erl 84.86% <0%> (+0.15%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7be65c4...161ea50. Read the comment docs.

@esl esl deleted a comment from fenek Mar 12, 2018
Copy link
Contributor

@arcusfelis arcusfelis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good

@kzemek kzemek merged commit b076e4a into master Mar 13, 2018
@kzemek kzemek deleted the fix-unicode-mam branch March 13, 2018 10:03
@fenek fenek added this to the 3.0.0 milestone Mar 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants