serializer without tmp alloc. #5451

Closed
wants to merge 6 commits into from

youngsofun
Member

@youngsofun youngsofun commented May 18, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR

  • verified with nullable, numbers, and string
  • bench: index vs iterator
  • some refactoring
  • block writer interface
  • other common types: bool, date, ...

The current serializer returns a Vec of tmp strings.
arrow2 reuses a buf in a streaming iterator:
https://github.com/jorgecarleitao/arrow2/blob/47edf30128/src/io/csv/write/serialize.rs

The first commit is to find a way to adopt streaming-iterator for our datavalues (especially for nullable columns).

The arrow2 PR below reported an improvement of 20%-25%:
jorgecarleitao/arrow2#382
We need a bench to know how much faster it will be for us.
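
To make the two approaches concrete, here is a minimal sketch, not the code in this PR. `I32Stream` is a hypothetical, hand-rolled stand-in for the `StreamingIterator` trait from the streaming-iterator crate (it only mirrors the `advance`/`get` pair so that it builds with std alone), and both function names are made up for illustration.

```rust
use std::io::Write;

/// Hypothetical streaming-style iterator: every `advance` overwrites one
/// internal buffer, and `get` lends that buffer out.
struct I32Stream<'a> {
    values: std::slice::Iter<'a, i32>,
    buf: Vec<u8>,
    valid: bool,
}

impl<'a> I32Stream<'a> {
    fn new(values: &'a [i32]) -> Self {
        Self { values: values.iter(), buf: Vec::new(), valid: false }
    }

    /// Moves to the next value and formats it into the reused buffer.
    fn advance(&mut self) {
        self.buf.clear();
        match self.values.next() {
            Some(v) => {
                write!(self.buf, "{}", v).expect("writing to a Vec cannot fail");
                self.valid = true;
            }
            None => self.valid = false,
        }
    }

    /// Borrows the current field, or returns None once the column is exhausted.
    fn get(&self) -> Option<&[u8]> {
        if self.valid { Some(&self.buf) } else { None }
    }
}

/// Baseline: allocates one temporary String per value.
fn serialize_with_tmp_strings(values: &[i32]) -> Vec<String> {
    values.iter().map(|v| v.to_string()).collect()
}

/// Streaming: every field passes through the single reused buffer and is
/// then copied into `out`, one field per line.
fn serialize_streaming(values: &[i32], out: &mut Vec<u8>) {
    let mut it = I32Stream::new(values);
    loop {
        it.advance();
        match it.get() {
            Some(field) => {
                out.extend_from_slice(field);
                out.push(b'\n');
            }
            None => break,
        }
    }
}
```

The streaming version avoids the per-value allocation, at the cost of one extra copy from the internal buffer into the output.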

Changelog

  • Performance Improvement

Related Issues

Fixes #issue

@youngsofun youngsofun marked this pull request as draft May 18, 2022 15:23
&self,
column: &'a ColumnRef,
format: &FormatSettings,
) -> Result<Box<dyn StreamingIterator<Item = [u8]> + 'a>> {
Member

Will the virtual call through this dyn iterator make next slow?

Member

Maybe it's OK; the virtual call is a price worth paying to avoid the tmp alloc.

Member Author

I'm a little worried about it too and don't have much experience with it. Let me do a simple bench to compare first.

Member Author

@youngsofun youngsofun May 19, 2022

A simple bench with Criterion shows it is 4 to 5 times faster for non-nullable i32:
https://gist.github.com/youngsofun/75c6420bf51fe4c63e19aa303d2e4f9d
@sundy-li

Member

Since we can't avoid the virtual call, how about using a row-based API?

See ClickHouse for reference.

Member

Need another bench for the performance.

Member Author

@youngsofun youngsofun May 19, 2022

@sundy-li OK, I'll try it later. I worked on a bugfix today; it's almost done.

Member Author

@youngsofun youngsofun May 20, 2022

@sundy-li
the output of the current bench:
https://gist.github.com/youngsofun/b7a2d3629e53aa6d6cdfb72d82a8a727

fn write_csv_field(&self, column, row_num, &mut buf) is faster than stream_iterator for short columns, but a little slower for long ones. I think both are OK.

Only tested int32 today and tidied up the code; will go on with nullable and string tomorrow.
stream_iterator needs an extra copy from the tmp buf inside the iterator, which may slow it down when the string is long, but it may not be too bad if the buf is in cache?
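
For readers following along, a rough sketch of what such a row-based interface could look like. Only the `write_csv_field(&self, column, row_num, &mut buf)` shape comes from this thread; `Int32Serializer`, the `&[i32]` column stand-in, the `String` error type, and `write_column` are assumptions for illustration, not the actual databend types.

```rust
use std::io::Write;

/// Hypothetical stateless serializer for a non-nullable i32 column.
struct Int32Serializer;

impl Int32Serializer {
    /// Appends the text form of `column[row_num]` to `buf` without clearing
    /// it, so the caller can reuse one buffer for a whole block.
    fn write_csv_field(
        &self,
        column: &[i32],
        row_num: usize,
        buf: &mut Vec<u8>,
    ) -> Result<(), String> {
        let v = column
            .get(row_num)
            .ok_or_else(|| format!("row {} is out of range", row_num))?;
        write!(buf, "{}", v).map_err(|e| e.to_string())
    }
}

/// Writes every row of one column into a single buffer, one field per line.
fn write_column(
    ser: &Int32Serializer,
    column: &[i32],
    buf: &mut Vec<u8>,
) -> Result<(), String> {
    for row_num in 0..column.len() {
        ser.write_csv_field(column, row_num, buf)?;
        buf.push(b'\n');
    }
    Ok(())
}
```

Here the field is formatted straight into the output buffer, so there is neither a tmp alloc nor the extra copy that the iterator version needs.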

let v = col.get_data_owned(row_num);
lexical_to_bytes_mut_no_clear(v, buf);
Ok(())
}
Member Author

@sundy-li is this what you expected? can it be faster?

Member

@sundy-li sundy-li May 20, 2022

Yes, I think it might be faster. With a row-based API we could write reusable code without extra allocation. We need a bench to verify it.

Member

You can use Criterion::default().with_profiler(perf::FlamegraphProfiler::new(100)) to get a flamegraph, so we can find the bottleneck.

@youngsofun
Member Author

youngsofun commented May 22, 2022


  • i32 nullable using index is slow, maybe because of multiple dyn calls per row

[line plots not preserved]

  • str[1000]: copying dominates, so
    1. nullable is faster because 10% of the rows need no copy
    2. iter is slower because of one more copy

[line plot not preserved]

@youngsofun
Member Author

youngsofun commented May 22, 2022

2^10, 2^12, 2^14

[line plots not preserved]

@youngsofun
Member Author

youngsofun commented May 22, 2022

Shortcomings

  • by iter
    • has one more copy
    • is slow for columns with #rows = 1024 or 2048, not sure why
  • by index
    • dyn call for inner columns, which may be more critical for containers like Array[Int]?

Replacing ? with unwrap can make the function notably faster.

buf: &mut Vec<u8>,
_format: &FormatSettings,
) -> Result<()> {
let col: &<Vec<u8> as Scalar>::ColumnType = unsafe { Series::static_cast(&column) };
Member

&StringColumn is ok

@sundy-li
Member

Great benchmark results. It seems the row-based API is simpler and more efficient than the iter API.

I just came up with another approach: we can hold a column reference inside the serializer (e.g. &[i32] for ColumnI32). It might be faster than the others?
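
A small sketch of the idea, under loud assumptions: `&dyn Any` holding a `Vec<i32>` stands in for a type-erased column, and `write_field_by_index` / `Int32Embedded` are hypothetical names, not databend APIs. It only illustrates where the cast happens in each style.

```rust
use std::any::Any;
use std::io::Write;

/// Index style: the column arrives type-erased, so every call casts it
/// before reading one row.
fn write_field_by_index(column: &dyn Any, row_num: usize, buf: &mut Vec<u8>) {
    let values = column
        .downcast_ref::<Vec<i32>>()
        .expect("expected an i32 column");
    write!(buf, "{}", values[row_num]).expect("writing to a Vec cannot fail");
}

/// Embedded style: the serializer borrows the typed values once, up front.
struct Int32Embedded<'a> {
    values: &'a [i32],
}

impl<'a> Int32Embedded<'a> {
    /// Performs the cast a single time; returns None for a non-i32 column.
    fn try_new(column: &'a dyn Any) -> Option<Self> {
        column
            .downcast_ref::<Vec<i32>>()
            .map(|v| Self { values: v.as_slice() })
    }

    /// Reads straight from the borrowed slice; panics if `row_num` is out of range.
    fn write_field(&self, row_num: usize, buf: &mut Vec<u8>) {
        write!(buf, "{}", self.values[row_num]).expect("writing to a Vec cannot fail");
    }
}
```

Both styles produce the same bytes; the difference is only that the embedded one pays for the cast once per column instead of once per row.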

@youngsofun
Member Author

OK, I'll try it later.

@youngsofun
Member Author

youngsofun commented May 23, 2022

[line plots not preserved]

@youngsofun
Member Author

@sundy-li
Member

detail gist.github.com/youngsofun/d1b9cf77418a1783f549f1339beb21fa

It seems the bottleneck is converting an integer to a String. I would like to choose the index-based row API, what do you think?

@youngsofun
Member Author

youngsofun commented May 24, 2022

embedded (i.e. holding a column reference inside the serializer) seems much faster in some common cases

>>> 25.993 / 16.196
1.6049024450481597
i32/i32(not_nullable)_index/1048576
                        time:   [17.058 ms 17.084 ms 17.112 ms]
i32/i32(not_nullable)_embedded/1048576
                        time:   [13.474 ms 13.497 ms 13.522 ms]

i32/i32(null=0.1)_index/1048576
                        time:   [25.957 ms 25.993 ms 26.034 ms]
i32/i32(null=0.1)_embedded/1048576
                        time:   [16.173 ms 16.196 ms 16.222 ms]

str[10]/str[10](not_nullable)_index/1048576
                        time:   [2.6830 ms 2.9024 ms 3.1395 ms]
str[10]/str[10](not_nullable)_embedded/1048576
                        time:   [1.8445 ms 1.8568 ms 1.8691 ms]

str[10]/str[10](null=0.1)_index/1048576
                        time:   [1.7512 ms 1.7602 ms 1.7694 ms]
str[10]/str[10](null=0.1)_embedded/1048576
                        time:   [1.7815 ms 1.7941 ms 1.8069 ms]

The index version is even worse for short columns:

i32/i32(null=0.1)_index/4096
                        time:   [78.906 us 79.127 us 79.402 us]
i32/i32(null=0.1)_embedded/4096
                        time:   [37.734 us 38.204 us 38.732 us]
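
The ratio computed by hand above can be repeated for every pair. A small sketch that only does the arithmetic on the median times copied from the output above; the labels and function names are made up, and the str[10] rows should be read with care, since the thread later reports a bug in create_string_column.

```rust
/// (label, index median, embedded median). Both medians in a row use the
/// same unit (ms for 1048576 rows, us for 4096 rows), so the ratio is unitless.
const MEDIANS: [(&str, f64, f64); 5] = [
    ("i32 not_nullable, 1048576 rows", 17.084, 13.497),
    ("i32 null=0.1, 1048576 rows", 25.993, 16.196),
    ("str[10] not_nullable, 1048576 rows", 2.9024, 1.8568),
    ("str[10] null=0.1, 1048576 rows", 1.7602, 1.7941),
    ("i32 null=0.1, 4096 rows", 79.127, 38.204),
];

/// How many times faster embedded is than index; below 1.0 means index wins.
fn speedup(index: f64, embedded: f64) -> f64 {
    index / embedded
}

/// One formatted line per benchmark pair.
fn report() -> Vec<String> {
    MEDIANS
        .iter()
        .map(|(name, index, embedded)| {
            format!("{}: {:.2}x", name, speedup(*index, *embedded))
        })
        .collect()
}
```

Calling `report()` and printing each line gives the per-case ratios in one place, which makes it easier to see that the nullable i32 cases gain the most.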

@youngsofun
Member Author

youngsofun commented May 24, 2022

I agree that row-based is simpler than the stream iterator and is enough for serialization.

Maybe we need another view type that contains both the internal contents of the DataType and a Column object, and can access them by index without

  1. cast
  2. index check
  3. dyn call

This type can then implement the serialization interface.
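
One possible shape for such a view type, as a sketch only: `ColumnView`, its variants, the `&[bool]` validity stand-in, and the "NULL" marker are all assumptions for illustration. It iterates over the slices rather than indexing, which is one safe way to sidestep the per-row index check.

```rust
use std::io::Write;

/// Hypothetical typed view over a column's internal buffers.
enum ColumnView<'a> {
    Int32(&'a [i32]),
    NullableInt32 { validity: &'a [bool], values: &'a [i32] },
    Str(&'a [&'a str]),
}

impl<'a> ColumnView<'a> {
    /// Writes every row followed by a newline. The `match` runs once per
    /// column, so the per-row loop has no cast and no dyn call.
    fn write_rows(&self, buf: &mut Vec<u8>) {
        match self {
            ColumnView::Int32(values) => {
                for v in values.iter() {
                    writeln!(buf, "{}", v).expect("writing to a Vec cannot fail");
                }
            }
            ColumnView::NullableInt32 { validity, values } => {
                for (valid, v) in validity.iter().zip(values.iter()) {
                    if *valid {
                        writeln!(buf, "{}", v).expect("writing to a Vec cannot fail");
                    } else {
                        // "NULL" is a placeholder marker chosen for this sketch.
                        buf.extend_from_slice(b"NULL\n");
                    }
                }
            }
            ColumnView::Str(values) => {
                for s in values.iter() {
                    buf.extend_from_slice(s.as_bytes());
                    buf.push(b'\n');
                }
            }
        }
    }
}
```

An enum keeps dispatch to a single branch per column; nested containers such as Array[Int] would need a recursive variant, which this sketch leaves out.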

@youngsofun
Member Author

Replaced ? with unwrap and compared index vs embedded:
https://gist.github.com/youngsofun/e33f3a8e0d1fbd8882c74f4b7a7d2131

@youngsofun
Member Author

youngsofun commented May 24, 2022

@sundy-li do you know why not-embedded is much slower than embedded for int32 but not for string?

https://gist.github.com/youngsofun/e33f3a8e0d1fbd8882c74f4b7a7d2131

@youngsofun
Member Author

@sundy-li do you know why not-embedded is much slower than embedded for int32 but not for string?
My fault, there is a bug in create_string_column.

@youngsofun
Member Author

new pr #5791

@youngsofun youngsofun closed this Jun 6, 2022
@youngsofun youngsofun deleted the format branch November 16, 2022 11:47

3 participants