Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StreamMapIterator (was asyncmap) #15058

Merged
merged 2 commits into from
Mar 9, 2016
Merged

Conversation

samoconnor
Copy link
Contributor

First step in Simplifying and generalising pmap (as per #14843).

Please limit comments on this PR to technical / implementation detail relating to the code in the PR.

Please make comments relating the interface and the overall pmap refactoring in issue #14843.


In-place version of [`amap`](:func:`amap`).
"""
function amap! end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have an actual signature so the docstring gets disambiguated.

@StefanKarpinski
Copy link
Member

These functions need clearer names. The pmap function gets a pass because that's a fairly standard distributed computing name.

@kshyatt kshyatt added packages Package management and loading parallelism Parallel or distributed computation and removed packages Package management and loading labels Feb 13, 2016
@samoconnor samoconnor changed the title imap and amap asyncmap Feb 14, 2016
@samoconnor
Copy link
Contributor Author

These functions need clearer names. The pmap function gets a pass because that's a fairly standard distributed computing name.

OK (the original names were simply an attempt to follow the pmap pattern).

@bjarthur suggested in #14843:

... get rid of amap and amap! and rename the proposed imap to amap. to get a collection, the user would have to manually say collect(amap), just like they have to do for a dict's keys() and values().

This seems sensible to me (shorthand for collect(asyncmap()) could still be added later if that was seen as useful).

This PR now adds only one function:

asyncmap(f, c...; ntasks=100) -> iterator

Apply f to each element of c using at most 100 asynchronous tasks. For multiple collection arguments, apply f elementwise. Note: collect(asyncmap(f, c...; ntasks=1)) is equivalent to map(f, c...).

@samoconnor
Copy link
Contributor Author

bump?

@amitmurthy
Copy link
Contributor

Looks good. While it is not exported, can we have a different name for asyncmap since it is returning an iterator and not a mapped collection? Especially with both map and pmap returning mapped collections.

#
#-------------------------------------------------------------------------------

import Base: start, done, next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required?

@samoconnor
Copy link
Contributor Author

... a different name for asyncmap since it is returning an iterator and not a mapped collection?

What about just calling it StreamMapIterator instead of having asyncmap(f, c...; ntasks=0) = StreamMapIterator(f, c...; ntasks=ntasks) ?

(I think map() should return an iterator to be consistent with zip(), keys(), filter() etc, but that is a seperate discussion.)

@amitmurthy
Copy link
Contributor

That sounds right. BTW, filter does return a collection, but I too was initially thrown off by zip and keys returning iterators.

samoconnor pushed a commit to samoconnor/julia that referenced this pull request Mar 4, 2016
@samoconnor
Copy link
Contributor Author

filter returns an iterator if the input is an iterator...

julia> typeof(filter(i->true, keys(Dict(1=>1))))
Filter{Function,Base.KeyIterator{Dict{Int64,Int64}}}

julia> typeof(filter(i->true, [1,2,3]))
Array{Int64,1}

(to me that seems really weird because you sometimes need to do collect(filter(f, c)) and sometimes not so you have to check the type of c before calling filter)

@bjarthur
Copy link
Contributor

bjarthur commented Mar 4, 2016

@samoconnor
Copy link
Contributor Author

@bjarthur:

functions are supposed to be lowercase according to the style conventions:

StreamMapIterator is a type ("type names use capitalization and camel case" ).

also, why no exports?

because #14843 (comment)

@amitmurthy
Copy link
Contributor

@StefanKarpinski, your thoughts?

@amitmurthy
Copy link
Contributor

I'll merge this in a day if there are no objections. It does not add any exports and will enable enhancing pmap to be used in place of code similar to

@sync for x in y
  @async f(x)
end


`collect(MapIterator(f, c...))` is equivalent to `map(f, c...)`
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no extra newlines after docstrings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a hard requirement to make the docstrings machine-readable? or is it an issue of style?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's not required to get docstrings to be associated with the following function, but stylistically it makes the association less obvious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, any amount of whitespace, as well as commented lines, will get ignored. But as Tony mentions, stylistically it isn't quite as obvious. Maybe worth a mention in #15136.

Also missing a fullstop as the end of the line.

@bjarthur
Copy link
Contributor

bjarthur commented Mar 7, 2016

@amitmurthy when/who will enhance pmap thusly? i still feel that the full change to the pmap API should happen in one PR.

@samoconnor samoconnor changed the title asyncmap StreamMapIterator (was asyncmap) Mar 7, 2016
@amitmurthy
Copy link
Contributor

Both AV tests timed out. Known issue? How do I restart AV builds?

@samoconnor
Copy link
Contributor Author

@amitmurthy looks like AV is happy now.

amitmurthy added a commit that referenced this pull request Mar 9, 2016
StreamMapIterator (was asyncmap)
@amitmurthy amitmurthy merged commit 039f57b into JuliaLang:master Mar 9, 2016
@JeffBezanson
Copy link
Member

The MapIterator type is a near-exact duplicate of the Generator type that was already added. I would be ok with renaming Generator to MapIterator if folks prefer that name.

@tkelman
Copy link
Contributor

tkelman commented Mar 11, 2016

The Generator name is better. We could rename the others here AsyncGenerator and StreamGenerator if it's really the exact same underlying thing.

@amitmurthy
Copy link
Contributor

Yes, the Generator name is definitely better. @samoconnor will you take a shot at merging the implementations? Else, I can do it.

@samoconnor
Copy link
Contributor Author

As far as I can see the only difference is that MapIterator follows the map pattern of taking multiple collections c... and using zip(c...) to iterate over all of them in parallel.

If this was added to Generator you'd have:

immutable Generator{I,F}
    f::F
    iter::I
end

Generator(f, c...) = Generator(f, zip(c...))

start(g::Generator) = start(g.iter)
done(g::Generator, s) = done(g.iter, s)
function next(g::Generator, s)
    v, s2 = next(g.iter, s)
    g.f(v...), s2
end

@amitmurthy
Copy link
Contributor

Yes, that is the right thing do do. And if we are going with the Generator name, also move AsyncGenerator and StreamGenerator implementations to generator.jl

@JeffBezanson
Copy link
Member

I find it a bit odd to make the iterator wrapped by Generator always a Zip. It makes it harder to examine what the generator is iterating over.

@amitmurthy
Copy link
Contributor

Generator(f, c1, c...) = Generator(f, zip(c1, c...)) then? Only for mapping functions that take multiple arguments. Like pmap.

@JeffBezanson
Copy link
Member

Yes, I think

Generator(f, c1, c...) = Generator(a->f(a...), zip(c1, c...))

is probably ok.

@samoconnor
Copy link
Contributor Author

The thing is that g.f(v...), s2 depends on v being a Tuple.
So, if you're going to special-case single collection, you need a special-case next as well.

@JeffBezanson
Copy link
Member

With the definition I wrote above g.f(v) in next is sufficient.

@samoconnor
Copy link
Contributor Author

Ah, yes, our messages crossed somewhere under the sea near Honolulu.

@JeffBezanson
Copy link
Member

Light is so slow.

@samoconnor
Copy link
Contributor Author

Should I roll a change along these lines into #15409? Or will someone with commit access just do it without a PR?

@JeffBezanson
Copy link
Member

I'll change it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parallelism Parallel or distributed computation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants