add a parallel manifesto
willow-ahrens committed Feb 6, 2025
1 parent a8c2eb1 commit 2289266
Showing 2 changed files with 99 additions and 0 deletions.
62 changes: 62 additions & 0 deletions docs/src/docs/internals/parallel.md
@@ -0,0 +1,62 @@
# Parallel Processing in Finch

## Modelling the Architecture

Finch uses a simple, hierarchical representation of devices and tasks to model
different kinds of parallel processing. An [`AbstractDevice`](@ref) is a physical or
virtual device on which we can execute tasks, each of which may be represented by
an [`AbstractTask`](@ref).

```@docs
AbstractTask
AbstractDevice
```

The current task in a compilation context can be queried with
[`get_task`](@ref). Each device has a set of numbered child
tasks, and each task has a parent task.

```@docs
get_num_tasks
get_task_num
get_device
get_parent_task
```
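
To make this interface concrete, here is a minimal sketch of a serial device with numbered child tasks. The `ToyDevice` and `ToyTask` types are hypothetical, invented only to illustrate the interface above; Finch's real devices are more elaborate.

```julia
# A hypothetical device with `n` numbered child tasks, written only to
# illustrate the task/device interface.
struct ToyDevice <: AbstractDevice
    n::Int
end

struct ToyTask <: AbstractTask
    device::ToyDevice
    num::Int                          # this task's number on the device
    parent::Union{ToyTask, Nothing}   # the task that spawned this one
end

get_num_tasks(dev::ToyDevice) = dev.n
get_task_num(task::ToyTask) = task.num
get_device(task::ToyTask) = task.device
get_parent_task(task::ToyTask) = task.parent

# Build one numbered child task per slot on the device.
dev = ToyDevice(4)
root = ToyTask(dev, 0, nothing)
children = [ToyTask(dev, t, root) for t in 1:get_num_tasks(dev)]
```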

## Data Movement

Before entering a parallel loop, a tensor may reside on a single task,
represent a single view of data distributed across multiple tasks, or represent
multiple separate tensors local to multiple tasks. A tensor's data must be
resident in the current task before we can process operations on that tensor,
such as loops over its indices, accesses to its values, or calls to `declare`,
`freeze`, or `thaw`. Upon entering a parallel loop, we must therefore transfer
the tensor to the tasks where it is needed; upon exiting, we may need to
combine the data from multiple tasks back into a single tensor.
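
For intuition, here is what this looks like from the user's side. The sketch below assumes Finch's `@finch` macro and the `parallel` dimension modifier; the exact surface syntax may differ between versions.

```julia
using Finch

x = Tensor(Dense(Element(0.0)), rand(100))
y = Tensor(Dense(Element(0.0)), zeros(100))

# Entering the parallel loop transfers `x` and `y` to each task; exiting it
# reconciles the pieces of `y` back into a single tensor.
@finch begin
    y .= 0
    for i = parallel(_)
        y[i] = 2 * x[i]
    end
end
```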

There are two cases, depending on whether the tensor is declared outside the
parallel loop or is a temporary tensor declared within the parallel loop.

If the tensor is a temporary tensor declared within the parallel loop, we call
`bcast` to broadcast the tensor to all tasks.

If the tensor is declared outside the parallel loop, we call `scatter` to
send it to the tasks where it is needed. Note that if the tensor is in `read` mode,
`scatter` may simply `bcast` the entire tensor to all tasks. If the device has global
memory, `scatter` may also be a no-op. When the parallel loop is exited, we call
`gather` to reconcile the data from multiple tasks back into a single tensor.
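
As a rough sketch of that case analysis (not Finch's actual implementation; `scatter_sketch`, `has_global_memory`, `scatter_pieces`, and the `mode` argument are all invented names):

```julia
# Hypothetical case analysis for scatter; every name and signature here is
# invented for illustration.
function scatter_sketch(dev, tensor, mode)
    if mode === :read
        bcast(dev, tensor)            # read-only: broadcast the whole tensor
    elseif has_global_memory(dev)
        tensor                        # global memory: no data movement needed
    else
        scatter_pieces(dev, tensor)   # otherwise, send each task its piece
    end
end
```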

Each of these operations begins with a `_send` variant on the originating task and
finishes with a `_recv` variant on the receiving task.
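
For example, a scatter/gather round trip might be orchestrated as below. Only the `scatter_send`/`scatter_recv`/`gather_send`/`gather_recv` names come from Finch; the `spawn_on` helper and all signatures are assumptions for illustration.

```julia
# Hypothetical driver showing the send/recv pairing; `spawn_on` and the
# signatures are invented for illustration.
msg = scatter_send(tensor, device)              # on the originating task
for t in 1:get_num_tasks(device)
    spawn_on(device, t) do task
        piece = scatter_recv(msg, task)         # on each receiving task
        # ... compute on `piece` locally ...
        gather_send(piece, task)                # send results back
    end
end
result = gather_recv(tensor, device)            # reconcile on the parent task
```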

```@docs
bcast
bcast_send
bcast_recv
scatter
scatter_send
scatter_recv
gather
gather_send
gather_recv
```
37 changes: 37 additions & 0 deletions src/architecture.jl
@@ -1,8 +1,45 @@
"""
AbstractDevice
A datatype representing a device on which tasks can be executed.
"""
abstract type AbstractDevice end
abstract type AbstractVirtualDevice end

"""
AbstractTask
An individual processing unit on a device, responsible for running code.
"""
abstract type AbstractTask end
abstract type AbstractVirtualTask end

"""
get_num_tasks(dev::AbstractDevice)
Return the number of tasks on the device dev.
"""
function get_num_tasks end
"""
get_task_num(task::AbstractTask)
Return the task number of `task`.
"""
function get_task_num end
"""
get_device(task::AbstractTask)
Return the device that `task` is running on.
"""
function get_device end

"""
get_parent_task(task::AbstractTask)
Return the task which spawned `task`.
"""
function get_parent_task end

"""
aquire_lock!(dev::AbstractDevice, val)
Expand Down
