Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] Remove scope attribute from Buffer class #8463

Merged
merged 6 commits into from
Jul 20, 2021

Conversation

masahi
Copy link
Member

@masahi masahi commented Jul 13, 2021

A follow up to #8366. Right now, storage scope information are spread across three components:

  • AttrStmt with attr::storage_scope key
  • PointerType
  • Buffer class
    /*! \brief storage scope of the buffer, if other than global */
    String scope;

After #8366, storage scopes associated with AttrStmt and PointerType are identical. To consolidate storage scope information into one place, I'm proposing to remove storage scope in AttrStmt and Buffer class.

This PR is for the latter refactoring. I removed scope data member from Buffer class and added an alternative way to access the storage scope through its associated buffer variable.

@tqchen @vinx13 @kparzysz-quic @csullivan

also cc @Hzfengsy since the remove field is only used by TensorIR related code

@masahi masahi marked this pull request as ready for review July 14, 2021 10:55
@vinx13 vinx13 self-assigned this Jul 14, 2021
@tqchen
Copy link
Member

tqchen commented Jul 14, 2021

Thanks @masahi . Sorry was being late on this. I think it worthwhile to think a bit more. In general, there are two kinds of information that we normally carry throughout the program:

  • K0: The information in the declaration/allocation site
  • K1: The information that carries as part of type (of the Var) which is being used through out the usage site in the code base.

The Buffer object was intended for the declaration site information. There are in general two kinds of design choices here:

  • C0: On one hand, the information being present in the declaration site K0 can be duplicated with the information present in the allocation site, so we could remove the duplication (for example, dtype and scope are duplicated in the type atm).
  • C1: On the other hand, one could argue that the type annotation should be made independent from the type annotation that also shares with the usage site, so we could let the allocation site contain a duplicated set of information and check consistency during construction.

Right now this PR follows C0 partially. I wonder if we should also consider the choice between C0 and C1. In the particular case of Buffer, I feel that we should give C1 a serious consideration.

We can also use the text format to illustrate the potential differences between the two.

# Choice C0: the information was already duplicated on the lhs, so do not present on rhs
ptr : Pointer[float32, "gpu"] = allocate_buffer(32)

# Choice C1:
ptr : Pointer[float32, "gpu"] = allocate_buffer(32, "float32", "gpu")

@tqchen
Copy link
Member

tqchen commented Jul 14, 2021

cc @jroesch @junrushao1994 to also share some thoughts here.

@jroesch
Copy link
Member

jroesch commented Jul 14, 2021

@tqchen is there more context on what the design/end goal is?

@tqchen
Copy link
Member

tqchen commented Jul 14, 2021

related context PR #8366

@masahi
Copy link
Member Author

masahi commented Jul 14, 2021

@jroesch I've updated the PR description with more details.

@jroesch
Copy link
Member

jroesch commented Jul 14, 2021

@tqchen wrt to C0 vs. C1 is there any world in which these would differ? is the argument that you might not always have the type/lhs information in hand when you need to analyze the call site?

@masahi
Copy link
Member Author

masahi commented Jul 15, 2021

On the other hand, one could argue that the type annotation should be made independent from the type annotation that also shares with the usage site, so we could let the allocation site contain a duplicated set of information and check consistency during construction

I think if duplicated information are supposed to be consistent, then they are already not independent. So I don't see an advantage in keeping track of two essentially the same information.

To @jroesch's question, right now our code base uses two ways to create Buffer:

  1. Via decl_buffer function. Here, a buffer variable (with its PointerType) is constructed at the same time as Buffer. So we can make sure that two storage scopes agree (Although currently we don't pass storage_scope to Buffer constructor).
    https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L48-L53

  2. Via Buffer constructor directly, using a buffer variable already constructed. https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L386-L388 Here, it is possible to have a situation where the storage scope recorded in the buffer variable is different from the one in Buffer constructor. In particular, if the former one is empty while the latter one is non trivial scope (shared, local etc), I would say this is a bug in creating the original buffer variable with empty scope. If we want to allow such usage while keeping the two scope consistent, we need to update the scope in buffer variable (which is not simple). Right now we don't have such usage in the code base, so I chose to drop scope argument from the Buffer constructor` to prevent misuse.

One possible middle ground is to keep scope member of Buffer, but drop it from the constructor and instead initialize it with the scope in the buffer variable (data in the code). This might be better in terms of design, in that we also have dtype duplicated.

@tqchen
Copy link
Member

tqchen commented Jul 15, 2021

To clarify, we are moving toward a world where the type annotation in the lhs is always available.

The main thing we want to decide is whether to remove the additional info from the rhs.

Note that if it is the other way around (keep info in the rhs and remove lhs info) it will be less controversial. But in this case we want the info in the lhs so it is available in the future reference pt.

This asymmetry arises because we normally assume the information flows from rhs to lhs in the TIR. It can be a bit weird to infer the allocation type from the pointer type of that holds the allocation, of course they should be made consistent.

I just want us to think carefully and make such choice consistent

@csullivan
Copy link
Contributor

Another consideration is if the Type of a Buffer's Var is not a PointerType, e.g. PrimType, having the scope on the rhs could be necessary. Are there examples of this occurring / do we envision the need?

@kparzysz-quic
Copy link
Contributor

kparzysz-quic commented Jul 15, 2021

We should not duplicate information in the IR.

The flow of information from rhs to lhs is not necessarily the right way to frame it. In an assignment "a = b", there is information present both in a and in b, and the assignment has its own meaning as a whole. Depending on what kind of analysis we want to do, the inference of information can flow either way.

@masahi
Copy link
Member Author

masahi commented Jul 15, 2021

@csullivan As of #8366, all buffer vars should be of pointer type. If that doesn't hold, I'd consider it a bug. This PR has a check

tvm/src/tir/ir/buffer.cc

Lines 316 to 317 in 2128bd4

const auto* ptr_type = (*this)->data->type_annotation.as<PointerTypeNode>();
ICHECK(ptr_type) << "Buffer variable is not of pointer type";
before retrieving the storage scope (and all tests passed).

@masahi
Copy link
Member Author

masahi commented Jul 15, 2021

I'd say if we agree that having storage scope information in the type is a good idea, then we should exploit this information, even if that ends up being going from left to right.

If we want to strictly make the rhs (Buffer declaration) the source of truth, I think we might need to revisit the decision of putting storage information in the type of pointer.

@tqchen
Copy link
Member

tqchen commented Jul 15, 2021

Thanks everyone for sharing the thoghts. First of all, I think we all agree that we should put storage scope in the type, so that the information can flow clearly from to the use site.

On the other hand, there can be certain cases when duplicated information appear, say in the following two assignments, the additional b's type annotation was duplicated because it can be inferred from a, but nevertheless it can also appear in the IR as long as we have clear consistency checks.

a : int = some_value()
b : int = a 

That is why I bought up the C0 and C1 distinguishment. As @kparzysz-quic said, on this particular case the argument can also go the other way if we view the Buffer as the assignment(declaration) as a whole.

So if folks feel strongly that the scope information can be removed, I am not too attached to it. I think we should consider more seriously though if it is about the dtype information, since it is more inherently part of DLTensor spec and having a clear field that is checked consistently would help in most of the case.

@tqchen
Copy link
Member

tqchen commented Jul 20, 2021

Trying to move this convo forward and conclude:

  • I think we can remove the scope from buffer if we felt strongly about it
  • Ideally it would be great to keep the dtype field since it is part of the DLPack protocol

If we all agree, we can go ahead and merge this PR

@tqchen tqchen merged commit 1a1be09 into apache:main Jul 20, 2021
@tqchen
Copy link
Member

tqchen commented Jul 20, 2021

Thanks @masahi @jroesch @kparzysz-quic

ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants