-
I like the idea, and it reminds me of logic/constraints programming. It's worth looking into a solution that could support that paradigm.
-
The ideas described in this write-up come from a nagging feeling that I’ve had about simulation for a while. On several occasions, we observed obstacles that made implementing and maintaining a simulator a time-consuming and challenging activity. We saw this in two situations. The first was the complexity of creating relationships between related records, which we observed in our work with Mirage. The second was the need for simulation-specific resolvers, which we observed in our design of the GiraffeQL simulator.
We resolved the first problem by identifying a human-friendly way of describing relationships and creating an API that eliminated the need for writing implementation. The imperative method of constructing relationships, which instructed the computer on how to wire them up, was replaced with a declarative, human-friendly description of the relationship.

For those unfamiliar with the origin of GraphGen's probabilistic data generation, I’ll briefly summarize the difference between the imperative and declarative approaches. The imperative approach required writing code that manually connected related data. For example, if you wanted an author with three articles, you had to write code that created three articles when creating the author. If we wanted to model some authors with no articles, others with some, and a few with many, we’d need to write code to create each author. This approach doesn’t scale to large data models: writing code to describe different scenarios was so time-consuming that the effort of wiring up relationships became a barrier to writing complex test cases that needed data in the variety of shapes found in real life.

The solution was to describe relationships as probabilities. With the probabilistic approach, we can say that 50% of authors have 1-3 articles, 30% have no articles, and 20% have more than ten. From this description, the system constructs authors automatically: any given author has a 20% chance of being generated with more than ten articles, and the system chooses which authors are the lucky ones. Probabilistic data generation eliminates the need to write the implementation that builds this data.
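As a rough sketch of what such a declarative description could look like, here is a minimal TypeScript example; the bucket format and sampling helper below are invented for illustration, not GraphGen's actual API:

```typescript
// Hypothetical declarative description: relationship cardinality is a
// probability distribution instead of hand-written wiring code.
type Bucket = { weight: number; min: number; max: number };

const authorArticles: Bucket[] = [
  { weight: 0.5, min: 1, max: 3 },   // 50% of authors have 1-3 articles
  { weight: 0.3, min: 0, max: 0 },   // 30% have no articles
  { weight: 0.2, min: 11, max: 25 }, // 20% have more than ten (cap is arbitrary)
];

// Pick a bucket according to its weight, then a count within its range.
function sampleCount(buckets: Bucket[]): number {
  let roll = Math.random();
  for (const { weight, min, max } of buckets) {
    if (roll < weight) {
      return min + Math.floor(Math.random() * (max - min + 1));
    }
    roll -= weight;
  }
  return 0;
}

// The generator decides how many articles each author gets; no
// per-scenario wiring code is written by hand.
const authors = Array.from({ length: 100 }, (_, i) => ({
  name: `author-${i}`,
  articles: Array.from({ length: sampleCount(authorArticles) }, (_, j) => ({
    title: `article-${i}-${j}`,
  })),
}));
```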
How could we achieve a similar transformation when dealing with the need for simulation-specific resolvers? We can start by describing the attributes of the solution: it should prioritize human understanding over machine instructions, and it should eliminate the need for writing implementation. What is a resolver to a human? To a human, a resolver describes the behavior of an operation. For example, a resolver for a mutation that deletes a record usually deletes that record from the database. An imperative implementation of the resolver would perform the deletion, but we do not need the actual implementation because that’s the machine’s concern. From a human perspective, we only care about the outcomes of the deletion. One of those outcomes is that the record is removed from the database, but what we really care about is that the system no longer returns the object if we attempt to retrieve it after it was deleted. As humans, we care that the system behaves as if the object was deleted, not about the deletion mechanism itself.
I believe we can design a way of describing resolver behavior that lets us generate simulators from a set of behaviors rather than their implementations. We can then teach machines to recognize the outcomes of deletion so they can create simulators that exhibit the outcomes of those imperative operations. What does this look like? I don’t know the specific syntax, but we could create a set of abstract rules that represent a given behavior. For example, an abstract rule for deletion would have one observable quality: the record is no longer returned, whether queried directly or in a setting where it was previously observed. We could apply this rule set to any deletion mutation.
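Since the specific syntax is an open question, here is one hypothetical shape such a rule set could take; the `Rule` and `System` types below are invented for this sketch, not an existing library:

```typescript
// Hypothetical rule format: a deletion rule is defined entirely by its
// observable outcomes, never by how the deletion is implemented.
interface System {
  query(type: string, id: string): Promise<Record<string, unknown> | null>;
  appearsWherePreviouslyObserved(type: string, id: string): Promise<boolean>;
}

interface Rule {
  name: string;
  // Each outcome pairs a human-readable claim with a machine-runnable check.
  outcomes: Array<{
    claim: string;
    check: (system: System, id: string) => Promise<boolean>;
  }>;
}

// The abstract deletion rule, applicable to any deletion mutation.
const deletionRule = (type: string): Rule => ({
  name: `${type} deletion`,
  outcomes: [
    {
      claim: `a deleted ${type} is no longer returned when queried directly`,
      check: async (system, id) => (await system.query(type, id)) === null,
    },
    {
      claim: `a deleted ${type} no longer appears where it was previously observed`,
      check: async (system, id) =>
        !(await system.appearsWherePreviouslyObserved(type, id)),
    },
  ],
});
```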
For example, suppose we have a mutation called `deleteArticle(id)` that returns true on success or an error on failure. For such a mutation, we would automatically generate a default set of rules (sketched below) and make it editable by the user, who should be able to modify the default behavior when necessary by adding user-observable outcomes.
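Continuing the invented format above, the auto-generated default for this mutation might read something like:

```typescript
// Hypothetical auto-generated default for deleteArticle(id), derived from
// the mutation's signature and handed to the user for editing.
const deleteArticleBehavior = {
  mutation: "deleteArticle",
  returns: { onSuccess: true, onFailure: "Error" },
  // On success, the generic deletion rule applies.
  outcomes: deletionRule("Article").outcomes,
};
```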
There will be cases where the generated rules are incorrect, but a human can read them and add the missing information. For example, if the user knows that an article can only be deleted by its owner, they can modify the rules to introduce this knowledge. It might look something like this:
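In the same invented notation, the user's edit might surface the ownership rule like this:

```typescript
// Hypothetical user edit: surface the implicit rule that only the owner
// may delete an article, and describe what a failed attempt looks like.
const ownerOnlyDeletion = {
  mutation: "deleteArticle",
  precondition: "the caller is the owner of the article",
  whenPreconditionFails: [
    "deleteArticle returns an error",
    "the article is still returned when queried directly",
    "the article still appears wherever it was previously observed",
  ],
};
```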
This last part is very hand-wavy, but it points to the fact that we have implicit rules that we do not think about. Surfacing these rules will be necessary for the system to understand how it should behave. In this case, still returning the article after the failed operation is observable behavior of the system after the failure.
We will need to be able to execute these rules to verify the behavior of the system. I believe that making these rules executable can make simulations useful for describing a system’s design before implementation. Once we have the implementation, we can use the same rules to verify it.
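A sketch of what executing the rules could look like, reusing the hypothetical `Rule` and `System` shapes from earlier; the runner below is an assumption, not an existing tool:

```typescript
// Hypothetical runner: execute each outcome's check against a System,
// which can be a simulator before implementation or the real service after.
async function verify(rule: Rule, system: System, id: string): Promise<boolean> {
  let allPassed = true;
  for (const outcome of rule.outcomes) {
    const ok = await outcome.check(system, id);
    console.log(`${ok ? "PASS" : "FAIL"}: ${outcome.claim}`);
    allPassed = allPassed && ok;
  }
  return allPassed;
}

// The same rules describe the design today and verify the implementation later:
// await verify(deletionRule("Article"), simulator, "article-1");
// await verify(deletionRule("Article"), productionApi, "article-1");
```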
The trick to making this system useful is that it has to learn from the feedback provided by the developer. Once it learns that rules like owner-only deletion are possible, it can use them to generate tests that verify cases the developer didn’t think about.