# PuterBot: GUI Process Automation with Transformers

Welcome to PuterBot: GUI Process Automation with Transformers! This library
implements AI-First Process Automation for GUI applications. It:

- Records screenshots and associated user input
- Converts screenshots and user input into a tokenized format
- Feeds tokenized screenshots and user input into transformer models
- Converts transformer output into replayable input events
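As a rough illustration of the tokenization and conversion steps, here is a minimal sketch that serializes a simplified input event into a flat text sequence a language model could consume, then parses text in the same format back into an event. The `Event` class, its fields, and the `key=value` token format are hypothetical assumptions for illustration only; they are not PuterBot's actual `InputEvent` model or tokenization scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical, simplified stand-in for an InputEvent."""

    name: str                  # e.g. "click" or "press"
    x: Optional[int] = None    # mouse x coordinate, if any
    y: Optional[int] = None    # mouse y coordinate, if any
    key: Optional[str] = None  # key name, if any


def event_to_tokens(event: Event) -> str:
    """Serialize an event into whitespace-separated `name key=value` tokens."""
    parts = [event.name]
    if event.x is not None and event.y is not None:
        parts += [f"x={event.x}", f"y={event.y}"]
    if event.key is not None:
        parts.append(f"key={event.key}")
    return " ".join(parts)


def tokens_to_event(text: str) -> Event:
    """Parse text in the same format (e.g. model output) back into an Event."""
    name, *fields = text.split()
    values = dict(field.split("=", 1) for field in fields)
    return Event(
        name=name,
        x=int(values["x"]) if "x" in values else None,
        y=int(values["y"]) if "y" in values else None,
        key=values.get("key"),
    )
```

For example, `event_to_tokens(Event("click", x=120, y=45))` returns `"click x=120 y=45"`, and passing that string to `tokens_to_event` recovers the original event.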

The goal is similar to that of Robotic Process Automation (RPA), except that we
use transformers instead of RPA tools.

The approach is similar to [adept.ai](https://adept.ai/), except that instead
of requiring the user to prompt the model directly, we prompt it behind the
scenes by watching the user's activities.

## Setup

More ReplayStrategies coming soon! (see [Contributing](#Contributing)).

### Problem Statement

Our goal is to automate the task described and demonstrated in a `Recording`.
That is, given a new Screenshot, we want to generate the appropriate
InputEvent(s) based on the previously recorded InputEvents in order to
accomplish the task specified in the `Recording.task_description`, while
accounting for differences in screen resolution, window size, application
behavior, etc.
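The simplest adjustment for a different screen resolution is to rescale recorded mouse coordinates proportionally. The minimal sketch below uses a hypothetical `scale_point` helper (not part of PuterBot) and assumes the application's layout scales uniformly with the screen. That assumption is exactly what breaks when windows move or layouts reflow, which is why strategies that inspect the new Screenshot are needed.

```python
from typing import Tuple


def scale_point(
    x: int,
    y: int,
    old_size: Tuple[int, int],
    new_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point recorded on a screen of old_size=(width, height) onto new_size.

    Assumes the UI scales proportionally with the screen.
    """
    old_width, old_height = old_size
    new_width, new_height = new_size
    return round(x * new_width / old_width), round(y * new_height / old_height)
```

For example, `scale_point(960, 540, (1920, 1080), (1280, 720))` returns `(640, 360)`, the center of the smaller screen.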

### Dataset

The dataset consists of the following entities:
1. `Recording`: Contains information about the screen dimensions, platform, and
other metadata.
2. `InputEvent`: Represents a user input event such as a mouse click or key
press. Each `InputEvent` has an associated `Screenshot` taken immediately
before the event occurred. `InputEvent`s are aggregated to remove
unnecessary events (see [visualize](#visualize)).
3. `Screenshot`: Contains the PNG data of a screenshot taken during the
recording.
4. `WindowEvent`: Represents a window event such as a change in window title,
position, or size.
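To illustrate what aggregating away unnecessary events can look like, the sketch below collapses each run of consecutive mouse moves into its final position. This is a simplification under stated assumptions: `RawEvent` and `merge_consecutive_moves` are hypothetical, and PuterBot's own aggregation may merge other event types or retain different information.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawEvent:
    """Hypothetical raw input event."""

    name: str  # e.g. "move", "click", "press"
    x: int = 0
    y: int = 0


def merge_consecutive_moves(events: List[RawEvent]) -> List[RawEvent]:
    """Collapse each run of consecutive "move" events into its last move."""
    merged: List[RawEvent] = []
    for event in events:
        if event.name == "move" and merged and merged[-1].name == "move":
            # Only the end point of the mouse path is kept, so the later
            # move replaces the earlier one.
            merged[-1] = event
        else:
            merged.append(event)
    return merged
```

Applied to moves to `(0, 0)`, `(5, 5)`, `(10, 10)`, followed by a click and another move to `(20, 20)`, this keeps only the move to `(10, 10)`, the click, and the final move.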

You can assume that you have access to the following functions:
- `get_latest_recording()`: Gets the latest recording.
- `get_events(recording)`: Returns a list of `InputEvent` objects for the given
recording.

### Instructions

1. Fork this repository and clone it to your local machine.
2. Get puterbot up and running by following the instructions under [Setup](#Setup).
3. Create a new file under `strategies` to contain your replay strategy. You
may base your implementation on `naive.py`.
4. Write unit tests for your implementation.
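One possible starting point for a replay strategy is nearest-neighbour lookup: find the recorded event whose screenshot most closely matches the new screenshot, and replay that event. The sketch below is self-contained and hypothetical. `Shot`, `RecordedEvent`, `NearestScreenshotStrategy`, and `get_next_event` are illustrative names, not the interface defined in `naive.py`, and screenshots are modeled as flat lists of grayscale pixel values rather than PNG data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Shot:
    """Hypothetical screenshot: flat list of grayscale pixel values (0-255)."""

    pixels: Sequence[int]


@dataclass
class RecordedEvent:
    """Hypothetical recorded event paired with the screenshot taken before it."""

    name: str
    screenshot: Shot


def pixel_distance(a: Shot, b: Shot) -> int:
    """Sum of absolute per-pixel differences between equal-size screenshots."""
    if len(a.pixels) != len(b.pixels):
        raise ValueError("screenshots must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a.pixels, b.pixels))


class NearestScreenshotStrategy:
    """Replay the recorded event whose screenshot best matches the new one."""

    def __init__(self, events: List[RecordedEvent]):
        self.events = events

    def get_next_event(self, new_screenshot: Shot) -> RecordedEvent:
        # Ties are broken in favour of the earliest recorded event.
        return min(
            self.events,
            key=lambda event: pixel_distance(event.screenshot, new_screenshot),
        )
```

Returning the recorded event unchanged does not yet account for resolution or layout differences; a complete strategy would combine a lookup like this with coordinate adjustment or a learned model.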

See https://github.com/MLDSAI/puterbot/issues for ideas on where to start.

Instead of Segment Anything, you can also try converting screenshots to text via ASCII art, e.g. https://github.com/LeandroBarone/python-ascii_magic.
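The core of the ASCII art idea is mapping pixel brightness onto a ramp of characters. The minimal sketch below assumes the screenshot has already been converted to grayscale and downscaled to a small grid of 0-255 values (for example with Pillow); `to_ascii` and `ASCII_RAMP` are hypothetical names, and the sketch does not use the `ascii_magic` API, which handles those preprocessing steps itself.

```python
from typing import List, Sequence

# Characters ordered from darkest (index 0) to lightest (last index).
ASCII_RAMP = "@%#*+=-:. "


def to_ascii(rows: Sequence[Sequence[int]], ramp: str = ASCII_RAMP) -> str:
    """Convert a grid of grayscale values (0-255) into lines of ASCII characters."""
    lines: List[str] = []
    for row in rows:
        # Scale each 0-255 value onto an index into the ramp.
        chars = [ramp[value * (len(ramp) - 1) // 255] for value in row]
        lines.append("".join(chars))
    return "\n".join(lines)
```

For example, `to_ascii([[0, 128, 255]])` returns `"@+ "`: black maps to `@`, mid-gray to `+`, and white to a space.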

### Evaluation Criteria

Your submission will be evaluated based on the following criteria:

1. **Functionality**: Your implementation should correctly generate the new
`InputEvent` objects that can be replayed in order to accomplish the task in
the original recording.

2. **Code Quality**: Your code should be well-structured, clean, and easy to
understand.

3. **Scalability**: Your solution should be efficient and scale well with
large datasets.

4. **Testing**: Your tests should cover various edge cases and scenarios to
ensure the correctness of your implementation.
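As an example of the kind of edge-case coverage described above, here are `unittest` tests for a small coordinate-clamping helper. `clamp_point` is hypothetical and not part of PuterBot; it assumes a replayed mouse position should be kept on screen, for instance after rescaling coordinates to a smaller display.

```python
import unittest
from typing import Tuple


def clamp_point(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
    """Hypothetical helper: keep a replayed mouse position inside the screen."""
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)


class ClampPointTest(unittest.TestCase):
    def test_point_inside_screen_is_unchanged(self):
        self.assertEqual(clamp_point(10, 20, 1280, 720), (10, 20))

    def test_negative_coordinates_clamp_to_zero(self):
        self.assertEqual(clamp_point(-5, -1, 1280, 720), (0, 0))

    def test_coordinates_past_edge_clamp_to_last_pixel(self):
        self.assertEqual(clamp_point(1280, 9999, 1280, 720), (1279, 719))
```

Saved as, e.g., `test_clamp.py`, these tests run with `python -m unittest`.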

### Submission

1. Commit your changes to your forked repository.

2. Create a pull request to the original repository with your changes.

3. In your pull request, include a brief summary of your approach, any
assumptions you made, and how you integrated external libraries.

4. *Bonus*: Interacting with ChatGPT and/or other language transformer models
in order to generate code and/or evaluate design decisions is encouraged. If
you choose to do so, please include the full transcript.


## We're hiring!

If you're interested in getting paid for your work, please address one or more
of the issues labelled "Internship" (full-time hires will also be considered).

https://github.com/MLDSAI/puterbot/issues?q=is%3Aissue+is%3Aopen+label%3AInternship

## Troubleshooting
