update README

OpenAdaptAI · Apr 17, 2023 · 44cafc4
1 parent 7969e98
commit 44cafc4
Showing 1 changed file with 57 additions and 20 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,19 @@
 # PuterBot: GUI Process Automation with Transformers
 
-Welcome to PuterBot: GUI Process Automation with Transformers! We are working with a dataset of user input events, screenshots, and window events. Our task is to generate the appropriate InputEvent(s) based on the previously recorded InputEvents and associated Screenshots, such that the task in the recording is accomplished, while accounting for differences in screen resolution, window size, application behavior, etc.
+Welcome to PuterBot: GUI Process Automation with Transformers! This library
+implements AI-First Process Automation for GUI applications. It:
+
+- Records screenshots and associated user input,
+- Converts screenshots and user input into a tokenized format,
+- Feeds tokenized screenshots and user input into transformer models,
+- Converts transformer output into replayable input events.
+
+The goal is similar to that of Robotic Process Automation, except that we use
+transformers instead of RPA tools.
+
+The approach is similar to [adept.ai](https://adept.ai/), except that instead
+of requiring the user to prompt the model directly, we prompt it behind the
+scenes by watching the user's activities.
 
 ## Setup
 
@@ -64,55 +77,79 @@ More ReplayStrategies coming soon! (see [Contributing](#Contributing)).
 
 ### Problem Statement
 
-Given a new Screenshot, we want to generate the appropriate InputEvent(s) based on the previously recorded InputEvents in order to accomplish the task specified in the `Recording.task_description`. Each Screenshot is taken immediately before its associated InputEvent. InputEvents contain raw mouse and keyboard data which have been aggregated to remove unnecessary events.
+Our goal is to automate the task described and demonstrated in a `Recording`.
+That is, given a new `Screenshot`, we want to generate the appropriate
+`InputEvent`(s) based on the previously recorded `InputEvent`s in order to
+accomplish the task specified in the `Recording.task_description`, while
+accounting for differences in screen resolution, window size, application
+behavior, etc.
 
 ### Dataset
 
 The dataset consists of the following entities: 
-1. `Recording`: Contains information about the screen dimensions, platform, and other metadata. 
-2. `InputEvent`: Represents a user input event such as a mouse click or key press. Each InputEvent has an associated Screenshot taken immediately before the event. 
-3. `Screenshot`: Contains the PNG data of a screenshot taken during the recording. 
-4. `WindowEvent`: Represents a window event such as a change in window title, position, or size.
+1. `Recording`: Contains information about the screen dimensions, platform, and
+   other metadata.
+2. `InputEvent`: Represents a user input event such as a mouse click or key
+   press. Each `InputEvent` has an associated `Screenshot` taken immediately
+   before the event occurred. `InputEvent`s are aggregated to remove
+   unnecessary events (see [visualize](#visualize)).
+3. `Screenshot`: Contains the PNG data of a screenshot taken during the
+   recording.
+4. `WindowEvent`: Represents a window event such as a change in window title,
+   position, or size.
 
 You can assume that you have access to the following functions: 
-- `get_recording()`: Gets the latest recording. 
-- `get_events(recording)`: Returns a list of `InputEvent` objects for the given recording.
+- `get_latest_recording()`: Gets the latest recording.
+- `get_events(recording)`: Returns a list of `InputEvent` objects for the given
+  recording.
 
 ### Instructions
 
 1. Fork this repository and clone it to your local machine. 
 2. Get puterbot up and running by following the instructions under [Setup](#Setup).
-3. Implement a Python function `generate_input_event(new_screenshot, recording)`, where:
-- `new_screenshot`: A `Screenshot` object representing the new screenshot. 
-- `recording`: A `Recording` whose `.screenshots` property is a list of `InputEvent` objects from a previous recording, with each InputEvent having an associated Screenshot.
-- This function should return a new `InputEvent` object that can be used to replay the recording, taking into account differences in screen resolution, window size, and/or application behavior.
+3. Create a new file under `strategies` to contain your replay strategy. You
+   may base your implementation on `naive.py`.
 4. Write unit tests for your implementation.
 
 See https://github.com/MLDSAI/puterbot/issues for ideas on where to start.
 
-Instead of Segment Anything, you can also try converting screenshots to text via ASCII art, e.g. https://github.com/LeandroBarone/python-ascii_magic.
-
 ### Evaluation Criteria
 
 Your submission will be evaluated based on the following criteria: 
 
-1. **Functionality** : Your implementation should correctly generate the new `InputEvent` objects that can be replayed in order to accomplish the task in the original recording.
+1. **Functionality**: Your implementation should correctly generate the new
+   `InputEvent` objects that can be replayed in order to accomplish the task in
+   the original recording.
 
-2. **Code Quality** : Your code should be well-structured, clean, and easy to understand. 
+2. **Code Quality**: Your code should be well-structured, clean, and easy to
+   understand.
 
-3. **Scalability** : Your solution should be efficient and scale well with large datasets. 
+3. **Scalability**: Your solution should be efficient and scale well with
+   large datasets.
 
-4. **Testing** : Your tests should cover various edge cases and scenarios to ensure the correctness of your implementation.
+4. **Testing**: Your tests should cover various edge cases and scenarios to
+   ensure the correctness of your implementation.
 
 ### Submission
 
 1. Commit your changes to your forked repository.
 
 2. Create a pull request to the original repository with your changes.
 
-3. In your pull request, include a brief summary of your approach, any assumptions you made, and how you integrated external libraries.
+3. In your pull request, include a brief summary of your approach, any
+   assumptions you made, and how you integrated external libraries.
+
+4. *Bonus*: interacting with ChatGPT and/or other language transformer models
+   in order to generate code and/or evaluate design decisions is encouraged. If
+   you choose to do so, please include the full transcript.
+
+
+## We're hiring!
+
+If you're interested in getting paid for your work, please address one or more
+of the issues labelled "Internship" (full-time hires will also be considered).
 
-4. *Bonus*: interacting with ChatGPT and/or other language transformer models in order to generate code and/or evaluate design decisions is encouraged. If you choose to do so, please include the full transcript.
+https://github.com/MLDSAI/puterbot/issues?q=is%3Aissue+is%3Aopen+label%3AInternship
 
 ## Troubleshooting
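
Note on the updated Instructions: the README asks contributors to add a replay strategy under `strategies`, but this diff does not show what one looks like. The following is a minimal sketch of one naive idea only, rescaling recorded mouse coordinates to a new screen resolution. The `InputEvent` and `Recording` classes here are hypothetical stand-ins defined for illustration, and `rescale_event` and `naive_replay` are made-up names; none of them is puterbot's actual API, and the real `naive.py` may work differently.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class InputEvent:
    """Hypothetical stand-in; not puterbot's actual InputEvent model."""
    name: str                          # e.g. "click" or "press"
    mouse_x: Optional[float] = None    # None for keyboard events
    mouse_y: Optional[float] = None
    key: Optional[str] = None          # None for mouse events


@dataclass(frozen=True)
class Recording:
    """Hypothetical stand-in holding only the fields this sketch needs."""
    task_description: str
    screen_width: int
    screen_height: int
    input_events: List[InputEvent]


def rescale_event(event: InputEvent, recording: Recording,
                  new_width: int, new_height: int) -> InputEvent:
    """Map a recorded mouse position onto a screen of a different size."""
    if event.mouse_x is None or event.mouse_y is None:
        # Keyboard events carry no coordinates, so they pass through as-is.
        return event
    return replace(
        event,
        mouse_x=event.mouse_x * new_width / recording.screen_width,
        mouse_y=event.mouse_y * new_height / recording.screen_height,
    )


def naive_replay(recording: Recording, new_width: int,
                 new_height: int) -> List[InputEvent]:
    """Return the recorded events, rescaled for the new screen size."""
    return [
        rescale_event(event, recording, new_width, new_height)
        for event in recording.input_events
    ]


if __name__ == "__main__":
    demo = Recording(
        task_description="click a button, then press enter",
        screen_width=1000,
        screen_height=500,
        input_events=[
            InputEvent(name="click", mouse_x=100, mouse_y=50),
            InputEvent(name="press", key="enter"),
        ],
    )
    for event in naive_replay(demo, new_width=2000, new_height=1000):
        print(event)
```

Proportional scaling only addresses the screen-resolution part of the problem statement. Differences in window size and application behavior would need the screenshot itself, which is where a transformer-based strategy would come in.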