OpenPAI is an open source platform that provides complete AI model training and resource management capabilities. It is easy to extend and supports on-premises, cloud, and hybrid environments at various scales.
- When to consider OpenPAI
- Why choose OpenPAI
- How to deploy
- How to use
- Resources
- Get Involved
- How to contribute
- When your organization needs to share powerful AI computing resources (GPU/FPGA farm, etc.) among teams.
- When your organization needs to share and reuse common AI assets like Model, Data, Environment, etc.
- When your organization needs an easy IT ops platform for AI.
- When you want to run a complete training pipeline in one place.
The platform incorporates a mature design with a proven track record in Microsoft's large-scale production environment.
OpenPAI is a full stack solution. It supports on-premises, hybrid, and public cloud deployment, as well as single-box deployment for trial users.
Pre-built Docker images for popular AI frameworks. Easy to include heterogeneous hardware. Supports distributed training, such as distributed TensorFlow.
OpenPAI is a comprehensive solution for deep learning: it supports virtual clusters, is compatible with the Hadoop / Kubernetes ecosystem, and runs a complete training pipeline on one cluster. OpenPAI is architected in a modular way: different modules can be plugged in as appropriate.
Before you start, make sure the following requirements are met:
- Ubuntu 16.04
- Each server has a static IP address, and the servers can reach each other over the network.
- Servers can access the external network, in particular a Docker registry service (e.g., Docker Hub), to pull the Docker images for the services to be deployed.
- The SSH service is enabled on all machines, which share the same username / password, and that account has sudo privileges.
- The NTP service is enabled.
- Recommended: either no Docker installed, or Docker with API version >= 1.26.
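
The Docker item in the list above is easy to get wrong when versions are compared as plain strings (for example, "1.9" sorts after "1.26" as text but is the older version). The following is a minimal sketch of a pre-flight check, not part of OpenPAI itself: the function names `parse_api_version` and `docker_requirement_met` are hypothetical, and the check assumes the API version is given as a `major.minor` string, or `None` when Docker is not installed.

```python
from typing import Optional, Tuple

# Minimum Docker API version taken from the requirements list above.
MIN_DOCKER_API: Tuple[int, int] = (1, 26)


def parse_api_version(version: str) -> Tuple[int, int]:
    """Parse a 'major.minor' API version string into a tuple of ints."""
    major, minor = version.strip().split(".")[:2]
    return int(major), int(minor)


def docker_requirement_met(installed_api_version: Optional[str]) -> bool:
    """Hypothetical helper: True if Docker is absent or its API is >= 1.26.

    Pass None when Docker is not installed on the machine.
    """
    if installed_api_version is None:
        return True
    # Tuples compare component by component, so (1, 9) < (1, 26).
    return parse_api_version(installed_api_version) >= MIN_DOCKER_API
```

One way to obtain the version string on a machine is `docker version --format '{{.Server.APIVersion}}'`; treat that invocation as an assumption to verify against your Docker installation before relying on it.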
- How to write PAI jobs
- How to submit PAI jobs
- How to request on-demand resources for in-place training
- The OpenPAI user documentation provides in-depth instructions for using OpenPAI.
- Visit the release notes to read about the new features, or download the release today.
- Stack Overflow: If you have questions about OpenPAI, please post them on Stack Overflow under the tag: openpai
- Report an issue: If you find a bug or want to request a new feature, please submit it on GitHub
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
- Folks who want to add support for other ML and DL frameworks
- Folks who want to make OpenPAI a richer AI platform (e.g. support for more ML pipelines, hyperparameter tuning)
- Folks who want to write tutorials/blog posts showing how to use OpenPAI to solve AI problems
One key purpose of PAI is to support the highly diversified requirements from academia and industry. PAI is completely open: it is under the MIT license. This makes PAI particularly attractive for evaluating various research ideas, which include but are not limited to the components.
PAI operates in an open model. It was initially designed and developed by Microsoft Research (MSR) and the Microsoft Search Technology Center (STC) platform team. We are glad to have Peking University, Xi'an Jiaotong University, Zhejiang University, and University of Science and Technology of China join us to develop the platform jointly. Contributions from academia and industry are all highly welcome.