A framework for distributed graph computation and machine learning at WeChat scale. For more details, see 柏拉图简介 | Plato Introduction.
Authors (in alphabetical order): Benli Li, Conghui He, Donghai Yu, Pin Gao, Shijie Sun, Wenqiang Wu, Wanjing Wei, Xing Huang, Xiaogang Tu, Yangzihao Wang, Yongan Li.
Contact: [email protected]
Special thanks to Xiaowei Zhu and many others for their work on Gemini[1]. Several basic utility functions in Plato are derived from Gemini, and the design of some dual-mode algorithms in Plato is heavily influenced by Gemini's dual-mode engine. Thanks also to Ke Yang and many others for their work on KnightKing[2], which served as the foundation of Plato's walk engine.
To simplify installation, Plato downloads and builds most of its required dependencies when you run the following commands. Run them at least once before any build operation.
# install compile dependencies.
sudo ./docker/install-dependencies.sh
# download and build statically linked libraries.
./3rdtools.sh distclean && ./3rdtools.sh install
Plato was developed and tested on an x86_64 cluster running CentOS 7.0. In principle, it can be ported to other Linux distributions easily.
./build.sh
./scripts/run_pagerank_local.sh
Prerequisites:
- A cluster to which MPI programs can be submitted (Hydra is a feasible solution).
- An accessible HDFS from which Plato can read its input and to which it can write its output.
A sample submit script is located here; modify it to match your cluster's environment, then run it.
./scripts/run_pagerank.sh
[1] Xiaowei Zhu, Wenguang Chen, Weimin Zheng, Xiaosong Ma. Gemini: A Computation-Centric Distributed Graph Processing System. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16).
[2] Ke Yang, Mingxing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, Yong Jiang. KnightKing: A Fast Distributed Graph Random Walk Engine. In ACM SIGOPS 27th Symposium on Operating Systems Principles (SOSP ’19).
Three graph algorithms are implemented on the Plato system: PersonalizedPageRank (personalized PageRank), TrustRank (trust ranking), and BeliefPropagation (belief propagation).
- Core algorithm file:
plato/algo/ppr/personalized_pagerank.hpp
- Algorithm CLI application file:
example/personalized_pagerank.cc
- Run script:
scripts/run_ppr_local.sh
- Algorithm version optimized with PUSH-PULL switching:
example/pushpull_ppr.cc and scripts/run_pushpull_ppr_local.sh
- Correctness verification: checked against the PageRank implementations in Spark-GraphX and Neo4j
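For readers unfamiliar with personalized PageRank, the following is a minimal single-machine sketch of the idea, not the code in personalized_pagerank.hpp. The function name `personalized_pagerank`, its parameters, and the adjacency-list input are hypothetical choices made for illustration, and the handling of dangling vertices (their mass returns to the source) is an assumption. It is written in push style only; it does not attempt the PUSH-PULL switching used by the optimized version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Plato: personalized PageRank by power iteration.
// out_edges[u] lists the targets of u's outgoing edges (all targets must be < n).
std::vector<double> personalized_pagerank(
    const std::vector<std::vector<std::size_t>>& out_edges,
    std::size_t source, double damping = 0.85, int iterations = 50) {
  const std::size_t n = out_edges.size();
  assert(source < n);
  std::vector<double> rank(n, 0.0);
  rank[source] = 1.0;  // all probability mass starts at the source vertex
  for (int it = 0; it < iterations; ++it) {
    std::vector<double> next(n, 0.0);
    double teleport = 1.0 - damping;  // mass that restarts at the source each step
    for (std::size_t u = 0; u < n; ++u) {
      if (out_edges[u].empty()) {
        // Assumption: a dangling vertex returns its mass to the source.
        teleport += damping * rank[u];
        continue;
      }
      // Push style: u splits its damped rank evenly over its out-edges.
      const double share = damping * rank[u] / out_edges[u].size();
      for (std::size_t v : out_edges[u]) next[v] += share;
    }
    next[source] += teleport;
    rank.swap(next);
  }
  return rank;
}
```

Unlike global PageRank, every restart lands on the single source vertex, so vertices that cannot be reached from the source keep a score of zero.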
- Core algorithm file:
plato/algo/trustrank/trustrank.hpp
- Algorithm CLI application file:
example/trustrank.cc
- Run script:
scripts/run_trustrank_local.sh
- Correctness verification: checked against the TrustRank paper and the bhaveshgawri/PageRank implementation
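As a rough illustration of what TrustRank computes, the sketch below biases PageRank toward a set of trusted seed vertices. It is not the code in trustrank.hpp: the function name `trustrank`, its parameters, and the uniform seed distribution are hypothetical, and it assumes that trust held by vertices with no out-links is simply not propagated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Plato's API: TrustRank as PageRank biased toward trusted seeds.
// out_edges[u] lists the targets of u's outgoing links (all targets must be < n).
std::vector<double> trustrank(
    const std::vector<std::vector<std::size_t>>& out_edges,
    const std::vector<std::size_t>& seeds,
    double damping = 0.85, int iterations = 50) {
  const std::size_t n = out_edges.size();
  assert(!seeds.empty());
  // Seed distribution: uniform over the trusted seeds, zero everywhere else.
  std::vector<double> seed_dist(n, 0.0);
  for (std::size_t s : seeds) {
    assert(s < n);
    seed_dist[s] += 1.0 / seeds.size();
  }
  std::vector<double> trust = seed_dist;
  for (int it = 0; it < iterations; ++it) {
    std::vector<double> next(n, 0.0);
    for (std::size_t u = 0; u < n; ++u) {
      // Assumption: trust at vertices without out-links is not propagated.
      if (out_edges[u].empty()) continue;
      // Trust is split evenly over out-links and attenuated by the damping factor.
      const double share = damping * trust[u] / out_edges[u].size();
      for (std::size_t v : out_edges[u]) next[v] += share;
    }
    // The remaining weight is re-injected at the trusted seeds only.
    for (std::size_t v = 0; v < n; ++v) next[v] += (1.0 - damping) * seed_dist[v];
    trust.swap(next);
  }
  return trust;
}
```

Because trust enters the graph only at the seeds, a vertex that no seed can reach ends with a score of zero, and trust decays with link distance from the seeds.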
- Core algorithm file:
plato/algo/bp/belief_propagation.hpp
- Algorithm CLI application file:
example/belief_propagation.cc
- Run script:
scripts/run_bp_local.sh
- Correctness verification: checked against the HewlettPackard/sandpiper and mbforbes/py-factorgraph implementations
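To give a feel for the message passing behind belief propagation, here is a small sum-product sketch restricted to a chain of binary variables, where the computed marginals are exact. It is not the code in belief_propagation.hpp, which targets distributed graphs. The names `Belief`, `normalize`, and `chain_belief_propagation`, and the single shared pairwise potential, are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Belief = std::array<double, 2>;  // weights for the states 0 and 1

// Rescale a pair of non-negative weights so that they sum to 1.
static Belief normalize(Belief b) {
  const double z = b[0] + b[1];
  assert(z > 0.0);
  return {b[0] / z, b[1] / z};
}

// Hypothetical helper: sum-product BP on a chain x0 - x1 - ... - x(n-1).
// unary[i][s] is the local evidence for x_i = s; pairwise[s][t] couples neighbours.
std::vector<Belief> chain_belief_propagation(
    const std::vector<Belief>& unary,
    const std::array<Belief, 2>& pairwise) {
  const std::size_t n = unary.size();
  assert(n > 0);
  // fwd[i]: message reaching x_i from its left; bwd[i]: message from its right.
  std::vector<Belief> fwd(n, Belief{1.0, 1.0}), bwd(n, Belief{1.0, 1.0});
  for (std::size_t i = 1; i < n; ++i) {
    Belief m{0.0, 0.0};
    for (int t = 0; t < 2; ++t)
      for (int s = 0; s < 2; ++s)  // sum out the left neighbour's state s
        m[t] += unary[i - 1][s] * fwd[i - 1][s] * pairwise[s][t];
    fwd[i] = normalize(m);
  }
  for (std::size_t i = n - 1; i > 0; --i) {
    Belief m{0.0, 0.0};
    for (int s = 0; s < 2; ++s)
      for (int t = 0; t < 2; ++t)  // sum out the right neighbour's state t
        m[s] += pairwise[s][t] * unary[i][t] * bwd[i][t];
    bwd[i - 1] = normalize(m);
  }
  // Belief at x_i: local evidence times both incoming messages, normalized.
  std::vector<Belief> beliefs(n);
  for (std::size_t i = 0; i < n; ++i)
    beliefs[i] = normalize({unary[i][0] * fwd[i][0] * bwd[i][0],
                            unary[i][1] * fwd[i][1] * bwd[i][1]});
  return beliefs;
}
```

On graphs with cycles the same update is applied iteratively ("loopy" BP) and the resulting beliefs are approximations rather than exact marginals.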