CogDL now supports the following datasets for different tasks:
- Network embedding (unsupervised node classification): PPI, BlogCatalog, Wikipedia, Youtube, DBLP, Flickr
- Semi/Un-supervised node classification: Cora, Citeseer, Pubmed, Reddit, PPI, PPI-large, Yelp, Flickr, Amazon
- Heterogeneous node classification: DBLP, ACM, IMDB
- Link prediction: PPI, Wikipedia, BlogCatalog
- Multiplex link prediction: Amazon, YouTube, Twitter
- Graph classification: MUTAG, IMDB-B, IMDB-M, PROTEINS, COLLAB, NCI1, NCI109, REDDIT-BINARY
Dataset | #Nodes | #Edges | #Features | #Classes | #Train/Val/Test | #Degree | #Name in Cogdl | |
---|---|---|---|---|---|---|---|---|
Transductive | ||||||||
Cora | 2,708 | 5,429 | 1,433 | 7(s) | 140 / 500 / 1000 | 2 | cora | |
Citeseer | 3,327 | 4,732 | 3,703 | 6(s) | 120 / 500 / 1000 | 1 | citeseer | |
PubMed | 19,717 | 44,338 | 500 | 3(s) | 60 / 500 / 1999 | 2 | pubmed | |
Chameleon | 2,277 | 36,101 | 2,325 | 5 | 0.48 / 0.32 / 0.20 | 16 | chameleon | |
Cornell | 183 | 298 | 1,703 | 5 | 0.48 / 0.32 / 0.20 | 1.6 | cornell | |
Film | 7,600 | 30,019 | 932 | 5 | 0.48 / 0.32 / 0.20 | 4 | film | |
Squirrel | 5,201 | 217,073 | 2,089 | 5 | 0.48 / 0.32 / 0.20 | 41.7 | squirrel | |
Texas | 182 | 325 | 1,703 | 5 | 0.48 / 0.32 / 0.20 | 1.8 | texas | |
Wisconsin | 251 | 515 | 1,703 | 5 | 0.48 / 0.32 / 0.20 | 2 | wisconsin | |
Inductive | ||||||||
PPI | 14,755 | 225,270 | 50 | 121(m) | 0.66 / 0.12 / 0.22 | 15 | ppi | |
PPI-large | 56,944 | 818,736 | 50 | 121(m) | 0.79 / 0.11 / 0.10 | 14 | ppi-large | |
Reddit | 232,965 | 11,606,919 | 602 | 41(s) | 0.66 / 0.10 / 0.24 | 50 | reddit | |
Flickr | 89,250 | 899,756 | 500 | 7(s) | 0.50 / 0.25 / 0.25 | 10 | flickr | |
Yelp | 716,847 | 6,977,410 | 300 | 100(m) | 0.75 / 0.10 / 0.15 | 10 | yelp | |
Amazon-SAINT | 1,598,960 | 132,169,734 | 200 | 107(m) | 0.85 / 0.05 / 0.10 | 83 | amazon-s |
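
For rows whose #Train/Val/Test column gives fractions rather than counts, approximate split sizes follow from multiplying each fraction by #Nodes. The sketch below is illustrative only: `split_counts` is a hypothetical helper, not part of CogDL, and the splits actually shipped with a dataset may differ by a few nodes because the listed fractions are rounded.

```python
def split_counts(num_nodes, fractions):
    """Turn (train, val, test) fractions into approximate node counts."""
    train_frac, val_frac, _ = fractions
    train = round(num_nodes * train_frac)
    val = round(num_nodes * val_frac)
    # Assign the remainder to the test split so the counts sum to num_nodes.
    test = num_nodes - train - val
    return train, val, test


# Node counts and fractions copied from the table above.
ppi = split_counts(14_755, (0.66, 0.12, 0.22))
chameleon = split_counts(2_277, (0.48, 0.32, 0.20))
print("PPI:", ppi)
print("Chameleon:", chameleon)
```

Giving the remainder to the test split is a design choice of this sketch; it guarantees the three counts always add up to the node count.
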
Dataset | #Nodes | #Edges | #Classes | #Degree | #Name in Cogdl |
---|---|---|---|---|---|
PPI | 3,890 | 76,584 | 50(m) | 20 | ppi-ne |
BlogCatalog | 10,312 | 333,983 | 40(m) | 32 | blogcatalog |
Wikipedia | 4,777 | 184,812 | 39(m) | 39 | wikipedia |
Flickr | 80,513 | 5,899,882 | 195(m) | 73 | flickr-ne |
DBLP | 51,264 | 2,990,443 | 60(m) | 2 | dblp-ne |
Youtube | 1,138,499 | 2,990,443 | 47(m) | 3 | youtube-ne |
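
The #Degree column appears to be #Edges divided by #Nodes, rounded to the nearest integer. That reading is an assumption inferred from the numbers, not something the table states. A minimal sketch that recomputes it for the rows above (`average_degree` is a hypothetical helper name):

```python
# (nodes, edges, listed degree), copied from the table above.
ROWS = {
    "PPI": (3_890, 76_584, 20),
    "BlogCatalog": (10_312, 333_983, 32),
    "Wikipedia": (4_777, 184_812, 39),
    "Flickr": (80_513, 5_899_882, 73),
    "DBLP": (51_264, 2_990_443, 2),
    "Youtube": (1_138_499, 2_990_443, 3),
}


def average_degree(nodes, edges):
    """Edges per node, under the assumed definition of the #Degree column."""
    return edges / nodes


# Collect rows whose recomputed degree disagrees with the listed value.
mismatches = [
    name
    for name, (nodes, edges, listed) in ROWS.items()
    if round(average_degree(nodes, edges)) != listed
]

for name, (nodes, edges, listed) in ROWS.items():
    print(f"{name}: computed {average_degree(nodes, edges):.1f}, listed {listed}")
print("Rows that do not match:", mismatches)
```

Under this assumption, every row reproduces its listed degree except DBLP, so that row's edge count or degree is worth checking against the dataset itself.
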
Dataset | #Nodes | #Edges | #Features | #Classes | #Train/Val/Test | #Degree | #Edge Type | #Name in Cogdl |
---|---|---|---|---|---|---|---|---|
DBLP | 18,405 | 67,946 | 334 | 4 | 800 / 400 / 2857 | 4 | 4 | gtn-dblp(han-dblp) |
ACM | 8,994 | 25,922 | 1,902 | 3 | 600 / 300 / 2125 | 3 | 4 | gtn-acm(han-acm) |
IMDB | 12,772 | 37,288 | 1,256 | 3 | 300 / 300 / 2339 | 3 | 4 | gtn-imdb(han-imdb) |
Amazon-GATNE | 10,166 | 148,863 | - | - | - | 15 | 2 | amazon |
Youtube-GATNE | 2,000 | 1,310,617 | - | - | - | 655 | 5 | youtube |
Twitter-GATNE | 10,000 | 331,899 | - | - | - | 33 | 4 | twitter |
Dataset | #Nodes | #Edges | #Train/Val/Test | #Relations Types | #Degree | #Name in Cogdl |
---|---|---|---|---|---|---|
FB13 | 75,043 | 345,872 | 316,232 / 5,908 / 23,733 | 12 | 5 | fb13 |
FB15k | 14,951 | 592,213 | 483,142 / 50,000 / 59,071 | 1345 | 40 | fb15k |
FB15k-237 | 14,541 | 310,116 | 272,115 / 17,535 / 20,466 | 237 | 21 | fb15k237 |
WN18 | 40,943 | 151,442 | 141,442 / 5,000 / 5,000 | 18 | 4 | wn18 |
WN18RR | 86,835 | 93,003 | 86,835 / 3,034 / 3,134 | 11 | 1 | wn18rr |
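
For these knowledge graph datasets, #Edges appears to be the total number of triples, so the three split sizes should add up to it. This is an inferred reading, not something the table states. A quick consistency check, with illustrative variable names:

```python
# (listed #Edges, (train, val, test)), copied from the table above.
ROWS = {
    "FB13": (345_872, (316_232, 5_908, 23_733)),
    "FB15k": (592_213, (483_142, 50_000, 59_071)),
    "FB15k-237": (310_116, (272_115, 17_535, 20_466)),
    "WN18": (151_442, (141_442, 5_000, 5_000)),
    "WN18RR": (93_003, (86_835, 3_034, 3_134)),
}

# Difference between the summed splits and the listed edge count, per dataset.
differences = {
    name: sum(splits) - edges for name, (edges, splits) in ROWS.items()
}

for name, diff in differences.items():
    print(f"{name}: splits minus #Edges = {diff}")
```

With the figures as listed, the splits sum exactly to #Edges for every row except FB13, which is off by one.
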
TUDataset, from https://www.chrsmrrs.com/graphkerneldatasets
Dataset | #Graphs | #Classes | #Avg. Size | #Name in Cogdl |
---|---|---|---|---|
MUTAG | 188 | 2 | 17.9 | mutag |
IMDB-B | 1,000 | 2 | 19.8 | imdb-b |
IMDB-M | 1,500 | 3 | 13 | imdb-m |
PROTEINS | 1,113 | 2 | 39.1 | proteins |
COLLAB | 5,000 | 5 | 74.5 | collab |
NCI1 | 4,110 | 2 | 29.8 | nci1 |
NCI109 | 4,127 | 2 | 29.7 | nci109 |
PTC-MR | 344 | 2 | 14.3 | ptc-mr |
REDDIT-BINARY | 2,000 | 2 | 429.7 | reddit-b |
REDDIT-MULTI-5k | 4,999 | 5 | 508.5 | reddit-multi-5k |
REDDIT-MULTI-12k | 11,929 | 11 | 391.5 | reddit-multi-12k |
BBBP | 2,039 | 2 | 24 | bbbp |
BACE | 1,513 | 2 | 34.1 | bace |