https://t500-aggregator.aryankothari.dev/
T500 Aggregator is a suite of tools and a web service to collect and display data on the Overwatch 2 Top 500 leaderboards.
Data is collected directly from the Overwatch 2 leaderboards and then processed with a perceptual hashing algorithm called "dhash" (difference hash). Each hero present in a leaderboard image is hashed, and that hash is compared against the hashes of known hero assets using Hamming distance to determine which hero it most closely matches.
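The matching step can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes each hero crop has already been converted to grayscale and resized to a small grid (9 columns by 8 rows is a common choice for a 64-bit dhash), and the names `dhash`, `hamming_distance`, `closest_hero`, `hero_a` and `hero_b` are hypothetical.

```python
from typing import Dict, List, Tuple


def dhash(gray: List[List[int]]) -> int:
    """Difference hash of a grayscale grid (rows of equal width).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so an 8-row by 9-column grid yields a 64-bit hash.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(a ^ b).count("1")


def closest_hero(candidate: int, known: Dict[str, int]) -> Tuple[str, int]:
    """Return the known asset with the smallest Hamming distance."""
    name = min(known, key=lambda n: hamming_distance(candidate, known[n]))
    return name, hamming_distance(candidate, known[name])


# Tiny 2x3 grids keep the example readable; real crops would be larger.
known_assets = {
    "hero_a": dhash([[30, 20, 10], [30, 20, 10]]),  # hypothetical asset
    "hero_b": dhash([[10, 20, 30], [10, 20, 30]]),  # hypothetical asset
}
scanned = dhash([[30, 20, 25], [30, 20, 10]])
match, distance = closest_hero(scanned, known_assets)
print(match, distance)
```

A smaller distance means a closer visual match. A real pipeline would likely also reject matches above some distance threshold, which is not shown here.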
The data is stored in the ./data directory within the repository.
The final processed dataset matches what is observed with 100% accuracy on the benchmarks. This is measured by running the comparison algorithm on the benchmarks, which can be found in the ./assets/benchmark/ directory.
These benchmarks are updated regularly to identify changes in the leaderboard images, as well as to performance test the impact of adding a new hero.
Outside of the benchmarks, it is possible that the data is not 100% accurate, as I'm unable to fully verify each scanned hero against its actual counterpart.
The leaderboards provide a snapshot of the heroes being played in the game. More specifically, they show the top 3 heroes for any given player. This prevents you from truly understanding the pick rates of a hero.
However, the actual data can be approximated.
For example, given a player's 3 most played heroes, you can approximate the playtime of the second and third most played heroes as a percentage of the most played hero's playtime. These values can be found in ./frontend/app/server/actions.ts. The weights are determined by scraping public top 500 profiles and estimating the actual ratios. A sample size of 280, spanning all regions and seasons up to season 14, is used for this calculation.
In a normal scenario, each of the top 3 would receive a full point. So in a chart, a player with Juno as their most played hero would account for 1 entry, and another player with Kiriko as their second most played hero would also account for 1 entry.
Weighting the values transforms that into Juno=1 and Kiriko=0.5 (sample value, see actions.ts for the real weight).
Both the raw and weighted values are available in the frontend.
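The difference between raw and weighted counts can be illustrated with a short sketch. The weights below are placeholders chosen for the example only (the real values live in actions.ts), and the function name `tally` and the sample players are hypothetical.

```python
from collections import Counter, defaultdict

# Placeholder weights for the first, second and third most played slots.
# These are NOT the project's real weights; see actions.ts for those.
SLOT_WEIGHTS = (1.0, 0.5, 0.25)


def tally(players):
    """Count hero appearances both raw (1 per slot) and weighted by slot."""
    raw = Counter()
    weighted = defaultdict(float)
    for top_heroes in players:
        for hero, weight in zip(top_heroes, SLOT_WEIGHTS):
            raw[hero] += 1
            weighted[hero] += weight
    return dict(raw), dict(weighted)


# Each inner list is one player's top 3, ordered from most to least played.
players = [
    ["Juno", "Ana", "Brigitte"],
    ["Ana", "Kiriko", "Juno"],
]
raw, weighted = tally(players)
print(raw)
print(weighted)
```

With raw counting, Juno and Ana tie at 2 entries each. With the placeholder weights, Ana (1.5) ranks above Juno (1.25), because one of Juno's appearances is only in the third slot.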
The Gini Coefficient is a measure of inequality.
In short, this value allows you to determine how evenly a distribution of values is spread. In this case, it represents how equally the heroes are played.
A value of 0 means that the distribution is perfectly even (for example, each hero is picked the same number of times), while a value of 1 indicates maximal inequality, where the picks are concentrated in a single hero.
In top 500, lower picked heroes occur disproportionately rarely, as they simply do not appear in the top 3 very often. For this reason, heroes in the bottom 10th percentile are excluded when calculating the Gini Coefficient.
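As a rough sketch of the calculation, the Gini Coefficient of a list of hero pick counts can be computed from the sorted values. How the project applies the 10th percentile cut is not specified here, so `drop_lowest` is one possible interpretation (dropping the lowest 10% of heroes by count), and both function names are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values, ranks starting at 1.
    ranked_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * ranked_sum) / (n * total) - (n + 1) / n


def drop_lowest(values, fraction=0.10):
    """Assumed interpretation: discard the lowest `fraction` of entries."""
    xs = sorted(values)
    k = int(len(xs) * fraction)
    return xs[k:]


# Hypothetical pick counts: one rarely seen hero and nine evenly picked ones.
pick_counts = [1] + [50] * 9
print(gini(pick_counts))
print(gini(drop_lowest(pick_counts)))
```

For a finite list of n values the coefficient tops out at (n - 1) / n, so a value of exactly 1 is only approached as the number of heroes grows. In the example, removing the single rarely picked hero brings the coefficient back to roughly 0, which shows how much a few low-count heroes can affect the result.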
The top 500 leaderboards present some challenges in terms of data collection and validity.
- The top 500 leaderboards are pruned of banned players once the season ends. This means the data must be collected hours before the season ends to get a full sample of 4500 entries.
- Due to a bug with connected accounts, some players appear twice in the leaderboards. This is not accounted for in this project as it impacts approximately 1% of cases, and would skew the data either way.