Some contributors appear several times under a different name #47
Comments
I have the same issue. I tried working with the .mailmap file, but there is no difference. |
Weird, I thought .mailmap would do the trick. Feel free to investigate! |
Ok thx. What I found out is, if you just have one entry in your .mailmap, it will be recognized. Also my output with |
Weird, maybe GitPython doesn't parse .mailmap? |
No, they don't: gitpython-developers/GitPython#764 |
feel free to commit a fix for this! |
Does this problem persist? Is there any solution? |
Pretty sure the problem still exists, so feel free to try to fix it! |
Workaround: fix-authors.js

const fs = require("fs");

const authors = JSON.parse(fs.readFileSync("./authors.json"));
const labels = authors.labels;
const output = {
    ...authors,
};

// Maps every alias found in authors.labels to one canonical name.
const mailMap = {
    Houlbreque: "Hugo Masclet",
    "Hugo Masclet": "Hugo Masclet",
    Hugoo: "Hugo Masclet",
    "Masclet Hugo": "Hugo Masclet",
    "Vincent Houlbr\u00e8que": "Vincent Houlbr",
    Vinzeebreak: "Vincent Houlbr",
    adizout: "adizout",
    mathrb: "mathrb",
    srdadian: "srdadian",
    vinzeebreak: "Vincent Houlbr",
};

// memo assigns each canonical name a row index in the merged output.
const memo = {};
let memoIndex = 0;

const map = labels.map((name) => {
    // Aliases missing from mailMap keep their own name.
    const toName = mailMap[name] ?? name;
    // Test key presence, not truthiness: index 0 is falsy.
    if (!(toName in memo)) {
        memo[toName] = memoIndex++;
    }
    return memo[toName];
});

// Sum the rows of all aliases that share a canonical name.
output.y = output.y.reduce((merged, item, index) => {
    const toMap = map[index];
    item.forEach((value, i2) => {
        merged[toMap] = merged[toMap] || [];
        merged[toMap][i2] = merged[toMap][i2] || 0;
        merged[toMap][i2] += value;
    });
    return merged;
}, []);

// Keys come back in insertion order, which matches the row indices above.
output.labels = Object.keys(memo);
fs.writeFileSync("./authors.out.json", JSON.stringify(output, null, 4));

Then you can plot with:
|
I tried @dht's script, but ended up with some authors getting mixed up. I wrote a comparable script in Python that could probably be converted into a PR without too much effort (I just ran out of time to figure out how to integrate file paths with the CLI and the complexities of the

"""
Aggregates contribution data from the `authors.json` file generated
by the `git-of-theseus` tool using an `authors_map.json` file.

The `authors_map.json` file must have the following format:
{
    "authorA": ["aliasA", "aliasA2", ...],
    "authorB": ["aliasB", "aliasB2", ...],
}
"""
import json


def read_authors_map(path):
    with open(path, "r") as f:
        authors_map = json.load(f)
    return authors_map


def read_authors_json(path):
    with open(path, "r") as aj:
        authors_json = json.load(aj)
    return authors_json


def parse_raw_contributions(authors_json):
    """
    The `authors.json` has the following format
    {
        "y": [
            [<line_count1>, <line_count2>, ...],
            [<line_count1>, <line_count2>, ...],
            ...
        ],
        "ts": ["date1", "date2", ...],
        "labels": ["aliasA", "aliasB", ...]
    }
    Each author's line count over time is stored separately
    from the author list. The association is made by index.

    This function parses the `authors.json` into the following
    format:
    {
        "aliasA": [<line_count1>, <line_count2>, ...],
        "aliasB": [<line_count1>, <line_count2>, ...],
        ...
    }
    """
    raw_contributions = {}
    for idx, alias in enumerate(authors_json["labels"]):
        raw_contributions[alias] = authors_json["y"][idx]
    return raw_contributions


def aggregate_contributions(authors_map, raw_contributions):
    """
    Aggregates the contribution data from each `alias` in the
    `raw_contributions` based on the `authors_map`.

    Returns a dictionary of the following format:
    {
        "authorA": [<line_count1>, <line_count2>, ...],
        "authorB": [<line_count1>, <line_count2>, ...],
    }
    where the values of each `author` are the sum of the contribution
    data for each author's corresponding aliases in the `authors_map`.

    For example, if the author `authorA` has aliases `aliasA` and `aliasA2`,
    and the `raw_contributions` data looks like this:
    {
        "aliasA": [10, 20],
        "aliasA2": [5, 20],
    }
    then the aggregated contribution data will look like this:
    {
        "authorA": [15, 40],
    }
    """
    contributions = {}
    for author, aliases in authors_map.items():
        alias_contributions = [
            raw_contributions[a] for a in aliases if a in raw_contributions
        ]
        if len(alias_contributions) > 0:
            contributions[author] = [
                sum(ac[idx] for ac in alias_contributions)
                for idx in range(len(alias_contributions[0]))
            ]
    return contributions


def format_new_authors_json(authors_map, authors_json, contributions):
    """
    Formats the `contributions` data into the `authors.json` format.
    """
    return {
        "y": [
            contributions[author]
            for author in authors_map.keys()
            if author in contributions
        ],
        "ts": authors_json["ts"],
        "labels": [author for author in authors_map.keys() if author in contributions],
    }


def write_authors_json(path, authors_json):
    with open(path, "w") as f:
        json.dump(authors_json, f)


if __name__ == "__main__":
    authors_map = read_authors_map("authors_map.json")
    authors_json = read_authors_json("authors.json")
    raw_contributions = parse_raw_contributions(authors_json)
    contributions = aggregate_contributions(authors_map, raw_contributions)
    new_authors_json = format_new_authors_json(authors_map, authors_json, contributions)
    write_authors_json("authors.out.json", new_authors_json)
|
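If you already maintain a `.mailmap`, the `authors_map.json` described above could probably be derived from it instead of being written by hand. Below is a minimal sketch, not part of git-of-theseus or GitPython. The function name `mailmap_to_authors_map` is made up for illustration, and it assumes the plot labels are author names, so it only uses mailmap lines of the form `Proper Name <proper@email> Commit Name <commit@email>`. Entries that only rewrite by email carry no commit name and are skipped.

```python
import re

# Matches "Proper Name <proper@email> Commit Name <commit@email>".
# Both names must start with a non-space character, so entries that
# omit either name (email-only rewrites) do not match.
_FULL_ENTRY = re.compile(
    r"^(?P<proper>[^<>\s][^<>]*?)\s*<[^<>]*>\s*"
    r"(?P<commit>[^<>\s][^<>]*?)\s*<[^<>]*>\s*$"
)


def mailmap_to_authors_map(mailmap_text):
    """Build {canonical name: [aliases...]} from the text of a .mailmap file.

    Hypothetical helper: only full "proper name + commit name" entries
    are used, and each canonical name is listed as its own first alias.
    """
    authors_map = {}
    for raw_line in mailmap_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = _FULL_ENTRY.match(line)
        if match is None:
            continue
        proper = match.group("proper").strip()
        commit = match.group("commit").strip()
        # The canonical name is its own alias, so rows already labelled
        # with it are included in the sum.
        aliases = authors_map.setdefault(proper, [proper])
        if commit not in aliases:
            aliases.append(commit)
    return authors_map
```

Dumping the returned dictionary with `json.dump` to `authors_map.json` should give a file the script above can read. Authors that never appear in such a mailmap entry are absent from the map, so they would need to be added separately to stay in the plot.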
I think a mailmap file might resolve it, but I'm not sure |
@erikbern I tried:
But, the created graphs still don't disambiguate between authors using what is specified in |
@Whathecode It doesn't look like git-of-theseus currently considers a |
I thought I guess not? Would be nice to support .mailmap files! Thanks for checking @Whathecode – really appreciate it! |
I also just ran into this. The .mailmap issue is still unresolved at GitPython, and apparently that repo is now in maintenance mode. I'm not sure whether that means the dependency will ultimately need to be swapped out; I have no idea how big that job would be or what alternatives exist. |
@owenlamont The maintainer of GitPython actively responds to PRs, including PRs for new features (I had one merged in a few months ago). If someone contributed |
Good to know, cheers. I kind of got mixed messages from the README as to how much it was still supported. I'll try to have a look at what is involved. |
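Until something lands in GitPython, one possible stopgap is to ask git itself to resolve identities, since `git check-mailmap` applies the repository's `.mailmap` to a `Name <email>` contact. The following is a rough sketch under that assumption. It is not how git-of-theseus currently works, and the helper names `split_identity` and `canonical_identity` are hypothetical.

```python
import subprocess


def split_identity(identity):
    """Split a 'Name <email>' string into a (name, email) tuple."""
    name, _, rest = identity.rpartition("<")
    return name.strip(), rest.strip().rstrip(">").strip()


def canonical_identity(name, email, repo_path="."):
    """Return the (name, email) git reports after applying .mailmap.

    Shells out to `git check-mailmap`, so git must be installed and
    `repo_path` must point inside a git repository.
    """
    completed = subprocess.run(
        ["git", "-C", repo_path, "check-mailmap", f"{name} <{email}>"],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_identity(completed.stdout.strip())
```

Calling `canonical_identity` once per distinct author and caching the result would keep the number of subprocess calls small. Whether this is fast enough on a large history is untested here.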
I tried this project on https://github.com/vinzeebreak/ironcar
What I did:
And I get this:
But, several authors are the same person (and they appear under only one name in github's list of commits):
Shouldn't they appear under the same name?