Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for the GitHub Search API limit #276

Merged
merged 21 commits into from
Nov 11, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,8 @@ static ArgumentParser getArgumentParser() {
.help("regex of repository names to exclude from pull request generation");
parser.addArgument("-B")
.help("additional body text to include in pull requests");
parser.addArgument("-l", "--" + Constants.GIT_API_SEARCH_LIMIT)

This comment was marked as resolved.

.help("limit the search results for github api (default: 1000)");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add .setDefault(1000)...that way you don't need those constants you added

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you won't need the null check you added either ;)

return parser;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,14 +48,22 @@ public void execute(final Namespace ns, final DockerfileGitHubUtil dockerfileGit
Multimap<String, String> pathToDockerfilesInParentRepo = ArrayListMultimap.create();

Set<Map.Entry<String, JsonElement>> imageToTagStore = parseStoreToImagesMap(ns.get(Constants.STORE));
Integer gitApiSearchLimit;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for any of the below integer parsing b/c of adding the .type above

if (ns.get(Constants.GIT_API_SEARCH_LIMIT) == null || Integer.parseInt(ns.get(Constants.GIT_API_SEARCH_LIMIT)) > Constants.GIT_API_SEARCH_LIMIT_NUMBER) {
gitApiSearchLimit = Constants.GIT_API_SEARCH_LIMIT_NUMBER;
} else {
gitApiSearchLimit = Integer.parseInt(ns.get(Constants.GIT_API_SEARCH_LIMIT));
}
for (Map.Entry<String, JsonElement> imageToTag : imageToTagStore) {
String image = imageToTag.getKey();
log.info("Repositories with image {} being forked.", image);
imageToTagMap.put(image, imageToTag.getValue().getAsString());
PagedSearchIterable<GHContent> contentsWithImage =
this.dockerfileGitHubUtil.findFilesWithImage(image, ns.get(Constants.GIT_ORG));
forkRepositoriesFound(pathToDockerfilesInParentRepo,
imagesFoundInParentRepo, contentsWithImage, image);
List<PagedSearchIterable<GHContent>> contentsWithImage =
this.dockerfileGitHubUtil.findFilesWithImage(image, ns.get(Constants.GIT_ORG), null, gitApiSearchLimit);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use Optional.empty() here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to change your method signature as well, but it is worth it to avoid null checks

for (int i = 0; i < contentsWithImage.size(); i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: cleaner to do this the modern java way:

contentsWithImage.forEach(pagedSearchIterable -> {
                forkRepositoriesFound(pathToDockerfilesInParentRepo,
                        imagesFoundInParentRepo, contentsWithImage.get(i), image);
            });

forkRepositoriesFound(pathToDockerfilesInParentRepo,
imagesFoundInParentRepo, contentsWithImage.get(i), image);
}
}

GHMyself currentUser = this.dockerfileGitHubUtil.getMyself();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,29 +58,41 @@ public void execute(final Namespace ns, DockerfileGitHubUtil dockerfileGitHubUti
new GitForkBranch(ns.get(Constants.IMG), ns.get(Constants.TAG), ns.get(Constants.GIT_BRANCH));

log.info("Finding Dockerfiles with the given image...");
Optional<PagedSearchIterable<GHContent>> contentsWithImage = dockerfileGitHubUtil.getGHContents(ns.get(Constants.GIT_ORG), img);

Integer gitApiSearchLimit;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here as well...this just boils down to Integer gitApiSearchLimit = ns.get(Constants.GIT_API_SEARCH_LIMIT); w/ the above recommendations

if (ns.get(Constants.GIT_API_SEARCH_LIMIT) == null || Integer.parseInt(ns.get(Constants.GIT_API_SEARCH_LIMIT)) > Constants.GIT_API_SEARCH_LIMIT_NUMBER) {
gitApiSearchLimit = Constants.GIT_API_SEARCH_LIMIT_NUMBER;
} else {
gitApiSearchLimit = Integer.parseInt(ns.get(Constants.GIT_API_SEARCH_LIMIT));
}
Optional<List<PagedSearchIterable<GHContent>>> contentsWithImage = dockerfileGitHubUtil.getGHContents(ns.get(Constants.GIT_ORG), img, gitApiSearchLimit);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is longer than 100 characters (found 161).

if (contentsWithImage.isPresent()) {
Multimap<String, GitHubContentToProcess> pathToDockerfilesInParentRepo =
pullRequestSender.forkRepositoriesFoundAndGetPathToDockerfiles(contentsWithImage.get(), gitForkBranch);
List<IOException> exceptions = new ArrayList<>();
List<String> skippedRepos = new ArrayList<>();

for (String currUserRepo : pathToDockerfilesInParentRepo.keySet()) {
Optional<GitHubContentToProcess> forkWithContentPaths =
pathToDockerfilesInParentRepo.get(currUserRepo).stream().findFirst();
if (forkWithContentPaths.isPresent()) {
try {
changeDockerfiles(ns, pathToDockerfilesInParentRepo, forkWithContentPaths.get(), skippedRepos);
} catch (IOException e) {
log.error(String.format("Error changing Dockerfile for %s", forkWithContentPaths.get().getParent().getFullName()), e);
exceptions.add(e);
List<PagedSearchIterable<GHContent>> contentsFoundWithImage = contentsWithImage.get();
for (int i = 0; i < contentsFoundWithImage.size(); i++ ) {
Multimap<String, GitHubContentToProcess> pathToDockerfilesInParentRepo =
pullRequestSender.forkRepositoriesFoundAndGetPathToDockerfiles(contentsFoundWithImage.get(i), gitForkBranch);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is longer than 100 characters (found 133).



List<IOException> exceptions = new ArrayList<>();
List<String> skippedRepos = new ArrayList<>();

for (String currUserRepo : pathToDockerfilesInParentRepo.keySet()) {
Optional<GitHubContentToProcess> forkWithContentPaths =
pathToDockerfilesInParentRepo.get(currUserRepo).stream().findFirst();
if (forkWithContentPaths.isPresent()) {
try {
changeDockerfiles(ns, pathToDockerfilesInParentRepo, forkWithContentPaths.get(), skippedRepos);
} catch (IOException e) {
log.error(String.format("Error changing Dockerfile for %s", forkWithContentPaths.get().getParent().getFullName()), e);
exceptions.add(e);
}
} else {
log.warn("Didn't find fork for {} so not changing Dockerfiles", currUserRepo);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is longer than 100 characters (found 102).

}
} else {
log.warn("Didn't find fork for {} so not changing Dockerfiles", currUserRepo);
}
}

ResultsProcessor.processResults(skippedRepos, exceptions, log);
ResultsProcessor.processResults(skippedRepos, exceptions, log);
}
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,4 +34,6 @@ private Constants() {
public static final String GIT_PR_BODY = "B";
public static final String GIT_ADDITIONAL_COMMIT_MESSAGE = "c";
public static final String GIT_REPO_EXCLUDES = "excludes";
public static final String GIT_API_SEARCH_LIMIT = "git-api-search-limit";
public static final Integer GIT_API_SEARCH_LIMIT_NUMBER = 1000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need based on above suggestion

}
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,7 @@
import org.slf4j.LoggerFactory;

import java.io.*;
import java.util.List;
import java.util.Optional;
import java.util.*;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the '.' form of import should be avoided - java.util..

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong lexicographical order for 'java.util.*' import. Should be before 'org.slf4j.LoggerFactory'.

import java.util.concurrent.TimeUnit;

/**
Expand Down Expand Up @@ -83,28 +82,78 @@ public GHRepository getRepo(String repoName) throws IOException {
return gitHubUtil.getRepo(repoName);
}

public PagedSearchIterable<GHContent> findFilesWithImage(String image, String org) throws IOException {
public List<PagedSearchIterable<GHContent>> findFilesWithImage(String image, String orgToInclude, String orgToExclude, Integer gitApiSearchLimit) throws IOException {
GHContentSearchBuilder search = gitHubUtil.startSearch();
// Filename search appears to yield better / more results than language:Dockerfile
// Root cause: linguist doesn't currently deal with prefixes of files:
// https://github.com/github/linguist/issues/4566
search.filename("Dockerfile");
if (org != null) {
search.user(org);
if (orgToInclude != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if orgToInclude.equals(orgToExclude). Can that ever happen?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should pass in a Map<String, Boolean> instead of orgToInclude and orgToInclude...if map is empty, then we do search.q for each entry that has true value, we do search.user otherwise, we build a string with multiple -org:

search.user(orgToInclude);
}
if (orgToExclude != null) {
String queryExcludingOrg = String.format("-org:{}", orgToExclude);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String.format takes %s not {}

search.q(queryExcludingOrg);
}

if (image.substring(image.lastIndexOf(' ') + 1).length() <= 1) {
throw new IOException("Invalid image name.");
}
List<String> terms = GitHubImageSearchTermList.getSearchTerms(image);
log.info("Searching for {} with terms: {}", image, terms);
terms.forEach(search::q);
PagedSearchIterable<GHContent> files = search.list();
List<PagedSearchIterable<GHContent>> allFiles = new ArrayList<>();
int totalCount = files.getTotalCount();
if (totalCount > 1000) {
log.warn("Number of search results is above 1000! The GitHub Search API will only return around 1000 results - https://developer.github.com/v3/search/#about-the-search-api");
}
log.info("Number of files found for {}: {}", image, totalCount);
return files;
if (totalCount > gitApiSearchLimit && orgToInclude != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this warning any more?

log.warn("Number of search results is above {}! The GitHub Search API will only return around 1000 results - https://developer.github.com/v3/search/#about-the-search-api", Constants.GIT_API_SEARCH_LIMIT_NUMBER.toString());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: our style is not multi-line if else blocks

else if (totalCount > gitApiSearchLimit) {
return getSearchResultsExcludingOrgWithMostHits(image, files, gitApiSearchLimit);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not totally convinced that this will filter out multiple orgs if we need two several cycles of excludes.

For example 3 orgs:
OrgA - 1000 results
OrgB - 1000 results
OrgC - 1000 results

We'll need to do something like:
T0: query with all orgs, 3k total results, keep results for OrgA assuming those appeared and not others.
T1: query with all orgs minus OrgA, 2k total results, keep results for OrgB assuming those appeared and not others.
T2: query with all orgs minus OrgB and OrgC, 1k total results, we are finished.

I don't think we do that here b/c we don't seem to make the org exclude str additive, but I might be missing something.

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an else clause to warn that we encountered an unexpected code path?

allFiles.add(files);
return allFiles;
}

protected String getOrgNameWithMaximumHits(PagedSearchIterable<GHContent> files) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modern java way of doing this:

return files.toList().stream()
                .collect(Collectors.groupingBy(ghContent -> ghContent.getOwner().getOwnerName(), Collectors.counting()))
                .entrySet()
                .stream()
                .max(Map.Entry.comparingByValue())
                .get()
                .getKey();

GHRepository org;
String orgName;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for expanded scope of these

String orgWithMaximumHits = "";
int maxCount = 0;
HashMap<String, Integer> orgHitsMap = new HashMap<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HashMap<String, Integer> orgHitsMap = new HashMap<>();
Map<String, Integer> orgHitsMap = new HashMap<>();

for (GHContent ghContent : files) {
org = ghContent.getOwner();
orgName = org.getOwnerName();
if (orgHitsMap.containsKey(orgName)) {
int hits = orgHitsMap.get(orgName);
orgHitsMap.put(orgName, hits + 1);
} else {
orgHitsMap.put(orgName, 1);
}
}
for (Map.Entry<String, Integer> orgNameHits : orgHitsMap.entrySet()) {
orgName = orgNameHits.getKey();
int numberOfHits = orgNameHits.getValue();
if (numberOfHits > maxCount) {
orgWithMaximumHits = orgName;
maxCount = numberOfHits;
}
}
return orgWithMaximumHits;
}

protected List<PagedSearchIterable<GHContent>> getSearchResultsExcludingOrgWithMostHits(String image, PagedSearchIterable<GHContent> files, Integer gitApiSearchLimit) throws IOException {
List<PagedSearchIterable<GHContent>> contentsForOrgWithMaximumHits;
List<PagedSearchIterable<GHContent>> contentsExcludingOrgWithMaximumHits;
List<PagedSearchIterable<GHContent>> allContentsWithImage = new ArrayList<>();
String orgWithMaximumHits = getOrgNameWithMaximumHits(files);
contentsForOrgWithMaximumHits = findFilesWithImage(image, orgWithMaximumHits, null, gitApiSearchLimit);
contentsExcludingOrgWithMaximumHits = findFilesWithImage(image, null, orgWithMaximumHits, gitApiSearchLimit);
allContentsWithImage.add(contentsForOrgWithMaximumHits.get(0));
allContentsWithImage.add(contentsExcludingOrgWithMaximumHits.get(0));

return allContentsWithImage;
}

/* Workaround: The GitHub API caches API calls for up to 60 seconds, so back-to-back API calls with the same
Expand Down Expand Up @@ -319,19 +368,21 @@ public boolean thisUserIsOwner(GHRepository repo) throws IOException {
* @param org GitHub organization
* @param img image to find
*/
public Optional<PagedSearchIterable<GHContent>> getGHContents(String org, String img)
public Optional<List<PagedSearchIterable<GHContent>>> getGHContents(String org, String img, Integer gitApiSearchLimit)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abbreviation in name 'getGHContents' must contain no more than '2' consecutive capital letters.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is longer than 100 characters (found 122).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method getGHContents has 28 lines of code (exceeds 25 allowed). Consider refactoring.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method getGHContents has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.

throws IOException, InterruptedException {
PagedSearchIterable<GHContent> contentsWithImage = null;
List<PagedSearchIterable<GHContent>> contentsWithImage = null;
for (int i = 0; i < 5; i++) {
contentsWithImage = findFilesWithImage(img, org);
if (contentsWithImage.getTotalCount() > 0) {
contentsWithImage = findFilesWithImage(img, org, null, gitApiSearchLimit);
if (contentsWithImage.get(0).getTotalCount() > 0) {
break;
} else {
getGitHubUtil().waitFor(TimeUnit.SECONDS.toMillis(1));
}
}

int numOfContentsFound = contentsWithImage.getTotalCount();
int numOfContentsFound = 0;
for (int i = 0; i < contentsWithImage.size(); i++) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "NullPointerException" could be thrown; "contentsWithImage" is nullable here.

numOfContentsFound += contentsWithImage.get(i).getTotalCount();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More idiomatic these days to do:
contentsWithImage.stream().mapToInt(PagedSearchIterable::getTotalCount).sum()

}
if (numOfContentsFound <= 0) {
log.info("Could not find any repositories with given image: {}", img);
return Optional.empty();
Expand Down
Loading