Handle multiple languages for an extension #1

Merged 2 commits on Jun 23, 2021
23 changes: 22 additions & 1 deletion src/languages.ts
@@ -35,11 +35,32 @@ export class Languages {
   private loadExtensionMap = () => {
     const extensions: ExtensionsTypes = {};

+    /**
+     * The extension map can contain multiple languages with the same extension,
+     * but we only want a single one. For the moment, these clashes are resolved
+     * by the simple heuristic below listing high-priority languages. We may want
+     * to consider smarter heuristics to correctly identify languages in cases
+     * where the extension is ambiguous. The ordering of the list matters and
+     * languages earlier on will get a higher priority when resolving clashes.
+     */
+    const importantLanguages = ["javascript", "typescript", "ruby", "python", "java", "c", "c++", "c#", "rust", "scala", "perl", "go"];
Owner

Could you move this out to a top-level constant?

Owner

And just curious, can you give an example of where the ordering here is important?

Author

The only example I found where the ordering actually matters is .spec, which I described in the PR description. This extension appears in a few languages but (to the best of my understanding) is mostly important for Ruby, where it is common in unit tests. I still ordered the other languages roughly by popularity in case ambiguities arise in their extensions in the future, but at the moment there are no other ambiguities among these languages (except .h, which is shared between C and C++, but in CodeQL we don't care about that because both map to the same thing).
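
To make the ordering question concrete, here is a minimal, self-contained sketch of the clash resolution in this diff. It is not the project's code: `SAMPLE_LANGUAGE_MAP`, `buildExtensionMap`, and `IMPORTANT_LANGUAGES` are hypothetical names, and the sample entries (including which languages claim `.spec`) are assumptions chosen for illustration. The priority list is shortened and hoisted to a top-level constant, as requested in the review comment above.

```typescript
// Hypothetical shape of a language-map entry (an assumption for this sketch).
type LanguageMode = { extensions?: string[] };
type LanguageMap = { [language: string]: LanguageMode };

// Priority list as a top-level constant; earlier entries win clashes.
const IMPORTANT_LANGUAGES: string[] = ["javascript", "typescript", "ruby", "python"];

// Illustrative data only: assumes three languages all claim ".spec".
const SAMPLE_LANGUAGE_MAP: LanguageMap = {
  "RPM Spec": { extensions: [".spec"] },
  Python: { extensions: [".py", ".spec"] },
  Ruby: { extensions: [".rb", ".spec"] },
};

function buildExtensionMap(
  languageMap: LanguageMap,
  priorities: string[] = IMPORTANT_LANGUAGES
): { [extension: string]: string } {
  const extensions: { [extension: string]: string } = {};

  // Position in the priority list; unlisted languages rank last.
  const rank = (language: string): number => {
    const index = priorities.indexOf(language);
    return index === -1 ? Infinity : index;
  };

  Object.keys(languageMap).forEach((name) => {
    const language = name.toLowerCase();
    const languageExtensions = languageMap[name].extensions || [];
    languageExtensions.forEach((rawExtension) => {
      const extension = rawExtension.toLowerCase();
      const current = extensions[extension];
      // Overwrite when the slot is empty, when the current owner is unlisted,
      // or when the new language has strictly higher priority.
      if (
        current === undefined ||
        rank(current) === Infinity ||
        rank(language) < rank(current)
      ) {
        extensions[extension] = language;
      }
    });
  });

  return extensions;
}

// Default ordering (ruby before python) versus the two swapped.
const resolved = buildExtensionMap(SAMPLE_LANGUAGE_MAP);
const swapped = buildExtensionMap(SAMPLE_LANGUAGE_MAP, ["python", "ruby"]);
```

With this sample data, `resolved[".spec"]` is `"ruby"` and `swapped[".spec"]` is `"python"`. Ordering only changes the result when two listed languages share an extension, which matches the .spec case described above.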


     Object.keys(languageMap).forEach((language) => {
       const languageMode = languageMap[language];
       const languageExtensions = (languageMode && languageMode.extensions) || [];
       languageExtensions.forEach((extension: string) => {
-        extensions[extension.toLowerCase()] = language.toLowerCase();
+        if (!extensions[extension.toLowerCase()]) {
edoardopirovano marked this conversation as resolved.
+          extensions[extension.toLowerCase()] = language.toLowerCase();
+        } else {
+          const currentLanguagePriority = importantLanguages.indexOf(extensions[extension.toLowerCase()]);
+          if (currentLanguagePriority == -1) {
edoardopirovano marked this conversation as resolved.
+            extensions[extension.toLowerCase()] = language.toLowerCase();
+          } else {
+            const otherPriority = importantLanguages.indexOf(language.toLowerCase());
+            if (otherPriority != -1 && otherPriority < currentLanguagePriority)
edoardopirovano marked this conversation as resolved.
+              extensions[extension.toLowerCase()] = language.toLowerCase();
+          }
+        }
       });
     });

16 changes: 16 additions & 0 deletions test/data2/x.cs
@@ -0,0 +1,16 @@
+using System;
+
+// A comment
+/*
+ * A multi-line comment
+ */
+namespace HelloWorldApp {
+    class Geeks {
+        static void Main(string[] args) {
+            if (true) {
+                Console.WriteLine("Hello World!");
+            }
+        }
+    }
+}

13 changes: 10 additions & 3 deletions test/directory.test.ts
@@ -9,17 +9,24 @@ describe('Directory', () => {
       cwd: __dirname + '/data2'
     }, {
       files: [
+        'data2/x.cs',
         'data2/x.html',
         'data2/x.js',
         'data2/x.py',
         'data2/x.rb',
       ],
       info: {
-        code: 23,
-        comment: 23,
-        total: 61,
+        code: 33,
+        comment: 27,
+        total: 78,
       },
       languages: {
+        "c#": {
+          code: 10,
+          comment: 4,
+          sum: 1,
+          total: 17,
+        },
         html: {
           code: 4,
           comment: 4,
13 changes: 13 additions & 0 deletions test/file.test.ts
@@ -56,6 +56,19 @@ describe('File', () => {
     });
   });

+  it('should calculate info for a csharp file', async () => {
+    await doTest('x.cs', {
+      languages: 'c#',
+      lines: {
+        code: 10,
+        comment: 4,
+        total: 17
+      },
+      name: 'x.cs',
+      size: 253
+    });
+  });
+
   async function doTest(fileName: string, expectedFileInfo: FileInfo) {
     const fullPath = slash(path.join(__dirname, `/data2/${fileName}`));
     const actualFileInfo = await new LocFile(fullPath).getFileInfo();