Add LWTA abbreviation support #12109

Draft: wants to merge 9 commits into main

Thomas-Bridgart

(This is a slightly reworked version of another pull request, which I accidentally opened from main.)
See issue koppor#215:

So far, I have written code that reads the .csv file containing all LTWA (List of Title Word Abbreviations) abbreviations and uses it to abbreviate a journal name. I see the following problems with what I have written:

-- It does not handle some edge cases of LTWA abbreviation correctly. Occasionally the standard relies on context, e.g. to distinguish 'real' (English) from 'real' (Spanish), or to decide that '&' should be removed when it means 'and', but not when it is part of an abbreviation.

-- LTWA abbreviation requires dropping articles and prepositions (in most cases), which I have only implemented via a hardcoded (and currently far from exhaustive) list of such words.

-- I am a little unsure about the class structure I have chosen for the implementation.

Furthermore, a fully correct unabbreviate method will not be possible, because the abbreviation is not injective: different full titles can map to the same abbreviation.
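To make the non-injectivity concrete, here is a minimal Java sketch that inverts a word-to-abbreviation map. Because several words can share one abbreviation, each abbreviation maps to a list of candidate words rather than to a single word. The class ReverseLookup is hypothetical, and any entries fed to it are illustrative, not taken from the LTWA file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating why "unabbreviate" is ambiguous
class ReverseLookup {

    // Groups every full word under the abbreviation it maps to,
    // following the iteration order of the input map
    static Map<String, List<String>> invert(Map<String, String> wordToAbbreviation) {
        Map<String, List<String>> abbreviationToWords = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : wordToAbbreviation.entrySet()) {
            abbreviationToWords
                    .computeIfAbsent(entry.getValue(), abbreviation -> new ArrayList<>())
                    .add(entry.getKey());
        }
        return abbreviationToWords;
    }
}
```

For example, if two hypothetical entries "international" and "internal" (inserted in that order into a LinkedHashMap) both map to "int.", invert returns the candidates [international, internal] for "int.". An unabbreviate method built on such a table can at best return every candidate, or pick one with an additional heuristic.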

If these problems are deal-breakers, I won't continue writing the frontend / MVStore code (which is why I opened this PR in an incomplete state).

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Thomas-Bridgart changed the title from "Add Lwta abbreviation support" to "Add LWTA abbreviation support" on Oct 27, 2024
Thomas-Bridgart mentioned this pull request on Oct 27, 2024
Thomas-Bridgart marked this pull request as draft on October 27, 2024, 08:26
github-actions (bot) left a comment:

JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.

You can then run these tests in IntelliJ to reproduce the failures locally. We offer a quick how-to for running tests in the section "Final build system checks" of our setup guide.

koppor (Member) commented on Oct 29, 2024:

src/main/resources/ltwa_abb.csv is a file downloaded from somewhere on the net; see koppor#215 (comment).

We MUST NOT include this file in the source code tree.

We need a Gradle task that downloads the file to build\resources\main\ltwa_abb.csv. I think it should hook into generateSources or a similar task.

@@ -22,6 +22,8 @@ public class AbbreviationParser {
// Ensures ordering while preventing duplicates
private final LinkedHashSet<Abbreviation> abbreviations = new LinkedHashSet<>();

private final LinkedHashSet<LwtaAbbreviation> lwtaAbbreviations = new LinkedHashSet<>();
Review comment (Member):

Subclass AbbreviationParser, because LTWA is a separate piece of functionality (there is too little coupling with the other methods). Someone may argue for composition over inheritance; I am not sure which methods of this class you actually need.

String name = csvRecord.size() > 0 ? csvRecord.get(0) : "";
String abbreviation = csvRecord.size() > 1 ? csvRecord.get(1) : "";

// Check name and abbreviation
Review comment (Member):

Remove comment - this is clear from the statement.

Comment on lines +117 to +119
if (string.endsWith("-")) {
string = string.substring(0, string.length() - 1);
}
Review comment (Member):

Just use

string = org.jabref.model.strings.StringUtil.removeStringAtTheEnd(string, "-");

(Replace this with a normal import; I just want to show you the package.)
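A sketch of how the trailing-hyphen handling could look, assuming (as in the LTWA list) that a trailing "-" marks a word stem rather than a complete word. The record LtwaEntry, its component isStem and the helper stripSuffix are hypothetical; inside JabRef, the stripSuffix call would be replaced by StringUtil.removeStringAtTheEnd(rawWord, "-") as the review suggests, and the local helper exists only to keep the example self-contained.

```java
// Hypothetical record: a parsed LTWA word and whether it was marked as a stem
record LtwaEntry(String word, boolean isStem) {

    // Assumption: a trailing "-" means the word is a stem that longer words may extend
    static LtwaEntry parse(String rawWord) {
        boolean isStem = rawWord.endsWith("-");
        return new LtwaEntry(stripSuffix(rawWord, "-"), isStem);
    }

    // Stand-in for StringUtil.removeStringAtTheEnd: removes suffix only if present
    static String stripSuffix(String value, String suffix) {
        return value.endsWith(suffix)
                ? value.substring(0, value.length() - suffix.length())
                : value;
    }
}
```

Recording the flag while stripping the hyphen keeps the information the hyphen carried, instead of discarding it together with the character.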

private final boolean allowsPrefix;

enum Position {
ENDS_WORD, STARTS_WORD, IN_WORD, FULL_WORD
Review comment (Member):

Sort the constants more logically: full, starts, in, ends.
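The requested ordering, as a sketch. The meaning given to each constant in the comments and the matches helper are assumptions about how the PR uses Position; they are not taken from its code.

```java
// Constants ordered as requested: full, starts, in, ends
enum Position {
    FULL_WORD,   // assumed: pattern must equal the whole word
    STARTS_WORD, // assumed: pattern must occur at the start of the word
    IN_WORD,     // assumed: pattern may occur anywhere inside the word
    ENDS_WORD;   // assumed: pattern must occur at the end of the word

    // Hypothetical helper: checks whether pattern occurs in word at this position
    boolean matches(String word, String pattern) {
        return switch (this) {
            case FULL_WORD -> word.equals(pattern);
            case STARTS_WORD -> word.startsWith(pattern);
            case IN_WORD -> word.contains(pattern);
            case ENDS_WORD -> word.endsWith(pattern);
        };
    }
}
```

With this order, the constants go from the most to the least restrictive "anchored" match, which makes values() iterate in a predictable, readable sequence.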

@@ -0,0 +1,41 @@
package org.jabref.logic.journals;

public class LwtaAbbreviation {
Review comment (Member):

Refactor to record
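A possible record version of the class, as a sketch. Only position and allowsPrefix appear in the excerpts above; the components word and abbreviation, and nesting Position inside the record, are assumptions.

```java
import java.util.Objects;

// Hypothetical record version of LwtaAbbreviation; word and abbreviation are assumed names
record LwtaAbbreviation(String word, String abbreviation, Position position, boolean allowsPrefix) {

    // Ordered as requested in the review: full, starts, in, ends
    enum Position { FULL_WORD, STARTS_WORD, IN_WORD, ENDS_WORD }

    // Compact constructor: rejects null components once, at construction time
    LwtaAbbreviation {
        Objects.requireNonNull(word, "word");
        Objects.requireNonNull(abbreviation, "abbreviation");
        Objects.requireNonNull(position, "position");
    }
}
```

A record also derives equals and hashCode from its components, which is what the LinkedHashSet<LwtaAbbreviation> in AbbreviationParser relies on to prevent duplicates.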


private final Map<String, LwtaAbbreviation> lwtaToAbbreviationObject;

// incomplete list
Review comment (Member):

What kind of "incomplete" list is this? What should one do to add words? Either state that or remove the comment.
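One way to address this comment is to keep the list but document what it contains and how to extend it. A minimal sketch; the class StopWordFilter, the method removeStopWords and the (English-only, non-exhaustive) word set are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class StopWordFilter {

    /**
     * Articles and prepositions dropped before abbreviating a title.
     * Hypothetical, English-only sample; to support another word, add its lower-case form here.
     */
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of", "for", "in", "on", "and");

    // Drops stop words (case-insensitively) and keeps the casing of all other words
    static String removeStopWords(String title) {
        return Arrays.stream(title.trim().split("\\s+"))
                     .filter(word -> !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT)))
                     .collect(Collectors.joining(" "));
    }
}
```

For instance, removeStopWords("Journal of Medicine") yields "Journal Medicine", matching the removal of "of" expected by the test data in this PR.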

Comment on lines +20 to +22
/**
* instantiates this class with a csv file
*/
Review comment (Member):

Remove the comment, or replace it with an explanation of the @param file parameter.
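A sketch of the constructor with a Javadoc that documents the file parameter and the thrown exception, using the field name lwtaToAbbreviation (without the Object suffix). It assumes a "word;abbreviation" line format and simplifies the map values to plain strings; the real CSV layout, the delimiter and the accessor abbreviationOf are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LwtaAbbreviationRepository {

    // Values simplified to strings in this sketch (the PR maps to LwtaAbbreviation)
    private final Map<String, String> lwtaToAbbreviation = new HashMap<>();

    /**
     * @param file path to the LTWA CSV file; each line is assumed to have the form
     *             "word;abbreviation", and lines with fewer columns are skipped
     * @throws IOException if the file cannot be read
     */
    LwtaAbbreviationRepository(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file); // decoded as UTF-8
        for (String line : lines) {
            String[] columns = line.split(";", -1);
            if (columns.length >= 2) {
                lwtaToAbbreviation.put(columns[0], columns[1]);
            }
        }
    }

    // Hypothetical accessor; returns null when the word is not in the list
    String abbreviationOf(String word) {
        return lwtaToAbbreviation.get(word);
    }
}
```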


public class LwtaAbbreviationRepository {

private final Map<String, LwtaAbbreviation> lwtaToAbbreviationObject;
Review comment (Member):

Remove "Object" at the end of the name. Nearly everything is an object in Java.

Comment on lines +13 to +24
    static final String[] JOURNAL_NAMES = new String[]{"international journal", "Journal of Medicine", "journal of medicine", "journal", "Physics & geobiology"};
    static final String[] ABBREVIATED_NAMES = new String[]{"int. j.", "J. Medicine", "j. medicine", "journal", "Phys. geobiol."};

    @Test
    void abbreviateJournalNameTest() throws IOException {
        Path path1 = Paths.get("src", "main", "resources", "ltwa_abb.csv");
        LwtaAbbreviationRepository lwtaAbbreviationRepository = new LwtaAbbreviationRepository(path1);
        for (int i = 0; i < JOURNAL_NAMES.length; i++) {
            assertEquals(lwtaAbbreviationRepository.abbreviateJournalName(JOURNAL_NAMES[i]), ABBREVIATED_NAMES[i]);
        }
    }
}
Review comment (Member):

Convert to @ParameterizedTest.

koppor added the label "status: changes required" (Pull requests that are not yet complete) on Oct 30, 2024