How can I exclude eg. code examples and/or tables from translation? #41

Closed
jtippett opened this issue May 12, 2023 · 4 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

@jtippett

jtippett commented May 12, 2023

I've tried to use the exclude context tools to ensure the plugin doesn't try to translate code, but they don't seem to work. I've tried regexes in every combination I can think of. For example, with this structure:

<table class="processedcode">
  <tr>
    <td class="codeinfo">​<span class="codeprefix">1:&nbsp;</span></td>
    <td class="codeline">puts 1.0 + 2.0 </td>
  </tr>
  <tr>
    <td class="codeinfo">​<span class="codeprefix">2:&nbsp;</span></td>
    <td class="codeline">puts 2.0 * 3.0</td>
  </tr>
  <tr>
    <td class="codeinfo">​<span class="codeprefix">3:&nbsp;</span></td>
    <td class="codeline">puts 5.0 - 8.0</td>
  </tr>
  <tr>
    <td class="codeinfo">​<span class="codeprefix">4:&nbsp;</span></td>
    <td class="codeline">puts 9.0 / 2.0</td>
  </tr>
</table>

Nothing I have tried keeps all of this from being submitted for translation, with nonsensical results. I have also tried using a glossary for certain programming symbols, and it seems to be ignored. I would have thought that simply adding code as an exclude in HTML elements would do the trick, but it does nothing.

Any advice?

Also, which takes precedence: the cache or the ignore/glossary settings? Another possibility is that the plugin has cached all the (erroneous) code translations and is skipping the rules I'm trying to add. I see the cache is SQLite, so I should be able to load and edit it directly. Could its location be output in the job log?

jtippett changed the title from "How can I exclude eg. code examples from translation?" to "How can I exclude eg. code examples and/or tables from translation?" on May 12, 2023
@jtippett
Author

Here are the regexes I've tried in the HTML element mode:

class="processedcode"
class="codeinfo"
class="codeprefix"
class="codeline"
class="about-pb"
<tr>
table
td
^<tr>
^<td
^<pre
</table>$

@bookfere
Owner

bookfere commented May 12, 2023

There is an issue with the "Ignore Paragraph" feature when it is used this way. Unfortunately, you cannot currently control which element is recognized as a "paragraph", because the plugin decides that itself.

With the example you provided, the plugin will extract each element with non-empty content as below:

<span class="codeprefix">1:&nbsp;</span>
<td class="codeline">puts 1.0 + 2.0 </td>
<span class="codeprefix">2:&nbsp;</span>
<td class="codeline">puts 2.0 * 3.0</td>
<span class="codeprefix">3:&nbsp;</span>
<td class="codeline">puts 5.0 - 8.0</td>
<span class="codeprefix">4:&nbsp;</span>
<td class="codeline">puts 9.0 / 2.0</td>

As you can see, the extracted elements do not include <table class="processedcode">. This is why a regex cannot be used to ignore it: the rules are only tested against the extracted elements, and the table is never one of them.
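
To make the mismatch concrete, here is a minimal sketch that tests a few of the rules from this issue against two of the extracted elements listed above. It assumes the rules are applied with `re.search` to each element's serialized markup; that is an assumption for illustration, not the plugin's actual code, and `matching_rules` is a hypothetical helper.

```python
import re

# Two of the extracted elements from the list above.
extracted = [
    '<span class="codeprefix">1:&nbsp;</span>',
    '<td class="codeline">puts 1.0 + 2.0 </td>',
]

# A few of the rules tried in this issue.
rules = ['class="processedcode"', 'class="codeline"', '^<tr>', '^<td', '</table>$']


def matching_rules(element, rules):
    """Return the rules that match the element (assumed re.search semantics)."""
    return [rule for rule in rules if re.search(rule, element)]


for element in extracted:
    print(element, '->', matching_rules(element, rules))
```

Under that assumption, the rules that mention the table or the row can never match, because neither the table nor the row is ever handed to the matcher.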

Therefore, I will work on adding a feature that allows you to specify tag and class-like attributes to control which elements should not be extracted for translation, thus solving this problem.
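
One way such a feature could behave is sketched below: exclusion is decided while walking the tree, so a matching element and everything inside it is skipped before any "paragraph" is extracted. This is only an illustration of the idea, not the planned implementation; `extract_paragraphs`, `excluded_tags`, and `excluded_classes` are hypothetical names, and the sample markup omits the XHTML namespace to keep tag names short.

```python
import xml.etree.ElementTree as ET


def extract_paragraphs(root, excluded_tags=(), excluded_classes=()):
    """Collect leaf elements that contain text, skipping excluded subtrees."""
    found = []

    def visit(element):
        classes = element.get('class', '').split()
        if element.tag in excluded_tags or any(c in excluded_classes for c in classes):
            return  # skip this element and everything inside it
        children = list(element)
        if not children and (element.text or '').strip():
            found.append(element)
        for child in children:
            visit(child)

    visit(root)
    return found


doc = ET.fromstring(
    '<div><p>Hello</p>'
    '<table class="processedcode"><tr>'
    '<td class="codeinfo"><span class="codeprefix">1:</span></td>'
    '<td class="codeline">puts 1.0 + 2.0</td>'
    '</tr></table></div>'
)

print([e.text for e in extract_paragraphs(doc)])
print([e.text for e in extract_paragraphs(doc, excluded_classes=('processedcode',))])
```

Without exclusions the sketch extracts the span and the code cell just as described above; excluding the class on the table leaves only the ordinary paragraph.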

The precedence of the three features is: ignore > cache > glossary.
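
Read literally, that order could be sketched as follows. This is a hedged illustration only: `resolve`, `fake_translate`, and the dictionary-based cache and glossary are hypothetical stand-ins, not the plugin's real data structures. It shows why a stale cached translation would still be returned after a glossary change, unless an ignore rule matches first.

```python
import re


def resolve(text, ignore_rules, cache, glossary, translate):
    """Apply the stated order: ignore > cache > glossary."""
    if any(re.search(rule, text) for rule in ignore_rules):
        return None  # ignored: never sent for translation
    if text in cache:
        return cache[text]  # a cached result wins over any glossary change
    result = translate(text, glossary)  # glossary only affects fresh translations
    cache[text] = result
    return result


def fake_translate(text, glossary):
    """Hypothetical translator: substitute glossary terms and tag the result."""
    for term, replacement in glossary.items():
        text = text.replace(term, replacement)
    return '[translated] ' + text


cache = {'puts 1.0 + 2.0': 'stale result'}
print(resolve('puts 1.0 + 2.0', [], cache, {'puts': 'puts'}, fake_translate))
print(resolve('puts 1.0 + 2.0', ['^puts'], cache, {}, fake_translate))
print(resolve('Hello world', [], cache, {'world': 'monde'}, fake_translate))
```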

bookfere added the enhancement label on May 12, 2023
@jtippett
Author

Thanks so much for the explanation; I'm looking forward to the enhancement. In the meantime, I wrote a quick Ruby script that strips unwanted spans, tables, or anything else by unpacking the processed EPUB, traversing it, and deleting any element with class="unwanted" and lang="lang". If anyone wants it as a stopgap until the real improvement is implemented, let me know and I can post it (programming knowledge required).
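
The Ruby script itself was not posted in this thread. For readers who want to try the same idea, the following is a rough Python sketch of the deletion step only, not a port of that script. It assumes each content file is well-formed XHTML that the standard-library parser accepts (undefined named entities such as &nbsp; would need to be handled first), and `strip_elements` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'
# Serialize XHTML elements without an "html:" prefix.
ET.register_namespace('', XHTML_NS)


def strip_elements(xhtml, css_class, lang):
    """Remove every element that has both the given class and lang attribute."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so visit each parent and drop
    # its matching children. Text following a removed element is dropped too.
    for parent in list(root.iter()):
        for child in list(parent):
            classes = child.get('class', '').split()
            if css_class in classes and child.get('lang') == lang:
                parent.remove(child)
    return ET.tostring(root, encoding='unicode')


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>Hi</p>'
    '<p class="unwanted" lang="fr">Salut</p>'
    '<p class="unwanted">keep</p>'
    '</body></html>'
)
print(strip_elements(sample, 'unwanted', 'fr'))
```

An EPUB is a ZIP archive, so the unpacking and repacking steps could be done with Python's `zipfile` module, applying the function to each XHTML file in between.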

bookfere added the bug label on May 12, 2023
@bookfere
Owner

bookfere commented May 12, 2023

Here are the regexes I've tried in the HTML element mode

class="processedcode"
class="codeinfo"
class="codeprefix"
class="codeline"
class="about-pb"
<tr>
table
td
^<tr>
^<td
^<pre
</table>$

I also found a bug after reviewing the relevant code. The regexes you provided could not catch the table element with class="processedcode" for the reason I explained earlier. However, other regexes such as class="codeinfo" were also not working as expected, because the rules were not being applied properly. I have fixed that bug; please try upgrading the plugin to the latest version, v1.3.8, to see whether the issue is resolved.
