How can I exclude eg. code examples and/or tables from translation? #41

jtippett · 2023-05-12T05:36:05Z

I've tried using the exclude context tools to keep the plugin from translating code, but they don't seem to work. I've tried regexes in every combination I can think of. For example, with this structure:

<table class="processedcode">
  <tr>
    <td class="codeinfo"><span class="codeprefix">1:&nbsp;</span></td>
    <td class="codeline">puts 1.0 + 2.0 </td>
  </tr>
  <tr>
    <td class="codeinfo"><span class="codeprefix">2:&nbsp;</span></td>
    <td class="codeline">puts 2.0 * 3.0</td>
  </tr>
  <tr>
    <td class="codeinfo"><span class="codeprefix">3:&nbsp;</span></td>
    <td class="codeline">puts 5.0 - 8.0</td>
  </tr>
  <tr>
    <td class="codeinfo"><span class="codeprefix">4:&nbsp;</span></td>
    <td class="codeline">puts 9.0 / 2.0</td>
  </tr>
</table>

Nothing I have tried keeps all of this from being submitted for translation, and the results are nonsensical. I have also tried using a glossary for certain programming symbols, and it seems to be ignored. I would have thought that simply adding code as an exclude in HTML elements would do the trick, but it does nothing.

Any advice?

Also, which takes precedence: the cache or the ignore/glossary settings? Another possibility is that the plugin has cached all the (erroneous) code translations and is skipping the rules I'm trying to add. I see the cache is SQLite, so I should be able to load and edit it directly. Could its location be output in the job log?

jtippett · 2023-05-12T08:45:43Z

Here are the regexes I've tried in the HTML element mode

class="processedcode"
class="codeinfo"
class="codeprefix"
class="codeline"
class="about-pb"
<tr>
table
td
^<tr>
^<td
^<pre
</table>$

bookfere · 2023-05-12T09:13:46Z

There is an issue with the "Ignore Paragraph" feature when used in this way. Unfortunately, you cannot currently control which element is recognized as a "paragraph" since the plugin determines it.

With the example you provided, the plugin will extract each element with non-empty content, as follows:

<span class="codeprefix">1:&nbsp;</span>
<td class="codeline">puts 1.0 + 2.0 </td>
<span class="codeprefix">2:&nbsp;</span>
<td class="codeline">puts 2.0 * 3.0</td>
<span class="codeprefix">3:&nbsp;</span>
<td class="codeline">puts 5.0 - 8.0</td>
<span class="codeprefix">4:&nbsp;</span>
<td class="codeline">puts 9.0 / 2.0</td>

As you can see, the extracted elements do not include <table class="processedcode">. This is why you cannot ignore it with a regex.
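
To illustrate the point, here is a minimal Python sketch, not the plugin's actual code. It assumes each ignore rule is searched against the markup of each extracted element; the helper name `matching` is hypothetical. Because the table wrapper is never part of any extracted element, rules that target it have nothing to match.

```python
import re

# Elements as listed above: what the plugin is said to extract (first two rows).
extracted = [
    '<span class="codeprefix">1:&nbsp;</span>',
    '<td class="codeline">puts 1.0 + 2.0 </td>',
    '<span class="codeprefix">2:&nbsp;</span>',
    '<td class="codeline">puts 2.0 * 3.0</td>',
]


def matching(pattern, elements):
    """Return the elements whose markup matches the regex anywhere.

    Hypothetical helper: assumes rules are applied with a regex search.
    """
    rule = re.compile(pattern)
    return [element for element in elements if rule.search(element)]


# Rules aimed at the table wrapper match nothing, because it was never extracted.
print(len(matching(r'class="processedcode"', extracted)))  # 0
print(len(matching(r'</table>$', extracted)))              # 0

# Rules aimed at the extracted elements themselves can match.
print(len(matching(r'class="codeline"', extracted)))       # 2
print(len(matching(r'^<td', extracted)))                   # 2
```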

Therefore, I will work on adding a feature that allows you to specify tag and class-like attributes to control which elements should not be extracted for translation, thus solving this problem.

The precedence of the three features is: ignore > cache > glossary.
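
One way to read that ordering is sketched below in Python. This is an assumption about what the precedence implies, not the plugin's implementation: the function `resolve`, its parameters, and the placeholder scheme for glossary terms are all hypothetical.

```python
import re


def resolve(paragraph, ignore_rules, cache, glossary, translate):
    """Hypothetical sketch of ignore > cache > glossary.

    Returns None when the paragraph is ignored, otherwise a translation.
    """
    # 1. Ignore: a paragraph matching any rule is never translated.
    if any(rule.search(paragraph) for rule in ignore_rules):
        return None

    # 2. Cache: a stored translation is reused as-is.
    if paragraph in cache:
        return cache[paragraph]

    # 3. Glossary: only affects text that is freshly translated.
    #    Terms are swapped for placeholders, then restored afterwards.
    text = paragraph
    for index, term in enumerate(glossary):
        text = text.replace(term, "{{%d}}" % index)
    translated = translate(text)
    for index, target in enumerate(glossary.values()):
        translated = translated.replace("{{%d}}" % index, target)

    cache[paragraph] = translated
    return translated


# Stand-in translator for the demo: upper-casing instead of a real engine.
rules = [re.compile(r'class="codeline"')]
cache = {"Hello": "Bonjour"}
glossary = {"Ruby": "Ruby"}

print(resolve('<td class="codeline">puts 1</td>', rules, cache, glossary, str.upper))  # None
print(resolve("Hello", rules, cache, glossary, str.upper))        # Bonjour
print(resolve("I like Ruby", rules, cache, glossary, str.upper))  # I LIKE Ruby
```

If the ordering works as sketched, a translation cached before a glossary entry was added would be reused unchanged, while an ignore rule would still take effect.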

jtippett · 2023-05-12T10:02:18Z

Thanks so much for the explanation; I'm looking forward to the enhancement. In the meantime, I wrote a quick Ruby script that strips unwanted spans/tables/anything by exploding the processed epub, traversing it, and deleting any element with class="unwanted" and lang="lang". If anyone wants it as a stopgap until the real improvement is implemented, let me know and I can post it (programming knowledge required).
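
The Ruby script itself was not posted. As a rough illustration of the same idea, here is a Python sketch under stated assumptions: it works on a single XHTML string rather than a whole unpacked epub, removes an element only when both its class and lang match (one reading of the description above), and uses the hypothetical function name `strip_elements`. The sample input avoids XML namespaces and HTML entities such as &nbsp;, which the standard-library parser would need extra handling for.

```python
import xml.etree.ElementTree as ET


def strip_elements(xhtml, unwanted_class, lang):
    """Remove elements whose class list contains unwanted_class and whose
    lang attribute equals lang. Text following a removed element is kept."""
    root = ET.fromstring(xhtml)
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            classes = (child.get("class") or "").split()
            if unwanted_class in classes and child.get("lang") == lang:
                # ElementTree stores trailing text on the removed element,
                # so move it to the previous sibling or to the parent.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")


sample = '<div><p>Keep</p><p class="unwanted" lang="fr">Supprimer</p> tail</div>'
print(strip_elements(sample, "unwanted", "fr"))
# <div><p>Keep</p> tail</div>
```

Applying this to a real book would also mean unzipping the epub, running the function over each content file, and zipping the result back up.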

bookfere · 2023-05-12T13:27:58Z

I also found a bug after reviewing the relevant code. The regexes you provided were unable to catch the table element with class="processedcode" due to the reason I explained earlier. Other regexes such as class="codeinfo", however, were also not working as expected. This was due to the rules not being applied properly, but I quickly fixed the bug. You can try upgrading the plugin to the latest version, v1.3.8, to see if the issue has been resolved.

jtippett changed the title ~~How can I exclude eg. code examples from translation?~~ How can I exclude eg. code examples and/or tables from translation? May 12, 2023

bookfere added the enhancement New feature or request label May 12, 2023

bookfere added the bug Something isn't working label May 12, 2023

bookfere closed this as completed in cd08a6b Jun 6, 2023
