translationNotes TSV Format

Update: the official README now includes a section on Structure and a section for GL Translators.

Background

The goal of this document is to describe a change to the translationNotes (tN) files that accomplishes the following objectives:

  1. Tie tNs to Greek and Hebrew instead of an English text
  2. Make it easier for GL translation teams to translate the tNs

A minor goal was to make the format easier to export and import by various software tools.

TSV Format Overview

A Tab Separated Value (TSV) file is like a Comma Separated Value file except that the tab character is what divides the values instead of a comma. This makes it easier to include prose text in the files because many languages require the use of commas, single quotes, and double quotes in their sentences and paragraphs.

The tNs consist of one file per book of the Bible, each encoded in TSV format and named like 01-GEN.tsv. The columns are Book, Chapter, Verse, ID, SupportReference, OrigQuote, Occurrence, GLQuote, and OccurrenceNote.
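As a rough illustration of how a tool might read such a file, here is a minimal Python sketch. It assumes the first line of the file is a header row naming the nine columns and that quote characters in the prose are never used to wrap fields. Neither assumption is stated above, and the function name read_tn_rows is hypothetical.

```python
import csv
import io

# Column order as listed in this document.
COLUMNS = ["Book", "Chapter", "Verse", "ID", "SupportReference",
           "OrigQuote", "Occurrence", "GLQuote", "OccurrenceNote"]

def read_tn_rows(stream):
    """Yield one dict per note row, keyed by column name.

    Assumes (not confirmed by the format description) that the first
    line of the stream is a header row with the nine column names.
    """
    # QUOTE_NONE treats single and double quotes as ordinary characters,
    # so prose containing them needs no escaping.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    yield from reader

# A one-note sample built from the example values used in this document.
header = "\t".join(COLUMNS)
note = "\t".join([
    "TIT", "1", "3", "a8n4",
    "rc://*/ta/man/translate/figs-metaphor",
    "ἐφανέρωσεν … τὸν λόγον αὐτοῦ",
    "1",
    "he revealed his word",
    'AT: "He caused me to understand his message"',
])
rows = list(read_tn_rows(io.StringIO(header + "\n" + note + "\n")))
print(rows[0]["ID"], rows[0]["OccurrenceNote"])
```

For a real file, the same function could be given a handle opened with open(path, encoding="utf-8", newline="").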

tN TSV Column Description

The following lists each column with a brief description and example.

  • Book - USFM book code abbreviation (e.g. TIT)
  • Chapter - Chapter number (e.g. 1)
  • Verse - Verse number (e.g. 3)
  • ID - Four-character alphanumeric string, unique within the verse for the resource (e.g. a8n4)
    • This can be helpful in identifying which notes are translations of the original English tNs and which notes have been added by GLs.
    • This would also be a useful way to unambiguously refer to specific notes. An RC link could resolve to a specific note like this: rc://en/tn/help/tit/01/01/a8n4.
  • SupportReference (OPTIONAL) - A link to a supporting reference text
    • This will usually be a link to a translationAcademy article, like rc://*/ta/man/translate/figs-metaphor (where the asterisk tells the processing software to look for the tA article in the same language as is used for these tNs)
  • OrigQuote - Original language quote from the UHB or UGNT, e.g. ἐφανέρωσεν … τὸν λόγον αὐτοῦ
    • Note that the relation fields in the file manifest.yaml indicate the specific versions of those resources that were quoted
    • Software such as translationCore uses this for highlighting rather than using the GLQuote field
    • A Unicode ellipsis character (…) indicates that the quote is discontinuous; software should interpret this in a non-greedy manner
  • Occurrence - Specifies which occurrence in the original language text the entry applies to.
    • -1: entry applies to every occurrence of OrigQuote in the verse
    • 0: entry does not relate to a specific string in the original language text (for example, “Connecting Statement:”)
    • 1: entry applies to first occurrence of OrigQuote only
    • 2: entry applies to second occurrence of OrigQuote only
    • etc.
  • GLQuote (OPTIONAL) - Gateway language quote, e.g. he revealed his word
    • Software such as translationCore should not use this field
    • This field serves mostly as a reference text for GL translators
    • For certain notes, this field represents the display text for notes that do not relate to a specific word or phrase in the text. For example, “Connecting Statement:” and “General Information:” are used in several instances.
  • OccurrenceNote - The Markdown formatted note itself. For example, Paul speaks of God's message as if it were an object that could be visibly shown to people. AT: "He caused me to understand his message"
    • The text should be Markdown formatted, which means the following are also acceptable:
      • Plaintext - if you have no need for extra markup, just use plain text in this column
      • HTML - if you prefer to use inline HTML for markup, that works because it is supported in Markdown
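The OrigQuote and Occurrence rules described in the list above can be sketched in Python as follows. This is only one possible reading, not how translationCore implements it: the function names are hypothetical, the ellipsis is turned into a non-greedy regular-expression gap, and occurrences are counted as non-overlapping matches from left to right. The sample verse is a Latin-letter stand-in rather than real UHB or UGNT text.

```python
import re

ELLIPSIS = "\u2026"  # the Unicode ellipsis character

def quote_pattern(orig_quote):
    """Build a regex in which each ellipsis becomes a non-greedy gap."""
    parts = [p.strip() for p in orig_quote.split(ELLIPSIS)]
    return re.compile(".*?".join(re.escape(p) for p in parts if p))

def find_quote(verse_text, orig_quote, occurrence):
    """Return a list of (start, end) spans to highlight in verse_text.

    occurrence follows the column description:
      -1 means every occurrence, 0 means nothing in the text,
      n >= 1 means only the n-th occurrence.
    """
    if occurrence == 0 or not orig_quote:
        return []
    pattern = quote_pattern(orig_quote)
    spans = [m.span() for m in pattern.finditer(verse_text)]
    if occurrence == -1:
        return spans
    if 1 <= occurrence <= len(spans):
        return [spans[occurrence - 1]]
    return []  # the requested occurrence is not present in this verse

# Stand-in verse text, for illustration only.
verse = "a b c a b d"
print(find_quote(verse, "a \u2026 b", -1))
print(find_quote(verse, "a \u2026 b", 2))
```

A greedy gap would instead stretch from the first "a" to the last "b" and report a single long span, which is why the non-greedy form matters when a quote repeats within a verse.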

An example of what this could look like is available in the tN - TSV Prototype Google Spreadsheet.

GL Translations of this Format

If GL translation teams want to use this format, they can load the files in a spreadsheet editor and translate only the OccurrenceNote column. All of the other columns can remain unmodified.
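For teams that prefer scripting over a spreadsheet editor, the same idea can be sketched in Python: copy every row and replace only OccurrenceNote. The helper names are hypothetical. Because the ID is unique only within a verse, notes are keyed here by Book, Chapter, Verse, and ID together, which is a design assumption rather than part of the format. The translated string in the sample is a placeholder.

```python
def note_key(row):
    """Identify a note by Book, Chapter, Verse, and ID."""
    return (row["Book"], row["Chapter"], row["Verse"], row["ID"])

def apply_translations(source_rows, translated_notes):
    """Return copies of source_rows with only OccurrenceNote replaced.

    translated_notes maps note_key(row) to the translated note text.
    Rows without a translation are copied unchanged.
    """
    result = []
    for row in source_rows:
        new_row = dict(row)  # copy, so the source rows stay untouched
        key = note_key(row)
        if key in translated_notes:
            new_row["OccurrenceNote"] = translated_notes[key]
        result.append(new_row)
    return result

source = [{
    "Book": "TIT", "Chapter": "1", "Verse": "3", "ID": "a8n4",
    "SupportReference": "rc://*/ta/man/translate/figs-metaphor",
    "OrigQuote": "ἐφανέρωσεν … τὸν λόγον αὐτοῦ",
    "Occurrence": "1",
    "GLQuote": "he revealed his word",
    "OccurrenceNote": 'AT: "He caused me to understand his message"',
}]
translations = {("TIT", "1", "3", "a8n4"): "<translated note text>"}
translated = apply_translations(source, translations)
print(translated[0]["OccurrenceNote"])
```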

For publishing, the translated tNs plus the alignment data would be needed.

Feedback

What do you think of this proposal? Does it make sense? Would it be helpful for you or your team if we made these changes? What are the drawbacks of this approach? Please comment below with any thoughts you have.

Do we want to make explicit that the OrigQuote is quoting the UGNT?

Note that I updated Occurrence to have a value possibility of 0, which means that the note does not relate to a specific string in the original language.

I also updated the OccurrenceNote field to indicate that the field should be Markdown-formatted.