How to Prepare a Translation Words Project for Publishing

This article is one in what may become a series describing how to get GL (Gateway Language) projects ready to publish so that others can benefit from them. This article deals with Translation Words (tW) projects. A tW project consists of about 1,000 files, each defining one term, with variants, definition, facts, references, translation suggestions, etc. Together, they constitute a basic Bible lexicon. When published, these will help translators make the best possible translation decisions. Here is how to publish on Door43.

Prerequisites

  • The tW files are in markdown format and are named with the .md file extension.
  • The tW files are organized in a directory structure with a bible folder at the top level and three folders underneath: kt, names, and other.
  • The reader knows how to copy files between Door43 and a file system on a computer.

tW files uploaded directly from tS will not be in markdown format. In that case, extra conversion steps (outside the scope of this article) are required to achieve these prerequisites.
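
The layout prerequisite can be checked mechanically before any other work starts. The following is a minimal sketch, not one of the unfoldingWord scripts: the function name `summarize_tw_tree` is hypothetical, and it assumes the project root is the folder that contains `bible`.

```python
from pathlib import Path

# Folder names taken from the prerequisites above.
EXPECTED_FOLDERS = ("kt", "names", "other")


def summarize_tw_tree(root):
    """Return (counts, problems) for a tW project rooted at `root`.

    counts maps each folder that exists to its number of .md files;
    problems is a list of human-readable strings.
    """
    bible = Path(root) / "bible"
    counts = {}
    problems = []
    if not bible.is_dir():
        problems.append(f"missing top-level folder: {bible}")
        return counts, problems
    for name in EXPECTED_FOLDERS:
        folder = bible / name
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
            continue
        counts[name] = len(list(folder.glob("*.md")))
    return counts, problems


if __name__ == "__main__":
    import sys

    counts, problems = summarize_tw_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("markdown files per folder:", counts, "total:", sum(counts.values()))
    for problem in problems:
        print("PROBLEM:", problem)
```

Run it with the project folder as the only argument; the printed total gives a quick sense of whether the project is roughly complete.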

Verify the .md files, and make corrections

This section represents the main work of preparing a tW project for publication. You should perform as many of these steps as you reasonably can, in advance of submitting the project for publication.

  • Count approximately 1,020 .md files in the three folders.
  • No empty .md files.
  • Each .md file must begin with a valid level 1 <h1> heading. A valid <h1> heading starts with a single hash symbol at the beginning of the first line, followed by a space, followed by the word or words which are the subject of that file.
  • The second line in the file must be blank.
  • The third line must be a valid level 2 <h2> heading. It must start with two hash symbols at the beginning of the line, followed by a space, followed by some word like “Definition”.
  • Headings should not contain any explicit markdown formatting, like bold or italic (**, __, *, or _).
  • The .md file must use UTF-8 character encoding, with no Byte Order Mark (BOM).
  • Blank lines before and after each <h2> heading.
  • No gratuitous headings. tW doesn’t need any <h3> or higher level headings.
  • Uses asterisks, not any other character, for unordered list items.
  • Asterisks marking list items should be placed at the beginning of a line. Exception: multi-level lists, which are rare in tW.
  • A space character must follow the asterisk marking a list item.
  • Ordered list items begin with a number, followed by a period, followed by a space character.
  • Blank line before the first item in a list.
  • Blank line after the last item in a list – marks the end of the list.
  • References to tA articles are properly formed, for example: [How to Translate Names](rc://plt/ta/man/translate/translate-names)
  • All such references must contain the correct language code (“plt” in the above example).
  • References to other words are properly formed, for example [Gad](../names/gad.md)
  • Verify every http or https URL reference.
  • No references of any kind in headings.
  • No HTML code, such as comments <!-- -->, <b>, <br>, and &nbsp;
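
Several of the checks in this list are simple enough to sketch in a few lines of Python. The example below is only an illustration of the heading, encoding, and HTML rules; it is not the verifyMd.py script, the function name `check_tw_markdown` is hypothetical, and the patterns are deliberately crude (for instance, any asterisk or underscore in a heading is reported).

```python
import re

# Crude pattern for HTML comments, tags, and the &nbsp; entity.
HTML_PATTERN = re.compile(r"<!--|</?[A-Za-z][^>]*>|&nbsp;")
BOM = b"\xef\xbb\xbf"


def check_tw_markdown(raw):
    """Return a list of problems found in one tW file, given its bytes."""
    problems = []
    if raw.startswith(BOM):
        problems.append("file starts with a byte order mark")
        raw = raw[len(BOM):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    if not text.strip():
        return problems + ["file is empty"]

    lines = text.splitlines()
    # First three lines: level 1 heading, blank line, level 2 heading.
    if not re.match(r"# \S", lines[0]):
        problems.append("line 1 is not a level 1 heading")
    if len(lines) < 2 or lines[1].strip():
        problems.append("line 2 is not blank")
    if len(lines) < 3 or not re.match(r"## \S", lines[2]):
        problems.append("line 3 is not a level 2 heading")

    for index, line in enumerate(lines):
        number = index + 1
        if line.startswith("#"):
            if line.startswith("###"):
                problems.append(f"line {number}: heading deeper than level 2")
            if re.search(r"[*_]", line):
                problems.append(f"line {number}: markdown formatting in heading")
            if "](" in line:
                problems.append(f"line {number}: reference in heading")
        if line.startswith("## "):
            if index > 0 and lines[index - 1].strip():
                problems.append(f"line {number}: no blank line before heading")
            if index + 1 < len(lines) and lines[index + 1].strip():
                problems.append(f"line {number}: no blank line after heading")
        if HTML_PATTERN.search(line):
            problems.append(f"line {number}: HTML code")
    return problems
```

Reading each file with `Path.read_bytes()` and passing the result to the function keeps the BOM visible, which opening the file in text mode with some codecs would hide.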

There are scripts that can do almost all of the above checks or corrections. See https://github.com/unfoldingWord-dev/tools/tree/develop/usfm. The ones listed below will be the most useful for tW projects. To use them, you must adapt the scripts to your own computing environment.

  • tatw_md2md.py
  • tw-addH2headers.py
  • verifyMd.py
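
If adapting those scripts is not practical, the reference rules from the checklist can also be approximated by hand. This is a rough sketch under stated assumptions, not code from the tools repository: the function name `check_references` is hypothetical, the tA link shape is inferred from the single example in this article, and links to other words are assumed to always use the `../folder/file.md` form.

```python
import re

# [label](target) links; targets are assumed to contain no spaces.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")
RC_LINK = re.compile(r"rc://([^/]+)/([^/]+)/(.*)$")
TA_REST = re.compile(r"man/[^/]+/[^/]+$")
TW_LINK = re.compile(r"\.\./(kt|names|other)/[^/]+\.md$")


def check_references(text, language_code):
    """Return (problems, urls) for the markdown links in `text`.

    problems lists malformed references; urls lists the http and https
    targets, which still need to be verified by visiting them.
    """
    problems = []
    urls = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in LINK.finditer(line):
            target = match.group(2)
            if target.startswith("rc://"):
                rc = RC_LINK.match(target)
                if rc is None:
                    problems.append(f"line {number}: malformed rc link: {target}")
                    continue
                code, resource, rest = rc.groups()
                if code != language_code:
                    problems.append(f"line {number}: wrong language code: {target}")
                elif resource == "ta" and not TA_REST.match(rest):
                    problems.append(f"line {number}: malformed tA reference: {target}")
            elif target.startswith(("http://", "https://")):
                urls.append(target)
            elif not TW_LINK.match(target):
                problems.append(f"line {number}: malformed word reference: {target}")
    return problems, urls
```

The function only collects web URLs rather than fetching them, so it runs offline; each collected URL still has to be opened and checked.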

Verify the manifest.yaml file and make corrections

  • Borrow a known good manifest.yaml file from another project as a template, but review every line in it.
  • Follow the specifications on the “Manifest File” page of the Resource Container 0.2 documentation.
  • Must use UTF-8 character encoding, with no BOM.
  • Copy contributor names from the manifest.json file, if one exists, and from any other source of names that you have.
  • Ensure quotes around version number strings.
  • Update the issued and modified dates when any content changes.
  • Modify only the modified date if just metadata (manifest) changes. If it is just a cosmetic change of no value to end users, do not even modify the modified date.
  • Increment the version string whenever the issued date changes.
  • The language | title should be localized if possible.
  • The subject field must say “Translation Words”.
  • With the exception of the English resources, the publisher field should never say “unfoldingWord”.
  • The projects section should have only one entry, looking like this:

    -
      categories:
      identifier: 'bible'
      path: './bible'
      sort: 0
      title: 'translationWords'
      versification:

  • Check manifest.yaml with the verifyManifest.py script. That script reports errors and potential errors on stderr and stdout. If the manifest is not valid YAML, the script should crash.
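
To make a few of these manifest rules concrete, here is a small sketch that checks a manifest after it has been parsed into a Python dictionary. It is not verifyManifest.py. The function name `check_manifest` is hypothetical, and the key names (`dublin_core`, `version`, `subject`, `publisher`, `language`, `issued`, `modified`, `projects`) are assumed to follow the Resource Container 0.2 layout. Dates are compared as text, which assumes they are written in ISO order (year first).

```python
def check_manifest(manifest):
    """Return a list of problems in a manifest already parsed into a dict.

    The dict could come from a YAML parser, for example PyYAML's
    yaml.safe_load(); parsing is left out to keep this sketch stdlib-only.
    """
    problems = []
    core = manifest.get("dublin_core") or {}

    # An unquoted version such as 3 or 3.1 parses as a number, not a string.
    if not isinstance(core.get("version"), str):
        problems.append("version is not a quoted string")
    if core.get("subject") != "Translation Words":
        problems.append("subject is not 'Translation Words'")

    language = (core.get("language") or {}).get("identifier")
    if core.get("publisher") == "unfoldingWord" and language != "en":
        problems.append("publisher is 'unfoldingWord' on a non-English resource")

    # str() lets quoted and unquoted (date object) values compare alike.
    issued, modified = core.get("issued"), core.get("modified")
    if issued and modified and str(modified) < str(issued):
        problems.append("modified date is earlier than issued date")

    projects = manifest.get("projects") or []
    if len(projects) != 1:
        problems.append(f"expected 1 project entry, found {len(projects)}")
    else:
        project = projects[0]
        if project.get("identifier") != "bible":
            problems.append("project identifier is not 'bible'")
        if project.get("path") != "./bible":
            problems.append("project path is not './bible'")
    return problems
```

An empty result only means that none of these particular rules were broken; the remaining items in the list still need a manual review.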

Upload to Door43

Create a repository in Door43. The repository name should include the language code, an underscore, and “tW”. The name may also include other identifying information, such as the checking level. Upload your tW directory structure and files to this repository.

As a final verification step, check the rendering on Door43 by using the Preview button on the repository where you stored the tW project. Check the general appearance and the indexing, and read any warnings that were generated. Address whatever doesn’t look right.

Submit a Source Text Request (STR)

Ask the unfoldingWord team to publish the material by filling out a Source Text Request (STR) form: Sign In - Door43 Content Service. Once your form is submitted, the unfoldingWord team will verify the license release forms and will perform all the checks and corrections described above. Any issues requiring translator intervention will be noted in the STR and will block publication until resolved. Check back often on your STRs to monitor their progress toward publication.

@lversaw This is an excellent write up, thank you! I especially like verifyMd.py, I may have to give that script a try–it might be good to compare that with what @RobH has in the tX linting system.

One note regarding:

  • With the exception of the English resources, the publisher field should say “Door43”.

That is true only if the requester hasn’t entered anything else for the publisher field. In the STR form, the requester can enter whoever they want listed as the publisher, in case it is another organization.

Thanks again, this is really helpful!

Yes, looks very helpful @lversaw. A link to Manifest File — Resource Container 0.2 documentation might also be nice.

The linter in the tX code is far more generic than this specific checking. I think it would be good to include some of the verifyMd.py code for tW repos @jag3773.

One thing I have hit with people’s .yaml manifests is whether or not dates are put in quotes? I don’t think Date — Resource Container 0.2 documentation is clear. Certainly my top link shows them as strings. But for processing, a date object is quite different from a string that happens to contain a date. Would be nice to clarify that both here and in the RC documentation (and maybe to check in some of the above-mentioned tools).
