USFM Information

USFM Documentation

Categories of Tags

There are three broad categories of USFM tags: “paragraph styles”, “character styles” and “note styles”.

  1. Paragraph styles are what the name suggests: when the text is printed, there is a line break at the end of the block. These tags cannot be nested inside other tags. A paragraph style tag always closes the previous paragraph style block.
  2. Character styles are inline elements that are contained within a paragraph style block. Unlike paragraph style tags, character style tags can be nested inside other tags. When this occurs, a closing tag is required. The closing tag is formed by appending an asterisk to the opening tag, for example, the \em tag is closed by \em*.
  3. Note styles are used to identify and mark up extra-biblical material like footnotes and tables.
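
The closing-tag rule from item 2 can be sketched in a few lines of Python. This is only an illustration; the function name closing_marker is hypothetical and not part of any USFM tool.

```python
def closing_marker(opening: str) -> str:
    """Return the closing tag for a character or note style tag.

    Hypothetical helper: it only appends the asterisk described above
    and does not check that the tag is a real USFM marker.
    """
    if not opening.startswith("\\") or opening.endswith("*"):
        raise ValueError(f"not an opening tag: {opening!r}")
    return opening + "*"


print(closing_marker("\\em"))  # the closing form of \em
```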

Opening tags are always followed by a single white space that is not considered to be part of the text of scripture. This can be a new line or a space character.

Closing tags do not need to be followed by white space.
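
As a rough sketch of these two whitespace rules, the tokenizer below drops the single white space after an opening tag and keeps whatever follows a closing tag as text. The regular expression and the function name tokenize are assumptions made for illustration; the pattern only covers simple lowercase markers with an optional trailing digit, not every marker form in the standard.

```python
import re

# Assumed marker shape: backslash, lowercase letters, optional digits,
# optional asterisk for a closing tag.
MARKER = re.compile(r"\\([a-z]+\d*)(\*?)")


def tokenize(usfm: str):
    """Split USFM text into ("open", name), ("close", name), ("text", s)."""
    tokens = []
    pos = 0
    for m in MARKER.finditer(usfm):
        if m.start() > pos:
            tokens.append(("text", usfm[pos:m.start()]))
        name, star = m.groups()
        pos = m.end()
        if star:
            # Closing tag: following characters, if any, stay in the text.
            tokens.append(("close", name))
        else:
            tokens.append(("open", name))
            # Opening tag: one white space (space or new line) is a
            # delimiter, not scripture text, so skip it.
            if pos < len(usfm) and usfm[pos].isspace():
                pos += 1
    if pos < len(usfm):
        tokens.append(("text", usfm[pos:]))
    return tokens


for token in tokenize("\\p \\em word\\em* more"):
    print(token)
```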

Paratext implementation

The most obvious deviation from the published USFM standard (version 2.4) is that Paratext does not require ending or closing markers in many situations. For paragraph styles (\p, \q, etc.) closing tags are never used since paragraphs cannot be nested; a paragraph style marker automatically closes the previous paragraph block.

However, closing markers are required for most character styles. Since they can span paragraphs, character markers need to have an explicit closer. The 3.0 documentation has made this clearer.

Note styles are a little different. The opening note tag requires a closing tag, but any tags inside it do not because they are automatically closed by another opening tag, or by the closing tag for the whole note block. For example:

\f + \fr 5.12: \fk make me clean: \ft This disease was considered to make a person ritually unclean.\f*
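
A minimal sketch of how such a note could be split into its parts, assuming the note is already isolated as one string and that the inner tags all begin with \f. The function name parse_footnote is hypothetical.

```python
import re


def parse_footnote(note: str):
    """Return (caller, [(tag, text), ...]) for a single \\f ... \\f* note."""
    m = re.fullmatch(r"\\f (\S+) (.*)\\f\*", note, flags=re.DOTALL)
    if m is None:
        raise ValueError("not a complete \\f ... \\f* note")
    caller, body = m.groups()
    # Each inner tag runs until the next inner tag or the end of the note,
    # so splitting on the tags is enough; no inner closing tags are needed.
    parts = re.split(r"\\(f[a-z]+) ", body)
    fields = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]
    return caller, fields


example = (
    "\\f + \\fr 5.12: \\fk make me clean: \\ft This disease was "
    "considered to make a person ritually unclean.\\f*"
)
caller, fields = parse_footnote(example)
print(caller)
for tag, text in fields:
    print(tag, "->", text)
```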

The most notable exception to these rules is the verse marker, \v. Even though the verse marker is a character style, there is no closing marker. A verse is closed by the next verse marker, a chapter marker, or the end of the book.
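
The verse-closing rule can be sketched as follows. The function name split_verses is hypothetical, and the sketch makes no attempt to strip other markers (such as \p) that fall between two verse markers; they are left in the verse text.

```python
import re

# Matches only \c N and \v N, not markers such as \cp or \va.
BOUNDARY = re.compile(r"\\(c|v)\s+(\S+)\s?")


def split_verses(usfm: str):
    """Map (chapter, verse) to the raw text of each verse."""
    verses = {}
    chapter = None
    matches = list(BOUNDARY.finditer(usfm))
    for i, m in enumerate(matches):
        kind, number = m.groups()
        if kind == "c":
            chapter = number
            continue
        # A verse ends at the next \v or \c marker, or at the end of the book.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(usfm)
        verses[(chapter, number)] = usfm[m.end():end].strip()
    return verses


book = "\\c 1\n\\v 1 In the beginning\n\\v 2 And the earth\n\\c 2\n\\v 1 Thus"
for ref, text in split_verses(book).items():
    print(ref, text)
```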

I consider the \c and \v tags to be metadata tags, not paragraph or open-only character styles.

These tags DO indicate the boundaries of ‘database’ fields. That is, in XML, they DO properly indicate the presence of the div markers: the placement of \c and \v marks exactly the boundaries of the chapters and verses.

These tags DO NOT reliably indicate exactly where or what text should appear in the final markup seen by the user. The \c tag is grossly misplaced, before all the section head information, and may not appear at all if a \cp or \ca follows it, or if the text contains only \c 1. The \v tag may also be out of place, and may not appear if a \vac, \va or \vp tag follows it.

Any rendering engine already deals with this. I find the logic is liberated a bit once you stop trying to make the \c and \v into a paragraph or character style and then moving or removing it as needed. It’s easier to make it metadata and then show the metadata when no alternate is provided.
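
A minimal sketch of that idea, assuming a renderer that stores the \v number as structural metadata and only falls back to it for display when no published alternate (\vp) was given. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerseRef:
    number: str                      # from \v: the structural boundary
    published: Optional[str] = None  # from \vp, when the text supplies one

    def label(self) -> str:
        """Text to display: the alternate if present, else the \\v number."""
        return self.published if self.published is not None else self.number


print(VerseRef("3").label())
print(VerseRef("3", published="3a").label())
```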

I agree, @Michael_H. I think USFM refers to the \c marker as a paragraph type of marker, but they don’t intend it to be a semantic paragraph marker, nor even that there will be any sort of visual distinction associated with it. The language is confusing!