Catalog Next: Populating & Accessing the DCS Resource Catalog


What is Catalog Next?

Catalog Next is the “next” iteration of how unfoldingWord manages its resource catalog.

For the “current” version and implementation of the Door43 catalog (v3, which most apps are “currently” using), publishing a resource/repository (repo) to the catalog required staging the repo in the STR org on DCS, having it approved via a pull request, and then merging it into a fork of the repo in the Door43-Catalog org. Every cataloged resource had to be “owned” by one organization, Door43-Catalog, rather than by its actual owner’s account or organization, and had to be reviewed, merged, and published by someone on the unfoldingWord staff. We also had no way to search DCS repos by language, subject, and other useful fields of our resource manifests, nor an easy way to see, in either the UI or the repo API endpoints, which valid release versions and stages were available.

Catalog Next, which has been implemented and put into production, is built directly into DCS so that:

  • anyone has a quick and easy way to release and version their resources, including pre-releases and drafts
  • we can leverage the 3 different types of releases Gitea can make, plus the master branch: a Release (“prod”), a Pre-Release (“pre-prod”, a checkbox when making a release), a Draft (“draft”, the Save Draft button when making a release), and simply master (“latest”). These let us test different versions of a resource in an app
  • fields from the manifest are now searchable and displayed in repo API endpoint results

Adding a Resource (RC v0.2 Repository) to Catalog Next

Adding the master Branch to Catalog Next

stage: latest

  • User makes a commit to the master branch of a user or organization repository, either via the DCS website or locally via git push on the command line
    DCS website:


    git push:

    NOTE: There must be a manifest.yaml file in the root of the master branch that has a valid badge (i.e. is an RC v0.2 manifest file). Example:

  • This triggers a procedure in DCS that processes the manifest.yaml file of the master branch and checks whether it is valid. If it is, the master branch is added to the catalog table of the DCS database as the “latest” stage entry for this resource. If it is not, the “latest” stage entry for this repo is removed, so the master branch is not (or is no longer) in Catalog Next
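The master-branch step above can be sketched roughly as follows. This is a minimal, hypothetical sketch that assumes manifest.yaml has already been parsed into a Python dict. The real check in DCS validates against the full RC v0.2 JSON schema; `is_rc02_manifest` here only looks at two fields (`dublin_core.conformsto` and `dublin_core.subject`) as a stand-in, and `update_latest_entry` and the dict-based “catalog table” are illustrative names, not DCS code.

```python
# Hypothetical, simplified stand-in for DCS's RC v0.2 schema validation.
def is_rc02_manifest(manifest):
    """Rough check on a parsed manifest.yaml: RC v0.2 marker plus a subject."""
    if not isinstance(manifest, dict):
        return False
    dublin_core = manifest.get("dublin_core")
    if not isinstance(dublin_core, dict):
        return False
    return dublin_core.get("conformsto") == "rc0.2" and bool(dublin_core.get("subject"))


def update_latest_entry(catalog, repo, manifest):
    """Add or remove the "latest" (master branch) entry for repo.

    Only the "latest" key is touched, so any release stages already
    recorded for the repo stay in place.
    """
    stages = catalog.setdefault(repo, {})
    if is_rc02_manifest(manifest):
        stages["latest"] = {"branch_or_tag_name": "master"}
    else:
        stages.pop("latest", None)
    return catalog
```

Because only the “latest” key changes, an invalid master branch drops out of the catalog while any existing releases of the same repo remain.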

Adding a Release to Catalog Next

stage: prod, pre-prod, draft

  • User makes a release, or updates a pre-prod or draft release, of a user or organization repository
    NOTE: These are all standard release types that both Gitea and GitHub provide (see the buttons and checkboxes at the bottom of the form when making a release).
  • This triggers a procedure in DCS that processes and validates the manifest.yaml file of the given release. If it validates, the release is added to the catalog table of the DCS database with the manifest and release info, and is set to the corresponding stage (“prod”, “pre-prod”, or “draft”)
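A hypothetical sketch of how a release’s flags could map to a stage. It relies on the `draft`, `prerelease`, `tag_name`, and `published_at` fields that Gitea includes in its release API objects. The function names, the dict-based catalog (one entry per stage, mirroring the repo API `catalog` object), and the choice that a draft takes precedence over a pre-release are assumptions for illustration, not the DCS implementation.

```python
# Stage names follow the keys of the repo API "catalog" object.
RELEASE_STAGES = ("prod", "preprod", "draft")


def release_stage(release):
    """Map a Gitea release dict to "draft", "preprod" or "prod".

    ASSUMPTION: a release flagged as both draft and pre-release is a draft.
    """
    if release.get("draft"):
        return "draft"
    if release.get("prerelease"):
        return "preprod"
    return "prod"


def add_release_entry(catalog, repo, release, manifest_is_valid):
    """Record a validated release under its stage in a dict-based catalog."""
    if not manifest_is_valid:
        return catalog
    stages = catalog.setdefault(repo, {})
    tag = release["tag_name"]
    # A draft or pre-release that is later published moves to a new stage,
    # so drop any older entry that still points at the same tag.
    for stage in RELEASE_STAGES:
        old = stages.get(stage)
        if old and old["branch_or_tag_name"] == tag:
            del stages[stage]
    stages[release_stage(release)] = {
        "branch_or_tag_name": tag,
        "released": release.get("published_at"),
    }
    return catalog
```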

Understanding Release Stages

Production Release (“prod”)

When a release is ready to go, make it a Production Release (prod) by clicking the “Publish Release” button. The tag you give the release will point to the latest commit of the branch you select when setting up the release. Once saved, the release can’t be converted to a Pre-Release or Draft. It could be deleted, but that is not advised, since an app may have already grabbed this release.

Pre-Release (“preprod”)

Draft (“draft”)

Example of setting up a draft release:

Valid master branch (“latest”)

Validity badges

All of the above stages/releases must have a valid manifest.yaml file, making each a proper resource container (RC v0.2). Its validity, stage, and tag name are shown in a badge next to the release on the releases page and next to the repo’s title on the repo page.

Accessing/Using Catalog Next (Users & Apps)

User Access

  • A user can browse resource entries in the catalog on the DCS website by clicking the Catalog tab at the top of any page.
    NOTE: Hover over the ? in the search bar to see how powerful the search is (it defaults to showing only resources with a prod release). You can even get repos whose master branch has a valid manifest.yaml file by putting stage:latest in the search bar. This is how users can see what is in the catalog.

App Access

  • An app can use the Catalog Next API to search catalog entries by subject, owner, stage, etc., or to get a release entry by owner/repo/tag. The manifest and catalog fields also appear in repo API results; an abridged example for unfoldingWord/en_tn:

{
  ...
  "language": "en",
  "subject": "TSV Translation Notes",
  "books": [...],
  "title": "unfoldingWord® Translation Notes",
  "checking_level": "3",
  "catalog": {
    "prod": {
      "branch_or_tag_name": "v45",
      "release_url": "https://git.door43.org/api/v1/repos/unfoldingWord/en_tn/releases/10729",
      "released": "2021-04-07T06:55:20Z",
      "zipball_url": "https://git.door43.org/unfoldingWord/en_tn/archive/v45.zip",
      "tarball_url": "https://git.door43.org/unfoldingWord/en_tn/archive/v45.tar.gz"
    },
    "preprod": null,
    "draft": null,
    "latest": {
      "branch_or_tag_name": "master",
      "release_url": null,
      "released": "2021-04-21T13:27:50Z",
      "zipball_url": "https://git.door43.org/unfoldingWord/en_tn/archive/master.zip",
      "tarball_url": "https://git.door43.org/unfoldingWord/en_tn/archive/master.tar.gz"
    }
  }
}
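For apps, a sketch of building Catalog Next request URLs and picking a download from a `catalog` object like the one in the example. ASSUMPTION: the `/catalog/search` and `/catalog/entry/{owner}/{repo}/{tag}` paths and the `subject`, `owner`, and `stage` parameter names are illustrative; confirm them against the DCS API documentation before relying on them. `best_zipball` simply prefers the highest-ranked stage that is present, which is a client-side choice and not how the server decides which stage to return.

```python
from urllib.parse import quote, urlencode

# ASSUMPTION: base URL taken from the example; paths and parameter names
# are illustrative and should be checked against the DCS API docs.
DCS_API = "https://git.door43.org/api/v1"

# Highest to lowest, using the keys of the repo API "catalog" object.
STAGE_ORDER = ("prod", "preprod", "draft", "latest")


def catalog_search_url(subject=None, owner=None, stage="prod"):
    """Build a catalog search URL; stage is the lowest stage the app accepts."""
    params = {"stage": stage}
    if subject:
        params["subject"] = subject
    if owner:
        params["owner"] = owner
    return f"{DCS_API}/catalog/search?{urlencode(params)}"


def catalog_entry_url(owner, repo, tag):
    """Build a URL for a single release entry identified by owner/repo/tag."""
    segments = "/".join(quote(part, safe="") for part in (owner, repo, tag))
    return f"{DCS_API}/catalog/entry/{segments}"


def best_zipball(catalog):
    """Return (stage, zipball_url) for the highest stage present, else None."""
    for stage in STAGE_ORDER:
        info = catalog.get(stage)
        if info:
            return stage, info["zipball_url"]
    return None


print(catalog_search_url(subject="TSV Translation Notes", owner="unfoldingWord"))
# https://git.door43.org/api/v1/catalog/search?stage=prod&subject=TSV+Translation+Notes&owner=unfoldingWord
```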

A Note on Stage Querying

Stages, whether you query them in the API or the UI, do not mean you will get ONLY the stage you give, but any qualifying resources of the given stage or higher.

By “higher”, we mean stages go in this order, highest to lowest:

prod, pre-prod, draft, latest

So, for example, if I search for stage:pre-prod, this does not mean I want ONLY resources with a pre-prod release, but ALL resources that have a prod or pre-prod release. In a sense, you’re saying that resources with a prod or pre-prod release are OK.
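The ordering above amounts to a simple rule for which resources qualify. A minimal sketch, using the key names from the repo API `catalog` object (so pre-prod is spelled `preprod`); `acceptable_stages` and `qualifies` are hypothetical helpers, not part of the DCS API.

```python
# Highest to lowest; keys as in the repo API "catalog" object.
STAGE_ORDER = ["prod", "preprod", "draft", "latest"]


def acceptable_stages(min_stage):
    """Stages at or above min_stage, e.g. "preprod" -> ["prod", "preprod"]."""
    return STAGE_ORDER[: STAGE_ORDER.index(min_stage) + 1]


def qualifies(catalog, min_stage):
    """True if the resource has an entry for at least one acceptable stage."""
    return any(catalog.get(stage) for stage in acceptable_stages(min_stage))


print(acceptable_stages("draft"))  # ['prod', 'preprod', 'draft']
```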

A resource will only show up in the results as a pre-prod release if the pre-prod release is NEWER than the prod release. If the resource has a pre-prod and a prod release, and the prod release is newer than the pre-prod release, the result will show a prod release.
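The rule for which stage is shown can be sketched as “the newest acceptable entry wins”. ASSUMPTION: the text only states the prod vs. pre-prod case; this hypothetical `displayed_stage` generalizes it to every stage the query accepts and breaks ties in favor of the higher stage. Timestamps use the `released` format seen in the API example (`2021-04-07T06:55:20Z`).

```python
from datetime import datetime

# Highest to lowest; keys as in the repo API "catalog" object.
STAGE_ORDER = ["prod", "preprod", "draft", "latest"]


def released_at(info):
    """Parse a "released" timestamp such as "2021-04-07T06:55:20Z"."""
    return datetime.strptime(info["released"], "%Y-%m-%dT%H:%M:%SZ")


def displayed_stage(catalog, min_stage):
    """Return the acceptable stage with the newest release, or None."""
    acceptable = STAGE_ORDER[: STAGE_ORDER.index(min_stage) + 1]
    best = None
    for stage in acceptable:  # highest stage first, so ties keep it
        info = catalog.get(stage)
        if info is None:
            continue
        if best is None or released_at(info) > released_at(catalog[best]):
            best = stage
    return best


en_tn = {
    "prod": {"released": "2021-04-07T06:55:20Z"},
    "preprod": None,
    "draft": None,
    "latest": {"released": "2021-04-21T13:27:50Z"},
}
print(displayed_stage(en_tn, "prod"))    # prod
print(displayed_stage(en_tn, "latest"))  # latest
```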

Use case:

Querying resources this way, by giving the lowest stage your app is willing to work with, ensures your app always gets the latest acceptable release. For example, if you want your app to allow drafts and higher, set the stage argument of your API query to “draft”, and you can be assured of getting drafts, pre-prod releases, and prod releases.

A Note about Subjects

The subject parameter of a query is used to find resources by their type. It is a string of words separated by spaces (e.g. “TSV Translation Notes”). Valid subjects can be found in the validation schema.
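Because subjects contain spaces, they need URL encoding in a query string. A minimal sketch; `KNOWN_SUBJECTS` is a hypothetical subset copied from the org API example later in this post, and the validation schema, not this set, is the authoritative list.

```python
from urllib.parse import urlencode

# Subset of subjects seen in the org API example in this post; the
# validation schema is the authoritative list of valid subjects.
KNOWN_SUBJECTS = frozenset({
    "Aligned Bible",
    "Open Bible Stories",
    "Translation Academy",
    "Translation Notes",
    "Translation Words",
    "TSV Translation Notes",
})


def subject_param(subject, known=KNOWN_SUBJECTS):
    """Return the encoded query pair and whether the subject is in `known`.

    The whole string is compared, since e.g. "Translation Notes" and
    "TSV Translation Notes" are distinct subjects.
    """
    return urlencode({"subject": subject}), subject in known
```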

Wild Wild West - Precautions & Issues

One thing to keep in mind, as you have probably already realized, is that ANYONE can publish ANY repo/resource as long as it has a valid manifest.yaml file. This can make things a little crazy, so you need to take some precautions in your app when using the catalog.

First, as listed below in the Future Work section, we hope to validate more than just the manifest.yaml file. Once our content validation app runs as a service with an API endpoint that takes a repo tag or commit URL and validates all the content of that resource, it will be run when determining whether a resource can be added to the catalog at any of the stages.

Until then, you need to be careful about which resources your app uses as a source. You can filter by owner, such as only unfoldingWord, ru_gl, hi_gl, etc. Or you can let the user pick the source, with the understanding that they have free rein in picking the sources they use.
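Filtering by owner on the app side can be as small as an allow-list check. A hypothetical sketch: the `TRUSTED_OWNERS` set is only an example, and it assumes each entry carries a `full_name` of the form `owner/repo`. Names are compared case-insensitively because Gitea (and so DCS) treats user and org names that way.

```python
# Hypothetical allow-list; which owners to trust is up to each app.
TRUSTED_OWNERS = {"unfoldingWord", "ru_gl", "hi_gl"}


def owner_of(entry):
    """Owner part of an "owner/repo" full_name (an assumed entry shape)."""
    return entry.get("full_name", "").split("/", 1)[0]


def trusted_only(entries, trusted=TRUSTED_OWNERS):
    """Keep only entries owned by an allow-listed user or org."""
    allowed = {name.lower() for name in trusted}
    return [e for e in entries if owner_of(e).lower() in allowed]
```

Doing this on the client complements, rather than replaces, restricting the owner in the query itself.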

Technical Info & Links

Future Work

Still to add to Catalog Next are the following processes and properties:

  • Use the Content Validation App to validate all resources, rather than just validating the manifest.yaml file
  • Make the manifest.yaml file (or Scripture Burrito metadata, when implemented) editable on its own page/form in the repo’s settings, using a JSON Schema form built with https://github.com/rjsf-team/react-jsonschema-form/
  • Sign the release tarballs and zipballs
  • Have PDF URLs for the release (also signed)
  • Use Scripture Burrito instead of RC v0.2
  • Be able to filter Bibles by literal, simplified or other (requires Scripture Burrito)

Here is my contribution to the “how do I” information…

Q. Given an org/owner, how do I discover all languages used by that org?
A. Given translate_test as the org, the languages used are in the repo_languages property of the JSON returned from this API (this is for QA DCS): https://qa.door43.org/api/v1/orgs/translate_test. The returned value looks something like this:

{
   "id":24384,
   "username":"translate_test",
   "full_name":"",
   "avatar_url":"https://qa.door43.org/user/avatar/translate_test/-1",
   "description":"",
   "website":"",
   "location":"",
   "visibility":"public",
   "repo_admin_change_team_access":false,
   "repo_languages":[
      "en",
      "es",
      "es-419",
      "hi",
      "kn",
      "ru",
      "vi"
   ],
   "repo_subjects":[
      "Aligned Bible",
      "OBS Study Notes",
      "OBS Study Questions",
      "OBS Translation Notes",
      "OBS Translation Questions",
      "Open Bible Stories",
      "Translation Academy",
      "Translation Notes",
      "Translation Questions",
      "Translation Words",
      "TSV Translation Notes"
   ]
}

Notes
The languages shown are a combination of the content in the manifests and the language codes used in the org’s repo names. This means that if a repo, say ru_gst, does not yet have a manifest, the API call will still report “ru” as a language.
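A minimal sketch of the lookup above using only the Python standard library. `repo_languages` parses a response body like the example; `fetch_repo_languages` performs the request against the QA DCS URL given in the answer. Both function names are hypothetical, and since `repo_languages` appears to be a DCS-specific field, the sketch returns an empty list when it is absent.

```python
import json
from urllib.request import urlopen

QA_API = "https://qa.door43.org/api/v1"


def repo_languages(org_json):
    """Return the repo_languages list from an org API response body."""
    data = json.loads(org_json)
    return list(data.get("repo_languages") or [])


def fetch_repo_languages(org, base=QA_API):
    """GET /orgs/{org} and extract repo_languages (needs network access)."""
    with urlopen(f"{base}/orgs/{org}") as resp:
        return repo_languages(resp.read().decode("utf-8"))
```

For example, `fetch_repo_languages("translate_test")` would return the language codes from the response shown above, subject to the repo-name caveat in the note.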


Scenario 1: brand new repo, say tlh_tn. Until a manifest is created by hand, it will not show in Catalog Next (CN)… correct?

Scenario 2: Let’s say the manifest is created, but isn’t “valid” (see next question). Then it will not show in CN… correct?

Scenario 3: Let’s say the manifest is corrected. So this means a commit and push/merge into master. What process runs to apply the valid badge? And what is checked to make it valid?

Scenario 4: Assuming all changes to the manifest are made by hand (is this true?), then what happens if an update to the manifest makes it invalid? Will only “latest” drop out of CN? In other words, will the other stages remain in CN?

@birch I’m thinking this part would be in the future admin app?

Scenario 1:

Correct. Catalog Next is not about finding repos on DCS; that is what API v1 is for. Catalog Next, just like Catalog v3 (i.e. repos in the Door43-Catalog org), is about which resources have been completed and can be used either as source text or as printable material. Usually you’d just use the “prod” stage, but if you want to make sure an app can handle the newest version of the content, such as when testing a new TSV format, you could set the stage it pulls from the catalog to “pre-prod” (a hidden release) or “draft” (a work in progress). Or “latest” if you are really adventurous.

So in the admin tool, to find projects that have been started, you wouldn’t use Catalog Next; you’d use the repo and org API endpoints, as you probably have been doing, and use the admin tool to make releases.

Scenario 2:

Correct. If this invalid manifest.yaml file is in the master branch, there will be no “latest” stage of this resource in the catalog.

Scenario 3:

This process is run in the Go code of DCS: dcs/door43.go at 926f9e515780dbc301cda58cd9017ffffdf04f62 · unfoldingWord/dcs · GitHub

The RC v0.2 schema is used to check it: rc-schema/rc.schema.json at master · unfoldingWord/rc-schema · GitHub

Scenario 4:

Yes, the “latest” stage will no longer exist in the catalog database table, so the master branch will NOT show up as a “latest” entry in a query of stage=latest. However, as mentioned in the article under “A Note on Stage Querying”, if this resource/repo has a valid prod, pre-prod, or draft release, it will still show up in a Catalog Next query of stage=latest, since that just means the lowest stage you want is “latest”.

Very good to have this. I wasn’t sure whether this article was the right place for it, but we don’t have anywhere else, so it’s great to have it here.

It is kind of hard to separate what is simply the result of “Door43 Metadata” processing (that is actually what my Go model, module, and database table that handle processing the manifest file are called, as you can see from the link in my other reply) from where Catalog Next comes in (mostly the API endpoints and the processing of the Door43 Metadata to determine what to show in the catalog for a search query).

So, with the manifest properties and values now in repo and org queries and results, I wasn’t sure whether that really counts as Catalog Next or should just be seen as an add-on to the Gitea API features that makes it DCS-specific.

I mention this because I want us to make sure there is a clear line between what we call Door43 Metadata and what we call Catalog Next. That said, we can’t separate them in the code: if the Door43 Metadata weren’t processed on each commit or release of a repo, Catalog Next would have nothing to go by.