Overview
This document describes our current catalog/API configuration and offers a proposal for a new configuration that aims to simplify usage for diverse development needs.
Context
Documentation for our current APIs is at https://api-info.readthedocs.io/
Fun fact: one of my first public contributions to uW, in 2013, was turning the DokuWiki-formatted OBS into JSON for consumption by mobile apps and PDF generators.
Current API Setup
Currently, we have two active endpoints related to translation work:
- Door43 Resource Catalog (v3): a read-only catalog described at Door43 Resource Catalog (v3) — uW and Door43 API Information 0.1 documentation
- Door43 Content Service API: a read/write API described at Door43 Content Service (DCS) — uW and Door43 API Information 0.1 documentation
Proposed Changes
In a nutshell, the proposal is to merge the two endpoints above into a single API which provides access to all the data we intend to serve.
API
REST
The functionality provided by the Door43 Resource Catalog (v3) will be merged into the Door43 Content Service REST API.
Of course, one big advantage here is that developers only need to know about and code for a single endpoint. Ideally, we can provide a higher level component library that goes even further in easing implementation (e.g. Gitea React Toolkit).
The key here is that the API provides access to 100% of the necessary information.
GraphQL
We also intend to add a GraphQL layer which will provide access to 100% of the information in the REST API, but in a more configurable manner. We may use one of these GraphQL Go libraries for that.
Database and API Diagram
Staging
We are planning to provide three stages in the new catalog.
- Default/Normal Stage: The normal stage is production, which will serve resources matching the Minimum Requirements listed below.
- Pre-Release Stage: Subject to the same Minimum Requirements, except that any Release marked as Pre-Release is included.
- Experimental Stage: Serves the default branch (usually master) of the repository.
This configuration will allow consistency checks to be made between multiple resources before they are marked for release.
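As a sketch of how the three stages might resolve to a git ref, assuming simplified stand-in types (these are not actual DCS models):

```go
package main

import "fmt"

// Release is a simplified stand-in for a DCS release record (hypothetical shape).
type Release struct {
	Tag        string
	PreRelease bool
}

// Repo is a simplified stand-in for a repository (hypothetical shape).
type Repo struct {
	DefaultBranch string
	Releases      []Release // ordered newest-first
}

// refForStage resolves which git ref each catalog stage would serve.
func refForStage(r Repo, stage string) string {
	switch stage {
	case "experimental":
		return r.DefaultBranch // usually "master"
	case "prerelease":
		// Newest release, including ones marked Pre-Release.
		if len(r.Releases) > 0 {
			return r.Releases[0].Tag
		}
	default: // "production"
		// Newest release that is NOT marked Pre-Release.
		for _, rel := range r.Releases {
			if !rel.PreRelease {
				return rel.Tag
			}
		}
	}
	return "" // nothing published for this stage
}

func main() {
	repo := Repo{
		DefaultBranch: "master",
		Releases: []Release{
			{Tag: "v2.0-rc1", PreRelease: true},
			{Tag: "v1.3", PreRelease: false},
		},
	}
	fmt.Println(refForStage(repo, "production"))   // v1.3
	fmt.Println(refForStage(repo, "prerelease"))   // v2.0-rc1
	fmt.Println(refForStage(repo, "experimental")) // master
}
```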
Catalog Features
Following the above proposal means that we need clear ways to replicate the features of the preexisting Catalog, namely:
- Publishing workflow
- Versioning of resources
- Digital signing of resources
- Metadata for resource
- Pivoting/Querying for resources based on metadata
We’ll look at each of these features below.
Publishing Workflow
Previous Workflow
The overall workflow for publishing a resource will change significantly. The v3 catalog requires that a Source Text Request form be filled out, which starts a set of steps that looks like this:
- User fills out Source Text Request form
- Verify license agreements
- Fork/copy the data into the STR organization
- Massage data and metadata to meet publishing standards
- Move repository to Door43-Catalog organization
- Fork project back into STR organization so that future updates can be staged easily
Minimum Requirements
What is absolutely necessary to publish a resource is that it meets these requirements:
- The metadata validates against the Scripture Burrito schema.
- A valid Release version tag is in the repository (this needs to be specified further: does simply v1 or v1.3 count, or should we enforce something like catalog_v1?)
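To make the open tag question concrete, here is a minimal sketch of one possible validation rule, assuming a v-prefixed scheme. Whether this convention (versus something like catalog_v1) is what we enforce remains undecided:

```go
package main

import (
	"fmt"
	"regexp"
)

// versionTag matches tags like "v1" or "v1.3" ("v" plus dotted numbers).
// This is just one candidate rule, not a settled convention.
var versionTag = regexp.MustCompile(`^v\d+(\.\d+)*$`)

func isReleaseTag(tag string) bool {
	return versionTag.MatchString(tag)
}

func main() {
	for _, tag := range []string{"v1", "v1.3", "catalog_v1", "master"} {
		fmt.Printf("%s: %v\n", tag, isReleaseTag(tag))
	}
}
```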
Effectively, anyone with the technical know-how could make the above happen. Of course, we anticipate that most people will need some expertise to make those two things happen. So, keep reading…
Interim Workflow
Data and metadata manipulation will still need to happen, but we’ll no longer need the approval process or the moving of repositories out of their original locations. Instead, a process like the following will (typically) occur:
- User fills out Publishing Aid form
- A tech/developer forks the repository and massages data and metadata to meet publishing standards
- A PR is issued against original repository
- User merges PR
- User creates a Release in DCS with an appropriate version tag
At that point the resource will be published.
Ideal Workflow
Ideally, tC will actually manage this publishing process. This requires a Scripture Burrito aware React Component library that is capable of reading and writing SB metadata. Naturally, the library will need to be able to migrate Resource Container and previous metadata to SB-compliant metadata.
At that point, tC (or any tool) can “mix in” the SB aware Component Library and the DCS Toolkit component library and will then be capable of,
- writing valid SB metadata, and
- creating a valid Release in DCS,
which will automagically publish the resource.
Building this functionality as Component Libraries means that publishing no longer has to be an extraneous step that only a few can master. Instead, anyone, from any number of apps, can publish their resource with a few clicks!
Versioning
Scripture Burrito Metadata
Scripture Burrito metadata will replace our previous Resource Container metadata (and previous tS, tC, uW app metadata that is not app specific). There are at least two steps to making this a reality:
- We need to add a Scripture Burrito aware API to DCS (see following section)
- We need to create a Scripture Burrito aware React Component library that all of our tools can use to read and write SB metadata.
The proposal is that our applications manage the SB metadata directly and DCS merely processes and presents the projects in the API.
Querying for Resources
The DCS API will gain new catalog endpoints as needed to meet the needs of software applications looking for published content (ironically, this might be similar to GitHub’s new package registry). This means that we’ll be able to replicate something similar to the preexisting v3 Catalog endpoints.
In addition to the existing DCS API endpoints, we should consider something like the following:
/catalog/
- GET: returns all valid SB projects
/catalog/search
- GET: returns valid SB projects according to search criteria (likely similar to the repos search criteria). However, it would be ideal to also support searching based on SB metadata fields (e.g. Flavor, FlavorType, etc.).
/catalog/owners/{owner}
- GET: returns all valid SB projects for the specified {owner} (either a DCS user or organization)
We also need to keep in mind that there may be some jointly created code for a Scripture Burrito Aware API, which might be exactly what we are looking for. Depending on how and when this code is written, we may be able to “bolt” this on to the side of DCS, possibly integrated via default webhooks or git hooks.
Git Related Workflow Notes
Much of this is covered above, but here is a bullet-point list of some things that may differ from preexisting conventions:
- Repositories will remain under the control of the User or Organization that created/owns them.
- The Door43-Catalog Organization will be replaced. Projects that need “generic” hosting may persist in this organization or a similarly named one.
- A release/version will always be a tag
- Branches may be used for work in progress (as in Protected Branch Workflow)
- A translation project is always a hard fork (copy) of a version tag (e.g. a translation will never be merged to its upstream since it is a different language, so don’t attempt to preserve the fork upstream in the database relation)
- Create new repo → es_tn
- Use latest tag for source content (en_tn)
- Create a new branch from latest version tag
- master branch will be blank
- …translation happens…
- Create release with a git tag
- Tag points at master branch
- (old RC idea, needs updating for SB) Under source, add location to each entry in manifest.yaml:
Interesting Stuff
Follow git flow loosely: A successful Git branching model » nvie.com?
Continue using our versioning schema: Versioning — unfoldingWord
Future Work
Signing
The current catalogs provide cryptographic signing of all content presented. This is great and will be difficult to replicate in DCS. However, it seems to make more sense to recreate the same sort of functionality in git and Scripture Burrito aware methods. There are two specific needs that must be addressed here:
Need 1: Verify content hasn’t changed during transit
Essentially, this amounts to checksumming. Both DCS and SB have methods of identifying files and checksums for those files.
- Pretty easily accomplished using SHAs from DCS and locally computed versions on a per-file basis
- Possibly download the versioned tag zip/tar.gz AND the tree from the API (which provides the SHAs for all the files), both from the same TAG
- Solves for the happy path
- Doesn’t solve the bad-actor path where someone unpacks and repacks the zip and tree
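Since git (and therefore the DCS tree API) identifies each file by its blob SHA-1, the per-file check can be sketched as computing the same hash locally and comparing it against the SHA listed in the tag’s tree:

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// gitBlobSHA computes the same SHA-1 that git reports for a file's
// contents: sha1("blob <length>\x00" + content). A downloaded file can
// then be compared against the SHA the tree API lists for that tag.
func gitBlobSHA(content []byte) string {
	header := fmt.Sprintf("blob %d\x00", len(content))
	sum := sha1.Sum(append([]byte(header), content...))
	return fmt.Sprintf("%x", sum)
}

func main() {
	// Matches `echo "hello" | git hash-object --stdin`.
	fmt.Println(gitBlobSHA([]byte("hello\n"))) // ce013625030ba8dba906f756967f9e9ca394464a
}
```

As noted above, this only solves the happy path: it detects accidental corruption in transit, not a bad actor who repacks both the archive and the tree.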
Need 2: Verify that org I love has signed my content
- Signing tags/commits
- Create a DCS API mechanism that allows you to sign a tag/release
- Signature for each one
- Public key for each org/user for verification purposes (already built?)
- Possibly create an easy-to-use method for storing private keys for users in DCS; also recommend something like Keybase
- Possibly use something like https://www.zdnet.com/article/googles-web-packaging-standard-arises-as-a-new-tool-for-privacy-enthusiasts/ ?
Implementation Plan (Rough)
List of things that we’ll need to do to move in this direction:
- Write scripture_burrito.go in the DCS models directory (this will define the database table(s) for all the SB metadata that we need to store)
- We need a translation type scale identifier field for how to use the resource (e.g. ULT vs. UST): translation strategy, form centric, or meaning centric
- Maybe same as above, but migrate the subject field, which possibly maps onto flavor
- Probably add a Scripture Burrito tab to Repo page to provide view/edit access to metadata
- Maybe add SB export button on this page
- Include a SB validation badge on this page
- Add REST API endpoints for SB information
- Code to migrate from RC metadata to SB database
- Undecided: do we save SB metadata in the repo? Jesse leans toward no.
- Add SB download option to the Download button
- Add a catalog badge to each valid Release entry
- Add a GraphQL layer to DCS, with direct access to the DB
- Figure out what to do with the OLD catalog endpoints and get rid of the LAMBDA functions
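As a starting point, the scripture_burrito.go model mentioned in the list above might look something like this sketch. Field names and the xorm-style tags are hypothetical; DCS (a Gitea fork) defines its models with xorm struct tags, but the actual columns are undecided:

```go
package main

import "fmt"

// ScriptureBurrito sketches a possible database model for the SB metadata
// we need to store. Every field here is a guess, not a settled schema.
type ScriptureBurrito struct {
	ID       int64  `xorm:"pk autoincr"`
	RepoID   int64  `xorm:"INDEX NOT NULL"`
	Flavor   string `xorm:"INDEX"` // possibly migrated from the RC "subject" field
	Language string `xorm:"INDEX"`
	Strategy string // e.g. form centric vs. meaning centric (ULT vs. UST)
}

func main() {
	m := ScriptureBurrito{RepoID: 42, Flavor: "textTranslation", Language: "en"}
	fmt.Println(m.Flavor, m.Language) // textTranslation en
}
```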