Door43.org (tX) Development Architecture


Document Status: [Draft|Proposal|Accepted]

Background

This document describes the revised architecture for the door43.org website, especially the back-end components often referred to as translationConverter (tX). The big picture as shown in the image below is still valid.

A project on the Door43 Content Service (DCS) at https://git.door43.org/unfoldingWord/en_obs would be converted to a page on Door43 at https://door43.org/u/unfoldingWord/en_obs.
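The URL mapping in the example above can be expressed as a small helper. This is only a sketch: the function name `door43_page_url` and the strict two-segment path check are assumptions made for illustration, not part of the deployed code.

```python
from urllib.parse import urlparse

DCS_HOST = "git.door43.org"
DOOR43_BASE = "https://door43.org/u"


def door43_page_url(dcs_repo_url: str) -> str:
    """Map a DCS repository URL to the Door43 page URL (hypothetical helper)."""
    parsed = urlparse(dcs_repo_url)
    if parsed.netloc != DCS_HOST:
        raise ValueError(f"not a DCS URL: {dcs_repo_url}")
    # Expect exactly /<owner>/<repo>
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected /<owner>/<repo>, got {parsed.path!r}")
    owner, repo = parts
    return f"{DOOR43_BASE}/{owner}/{repo}"


print(door43_page_url("https://git.door43.org/unfoldingWord/en_obs"))
# https://door43.org/u/unfoldingWord/en_obs
```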

Components

There are two top-level components of the conversion system:

graph LR;
  subgraph Door43 Jobs
    B[Door43 Enqueue]-->C[rq]
    C-->D[Door43 Job Handler]
  end
  D-->E
  subgraph tX Jobs
    E[tX Enqueue]-->F[rq]
    F-->G[tX Job Handler]
  end

The system illustrated above is deployed via four discrete Docker containers (all of which can be deployed multiple times). The rq library isolates the components and allows multiple job handler processes to pick up and run jobs.

If we add in the external components and simplify the detail, the end to end system looks like this:

graph LR;
  A[DCS]-->B[Door43 Jobs]
  B-->C[tX Jobs]
  B---D[Redis]
  C---D
  B-->E[S3]
  C-->E

This document will primarily describe Door43 Jobs and the tX Jobs components.

A simplified sequence diagram (omitting tX Jobs internals) looks like this:

sequenceDiagram
  DCS->>Door43 Enqueue: JSON POST
  Door43 Enqueue->>Redis: Queue JSON Payload
  Redis-->>Door43 Enqueue: ACK
  Door43 Enqueue-->>DCS: 200 OK
  loop rq library polls
    Redis->>Door43 Job Handler: Picks up Job
  end
  Door43 Job Handler->>tX: Request job from tX
  tX-->>Door43 Job Handler: 200 OK
  Door43 Job Handler->>S3: Template and Upload to S3
  Note over Door43 Enqueue,Redis: This decouples long running processes

DCS

Code: https://github.com/unfoldingWord-dev/gogs
Service: https://git.door43.org/
Status: Stable

The first component is the Door43 Content Service (DCS), which hosts the content to be converted. A list of example source content repositories is at https://github.com/unfoldingWord-dev/door43.org/wiki/tX-Conversion-Types.

A webhook configuration on the DCS repositories notifies the door43-enqueue service when a change is made to the repository. This notification is in the form of a JSON payload sent over HTTPS.

Door43 Jobs

The Door43 Jobs component handles interaction with DCS and the customization of content needed for the Door43 website(s). The container architecture for Door43 Jobs looks like this:

graph LR;
  subgraph Docker Container
    A[Door43 Enqueue]
  end
  A-->B
  subgraph Redis
    B[rq Queues]
  end
  B-->C
  subgraph Docker Container
    C[Door43 Job Handler]
  end

Door43 Enqueue

Code: https://github.com/unfoldingWord-dev/door43-enqueue-job
Service: https://api.door43.org/client/webhook to start a conversion job
Service callback: https://git.door43.org/client/webhook/tx-callback/ to complete the conversion job
Status: In Development
Stack: Docker, Nginx, Gunicorn, Python 3, Flask API, rq

The door43-enqueue process accepts a JSON payload via a POST request and performs a set of checks on the request before adding it to the Redis queue via the rq package.
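As a rough illustration of the check-then-queue behaviour described above, the sketch below vets a payload and appends it to a queue. The required keys, the function names, and the plain list standing in for the rq queue are all assumptions; the checks the real service performs are not listed in this document.

```python
from typing import Any, List, Tuple

# Hypothetical required keys, chosen only for illustration.
REQUIRED_KEYS = ("repository", "commits")


def check_payload(payload: Any) -> Tuple[bool, str]:
    """Return (ok, reason) for an incoming webhook payload."""
    if not isinstance(payload, dict):
        return False, "payload is not a JSON object"
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        return False, "missing keys: " + ", ".join(missing)
    return True, "ok"


def handle_post(payload: Any, queue: List[dict]) -> Tuple[int, str]:
    """Vet the payload, queue it if valid, and return (HTTP status, message)."""
    ok, reason = check_payload(payload)
    if not ok:
        return 400, reason
    # Stand-in for handing the payload to rq; the real service uses a Redis-backed queue.
    queue.append(payload)
    return 200, "queued"


pending: List[dict] = []
print(handle_post({"repository": {}, "commits": []}, pending))  # (200, 'queued')
print(handle_post({"repository": {}}, pending))  # (400, 'missing keys: commits')
```

Rejecting bad payloads before they reach the queue keeps the job handler from spending time on requests that can never succeed.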

The Docker container is automatically built by Travis-CI upon successful completion of the tests (see Docker Deployment Model below). The service is configured to respond to POST requests on /. The master branch listens on port 8000 and the develop branch on port 8001. Nginx is set up as a reverse proxy to map the desired service URL (above) to those ports.

Redis

Code: Hosted service
Service: gogs-sessions.zwea7b.0001.usw2.cache.amazonaws.com:6379
Status: Stable

This is simply the database back end for the queue service. No setup is needed beyond what the rq package does itself.

The queues used in this component are:

  • Door43_webhook
  • dev-Door43_webhook
  • Door43_webhook_callback
  • dev-Door43_webhook_callback
  • failed
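The queue names above follow a simple pattern: an optional `dev-` prefix for the develop environment and an optional `_callback` suffix. A minimal sketch of that convention follows; the helper name `door43_queue_name` is hypothetical.

```python
def door43_queue_name(develop: bool = False, callback: bool = False) -> str:
    """Build a Door43 Jobs queue name from the environment and queue role."""
    prefix = "dev-" if develop else ""
    suffix = "_callback" if callback else ""
    return f"{prefix}Door43_webhook{suffix}"


for develop in (False, True):
    for callback in (False, True):
        print(door43_queue_name(develop, callback))
```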

Door43 Job Handler

Code: https://github.com/unfoldingWord-dev/door43-job-handler
Service: Not public, only listens to rq queue(s)
Status: In Development
Stack: Docker, Python 3, rq

The door43-job-handler manages the whole Door43 conversion job. It breaks down into these primary steps:

  1. Preprocess the data from DCS if needed
  2. Interact with tX to initiate a conversion job and receive the results
  3. Post-process the output files if needed (e.g. styling HTML)
  4. Upload the resultant files to the door43.org/u/ S3 bucket

Note that this component is an HTTP “client” to the tX conversion service.

tX Jobs

tX is a dedicated conversion service with as little Door43 custom code as possible. The intent is that any service can request a job and get mostly vanilla output which it could then customize if desired (e.g. styling HTML output in a specific manner).

The container architecture for tX looks like this:

graph LR;
  subgraph Docker Container
    A[tX Enqueue]
  end
  A-->B
  subgraph Redis
    B[rq Queues]
  end
  B-->C
  subgraph Docker Container
    C[tX Job Handler]
  end

The above architecture supports launching multiple Job Handler containers that all subscribe to the job queue, allowing for scaling of the long running conversion services. Unifying all the converters into one worker container also provides even scaling, as opposed to dedicated containers for certain conversion types.

tX Enqueue

Code: https://github.com/unfoldingWord-dev/tx-enqueue-job
Service:
Status: Planning Update
Stack: Docker, Python 3, Flask API or https://github.com/encode/apistar

The enqueue component for tX handles all incoming and outgoing HTTP traffic. This component validates requests and queues them for conversion.

This service monitors the failed jobs queue and retries jobs or reports failures as needed.

Another function of the tX Enqueue service is to monitor the job metadata to notify clients via a callback when jobs are completed. In addition, it updates the job metadata in the S3 bucket where the job information is stored.

Redis

Code: Hosted service
Service: gogs-sessions.zwea7b.0001.usw2.cache.amazonaws.com:6379
Status: Stable

This is simply the database back end for the queue service. No setup is needed beyond what the rq package does itself.

The queues used in this component are:

  • For starting jobs:
    • High priority queue
    • Medium priority queue (default)
    • Low priority queue
  • A failed job queue

tX Job Handler

Code: https://github.com/unfoldingWord-dev/tx-job-handler
Service: Not public, only listens to rq queue(s)
Status: Planning Update
Stack: Docker, Python 3, rq, converter libraries

The tX Job Handler service contains all the conversion and linter libraries necessary for executing a job. This service listens to the rq queues and runs jobs that the manager service adds. Jobs are run first from the high priority queue, second from the medium priority queue, and third from the low priority queue.
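The priority ordering can be sketched with in-memory queues. This is a simulation under stated assumptions, not the handler's code: the queue names `high`, `medium` and `low` are placeholders (the real queue names are not given here), and `next_job` is a hypothetical helper. An rq worker that is given a list of queues reads them in the order listed, which is the behaviour imitated below.

```python
from collections import deque
from typing import Deque, Dict, Optional

# Placeholder queue names, listed from highest to lowest priority.
PRIORITY_ORDER = ("high", "medium", "low")


def next_job(queues: Dict[str, Deque[str]]) -> Optional[str]:
    """Return the next job from the first non-empty queue, or None if all are empty."""
    for name in PRIORITY_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None


queues: Dict[str, Deque[str]] = {name: deque() for name in PRIORITY_ORDER}
queues["low"].append("job-A")
queues["medium"].append("job-B")
queues["high"].append("job-C")
queues["medium"].append("job-D")

order = []
while True:
    job = next_job(queues)
    if job is None:
        break
    order.append(job)

print(order)  # ['job-C', 'job-B', 'job-D', 'job-A']
```

Within a single queue, jobs keep their arrival order; only the choice between queues is governed by priority.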

Conversion modules include:

  • md2html - Converts Markdown to HTML (obs, ta, tn, tw, tq, misc)
  • md2pdf - Converts Markdown to PDF (obs, ta, tn, tw, tq)
  • md2docx - Converts Markdown to DOCX (obs, ta, tn, tw, tq)
  • md2epub - Converts Markdown to ePub (obs, ta, tn, tw, tq)
  • usfm2html - Converts USFM to HTML (Bible)
  • usfm2pdf - Converts USFM to PDF (Bible)
  • usfm2docx - Converts USFM to DOCX (Bible)
  • usfm2epub - Converts USFM to ePub (Bible)
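One way to picture module selection is a lookup keyed on the module names listed above. The table below copies the resource lists from this section; the function `pick_converter`, the lower-cased `bible` key, and the use of `LookupError` are assumptions for illustration only.

```python
from typing import Dict, Set

_MARKDOWN_RESOURCES = {"obs", "ta", "tn", "tw", "tq"}

# Module name -> resources it handles, as listed in this section.
CONVERTERS: Dict[str, Set[str]] = {
    "md2html": _MARKDOWN_RESOURCES | {"misc"},
    "md2pdf": set(_MARKDOWN_RESOURCES),
    "md2docx": set(_MARKDOWN_RESOURCES),
    "md2epub": set(_MARKDOWN_RESOURCES),
    "usfm2html": {"bible"},
    "usfm2pdf": {"bible"},
    "usfm2docx": {"bible"},
    "usfm2epub": {"bible"},
}


def pick_converter(input_format: str, output_format: str, resource: str) -> str:
    """Return the converter module name for a job, or raise LookupError."""
    name = f"{input_format}2{output_format}"
    supported = CONVERTERS.get(name)
    if supported is None:
        raise LookupError(f"no converter module named {name}")
    if resource.lower() not in supported:
        raise LookupError(f"{name} does not handle resource {resource!r}")
    return name


print(pick_converter("md", "html", "misc"))     # md2html
print(pick_converter("usfm", "epub", "Bible"))  # usfm2epub
```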

S3

Code: Hosted service
Service: door43.org
Status: Stable

The eventual resting place of the project’s files is on the door43.org S3 bucket, in the /u/ directory. Publicly, these files are served by CloudFront. Assets are stored on the cdn.door43.org bucket.

Door43.org

Code: https://github.com/unfoldingWord-dev/door43.org
Service: https://door43.org/
Status: Stable
Stack: Travis CI, JS, Jekyll

The door43.org website is a static website, meaning there is no server-side code behind it. The site is generated with Jekyll, but Jekyll is not run on every conversion job, only when site-wide JS or template changes need to be made.

Docker Deployment Model

Each Docker container in the above list is tested, built and deployed automatically. This happens for the develop and master branches of each GitHub repository. The develop branch is deployed into a development environment and the master branch is deployed into a production environment:

graph TB;
  subgraph Develop Environment
    A[Github Develop]
    F[Watchtower Pulls Container]
  end
  subgraph Travis CI
    A-->B
    B[Unit Tests Pass]-->C[Build Containers]
    C-->D[Integration Tests Pass]
    D-->E[Deploy to Docker Hub]
    E-->F
  end
  subgraph Production Environment
    G-->B
    G[Github Master]
    E-->H[Watchtower Pulls Container]
  end

Sample Flow

The working system mostly runs new code (built on the rq queuing/job-handling package, backed by Redis) on an AWS EC2 instance, but still calls the old AWS Lambda function for markdown linting (except for very large markdown files). Here is an example of how a DCS conversion proceeds:

  1. A user makes updates to their project on DCS. When they push their new work, a DCS (Gitea) webhook posts a JSON bundle to https://(dev-)git.door43.org/client/webhook/.

  2. The Door43-Enqueue-Job process (running in a Docker container on the AWS EC2 instance) vets the JSON payload and, if it is valid and consistent, places it in an rq queue named (dev-)Door43_webhook.

  3. The Door43-Job-Handler worker (currently just one, running in a separate Docker container on the same EC2 instance) fetches the JSON payload from the queue, and does the following:

    • Sets up a temp folder in the AWS S3 bucket.

    • Gathers details from the JSON payload.

    • Downloads a zip file from the DCS repo to a temp folder and unzips the files,
      and then creates a ResourceContainer (RC) object.

    • Creates a manifest_data dictionary,
      gets a TxManifest from the DB and updates it with the details gleaned above,
      or creates a new one if none existed.

    • It then gets and runs a preprocessor on the files in the temp folder.
      A preprocessor has a ResourceContainer (RC) and source and output folders.
      It copies the file(s) from the RC in the source folder, over to the output folder,
      assembling chunks/chapters if necessary.
      The preprocessors can detect some particular source data errors – if so these will be appended later to the job dictionary so they can eventually be displayed to the user.

    • The preprocessed files are zipped up in a temp folder
      and then uploaded to the pre-convert bucket in S3.
      (README.md will be copied across if no other files are found
      or if that fails, a small explanatory markdown file is generated.)

    • A job dictionary is now created with the important job details
      and stored in a Redis dictionary (keyed by job_id).
      (The former TxJob and TxModule from TxManager are no longer used.)

    • An S3 CDN folder is now named and emptied.

    • A tx_payload dictionary is created, including our callback URL.
      This is then POSTed to tX-enqueue at https://git.door43.org/(dev-)tx/.
      The webhook code is finished once the tX job request is successfully submitted.

    • The given payload will be appended to the ‘failed’ queue
      if an exception is thrown in this webhook code, i.e., if there’s a problem getting the job submitted.

  4. tX-Enqueue listens at https://git.door43.org/(dev-)tx/ and accepts a post request with the following fields:

    • job_id: a unique identifier for the linting/conversion job
    • identifier (optional): further information to identify this job to the Door43 callback
    • user_token: a DCS user token (files in source don’t have to come via DCS, but the tX sub-system can’t be left open to anonymous scripts)
    • resource_type: the case-sensitive subject field (with underlines not spaces) exactly as defined in https://api.door43.org/v3/subjects
    • input_format: one of ‘md’, ‘tsv’, ‘txt’, or ‘usfm’
    • output_format: ‘html’
    • source: URL for the zipped source file(s) to be downloaded
    • options (optional): converter options
    • callback (optional): the URL for the callback (usually https://git.door43.org/(dev-)client/webhook/tx-callback/)
    • door43_webhook_received_at (optional): a datetime string; may be unnecessary or disabled in the future

    The requested job is added to the (dev-)tX_webhook queue and tX-Enqueue returns a JSON dictionary which includes:

    • status: ‘queued’
    • output: The URL where the converted result zip file will be located
    • expires_at: The datetime string for when the above link will expire (1 day later)
    • eta: The datetime string for the estimated time of arrival of the result
    • tx_retry_count: 0

  5. tX-Job-Handler worker(s) accept the jobs one by one from the (dev-)tX_webhook queue. The worker has the code for running all converters/linters and selects the correct functions using the resource_type (subject) field.

    • A build_log dictionary is created
    • The source data zip file is downloaded from the given source URL
    • The correct linter and converter are chosen (by resource_type (subject), input_format, and output_format).
    • The linter is run (markdown linting may still invoke an AWS lambda function)
    • The converter is run
    • The converted results are uploaded in a zip file to the previously advised ‘output’ URL

  6. tX-Job-Handler will POST the log file to the callback URL if given in step #4 above. The posted fields include:

    • Most of the fields (except ‘callback’) echoed from the submission (#4) above
    • lint_module: (name string)
    • linter_success: ‘true’ or ‘false’
    • linter_warnings: (list)
    • convert_module: (name string)
    • converter_success: ‘true’ or ‘false’
    • converter_info: (list)
    • converter_warnings: (list)
    • converter_errors: (list)
    • status: ‘finished’
    • success: ‘true’
    • message: ‘tX job completed’
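The callback body listed above might be assembled roughly as follows. This is a sketch, not the handler's code: the function name is hypothetical, it echoes every submitted field except `callback` (the text says only "most"), and it encodes the success flags as the strings 'true' and 'false' exactly as they are written in the list.

```python
from typing import Any, Dict, List


def build_callback_payload(
    submission: Dict[str, Any],
    *,
    lint_module: str,
    linter_success: bool,
    linter_warnings: List[str],
    convert_module: str,
    converter_success: bool,
    converter_info: List[str],
    converter_warnings: List[str],
    converter_errors: List[str],
) -> Dict[str, Any]:
    """Combine the echoed submission fields with the lint/convert results."""
    # Echo the submission, leaving out the callback URL itself.
    payload = {key: value for key, value in submission.items() if key != "callback"}
    payload.update({
        "lint_module": lint_module,
        "linter_success": "true" if linter_success else "false",
        "linter_warnings": linter_warnings,
        "convert_module": convert_module,
        "converter_success": "true" if converter_success else "false",
        "converter_info": converter_info,
        "converter_warnings": converter_warnings,
        "converter_errors": converter_errors,
        "status": "finished",
        "success": "true",
        "message": "tX job completed",
    })
    return payload
```

The submission dictionary is copied rather than modified, so the caller can still read the callback URL from it when sending the POST.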

Steps 4, 5, and 6 above form the basis of the stand-alone tX service. Anyone with a Door43 user token can post files there for linting and conversion, although this will be rate-limited for protection of the service.
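To make the request and response shapes of the stand-alone service more concrete, here is a minimal sketch of validating a submission and building the 'queued' reply. The function names, the ISO 8601 datetime format, and the way the output URL and ETA are passed in are assumptions; only the field names and allowed values come from the lists above.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# Fields not marked "(optional)" in the list above.
REQUIRED_FIELDS = ("job_id", "user_token", "resource_type",
                   "input_format", "output_format", "source")
INPUT_FORMATS = {"md", "tsv", "txt", "usfm"}
OUTPUT_FORMATS = {"html"}


def validate_tx_request(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]
    if "input_format" in payload and payload["input_format"] not in INPUT_FORMATS:
        errors.append(f"unsupported input_format: {payload['input_format']}")
    if "output_format" in payload and payload["output_format"] not in OUTPUT_FORMATS:
        errors.append(f"unsupported output_format: {payload['output_format']}")
    return errors


def queued_response(output_url: str, eta: datetime, now: datetime) -> Dict[str, Any]:
    """Build the reply returned once a job has been queued."""
    return {
        "status": "queued",
        "output": output_url,
        "expires_at": (now + timedelta(days=1)).isoformat(),  # link expires 1 day later
        "eta": eta.isoformat(),
        "tx_retry_count": 0,
    }


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(validate_tx_request({"job_id": "abc", "input_format": "docx"}))
print(queued_response("https://example.invalid/out.zip", now + timedelta(minutes=5), now))
```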

  7. Door43-Enqueue-Job will accept the log POSTed at its callback URL https://git.door43.org/(dev-)client/webhook/tx-callback/

    • The job info is retrieved from Redis and matched & checked
    • The zip file containing the converted file(s) is downloaded
    • Templating is done
    • The results are uploaded to the S3 CDN bucket
    • The final log is uploaded to the S3 CDN bucket
    • The new revision is deployed to the Door43 site.

  8. The separate Watchtower Docker container is configured to check whether updates to the above containers have become available. If an update is detected, it is automatically pulled by Watchtower, the existing container is given a warm shut-down and then deleted, and the new updated container is started.
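The flow above notes that a payload ends up on the ‘failed’ queue when the webhook code raises an exception. The idea can be sketched as below, with the caveat that rq records failed jobs on its own when a job raises; the wrapper `run_webhook_job` and the plain list standing in for the failed queue are hypothetical and only illustrate the behaviour.

```python
from typing import Callable, Dict, List


def run_webhook_job(payload: Dict, handler: Callable[[Dict], None],
                    failed: List[Dict]) -> bool:
    """Run the handler; on any exception, keep the payload for later retry or reporting."""
    try:
        handler(payload)
    except Exception:
        failed.append(payload)  # stand-in for the 'failed' queue
        return False
    return True


def broken_handler(payload: Dict) -> None:
    raise RuntimeError("could not submit job to tX")


failed_queue: List[Dict] = []
print(run_webhook_job({"job_id": "1"}, lambda p: None, failed_queue))  # True
print(run_webhook_job({"job_id": "2"}, broken_handler, failed_queue))  # False
print(failed_queue)  # [{'job_id': '2'}]
```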

Stats

We gather stats on system performance to ensure that the service(s) are operating as expected. Secondarily, there may be insights to be gleaned later on about how to optimize the system or what features may be needed.

Door43 Jobs

Here is a list of stats that are being built into the Door43 Jobs side of the system:

gauges.door43.[prod|dev].enqueue-job.[webhook|callback].workers.available
gauges.door43.[prod|dev].enqueue-job.[webhook|callback].queue.length.failed
gauges.door43.[prod|dev].enqueue-job.[webhook|callback].queue.length.current

door43.[prod|dev].enqueue-job.[webhook|callback].posts.attempted
door43.[prod|dev].enqueue-job.[webhook|callback].posts.succeeded
door43.[prod|dev].enqueue-job.[webhook|callback].posts.failed


set.door43.[prod|dev].job-handler.webhook.repo_ids
set.door43.[prod|dev].job-handler.webhook.owner_ids
set.door43.[prod|dev].job-handler.webhook.pusher_ids

door43.[prod|dev].job-handler.[webhook|callback].jobs.attempted
door43.[prod|dev].job-handler.[webhook|callback].jobs.completed

door43.[prod|dev].job-handler.webhook.users.invoked.(username)
~~gauges.door43.[prod|dev].job-handler.user-projects.invoked.(username)/(reponame)~~
gauges.door43.[prod|dev].job-handler.types.invoked.(type/subject)
    Set to 0 for success, 1 for fail.

timers.door43.[prod|dev].job-handler.[webhook|callback|total].job.duration

tX Jobs

Here is a list of stats that are being built into the tX Jobs side of the system:

gauges.tx.[prod|dev].enqueue-job.workers.available
gauges.tx.[prod|dev].enqueue-job.queue.length.failed
gauges.tx.[prod|dev].enqueue-job.queue.length.current

tx.[prod|dev].enqueue-job.posts.attempted
tx.[prod|dev].enqueue-job.posts.succeeded
tx.[prod|dev].enqueue-job.posts.failed

tx.[prod|dev].job-handler.jobs.attempted
tx.[prod|dev].job-handler.callbacks.attempted
tx.[prod|dev].job-handler.jobs.completed

timers.tx.[prod|dev].job-handler.job.duration
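The bracketed alternatives in the stat names above stand for one concrete name per combination. A small sketch of that expansion follows; the helper `expand_stat_template` is hypothetical and is not part of the stats code.

```python
import itertools
import re
from typing import List


def expand_stat_template(template: str) -> List[str]:
    """Expand every [a|b|...] group in a stat name template into concrete names."""
    # re.split with a capture group alternates literal text (even indices)
    # and the contents of each bracket group (odd indices).
    pieces = re.split(r"\[([^\]]+)\]", template)
    options = [[piece] if index % 2 == 0 else piece.split("|")
               for index, piece in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*options)]


for name in expand_stat_template(
        "door43.[prod|dev].enqueue-job.[webhook|callback].posts.failed"):
    print(name)
# door43.prod.enqueue-job.webhook.posts.failed
# door43.prod.enqueue-job.callback.posts.failed
# door43.dev.enqueue-job.webhook.posts.failed
# door43.dev.enqueue-job.callback.posts.failed
```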