Code changes and documentation for end-to-end demo #52

Merged
merged 41 commits into from
Mar 7, 2024
Changes from 38 commits
21b51e5
Fixed JDP file suffix trimmer.
jeff-cohere Jan 31, 2024
6b06146
Minor comment fix.
jeff-cohere Feb 1, 2024
37f5a14
Added a "debug" flag for the service.
jeff-cohere Feb 1, 2024
dd4cc8f
Improved logging and error checking for prototype.
jeff-cohere Feb 1, 2024
7b515c3
Removing an extraneous log message.
jeff-cohere Feb 1, 2024
730587f
Check for staged files before ordering staging.
jeff-cohere Feb 6, 2024
a601628
Troubleshooting check for staged files.
jeff-cohere Feb 6, 2024
32fe668
Improved error handling for checking staged files on Globus endpoints.
jeff-cohere Feb 6, 2024
0ce6f03
Implemented Globus re-authentication flow to handle consent/scope iss…
jeff-cohere Feb 6, 2024
207297e
Errors in tasks now result in failure. Improved error handling.
jeff-cohere Feb 7, 2024
500b6dd
Hardening the recording and playback of JAMO queries.
jeff-cohere Feb 7, 2024
948c438
Adding more error handling for file staging, etc.
jeff-cohere Feb 19, 2024
025e5fe
Endpoint root is "/" by default.
jeff-cohere Feb 20, 2024
e8cc0b1
Fixing an oversight in Globus transfer submissions.
jeff-cohere Feb 26, 2024
1b83bad
Improved error handling for active Globus transfers.
jeff-cohere Feb 26, 2024
c702a8b
Destination folder is now set properly for transfers.
jeff-cohere Feb 27, 2024
6ade1b2
Added a basic static website setup for documentation.
jeff-cohere Feb 27, 2024
222634d
Filling out documentation structure a bit.
jeff-cohere Feb 27, 2024
80bbacf
Fixing the user for the manifest file.
jeff-cohere Feb 27, 2024
45d204a
More error handling improvements.
jeff-cohere Feb 27, 2024
e9b5733
More doc updates, including deployment instructions.
jeff-cohere Feb 27, 2024
e497b1f
Updating .gitignore
jeff-cohere Feb 27, 2024
bb36d51
Reversing a logging decision.
jeff-cohere Feb 28, 2024
d90f8bb
Some adjustments to Dockerfile and related documentation.
jeff-cohere Feb 28, 2024
e487b98
Committing missing Docker deployment config file.
jeff-cohere Feb 28, 2024
69c789f
Updated deployment materials and instructions.
jeff-cohere Feb 28, 2024
a7eb1b9
Fixed some glitches in deployment config file.
jeff-cohere Feb 28, 2024
d8f16e4
Fixed another config file glitch.
jeff-cohere Feb 28, 2024
8bbb9ba
Adjusting the location of kbase local user mapping file.
jeff-cohere Feb 28, 2024
15c2c8c
Adding more descriptive manifest-related error messages.
jeff-cohere Feb 29, 2024
d2e05dc
Reformatting.
jeff-cohere Feb 29, 2024
99da68a
Separating manifest directory from data directory.
jeff-cohere Feb 29, 2024
c660296
Some minor fixes to transfer/manifest logic.
jeff-cohere Feb 29, 2024
50fe4cb
More documentation.
jeff-cohere Mar 1, 2024
afcdf49
Added a word of caution about MANIFEST_DIRECTORY.
jeff-cohere Mar 5, 2024
a61a621
Fixing some test failures.
jeff-cohere Mar 5, 2024
9358f9b
Fixed a hanging test.
jeff-cohere Mar 5, 2024
058e771
Relaxed a testing constraint.
jeff-cohere Mar 5, 2024
5946d0c
Consolidating a set of Docker RUN commands.
jeff-cohere Mar 7, 2024
a53a7aa
Updating documentation regarding KBase developer tokens (and user fed…
jeff-cohere Mar 7, 2024
fe4d930
Scrubbed biological matter from data (ew).
jeff-cohere Mar 7, 2024
50 changes: 50 additions & 0 deletions .github/workflows/gh-pages.yml
@@ -0,0 +1,50 @@
name: Build and deploy gh-pages branch with Mkdocs

on:
  # Runs every time main branch is updated
  push:
    branches: ["main"]
  # Runs every time a PR is open against main
  pull_request:
    branches: ["main"]
  workflow_dispatch:

concurrency:
  # Prevent 2+ copies of this workflow from running concurrently
  group: dts-docs-action

jobs:
  Build-and-Deploy-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          show-progress: false
          fetch-depth: 0 # Needed, or else gh-pages won't be fetched, and push rejected
          submodules: false # speeds up clone and not building anything in submodules
      - name: Show action trigger
        run: echo "= The job was automatically triggered by a ${{github.event_name}} event."
      - name: Set up Python 3.10
        uses: actions/[email protected]
        with:
          python-version: "3.10"
      - name: Install python deps
        run: python3 -m pip install mkdocs-material pymdown-extensions mkdocs-monorepo-plugin mdutils
      # build every time (PR or push to main)
      - name: Build
        run: mkdocs build --strict --verbose
      # deploy only when it is a push
      - if: ${{ github.event_name == 'push' }}
        name: GitHub Pages action
        uses: JamesIves/github-pages-deploy-action@v4
        with:
          # Do not remove existing pr-preview pages
          clean-exclude: pr-preview
          folder: ./site/
      # If it's a PR from within the same repo, deploy to a preview page
      # For security reasons, PRs from forks cannot write into gh-pages for now
      - if: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository }}
        name: Preview docs
        uses: rossjrw/pr-preview-action@v1
        with:
          source-dir: ./site/
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
.DS_Store
dts
dts.yaml

site/
data/
23 changes: 19 additions & 4 deletions README.md
@@ -72,9 +72,24 @@ to do:
capabilities
* `DTS_JDP_SECRET`: a string containing a shared secret that allows the DTS to
authenticate with the JGI Data Portal
* `DTS_ON_LBL_VPN`: set this environment variable to any value (e.g. "1") to
indicate that the DTS is running on Lawrence Berkeley Lab's Virtual Private
Network. This enables the DTS to get information about files from JAMO that
are not available from the JGI Data Portal itself.

### Recording JAMO queries for testing in GitHub's CI environment

Currently, the JGI Data Portal does not provide a way to retrieve detailed
file information, so the DTS uses the JAMO service instead. This service is
only available from within LBNL's virtual private network, so the DTS provides
a way to "record" JAMO queries when it's run within this network. These recorded
queries can then be automatically played back in any testing environment in
which JAMO is unavailable.

To record the JAMO queries needed by the testing environment, run the unit
tests for the JDP database with the `-record-jamo` argument:

```
go test ./databases/jdp/... -args -record-jamo
```

This places one or more "cassette" files in the `databases/jdp/fixtures` folder,
where they can be accessed by the testing system. Make sure to commit this
folder to the repository after recording the JAMO queries. You should also
delete any old fixture replaced by a new one.
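
The record-and-playback idea described above can be sketched with the standard library alone. This is an illustrative sketch, not the mechanism the DTS uses: the `cassette` type, the `fakeUpstream` helper, and the `jamo.example` host are all hypothetical, and the recorded interactions live in memory rather than in a fixture file.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a plain function to the http.RoundTripper interface.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// newResponse builds a minimal 200 response carrying the given body.
func newResponse(r *http.Request, body string) *http.Response {
	return &http.Response{
		Status:     "200 OK",
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(body)),
		Request:    r,
	}
}

// fakeUpstream stands in for a service that is only reachable on a private
// network (hypothetical; no real network traffic occurs).
func fakeUpstream(body string) http.RoundTripper {
	return roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return newResponse(r, body), nil
	})
}

// cassette records response bodies keyed by "METHOD URL" and replays them.
type cassette struct {
	record bool              // true: forward to next and store; false: replay only
	next   http.RoundTripper // transport used while recording
	bodies map[string]string // recorded response bodies
}

func (c *cassette) RoundTrip(r *http.Request) (*http.Response, error) {
	key := r.Method + " " + r.URL.String()
	if c.record {
		resp, err := c.next.RoundTrip(r)
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		c.bodies[key] = string(data)
		// Hand the caller a fresh reader, since the original body was consumed.
		resp.Body = io.NopCloser(strings.NewReader(string(data)))
		return resp, nil
	}
	body, ok := c.bodies[key]
	if !ok {
		return nil, fmt.Errorf("no recorded interaction for %s", key)
	}
	return newResponse(r, body), nil
}

// fetch performs a GET through the given transport and returns the body.
func fetch(rt http.RoundTripper, url string) (string, error) {
	client := &http.Client{Transport: rt}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return string(data), err
}

func main() {
	c := &cassette{record: true, next: fakeUpstream(`{"records": []}`), bodies: map[string]string{}}
	recorded, err := fetch(c, "https://jamo.example/api")
	if err != nil {
		fmt.Println("record failed:", err)
		return
	}
	// Switch to playback: the upstream is no longer needed.
	c.record, c.next = false, nil
	replayed, err := fetch(c, "https://jamo.example/api")
	fmt.Println(recorded == replayed, err)
}
```

In playback mode, a request that was never recorded fails loudly rather than reaching the network, which is the property that makes such fixtures usable in a CI environment.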
16 changes: 14 additions & 2 deletions config/config.go
@@ -44,11 +44,15 @@ type serviceConfig struct {
     // (for generating and transferring manifests)
     Endpoint string `json:"endpoint" yaml:"endpoint"`
     // name of existing directory in which DTS can store persistent data
-    // default: none (persistent storage disabled)
-    DataDirectory string `json:"data_dir,omitempty" yaml:"data_dir,omitempty"`
+    DataDirectory string `json:"data_dir" yaml:"data_dir,omitempty"`
+    // name of existing directory in which DTS writes manifest files (must be
+    // visible to endpoints)
+    ManifestDirectory string `json:"manifest_dir" yaml:"manifest_dir"`
     // time after which information about a completed transfer is deleted (seconds)
     // default: 7 days
     DeleteAfter int `json:"delete_after" yaml:"delete_after"`
+    // flag indicating whether debug logging and other tools are enabled
+    Debug bool `json:"debug" yaml:"debug"`
 }

 // global config variables
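
The hunk above drops `omitempty` from the JSON tag of `DataDirectory`. As a small sketch of what that option does in `encoding/json`, the two hypothetical structs below differ only in that tag; they are illustrations and not the DTS's `serviceConfig`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit leaves the field out of the JSON output when it is empty.
type withOmit struct {
	DataDirectory string `json:"data_dir,omitempty"`
}

// withoutOmit always emits the field, even when it is empty.
type withoutOmit struct {
	DataDirectory string `json:"data_dir"`
}

// encode marshals v to JSON and returns it as a string.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmit{}))    // {}
	fmt.Println(encode(withoutOmit{})) // {"data_dir":""}
}
```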
@@ -86,7 +90,15 @@ func readConfig(bytes []byte) error {

     // copy the config data into place, performing any needed conversions
     Service = conf.Service
+
     Endpoints = conf.Endpoints
+    for name, endpoint := range Endpoints {
+        if endpoint.Root == "" {
+            endpoint.Root = "/"
+            Endpoints[name] = endpoint
+        }
+    }
+
     Databases = conf.Databases
     MessageQueues = conf.MessageQueues

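
The loop that defaults an endpoint's root to "/" writes the modified value back with `Endpoints[name] = endpoint`. That step is needed because ranging over a map of structs yields copies. A minimal sketch, assuming a hypothetical `endpointConfig` type with only the fields needed for the illustration:

```go
package main

import "fmt"

// endpointConfig is a hypothetical stand-in for an endpoint's configuration.
type endpointConfig struct {
	Name string
	Root string
}

// applyDefaultRoots sets Root to "/" for every endpoint that leaves it empty.
func applyDefaultRoots(endpoints map[string]endpointConfig) {
	for name, endpoint := range endpoints {
		if endpoint.Root == "" {
			endpoint.Root = "/"        // changes only the loop's copy
			endpoints[name] = endpoint // stores the copy back in the map
		}
	}
}

func main() {
	endpoints := map[string]endpointConfig{
		"source": {Name: "Source"},
		"dest":   {Name: "Destination", Root: "/data"},
	}
	applyDefaultRoots(endpoints)
	fmt.Println(endpoints["source"].Root, endpoints["dest"].Root) // / /data
}
```

Without the write-back, the map entries would keep their empty roots, because `endpoint` is a copy of the stored value.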
16 changes: 10 additions & 6 deletions databases/jdp/database.go
@@ -61,7 +61,7 @@ var suffixToFormat = map[string]string{
     "fasta.gz": "fasta",
     "fastq": "fastq",
     "fastq.gz": "fastq",
-    "fna": "fna",
+    "fna": "fasta",
     "gff": "gff",
     "gff3": "gff3",
     "gz": "gz",
@@ -120,13 +120,17 @@ func formatFromFileName(fileName string) string {
         }
     }

-    // determine whether the file matches any of the supported suffixes
+    // determine whether the file matches any of the supported suffixes,
+    // selecting the longest matching suffix
+    format := "unknown"
+    longestSuffix := 0
     for _, suffix := range supportedSuffixes {
-        if strings.HasSuffix(fileName, suffix) {
-            return suffixToFormat[suffix]
+        if strings.HasSuffix(fileName, suffix) && len(suffix) > longestSuffix {
+            format = suffixToFormat[suffix]
+            longestSuffix = len(suffix)
         }
     }
-    return "unknown"
+    return format
 }

 // extracts the file format from the name and type of the file
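
The change to `formatFromFileName` replaces "first matching suffix wins" with "longest matching suffix wins". A self-contained sketch of that selection rule follows; the table is a small subset chosen for illustration, and the function name `formatFromSuffix` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// suffixToFormat is an illustrative subset of a suffix-to-format table.
var suffixToFormat = map[string]string{
	"fasta":    "fasta",
	"fastq":    "fastq",
	"fastq.gz": "fastq",
	"fna":      "fasta",
	"gz":       "gz",
}

// formatFromSuffix returns the format associated with the longest suffix that
// matches fileName, or "unknown" if no suffix matches.
func formatFromSuffix(fileName string) string {
	format := "unknown"
	longest := 0
	for suffix, f := range suffixToFormat {
		if strings.HasSuffix(fileName, suffix) && len(suffix) > longest {
			format = f
			longest = len(suffix)
		}
	}
	return format
}

func main() {
	for _, name := range []string{"reads.fastq.gz", "archive.gz", "contigs.fna", "notes.txt"} {
		fmt.Printf("%s -> %s\n", name, formatFromSuffix(name))
	}
}
```

Because the longest match is kept, the result does not depend on the order in which suffixes are visited: `reads.fastq.gz` maps to `fastq` whether `gz` or `fastq.gz` is tested first.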
@@ -219,7 +223,7 @@ func creditFromIdAndMetadata(id string, md Metadata) credit.CreditMetadata {

 func trimFileSuffix(filename string) string {
     for _, suffix := range supportedSuffixes {
-        trimmedFilename, trimmed := strings.CutSuffix(filename, suffix)
+        trimmedFilename, trimmed := strings.CutSuffix(filename, "."+suffix)
         if trimmed {
             return trimmedFilename
         }
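
The fix to `trimFileSuffix` prepends a dot to each suffix before calling `strings.CutSuffix`. The sketch below shows the difference this makes. It assumes the suffix list is passed in as a parameter, that longer suffixes are listed before shorter ones, and that the filename is returned unchanged when nothing matches; none of these details are visible in the hunk itself.

```go
package main

import (
	"fmt"
	"strings"
)

// trimSuffixWithDot removes ".<suffix>" from filename for the first suffix in
// suffixes that matches, and returns filename unchanged if none match.
func trimSuffixWithDot(filename string, suffixes []string) string {
	for _, suffix := range suffixes {
		if trimmed, ok := strings.CutSuffix(filename, "."+suffix); ok {
			return trimmed
		}
	}
	return filename
}

func main() {
	suffixes := []string{"fastq.gz", "fna", "gz"}

	// Cutting the bare suffix leaves the separating dot behind.
	bare, _ := strings.CutSuffix("contigs.fna", "fna")
	fmt.Printf("%q\n", bare) // "contigs."

	// Cutting "."+suffix removes the dot as well.
	fmt.Printf("%q\n", trimSuffixWithDot("contigs.fna", suffixes))    // "contigs"
	fmt.Printf("%q\n", trimSuffixWithDot("reads.fastq.gz", suffixes)) // "reads"
}
```

Requiring the dot also avoids trimming names that merely end in the same letters as a suffix, such as a file called `myfna`.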
6 changes: 6 additions & 0 deletions databases/jdp/database_test.go
@@ -1,6 +1,7 @@
 package jdp

 import (
+    "flag"
     "os"
     "testing"

@@ -39,6 +40,11 @@ func setup() {
     config.Init([]byte(jdpConfig))
     databases.RegisterDatabase("jdp", NewDatabase)
     endpoints.RegisterEndpointProvider("globus", globus.NewEndpoint)
+
+    // check for a "record-jamo" flag and stash the result in the recordJamo
+    // global package variable
+    flag.BoolVar(&recordJamo, "record-jamo", false, "records JAMO test queries for use in CI system")
+    flag.Parse()
 }

 // this function gets called after all tests have been run
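
The test setup registers a boolean `record-jamo` flag and parses the command line. The sketch below shows the same flag handled by a private `flag.FlagSet`, so the parsing can be exercised with an explicit argument list. The `parseRecordFlag` helper and the `jdp-test` set name are hypothetical; the diff itself registers the flag on the package-level flag set.

```go
package main

import (
	"flag"
	"fmt"
)

// parseRecordFlag parses args with a private FlagSet and reports whether the
// "record-jamo" flag was set.
func parseRecordFlag(args []string) (bool, error) {
	var record bool
	fs := flag.NewFlagSet("jdp-test", flag.ContinueOnError)
	fs.BoolVar(&record, "record-jamo", false, "records JAMO test queries for use in CI system")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return record, nil
}

func main() {
	record, err := parseRecordFlag([]string{"-record-jamo"})
	fmt.Println(record, err) // true <nil>

	record, err = parseRecordFlag(nil)
	fmt.Println(record, err) // false <nil>
}
```

With `go test`, arguments that follow `-args` are passed to the test binary untouched, which is how `-record-jamo` reaches the flag parser in the command shown in the README.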
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
---
version: 2
interactions:
- id: 0
request:
proto: HTTP/1.1
proto_major: 1
proto_minor: 1
content_length: 389
transfer_encoding: []
trailer: {}
host: jamo-dev.jgi.doe.gov
remote_addr: ""
request_uri: ""
body: '{"query":"select _id, file_name, file_path, metadata.file_format, file_size, md5_sum where _id in ( 57f9e03f7ded5e3135bc069e, 57f9d2b57ded5e3135bc0612, 57f9bcb77ded5e3135bc05a5, 584486367ded5e2d305c9a76, 582511047ded5e2d305af605, 582511047ded5e2d305af606, 57f7320c7ded5e3135bbd14d, 584486377ded5e2d305c9a77, 584486397ded5e2d305c9a7a, 582511037ded5e2d305af603 )","requestor":"[email protected]"}'
form: {}
headers:
Content-Type:
- application/json; charset=utf-8
url: https://jamo-dev.jgi.doe.gov/api/metadata/pagequery
method: POST
response:
proto: HTTP/2.0
proto_major: 2
proto_minor: 0
transfer_encoding: []
trailer: {}
content_length: -1
uncompressed: true
body: '{"start": 1, "end": 10, "cursor_id": "L5ZHYRZFNS", "record_count": 10, "records": [{"_id": "57f7320c7ded5e3135bbd14d", "file_name": "10914.1.183618.CTCTCTA-TACTCCT.fastq.gz", "file_size": 2113188802, "file_path": "/global/dna/dm_archive/sdm/illumina/01/09/14", "metadata": {}}, {"_id": "57f9bcb77ded5e3135bc05a5", "file_name": "10927.1.183804.CTCTCTA-AGGCTTA.fastq.gz", "file_size": 1966895000, "file_path": "/global/dna/dm_archive/sdm/illumina/01/09/27", "metadata": {}}, {"_id": "57f9d2b57ded5e3135bc0612", "file_name": "10927.1.183804.CTCTCTA-AGGCTTA.filter-SAG.fastq.gz", "file_size": 2225019092, "file_path": "/global/dna/dm_archive/rqc/filtered_seq_unit/00/01/09/27", "metadata": {}}, {"_id": "57f9e03f7ded5e3135bc069e", "file_name": "10927.1.183804.CTCTCTA-AGGCTTA.QC.pdf", "metadata": {}, "file_size": 227745, "file_path": "/global/dna/dm_archive/rqc"}, {"_id": "582511037ded5e2d305af603", "file_name": "sag_decontam_output_clean.fna", "file_size": 405150, "file_path": "/global/dna/dm_archive/qaqc/analyses/AUTO-33725", "metadata": {"file_format": "fasta"}}, {"_id": "582511047ded5e2d305af605", "file_name": "sag-BHAPS-screen.txt", "file_size": 4080, "file_path": "/global/dna/dm_archive/qaqc/analyses/AUTO-33725", "metadata": {"file_format": "txt"}}, {"_id": "582511047ded5e2d305af606", "file_name": "10914.1.183618.CTCTCTA-TACTCCT.filter-SAG.norm.subsample.fastq.gz", "file_size": 73043045, "file_path": "/global/dna/dm_archive/qaqc/analyses/AUTO-33725", "metadata": {"file_format": "fastq"}}, {"_id": "584486367ded5e2d305c9a76", "file_name": "101345.assembled.faa", "file_size": 131265, "file_path": "/global/dna/dm_archive/img/submissions/101345", "metadata": {}}, {"_id": "584486377ded5e2d305c9a77", "file_name": "101345.assembled.names_map", "file_size": 1470, "file_path": "/global/dna/dm_archive/img/submissions/101345", "metadata": {}}, {"_id": "584486397ded5e2d305c9a7a", "file_name": "101345.pipeline_version.info", "file_size": 673, "file_path": 
"/global/dna/dm_archive/img/submissions/101345", "metadata": {}}], "fields": ["_id", "file_name", "file_path", "metadata.file_format", "file_size", "md5_sum"], "timeout": 540}'
headers:
Access-Control-Allow-Headers:
- Content-Type, Authorization, X-Requested-With
Access-Control-Allow-Methods:
- GET, PUT, POST, DELETE, OPTIONS
Access-Control-Allow-Origin:
- '*'
Access-Control-Max-Age:
- "1000"
Cf-Cache-Status:
- DYNAMIC
Cf-Ray:
- 85fcf008182b983d-SJC
Content-Type:
- application/json;charset=utf-8
Date:
- Tue, 05 Mar 2024 20:43:19 GMT
Server:
- cloudflare
Vary:
- Accept-Encoding
Via:
- 1.1 jamo-dev.jgi.doe.gov
status: 200 OK
code: 200
duration: 94.148234ms
56 changes: 0 additions & 56 deletions databases/jdp/fixtures/dts-jamo-cassette.yaml

This file was deleted.
