Application Package
The Application Package defines the internal script definition and configuration that will be executed by a Process. This package is based on Common Workflow Language (CWL). Using the extensive CWL Specification as the backbone for internal execution of the process allows it to run multiple types of applications, whether they are referenced by Docker image, scripts (bash, python, etc.), some remote Process, and more.
Note
The large community and the use cases covered by CWL make it extremely versatile. If you encounter any issue running your Application Package in Weaver (such as file permissions for example), chances are that there exists a workaround somewhere in the CWL Specification. Most typical problems are usually handled by some flag or argument in the CWL definition, so this reference should be explored first. Please also refer to the FAQ section as well as existing Weaver issues. Ultimately, if no solution can be found, open a new issue about your specific problem.
All processes deployed locally into Weaver using a CWL package definition will have their full package
definition available with GET {WEAVER_URL}/processes/{processID}/package
(Package) request.
Note
The package request is a Weaver-specific implementation and is therefore not necessarily available on other ADES/EMS implementations, as this feature is not part of the OGC API - Processes specification.
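As a rough illustration of the Package request, the following Python sketch builds the endpoint URL and retrieves the definition. The helper names (`package_url`, `fetch_package`) and the example host are assumptions made for illustration; only the `/processes/{processID}/package` path comes from the text above, and the sketch assumes the server answers with JSON.

```python
import json
import urllib.request
from urllib.parse import quote


def package_url(weaver_url: str, process_id: str) -> str:
    """Build the Weaver-specific package endpoint for a deployed process."""
    return f"{weaver_url.rstrip('/')}/processes/{quote(process_id, safe='')}/package"


def fetch_package(weaver_url: str, process_id: str) -> dict:
    """Retrieve the CWL package definition (network call, assumes a JSON response)."""
    request = urllib.request.Request(
        package_url(weaver_url, process_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # hypothetical server URL; replace it with an actual Weaver instance
    print(package_url("https://weaver.example.com", "jsonarray2netcdf"))
```

Since the endpoint is Weaver-specific, a client targeting other servers should be prepared for this request to fail.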
Typical CWL Package Definition
CWL CommandLineTool
The following CWL package definition represents the weaver.processes.builtin.jsonarray2netcdf
process.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
# target the installed python pointing to weaver conda env to allow imports
baseCommand: python
arguments:
  - "${WEAVER_ROOT_DIR}/weaver/processes/builtin/jsonarray2netcdf.py"
  - "-o"
  - "$(runtime.outdir)"
inputs:
  input:
    type: File
    format: iana:application/json
    inputBinding:
      position: 1
      prefix: "-i"
outputs:
  output:
    format: ogc:netcdf
    type:
      type: array
      items: File
    outputBinding:
      glob: "*.nc"
$namespaces:
  iana: "https://www.iana.org/assignments/media-types/"
  ogc: "http://www.opengis.net/def/media-type/ogc/1.0/"
The first main component is the class: CommandLineTool
that tells Weaver it will be an atomic process
(contrary to the CWL Workflow presented later).
The other important sections are inputs and outputs. These define which parameters will be expected and
produced by the described application. Weaver supports most formats and types as specified by the CWL Specification.
See Inputs/Outputs Type for more details.
Script Application
When deploying a CommandLineTool that only needs to execute script or shell commands, it is recommended
to define an appropriate DockerRequirement to containerize the Process, even though no advanced
operation is needed. The reason is that Weaver otherwise has no way to know for sure
how to provide all the dependencies that this operation might need. In order to keep the processing
environment and results of any Process separate from Weaver itself, the executions will either be
automatically containerized (with some default image), or blocked entirely when Weaver cannot resolve the
appropriate execution environment. Therefore, it is recommended that the Application Package provider
define a specific image to avoid unexpected failures if this auto-resolution changes across versions.
Below are minimalistic Application Package samples that make use of a shell command and a custom Python script for quickly running some operations, without actually needing to package any specialized Docker image.
The first example simply outputs the contents of a file input using the cat command.
Because the Docker image debian:stretch-slim is specified, we can guarantee that the command will be
available within its containerized environment. In this case, we also take advantage of the stdout.log which
is always collected by Weaver (along with the stderr) in order to obtain traces produced by any
Application Package when performing Job executions.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: cat
requirements:
  DockerRequirement:
    dockerPull: "debian:stretch-slim"
inputs:
  - id: file
    type: File
    inputBinding:
      position: 1
outputs:
  - id: output
    type: File
    outputBinding:
      glob: output.txt
stdout: output.txt
The second example takes advantage of the InitialWorkDirRequirement to generate a Python script dynamically
(i.e.: script.py), prior to executing it to process the received inputs and produce the output file.
Because a Python runner is required, the DockerRequirement specification defines a basic Docker image that
meets our needs. Note that in this case, special interpretation of $(...) entries within the definition can be
provided to tell CWL how to map Job input values to the dynamically created script.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand:
  - python3
  - script.py
inputs:
  - id: amount
    type: int
  - id: cost
    type: float
outputs:
  - id: quote
    type: File
    outputBinding:
      glob: report.txt
requirements:
  DockerRequirement:
    dockerPull: "python:3.7-alpine"
  InitialWorkDirRequirement:
    listing:
      # below script is generated dynamically in the working directory, and then called by the base command
      - entryname: script.py
        entry: |
          amount = $(inputs.amount)
          cost = $(inputs.cost)
          with open("report.txt", mode="w", encoding="utf-8") as report:
              report.write(f"Order Total: {amount * cost:0.2f}$\\n")
See also
See the Python Applications section for more utilities to help create an Application Package from Python.
See also
For other programming languages, see CWL Development Tools for a list of related utilities that help when working with CWL, some of which offer conversion capabilities.
Python Applications
When the Application Package to be generated consists of a Python script, which happens to make use of
the builtin argparse
package, it is possible to employ the argparse2tool
utility, which will automatically
generate a corresponding CWL definition using the specified CLI arguments and their types.
The argparse2tool
utility can help quickly generate a valid CWL definition, but it is the responsibility
of the user to validate that converted arguments have the appropriate types, or any additional metadata required to
properly describe the intended Process. Notably, users might find the need to add appropriate format
definitions to the I/O, since those will generally be missing descriptive Media-Types.
Note
Although argparse2tool can help in the initial CWL generation procedure, it is recommended to apply
additional containerization best practices, such as described in Script Application, to increase the chances of
obtaining a replicable and reusable Application Package definition.
See also
For pure Python scripts not using argparse
, the scriptcwl
utility can be considered instead.
See also
For Python code embedded in Jupyter Notebooks, refer to Jupyter Notebook Applications for more details.
Jupyter Notebook Applications
When working on experimental or research applications, a Jupyter Notebook is a popular development environment,
due to its convenient interface for displaying results, interacting with visualization tools, or the larger plugin
ecosystem that it can offer. However, a Jupyter Notebook is typically insufficient by itself to describe a complete
application. To help developers transition from a Jupyter Notebook to Dockerized Applications, which ensures the
Application Package can be deployed and reused, the IPython2CWL
utility can be employed.
Using jupyter repo2cwl
(after installing IPython2CWL
in the Python environment), it is possible to
directly convert a Git repository reference containing a Jupyter Notebook into deployable CWL leveraging
a Docker container. To do this, the utility uses two strategies under the hood:
- jupyterhub/repo2docker is employed to convert a Git repository into a Docker container, with any applicable package requirements, project metadata, and advanced configuration details.
- Python typing annotations provided by IPython2CWL define the CWL I/O from variables and results located within the Jupyter Notebook.
Note
Because jupyterhub/repo2docker
is employed, which is highly adaptable to many use cases, all typical Python
project Configuration Files,
such as requirements.txt
, environment.yml
, setup.py
, pyproject.toml
, etc. can be employed.
The Docker container dependencies can be provided with an explicit Dockerfile
as well.
Please refer to the official documentation for all advanced configuration options.
Because Python type annotations are employed with jupyter repo2cwl
to indicate which variables will contain the CWL I/O references, it is actually possible
to annotate a Jupyter Notebook without any additional package dependencies. To do so, one only needs
to employ string annotations as follows.
import csv
import json
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # This block is only evaluated by type checkers (and jupyter-repo2cwl).
    # Therefore, it is not executed when running the notebook.
    # In other words, 'ipython2cwl' does not even need to be installed!
    from ipython2cwl.iotypes import CWLFilePathInput, CWLFilePathOutput

input_file: "CWLFilePathInput" = "data.csv"
with open(input_file, mode="r", encoding="utf-8") as f:
    csv_reader = csv.reader(f)
    data = [line for line in csv_reader if line]
headers = data[0]
values = data[1:]
items = [{k: v} for val in values for k, v in zip(headers, val)]

output_file: "CWLFilePathOutput" = "output.json"
with open(output_file, mode="w", encoding="utf-8") as f:
    json.dump(items, f)
See also
See IPython2CWL Supported Types for more details about the mapping from a Python annotation to the resulting CWL Inputs/Outputs Type.
When the above code is saved in a Jupyter Notebook and committed to a Git repository, the jupyterhub/repo2docker
utility can automatically clone the repository, parse the Python code, extract the CWL annotations, and
generate the Application Package with a Docker container containing all of their respective definitions.
All of this is accomplished with a single call to obtain a deployable CWL in Weaver, which can then take over
from the Process Deployment to obtain an OGC API - Process definition.
Jupyter Notebook to CWL Example: NCML to STAC Application
For a more concrete example of a Jupyter Notebook conversion to CWL, see the crim-ca/ncml2stac GitHub
repository, which contains a sample NCML to STAC Jupyter Notebook.
This script, as indicated by its name, converts NCML XML metadata with CMIP6 attributes into the
corresponding SpatioTemporal Asset Catalog (STAC) definition and extensions.
It uses the same IPython2CWL type annotation strategy as presented
above to indicate which NCML File variable is to be employed
as the CWL input reference, and the expected STAC File as output to be collected by CWL.
Using jupyter repo2cwl and the Weaver CLI in combination, as shown below,
it is possible to automatically convert the Jupyter Notebook script into a Dockerized CWL and
deploy it to an OGC API - Processes server supporting Application Packages, such as Weaver.
jupyter-repo2cwl "https://github.com/crim-ca/ncml2stac" -o /tmp
weaver deploy -u http://example.com/weaver -i ncml2stac --cwl /tmp/notebooks_ncml2stac.cwl
See also
- Refer to the crim-ca/ncml2stac repository’s README for more details about the utilities.
- Refer to the NCML to STAC Jupyter Notebook for the implementation of the Application Package script.
Dockerized Applications
When advanced processing capabilities and more complicated environment preparation are required, it is recommended to package and push pre-built Docker images to a remote registry. In this situation, just like for the Script Application examples, the DockerRequirement is needed. The definitions would also be essentially the same as in the previous examples, but with more complicated operations and possibly a larger number of inputs or outputs.
Whenever a Docker image reference is detected, Weaver will ensure that the application will be pulled using CWL capabilities in order to run it.
Because Application Package providers could desire to make use of Docker images hosted on private registries, Weaver offers the capability to specify an authorization token through HTTP request headers during the Process deployment. More specifically, the following definition can be provided during a Deploy request.
POST /processes HTTP/1.1
Host: weaver.example.com
Content-Type: application/json;charset=UTF-8
X-Auth-Docker: Basic <base64_token>

{ "processDescription": { }, "executionUnit": { } }
The X-Auth-Docker
header should be defined exactly like any typical Authorization
headers (HTTP Authentication Schemes).
The name X-Auth-Docker
is inspired from existing implementations that employ X-Auth-Token
in a similar fashion.
The reason why Authorization
and X-Auth-Token
headers are not themselves employed in this case is to ensure
that they do not interfere with any proxy or server authentication mechanism, which Weaver could be located behind.
For the moment, only Basic
(RFC 7617) authentication is supported.
To generate the base64 token, the following methods can be used:
echo -n "<username>:<password>" | base64
import base64
base64.b64encode(b"<username>:<password>")
When the HTTP X-Auth-Docker header is detected in combination with a DockerRequirement entry within
the Application Package of the Process being deployed, Weaver will parse the targeted Docker
registry defined in dockerPull and will attempt to identify it for later authentication with the
provided token. Given a successful authentication, Weaver should then be able to pull the Docker image
whenever required for launching new Job executions.
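To tie the token generation and the Deploy request together, the sketch below prepares (without sending) a request carrying the X-Auth-Docker header. The function names, the host and the credentials are placeholders chosen for illustration, and the body is left as empty as in the request shown earlier; a real deployment needs a complete process description and execution unit.

```python
import base64
import json
import urllib.request


def docker_auth_header(username: str, password: str) -> str:
    """Encode credentials using the Basic scheme (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_deploy_request(weaver_url: str, body: dict, username: str, password: str) -> urllib.request.Request:
    """Prepare (without sending) a Deploy request carrying the X-Auth-Docker header."""
    return urllib.request.Request(
        url=f"{weaver_url.rstrip('/')}/processes",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            "X-Auth-Docker": docker_auth_header(username, password),
        },
    )


if __name__ == "__main__":
    # placeholder host, body and credentials (assumptions for illustration)
    request = build_deploy_request(
        "https://weaver.example.com",
        {"processDescription": {}, "executionUnit": {}},
        "user",
        "pass",
    )
    print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would submit it; this is left out so that the sketch can be run without a server.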
Note
Weaver only attempts to authenticate itself temporarily at the moment when the Job is submitted to retrieve the Docker image, and only if the image is not already available locally. Because of this, the provided authentication token should have a sufficient lifetime to run the Job at later times, considering any retention time of cached Docker images on the server. If the cache is cleaned, and the Docker image is made unavailable, Weaver will attempt to authenticate itself again when receiving the new Job. It is left up to the developer and Application Package provider to manage expired tokens in Weaver according to their needs. To resolve such cases, the Update Token request or an entire re-deployment of the Process could be accomplished, whichever is more convenient for them.
Added in version 4.5: Specification and handling of the X-Auth-Docker
header for providing an authentication token.
GPU and Resource-dependent Applications
When an Application Package requires GPU or any other minimal set of hardware capabilities, such as in the case of machine learning or high-performance computing tasks, the submitted CWL must explicitly indicate those requirements to ensure they can be met for performing its execution. Similarly, an Application Package that must obtain external access to remote contents must not assume that the connection would be available, and must therefore request network access. Below are examples where such requirements are demonstrated and how to define them.
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: nvidia-smi
requirements:
  cwltool:CUDARequirement:
    cudaVersionMin: "11.2"
    cudaComputeCapability: "7.5"
    cudaDeviceCountMin: 1
    cudaDeviceCountMax: 4
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
inputs: {}
outputs:
  output:
    type: File
    outputBinding:
      glob: output.txt
stdout: output.txt

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: "<high-compute-algorithm>"
requirements:
  ResourceRequirement:
    coresMin: 8
    coresMax: 16
    ramMin: 1024
    ramMax: 2048
    tmpdirMin: 128
    tmpdirMax: 1024
    outdirMin: 1024
    outdirMax: 2048
inputs: {}
outputs:
  output:
    type: File
    outputBinding:
      glob: output.txt
stdout: output.txt

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: curl
requirements:
  NetworkAccess:
    networkAccess: true
inputs:
  url:
    type: string
outputs:
  output:
    type: File
    outputBinding:
      glob: "output.txt"
stdout: "output.txt"
Above requirements can be combined in any fashion as needed. They can also be combined with any other requirements employed to define the core components of the application.
Whenever possible, requirements should be provided with values that best match the minimum and maximum amount of resources that the Application Package operation requires. More precisely, over-requesting resources should be avoided as this could lead to failing Job execution if the server or worker node processing it deems it cannot fulfill the requirements because they are too broad to obtain proper resource assignation, because it has insufficient computing resources, or simply for rate-limiting/fair-share reasons.
Although definitions such as ResourceRequirement and cwltool:CUDARequirement are usually applied for atomic operations,
they can also become relevant in the context of CWL Workflow execution. Effectively, providing the
required hardware capabilities for each atomic application can allow the Workflow engine to better schedule
Job steps. For example, if two computationally heavy steps happened to have no restriction for parallelization
based on the Workflow steps definition alone, but that running both of them simultaneously on the same machine
would necessarily end up causing an OutOfMemory
error due to insufficient resources, those requirements could help
preemptively let the engine know to wait until reserved resources become available. As a result, execution of the
second task could be delayed until the first task is completed, therefore avoiding the error.
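The scheduling benefit described above can be pictured with a toy model. The sketch below is not how Weaver or its CWL engine schedules steps; it is a simplified greedy grouping, written for illustration, that places independent steps in the same batch only while their `ramMin` values (in mebibytes, as in ResourceRequirement) fit within an assumed amount of available memory.

```python
def plan_batches(steps, available_ram_mib):
    """Greedily group independent steps so that each batch fits in the available RAM.

    steps: list of (step_name, ram_min_mib) tuples, in submission order.
    Returns a list of batches; steps within one batch may run in parallel,
    and batches run one after the other.
    """
    batches, current, reserved = [], [], 0
    for name, ram_min in steps:
        if ram_min > available_ram_mib:
            raise ValueError(f"step '{name}' can never fit ({ram_min} > {available_ram_mib} MiB)")
        if reserved + ram_min > available_ram_mib:
            # not enough memory left: close the batch and wait for it to finish
            batches.append(current)
            current, reserved = [], 0
        current.append(name)
        reserved += ram_min
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    heavy_steps = [("step_a", 1536), ("step_b", 1536)]  # hypothetical steps
    print(plan_batches(heavy_steps, available_ram_mib=2048))
    print(plan_batches(heavy_steps, available_ram_mib=4096))
```

With 2048 MiB available the two steps end up in separate batches, which corresponds to delaying the second task; with 4096 MiB they share a batch and may run simultaneously.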
Added in version 4.17: Support of ResourceRequirement.
Added in version 4.27: Support of NetworkAccess and cwltool:CUDARequirement.
Deprecated: DockerGpuRequirement.
Warning
Any Application Package that was making use of DockerGpuRequirement
should be updated to employ
the official DockerRequirement in combination with cwltool:CUDARequirement. For backward compatibility, any detected
DockerGpuRequirement
definition will be updated automatically with a minimalistic cwltool:CUDARequirement definition
using a very lax set of CUDA capabilities. It is recommended to provide specific configurations for your needs.
Remote Applications
To define an application that refers to a Remote Provider, a WPS-1/2, an OGC API - Processes (WPS-REST, WPS-T, WPS-3) or an ESGF-CWT endpoint, the corresponding Weaver-specific CWL-like requirements must be employed to indicate the URL where that remote resource is accessible. Once deployed, the contained CWL package and the resulting Process will be exposed as an OGC API - Processes (WPS-REST, WPS-T, WPS-3) resource.
Upon reception of a Process Execution request, Weaver will take care of resolving the indicated process URL from the CWL requirement and will dispatch the execution to the resource after applying any relevant I/O, parameter and Media-Type conversion to align with the target server standard for submitting the Job requests.
Below are examples of the corresponding CWL requirements employed for each type of remote application.
cwlVersion: "v1.0"
class: CommandLineTool
hints:
  WPS1Requirement:
    provider: "https://example.com/ows/wps/catalog"
    process: "getpoint"

cwlVersion: "v1.0"
class: CommandLineTool
hints:
  OGCAPIRequirement:
    process: "https://example.com/ogcapi/processes/getpoint"

{
  "cwlVersion": "v1.0",
  "class": "CommandLineTool",
  "hints": {
    "ESGF-CWTRequirement": {
      "provider": "https://edas.nccs.nasa.gov/wps/cwt",
      "process": "xarray.subset"
    }
  }
}
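Since the three definitions above only differ by the requirement name and its parameters, they can be generated programmatically. The helper below is a hypothetical convenience written for this illustration (it is not part of Weaver); it only reproduces the structure of the examples shown above.

```python
import json


def remote_package(requirement: str, **parameters: str) -> dict:
    """Build a minimal CWL-like package that points at a remote process."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "hints": {requirement: dict(parameters)},
    }


if __name__ == "__main__":
    package = remote_package(
        "WPS1Requirement",
        provider="https://example.com/ows/wps/catalog",
        process="getpoint",
    )
    print(json.dumps(package, indent=2))
```

The resulting dictionary can be serialized to JSON and used as the package of a deployment, in the same way as the hand-written variants.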
CWL Workflow
Weaver also supports CWL class: Workflow. When an Application Package is defined this way, the
Process deployment operation will attempt to resolve each step as another process. The reference to the CWL
definition can be placed in any location supported for atomic processes
(see details about supported package locations).
The following CWL definition demonstrates an example Workflow process that would resolve each step with
local processes of matching IDs.
{
  "cwlVersion": "v1.0",
  "class": "Workflow",
  "requirements": [
    {
      "class": "StepInputExpressionRequirement"
    }
  ],
  "inputs": {
    "tasmax": {
      "type": {
        "type": "array",
        "items": "File"
      }
    },
    "lat0": "float",
    "lat1": "float",
    "lon0": "float",
    "lon1": "float",
    "freq": {
      "default": "YS",
      "type": {
        "type": "enum",
        "symbols": ["YS", "MS", "QS-DEC", "AS-JUL"]
      }
    }
  },
  "outputs": {
    "output": {
      "type": "File",
      "outputSource": "ice_days/output_netcdf"
    }
  },
  "steps": {
    "subset": {
      "run": "ColibriFlyingpigeon_SubsetBbox.cwl",
      "in": {
        "resource": "tasmax",
        "lat0": "lat0",
        "lat1": "lat1",
        "lon0": "lon0",
        "lon1": "lon1"
      },
      "out": ["output"]
    },
    "json2nc": {
      "run": "jsonarray2netcdf",
      "in": {
        "input": "subset/output"
      },
      "out": ["output"]
    },
    "ice_days": {
      "run": "Finch_IceDays.cwl",
      "in": {
        "tasmax": "json2nc/output",
        "freq": "freq"
      },
      "out": ["output_netcdf"]
    }
  }
}
For instance, the jsonarray2netcdf
(Builtin) middle step in this example corresponds to the
CWL CommandLineTool process presented in previous section. Other processes referenced in this Workflow
can be
found in Weaver Test Resources.
Step process names are resolved using the variations presented below. Particular care must also be given to the input and output definitions between each step.
Step Reference
In order to resolve referenced processes as steps, Weaver supports 3 formats.
1. Process ID explicitly given. Any visible process from the GET {WEAVER_URL}/processes (GetCapabilities) response should be resolved this way (e.g.: jsonarray2netcdf resolves to the pre-deployed weaver.processes.builtin.jsonarray2netcdf).
2. Full URL to the process description endpoint, provided that it also offers a GET {WEAVER_URL}/processes/{processID}/package (Package) endpoint (Weaver-specific).
3. Full URL to the explicit CWL file (usually corresponding to (2) or the href provided in the deployment body).
When a URL to the CWL process “file” is provided with an extension, it must be one of the supported values
defined in weaver.processes.wps_package.PACKAGE_EXTENSIONS. Otherwise, Weaver will refuse it as it cannot
figure out how to parse it.
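The three reference formats can be told apart with a few simple checks, as sketched below. This is only an approximation written for illustration and not Weaver's actual resolution code. In particular, the extension set is an assumed placeholder: the authoritative values are those of weaver.processes.wps_package.PACKAGE_EXTENSIONS, which are not listed here.

```python
import posixpath
from urllib.parse import urlparse

# ASSUMPTION: illustrative values only; the authoritative list is
# weaver.processes.wps_package.PACKAGE_EXTENSIONS
ASSUMED_PACKAGE_EXTENSIONS = {"cwl", "json", "yaml", "yml"}


def classify_step_reference(reference: str) -> str:
    """Guess which of the three step reference formats a 'run' value uses."""
    parsed = urlparse(reference)
    if parsed.scheme not in ("http", "https"):
        # anything that is not an HTTP(S) URL is treated as a process ID
        return "process-id"
    extension = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    if not extension:
        # no extension: assume a process description endpoint
        return "process-url"
    if extension in ASSUMED_PACKAGE_EXTENSIONS:
        return "cwl-file-url"
    raise ValueError(f"unsupported package extension: '{extension}'")


if __name__ == "__main__":
    # hypothetical references
    for ref in ("jsonarray2netcdf", "https://example.com/processes/jsonarray2netcdf"):
        print(ref, "->", classify_step_reference(ref))
```

A process description URL whose process ID itself contains a dot would be misread by this naive extension check, which is one reason to treat the sketch as a reading aid rather than a validator.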
Because Weaver and the underlying CWL executor need to resolve all steps in order to validate their input and
output definitions correspond (id, format, type, etc.) in order to chain them, all intermediate processes MUST
be available. This means that you cannot Deploy nor Execute
a Workflow
-flavored Application Package until all referenced steps have themselves been deployed and
made visible.
Warning
Because Weaver needs to convert given CWL documents into equivalent WPS process definition,
embedded CWL processes within a Workflow
step are not supported currently. This is a known limitation
of the implementation, but not much can be done against it without major modifications to the code base.
See also issue #56.
See also
Deploy request details.
Step Inputs/Outputs
Inputs and outputs of connected steps are required to match types and formats in order for the workflow to be valid.
This means that a process that produces an output of type String cannot be directly chained to a process that takes
a File as input, even if the String of the first process represents a URL that could be resolved to a valid
file reference. In order to chain two such processes, an intermediate operation would need to be defined to explicitly
convert the String input to the corresponding File output. This is usually accomplished using Builtin
processes, such as in the previous example.
Since formats must also match (e.g.: a process producing application/json cannot be mapped to one expecting
application/x-netcdf), all mismatching formats must also be converted with an intermediate step if such an operation
is desired. This ensures that workflow definitions are always explicit and that as little interpretation, variation or
assumption as possible occurs between each execution. Because of this, all applications generated by Weaver will attempt to
preserve and enforce matching input/output format definitions in both CWL and WPS as long as it does
not introduce ambiguous results (see File Format for more details).
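The matching rule for chained steps can be summarized with a small check. The sketch below assumes simplified I/O descriptions holding only `type` and `format` keys and uses strict equality; the actual validation performed by Weaver and the CWL engine covers more cases (arrays, optional values, namespaced formats), so this is only a reading aid.

```python
def can_chain(step_output: dict, step_input: dict) -> bool:
    """Return True when an output can feed an input without a conversion step."""
    same_type = step_output.get("type") == step_input.get("type")
    same_format = step_output.get("format") == step_input.get("format")
    return same_type and same_format


if __name__ == "__main__":
    url_as_string = {"type": "string"}
    json_file = {"type": "File", "format": "application/json"}
    netcdf_file = {"type": "File", "format": "application/x-netcdf"}

    print(can_chain(url_as_string, netcdf_file))  # type mismatch
    print(can_chain(json_file, netcdf_file))      # format mismatch
    print(can_chain(netcdf_file, netcdf_file))    # compatible
```

When the check fails, an intermediate step performing the conversion has to be inserted between the two processes.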
Correspondence between CWL and WPS fields
Because the CWL definition and the WPS process description inherently provide “duplicate” information, many fields can be mapped between one another. In order to handle any provided metadata in the various locations supported by both specifications, as well as to extend details of deployed processes, each Application Package gets its details merged with the complementary WPS description.
In some cases, complementary details are only documentation-related, but some information directly affects the format or
execution behaviour of some parameters. A common example is the maxOccurs field provided by WPS, which does
not have an exactly corresponding specification in CWL (any-sized array). On the other hand, CWL also
provides data preparation steps such as initial staging (i.e.: InitialWorkDirRequirement) that do not have an
equivalent under the WPS process description. For this reason, complementary details are merged and reflected
on both sides (as applicable), when non-ambiguous resolution is possible.
In case of conflicting metadata, the CWL specification will most of the time prevail over the WPS
metadata fields simply because it is expected that a strict CWL specification is provided upon deployment.
The only exceptions to this situation are when the WPS specification helps resolve some ambiguity or when
WPS enforces the parametrisation of some elements, such as with the maxOccurs field.
Note
The metadata merge operation between CWL and WPS is accomplished on a per-mapped-field basis. In other
words, more explicit details such as maxOccurs could be obtained from WPS and simultaneously the
same input’s format could be obtained from the CWL side. The merge occurs bidirectionally for corresponding
information.
The merging strategy of process specifications also implies that some details can be omitted from one context if they
can be inferred from corresponding elements in the other. For example, the CWL and WPS context both
define keywords
(with minor naming variation) as a list of strings. Specifying this metadata in both locations
is redundant and only makes the process description longer. Therefore, the user is allowed to provide only one of the
two and Weaver will take care to propagate the information to the lacking location.
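The per-field merging strategy can be pictured with the following sketch. It is a deliberately simplified model and not Weaver's implementation: both contexts are assumed to use identical field names (which is not the case in practice), and the set of fields where WPS prevails is an assumption limited to the maxOccurs example given in the text.

```python
# ASSUMPTION: fields for which the WPS value is kept on conflict (per the text, e.g. maxOccurs)
WPS_PRIORITY_FIELDS = frozenset({"maxOccurs"})


def merge_io_metadata(cwl_io: dict, wps_io: dict) -> dict:
    """Merge one I/O description field by field, letting CWL prevail on most conflicts."""
    merged = dict(wps_io)  # start from the WPS details
    for field, value in cwl_io.items():
        if field in WPS_PRIORITY_FIELDS and field in wps_io:
            continue  # WPS enforces this field
        merged[field] = value  # otherwise CWL prevails or fills the gap
    return merged


if __name__ == "__main__":
    cwl_side = {"format": "application/json", "keywords": ["demo"]}
    wps_side = {"maxOccurs": 3, "format": "text/plain"}
    print(merge_io_metadata(cwl_side, wps_side))
```

In this toy run, maxOccurs comes from the WPS side, the format from the CWL side, and keywords provided in only one context are propagated to the merged result.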
In order to help understand the resolution methodology between the contexts, the following sub-sections cover the supported mapping between the two specifications, and more specifically, how each field impacts the mapped equivalent metadata.
Warning
Merging of corresponding fields between CWL and WPS is a Weaver-specific implementation. The same behaviour is not necessarily supported by other implementations. For this reason, any converted information between the two contexts will be transferred to the other context if missing, so that both specifications reflect similar details as closely as possible, whichever context the metadata originated from.
Inputs/Outputs ID
Inputs and outputs (I/O) id from the CWL context will be respectively matched against the corresponding
id or identifier field from I/O of the WPS context. In the CWL definition, all of the allowed I/O
structures are supported, whether they are specified using an array list with explicit definitions, using the “shortcut”
variant (i.e.: <type>[]), or using key-value pairs (see CWL Mapping for more details). Regardless of array or
mapping format, CWL requires that all I/O have a unique id.
On the WPS side, either a mapping or a list of I/O is also expected, with unique id.
Changed in version 4.0: Previous versions only supported WPS I/O using the listing format. Both can be used interchangeably in both CWL and WPS contexts as of this version.
To summarize, the following CWL and WPS I/O definitions are all equivalent and will result in the
same process definition after deployment. For simplification purposes, the examples below omit all but mandatory fields
(only of the inputs and outputs portion of the full deployment body) to produce the same result.
Other fields are discussed afterward in specific sections.
{
  "inputs": [
    {
      "id": "single-str",
      "type": "string"
    },
    {
      "id": "multi-file",
      "type": "File[]"
    }
  ],
  "outputs": [
    {
      "id": "output-1",
      "type": "File"
    },
    {
      "id": "output-2",
      "type": "File"
    }
  ]
}

{
  "inputs": {
    "single-str": {
      "type": "string"
    },
    "multi-file": {
      "type": "File[]"
    }
  },
  "outputs": {
    "output-1": {
      "type": "File"
    },
    "output-2": {
      "type": "File"
    }
  }
}

{
  "inputs": [
    {
      "id": "single-str"
    },
    {
      "id": "multi-file",
      "formats": []
    }
  ],
  "outputs": [
    {
      "id": "output-1",
      "formats": []
    },
    {
      "id": "output-2",
      "formats": []
    }
  ]
}
The WPS example above requires a formats field for the corresponding CWL File type in order to
distinguish it from a plain string. More details are available in Inputs/Outputs Type below about this requirement.
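The equivalence between the listing and mapping representations can be made explicit with two small conversion helpers. These functions are hypothetical and written only for this illustration; they are not Weaver utilities.

```python
def io_list_to_mapping(io_list: list) -> dict:
    """Convert [{"id": ..., ...}] into {id: {...}}, rejecting duplicate IDs."""
    mapping = {}
    for item in io_list:
        details = dict(item)
        io_id = details.pop("id")
        if io_id in mapping:
            raise ValueError(f"duplicate I/O id: {io_id}")
        mapping[io_id] = details
    return mapping


def io_mapping_to_list(io_mapping: dict) -> list:
    """Convert {id: {...}} (or the {id: "type"} shorthand) into [{"id": ..., ...}]."""
    io_list = []
    for io_id, details in io_mapping.items():
        if isinstance(details, str):  # shorthand such as {"lat0": "float"}
            details = {"type": details}
        io_list.append({"id": io_id, **details})
    return io_list


if __name__ == "__main__":
    listing = [
        {"id": "single-str", "type": "string"},
        {"id": "multi-file", "type": "File[]"},
    ]
    print(io_list_to_mapping(listing))
```

Converting the listing to a mapping and back returns the initial listing, which reflects why both forms lead to the same process definition.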
Finally, it is to be noted that above CWL and WPS definitions can be specified in the Deploy request body with any of the following variations:
- Both are simultaneously fully specified (valid although extremely verbose).
- Both partially specified as long as sufficient complementary information is provided.
- Only CWL I/O is fully provided (with empty or even unspecified inputs or outputs section from WPS).
Warning
Weaver assumes that its main purpose is to eventually execute an Application Package and will therefore
prioritize specification in CWL over WPS to infer types. Because of this, any unmatched id from
the WPS context against provided CWL ids of the same I/O section will be dropped, as they
ultimately would have no purpose during CWL execution.
This does not apply in the case of referenced WPS-1/2 processes since no CWL is available in the first place.
Inputs/Outputs Type
In the CWL context, the type
field indicates the type of I/O.
Available types are presented in the CWLType Symbols portion of the CWL specification.
Warning
Weaver does not support CWL type: Any
. This limitation is intentional in order to guarantee
proper resolution of CWL types to their corresponding WPS definitions. Furthermore, the Any
type would make the Process description too ambiguous.
Type Correspondence
A summary of applicable types is presented below.
Those CWL types can be mapped to WPS and/or OAS contexts in order to obtain corresponding
I/O definitions. However, not every type exists in each of those contexts. Therefore, some types will
necessarily be simplified or converted to their best corresponding match when exact mapping cannot be accomplished.
The simplification of types can happen when converting in any direction
(CWL ⇔ WPS ⇔ OAS).
It all depends on which of the provided definitions is the most specific. For example, a WPS dateTime
will be simplified to a generic CWL string, and into an OAS string with format: "date-time".
In this example, it would be important to provide the WPS or OAS definitions if the date-time portion
was critical, since it could not be inferred only from the CWL string.
Further details regarding handling methods or important considerations for specific types will be presented in Type Resolution and Directory Type sections.
CWL | WPS | OAS type | Description |
---|---|---|---|
 | n/a | n/a | Not supported. See note. |
 | n/a | n/a | Cannot be used by itself. |
 | | | Binary value. |
 | | | Numeric whole value. |
 | | | Numeric floating-point value. By default, |
 | | | Generic string. Default employed if nothing more specific is resolved. This type can be used to represent any File Reference as plain URL string without resolution. |
 | | | Partial support available (see note). |
 | | | File Reference with Media-Type validation and staging according to the applicable scheme. |
 | | | Directory Reference handled as nested |
Footnotes
Type Resolution
In the WPS context, three data types exist, namely Literal, BoundingBox and Complex data.
As presented in previous examples, I/O in the WPS context does not require an explicit indication of which data type from one of Literal, BoundingBox and Complex to apply. Instead, WPS type can be inferred using the matched API schema of the I/O. For instance, Complex I/O (e.g.: file reference) requires the formats field to distinguish it from a plain string. Therefore, specifying either format in CWL or formats in WPS immediately provides all needed information for Weaver to understand that this I/O is expected to be a file reference.
{
  "id": "input",
  "formats": [
    {"mediaType": "application/json", "default": true}
  ]
}
A combination of supportedCRS objects providing crs references would otherwise indicate a BoundingBox I/O (see note).
{
  "id": "input",
  "supportedCRS": [
    {"crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84", "default": true}
  ]
}
If neither of the two previous schemas is matched, the I/O type resolution falls back to Literal data of string type. To employ another primitive data type such as Integer, an explicit indication needs to be provided as follows.
{
  "id": "input",
  "literalDataDomains": [
    {"dataType": {"name": "integer"}}
  ]
}
Obviously, the equivalent CWL definition is simpler in this case (i.e.: only type: int is required). It is therefore recommended to take advantage of Weaver’s merging strategy during Process Deployment in this case by providing only the details through the CWL definition and have the corresponding WPS I/O type automatically deduced by the generated process. If desired, literalDataDomains can still be explicitly provided as above to ensure that it gets parsed as the intended type.
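The resolution order described in this section can be summarized with a short sketch. It is a simplified illustration under the assumption that only the fields shown in the examples above are inspected; `infer_wps_data_type` and `infer_literal_type` are hypothetical helper names, not functions exposed by Weaver.

```python
def infer_wps_data_type(io_def):
    """Guess the WPS data type of an I/O from the fields it provides."""
    if io_def.get("formats"):
        return "Complex"      # formats imply a file reference
    if io_def.get("supportedCRS"):
        return "BoundingBox"  # CRS references imply a bounding box
    return "Literal"          # fallback when nothing else matches


def infer_literal_type(io_def):
    """Return the explicit literal data type, falling back to 'string'."""
    for domain in io_def.get("literalDataDomains", []):
        name = domain.get("dataType", {}).get("name")
        if name:
            return name
    return "string"


file_input = {"id": "input", "formats": [{"mediaType": "application/json"}]}
int_input = {"id": "input", "literalDataDomains": [{"dataType": {"name": "integer"}}]}

print(infer_wps_data_type(file_input))  # Complex
print(infer_wps_data_type(int_input))   # Literal
print(infer_literal_type(int_input))    # integer
print(infer_literal_type({"id": "x"}))  # string
```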
Added in version 4.16.
With more recent versions of Weaver, it is also possible to employ OpenAPI schema (OAS) definitions provided in the I/O to specify the explicit structure that applies to Literal, BoundingBox and Complex data types. When OpenAPI schemas are detected, they are also considered in the merging strategy along with other specifications provided in CWL and WPS contexts. More details about the OAS context are provided in the Inputs/Outputs OpenAPI Schema section.
Directory Type
Changed in version 4.27: Support of CWL type: Directory added to Weaver.
In order to map a Directory to the underlying WPS Process that does not natively offer this type of reference, a Complex “pseudo-file” using Media-Type application/directory is employed. For further validation that a Directory is properly parsed by Weaver, provided URL references must also end with a trailing slash (/) character.
Warning
Note that, when using the Directory type, very little format and content validation can be accomplished for individual files contained in that directory. The contents must therefore match the definitions expected by the application receiving it. No explicit validation is accomplished by Weaver to ensure that the expected contents are available.
When a Directory type is specified in the Process definition, and a File Reference is provided during Execution, the reference pointed to as Directory must provide a listing of files. Those files can either be relative to the Directory or other absolute File Reference locations. The applicable scheme to stage those files will be applied as needed based on resolved references. It is therefore possible to mix URL schemes between the listed references. For example, a Directory listing as JSON obtained from a https:// endpoint could provide multiple File locations from s3:// buckets to stage for Process Execution.
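To make the resolution of listed references more concrete, here is a minimal sketch using only the Python standard library. The `resolve_listing` helper, the example URLs and the `example-bucket` name are assumptions made for illustration; the actual staging logic in Weaver is more involved.

```python
from urllib.parse import urljoin, urlparse


def resolve_listing(directory_url, entries):
    """Resolve each listed entry against the Directory reference."""
    if not directory_url.endswith("/"):
        raise ValueError("Directory reference must end with a trailing slash")
    # urljoin resolves relative entries and leaves absolute ones untouched
    return [urljoin(directory_url, entry) for entry in entries]


listing = ["README.md", "nested/data.csv", "s3://example-bucket/dir/file.txt"]
for ref in resolve_listing("https://example.com/base/dir/", listing):
    # the scheme of each resolved reference decides how it would be staged
    print(urlparse(ref).scheme, ref)
```

The relative entries inherit the `https` scheme of the Directory reference, while the absolute `s3://` entry keeps its own scheme, which matches the mixed-scheme situation described above.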
The following Directory listing formats are supported.
Listing Format |
Description |
---|---|
<html lang="en">
<body>
<h1>Index of /dir/</h1>
<hr>
<table>
<thead>
<tr>
<th>Content</th>
<th>Modified</th>
</tr>
</thead>
<tbody>
<tr>
<td><pre><a href="README">README</a></pre></td>
<td>2022-10-31 23:48</td>
</tr>
<tr>
<td><pre><a href="dir/">dir/</a></pre></td>
<td>2022-10-31 23:48</td></tr>
<tr>
<td><pre><a href="dir/file.txt">dir/file.txt</a></pre></td>
<td>2022-10-31 23:48</td>
</tr>
</tbody>
</table>
<hr>
</body>
</html>
|
A file index where each reference to be staged
should be contained in a The structure can be contained in a |
[
"https://example.com/base/dir/README.md",
"https://example.com/base/dir/nested/image.png",
"https://example.com/base/dir/nested/data.csv"
]
|
A JSON body returned from an endpoint
obtained by |
{
"ResponseMetadata": {
"RequestId": "vpiM5RBkJ3O68CnD5fO42d887Jh49Cf8bhA6nw7ZTHIuGRVccDQM",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"x-amzn-requestid": "vpiM5RBkJ3O68CnD5fO42d887Jh49Cf8bhA6nw7ZTHIuGRVccDQM"
},
"RetryAttempts": 0
},
"IsTruncated": false,
"Contents": [
{
"Key": "dir/file.txt",
"LastModified": "2022-11-01T04:25:42+00:00",
"ETag": "\"17404a596cbd0d1e6c7d23fcd845ab82\"",
"Size": 4,
"StorageClass": "STANDARD"
},
{
"Key": "dir/sub/file.txt",
"LastModified": "2022-11-01T04:25:42+00:00",
"ETag": "\"17404a596cbd0d1e6c7d23fcd845ab82\"",
"Size": 4,
"StorageClass": "STANDARD"
},
{
"Key": "dir/sub/nested/file.txt",
"LastModified": "2022-11-01T04:25:42+00:00",
"ETag": "\"17404a596cbd0d1e6c7d23fcd845ab82\"",
"Size": 4,
"StorageClass": "STANDARD"
}
],
"Name": "wps-process-test-bucket",
"Prefix": "dir/",
"MaxKeys": 1000,
"EncodingType": "url",
"KeyCount": 3
}
|
Any supported S3 endpoint as detailed in AWS S3 Bucket Access Methods that provides a listing of file objects to be staged. Proper access must be granted as per Configuration of AWS S3 Buckets if the bucket contents are not publicly accessible. More details about supported references can be found in AWS S3 Bucket References. |
File Format
An input or output resolved as CWL File type, equivalent to a WPS ComplexData, supports format specification. Every mimeType field nested under formats entries of the WPS definition will be mapped against corresponding namespaced format of CWL.
Note
For OGC API - Processes conformance and backward compatible support, both mimeType and mediaType can be used interchangeably for Process Deployment. For Process Description, the employed name depends on the requested schema as query parameter, defaulting to OGC API - Processes mediaType representation if unspecified.
Following is an example where input definitions are equivalent in both CWL and WPS contexts.
{
  "id": "input",
  "formats": [
    {"mimeType": "application/x-netcdf"},
    {"mimeType": "application/json"}
  ]
}

{
  "inputs": [
    {
      "id": "input",
      "format": [
        "edam:format_3650",
        "iana:application/json"
      ]
    }
  ],
  "$namespaces": {
    "edam": "http://edamontology.org/",
    "iana": "https://www.iana.org/assignments/media-types/"
  }
}
As demonstrated, both contexts accept multiple formats for inputs. These effectively represent supported formats by the underlying application. The two Media-Types selected for this example are chosen specifically to demonstrate how CWL formats must be specified. More precisely, CWL requires a real schema definition referencing an existing ontology to validate formats, specified through the $namespaces section. Each format entry is then defined as a mapping of the appropriate namespace to the identifier of the ontology. Alternatively, you can also provide the full URL of the ontology reference in the format string.
Like many other fields, this information can quickly become redundant and difficult to maintain. For this reason, Weaver will automatically fill the missing detail if only one of the two corresponding pieces of information between CWL and WPS is provided. In other words, an application developer could only specify the I/O’s formats in the WPS portion during process deployment, and Weaver will take care to update the matching CWL definition without any user intervention. This also makes it easier for the user to specify supported formats since it is generally easier to remember names of Media-Types than full ontology references. Weaver has a large set of commonly employed Media-Types that it knows how to convert to corresponding ontologies. Also, Weaver will look up new Media-Types it doesn’t explicitly know about in either the IANA or the EDAM ontologies in order to attempt automatically resolving them.
When formats are resolved between the two contexts, Weaver applies information in a complementary fashion. This means for example that if the user provided application/x-netcdf on the WPS side and iana:application/json on the CWL side, both resulting contexts will have both of those formats combined. Weaver will not favour one location over the other, but will rather merge them if they can be resolved into different and valid entities.
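The complementary merge can be sketched as follows. This is a hedged illustration only: the `MEDIA_TYPE_TO_CWL` table is a hypothetical, deliberately tiny lookup that contains just the two correspondences shown in the example above, and `merge_formats` is not a Weaver function.

```python
# Hypothetical lookup limited to the two formats of the example above.
# Weaver's own mapping covers many more Media-Types.
MEDIA_TYPE_TO_CWL = {
    "application/x-netcdf": "edam:format_3650",
    "application/json": "iana:application/json",
}
CWL_TO_MEDIA_TYPE = {cwl: mt for mt, cwl in MEDIA_TYPE_TO_CWL.items()}


def merge_formats(wps_media_types, cwl_formats):
    """Combine formats of both contexts and return (WPS list, CWL list)."""
    merged = list(wps_media_types)
    for fmt in cwl_formats:
        media_type = CWL_TO_MEDIA_TYPE[fmt]
        if media_type not in merged:  # avoid duplicating an equivalent format
            merged.append(media_type)
    return merged, [MEDIA_TYPE_TO_CWL[mt] for mt in merged]


wps, cwl = merge_formats(["application/x-netcdf"], ["iana:application/json"])
print(wps)  # ['application/x-netcdf', 'application/json']
print(cwl)  # ['edam:format_3650', 'iana:application/json']
```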
Since formats is a required field for WPS ComplexData definitions (see Inputs/Outputs Type) and Media-Types are easier to provide in this context, it is recommended to provide all of them in the WPS definition. Alternatively, the Inputs/Outputs OpenAPI Schema representation also located within the WPS I/O definitions can be used to provide contentMediaType.
The above examples present the minimal content of formats JSON objects (i.e.: mimeType or mediaType value), but other fields, such as encoding and schema, can be provided as well to further refine the specific format supported by the corresponding I/O definition. These fields are directly mapped, merged and combined against complementary details provided with contentMediaType, contentEncoding and contentSchema within an OAS schema (see Inputs/Outputs OpenAPI Schema).
Output File Format
Although the WPS definition allows multiple supported formats for an output, which are later resolved to the one applied to the produced result of the job, CWL only considers the output format that directly indicates the applied schema. There is no concept of supported format in the CWL world. This is simply because CWL cannot predict nor reliably determine which output will be produced by a given application execution without running it, and therefore cannot expose consistent output specification before running the process. Because CWL must validate the full process integrity before it can be executed, only a single output format is permitted in its context (providing many will raise a validation error when parsing the CWL definition).
To ensure compatibility with multiple supported formats outputs of WPS, any output that has more than one format will have its format field dropped in the corresponding CWL definition. Without any format on the CWL side, the validation process will ignore this specification and will effectively accept any type of file. This will not break any execution operation with CWL, but it will remove the additional validation layer of the format (which especially deteriorates process resolution when chaining processes inside a CWL Workflow).
If the WPS output only specifies a single MIME-type, then the equivalent format (after being resolved to a valid ontology) will be preserved on the CWL side since the result is ensured to be the unique one provided. For this reason, processes with a specific single-format output are preferred whenever possible. This also removes ambiguity in the expected output format, which usually requires a toggle input specifying the desired type for processes providing a multi-format output. It is instead recommended to produce multiple processes with a fixed output format for each case.
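The rule for output formats can be expressed in a few lines. The sketch below is an assumption-laden illustration: `cwl_output_format` and its `to_cwl` converter argument are hypothetical, and the converter used in the example simply prefixes the Media-Type with `iana:` instead of performing a real ontology lookup.

```python
def cwl_output_format(wps_formats, to_cwl):
    """Return the CWL 'format' of an output, or None when it must be dropped."""
    media_types = [fmt.get("mediaType") or fmt.get("mimeType") for fmt in wps_formats]
    if len(media_types) == 1:
        # a unique format is unambiguous and can be validated by CWL
        return to_cwl(media_types[0])
    # several (or no) formats: CWL cannot know which one will be produced
    return None


def naive_converter(media_type):
    """Hypothetical stand-in for a Media-Type to ontology resolution."""
    return "iana:" + media_type


single = [{"mediaType": "application/json"}]
multi = [{"mediaType": "application/json"}, {"mimeType": "application/x-netcdf"}]

print(cwl_output_format(single, naive_converter))  # iana:application/json
print(cwl_output_format(multi, naive_converter))   # None
```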
Allowed Values
Allowed values in the context of WPS Literal data provide a means for the application developer to restrict inputs to a specific set of values. In CWL, the same can be achieved using an enum definition. Therefore, the following two variants are equivalent and completely interchangeable.
{
  "id": "input",
  "literalDataDomains": [
    {"allowedValues": ["value-1", "value-2"]}
  ]
}

{
  "id": "input",
  "type": {
    "type": "enum",
    "symbols": ["value-1", "value-2"]
  }
}
Weaver propagates such definitions bidirectionally in order to update the CWL or WPS correspondingly with the provided information in the other context if missing. The primitive type to apply to a missing WPS specification when resolving it from a CWL definition is automatically inferred with the best matching type from provided values in the enum list.
Note that enum definitions such as these will also be applied on top of Multiple and Optional Values definitions presented next.
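The inference of a primitive type from enum values could look like the sketch below. The precedence between types and the helper names (`infer_literal_type`, `cwl_enum_to_wps`) are assumptions chosen for illustration, not a description of Weaver's exact rules.

```python
def infer_literal_type(values):
    """Pick the narrowest primitive type able to represent all values."""
    if not values:
        return "string"
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    # bool is a subclass of int in Python, so it is excluded explicitly
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "float"
    return "string"


def cwl_enum_to_wps(cwl_input):
    """Build the WPS allowed values matching a CWL enum input."""
    symbols = cwl_input["type"]["symbols"]
    domain = {"dataType": {"name": infer_literal_type(symbols)}, "allowedValues": symbols}
    return {"id": cwl_input["id"], "literalDataDomains": [domain]}


cwl_input = {"id": "input", "type": {"type": "enum", "symbols": ["value-1", "value-2"]}}
print(cwl_enum_to_wps(cwl_input))
```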
Multiple and Optional Values
Inputs that take multiple values or references can be specified using minOccurs and maxOccurs in WPS context, while they are specified using the array type in CWL. While the same minOccurs parameter with a value of zero (0) can be employed to indicate an optional input, CWL requires the type to specify "null" or to use the shortcut ? character suffixed to the base type to indicate optional input. Resolution between WPS and CWL for the merging strategy implies all corresponding parameter combinations and checks in this case.
Warning
Make sure to specify "null" with quotes when working with JSON, YAML and CWL file formats and/or contents submitted to API requests or with the CLI. Using an unquoted null will result in a parsed None value which will not be detected as nullable CWL type.
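The difference between the quoted and unquoted forms is easy to observe with the standard JSON parser. In the sketch below, `is_nullable` is a hypothetical check written for this illustration; it only looks for the explicit string markers, which is why the unquoted form goes undetected.

```python
import json

quoted = json.loads('{"type": ["null", "string"]}')
unquoted = json.loads('{"type": [null, "string"]}')

print(quoted["type"])    # ['null', 'string']
print(unquoted["type"])  # [None, 'string']


def is_nullable(cwl_type):
    """Detect an optional CWL type from its explicit string markers only."""
    if isinstance(cwl_type, str):
        return cwl_type == "null" or cwl_type.endswith("?")
    return isinstance(cwl_type, list) and "null" in cwl_type


print(is_nullable(quoted["type"]))    # True
print(is_nullable(unquoted["type"]))  # False
print(is_nullable("string?"))         # True
```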
Because CWL does not take an explicit amount of maximum occurrences, information in this case is not necessarily completely interchangeable. In fact, WPS is slightly more verbose and easier to define in this case than CWL because all details are contained within the same two parameters. Because of this, it is often preferable to provide the minOccurs and maxOccurs in the WPS context, and let Weaver infer the array and/or "null" type requirements automatically. Also, because of all implied parameters in this situation to specify the similar details, it is important to avoid providing contradicting specifications as Weaver will have trouble guessing the intended result when merging specifications. If an unambiguous guess can be made, CWL will be employed as the overruling definition to resolve erroneous mismatches (as for any other corresponding fields).
Warning
Parameters minOccurs and maxOccurs are not permitted for outputs in the WPS context. Native WPS therefore does not permit multiple output reference files or data values under the same output ID. To see potential workarounds, refer to Multiple Outputs section.
Note
Although WPS multi-value inputs are defined as a single entity during deployment, special care must be taken regarding the format in which these values are specified during execution. Please refer to Multiple Inputs section of Execute request.
Following are a few examples of equivalent WPS and CWL definitions to represent multiple values under a given input. Some parts of the following definitions are purposely omitted to better highlight the concise details of multiple and optional information.
{
  "id": "input-multi-required",
  "format": "application/json",
  "minOccurs": 1,
  "maxOccurs": "unbounded"
}

{
  "id": "input-multi-required",
  "format": "iana:application/json",
  "type": {
    "type": "array", "items": "File"
  }
}
It can be noted from the examples that minOccurs and maxOccurs can be either an integer or a string representing one. This is to support backward compatibility of older WPS specification that always employed strings although representing numbers. Weaver understands and handles both cases. Also, maxOccurs can have the special string value "unbounded", in which case the input is considered to be allowed an unlimited amount of entries (although often capped by another implicit machine-level limitation such as memory capacity). In the case of CWL, an array is always considered as unbounded, therefore WPS is the only context that can limit this amount.
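A possible way to picture the conversion from WPS occurrences to a CWL type is sketched below. It is a simplification under stated assumptions: `parse_occurs` and `occurs_to_cwl_type` are hypothetical helpers, and the exact CWL structure Weaver generates for each combination may differ.

```python
def parse_occurs(value):
    """Accept integers, numeric strings and the special 'unbounded' value."""
    if value == "unbounded":
        return None  # None stands for "no upper limit"
    return int(value)


def occurs_to_cwl_type(base_type, min_occurs=1, max_occurs=1):
    """Derive a CWL type from WPS minOccurs/maxOccurs (simplified)."""
    min_count = parse_occurs(min_occurs)
    max_count = parse_occurs(max_occurs)
    cwl_type = base_type
    if max_count is None or max_count > 1:
        # CWL arrays are always unbounded: the upper limit is lost here
        cwl_type = {"type": "array", "items": base_type}
    if min_count == 0:
        cwl_type = ["null", cwl_type]  # optional input
    return cwl_type


print(occurs_to_cwl_type("File", 1, "unbounded"))  # {'type': 'array', 'items': 'File'}
print(occurs_to_cwl_type("string", "0", "1"))      # ['null', 'string']
```

Note that `maxOccurs` values of 3 and "unbounded" produce the same CWL array, which illustrates why only the WPS context can limit the amount of entries.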
Inputs/Outputs OpenAPI Schema
Added in version 4.16.
Basic Type Definitions
As an alternative to the parameters presented in previous sections, and employed for representing Multiple and Optional Values, Allowed Values specifications, supported File Format definitions and/or Inputs/Outputs Type identification, the OpenAPI specification can be employed to entirely define the I/O schema. More specifically, this is accomplished by providing an OAS-compliant structure under the schema field of each corresponding I/O. This capability allows each Process to be compliant with OGC API - Processes specification that requires this detail in the Process Description. The same kind of schema definitions can be used for the Deploy operation.
For example, the below representations are equivalent between WPS, OAS and CWL definitions. Obviously, corresponding definitions can become more or less complicated with multiple combinations of corresponding parameters presented later in this section. Some definitions are also not completely portable between contexts.
{
  "id": "input",
  "literalDataDomains": [
    {
      "allowedValues": [
        "value-1",
        "value-2"
      ]
    }
  ],
  "minOccurs": 2,
  "maxOccurs": 4
}

{
  "id": "input",
  "schema": {
    "type": "array",
    "items": {
      "type": "string",
      "enum": [
        "value-1",
        "value-2"
      ]
    },
    "minItems": 2,
    "maxItems": 4
  }
}

{
  "id": "input",
  "type": {
    "type": "array",
    "items": {
      "type": "enum",
      "symbols": [
        "value-1",
        "value-2"
      ]
    }
  }
}
See also
An example with extensive variations of supported I/O definitions with OAS is available in tests/functional/application-packages/EchoProcess/describe.yml. This is also the corresponding example provided by OGC API - Processes standard to ensure Weaver complies with its specification.
As per all previous parameters in CWL and WPS contexts, details provided in OAS schema are complementary and Weaver will attempt to infer, combine and convert between the various representations as best as possible according to the level of details provided.
Furthermore, Weaver will extend (as needed) any provided schema during Process Deployment if it can identify that the specific OAS definition is inconsistent with other parameters. For example, if minOccurs/maxOccurs were provided to indicate that the I/O must have between 2 and 4 elements, but only a single OAS object was defined under schema, that OAS definition would be converted to the corresponding array, as single values are not permitted in this case. Similarly, if the range of items was instead [1-4], the OAS definition would be adjusted with the oneOf keyword, allowing both single value and array representation of those values, when submitted for Process Execution.
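The two adjustments described above can be sketched as a single function. This is only an approximation for illustration: `extend_schema` is a hypothetical helper that assumes integer occurrence values and ignores cases such as "unbounded".

```python
def extend_schema(schema, min_occurs, max_occurs):
    """Adjust a single-value OAS schema to match the occurrence range."""
    if schema.get("type") == "array" or max_occurs == 1:
        return schema  # already consistent, nothing to extend
    array = {
        "type": "array",
        "items": schema,
        "minItems": min_occurs,
        "maxItems": max_occurs,
    }
    if min_occurs <= 1:
        # a single value is also valid: accept either representation
        return {"oneOf": [schema, array]}
    return array


single = {"type": "string", "enum": ["value-1", "value-2"]}
print(extend_schema(single, 2, 4))  # array only
print(extend_schema(single, 1, 4))  # oneOf single value or array
```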
Below is a summary of fields that are equivalent or considered to identify similar specifications
(corresponding fields are aligned in the table).
Note that all OAS elements are always nested under the schema field of an I/O, with parameters located where appropriate as per OpenAPI specification. Other OAS fields are still permitted, but are not explicitly handled to search for corresponding definitions in WPS and CWL contexts.
Parameters in WPS Context |
Parameters in OAS Context |
Parameters in CWL Context |
---|---|---|
|
|
|
|
|
|
|
|
|
In order to be OGC-compliant, any previously deployed Process will automatically generate any missing schema specification for all I/O it employs when calling its Process Description. Similarly, a deployed Process that did not make use of the schema representation method to define its I/O will also generate the corresponding OAS definitions from other WPS and CWL contexts, provided those definitions offered sufficiently descriptive and valid I/O parameters for deployment.
JSON Types
Along with the above parameter combinations, the OAS context also accomplishes the auto-detection of common JSON structures to convert between raw-data string formatted as JSON, literal JSON object embedded in the body, and application/json file references toward the corresponding Complex WPS input or output.
When any of those three JSON definitions is detected, other equivalent representations will be added using a oneOf keyword if they were not already explicitly provided in schema. When analyzing and combining those definitions, any OAS $ref or contentSchema specifications will be used to resolve the corresponding type: object with the most explicit schema definition available. If this cannot be achieved, a generic object allowing any additionalProperties (i.e.: no JSON schema variation) will be used instead. External URIs pointing to an OAS schema formatted either as JSON or YAML are resolved and fetched inline as needed during I/O merging strategy to interpret specified references.
Following is a sample representation of equivalent JSON definition variants, which would be automatically expanded using the oneOf structure with other missing components if applicable.
{
  "id": "input",
  "schema": {
    "oneOf": [
      {
        "type": "string",
        "contentMediaType": "application/json",
        "contentSchema": "http://host.com/schema.json"
      },
      {
        "$ref": "http://host.com/schema.json"
      }
    ]
  }
}

{
  "id": "input",
  "schema": {
    "oneOf": [
      {
        "type": "string",
        "contentMediaType": "application/json"
      },
      {
        "type": "object",
        "additionalProperties": true
      }
    ]
  }
}
Special handling of well-known OAS type: object structures is also performed to convert them to more specific and appropriate WPS types intended for their purpose. For instance, a measurement value provided along with a Unit of Measure (UoM) is converted to a WPS Literal. An object containing bbox and crs fields with the correct schema is converted to WPS BoundingBox type. Except for these special cases, all other OAS type: object are otherwise converted to WPS Complex type, which in turn is communicated to the CWL application using a File I/O. Other non-JSON definitions are also converted using the same WPS Complex/CWL File, but their values cannot be submitted with literal JSON structures during Process Execution, only using raw-data (i.e.: encoded string) or a file reference.
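The special cases listed above can be illustrated with a rough classifier. Several points are assumptions made for this sketch: the `oas_object_to_wps_type` helper is hypothetical, the `measurement` and `uom` property names are guesses since the text does not name them, and only the presence of properties is checked rather than their full schema.

```python
def oas_object_to_wps_type(schema):
    """Classify an OAS 'object' schema into a WPS data type (simplified)."""
    if schema.get("type") != "object":
        raise ValueError("only OAS 'type: object' schemas are handled here")
    properties = set(schema.get("properties", {}))
    if {"bbox", "crs"} <= properties:
        return "BoundingBox"
    # assumed property names for a value with its Unit of Measure
    if {"measurement", "uom"} <= properties:
        return "Literal"
    return "Complex"  # any other object is handled as a file on the CWL side


bbox = {"type": "object", "properties": {"bbox": {"type": "array"}, "crs": {"type": "string"}}}
other = {"type": "object", "additionalProperties": True}

print(oas_object_to_wps_type(bbox))   # BoundingBox
print(oas_object_to_wps_type(other))  # Complex
```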
See also
File tests/functional/application-packages/EchoProcess/describe.yml provides example I/O definitions for mentioned special OAS interpretations and more advanced JSON schemas with nested OAS keywords.
File References
It is important to consider that all OAS schema that can be provided during a Deploy request or retrieved from a Process Description only define the expected value representations of the I/O data to be submitted for Execution request. In other words, an I/O typed as Complex that can be submitted using any of the supported File Reference Types to be forwarded to CWL SHOULD NOT add any URI-related definition in schema. It is implicit for every Process that an I/O of given supported Media-Types can be submitted by reference using a link pointing to contents of such types. This implicit file reference interpretation serves multiple purposes.
Using only expected value definition and leaving out the by-reference equivalent greatly simplifies the schema definitions since every single Complex I/O does not need to provide a very verbose schema containing a oneOf(file-ref,raw-data) representation to indicate that data can be submitted either by value or by reference.
Using a generic {"type": "string", "format": "uri"} OAS schema does not convey the Media-Types requirements as well as inferring them “link-to” {"type": "string", "contentMediaType": <format>}. It is therefore better to omit them entirely as they do not add any I/O descriptive value.
Because the above string-formatted uri are left out from definitions, it can instead be used explicitly in an I/O specification to indicate to Weaver that the Process uses a Literal URI string, that must not be fetched by Weaver, and must be passed down as plain string URI directly without modification or interpretation to the underlying CWL Application Package.
To summarize, strings with format: uri will NOT be considered as Complex I/O by Weaver. They will be seen as any other string Literal, but this allows a Process describing its I/O as an external URI reference. This can be useful for an application that itself handles the retrieval of the resource referred to by this URI. To represent supported formats of Complex file references, the schema should be represented using the following structures. If the contentMediaType happens to be JSON, then the explicit OAS object schema can be added as well, as presented in JSON Types section.
{
  "id": "input",
  "schema": {
    "type": "string",
    "contentMediaType": "image/png",
    "contentEncoding": "base64"
  }
}

{
  "id": "input",
  "schema": {
    "oneOf": [
      {
        "type": "string",
        "contentMediaType": "application/gml+xml"
      },
      {
        "type": "string",
        "contentMediaType": "application/kml+xml"
      }
    ]
  }
}
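The distinction between a Literal URI string and a Complex file reference can be summarized with the following sketch. Both helpers (`classify_string_schema` and `supported_media_types`) are hypothetical and only inspect the keywords discussed in this section.

```python
def classify_string_schema(schema):
    """Tell whether a string schema describes a Literal or a Complex I/O."""
    if schema.get("type") == "string" and "contentMediaType" in schema:
        return "Complex"  # contents of that Media-Type, by value or by reference
    return "Literal"      # includes {"type": "string", "format": "uri"}


def supported_media_types(schema):
    """List Media-Types of a schema, whether single or wrapped in oneOf."""
    variants = schema.get("oneOf", [schema])
    return [v["contentMediaType"] for v in variants if "contentMediaType" in v]


uri_schema = {"type": "string", "format": "uri"}
png_schema = {"type": "string", "contentMediaType": "image/png", "contentEncoding": "base64"}
xml_schema = {
    "oneOf": [
        {"type": "string", "contentMediaType": "application/gml+xml"},
        {"type": "string", "contentMediaType": "application/kml+xml"},
    ]
}

print(classify_string_schema(uri_schema))  # Literal
print(classify_string_schema(png_schema))  # Complex
print(supported_media_types(xml_schema))   # ['application/gml+xml', 'application/kml+xml']
```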
Metadata
Metadata fields are transferred between WPS (from Process description) and CWL (from Application Package) when a match is possible. Metadata fields supported by individual I/O definitions (notably descriptions) are also transferred.
Note
Because the schema (OAS) definitions are embedded within WPS I/O definitions, corresponding metadata fields ARE NOT transferred. This choice is made in order to keep schema succinct such that they only describe the structure of the expected data type and format, and to avoid too much metadata duplication for each I/O in the resulting Process description.
Below is a list of compatible elements.
Parameters in WPS Context |
Parameters in CWL Context |
---|---|
|
|
|
Supported fields [8] : |
|
|
|
|
|
|
Footnotes
When using these properties, it is expected that the CWL Application Package resolves the $schemas with a reference to the RDF Schema Definitions. Furthermore, the $namespaces is expected to resolve the prefix s to the http://schema.org definitions corresponding to the RDF schema, as shown below.
See CWL Examples - Metadata and Authorship for a complete example using those fields and their expected contents.
$schemas:
  - https://schema.org/version/latest/schemaorg-current-https.rdf
$namespaces:
  s: http://schema.org
Using Secret Parameters
Under some circumstances, input parameters to a Job must be hidden, whether to avoid leaking credentials required by the underlying application, or for using sensitive information that should not be easily accessible. In such cases, typical CWL string inputs should not be directly employed.
There are 2 strategies available to employ secrets when working with Weaver:
Using the Vault feature
Using cwltool:Secrets hint
Secrets using the File Vault
Using File Vault Inputs essentially consists of wrapping any sensitive data within an input of type File, which will be Uploaded to the Vault for Job execution. Once the file is accessed and staged by the relevant Job, its contents are automatically deleted from the Vault. This offers a secured single access endpoint only available to the client that uploaded the file, for a short period of time, which decides for which Process it should be submitted with the corresponding authentication token and Vault ID. Since the sensitive data is contained within a file, its contents are only available to the targeted Job for the selected Process, while logs will only display a temporary path.
However, the Vault approach has potential drawbacks.
It is a feature specific to Weaver, which will not be available nor easily interoperable when involving other OGC API - Processes servers.
It forces the CWL to be implemented using a File input. While this is not necessarily an issue in some cases, it becomes the responsibility of the Application Package developer to figure out how to propagate the contained data to the relevant piece of code if a plain string is needed. To do so, the developer must also avoid outputting any information to stdout. Otherwise, the data would be captured in Job logs, defeating the purpose of using the Vault.
Note
For more details about the Vault, refer to sections File Vault Inputs, Uploading File to the Vault, and the corresponding capabilities in cli_example_upload.
Secrets using the CWL Hints
An alternative approach is to use the CWL hints as follows:
{
  "cwlVersion": "v1.2",
  "inputs": {
    "<input-name>": {
      "type": "string"
    }
  },
  "hints": {
    "cwltool:Secrets": {
      "secrets": [
        "<input-name>"
      ]
    }
  },
  "$namespaces": {
    "cwltool": "http://commonwl.org/cwltool#"
  }
}
Using this definition either in a class: CommandLineTool (see CWL CommandLineTool) or a class: Workflow (see CWL Workflow) will instruct the underlying Job execution to mask all specified inputs (i.e.: <input-name> in the above example) in commands and logs.
Looking at Job logs, all sensitive inputs will be replaced by a representation similar to (secret-<UUID>). The original data identified by this masked definition will be substituted back only at the last possible moment, when the underlying operation accesses it to perform its processing.
A few notable considerations must be taken when using the cwltool:Secrets definition.
It is limited to string inputs. Any other literal data type and intermediate conversions would need to be handled explicitly by the Application Package maintainer.
The secrets definition can only be provided in the hints section of the CWL document, meaning that a remote server supporting CWL is not required to support this feature. If the Application Package is expected to be deployed remotely, it is up to the client to determine whether the remote server will perform the necessary actions to mask sensitive data. If unsupported, secrets could become visible in the Job logs as if they were submitted using typical string inputs.
The feature does not avoid any misuse of underlying commands that could expose the sensitive data due to manipulation errors or the use of operations that are redirected to stdout. For example, if the shell echo command is used within the CWL with an input listed in cwltool:Secrets, its value will still be displayed in plain text in the Job logs.
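The masking behaviour described in this section can be pictured with the following standalone sketch. It is not the cwltool implementation: `SecretStore` and `mask_inputs` are hypothetical names, and the code only mimics the idea of replacing a value by a (secret-<UUID>) placeholder and substituting it back at the last moment.

```python
import uuid


class SecretStore:
    """Keep secret values aside and hand out '(secret-<UUID>)' placeholders."""

    def __init__(self):
        self._values = {}

    def mask(self, value):
        placeholder = f"(secret-{uuid.uuid4()})"
        self._values[placeholder] = value
        return placeholder

    def reveal(self, text):
        """Substitute placeholders back, right before the value is needed."""
        for placeholder, value in self._values.items():
            text = text.replace(placeholder, value)
        return text


def mask_inputs(inputs, secret_names, store):
    """Return a copy of the inputs where listed secrets are masked."""
    return {
        name: store.mask(value) if name in secret_names else value
        for name, value in inputs.items()
    }


store = SecretStore()
masked = mask_inputs({"token": "abc123", "name": "job"}, ["token"], store)
command = f"tool --name {masked['name']} --token {masked['token']}"
print(command)                # safe to log: the token is a placeholder
print(store.reveal(command))  # actual command, never to be logged
```

As noted above, this kind of masking only protects what passes through the masking layer: a command that prints the revealed value to stdout would still expose it in the logs.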