.. include:: references.rst
.. _application-package:

.. shortcuts for visualization

.. |br| raw:: html
.. |na| replace:: *n/a*
.. |nbsp| unicode:: 0xA0
    :trim:
.. |<=>| unicode:: 0x21D4

*************************
Application Package
*************************

.. contents::
    :local:
    :depth: 3

The :term:`Application Package` defines the internal script definition and configuration that will be executed by
a :term:`Process`. This package is based on |CWL|_ (:term:`CWL`). Using the extensive |cwl-spec|_ as backbone for
internal execution of the process allows it to run multiple types of applications, whether they are referenced by
:term:`Docker` image, scripts (`bash`, `python`, etc.), some remote :term:`Process` and more.

.. note::
    The large community and use cases covered by :term:`CWL` make it extremely versatile. If you encounter any
    issue running your :term:`Application Package` in `Weaver` (such as file permissions for example), chances are
    that there exists a workaround somewhere in the |cwl-spec|_. Most typical problems are usually handled by some
    flag or argument in the :term:`CWL` definition, so this reference should be explored first. Please also refer
    to the :ref:`FAQ` section as well as existing |Weaver-issues|_. Ultimately, if no solution can be found, open a
    new issue about your specific problem.

All processes deployed locally into `Weaver` using a :term:`CWL` package definition will have their full package
definition available with a |pkg-req|_ request.

.. note::
    The package request is a `Weaver`-specific implementation, and therefore, is not necessarily available on other
    :term:`ADES`/:term:`EMS` implementations, as this feature is not part of the |ogc-api-proc|_ specification.

.. _app_pkg_types:

Typical CWL Package Definition
===========================================

.. _app_pkg_cmd:

CWL CommandLineTool
------------------------

The following :term:`CWL` package definition represents the :py:mod:`weaver.processes.builtin.jsonarray2netcdf`
process.

.. literalinclude:: ../../weaver/processes/builtin/jsonarray2netcdf.cwl
    :language: YAML
    :linenos:

The first main component is the ``class: CommandLineTool`` that tells `Weaver` it will be an *atomic* process
(contrary to the :ref:`app_pkg_workflow` presented later).

The other important sections are ``inputs`` and ``outputs``. These define which parameters will be expected and
produced by the described application. `Weaver` supports most formats and types as specified by |cwl-spec|_.
See :ref:`cwl-io-types` for more details.

.. _app_pkg_script:

Script Application
~~~~~~~~~~~~~~~~~~~~~~~~

When deploying a ``CommandLineTool`` that only needs to execute script or shell commands, it is recommended
to define an appropriate |cwl-docker-req|_ to containerize the :term:`Process`, even though no *advanced* operation
is needed. The reason is that `Weaver` otherwise has no reliable way to know how to provide all the dependencies
that this operation might need. In order to keep the processing environment and results of any :term:`Process`
separate from `Weaver` itself, the executions will either be automatically containerized (with some default image),
or blocked entirely when `Weaver` cannot resolve the appropriate execution environment. Therefore, it is recommended
that the :term:`Application Package` provider defines a specific image to avoid unexpected failures if this
auto-resolution changes across versions.

Below are minimalistic :term:`Application Package` samples that make use of a shell command and a custom Python
script for quickly running some operations, without actually needing to package any specialized :term:`Docker`
image.

The first example simply outputs the contents of a ``file`` input using the ``cat`` command.
Because the :term:`Docker` image ``debian:stretch-slim`` is specified, we can guarantee that the command will be
available within its containerized environment.
In this case, we also take advantage of the ``stdout.log`` which is always collected by `Weaver`
(along with the ``stderr``) in order to obtain traces produced by any :term:`Application Package` when performing
:term:`Job` executions.

.. literalinclude:: ../examples/docker-shell-script-cat.cwl
    :language: yaml
    :caption: Sample CWL definition of a shell script

The second example takes advantage of the |cwl-workdir-req|_ to generate a Python script dynamically
(i.e.: ``script.py``), prior to executing it to process the received inputs and produce the output file.
Because a Python runner is required, the |cwl-docker-req|_ specification defines a basic :term:`Docker` image that
meets our needs. Note that in this case, special interpretation of ``$(...)`` entries within the definition can be
provided to tell :term:`CWL` how to map :term:`Job` input values to the dynamically created script.

.. literalinclude:: ../examples/docker-python-script-report.cwl
    :language: yaml
    :caption: Sample CWL definition of a Python script
    :name: example_app_pkg_script

.. seealso::
    See the :ref:`app_pkg_python` section for more utilities to help create an :term:`Application Package` from
    Python.

.. seealso::
    For other programming languages, see |cwl-dev-tools|_ for a list of related utilities that help working with
    :term:`CWL`, some of which offer conversion capabilities.

.. _app_pkg_python:

Python CLI Applications
~~~~~~~~~~~~~~~~~~~~~~~~

When the :term:`Application Package` to be generated consists of a Python script, which happens to make use of the
builtin |python-argparse|_ package, it is possible to employ the |argparse2tool|_ utility, which will automatically
generate a corresponding :term:`CWL` definition using the specified :term:`CLI` arguments and their types.
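
As an illustration, the following minimal sketch shows the kind of |python-argparse|_-based script that such a
conversion could start from. The function names, the arguments and the report format are hypothetical; they only
serve to show typed :term:`CLI` arguments that have an obvious :term:`CWL` counterpart. The sketch does not
demonstrate the interface of the conversion utility itself.

.. code-block:: python
    :caption: Hypothetical ``argparse`` script with typed arguments

    import argparse
    import json


    def make_parser():
        """Build the CLI parser; each typed argument is a candidate CWL input."""
        parser = argparse.ArgumentParser(description="Summarize numeric values.")
        # Required string option, comparable to a CWL 'string' input.
        parser.add_argument("--name", type=str, required=True, help="Report name.")
        # Repeatable float option, comparable to a CWL 'float[]' input.
        parser.add_argument("--value", type=float, action="append", default=[],
                            help="Value to include (can be repeated).")
        # Optional integer with a default, comparable to a CWL 'int?' input.
        parser.add_argument("--precision", type=int, default=2, help="Rounding digits.")
        return parser


    def build_report(args):
        """Return the report as a JSON string (hypothetical output format)."""
        total = round(sum(args.value), args.precision)
        return json.dumps({"name": args.name, "count": len(args.value), "total": total})


    def main(argv=None):
        """Entry point: parse the arguments and print the report."""
        print(build_report(make_parser().parse_args(argv)))

Calling ``main(["--name", "demo", "--value", "1.5"])`` prints the report for a single value. Explicit ``type``
and ``required`` settings, as used here, give a converter the most information to work with.
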
The |argparse2tool|_ utility can help quickly generate a valid :term:`CWL` definition, but it is the responsibility
of the user to validate that converted arguments have the appropriate types, or any additional metadata required to
properly describe the intended :term:`Process`. Notably, users might find the need to add appropriate ``format``
definitions to the :term:`I/O`, since those will generally be missing descriptive :term:`Media-Types`.

.. note::
    Although |argparse2tool|_ can help in the initial :term:`CWL` generation procedure, it is recommended to apply
    additional containerization best-practices, such as described in :ref:`app_pkg_script`, to increase chances to
    obtain a replicable and reusable :term:`Application Package` definition.

.. seealso::
    For pure Python scripts not using |python-argparse|_, the |scriptcwl|_ utility can be considered instead.

.. seealso::
    For Python code embedded in a |jupyter-notebook|_, refer to :ref:`app_pkg_jupyter_notebook` for more details.

.. |python-argparse| replace:: ``argparse``
.. _python-argparse: https://docs.python.org/3/library/argparse.html
.. |argparse2tool| replace:: ``argparse2tool``
.. _argparse2tool: https://github.com/hexylena/argparse2tool
.. |scriptcwl| replace:: ``scriptcwl``
.. _scriptcwl: https://github.com/NLeSC/scriptcwl

.. _app_pkg_jupyter_notebook:

Jupyter Notebook Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When working on experimental or research applications, a |jupyter-notebook|_ is a popular development environment,
due to its convenient interface for displaying results, interacting with visualization tools, or the larger plugin
ecosystem that it can offer. However, a |jupyter-notebook|_ is typically insufficient by itself to describe a
complete application. To help developers transition from a |jupyter-notebook|_ to :ref:`app_pkg_docker`, which
ensures the :term:`Application Package` can be deployed and reused, the |jupyter-ipython2cwl|_ utility can be
employed.
Using |jupyter-repo2cwl| (after installing |jupyter-ipython2cwl|_ in the Python environment), it is possible to
directly convert a Git repository reference containing a |jupyter-notebook|_ into deployable :term:`CWL` leveraging
a :term:`Docker` container. To do this, the utility uses two strategies under the hood:

1. |jupyterhub-repo2docker|_ is employed to convert a Git repository into a :term:`Docker` container, with any
   applicable package requirements, project metadata, and advanced configuration details.
2. Python typing annotations provided by |jupyter-ipython2cwl|_ define the :term:`CWL` :term:`I/O` from variables
   and results located within the |jupyter-notebook|_.

.. note::
    Because |jupyterhub-repo2docker|_ is employed, which is highly adaptable to many use cases, all typical
    Python project `Configuration Files `_, such as ``requirements.txt``, ``environment.yml``, ``setup.py``,
    ``pyproject.toml``, etc. can be employed. The :term:`Docker` container dependencies can be provided with an
    explicit ``Dockerfile`` as well. Please refer to the official documentation for all advanced configuration
    options.

Because Python type annotations are employed with |jupyter-repo2cwl| to indicate which variables will contain
the :term:`CWL` :term:`I/O` references, it is actually possible to annotate a |jupyter-notebook|_
*without any additional package dependencies*. To do so, one only needs to employ *string annotations* as follows.

.. literalinclude:: ../examples/jupyter_repo2cwl_python.py
    :language: python
    :caption: Sample CWL annotations of Python code in Jupyter Notebook
    :name: example_app_pkg_jupyter_repo2cwl_python

.. seealso::
    See `IPython2CWL Supported Types `_ for more details about the mapping from a Python annotation to the
    resulting :term:`CWL` :ref:`cwl-io-types`.

When the above code is saved in a |jupyter-notebook|_ and committed to a Git repository, the
|jupyterhub-repo2docker|_ utility can automatically clone the repository, parse the Python code, extract the
:term:`CWL` annotations, and generate the :term:`Application Package` with a :term:`Docker` container containing all
of their respective definitions. All of this is accomplished with a single call to obtain a deployable :term:`CWL`
in `Weaver`, which can then take over from the :ref:`Process Deployment ` to obtain
an :term:`OGC API - Processes` definition.

Jupyter Notebook to CWL Example: NCML to STAC Application
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For a more concrete example of a |jupyter-notebook|_ conversion to :term:`CWL`, see the |ncml2stac-repo|_ GitHub
repository, which contains a sample |ncml2stac-notebook|_. This script, as indicated by its name, converts
*NCML XML metadata with CMIP6 attributes* into the corresponding *SpatioTemporal Asset Catalog* (STAC) definition
and extensions. It uses the same |jupyter-ipython2cwl|_ type annotation strategy as presented :ref:`above ` to
indicate which NCML ``File`` variable is to be employed as the :term:`CWL` input reference, and the expected
STAC ``File`` as output to be collected by :term:`CWL`.

Using |jupyter-repo2cwl| and the :ref:`Weaver CLI ` in combination, as shown below, it is possible to automatically
convert the |jupyter-notebook|_ script into a Dockerized :term:`CWL` and deploy it to an
:term:`OGC API - Processes` server supporting :term:`Application Package` such as `Weaver`.

.. code-block:: shell
    :caption: |jupyter-notebook|_ conversion to :term:`CWL` and deployment as :term:`OGC API - Processes`

    jupyter-repo2cwl "https://github.com/crim-ca/ncml2stac" -o /tmp
    weaver deploy -u http://example.com/weaver -i ncml2stac --cwl /tmp/notebooks_ncml2stac.cwl

.. seealso::
    - Refer to the |ncml2stac-repo|_ repository's README for more details about the utilities.
    - Refer to the |ncml2stac-notebook|_ for the implementation of the :term:`Application Package` script.

.. |jupyter-notebook| replace:: Jupyter Notebook
.. _jupyter-notebook: https://jupyter.org/
.. |jupyterhub-repo2docker| replace:: ``jupyterhub/repo2docker``
.. _jupyterhub-repo2docker: https://github.com/jupyterhub/repo2docker
.. |jupyter-repo2cwl| replace:: ``jupyter repo2cwl``
.. |jupyter-ipython2cwl| replace:: ``IPython2CWL``
.. _jupyter-ipython2cwl: https://github.com/common-workflow-lab/ipython2cwl
.. |ncml2stac-repo| replace:: ``crim-ca/ncml2stac``
.. _ncml2stac-repo: https://github.com/crim-ca/ncml2stac/tree/main#ncml-to-stac
.. |ncml2stac-notebook| replace:: NCML to STAC Jupyter Notebook
.. _ncml2stac-notebook: https://github.com/crim-ca/ncml2stac/blob/main/notebooks/ncml2stac.ipynb

.. _app_pkg_docker:

Dockerized Applications
~~~~~~~~~~~~~~~~~~~~~~~~

When advanced processing capabilities and more complicated environment preparation are required, it is recommended
to package and push pre-built :term:`Docker` images to a remote registry. In this situation, just like for
:ref:`app_pkg_script` examples, the |cwl-docker-req|_ is needed. The definitions would also be essentially the same
as previous examples, but with more complicated operations and possibly a larger number of inputs or outputs.

Whenever a :term:`Docker` image reference is detected, `Weaver` will ensure that the application will be pulled
using :term:`CWL` capabilities in order to run it.

.. literalinclude:: ../../weaver/wps_restapi/examples/deploy_process_yaml.cwl
    :caption: Sample CWL definition of a Dockerized Application
    :language: yaml

Because :term:`Application Package` providers could desire to make use of :term:`Docker` images hosted on private
registries, `Weaver` offers the capability to specify an authorization token through HTTP request headers during
the :term:`Process` deployment. More specifically, the following definition can be provided during a
:ref:`Deploy ` request.

.. code-block:: http

    POST /processes HTTP/1.1
    Host: weaver.example.com
    Content-Type: application/json;charset=UTF-8
    X-Auth-Docker: Basic

    {
      "processDescription": { },
      "executionUnit": { }
    }

The ``X-Auth-Docker`` header should be defined exactly like any typical ``Authorization`` header (|auth-schemes|_).
The name ``X-Auth-Docker`` is inspired from existing implementations that employ ``X-Auth-Token`` in a similar
fashion. The reason why ``Authorization`` and ``X-Auth-Token`` headers are not themselves employed in this case is
to ensure that they do not interfere with any proxy or server authentication mechanism, which `Weaver` could be
located behind.

For the moment, only ``Basic`` (:rfc:`7617`) authentication is supported.
To generate the base64 token, the following methods can be used:

.. code-block:: shell
    :caption: Command Line

    echo -n ":" | base64

.. code-block:: python
    :caption: Python

    import base64
    base64.b64encode(b":")

When the HTTP ``X-Auth-Docker`` header is detected in combination with a |cwl-docker-req|_ entry within the
:term:`Application Package` of the :term:`Process` being deployed, `Weaver` will parse the targeted :term:`Docker`
registry defined in ``dockerPull`` and will attempt to identify it for later authentication towards it with the
provided token. Given a successful authentication, `Weaver` should then be able to pull the :term:`Docker` image
whenever required for launching new :term:`Job` executions.

.. note::
    `Weaver` only attempts to authenticate itself temporarily at the moment when the :term:`Job` is submitted to
    retrieve the :term:`Docker` image, and only if the image is not already available locally. Because of this, the
    provided authentication token should have a sufficient lifetime to run the :term:`Job` at later times,
    considering any retention time of cached :term:`Docker` images on the server.
    If the cache is cleaned, and the :term:`Docker` image is made unavailable, `Weaver` will attempt to authenticate
    itself again when receiving the new :term:`Job`. It is left up to the developer and :term:`Application Package`
    provider to manage expired tokens in `Weaver` according to their needs. To resolve such cases, the
    :ref:`Replace Process ` request or an entire re-deployment of the :term:`Process`
    (:ref:`Undeploy ` and :ref:`Deploy `) including the new ``X-Auth-Docker`` header can be employed to update the
    token used for authentication.

.. versionadded:: 4.5
    Specification and handling of the ``X-Auth-Docker`` header for providing an authentication token.

.. _app_pkg_resources:

GPU and Resource-dependent Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When an :term:`Application Package` requires GPU or any other minimal set of hardware capabilities, such as in the
case of machine learning or high-performance computing tasks, the submitted :term:`CWL` must explicitly indicate
those requirements to ensure they can be met for performing its execution. Similarly, an
:term:`Application Package` that must obtain external access to remote contents must not assume that the connection
would be available, and must therefore request network access. Below are examples where such requirements are
demonstrated and how to define them.

.. literalinclude:: ../examples/requirement-cuda.cwl
    :language: yaml
    :caption: Sample CWL definition with CUDA capabilities
    :name: example_app_pkg_cuda

.. literalinclude:: ../examples/requirement-resources.cwl
    :language: yaml
    :caption: Sample CWL definition with computing resources
    :name: example_app_pkg_resources

.. literalinclude:: ../examples/requirement-network.cwl
    :language: yaml
    :caption: Sample CWL definition with network access
    :name: example_app_pkg_network

The above requirements can be combined in any fashion as needed. They can also be combined with any other
requirements employed to define the core components of the application.
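
As a rough illustration of such a combination, the sketch below assembles a :term:`CWL` document as a Python
dictionary. The ``combine_requirements`` helper, the image name and all values are hypothetical. The field names
follow the :term:`CWL` specification and the ``cwltool`` CUDA extension as understood here, and should be checked
against the referenced sample definitions before use.

.. code-block:: python
    :caption: Hypothetical combination of multiple CWL requirements

    import json


    def combine_requirements(*requirement_maps):
        """Merge several CWL requirement mappings into one, rejecting duplicates."""
        combined = {}
        for requirements in requirement_maps:
            for name, definition in requirements.items():
                if name in combined:
                    raise ValueError(f"Requirement defined more than once: {name}")
                combined[name] = definition
        return combined


    # All values below are illustrative assumptions, not recommended settings.
    docker = {"DockerRequirement": {"dockerPull": "example/image:latest"}}
    cuda = {"cwltool:CUDARequirement": {
        "cudaVersionMin": "11.4",
        "cudaComputeCapability": "3.0",
        "cudaDeviceCountMin": 1,
    }}
    resources = {"ResourceRequirement": {"coresMin": 2, "ramMin": 4096}}
    network = {"NetworkAccess": {"networkAccess": True}}

    package = {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "$namespaces": {"cwltool": "http://commonwl.org/cwltool#"},
        "baseCommand": "echo",
        "requirements": combine_requirements(docker, cuda, resources, network),
        "inputs": {},
        "outputs": {},
    }
    print(json.dumps(package, indent=2))
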
Whenever possible, requirements should be provided with values that best match the minimum and maximum amount of
resources that the :term:`Application Package` operation requires. More precisely, over-requesting resources should
be avoided, as this could lead to failing :term:`Job` execution if the server or worker node processing it deems
that it cannot fulfill the requirements, whether because they are too broad to obtain a proper resource allocation,
because it has insufficient computing resources, or simply for rate-limiting/fair-share reasons.

Although definitions such as |cwl-resource-req|_ and |cwl-cuda-req|_ are usually applied for atomic operations, they
can also become relevant in the context of :ref:`app_pkg_workflow` execution. Effectively, providing the required
hardware capabilities for each atomic application can allow the :term:`Workflow` engine to better schedule
:term:`Job` steps. For example, if two computationally heavy steps have no restriction for parallelization based on
the :term:`Workflow` steps definition alone, but running both of them simultaneously on the same machine would
necessarily cause an ``OutOfMemory`` error due to insufficient resources, those requirements could
preemptively let the engine know to wait until *reserved* resources become available. As a result, execution of the
second task could be delayed until the first task is completed, therefore avoiding the error.

.. versionadded:: 4.17
    Support of |cwl-resource-req|_.

.. versionadded:: 4.27
    Support of |cwl-network-req|_ and |cwl-cuda-req|_.

.. versionchanged:: Deprecated ``DockerGpuRequirement``.

.. warning::
    Any :term:`Application Package` that was making use of ``DockerGpuRequirement`` should be updated to employ the
    official |cwl-docker-req|_ in combination with |cwl-cuda-req|_.
    For backward compatibility, any detected ``DockerGpuRequirement`` definition will be updated automatically with
    a minimalistic |cwl-cuda-req|_ definition using a very lax set of CUDA capabilities. It is recommended to
    provide specific configurations for your needs.

.. _app_pkg_remote:
.. _app_pkg_wps1:
.. _app_pkg_ogc_api:
.. _app_pkg_esgf_cwt:

Remote Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To define an application that refers to a :ref:`proc_remote_provider`, an :ref:`proc_wps_12`,
an :ref:`proc_ogc_api` or an :ref:`proc_esgf_cwt` endpoint, the corresponding `Weaver`-specific :term:`CWL`-like
requirements must be employed to indicate the URL where that remote resource is accessible. Once deployed, the
contained :term:`CWL` package and the resulting :term:`Process` will be exposed as a :ref:`proc_ogc_api` resource.

Upon reception of a :ref:`Process Execution ` request, `Weaver` will take care of resolving the indicated process
URL from the :term:`CWL` requirement and will dispatch the execution to the resource after applying any relevant
I/O, parameter and Media-Type conversion to align with the target server standard for submitting the :term:`Job`
requests.

Below are examples of the corresponding :term:`CWL` requirements employed for each type of remote application.

.. code-block:: yaml
    :caption: WPS-1/2 Package Definition

    cwlVersion: "v1.0"
    class: CommandLineTool
    hints:
      WPS1Requirement:
        provider: "https://example.com/ows/wps/catalog"
        process: "getpoint"

.. code-block:: yaml
    :caption: OGC API Package Definition

    cwlVersion: "v1.0"
    class: CommandLineTool
    hints:
      OGCAPIRequirement:
        process: "https://example.com/ogcapi/processes/getpoint"

.. code-block:: json
    :caption: ESGF-CWT Package Definition

    {
      "cwlVersion": "v1.0",
      "class": "CommandLineTool",
      "hints": {
        "ESGF-CWTRequirement": {
          "provider": "https://edas.nccs.nasa.gov/wps/cwt",
          "process": "xarray.subset"
        }
      }
    }

.. seealso::
    - :ref:`proc_remote_provider`
    - :ref:`proc_wps_12`
    - :ref:`proc_ogc_api`
    - :ref:`proc_esgf_cwt`

.. _app_pkg_workflow:

CWL Workflow
------------------------

`Weaver` also supports :term:`CWL` ``class: Workflow``. When an :term:`Application Package` is defined this way, the
:ref:`Process Deployment ` operation will attempt to resolve each ``step`` as another :term:`Process`. The reference
to the :term:`CWL` definition can be placed in any location supported as for the case of atomic processes
(see details about :ref:`supported package locations `).

The following :term:`CWL` definition demonstrates an example ``Workflow`` process that would resolve each ``step``
with local processes of matching IDs.

.. literalinclude:: ../../tests/functional/application-packages/WorkflowSubsetIceDays/package.cwl
    :language: JSON
    :linenos:

For instance, the ``jsonarray2netcdf`` (:ref:`Builtin`) middle step in this example corresponds to the
`CWL CommandLineTool`_ process presented in the previous section. Other processes referenced in this ``Workflow``
can be found in |test-res|_.

Step process names are resolved using the variations presented below. Particular care also needs to be given to
the input and output definitions between each step.

.. |test-res| replace:: Weaver Test Resources
.. _test-res: https://github.com/crim-ca/weaver/tree/master/tests/functional/application-packages

Step Reference
~~~~~~~~~~~~~~~~~

In order to resolve referenced processes as steps, `Weaver` supports 3 formats.

1. | Process ID explicitly given.
   | Any *visible* process from |getcap-req|_ response should be resolved this way.
   | (e.g.: ``jsonarray2netcdf`` resolves to pre-deployed :py:mod:`weaver.processes.builtin.jsonarray2netcdf`).
2. Full URL to the process description endpoint, provided that it also offers a |pkg-req|_ endpoint
   (`Weaver`-specific).
3. Full URL to the explicit `CWL` file (usually corresponding to (2) or the ``href`` provided in deployment body).

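
The three formats listed above can be told apart with a simple heuristic, sketched below. This is not how `Weaver`
implements the resolution; the function name, the returned labels and the set of file extensions are assumptions
made for illustration only, and no validation of the referenced resource is attempted.

.. code-block:: python
    :caption: Hypothetical classification of a step reference

    import posixpath
    from urllib.parse import urlparse

    # Assumption: illustrative extensions only; the supported values are defined by the server.
    ASSUMED_PACKAGE_EXTENSIONS = {".cwl", ".json", ".yaml", ".yml"}


    def classify_step_reference(reference):
        """Classify a workflow step reference into one of the three listed formats."""
        parsed = urlparse(reference)
        if parsed.scheme not in ("http", "https"):
            # (1) No HTTP(S) scheme: treat the value as a process ID.
            return "process-id"
        extension = posixpath.splitext(parsed.path)[1].lower()
        if extension in ASSUMED_PACKAGE_EXTENSIONS:
            # (3) URL pointing at an explicit CWL file.
            return "cwl-file-url"
        # (2) Any other URL: assume a process description endpoint.
        return "process-url"
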
When a URL to the :term:`CWL` process "file" is provided with an extension, it must be one of the supported values
defined in :py:data:`weaver.processes.wps_package.PACKAGE_EXTENSIONS`. Otherwise, `Weaver` will refuse it as it
cannot figure out how to parse it.

Because `Weaver` and the underlying `CWL` executor need to resolve all steps to validate that their input and output
definitions correspond (id, format, type, etc.) before chaining them, all intermediate processes **MUST** be
available. This means that you cannot :ref:`Deploy ` nor :ref:`Execute ` a ``Workflow``-flavored
:term:`Application Package` until all referenced steps have themselves been deployed and made visible.

.. warning::
    Because `Weaver` needs to convert given :term:`CWL` documents into equivalent :term:`WPS` process definitions,
    embedded :term:`CWL` processes within a ``Workflow`` step are not supported currently. This is a known
    limitation of the implementation, but not much can be done against it without major modifications to the code
    base. See also issue `#56 `_.

.. seealso::
    - :py:func:`weaver.processes.wps_package.get_package_workflow_steps`
    - :ref:`Deploy ` request details.

Step Inputs/Outputs
~~~~~~~~~~~~~~~~~~~~~

Inputs and outputs of connected steps are required to match types and formats in order for the workflow to be valid.
This means that a process that produces an output of type ``String`` cannot be directly chained to a process that
takes as input a ``File``, even if the ``String`` of the first process represents a URL that could be resolved to a
valid file reference. In order to chain two such processes, an intermediate operation would need to be defined to
explicitly convert the ``String`` input to the corresponding ``File`` output. This is usually accomplished using
:ref:`Builtin` processes, such as in the previous example.
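
The matching rule can be pictured with the following minimal sketch. The function name and the simplified
dictionary-based :term:`I/O` definitions are hypothetical, and the strict equality check is only an approximation
of the validation performed by `Weaver` and the :term:`CWL` executor.

.. code-block:: python
    :caption: Hypothetical check of a link between two workflow steps

    def find_link_mismatches(output_def, input_def):
        """List the fields that differ between a step output and the next step input."""
        mismatches = []
        for field in ("type", "format"):
            produced, expected = output_def.get(field), input_def.get(field)
            if produced != expected:
                mismatches.append(f"{field}: {produced!r} != {expected!r}")
        return mismatches


    # A string output cannot feed a File input, even if the string is a URL.
    url_output = {"type": "string"}
    file_input = {"type": "File", "format": "application/x-netcdf"}
    print(find_link_mismatches(url_output, file_input))

    # An intermediate conversion step producing a matching File resolves the link.
    converted_output = {"type": "File", "format": "application/x-netcdf"}
    print(find_link_mismatches(converted_output, file_input))
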
Since formats must also match (e.g.: a process producing ``application/json`` cannot be mapped to one expecting
``application/x-netcdf``), all mismatching formats must also be converted with an intermediate step if such
operation is desired. This ensures that workflow definitions are always explicit and that as little interpretation,
variation or assumption as possible occurs between each execution. Because of this, all applications generated by
`Weaver` will attempt to preserve and enforce matching input/output ``format`` definitions in both :term:`CWL` and
:term:`WPS` as long as it does not introduce ambiguous results (see :ref:`File Format` for more details).

.. _cwl-wps-mapping:

Correspondence between CWL and WPS fields
===========================================

Because the :term:`CWL` definition and the :term:`WPS` process description inherently provide "duplicate"
information, many fields can be mapped between one another. In order to handle any provided metadata in the various
locations supported by both specifications, as well as to extend details of deployed processes, each
:term:`Application Package` gets its details merged with the complementary :term:`WPS` description.

In some cases, complementary details are only documentation-related, but some information directly affects the
format or execution behaviour of some parameters. A common example is the ``maxOccurs`` field provided by
:term:`WPS` that does not have an exactly corresponding specification in :term:`CWL` (any-sized array). On the other
hand, :term:`CWL` also provides data preparation steps such as initial staging
(i.e.: ``InitialWorkDirRequirement``) that do not have an equivalent under the :term:`WPS` process description.
For this reason, complementary details are merged and reflected on both sides (as applicable), when non-ambiguous
resolution is possible.
In case of conflicting metadata, the :term:`CWL` specification will most of the time prevail over the :term:`WPS`
metadata fields simply because it is expected that a strict `CWL` specification is provided upon deployment.
The only exceptions to this situation are when the :term:`WPS` specification helps resolve some ambiguity or when
:term:`WPS` enforces the parametrisation of some elements, such as with the ``maxOccurs`` field.

.. note::
    The metadata merge operation between :term:`CWL` and :term:`WPS` is accomplished on a *per-mapped-field* basis.
    In other words, more explicit details such as ``maxOccurs`` could be obtained from :term:`WPS` and
    **simultaneously** the same input's ``format`` could be obtained from the :term:`CWL` side. The merge occurs
    bidirectionally for corresponding information.

The merging strategy of process specifications also implies that some details can be omitted from one context if
they can be inferred from corresponding elements in the other. For example, the :term:`CWL` and :term:`WPS`
contexts both define ``keywords`` (with minor naming variation) as a list of strings. Specifying this metadata in
both locations is redundant and only makes the process description longer. Therefore, the user is allowed to provide
only one of the two and `Weaver` will take care to propagate the information to the lacking location.

In order to help understand the resolution methodology between the contexts, the following sub-sections cover the
supported mapping between the two specifications, and more specifically, how each field impacts the mapped
equivalent metadata.

.. warning::
    Merging of corresponding fields between :term:`CWL` and :term:`WPS` is a `Weaver`-specific implementation.
    The same behaviour is not necessarily supported by other implementations.
    For this reason, any converted information between the two contexts will be transferred to the other context if
    missing in order for both specifications to reflect similar details as closely as possible, whichever context
    the metadata originated from.

Inputs/Outputs ID
-----------------------

Inputs and outputs (:term:`I/O`) ``id`` from the :term:`CWL` context will be respectively matched against the
corresponding ``id`` or ``identifier`` field from :term:`I/O` of the :term:`WPS` context. In the :term:`CWL`
definition, all of the allowed :term:`I/O` structures are supported, whether they are specified using an array list
with explicit definitions, using the "shortcut" variant (i.e.: ``[]``), or using key-value pairs
(see |cwl-io-map|_ for more details). Regardless of array or mapping format, :term:`CWL` requires that all
:term:`I/O` have unique ``id``. On the :term:`WPS` side, either a mapping or list of :term:`I/O` are also expected
with unique ``id``.

.. versionchanged:: 4.0
    Previous versions only supported :term:`WPS` :term:`I/O` using the listing format. Both can be used
    interchangeably in both :term:`CWL` and :term:`WPS` contexts as of this version.

To summarize, the following :term:`CWL` and :term:`WPS` :term:`I/O` definitions are all equivalent and will result
in the same process definition after deployment. For simplification purposes, the examples below omit all but
mandatory fields (only of the ``inputs`` and ``outputs`` portion of the full deployment body) to produce the same
result. Other fields are discussed afterward in specific sections.

.. table::
    :class: table-code
    :align: center

    +-------------------------------------+-----------------------------------------+----------------------------------+
    | .. code-block:: json                | .. code-block:: json                    | .. code-block:: json             |
    |     :caption: CWL I/O objects array |     :caption: CWL I/O key-value mapping |     :caption: WPS I/O definition |
    |     :linenos:                       |     :linenos:                           |     :linenos:                    |
    |                                     |                                         |                                  |
    |     {                               |     {                                   |     {                            |
    |       "inputs": [                   |       "inputs": {                       |       "inputs": [                |
    |         {                           |         "single-str": {                 |         {                        |
    |           "id": "single-str",       |           "type": "string"              |           "id": "single-str"     |
    |           "type": "string"          |         },                              |         },                       |
    |         },                          |         "multi-file": {                 |         {                        |
    |         {                           |           "type": "File[]"              |           "id": "multi-file",    |
    |           "id": "multi-file",       |         }                               |           "formats": []          |
    |           "type": "File[]"          |       },                                |         }                        |
    |         }                           |       "outputs": {                      |       ],                         |
    |       ],                            |         "output-1": {                   |       "outputs": [               |
    |       "outputs": [                  |           "type": "File"                |         {                        |
    |         {                           |         },                              |           "id": "output-1",      |
    |           "id": "output-1",         |         "output-2": {                   |           "formats": []          |
    |           "type": "File"            |           "type": "File"                |         },                       |
    |         },                          |         }                               |         {                        |
    |         {                           |       }                                 |           "id": "output-2",      |
    |           "id": "output-2",         |     }                                   |           "formats": []          |
    |           "type": "File"            |                                         |         }                        |
    |         }                           |                                         |       ]                          |
    |       ]                             |                                         |     }                            |
    |     }                               |                                         |                                  |
    +-------------------------------------+-----------------------------------------+----------------------------------+

The :term:`WPS` example above requires a ``format`` field for the corresponding :term:`CWL` ``File`` type in order
to distinguish it from a plain string. More details are available in :ref:`cwl-io-types` below about this
requirement.

Finally, it is to be noted that the above :term:`CWL` and :term:`WPS` definitions can be specified in the
:ref:`Deploy ` request body with any of the following variations:

1. Both are simultaneously fully specified (valid although extremely verbose).
2. Both partially specified as long as sufficient complementary information is provided.
3. Only :term:`CWL` :term:`I/O` is fully provided
   (with empty or even unspecified ``inputs`` or ``outputs`` section from :term:`WPS`).

.. warning::
    `Weaver` assumes that its main purpose is to eventually execute an :term:`Application Package` and will
    therefore prioritize specification in :term:`CWL` over :term:`WPS` to infer types.
Because of this, any unmatched ``id`` from the :term:`WPS` context against provided :term:`CWL` ``id``\s of the same :term:`I/O` section **will be dropped**, as they ultimately would have no purpose during :term:`CWL` execution. This does not apply in the case of referenced :ref:`proc_wps_12` processes since no :term:`CWL` is available in the first place. Similarly, when deploying a :ref:`Remote OGC API - Processes ` by :term:`URL` reference, it is expected that only the :term:`OAS` context (see :ref:`oas_io_schema`) with ``schema`` are provided. Therefore, these definitions could overrule the :term:`CWL` resolution that would normally occur. .. _cwl-io-types: Inputs/Outputs Type ----------------------- In the :term:`CWL` context, the ``type`` field indicates the type of :term:`I/O`. Available types are presented in the |cwl-io-type|_ portion of the :term:`CWL` specification. .. _warn-any: .. warning:: `Weaver` does not support :term:`CWL` ``type: Any``. This limitation is **intentional** in order to guarantee proper resolution of :term:`CWL` types to their corresponding :term:`WPS` definitions. Furthermore, the ``Any`` type would make the :term:`Process` description too ambiguous. Type Correspondance ~~~~~~~~~~~~~~~~~~~~ A summary of applicable types is presented below. Those :term:`CWL` types can be mapped to :term:`WPS` and/or :term:`OAS` contexts in order to obtain corresponding :term:`I/O` definitions. However, not every type exists in each of those contexts. Therefore, some types will necessarily be simplified or converted to their best corresponding match when exact mapping cannot be accomplished. The simplification of types can happen when converting in any direction (:term:`CWL` |nbsp| |<=>| |nbsp| :term:`WPS` |nbsp| |<=>| |nbsp| :term:`OAS`). It all depends on which definitions that were provided are the more specific. 
For example, a :term:`WPS` ``dateTime`` will be simplified to a generic :term:`CWL` ``string``,
and into an :term:`OAS` ``string`` with ``format: "date-time"``. In this example, it would be important to provide
the :term:`WPS` or :term:`OAS` definitions if the *date-time* portion was critical, since it could not be inferred
from the :term:`CWL` ``string`` alone, which does not define this concept.

.. seealso::
    Further details for the :term:`OAS` context are provided in :ref:`oas_io_schema`.

Further details regarding handling methods or important considerations for
specific types are presented in the :ref:`cwl-type` and :ref:`cwl-dir` sections.

+----------------------+-------------------------+------------------------+--------------------------------------------+
| :term:`CWL` ``type`` | :term:`WPS` data type   | :term:`OAS` type       | Description                                |
|                      | and sub-type [#note1]_  |                        |                                            |
+======================+=========================+========================+============================================+
| ``Any``              | |na|                    | |na|                   | Not supported. See :ref:`note `.           |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``null``             | |na|                    | |na|                   | Cannot be used by itself. |br|             |
|                      |                         |                        | Represents optional :term:`I/O` when       |
|                      |                         |                        | combined with other types [#note2]_.       |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``boolean``          | ``Literal`` |br|        | ``boolean``            | Binary value.                              |
|                      | (``bool``, ``boolean``) |                        |                                            |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``int``,             | ``Literal`` |br|        | ``integer``,           | Numeric whole value. |br|                  |
| ``long``             | (``int``, ``integer``,  | ``number`` |br|        | Unless explicit conversion between         |
|                      | ``long``,               | (format: ``int32``,    | contexts can be accomplished, the generic  |
|                      | ``positiveInteger``,    | ``int64``) [#note3]_   | ``integer`` will be employed.              |
|                      | ``nonNegativeInteger``) |                        |                                            |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``float``,           | ``Literal`` |br|        | ``number`` |br|        | Numeric floating-point value.              |
| ``double``           | (``float``, ``double``, | (format: ``float``,    | By default, ``float`` is used unless more  |
|                      | ``scale``, ``angle``)   | ``double``) [#note3]_  | explicit context conversion can be         |
|                      |                         |                        | accomplished [#note4]_.                    |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``string``           | ``Literal`` |br|        | ``string`` |br|        | Generic string. Default employed if        |
|                      | (``string``, ``date``,  | (format: ``date``,     | nothing more specific is resolved. |br|    |
|                      | ``time``, ``dateTime``, | ``time``,              |                                            |
|                      | ``anyURI``)             | ``datetime``,          | This type can be used to represent any     |
|                      |                         | ``date-time``,         | :ref:`File Reference `                     |
|                      |                         | ``full-date``,         | as plain URL string without resolution.    |
|                      |                         | ``uri``, ``url``,      |                                            |
|                      |                         | etc.) [#note5]_        |                                            |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``File``             | ``BoundingBox``         | :term:`JSON` [#note6]_ | Partial support available [#noteBBOX]_.    |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``File``             | ``Complex``             | :term:`JSON` [#note6]_ | :ref:`File Reference `                     |
|                      |                         |                        | with Media-Type validation and staging     |
|                      |                         |                        | according to the applicable scheme.        |
+----------------------+-------------------------+------------------------+--------------------------------------------+
| ``Directory``        | ``Complex``             | :term:`JSON` [#note6]_ | :ref:`Directory Reference `                |
|                      |                         |                        | handled as nested ``Files`` to stage.      |
+----------------------+-------------------------+------------------------+--------------------------------------------+

.. rubric:: Details

.. [#note1]
    Resolution method according to critical fields defined in :ref:`cwl-type`.

.. [#note2]
    More details in :ref:`oas_basic_types` and :ref:`cwl-array-null-values` sections.

.. [#note3]
    Number is used in combination with ``format`` to find the best match between integer and floating-point values.
    If not provided, it defaults to ``float`` to handle both cases.

.. [#note4]
    The ``float`` name is employed loosely to represent any *floating-point* value rather than
    *single-precision* (32-bits). Its internal representation is *double-precision* (64-bits) given that the
    implementation is in Python.

.. [#note5]
    Because ``string`` is the default, any ``format`` and ``pattern`` can be specified.
    More specific types with these items can help apply additional validation, although not strictly enforced.

.. [#note6]
    Specific schema required as described in :ref:`oas_json_types`.

.. [#noteBBOX]
    The :term:`WPS` data type ``BoundingBox`` has a schema definition in :term:`WPS` and :term:`OAS` contexts,
    but is not handled natively by :term:`CWL` types. When the conversion to a :term:`CWL` job occurs,
    an equivalent ``Complex`` type using a :term:`CWL` ``File`` with ``format: ogc-bbox`` and the contents stored
    as :term:`JSON` is employed. It is up to the :term:`Application Package` to parse this :term:`JSON` content as
    necessary. Alternatively, it is possible to use a ``Literal`` data of type ``string`` corresponding to
    :term:`WKT` (see |wkt-example|_) if it is deemed preferable that the :term:`CWL` script receives the data
    directly without intermediate interpretation.

.. _cwl-type:

Type Resolution
~~~~~~~~~~~~~~~

In the :term:`WPS` context, three data types exist, namely ``Literal``, ``BoundingBox`` and ``Complex`` data.

As presented in previous examples, :term:`I/O` in the :term:`WPS` context does not require an explicit indication
of which data type among ``Literal``, ``BoundingBox`` and ``Complex`` applies. Instead, the :term:`WPS` type can be
inferred using the matched API schema of the :term:`I/O`. For instance, ``Complex`` :term:`I/O`
(e.g.: :ref:`File Reference `) requires the ``formats`` field to distinguish it from a plain ``string``.
Therefore, specifying either ``format`` in :term:`CWL` or ``formats`` in :term:`WPS` immediately provides all the
information needed for `Weaver` to understand that this :term:`I/O` is expected to be a file reference.

.. code-block:: json
    :caption: WPS Complex Data Type
    :linenos:

    {
      "id": "input",
      "formats": [
        {"mediaType": "application/json", "default": true}
      ]
    }

A combination of ``supportedCRS`` objects providing ``crs`` references would otherwise indicate
a ``BoundingBox`` :term:`I/O` (see :ref:`note `).

.. code-block:: json
    :caption: WPS BoundingBox Data Type
    :linenos:

    {
      "id": "input",
      "supportedCRS": [
        {"crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84", "default": true}
      ]
    }

If neither of the two previous schemas is matched, the :term:`I/O` type resolution falls back to ``Literal`` data
of ``string`` type. To employ another primitive data type such as ``Integer``, an explicit indication needs to be
provided as follows.

.. code-block:: json
    :caption: WPS Literal Data Type
    :linenos:

    {
      "id": "input",
      "literalDataDomains": [
        {"dataType": {"name": "integer"}}
      ]
    }

Obviously, the equivalent :term:`CWL` definition is simpler in this case (i.e.: only ``type: int`` is required).
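
As a minimal sketch of that simpler form, the same integer input could be declared in :term:`CWL` using the
key-value mapping presented earlier. The ``input`` identifier is reused from the example above, and only the
``inputs`` portion of the package is shown. This is an illustration rather than a complete deployable definition.

.. code-block:: json
    :caption: Equivalent CWL input (illustrative sketch)

    {
      "inputs": {
        "input": {
          "type": "int"
        }
      }
    }
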
It is therefore *recommended* to take advantage of `Weaver`'s merging strategy during
:ref:`Process Deployment ` in this case, by providing only the details through the :term:`CWL` definition and
letting the corresponding :term:`WPS` I/O type be automatically deduced by the generated process. If desired,
``literalDataDomains`` can still be explicitly provided as above to ensure that it gets parsed as the intended type.

.. versionadded:: 4.16
    With more recent versions of `Weaver`, it is also possible to employ :term:`OpenAPI` schema (:term:`OAS`)
    definitions provided in the I/O to specify the explicit structure that applies to ``Literal``,
    ``BoundingBox`` and ``Complex`` data types. When :term:`OpenAPI` schemas are detected, they are also considered
    in the merging strategy along with other specifications provided in :term:`CWL` and :term:`WPS` contexts.
    More details about the :term:`OAS` context are provided in the :ref:`oas_io_schema` section.

.. _dir_ref_type:
.. _cwl-dir:

Directory Type
~~~~~~~~~~~~~~

.. versionchanged:: 4.27
    Support of :term:`CWL` ``type: Directory`` added to `Weaver`.

In order to map a ``Directory`` to the underlying :term:`WPS` :term:`Process`, which does not natively offer this
type of reference, a ``Complex`` "*pseudo-file*" using Media-Type ``application/directory`` is employed. For further
validation that a ``Directory`` is properly parsed by `Weaver`, provided URL references must also end with a
trailing slash (``/``) character.

.. warning::
    Note that, when using the ``Directory`` type, very little format and content validation can be accomplished for
    individual files contained in that directory. The contents must therefore match the definitions expected by the
    application receiving them. No explicit validation is accomplished by `Weaver` to ensure that the expected
    contents are available.
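
To illustrate the mapping described above, the following sketch shows how a hypothetical ``Directory`` input could
appear on the :term:`WPS` side, reusing the ``formats`` structure from the earlier ``Complex`` example. The ``data``
identifier is an assumption made for illustration only, and all other fields are omitted.

.. code-block:: json
    :caption: WPS Complex pseudo-file for a CWL Directory (illustrative sketch)

    {
      "id": "data",
      "formats": [
        {"mediaType": "application/directory", "default": true}
      ]
    }

Under this sketch, a reference submitted for ``data`` would need to end with a trailing slash
(e.g.: a placeholder location such as ``https://example.com/some/directory/``) to be recognized as a ``Directory``.
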
When a ``Directory`` type is specified in the :term:`Process` definition, and a :ref:`File Reference ` is provided
during :ref:`Execution `, the reference pointed to as ``Directory`` must provide a listing of files. Those files can
either be relative to the ``Directory`` or be other absolute :ref:`File Reference ` locations. The applicable scheme
to stage those files will be applied as needed based on resolved references. It is therefore possible to mix URL
schemes between the listed references. For example, a ``Directory`` listing as :term:`JSON` obtained from
a ``https://`` endpoint could provide multiple ``File`` locations from ``s3://`` buckets to stage
for :ref:`Process Execution `.

The following ``Directory`` listing formats are supported.

.. table:: :class: table-code :align: center :widths: 70,30 +-----------------------------------------------------------+------------------------------------------------------+ | Listing Format | Description | +===========================================================+======================================================+ | .. literalinclude:: ../examples/directory-listing.html | A file index where each reference to be staged | | :caption: HTML File Index | should be contained in a ```` tag. | | :language: yaml | | | | The structure can be contained in a ````, | | | an HTML list (``
    ``, ``