Processes
Type of Processes
Weaver supports multiple types of processes, as listed below. Each one of them is accessible through the same API interface, but they have different implications.
OGC API - Processes (formerly known as WPS-REST, WPS-T or WPS-3)
See also
Section Examples provides multiple concrete use cases of Deploy and Execute request payloads for a diverse set of applications.
Builtin
These processes come pre-packaged with Weaver. They will be available directly on startup of the application and re-updated on each boot to make sure internal database references are updated with any source code changes.
These processes typically correspond to utility operations. They are specifically useful when employed as
a step within a Workflow process that requires data-type conversion between input/output of similar, but not
perfectly compatible, definitions.
As of the latest release, the following builtin processes are available:
weaver.processes.builtin.collection_processor: Implements parsing capabilities to support Collection Input as defined by the OGC API - Processes - Part 3: Workflows extension. This allows Process Execution to employ Collection Inputs in certain cases when conditions are met.
weaver.processes.builtin.echo_process: Corresponds to the OGC API - Processes - Part 1: Core - Echo Process definition. This Process is used to evaluate the API against the OGC Execution Test Suite (ETS) for the Weaver Product Implementation. It is also employed to test the implementation against a wide range of input and output formats.
weaver.processes.builtin.file2string_array: Transforms a File Reference input into a JSON file containing an array of file references as value. This is typically employed to resolve a JSON array containing multiple sub-file references, making it possible to “unpack” a single item into multiple references.
weaver.processes.builtin.file_index_selector: Selects the single File Reference at the provided index within an array of file URLs.
weaver.processes.builtin.jsonarray2netcdf: Takes a single input JSON file whose content is an array of NetCDF file references, and returns them directly as the corresponding list of output files. These two different file formats (single JSON to multiple NetCDF) can then be used to map two processes with these respective outputs and inputs.
weaver.processes.builtin.metalink2netcdf: Extracts and fetches NetCDF files from a Metalink file containing a URL, and outputs the NetCDF file at a given index of the list.
All builtin processes are marked with weaver.processes.constants.CWL_REQUIREMENT_APP_BUILTIN in the
CWL hints section and are all defined in weaver.processes.builtin. For explicit schema
validation using the CWL requirements, the weaver:BuiltinRequirement can also be used.
WPS-1/2
This kind of Process corresponds to a traditional WPS XML or JSON endpoint (depending on the supported version) prior to the OGC API - Processes (WPS-REST, WPS-T, WPS-3) specification. When an OGC API - Processes description is deployed in Weaver using a URL reference to a WPS-1/2 process through the use of a Remote Applications requirement, Weaver parses and converts the XML or JSON body of the WPS response and registers the process locally. This allows a remote server offering limited functionalities (e.g.: no REST or OGC API bindings supported) to provide them through Weaver.
A minimal Deploy request body for this kind of process could be as follows:
{
"processDescription": {
"process": {
"id": "my-process-reference"
}
},
"executionUnit": [
{
"href": "https://example.com/wps?service=WPS&request=DescribeProcess&identifier=my-process&version=1.0.0"
}
]
}
This would tell Weaver to locally Deploy the my-process-reference process using the WPS-1
URL reference that is expected to return a DescribeProcess XML schema. Provided that this endpoint can be
resolved and parsed according to the typical WPS specification, this should result in a successful Process
registration.
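As an illustration, the Deploy body above could be assembled programmatically. The following is a minimal sketch that only builds the payload; the helper name build_wps1_deploy_body and its parameters are hypothetical and are not part of Weaver.

```python
from urllib.parse import urlencode


def build_wps1_deploy_body(process_id, wps_url, remote_identifier, version="1.0.0"):
    """Build a Deploy body referencing a remote WPS-1 DescribeProcess URL.

    Hypothetical helper for illustration; it only mirrors the JSON shown above.
    """
    query = urlencode({
        "service": "WPS",
        "request": "DescribeProcess",
        "identifier": remote_identifier,
        "version": version,
    })
    return {
        "processDescription": {"process": {"id": process_id}},
        "executionUnit": [{"href": f"{wps_url}?{query}"}],
    }


body = build_wps1_deploy_body(
    "my-process-reference", "https://example.com/wps", "my-process"
)
print(body["executionUnit"][0]["href"])
```

The resulting dictionary can be serialized to JSON and submitted as the Deploy request body.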
The deployed Process would then be accessible with DescribeProcess requests.
The above deployment procedure can be automated on startup using Weaver’s wps_processes.yml configuration file.
Please refer to Configuration of WPS Processes section for more details on this matter.
Warning
Because Weaver creates a snapshot of the reference process at the moment it was deployed, the local process definition could become out-of-sync with the remote reference where the Execute request will be sent. Refer to Remote Provider section for more details to work around this issue.
Any Process deployed from a WPS reference should have a resulting CWL definition that either
contains WPS1Requirement in the hints section, or weaver:WPS1Requirement in the requirements section.
OGC API - Processes (WPS-REST, WPS-T, WPS-3)
This Process type is the main component of Weaver. All other types are converted to this one either through some parsing (e.g.: WPS-1/2) or with some requirement indicators (e.g.: Builtin, Workflow) for special handling. The represented Process is aligned with OGC API - Processes specifications.
When deploying one such Process directly, it is expected to have a definition specified with a CWL application-package, which provides the resources for one of the described Typical CWL Package Definitions.
This is most of the time employed to wrap operations packaged in a reference Docker image, but it can also
wrap Remote Applications to be executed on another server (i.e.: ADES). When the Process should be
deployed using a remote URL reference pointing at an existing OGC API - Processes description, the CWL should
contain either OGCAPIRequirement in the hints section, or weaver:OGCAPIRequirement in the requirements
section.
The referenced Application Package can be provided in multiple ways as presented below.
Note
When a process is deployed with any of the Application Package formats supported below, Weaver additionally parses this CWL along with the complementary details provided directly within the WPS deployment body. See Correspondence between CWL and WPS fields section for more details.
Package as Literal Execution Unit Block
In this situation, the CWL definition is provided as-is using a JSON-formatted package embedded within the
POST {WEAVER_URL}/processes (Deploy) request. The request payload would take the following shape:
{
"processDescription": {
"process": {
"id": "my-process-literal-package"
}
},
"executionUnit": [
{
"unit": {
"cwlVersion": "v1.0",
"class": "CommandLineTool",
"inputs": ["<...>"],
"outputs": ["<...>"],
"<...>": "<...>"
}
}
]
}
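The literal form above and the reference form described in the next section differ only in the key used within executionUnit ("unit" versus "href"). The sketch below shows one way to produce either body from a single helper; make_deploy_body is a hypothetical name used for illustration, not a Weaver function.

```python
def make_deploy_body(process_id, package):
    """Wrap a CWL package into a Deploy body.

    A dict is treated as a literal CWL definition ("unit"),
    a str is treated as an external file reference ("href").
    """
    if isinstance(package, dict):
        execution_unit = {"unit": package}
    elif isinstance(package, str):
        execution_unit = {"href": package}
    else:
        raise TypeError("package must be a CWL mapping or a URL string")
    return {
        "processDescription": {"process": {"id": process_id}},
        "executionUnit": [execution_unit],
    }


literal = make_deploy_body(
    "my-process-literal-package",
    {"cwlVersion": "v1.0", "class": "CommandLineTool"},
)
reference = make_deploy_body(
    "my-process-reference-package",
    "https://remote-file-server.com/my-package.cwl",
)
```

The CWL mapping passed here is deliberately incomplete; a real package also needs its inputs, outputs and other fields.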
Package as External Execution Unit Reference
In this situation, the CWL is provided indirectly using an external file reference which is expected to have contents describing the Application Package (as presented in the Package as Literal Execution Unit Block case). Because an external file is employed instead of embedding the package within the JSON HTTP request contents, it is possible to employ both JSON and YAML definitions.
An example is presented below:
{
"processDescription": {
"process": {
"id": "my-process-reference-package"
}
},
"executionUnit": [
{
"href": "https://remote-file-server.com/my-package.cwl"
}
]
}
Where the referenced file hosted at "https://remote-file-server.com/my-package.cwl" could contain:
cwlVersion: "v1.0"
class: CommandLineTool
inputs:
- "<...>"
outputs:
- "<...>"
"<...>": "<...>"
ESGF-CWT
For ESGF-CWT processes, the ESGF-CWTRequirement must be used in the CWL hints section.
Using hints allows the CWL content to be parsed even if the schema reference is missing.
This can be useful for deploying the Process on other instances not implemented with Weaver.
Note however that executing the Process in such a case will most likely fail unless the other implementation
handles it with custom logic.
To define the Process with explicit CWL schema validation, the requirements section must be used
instead, with the value weaver:ESGF-CWTRequirement so that the schema can be resolved.
For an example CWL using this definition, see Remote Applications section.
This kind of Process allows for remote Execution and Monitoring of a Job dispatched to an instance that implements the ESGF Compute API, part of the Earth System Grid Federation. Using Weaver, this Process automatically obtains an OGC API - Processes (WPS-REST, WPS-T, WPS-3) representation.
Workflow
Processes categorized as Workflow are very similar to OGC API - Processes (WPS-REST, WPS-T, WPS-3) processes. From the API standpoint, they actually look exactly the same as an atomic process when calling DescribeProcess or Execute requests. The difference lies within the referenced Application Package which uses a CWL Workflow instead of typical CWL CommandLineTool, and therefore, modifies how the Process is internally executed.
For Workflow processes to be deployable and executable, it is mandatory that Weaver is configured as EMS or HYBRID (see: Configuration Settings). This requirement is due to the nature of Workflows, which chain processes that need to be dispatched to known remote ADES servers (see: Configuration of Data Sources and Workflow Step Operations) according to the defined Data Source configuration.
Given that a Workflow process was successfully deployed and that all process steps can be resolved, calling its Execute request will tell Weaver to parse the chain of operations and send step process execution requests to relevant ADES picked according to Data Source. Each step’s job will then gradually be monitored from the relevant ADES until completion.
Upon successful intermediate result, the EMS (or HYBRID acting as such) will stage the data references locally to chain them to the following step. When the complete chain succeeds, the final results of the last step will be provided as Workflow output in the same manner as for atomic processes. In case of failure, the error will be indicated in the logs with the appropriate step and message where the error occurred.
Note
Although chaining sub-workflow(s) within a bigger scoped Workflow is technically possible, this has not yet been fully explored (tested) in Weaver. There is a chance that Data Source resolution fails to identify where to dispatch the step in this situation. If this impacts you, please vote and indicate your concern on issue #171.
See also
Workflow Step Operations provides more details on each of the internal operations accomplished by individual step Process chained in a Workflow.
Remote Provider
A remote Provider corresponds to a service hosted remotely that provides similar or compatible (WPS-like) interfaces supported by Weaver. For example, a remote WPS-1 XML endpoint can be referenced as a Provider. When an API Providers-scoped request is executed, for example to list its process capabilities (see GetCapabilities), Weaver will send the corresponding request using the reference URL from the registered Provider to access the remote server and reply with the parsed response, as if its processes were registered locally.
Since remote providers obviously require access to the remote service, Weaver will only be able to provide results if the service is accessible with respect to standard implementation features and supported specifications.
The main advantage of using Weaver’s endpoint rather than directly accessing the referenced remote Provider processes is to compensate for the limited functionalities offered by the service. For instance, WPS-1 services do not always offer the Monitoring a Job Execution (GetStatus) feature, and there are no extensive Job monitoring capabilities. Since Weaver effectively wraps the referenced Provider with its own endpoints, these features indirectly become employable through an extended OGC API - Processes interface. Similarly, although many WPS-1 services offer XML-only responses, the parsing operation accomplished by Weaver makes these services available as WPS-REST JSON endpoints with automatic conversion. On top of that, registering a remote Provider into Weaver allows users to employ it as a central hub to keep references to all their remotely accessible services and dispatch Job executions from a common location.
A remote provider differs from previously presented WPS-1/2 processes in that the underlying processes of the service are not registered locally. For example, if a remote service has two WPS processes, only the top-level service URL will be registered locally (in Weaver’s database) and the application will have no explicit knowledge of these remote processes until requested. When calling Process-specific requests (e.g.: DescribeProcess or Execute), Weaver will re-send the corresponding request (with appropriate interface conversion) directly to the remote Provider each time and return the result accordingly. On the other hand, a WPS-1/2 reference would be parsed and saved locally with the response at the time of deployment. This means that a deployed WPS-1/2 reference would act as a snapshot of the reference Process (which could become out-of-sync), while a Remote Provider will dynamically update according to the re-fetched response from the remote service each time, always keeping the obtained description in sync with the remote Provider. If our example remote service was extended to have a third WPS process, it would immediately and transparently be reflected in GetCapabilities and DescribeProcess retrieved by Weaver on Providers-scoped requests without any change to the registered Provider definition. This would not be the case for the WPS-1/2 reference that would need a manual update (i.e.: deploy the third Process to register it in Weaver).
An example body of the register provider request could be as follows:
{
"id": "my-service",
"url": "https://example.com/wps",
"public": true
}
Then, processes of this registered Remote Provider will be accessible. For example, if the service
referenced by the above URL has a WPS process identified by my-process, its JSON description would be obtained with
the following request (DescribeProviderProcess):
GET {WEAVER_URL}/providers/my-service/processes/my-process
Note
Process my-process in the example is not registered locally. From the point of view of Weaver’s processes
(i.e.: route /processes/{id}), it does NOT exist. You must absolutely use the provider-prefixed route
/providers/{id}/processes/{id} to explicitly fetch and resolve this remote process definition.
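The difference between the local and provider-prefixed routes can be summarized with a small sketch. The helper describe_process_url is a hypothetical name for illustration, and https://weaver.example.com stands in for {WEAVER_URL}.

```python
def describe_process_url(weaver_url, process_id, provider_id=None):
    """Return the DescribeProcess URL for a local or provider-scoped process."""
    base = weaver_url.rstrip("/")
    if provider_id is None:
        # Locally registered process
        return f"{base}/processes/{process_id}"
    # Remote process, resolved through the registered Provider
    return f"{base}/providers/{provider_id}/processes/{process_id}"


local_url = describe_process_url("https://weaver.example.com", "my-process")
remote_url = describe_process_url(
    "https://weaver.example.com", "my-process", provider_id="my-service"
)
```

In the example of this section, only remote_url would resolve, since my-process is not registered locally.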
Warning
API requests scoped under Providers are a Weaver-specific implementation. These are not part of the OGC API - Processes specification.
Managing Processes included in Weaver ADES/EMS
The following steps are typically applied to deploy a process, execute it and retrieve the results.
Register a New Process (Deploy)
Deployment of a new process is accomplished through the POST {WEAVER_URL}/processes (Deploy) request.
See also
OGC API - Processes - Part 2: Deploy, Replace, Undeploy specification.
The request body requires mainly two components:
processDescription: Defines the Process identifier, metadata, inputs, outputs, and some execution specifications. This mostly corresponds to additional information that is provided by traditional WPS or OGC API - Processes definitions. A notable situation when this is required is when the executionUnit that follows cannot directly resolve certain definitions specific to OGC API - Processes, such as a Media-Type or File Format not explicitly handled by CWL.
See also
Section Correspondence between CWL and WPS fields provides further details about notable considerations that could require additional fields in processDescription for an adequate Process definition.
executionUnit: Defines the core details of the Application Package. This corresponds to the explicit CWL definition or other Type of Processes references that indicate how to execute the underlying application.
Note
If the Process can be directly represented and converted from the CWL with regard to
all Correspondence between CWL and WPS fields considerations, the CWL might be directly deployed with the
appropriate application/cwl+json or application/cwl+yaml Media-Type in Content-Type header.
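To illustrate the direct CWL deployment mentioned in this note, the sketch below prepares (without sending) a POST request carrying a CWL document with the application/cwl+json Media-Type. It uses only the Python standard library; the helper name build_cwl_deploy_request and the weaver.example.com host are assumptions for illustration, and the CWL content is a placeholder rather than a complete package.

```python
import json
import urllib.request

CWL_JSON = "application/cwl+json"


def build_cwl_deploy_request(weaver_url, cwl):
    """Prepare a Deploy request whose body is the CWL document itself."""
    return urllib.request.Request(
        f"{weaver_url.rstrip('/')}/processes",
        data=json.dumps(cwl).encode("utf-8"),
        headers={"Content-Type": CWL_JSON},
        method="POST",
    )


# Placeholder CWL: a real package needs complete inputs/outputs and a command.
cwl = {"cwlVersion": "v1.0", "class": "CommandLineTool", "inputs": {}, "outputs": {}}
request = build_cwl_deploy_request("https://weaver.example.com", cwl)
# The request could then be sent with urllib.request.urlopen(request).
```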
Upon a deploy request, Weaver will either respond with a successful result, or with the appropriate error message, whether caused by a conflicting ID, invalid definitions or other parsing issues. A successful deployment makes the process available for the following steps.
Warning
When a process is deployed, it is not necessarily available immediately. This is because process visibility also
needs to be updated. The process must be made public to allow its discovery. Alternatively, the visibility can
be directly provided within the body of the deploy request to skip this extra step.
For specifying or updating visibility, please refer to corresponding POST {WEAVER_URL}/processes (Deploy) and PUT {WEAVER_URL}/processes/{processID}/visibility (Visibility) requests.
After deployment and visibility preconditions have been met, the corresponding process should become available through DescribeProcess requests and other routes that depend on an existing process.
Note that when a process is deployed using the OGC API - Processes (WPS-REST, WPS-T, WPS-3) interface, it also becomes available through the WPS-1/2 interface with the same identifier and definition. Because of compatibility limitations, some parameters in the WPS-1/2 side might not be perfectly mapped to the equivalent or adjusted OGC API - Processes (WPS-REST, WPS-T, WPS-3) interface, although this concerns mostly only new features such as Job status monitoring. For most traditional use cases, properties are mapped between the two interfaces, but it is recommended to use the OGC API - Processes (WPS-REST, WPS-T, WPS-3) one because of the added features.
See also
Please refer to application-package chapter for any additional parameters that can be provided for specific types of Application Package and Process definitions.
Access Registered Processes (GetCapabilities, DescribeProcess)
Available processes can all be listed using GET {WEAVER_URL}/processes (GetCapabilities) request. This request will return all locally registered
process summaries. Other return formats and filters are also available according to provided request query parameters.
Note that processes not marked with public visibility will not be listed in this result.
For more specific process details, the GET {WEAVER_URL}/processes/{processID} (DescribeProcess) request should be used. This will return all information
that defines the process references and expected inputs/outputs.
Note
For remote processes (see: Remote Provider), Provider requests are also available for more fine-grained
search of underlying processes. These processes are not necessarily listed as local processes, and will therefore
sometimes not yield any result if using the typical DescribeProcess request on wps_endpoint.
All routes listed under Process requests should normally be applicable for remote processes by prefixing
them with /providers/{id}.
Changed in version 4.20.
With the addition of Process revisions (see Update Operation below), a registered
Process specified only by {processID} will retrieve the latest revision of that Process.
A specific older revision can be obtained by adding the tagged version in the path ({processID}:{version}) or
adding the request query parameter version.
Using revisions provided through PUT and PATCH requests, it is also possible to list specific or all existing
revisions of a given or multiple processes simultaneously using the revisions and version query parameters with
the GET {WEAVER_URL}/processes (GetCapabilities) request.
Modify an Existing Process (Update, Replace, Undeploy)
Added in version 4.20.
Since Weaver supports OGC API - Processes - Part 2: Deploy, Replace, Undeploy, it is able to remove a Process previously registered using
the Deployment request. The undeploy operation consists of a DELETE request targeting the
specific {WEAVER_URL}/processes/{processID} to be removed.
Note
The Process must be accessible by the user considering any visibility configuration to perform this step. See Register a New Process (Deploy) section for details.
Starting from version 4.20, a Process can be replaced or
updated using respectively the PUT and PATCH requests onto the specific {WEAVER_URL}/processes/{processID}
location of the reference to modify.
Note
The Process partial update operation (using PATCH) is specific to Weaver only.
OGC API - Processes - Part 2: Deploy, Replace, Undeploy only mandates the definition of PUT request for full override of a Process.
When a Process is modified using the PATCH operation, only the new definitions need to be provided, and
unspecified items are transferred over from the referenced Process (i.e.: the previous revision). Using either
the PUT or PATCH requests, previous revisions can be referenced using two formats:
{processID}:{version} as request path parameters (instead of the usual {processID} only)
{processID} in the request path combined with the ?version={version} query parameter
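Both reference formats can be produced as follows. This is a minimal sketch; revision_urls is a hypothetical helper and https://weaver.example.com stands in for {WEAVER_URL}.

```python
from urllib.parse import urlencode


def revision_urls(weaver_url, process_id, version):
    """Return the two equivalent URLs referring to a specific Process revision."""
    base = f"{weaver_url.rstrip('/')}/processes"
    tagged = f"{base}/{process_id}:{version}"  # path form
    queried = f"{base}/{process_id}?{urlencode({'version': version})}"  # query form
    return tagged, queried


tagged, queried = revision_urls("https://weaver.example.com", "test-process", "1.2.3")
print(tagged)
print(queried)
```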
Weaver employs MAJOR.MINOR.PATCH semantic versioning to maintain revisions of updated or replaced Process
definitions. The next revision number to employ for update or replacement can either be provided explicitly in the
request body using a version, or be omitted. When omitted, the next revision will be guessed automatically based
on the previous available revision according to the level of changes required. In either case, the resolved version
will have to be available and respect the expected update level to be accepted as a new valid Process revision.
The applicable revision level depends on the contents being modified using submitted request body fields according
to the following table. When a combination of the below items occurs, the higher update level is required.
| HTTP Method | Level | Change | Examples |
|---|---|---|---|
| | PATCH | Modifications to metadata not impacting the Process execution or definition. | |
| | MINOR | Modification that impacts how the Process could be executed, but not its definition. | |
| | MAJOR | Modification that impacts what the Process executes. | |
Note
For all applicable fields of updating a Process, refer to the schema of PATCH {WEAVER_URL}/processes/{processID} (Update).
For replacing a Process, refer instead to the schema of PUT {WEAVER_URL}/processes/{processID} (Replace). The replacement request contents
are extremely similar to the Deploy schema since the full Process definition must
be provided.
For example, if the test-process:1.2.3 was previously deployed, and is the active latest revision of that
Process, submitting the below request body will produce a PATCH revision as test-process:1.2.4.
PATCH revision
PATCH /processes/test-process:1.2.3 HTTP/1.1
Host: weaver.example.com
Content-Type: application/json
{
"description": "new description",
"inputs": {
"input": {
"description": "modified input description"
}
},
"outputs": {
"output": {
"title": "modified title"
}
}
}
Here, only metadata is adjusted and there is no risk to impact produced results or execution methods of the
Process. An external user would probably not even notice the Process changed, which is why PATCH
is reasonable in this case. Notice that the version is not explicitly provided in the body. It is guessed
automatically from the modified contents. Also, the example displays how Process-level and
inputs/outputs-level metadata can be updated.
Similarly, the following request would produce a MINOR revision of test-process. Since both PATCH and
MINOR level contents are defined for update, the higher MINOR revision is required. In this case MINOR is
required because jobControlOptions (forced to asynchronous execution for following versions) would break any
future request made by users that would expect the Process to run (or support) synchronous execution.
Notice that this time, the Process reference does not indicate the revision in the path (no :1.2.4 part).
This automatically resolves to the updated revision test-process:1.2.4 that became the new latest revision following
our previous PATCH request.
MINOR revision
PATCH /processes/test-process HTTP/1.1
Host: weaver.example.com
Content-Type: application/json
{
"description": "process async only",
"jobControlOptions": ["async-execute"],
"version": "1.4.0"
}
In this case, the desired version (1.4.0) is also specified explicitly in the body. Since the updated number
(MINOR = 4) matches the expected update level from the above table and respects a higher level than the reference
1.2.4 Process, this revision value will be accepted (instead of auto-resolved 1.3.0 otherwise). Note
that if 2.4.0 was specified instead, the version would be refused, as Weaver does not consider this modification
to be worth a MAJOR revision, and tries to keep version levels consistent. Skipping numbers (i.e.: 1.3.0 in this
case), is permitted as long as there are no other versions above of the same level (i.e.: 1.4.0 would be refused if
1.5.0 existed). This allows some level of flexibility with revisions in case users want to use specific numbering
values that have more meaning to them. It is recommended to let Weaver auto-update version values between updates if
this level of fine-grained control is not required.
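The automatic version resolution described above can be approximated with conventional semantic-version bumping. The sketch below is a simplified model that reproduces the numbers used in the examples (1.2.3 to 1.2.4 for PATCH, 1.2.4 to 1.3.0 for MINOR); it is not Weaver's actual implementation, and the names LEVELS, highest_level and bump are hypothetical.

```python
LEVELS = ["PATCH", "MINOR", "MAJOR"]  # ordered from lowest to highest


def highest_level(levels):
    """When several kinds of changes are combined, the higher level applies."""
    return max(levels, key=LEVELS.index)


def bump(version, level):
    """Return the next MAJOR.MINOR.PATCH version for the given update level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "MAJOR":
        return f"{major + 1}.0.0"
    if level == "MINOR":
        return f"{major}.{minor + 1}.0"
    if level == "PATCH":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown revision level: {level}")


# Metadata-only update of test-process:1.2.3
first = bump("1.2.3", "PATCH")
# Metadata and jobControlOptions changed together: MINOR takes precedence
second = bump(first, highest_level(["PATCH", "MINOR"]))
print(first, second)
```

This model does not cover explicitly requested versions such as 1.4.0, which are subject to the acceptance rules described above.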
Note
To avoid conflicting definitions, a Process cannot be Deployed directly using a
{processID}:{version} reference. Deployments are expected as the first revision and should only include the
{processID} portion as their identifier.
If the user desires a specific version to deploy, the PUT request should be used with the appropriate version
within the request body. It is however up to the user to provide the full definition of that Process,
as the PUT request will completely replace the previous definition rather than transfer over previous updates
(i.e.: PATCH requests).
Even when a Process is “replaced” using PUT, the older revision is not actually removed and undeployed
(DELETE request). It is therefore still possible to refer to the old revision using explicit references with the
corresponding version. Weaver keeps track of revisions by corresponding {processID} entries such that if
the latest revision is undeployed, the previous revision will automatically become the latest once again. For complete
replacement, the user should instead perform a DELETE of all existing revisions (to avoid conflicts) followed by a
new Deploy request.
Execution of a Process (Execute)
Process execution (i.e.: submitting a Job) is accomplished using the POST {WEAVER_URL}/processes/{processID}/execution (Execute) request.
Note
For backward compatibility, the POST {WEAVER_URL}/processes/{processID}/jobs (Execute) request is also supported as an alias to the above
OGC API - Processes compliant endpoint.
See also
Alternatively, the POST {WEAVER_URL}/jobs (Create) request can also be used to submit a Job for later execution,
as well as enabling other advanced Job Management capabilities.
See Submitting a Job Creation for more details.
This section will first describe the basics of this request format (Execution Body), and then go into further details for specific use cases and parametrization of various input/output combinations (Execution Mode, Execution Results, etc.). Below are some examples of JSON bodies that can be sent to the Job execution endpoint to better illustrate where each of the parameters mentioned in the following sections is expected.
Job Execution Payload as Listing
{
"mode": "async",
"response": "document",
"inputs": [
{
"id": "input-file",
"href": "<file-reference>"
},
{
"id": "input-value",
"data": 1
}
],
"outputs": [
{
"id": "output",
"transmissionMode": "reference"
}
]
}
Job Execution Payload as Mapping
{
"mode": "async",
"response": "document",
"inputs": {
"input-file": {
"href": "<file-reference>"
},
"input-value": {
"value": 1
}
},
"outputs": {
"output": {
"transmissionMode": "reference"
}
}
}
Note
For backward compatibility, the execution payload inputs and outputs can be provided either as mapping
(keys are the IDs, values are the content), or as listing (each item has content and "id" field)
interchangeably. When working with OGC API - Processes compliant services, the mapping representation
should be preferred as it is the official schema, is more compact, and it allows inline specification of literal
data (values provided without the nested value field). The listing representation is the older format employed
during previous OGC testbed developments.
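The relation between the two representations can be sketched as a small conversion function. This is an illustrative helper, not Weaver code: the name listing_to_mapping is hypothetical, and renaming the listing's "data" field to "value" is an assumption based on the two example payloads above.

```python
def listing_to_mapping(items):
    """Convert listing-style inputs or outputs into the mapping representation.

    Limitation of this sketch: repeated IDs (multiple values for the
    same input) are not handled and would overwrite each other.
    """
    mapping = {}
    for item in items:
        content = {key: val for key, val in item.items() if key != "id"}
        # Assumption: "data" in the listing form corresponds to "value" in the mapping form.
        if "data" in content:
            content["value"] = content.pop("data")
        mapping[item["id"]] = content
    return mapping


inputs = listing_to_mapping([
    {"id": "input-file", "href": "<file-reference>"},
    {"id": "input-value", "data": 1},
])
print(inputs)
```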
See also
Many additional parameters can be used to request further functionalities. The above fields only present the most common definitions employed to request a Job. Please refer to the OpenAPI Execute definition, as well as following sections, for all applicable features.
Execution Body, Execution Mode and Execution Results sections provide details applicable to Weaver, which align with OGC API - Processes, but that can also support additional capabilities.
OGC API - Processes - Execution Outputs offers general details on the transmissionMode parameter of requested outputs.
OGC API - Processes - Execution Mode describes general details about the execution negotiation (sync/async), formerly with the mode parameter, and more recently with the Prefer header.
OGC API - Processes - Execution Responses (sync) and OGC API - Processes - Execution Responses (async) provide a complete listing of available response formats considering all other parameters.
Changed in version 4.20.
With the addition of Process revisions (see Update Operation section), a registered
Process specified only by {processID} will execute the latest revision of that Process. An older
revision can be executed by adding the tagged version in the path ({processID}:{version}) or adding the request
query parameter version.
Execution Body
The inputs definition is the most important section of the request body. It is also the only one that is completely
required when submitting the execution request, even for a no-input process (an empty mapping is needed in such case).
It defines which parameters
to forward to the referenced Process to be executed. All id elements in this Job request
body must correspond to valid inputs from the definition returned by DescribeProcess
response. All formatting requirements (i.e.: proper file Media-Types),
data types (e.g.: int, string, etc.) and validation rules (e.g.: minOccurs, AllowedValues, etc.)
must also be fulfilled. When providing files as input,
multiple protocols are supported. See later section File Reference Types for details.
The outputs section defines, for each id available from the Process definition, how to
report the produced outputs from a successful Job execution. The method under which each output will
be returned depends on the negotiated Execution Mode and Execution Results.
When an output corresponds to a File Reference produced by the Application Package,
and stored locally, the result will typically (unless requested otherwise), be exposed externally using the returned
reference URL.
For outputs that correspond to literal data, such as plain strings or numbers, Weaver will typically prefer
returning the value directly. However, alternate link representations can also be obtained if specified in the
execution request, using transmissionMode overrides for the desired outputs.
When the outputs section is omitted, it simply means that the Process to be executed
should return all outputs it offers in the created Job Results.
If the outputs section is specified, but one of the outputs [1] defined in
the Process Description is not listed in it, the Job will
omit this output from the produced results. When requested outputs are specified without any transmissionMode,
the reference representation is used automatically for File References, as it makes all
outputs more easily accessible afterwards with distinct URLs, and value is used for literal data to
obtain them directly (inline in the response). The opposite value/reference representation can be requested
explicitly, for each respective output, using the transmissionMode as presented below.
Warning
When using outputs in the request body, one necessarily introduces filtering indications of results to
be returned. If all outputs are desired, with some overriding transmissionMode and others letting
their representation auto-resolve, an explicit {} mapping must be indicated for the latter to avoid filtering them out.
{
"inputs": {"<...>": "<...>"},
"outputs": {
"output-default": {},
"output-by-value": {"transmissionMode": "value"},
"output-by-ref": {"transmissionMode": "reference"}
}
}
When transmissionMode is specified for a given output, its result representation will override any other
parameter that would otherwise affect its automatic or “informed” resolution of the output representation.
These parameters are further detailed in the following Execution Mode and Execution Results sections.
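The filtering and default representation rules described above can be sketched as follows. This is a minimal illustration, not Weaver's implementation: the function name and the FILE/LITERAL labels are assumptions introduced here, and the effect of the Prefer header and response parameter is deliberately ignored.

```python
from typing import Dict, Optional

# Assumed labels for the two kinds of outputs discussed above (not Weaver identifiers).
LITERAL = "literal"
FILE = "file"


def resolve_output_modes(
    process_outputs: Dict[str, str],
    requested: Optional[Dict[str, dict]] = None,
) -> Dict[str, str]:
    """Return the transmissionMode expected for each output kept in the results."""
    if requested is None:
        # No "outputs" section: every output offered by the process is returned.
        requested = {output_id: {} for output_id in process_outputs}
    resolved = {}
    for output_id, options in requested.items():
        if output_id not in process_outputs:
            raise ValueError(f"unknown output: {output_id}")
        # Files default to a reference, literal data defaults to an inline value.
        default = "reference" if process_outputs[output_id] == FILE else "value"
        # An explicit transmissionMode overrides the automatic resolution.
        resolved[output_id] = options.get("transmissionMode", default)
    return resolved
```

Outputs that are absent from a provided outputs mapping are simply not part of the returned dictionary, which mirrors the filtering behaviour highlighted in the warning.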
Execution Mode
In order to select how to execute a Process, either synchronously or asynchronously, the Prefer header
should be specified. If omitted, Weaver defaults to asynchronous execution. To execute asynchronously
explicitly, Prefer: respond-async should be used. Otherwise, the synchronous execution can be requested
with Prefer: wait=X where X is the duration in seconds to wait for a response. If no worker becomes available
within that time, or if this value is greater than
the weaver.execute_sync_max_wait setting (see detail), the Job will
resume asynchronously and the response will be returned. Furthermore, synchronous and asynchronous execution of
a Process can only be requested for corresponding jobControlOptions it reports as supported in
its Process Description. It is important to provide the jobControlOptions parameter with
applicable modes when Deploying a Process to allow it to run as desired. By default, Weaver
will assume that deployed processes are only asynchronous to handle longer operations.
Changed in version 4.15: By default, every Builtin Process can accept both modes.
All previously deployed processes will only allow asynchronous execution, as only this one was supported.
This should be reported in their jobControlOptions.
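As a rough sketch, the Prefer header values described above could be produced by a small helper such as the one below. The function name and its argument are hypothetical and only serve the illustration.

```python
from typing import Dict, Optional


def execution_mode_headers(sync_wait_seconds: Optional[int] = None) -> Dict[str, str]:
    """Build request headers selecting the preferred execution mode."""
    if sync_wait_seconds is None:
        # Explicit asynchronous request (also the default when Prefer is omitted).
        return {"Prefer": "respond-async"}
    if sync_wait_seconds <= 0:
        raise ValueError("wait duration must be a positive number of seconds")
    # Synchronous request: wait at most the given number of seconds.
    return {"Prefer": f"wait={sync_wait_seconds}"}
```

The resulting dictionary can be merged into the headers of the Execute request. Keep in mind that the server may still fall back to asynchronous execution, since the header is only a preference.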
Warning
It is important to remember that the Prefer header is indeed a preference. If Weaver deems it cannot
allocate a worker to execute the task synchronously within a reasonable delay, it can enforce the asynchronous
execution. The asynchronous mode is also prioritized for running longer Jobs submitted over the task
queue, as this allows Weaver to offer better availability for all requests submitted by its users.
The synchronous mode should be reserved only for very quick operations with relatively low computational cost.
Todo
Support the Prefer: handling=strict modifier to disallow switching between sync/async
https://github.com/crim-ca/weaver/issues/701
The mode field displayed in the Example Job Execution Request Body is another method to tell whether to run the Process
in a blocking (sync) or non-blocking (async) manner. Note that support is limited for mode sync as this use
case is often more cumbersome than async execution. Effectively, sync mode requires a task worker
executor to be available to run the Job (otherwise it fails immediately due to lack of processing resource), and
the requester must wait for the whole execution to complete to obtain the result.
Given that a Process could take a very long time to complete, it is not practical to execute it in this
manner and potentially have to wait hours to retrieve outputs.
Instead, the preferred and default approach is to request an async Job execution. When doing so, Weaver
will add this to a task queue for processing, and will immediately return a Job identifier and Location
where the user can probe for its status, using a Monitoring request. As soon as any task worker
becomes available, it will pick any leftover queued Job to execute it.
Note
The mode field is an older methodology that precedes the latest OGC API - Processes method using
the Prefer header. It is recommended to employ the Prefer header that ensures higher interoperability
with other services using the same standard. The mode field is deprecated and preserved only for backward
compatibility purpose.
When requesting a synchronous execution, and provided a worker was available to pick and complete the task before
the maximum wait time was reached, the final status will be directly returned. Therefore, the contents obtained this
way will be identical to any following Job Status request. If no worker is available, or if
the worker that picked the Job cannot complete it in time (either because it takes too long to execute or had
to wait on resources for too long), the Job execution will automatically switch to asynchronous mode.
The distinction between an asynchronous or synchronous response when executing a Job can be
observed in multiple ways. The easiest is with the HTTP status code of the response, 200 being for
a Job entirely completed synchronously, and 201 for a created Job that should be
monitored asynchronously. Another method is to observe the "status" value.
Effectively, a Job that is executed asynchronously will return status information contents, while
a synchronous Job will return the results directly, along with a Location header referring to the
equivalent contents returned by GetStatus as in the case of an asynchronous Job.
It is also possible to inspect the Preference-Applied response header, which clearly indicates whether the
submitted Prefer header was respected (because worker resources were available) or not.
In general, unless the Job submission request was provided with Prefer: wait=X AND the response
replied with the same Preference-Applied value, it is safe to assume Weaver decided to queue the Job
for asynchronous execution. That Job could be executed immediately, or at a later time, according to
worker availability.
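The checks described above can be combined in a client-side helper, sketched here under a few assumptions: the function name is hypothetical, the status codes are those mentioned in this documentation (200/204 for synchronous results, 201/202 for asynchronous jobs), and the Preference-Applied header is only consulted when the status code is not conclusive.

```python
from typing import Mapping


def execution_mode_of(status_code: int, headers: Mapping[str, str]) -> str:
    """Guess whether an Execute response was handled synchronously or asynchronously."""
    if status_code in (200, 204):
        return "sync"
    if status_code in (201, 202):
        return "async"
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    applied = normalized.get("preference-applied", "")
    if "wait=" in applied:
        return "sync"
    if "respond-async" in applied:
        return "async"
    return "unknown"
```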
It is also possible that a failed Job, even when synchronous, will respond with equivalent contents
to the status location instead of results. This is because it is impossible for Weaver to return
the result(s) as outputs would not be generated by the incomplete Job.
For any of the execution combinations, it is always possible to obtain Job outputs, along with logs, exceptions and other details using the Obtaining Job Details and Metadata endpoints.
Execution Results
When requesting a Job execution, the structure under which the Process results are returned can
be adjusted using the Prefer header with the return parameter. More precisely, the Prefer: return=minimal
and Prefer: return=representation definitions can be used to control whether the resulting outputs would be
provided using link references, or directly using their raw data representation. This behavior is described by the
OGC API - Processes (v2.0) standard revision.
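Since the same Prefer header can carry both the execution mode (wait, respond-async) and the return preference, a client or test harness may need to split it into individual preferences. The following is a simplified sketch based on the general Prefer header syntax (RFC 7240); the function name is an assumption and preference parameters after ";" are ignored.

```python
from typing import Dict, Optional


def parse_prefer(header: str) -> Dict[str, Optional[str]]:
    """Split a Prefer header into {preference: value} (None for bare tokens)."""
    preferences: Dict[str, Optional[str]] = {}
    for item in header.split(","):
        # Parameters following ";" are not handled in this sketch.
        token = item.split(";", 1)[0].strip()
        if not token:
            continue
        name, sep, value = token.partition("=")
        name = name.strip().lower()
        # When a preference is repeated, only its first occurrence is considered.
        if name not in preferences:
            preferences[name] = value.strip().strip('"') if sep else None
    return preferences
```

For example, parsing "return=minimal, wait=10" yields one entry for return and one for wait, which can then be evaluated independently.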
The previous OGC API - Processes (v1.0) standard revision instead made use of a combination of the response
and transmissionMode parameters in the execution request body, as previously shown in table Example Job Execution Request Body.
In general, both approaches can be used interchangeably, but some combinations are not directly portable.
Whenever possible, it is recommended to employ the Prefer header that should provide higher interoperability
with latest service implementations using the same standard. However, given that transmissionMode and response
fields can allow more flexibility and strict control regarding how data is returned in specific edge cases, in contrast
to the Prefer header approach, both approaches remain available in Weaver.
See also
See the opengeospatial/ogcapi-processes#412 discussions for more details about each approach, their considerations, and potential side-effects.
Following is a detailed listing of the expected response structure according to requested parameters.
OGC API - Processes v2.0 |
OGC API - Processes v1.0 |
Amount and type of |
Results (see: important note) |
|
|---|---|---|---|---|
|
|
|
||
<any> |
<any> |
n/a |
0 |
empty [2] |
<none> |
<none> |
<none> |
1 |
|
n/a [6] |
|
<none> |
1 |
|
|
|
|
1 (literal) |
|
n/a [6] |
|
|
1 (complex) |
|
|
|
|
1 (complex) |
|
n/a [6] |
|
|
1 (literal) |
|
<none> |
<none> |
<none> |
>1 |
|
n/a [6] |
|
<none> |
>1 |
|
n/a [6] |
|
|
>1 |
|
|
|
|
>1 |
|
n/a [6] |
|
|
>1 |
|
<none> |
|
<none> |
<any> |
|
|
<none> |
<none> |
1 (literal) |
|
|
<none> |
<none> |
1 (complex) |
|
|
|
<none> |
<none> or >1 |
|
|
|
|
<none> or >1 (literal) |
|
|
|
|
<any> or >1 (complex) |
|
n/a [6] |
|
|
<any> (complex) |
|
n/a [6] |
|
|
<any> (literal) |
|
Important
Typically, clients will not use Prefer header and response/transmissionMode body parameters
simultaneously (although permitted), since they should be interchangeable in most situations.
The table indicates both OGC API - Processes v1.0/v2.0 variations to illustrate which combinations lead to the same result.
If a client happens to use both combinations simultaneously, the body parameters will take precedence
over the Prefer header for conflicting cases. This is in order to respect the fact that body parameters
are “hard requirements”, whereas Prefer is a “soft requirement” (i.e.: a preference) that does not
necessarily need to be respected if the server cannot resolve the combination.
Important
It is important not to confuse expected Results above with Responses.
The actual HTTP Response returned from the execution endpoint will depend on the requested Execution Mode.
A Job successfully resolved with synchronous execution will return the Results shown in the table
directly with an HTTP 200 OK or HTTP 204 No Content status (as applicable), whereas an asynchronous execution
will always return a Job Status Response with HTTP 201 Created or HTTP 202 Accepted
status (depending on whether the Job started immediately or is still pending).
In the case of a successfully completed asynchronous execution, a
subsequent Results Request using the Job Location
is needed to obtain the Results presented in the above table.
Note that a synchronous execution can also make use of the Results Request operation to obtain the outputs again at a later time, to request alternate output representations, or retrieve additional Job information such as logs and metadata.
Note
Combinations using <none> indicate that the parameter is omitted entirely from the request. When the value is provided but “does not matter” (i.e.: leading to the same outcome regardless), the <any> notation is used instead. The n/a notation indicates not applicable cases, due to a technical or logical impossibility.
Warning
When negotiating a single output as JSON, there is a potential ambiguity between Results representation and a single file’s data, such as in the case of a GeoJSON structure, both of which are encoded in JSON. Similar ambiguities could also occur for other Media-Types, depending on supported formats, such as representing Job results in XML, or retrieving a file’s data encoded as GML XML.
To avoid ambiguity, it is recommended that the response: document or response: raw
is explicitly set for such cases to ensure the result matches the desired outcome.
In the event that the Results Document representation is desired as JSON response for that single output result, Content Negotiation by Profile [7] can be employed to remove the ambiguity. Other Profile definitions could be added to remove further ambiguities with other JSON-like structures, but these are not explicitly handled by Weaver at this time.
Details
In summary, the Prefer and response parameters define how to return the results produced by the Process.
The Prefer header is also used by OGC API - Processes v2.0 to control how the results are encoded, whereas v1.0 relies on a
separate transmissionMode parameter. By reducing the amount of parameters involved, v2.0 makes the request easier
to submit with a single header (also used to indicate the Execution Mode), but limits certain representation
combinations only possible with v1.0. These limited representations can be retrieved by involving more advanced
Profile and Media-Type Content Negotiation techniques.
See also
Examples of typical contents for many of the combinations are provided under the Job Results section.
Execution Steps
Once the Job is submitted, its status should initially switch to accepted. This effectively means that the
Job is pending execution (task queued), but is not yet executing. When a worker retrieves it for execution, the
status will change to started for preparation steps (i.e.: allocating resources, retrieving required
parametrization details, etc.), followed by running when effectively reaching the execution step of the underlying
Application Package operation. This status will remain as such until the operation completes, either with
successful or failed status.
At any moment during asynchronous execution, the Job status can be requested using GET {WEAVER_URL}/processes/{processID}/jobs/{jobID} (GetStatus). Note that
depending on the timing at which the user executes this request and the availability of task workers, it could be
possible that the Job is already in running state, or has even failed if a problem was detected early.
When the Job reaches its final state, multiple parameters will be adjusted in the status response to indicate its completion, notably the completed percentage, the time it finished execution and its full duration. At that moment, the requests for retrieving either error details or produced outputs become accessible. Examples are presented in the Result section.
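The status progression described in this section lends itself to a simple polling loop. The sketch below is only illustrative: the function name is hypothetical, the status retrieval is injected as a callable (for instance, a function performing the GetStatus request and returning the parsed JSON body), and the final status names are taken from the text above as an assumption that may differ from the exact values reported by a given server version.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_job(
    fetch_status: Callable[[], Dict],
    final_statuses: Iterable[str] = ("successful", "failed"),
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the job status until it reaches a final state or polls are exhausted."""
    final = set(final_statuses)
    for attempt in range(max_polls):
        body = fetch_status()
        # accepted, started and running are intermediate states: keep waiting.
        if body.get("status") in final:
            return body
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError("job did not reach a final status in time")
```

Injecting fetch_status and sleep keeps the loop independent of any HTTP client and makes it easy to exercise without a running server.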
Workflow Step Operations
For each Type of Processes known by Weaver, specific Workflow step implementations must be provided.
In order to simplify the chaining procedure of file references, step implementations are only required to provide the relevant methodology for their Deploy, Execute, Monitor and Result operations. Operations related to staging of files, Process preparation and cleanup are abstracted away from specific implementations to ensure consistent functionalities between each type.
Operations are accomplished in the following order for each individual step:
Step Method |
Requirements |
Description |
|---|---|---|
|
I* |
|
|
R |
Retrieve input locations (considering remote files and Workflow previous-step staging). |
|
I* |
Perform operations on staged inputs to obtain desired format expected by the target Process. |
|
I* |
Perform operations on expected outputs to obtain desired format expected by the target Process. |
|
R,I |
Perform request for remote execution of the Process. |
|
R,I |
Perform monitoring of the Job status until completion. |
|
R,I |
Perform operations to obtain results location in the expected format from the target Process. |
|
R |
Retrieve results from remote Job for local storage using output locations. |
|
I* |
Perform any final steps before completing the execution or after failed execution. |
Note
All methods are defined within weaver.processes.wps_process_base.WpsProcessInterface.
Steps marked by * are optional.
Steps marked by R are required.
Steps marked by I are implementation dependant.
See also
weaver.processes.wps_process_base.WpsProcessInterface.execute() for the implementation of operations order.
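The ordering of step operations can be pictured with a template-method sketch. All class and method names below are placeholders chosen for illustration and should not be read as the actual WpsProcessInterface API; only the sequence of operations and the optional/required distinction follow the table above.

```python
from typing import Any, Dict, List


class StepSketch:
    """Hypothetical step runner; method names are placeholders for illustration."""

    def __init__(self) -> None:
        self.trace: List[str] = []

    # Optional hooks (I*): overridden only when a process type needs them.
    def prepare(self) -> None:
        self.trace.append("prepare")

    def format_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        self.trace.append("format_inputs")
        return inputs

    def format_outputs(self, outputs: List[str]) -> List[str]:
        self.trace.append("format_outputs")
        return outputs

    def cleanup(self) -> None:
        self.trace.append("cleanup")

    # Required staging steps (R).
    def stage_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        self.trace.append("stage_inputs")
        return dict(inputs)

    def stage_results(self, results: Dict[str, Any]) -> Dict[str, Any]:
        self.trace.append("stage_results")
        return results

    # Required and implementation dependant steps (R,I).
    def dispatch(self, inputs: Dict[str, Any], outputs: List[str]) -> str:
        raise NotImplementedError

    def monitor(self, job_ref: str) -> bool:
        raise NotImplementedError

    def get_results(self, job_ref: str) -> Dict[str, Any]:
        raise NotImplementedError

    def execute(self, inputs: Dict[str, Any], outputs: List[str]) -> Dict[str, Any]:
        try:
            self.prepare()
            staged = self.format_inputs(self.stage_inputs(inputs))
            expected = self.format_outputs(outputs)
            job_ref = self.dispatch(staged, expected)
            if not self.monitor(job_ref):
                raise RuntimeError("remote job failed")
            return self.stage_results(self.get_results(job_ref))
        finally:
            # Runs before completing the execution or after a failed one.
            self.cleanup()


class EchoStep(StepSketch):
    """Toy implementation that 'runs' the job by echoing its inputs."""

    def dispatch(self, inputs: Dict[str, Any], outputs: List[str]) -> str:
        self.trace.append("dispatch")
        self._inputs = inputs
        return "job-1"

    def monitor(self, job_ref: str) -> bool:
        self.trace.append("monitor")
        return True

    def get_results(self, job_ref: str) -> Dict[str, Any]:
        self.trace.append("get_results")
        return {"echo": self._inputs}
```

Calling EchoStep().execute({"a": 1}, ["echo"]) walks through the nine operations in order, with the cleanup hook always running last.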
File Reference Types
Most inputs can be categorized into two of the most commonly employed types, namely LiteralData and ComplexData.
The former represents basic values such as integers or strings, while the other represents a File or Directory
reference. Files in Weaver (and WPS in general) can be specified with any formats as Media-Types.
As for standard WPS, only remote File references are usually handled and limited to http(s) scheme,
unless the process takes a LiteralData input string and parses the unusual reference from its value to process it
by itself. On the other hand, Weaver supports all following reference schemes.
http(s)://
file://
s3://
opensearchfile:// [experimental]
Note
Handling of Directory type for above references is specific to Weaver.
Directories require specific formats and naming conditions as described in Directory Type.
Remote WPS could support it but their expected behaviour is undefined.
The method in which Weaver will handle such references depends on its configuration, in other words, whether it is running as ADES, EMS or HYBRID (see: Configuration), as well as depending on some other CWL package requirements. These use-cases are described below.
Warning
Missing schemes in URL references are handled as if file:// was used. In most cases, if not always,
an execution request should not employ this scheme unless the file is ensured to be at the specific location where
the running Weaver application can find it. This scheme is usually only employed as a byproduct of the fetch
operation that Weaver uses to provide the file locally to the underlying CWL application package to be
executed.
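The scheme resolution rule, including the fallback to file:// for references without a scheme, could be approximated as follows. The function name and the set of accepted schemes are assumptions made for this sketch (the vault scheme is included because it is discussed in later sections).

```python
from urllib.parse import urlparse

# Schemes mentioned in this documentation; treated as an assumption in this sketch.
SUPPORTED_SCHEMES = {"http", "https", "file", "s3", "vault", "opensearchfile"}


def reference_scheme(reference: str) -> str:
    """Return the effective scheme of a file reference."""
    scheme = urlparse(reference).scheme.lower()
    if not scheme:
        # A missing scheme is handled the same way as file://.
        return "file"
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported reference scheme: {scheme}")
    return scheme
```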
When Weaver is able to figure out that the Process needs to be executed locally in ADES mode, it will fetch all necessary files prior to process execution in order to make them available to the CWL package. When Weaver is in EMS configuration, it will always forward remote references (regardless of scheme) exactly as provided as input of the process execution request, since it assumes it needs to dispatch the execution to another ADES remote server, and therefore only needs to verify that the file reference is reachable remotely. In this case, it becomes the responsibility of this remote instance to handle the reference appropriately. This also avoids potential problems such as if Weaver as EMS doesn’t have authorized access to a link that only the target ADES would have access to.
When a CWL package defines WPS1Requirement under hints for corresponding WPS-1/2 remote
processes being monitored by Weaver (see also Remote Applications),
it will skip fetching of http(s)://-based references since that would otherwise lead
to useless double downloads (one on Weaver and the other on the WPS side). The same applies to
ESGF-CWTRequirement employed for ESGF-CWT processes. Because these processes do not always support S3
buckets, and because Weaver supports many variants of S3 reference formats, it will first fetch the S3
reference using its internal AWS Configuration, and then expose this downloaded file as http(s):// reference
accessible by the remote WPS process.
Note
When Weaver is fetching remote files with http(s)://, it can take advantage of additional
Request Options to support unusual or server-specific handling of remote reference as necessary.
This could be employed for instance to attribute access permissions only to some given ADES server by
providing additional authorization tokens to the requests. Please refer to Configuration of Request Options
for this matter.
Note
An exception to above mentioned skipped fetching of http(s):// files is when the corresponding Process
types are intermediate steps within a Workflow. In this case, local staging of remote results occurs between
each step because Weaver cannot assume that the remote Providers are able to communicate with each other,
according to potential Request Options or Data Source only configured for access by Weaver.
When using AWS S3 references, Weaver will attempt to retrieve the files using server AWS Configuration and AWS Credentials. Provided that the corresponding S3 bucket can be accessed by the running Weaver application, it will fetch the files and stage them locally temporarily for CWL execution.
Note
When using S3 buckets, authorization is handled through typical AWS credentials and role permissions. This means that AWS access must be granted to the application in order to allow it to fetch files. Please refer to Configuration of AWS S3 Buckets for more details.
Important
Different formats for AWS S3 references are handled by Weaver (see AWS S3 Bucket References).
They can be formed with generic s3:// and specific http(s):// with some reference to Amazon AWS endpoint.
When a reference with http(s)://-like scheme refers to an S3 bucket, it will be converted accordingly and
handled as any other s3:// reference. In the below Summary of File Type Handling Methods, these special HTTP-like
URLs should be understood as part of the s3:// category.
When using OpenSearch references, additional parameters are necessary to handle retrieval of specific file URL. Please refer to OpenSearch Data Source for more details.
The following table summarizes the default behaviour of input file reference handling in different situations when received as input argument of a process execution. For simplification, the keyword <any> is used to indicate that any other value in the corresponding column can be substituted for a given row when applied with conditions of other columns, which results in the same operational behaviour. Elements that behave similarly are also presented together in rows to reduce displayed combinations.
Configuration |
Process Type |
File Scheme |
Applied Operation |
|---|---|---|---|
<any> |
<any> |
|
Query and re-process [8] |
|
Convert to |
||
|
Nothing (unmodified) |
||
|
Fetch and convert to |
||
|
Convert to |
||
|
Nothing (file already local) |
||
|
Fetch and convert to |
||
|
|||
|
Convert to |
||
|
Convert to |
||
|
Nothing (unmodified, step will handle it) |
||
|
|||
|
|||
|
Convert to |
||
|
Nothing (unmodified) |
||
|
Fetch and convert to |
||
|
Convert to |
||
|
Nothing (unmodified) |
||
|
Fetch and convert to |
||
|
Convert to |
||
|
Convert to |
||
|
Nothing (unmodified, step will handle it) |
||
|
|||
|
Details
References defined by opensearchfile:// will trigger an OpenSearch query using the provided URL as
well as other additional input parameters (see OpenSearch Data Source). After processing of this query,
retrieved file references will be re-processed using the summarized logic in the table for the given use case.
When a file:// (or empty scheme) maps to a local file that needs to be exposed externally for
another remote process, the conversion to http(s):// scheme employs setting weaver.wps_output_url to form
the result URL reference. The file is placed in weaver.wps_output_dir to expose it as HTTP(S) endpoint.
Note that the HTTP(S) servicing of the file is not handled by Weaver itself. It is assumed that the server
where Weaver is hosted or another service takes care of this task.
When the process refers to a remote OGC API - Processes (WPS-REST, WPS-T, WPS-3) process (i.e.: remote WPS instance that supports
REST bindings but that is not necessarily an ADES), Weaver simply wraps and monitors its remote
execution, therefore files are handled just as for any other type of remote WPS-like servers. When the
process contains an actual CWL Application Package that defines a CommandLineTool class
(including applications with Docker image requirement), files are fetched as it will be executed locally.
See CWL CommandLineTool, OGC API - Processes (WPS-REST, WPS-T, WPS-3) and Remote Provider for further details.
When an s3:// file is fetched, it gets downloaded to a temporary file:// location, which is NOT
necessarily exposed as http(s)://. If execution is transferred to a remote process that is expected to not
support S3 references, only then the file gets converted as in [9].
When a vault://<UUID> file is specified, the local OGC API - Processes (WPS-REST, WPS-T, WPS-3) process can make use of it directly.
The file is therefore retrieved from the Vault using the provided UUID and access token to be passed
to the application. See File Vault Inputs and Uploading File to the Vault for more details.
When a vault://<UUID> file is specified, the remote process needs to access it using the hosted Vault endpoint.
Therefore, Weaver converts any vault reference to the corresponding location and inserts the access token in the
requests headers to authorize download from the remote server. See File Vault Inputs and Uploading File to the Vault
for more details.
Workflows are only available on EMS and HYBRID instances. Since they chain processes, no fetch is needed as the sub-step process will do it instead as needed. See Workflow process as well as CWL Workflow for more details.
Todo
method to indicate explicit fetch to override these? (https://github.com/crim-ca/weaver/issues/183)
File Reference Names
When processing any of the previous File Reference Types, the resulting name of the file after retrieval can depend on the applicable scheme. In most cases, the file name is simply the last fragment of the path, whether it is a URL, an S3 bucket or plainly a file directory path. The following cases are exceptions.
Changed in version 4.4: When using http(s):// references, the Content-Disposition header can be provided with filename
and/or filename* as specified by RFC 2183, RFC 5987 and RFC 6266 specifications in order to define a
staging file name. Note that Weaver takes this name only as a suggestion and will ignore the preferred name if it
does not conform to basic naming conventions for security reasons. As a general rule of thumb, common alphanumeric
characters and separators such as dash (-), underscores (_) or dots (.) should be employed to limit
chances of errors. If none of the suggested names are valid, Weaver falls back to the typical last fragment of
the URL as file name.
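The naming logic for http(s):// references can be sketched as below. This is an approximation under stated assumptions: the function name and the validation pattern are hypothetical (the actual rules applied by Weaver may be stricter or more permissive), and the header is parsed with the standard library email.message.Message.get_filename() helper.

```python
import re
from email.message import Message
from typing import Optional
from urllib.parse import urlparse

# Assumed "safe name" rule: alphanumeric start, then alphanumerics, dots, dashes, underscores.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


def staging_file_name(url: str, content_disposition: Optional[str] = None) -> str:
    """Pick the staged file name, preferring a valid Content-Disposition suggestion."""
    if content_disposition:
        message = Message()
        message["Content-Disposition"] = content_disposition
        suggested = message.get_filename()
        # The header is only a suggestion: ignore names that look unsafe.
        if suggested and _SAFE_NAME.match(suggested):
            return suggested
    # Fallback: last fragment of the URL path.
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
```

A suggested name such as "../../etc/passwd" fails the validation, so the last URL fragment is used instead.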
Added in version 4.27: References using any scheme can refer to a Directory. To do so, they must respect definitions in Directory Type.
When provided, all retrievable contents under that directory will be recursively staged.
When using s3:// references (or equivalent http(s):// referring to S3 bucket), the staged file names
will depend on the stored object names within the bucket. In that regard, naming conventions from AWS should be
respected.
When using vault://<UUID> references, the resulting file name will be obtained from the filename specified in
the Content-Disposition within the uploaded content of the multipart/form-data (RFC 7578) request.
See also
File Vault Inputs
See also
Refer to Uploading File to the Vault section for general details about the Vault feature.
Stored files in the Vault can be employed as input for Execution of a Process (Execute) operation using the
provided vault://<UUID> reference from the response following upload. The Execute
request must also include the X-Auth-Vault header to obtain access to the file.
Warning
Avoid using the Vault HTTP location as href input. Prefer the vault://<UUID> representation.
The direct Vault HTTP location SHOULD NOT be employed as input reference to a Process to ensure its proper interpretation during execution. There are two main reasons for this.
Firstly, using the plain HTTP endpoint will not provide any hint to Weaver about whether the input link is a generic remote file or one hosted in the Vault. With the lack of this information, Weaver could attempt to download the file to retrieve it for its local Process execution, creating unnecessary operations and wasting bandwidth since it is already available locally. Furthermore, the Vault behaviour that deletes the file after its download would cause it to become unavailable upon subsequent access attempts, as it could be the case during handling and forwarding of references during intermediate Workflow step operations. This could inadvertently break the Workflow execution.
Secondly, without the explicit Vault reference, Weaver cannot be aware of the necessary X-Auth-Vault
authorization needed to download it. Using the vault://<UUID> not only tells Weaver that it must forward any relevant
access token to obtain the file, but it also ensures that those tokens are not inadvertently sent to other locations.
Effectively, because the Vault can be used to temporarily host sensitive data for Process execution,
Weaver can better control and avoid leaking the access token to irrelevant resource locations such that only the
intended Job and specific input can access it. This is even more important in situations where multiple
Vault references are required, to make sure each input forwards the respective access token for retrieving
its file.
When submitting the Execute request, it is important to provide the X-Auth-Vault header
with additional reference to the Vault parameter when multiple files are involved. Tokens should be
separated by commas, as detailed below. When only one file refers to the Vault, the
parameters can be omitted since there is no need to map between tokens and distinct vault://<UUID> entries.
POST /processes/{process_id}/execution HTTP/1.1
Host: weaver.example.com
Content-Type: application/json
X-Auth-Vault: token <access-token-1>; id=<vault-uuid-1>,token <access-token-2>; id=<vault-uuid-2>

{
"mode": "async",
"response": "document",
"inputs": {"input-1": {"href": "vault://<vault-uuid-1>"}, "input-2": {"href": "vault://<vault-uuid-2>"}},
"outputs": {"out": {"transmissionMode": "reference"}}
}
The notation (RFC 5234, RFC 7230 Section 1.2) of the X-Auth-Vault header is presented below.
X-Auth-Vault = vault-unique / vault-multi
vault-unique = credentials [ BWS ";" OWS auth-param ]
vault-multi = credentials BWS ";" OWS auth-param 1*( "," OWS credentials BWS ";" OWS auth-param )
credentials = auth-scheme RWS access-token
auth-scheme = "token"
auth-param = "id" "=" vault-id
vault-id = UUID / ( DQUOTE UUID DQUOTE )
access-token = base64
base64 = <base64, see RFC 4648 Section 4>
DQUOTE = <DQUOTE, see RFC 7230 Section 1.2>
UUID = <UUID, see RFC 4122 Section 3>
BWS = <BWS, see RFC 7230 Section 3.2.3>
OWS = <OWS, see RFC 7230 Section 3.2.3>
RWS = <RWS, see RFC 7230 Section 3.2.3>
In summary, the access token can be provided by itself by omitting the Vault UUID parameter only if a single file is referenced across all inputs within the Execute request. Otherwise, every Vault reference must specify both its respective access token and UUID in a comma-separated list.
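A header value following this notation could be assembled with a helper similar to the following sketch. The function name is hypothetical, and the tokens and identifiers are expected to come from the responses of previous Vault uploads.

```python
from typing import Mapping


def build_vault_header(tokens_by_id: Mapping[str, str]) -> str:
    """Build the X-Auth-Vault value from a {vault UUID: access token} mapping."""
    if not tokens_by_id:
        raise ValueError("at least one Vault access token is required")
    if len(tokens_by_id) == 1:
        # Single file: the id parameter can be omitted.
        (token,) = tokens_by_id.values()
        return f"token {token}"
    # Multiple files: each token must be mapped to its Vault UUID.
    return ",".join(
        f"token {token}; id={vault_id}" for vault_id, token in tokens_by_id.items()
    )
```

Using a mapping keyed by UUID makes it harder to accidentally pair a token with the wrong Vault entry.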
AWS S3 Bucket References
File and directory references to AWS S3 items can be defined using one of the below formats.
They can use either the http(s):// or s3:// scheme, whichever one is deemed more appropriate by the user.
The reference format must match the location where the Bucket is hosted and from which it can be
accessed.
https://s3.{Region}.amazonaws.com/{Bucket}/[{dirs}/][{file-key}]
https://{Bucket}.s3.{Region}.amazonaws.com/[{dirs}/][{file-key}]
https://{AccessPointName}-{AccountId}.s3-accesspoint.{Region}.amazonaws.com/[{dirs}/][{file-key}]
https://{AccessPointName}-{AccountId}.{outpostID}.s3-outposts.{Region}.amazonaws.com/[{dirs}/][{file-key}]
s3://{Bucket}/[{dirs}/][{file-key}]
arn:aws:s3:{Region}:{AccountId}:accesspoint/{AccessPointName}/[{dirs}/][{file-key}]
arn:aws:s3-outposts:{Region}:{AccountId}:outpost/{OutpostId}/accesspoint/{AccessPointName}/[{dirs}/][{file-key}]
Warning
Using the s3:// with a Bucket name directly (without ARN) implies that the default profile from
the configuration will be used (see Configuration of AWS S3 Buckets).
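To illustrate how HTTP-like S3 references relate to their s3:// equivalent, the sketch below converts the first two formats listed above (path-style and virtual-hosted-style URLs). It is a simplified assumption-based example, not Weaver's converter: the function name is hypothetical, the Region is dropped from the result, bucket names containing dots are not handled, and access point, outpost and ARN formats are left out.

```python
import re
from typing import Optional

# https://s3.{Region}.amazonaws.com/{Bucket}/[{dirs}/][{file-key}]
_PATH_STYLE = re.compile(
    r"^https?://s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/(?P<bucket>[^/]+)/(?P<key>.*)$"
)
# https://{Bucket}.s3.{Region}.amazonaws.com/[{dirs}/][{file-key}]
_VIRTUAL_HOSTED = re.compile(
    r"^https?://(?P<bucket>[^./]+)\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/(?P<key>.*)$"
)


def to_s3_reference(url: str) -> Optional[str]:
    """Convert a recognized HTTP-like S3 URL to s3://, or return None otherwise."""
    for pattern in (_PATH_STYLE, _VIRTUAL_HOSTED):
        match = pattern.match(url)
        if match:
            return f"s3://{match['bucket']}/{match['key']}"
    return None
```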
OpenSearch Data Source
In order to provide OpenSearch query results as input to a Process for execution, the
corresponding Deploy request body must be provided with additionalParameters in order
to indicate how to interpret any specified metadata. The appropriate OpenSearch queries can then be applied
prior to the execution to retrieve the explicit file reference(s) of EOImage elements that have
been found and are to be submitted to the Job.
Depending on the desired context (application or per-input) over which the AOI, TOI, EOImage and multiple other metadata search filters are to be applied, their definition can be provided in the following locations within the Deploy body.
Context |
Location |
Role |
|---|---|---|
Application |
|
|
Input |
|
|
The distinction between application or per-input contexts depends entirely on the intended processing operation of the underlying Process, which is why they must be defined by the user deploying the process, since there is no way for Weaver to automatically infer how to employ the provided search parameters.
In each case, the structure of additionalParameters should be similar to the following definition:
{
"additionalParameters": [
{
"role": "http://www.opengis.net/eoc/applicationContext/inputMetadata",
"parameters": [
{
"name": "EOImage",
"values": [
"true"
]
},
{
"name": "AllowedCollections",
"values": "s2-collection-1,s2-collection-2,s2-sentinel2,s2-landsat8"
}
]
}
]
}
In each case, the role is also expected to correspond to the location where the definition is provided,
according to its context from the above table.
For each deployment, processes using EOImage to be processed into OpenSearch query results can interpret the following field definitions for mapping against respective inputs or application context.
Name |
Values |
Context |
Description |
|---|---|---|---|
|
|
Input |
Indicates that the nested parameters within the current |
|
String of comma-separated list of collection IDs. |
Input (same one as |
Provides a subset of collection identifiers that are supported. During execution any specified input not respecting one of the defined values will fail OpenSearch query resolution. |
|
|
Input (other one than |
String with the relevant OpenSearch query filter name according to the described input.
Defines a given Process input |
|
|
Application |
Indicates that provided |
|
|
Application |
Indicates that provided |
When an EOImage is detected for a given Process, any submitted Job execution will expect the
defined inputs in the Process description to indicate which images to retrieve for the application. Using
inputs defined with corresponding CatalogSearchField filters, a specific OpenSearch query will be sent to
obtain the relevant images. The inputs corresponding to search fields will then be discarded following
OpenSearch resolution. The resolved link(s) for the EOImage will be substituted within the id of the
input where EOImage was specified and will be forwarded to the underlying Application Package for execution.
Note
Collection identifiers are mapped against URL endpoints defined in configuration to execute the appropriate OpenSearch requests. See Configuration of Data Sources for more details.
See also
Definitions in OpenSearch Deploy request body provides a more detailed example of the expected structure and
relevant additionalParameters locations.
See also
Definitions in OpenSearch Examples provide different combinations of inputs, notably for using distinct
AOI, TOI and collections, with or without UniqueAOI and UniqueTOI specifiers.
BoundingBox Inputs
Todo
provide example and details, (crs, dimensions, etc.)
Todo
cross-reference Inputs/Outputs Type for more details/examples
Collection Inputs
The Collection Input is defined by the OGC API - Processes - Part 3: Workflows extension. This allows submitting a
Process Execution using the following JSON structure when the targeted Process
can make use of the resulting data sources retrieved from the referred Collection and processing conditions.
The collection keyword is employed to identify this type of input, in contrast to literal data and complex file
inputs respectively using value and href, as presented in the Process Execution
section.
{
"inputs": {
"image-input": {
"collection": "https://example.com/collections/sentinel-2"
}
}
}
Note
More properties can be provided with the collection, such as filter, sortBy, etc.
The OpenAPI definition in Weaver is defined with a minimal set of properties, since specific requirements
to be supported might need multiple OGC Testbed iterations to be established.
Also, different combinations of parameters will be supported depending on which remote API gets
interrogated to resolve the Collection contents. The OGC API - Processes - Part 3: Workflows is still under development,
and interactions with the various access points of OGC Web API standards remain to
be evaluated in detail to further explore interoperability concerns between all API implementations.
Refer to Examples for potential combinations and additional samples.
To determine which items should be retrieved from the Collection, whether they are obtained by
OGC API - Coverages, OGC API - Features, OGC API - Maps, OGC API - Tiles, STAC API Specification,
or any other relevant data access mechanisms defined by the OGC Web API standards,
depends on the negotiated Media-Types required by the corresponding input
in the Process Description, any relevant format indication,
and capabilities offered by the server referenced with the collection URL.
For example, if a Process input definition indicates that
it accepts GeoJSON (application/geo+json) as its contents Media-Type,
or contains a format: geojson-feature-collection indication within its schema definition,
the referenced collection would most probably need to resolve access to the data
using OGC API - Features (i.e.: with request GET /collections/dataset-features/items),
to retrieve relevant GeoJSON items as a FeatureCollection, which would then be
passed to the corresponding input of the Process.
However, depending on the capabilities of the server (e.g.: a STAC API instance or various extension support),
the POST /search or the POST /collections/dataset-features/search operations could be considered as well.
Alternatively, if an array of image/tiff; application=geotiff is expected by the Process input definition
and the submitted input provides a collection referring to a STAC API endpoint,
the STAC Assets matching the requested Media-Type could potentially be retrieved as input for
the Process Execution.
Contrary to an href input where the referenced URL (potentially pointing at a Collection
with predefined filtering query parameters) is directly accessed with a single request, the collection input offers
the option to the server to further negotiate and resolve the targeted Collection reference and the data it
contains. In some situations, such as when a very large amount of data needs to be accessed and retrieved with an
iterative paging or tiling approach, the collection input allows this operation to be resolved automatically
(as supported by the server), whereas the direct href reference would only return the limited content as directly
responded by the server hosting the Collection, and potentially not reflecting the actual intent of the user
submitting the Process Execution. Therefore, the collection allows the resolution of more
complex data access mechanisms that cannot be resolved by a single request or operation.
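The iterative paging mentioned above can be pictured with the following sketch, which keeps following `next` links until the server provides none, as is common for OGC API - Features item listings. The `iter_collection_items` function and the `fetch_json` callable are hypothetical stand-ins for a real HTTP client. This is not Weaver's internal implementation, only an illustration of why a single href request may return partial content.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_collection_items(
    items_url: str,
    fetch_json: Callable[[str], Dict],
) -> Iterator[Dict]:
    """Yield every feature across all pages, following 'next' links."""
    url: Optional[str] = items_url
    while url:
        page = fetch_json(url)
        yield from page.get("features", [])
        # Look for a link with rel="next"; stop when the server provides none.
        url = next(
            (
                link["href"]
                for link in page.get("links", [])
                if link.get("rel") == "next"
            ),
            None,
        )
```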
In summary, the Collection Input offers a lot of flexibility with its resolution compared to
the typical Input Types (i.e.: Literal, BoundingBox, Complex) that must be explicitly
specified. However, its capability to auto-resolve multiple Media-Types negotiations, formats, data structures,
data cardinality and API access mechanisms simultaneously can make its behavior hard to predict.
Hint
In order to evaluate the expected resolution of a Collection
prior to including it into a complex Process or Workflow execution, the Builtin
weaver.processes.builtin.collection_processor can be employed to test its result.
This function will be used under-the-hood whenever a Collection Input is specified.
Since the Builtin Process only performs the resolution of the collection into the corresponding
data sources for the target Process, without actually downloading the resolved URL references,
using it can potentially help identify and avoid unintended large processing, or allow users to validate that
the defined filter (or any other below parameters) produces the appropriate data retrieval strategy for the
desired execution purpose.
See also
The Examples section further demonstrates how to apply Collection Input and how its parameters can help produce various result combinations.
Note
Do not hesitate to submit a new issue if the Collection Input resolution does not seem to behave according to your specific use cases.
Format Selection
For cases where the resolution does not automatically resolve with the intended behavior, any submitted Collection Input can include the following additional parameters to hint the resolution toward certain outcomes.
Parameter |
Description |
|---|---|
|
Indicates the desired Media-Type to resolve and extract from the Collection Input. This can be used in situations where the target Process receiving the Collection as input supports multiple compatible Media-Types, and that the user wants to explicitly indicate which one would be preferred, or to limit combinations to a certain Media-Type when multiple matches are resolved simultaneously. |
|
Indicates the desired schema to resolve and extract from the Collection Input.
This can be used similarly to |
|
Indicates the preferred data access mechanism to employ amongst
|
Filtering
When adding a filter parameter along the collection reference, it is possible to provide filtering conditions
to limit the items to be extracted from the Collection. See the Examples for samples.
In the event that a filter contains coordinates that do not employ the
commonly employed default CRS of EPSG:4326 (or CRS84/CRS84h equivalents),
the filter-crs parameter can be specified to provide the applicable CRS.
Note
Weaver will not itself interpret the filter-crs beside transforming between URI and
common short name representations to ensure the remote API can properly resolve the intended reference.
If a filter-crs is provided, it is up to the remote API receiving it to interpret it and the
referenced coordinates within filter correctly.
If the targeted server by the collection URL cannot resolve the CRS, the user will need
to convert it themselves to make it appropriate according to the target server capabilities.
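The conversion between URI and short name representations described in this note could look like the following sketch. The function names and the set of recognized patterns are assumptions for illustration, not Weaver's actual code. Values that are not recognized are returned unchanged, in line with the remote API being responsible for interpreting them.

```python
import re

# Patterns assumed for this example: OGC definition URIs for EPSG and CRS84 codes.
_EPSG_URI = re.compile(r"https?://www\.opengis\.net/def/crs/EPSG/0/(\d+)")
_OGC_URI = re.compile(r"https?://www\.opengis\.net/def/crs/OGC/[\d.]+/(CRS84h?)")
_EPSG_SHORT = re.compile(r"EPSG:(\d+)", re.IGNORECASE)


def crs_to_short_name(crs: str) -> str:
    """Return 'EPSG:<code>', 'CRS84' or 'CRS84h' for a known URI, else the input."""
    match = _EPSG_URI.fullmatch(crs)
    if match:
        return f"EPSG:{match.group(1)}"
    match = _OGC_URI.fullmatch(crs)
    if match:
        return match.group(1)
    return crs


def crs_to_uri(crs: str) -> str:
    """Return the definition URI for 'EPSG:<code>' or 'CRS84', else the input."""
    match = _EPSG_SHORT.fullmatch(crs)
    if match:
        return f"http://www.opengis.net/def/crs/EPSG/0/{match.group(1)}"
    if crs.upper() == "CRS84":
        return "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
    return crs
```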
The filter-lang parameter can be employed to indicate which language encoding is specified in filter.
At the moment, the following languages (case-insensitive) are handled in Weaver using pygeofilter.
Name and Reference |
Value for |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note
Although there are a lot of “Common Query Language” (CQL) variations, most of them only imply minimal variations between some operations, sometimes allowing alternate or additional syntax and/or operators.
Because most OGC Web API standards rely extensively on CQL2-JSON or CQL2-Text encodings, and because most of these languages share common bases that can be easily translated, all language variants will be converted to an appropriate and equivalent CQL2-based definition before submitting it to the Collection resolution operation.
Examples
The following section presents some examples of potential Collection Input definitions that could be used for Process Execution, and some explanation about their expected resolution.
The following example presents the use of a filter encoded with CQL2-JSON, used to limit the retrieved
geometries only to Feature instances that intersect the specified polygon. Any Feature that was matched
should also be sorted in descending order of their respective id property, according to the sortBy parameter.
Furthermore, the OGC API - Features resolution is requested using the format parameter. Because it is
expected from this API that a GeoJSON FeatureCollection document would be returned,
the features input of the Process receiving this result should support application/geo+json
or a similar schema definition for this execution request to be successful. Since this Media-Type
is the default value returned by OGC API - Features, the type does not need to be set explicitly.
{
"inputs": {
"features": {
"collection": "https://example.com/collections/dataset-features",
"format": "ogc-feature-collection",
"filter": {
"op": "s_intersects",
"args": [
{"property": "geometry"},
{
"type": "Polygon",
"coordinates": [ [ [30, 10], [40, 40], [20, 40], [10, 20], [30, 10] ] ]
}
]
},
"filter-crs": "https://www.opengis.net/def/crs/OGC/1.3/CRS84",
"filter-lang": "cql2-json",
"sortBy": "-id"
}
}
}
The following example presents a filter encoded with CQL2-Text, which aims to return only elements
that contain a property matching the eo:cloud_cover < 0.1 criteria from the Collection
named sentinel-2. In this case, the STAC API Specification is indicated by the format. Therefore,
STAC Items defined under that Collection are expected to be considered if their properties respect
the eo:cloud_cover filter. However, the Media-Type defined by type corresponding to Cloud Optimized GeoTIFF (COG)
is also specified, meaning that the result from the Collection Input resolution is not
the GeoJSON STAC Items themselves, but the STAC Assets they respectively contain, and that match
this GeoTIFF type.
Therefore, the definition of the Process input images should support an array of GeoTIFF images,
for this resolution to succeed, and proceed to execute the Process using them.
{
"inputs": {
"images": {
"collection": "https://example.com/collections/sentinel-2",
"format": "stac-collection",
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"filter": "properties.eo:cloud_cover < 0.1",
"filter-lang": "cql2-text"
}
}
}
Collection Outputs
Todo
Not implemented. See crim-ca/weaver#683.
Multiple Inputs
Todo
repeating IDs example for WPS multi-inputs
See also
Multiple Outputs
Warning
In this section, Multiple Outputs refer to multiple value or reference items under a single {outputID}.
This is not to be confused with a Process which has multiple and distinct {outputID} under its outputs
definition, which is supported by all CWL, WPS and OGC API - Processes representations.
Although CWL allows output type: array, WPS does not support it directly. According to WPS
specification, only a single value is allowed under each corresponding outputs ID. Adding more than one <wps:Data>
or <wps:ComplexData> definition causes undefined behavior.
To work around this limitation, there are two potential solutions.
1. Use a “container” format, such as Metalink or application/zip.
This method essentially “packages” the resulting files from a CWL operation into a single type: File, therefore avoiding the array type entirely, and making the resulting WPS compliant with a single ComplexData reference.
However, that approach requires that the Application Package itself handles the creation of the selected file “container” format. Weaver will not automatically perform this step. Also, this approach can be limiting for cases where the underlying items in the array are literal values rather than File, since that would require embedding the literal data within a text/plain file before packaging them. Furthermore, chaining this kind of output to another step input in a Workflow would also require that the input respects the same media-type, and that the Application Package receiving that input handles by itself any necessary unpacking of the relevant “container” format.
Whether this approach is appropriate depends on user-specific requirements.
See also
For more details regarding the Metalink format and how to use it, see PyWPS Multiple Outputs.
2. Let Weaver transparently embed the CWL array as a single value ComplexData.
Added in version 5.5.
This method relies on encoding the resulting CWL array output into its corresponding string representation, and transforms the WPS output into a ComplexData containing this JSON “string” instead of a File. When obtaining the result from the WPS interface, the output will therefore be represented as a single raw string value to respect the specification. Once this output is retrieved with the OGC API - Processes interface, it will be automatically unpacked into its original JSON array form for the HTTP response. From the point of view of a user interacting only with OGC API - Processes, the transition between CWL and WPS will be transparent. Users of the WPS would need to perform a manual JSON parsing (e.g.: json.loads()) of the string to obtain the array.
To disambiguate from a ComplexData that could be an actual single-value JSON (i.e.: a Process returning any JSON-like media-type, such as application/geo+json), Weaver will employ the special media-type application/raw+json to detect this embedded JSON strategy used to represent the CWL array. Other JSON-like media-types will remain unmodified.
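For users of the WPS interface, the manual parsing step of the embedded array described above can be sketched as follows. The `unpack_wps_output` helper is hypothetical; only the application/raw+json media-type and the use of json.loads() come from the description above.

```python
import json
from typing import Any

RAW_JSON_MEDIA_TYPE = "application/raw+json"


def unpack_wps_output(data: str, media_type: str) -> Any:
    """Decode an embedded CWL array; leave any other output untouched."""
    # Compare only the base media-type, ignoring parameters such as charset.
    base_type = media_type.split(";", 1)[0].strip().lower()
    if base_type == RAW_JSON_MEDIA_TYPE:
        return json.loads(data)
    return data
```

Other JSON-like media-types, such as application/geo+json, are deliberately returned as received, since they represent an actual single-value document rather than an embedded array.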
See also
Outputs Location
By default, Job results will be hosted under the endpoint configured by weaver.wps_output_url and
weaver.wps_output_path, and will be stored under the directory defined by the weaver.wps_output_dir setting.
Warning
Hosting of results from the file system is NOT handled by Weaver itself. The API will only report the
expected endpoints using configured weaver.wps_output_url. It is up to an alternate service or the platform
provider that serves the Weaver application to provide the external hosting and availability of files online
as desired.
Each Job will have its specific UUID employed for all of its output files, logs and status in order to avoid conflicts. Therefore, outputs will be available at the following locations:
{WPS_OUTPUT_URL}/{JOB_UUID}.xml # status location
{WPS_OUTPUT_URL}/{JOB_UUID}.log # execution logs
{WPS_OUTPUT_URL}/{JOB_UUID}/{outputID}/{output.ext} # results of the job if successful
Note
The WPS_OUTPUT_URL value in the above example is resolved according to weaver.wps_output_url,
weaver.wps_output_path and weaver.url, as per Configuration Settings details.
When submitting a Job for execution, it is possible to provide the X-WPS-Output-Context header.
This modifies the output location to be nested under the specified directory or sub-directories.
For example, providing X-WPS-Output-Context: project/test-1 will result in outputs located at:
{WPS_OUTPUT_URL}/project/test-1/{JOB_UUID}/{outputID}/{output.ext}
Note
Values provided by X-WPS-Output-Context can only contain alphanumeric characters, hyphens, underscores and path
separators that will result in valid directory and URL locations. The path is assumed relative by design to be
resolved under the WPS output directory, and will therefore reject any . or .. path references.
The path also CANNOT start with /. In such cases, an HTTP error will be immediately raised indicating
the symbols that were rejected when detected within the X-WPS-Output-Context header.
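The constraints above can be approximated with a single regular expression, as in the following sketch. The pattern, the function names, and the rejection of empty segments or a trailing separator are assumptions for this example rather than Weaver's exact validation rules.

```python
import re

# One or more segments of [A-Za-z0-9_-], separated by single "/" characters.
# Dots are not allowed at all, which also rejects "." and ".." references.
_CONTEXT_PATTERN = re.compile(r"[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*")


def is_valid_output_context(context: str) -> bool:
    """Check a candidate X-WPS-Output-Context header value."""
    return _CONTEXT_PATTERN.fullmatch(context) is not None


def output_location(wps_output_url: str, context: str, job_uuid: str) -> str:
    """Compose the base location of a Job's outputs, with an optional context."""
    if context and not is_valid_output_context(context):
        raise ValueError(f"invalid X-WPS-Output-Context: {context!r}")
    parts = [wps_output_url.rstrip("/")]
    if context:
        parts.append(context)
    parts.append(job_uuid)
    return "/".join(parts)
```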
If desired, parameter weaver.wps_output_context can also be defined in the Configuration Settings in order to employ
a default directory location nested under weaver.wps_output_dir when X-WPS-Output-Context header is omitted
from the request. By default, this parameter is not defined (empty) in order to store Job results directly under
the configured WPS output directory.
Note
Header X-WPS-Output-Context is ignored when using S3 buckets for output location since they are stored
individually per Job UUID, and hold no relevant context location. See also Configuration of AWS S3 Buckets.
Changed in version 4.3: Addition of the X-WPS-Output-Context header.
Notification Subscribers
When submitting a Job for execution, it is possible to provide the notification_email field.
Doing so will tell Weaver to send an email to the specified address with successful or failure details
upon Job completion. The format of the email is configurable from weaver.ini.example file with
email-specific settings (see: Email Configuration).
Alternatively to notification_email, the subscribers field of the API can be employed during Job
submission. Using this field will take precedence over notification_email for corresponding email and status
combinations. The Job subscribers allow more fine-grained control over which emails will be sent for
the various combinations of Job status milestones.
Furthermore, subscribers allow specifying URLs where HTTP(S) callback requests (i.e.: webhooks) will be sent with
the Job Status or Job Results contents directly in JSON format.
This allows users and/or servers to directly receive the necessary details using a push-notification mechanism instead
of the polling-based method on the Job Status endpoint otherwise required to obtain updated
Job details.
See also
Refer to the OpenAPI Documentation of the POST {WEAVER_URL}/processes/{processID}/execution (Execute) request for all available subscribers properties.
Job Management
This section presents capabilities related to Job management. The endpoints and related operations are defined in a mixture of OGC API - Processes Core requirements, some official extensions, and further Weaver-specific capabilities.
See also
Submitting a Job Creation
Important
All concepts introduced in the Execution of a Process also apply in this section. Consider reading the subsections for more specific details.
This section will only cover additional concepts and parameters applicable only for this feature.
Rather than using the POST {WEAVER_URL}/processes/{processID}/execution (Execute) request, the POST {WEAVER_URL}/jobs (Create) request can be used to submit a Job.
When doing so, all parameters typically required for Process execution must also be provided, including
any relevant Execution Body contents (I/O), the desired Execution Mode, and
the Execution Results options. However, an additional process URL in the request body is required,
to indicate which Process should be executed by the Job.
The POST {WEAVER_URL}/jobs (Create) operation allows interoperability alignment with other execution strategies, such as defined
by the openEO API Profiles and the OGC Testbed-20 - GeoDataCubes (GDC) API Profile. It also opens the door for advanced Workflow
definitions from a common Job endpoint interface, as described by the OGC API - Processes - Part 4: Job Management extension.
Furthermore, an optional "status": "create" request body parameter can be supplied to indicate to the Job
that it should remain in a pending state, until a later Job Execution Trigger is performed
to start its execution. This allows the user to apply any desired Job Updates or to review
the resolved Job Inputs prior to submitting the Job. This acts in contrast to
the Core POST {WEAVER_URL}/processes/{processID}/execution (Execute) operation that immediately places the Job in queue, locking it from any update.
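A minimal sketch of such a Job creation request is shown below, using only the Python standard library. The request is built but not sent. The `build_job_create_request` helper, the example URLs and the `message` input are placeholders; the `process` and `status` body fields follow the description above.

```python
import json
import urllib.request
from typing import Any, Dict


def build_job_create_request(
    weaver_url: str,
    process_url: str,
    inputs: Dict[str, Any],
    pending: bool = True,
) -> urllib.request.Request:
    """Prepare a POST {WEAVER_URL}/jobs request for the given Process."""
    body: Dict[str, Any] = {"process": process_url, "inputs": inputs}
    if pending:
        # Keep the Job pending until it is explicitly triggered.
        body["status"] = "create"
    return urllib.request.Request(
        url=f"{weaver_url.rstrip('/')}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_job_create_request(
    "https://example.com",
    "https://example.com/processes/example-process",
    {"message": "hello"},  # placeholder input for a hypothetical process
)
```

Passing the returned object to `urllib.request.urlopen` would submit it to a running instance.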
Updating a Job
The PATCH {WEAVER_URL}/jobs/{jobID} (Update) request allows updating the Job and its underlying parameters prior to execution.
For this reason, the Job must be in the created Job Status, such that
it is pending a Job Execution Trigger before being sent to the worker execution queue.
For any status other than created, attempts to modify the Job will return an HTTP 423 Locked error
response.
Potential parameters that can be updated are:
Submitted Process inputs
Desired outputs formats and representations, as per Execution Results
Applicable headers, response and mode options, as per Execution Mode
Additional metadata, such as a custom Job title
After updating the Job, the Job Status and Job Inputs operations can further be performed to review the pending Job state. Using all those operations allows the user to iteratively adjust the Job until it is ready for execution, for which the Job Execution Trigger would then be employed.
Triggering Job Execution
The POST {WEAVER_URL}/jobs/{jobID}/results (Trigger) request allows submitting a pending Job to the worker execution queue. Once performed,
the typical Monitoring a Job Execution (GetStatus) operation can be employed, until eventual success or failure of the Job.
If the Job was already submitted, is already in queue, is actively running, or already finished execution, this operation will return an HTTP 423 Locked error response.
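A client may want to treat the 423 Locked response as a benign “already triggered” condition rather than a failure. The sketch below shows one way to do so. The `trigger_job` function and the `post` callable (which performs the HTTP request and returns its status code) are hypothetical, and any response code below 400 is assumed to indicate success.

```python
from typing import Callable

HTTP_LOCKED = 423


def trigger_job(weaver_url: str, job_id: str, post: Callable[[str], int]) -> bool:
    """Return True if the Job was queued, False if it was already locked."""
    status_code = post(f"{weaver_url.rstrip('/')}/jobs/{job_id}/results")
    if status_code == HTTP_LOCKED:
        # Job already submitted, queued, running or finished.
        return False
    if status_code >= 400:
        raise RuntimeError(f"unexpected HTTP {status_code} when triggering job {job_id}")
    return True
```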
Monitoring a Job Execution (GetStatus)
Monitoring the execution of a Job consists of polling the status Location provided from the
Execute or Trigger operation and verifying the
indicated status for the expected result.
The status can correspond to any of the values defined by weaver.status.Status
according to the internal state of the workers processing their execution, and the
negotiated Alternate Job Status representation.
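The polling approach can be sketched as a small loop, shown below. The `wait_for_job` function and the `fetch_status` callable (expected to return the parsed JSON status body) are hypothetical. The set of terminal statuses is an assumption based on the OGC API - Processes status codes; the values actually returned depend on weaver.status.Status and the negotiated representation.

```python
import time
from typing import Callable, Dict

# Assumed terminal statuses, following OGC API - Processes status codes.
TERMINAL_STATUSES = {"successful", "failed", "dismissed"}


def wait_for_job(
    fetch_status: Callable[[], Dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the Job status until it reaches a terminal state or times out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still '{status.get('status')}' after {waited}s")
        sleep(interval)
        waited += interval
```

The `sleep` parameter is injectable only to keep the sketch easy to test without real delays.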
When targeting a Job submitted to a Weaver instance, monitoring is usually accomplished through
the OGC API - Processes endpoint using GET {WEAVER_URL}/processes/{processID}/jobs/{jobID} (GetStatus), which will return a JSON body.
Alternatively, the XML status location document returned by the WPS Endpoint could also be
employed to monitor the execution.
In general, both endpoints should be interchangeable, using below mapping. The Job monitoring process
keeps both contents equivalent according to their standard. For convenience, requesting the execution with an
Accept: <content-type> header corresponding to either JSON or XML should redirect to the response
of the relevant endpoint, regardless of where the original request was submitted. Otherwise, the default contents
format is employed according to the chosen location.
Standard |
Contents |
Location |
|---|---|---|
|
||
|
See also
For the WPS endpoint, refer to Configuration Settings.
Following are examples for both representations. Note that results might vary according to other parameters such as when using Alternate Job Status, or when different Process references or Workflow definitions are involved.
{
"$schema": "https://schemas.opengis.net/ogcapi/processes/part1/1.0/openapi/schemas/statusInfo.yaml",
"jobID": "a305ef3e-3220-4d43-b1be-301f5ef13c23",
"processID": "example-process",
"providerID": null,
"type": "process",
"status": "successful",
"message": "Job successful.",
"created": "2024-10-02T14:21:12.380000+00:00",
"started": "2024-10-02T14:21:12.990000+00:00",
"finished": "2024-10-02T14:21:23.629000+00:00",
"updated": "2024-10-02T14:21:23.630000+00:00",
"duration": "00:00:10",
"runningDuration": "PT11S",
"runningSeconds": 10.639,
"percentCompleted": 100,
"progress": 100,
"links": [
{
"title": "Job status.",
"hreflang": "en-CA",
"href": "https://example.com/processes/download-band-sentinel2-product-safe/jobs/a305ef3e-3220-4d43-b1be-301f5ef13c23",
"type": "application/json",
"rel": "status"
},
{
"title": "Job monitoring location.",
"hreflang": "en-CA",
"href": "https://example.com/processes/download-band-sentinel2-product-safe/jobs/a305ef3e-3220-4d43-b1be-301f5ef13c23",
"type": "application/json",
"rel": "monitor"
},
{
"title": "Job results of successful process execution (direct output values mapping).",
"hreflang": "en-CA",
"href": "https://example.com/processes/download-band-sentinel2-product-safe/jobs/a305ef3e-3220-4d43-b1be-301f5ef13c23/results",
"type": "application/json",
"rel": "http://www.opengis.net/def/rel/ogc/1.0/results"
}
]
}
<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse
xmlns:wps="http://www.opengis.net/wps/1.0.0"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd"
service="WPS"
version="1.0.0"
xml:lang="en-CA"
serviceInstance="https://example.com/wps?request=GetCapabilities&amp;service=WPS"
statusLocation="https://example.com/wpsoutputs/fd6e2171-39a7-4f99-b639-c43b51f15b56.xml">
<wps:Process wps:processVersion="1.1.0">
<ows:Identifier>example-process</ows:Identifier>
</wps:Process>
<wps:Status creationTime="2024-09-19T14:29:49Z">
<wps:ProcessSucceeded>Package operations complete.</wps:ProcessSucceeded>
</wps:Status>
<wps:ProcessOutputs>
<wps:Output>
<ows:Identifier>output</ows:Identifier>
<ows:Title>output</ows:Title>
<wps:Reference
href="https://example.com/wpsoutputs/fd6e2171-39a7-4f99-b639-c43b51f15b56/output/data.json"
mimeType="application/json" encoding="" schema=""
/>
</wps:Output>
</wps:ProcessOutputs>
</wps:ExecuteResponse>
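When monitoring through the XML status document, the state and output references can be extracted with the standard library, as in the following sketch. The `parse_wps_status` function is a hypothetical helper written against the structure of the example above, not a complete WPS client.

```python
import xml.etree.ElementTree as ET

NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def parse_wps_status(xml_text: str) -> dict:
    """Extract the status element name and output references from an ExecuteResponse."""
    root = ET.fromstring(xml_text)
    status = root.find("wps:Status", NS)
    state = None
    if status is not None and len(status):
        # The child of wps:Status names the state, e.g. ProcessSucceeded.
        state = status[0].tag.split("}", 1)[-1]
    outputs = {}
    for output in root.findall("wps:ProcessOutputs/wps:Output", NS):
        identifier = output.findtext("ows:Identifier", namespaces=NS)
        reference = output.find("wps:Reference", NS)
        if reference is not None:
            # Accept both a plain and an xlink-qualified href attribute.
            outputs[identifier] = reference.get("href") or reference.get(XLINK_HREF)
    return {"state": state, "outputs": outputs}
```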
Alternate Job Status
In order to support alternate Job status representations, the following approaches can be used when performing
the GET {WEAVER_URL}/processes/{processID}/jobs/{jobID} (GetStatus) request.
Specify either a profile or schema query parameter (e.g.: /jobs/{jobID}?profile=openeo).
Specify a profile parameter within the Accept header (e.g.: Accept: application/json; profile=openeo).
Using the openEO profile, for example, will return status values that are appropriate
as per the openEO API Profiles definition.
When performing Job Status requests, the received response should
contain a Content-Schema header indicating which profile is being represented.
This header is employed because multiple Content-Type: application/json headers are applicable
across multiple API implementations and status representations.
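Both ways of requesting an alternate status representation can be prepared as in the following sketch. The two helper functions are hypothetical and only return the URL and headers to use; the query parameter and Accept header forms are the ones listed above.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def status_request_by_query(weaver_url: str, job_id: str, profile: str) -> Tuple[str, Dict[str, str]]:
    """Select the profile using the 'profile' query parameter."""
    query = urlencode({"profile": profile})
    url = f"{weaver_url.rstrip('/')}/jobs/{job_id}?{query}"
    return url, {"Accept": "application/json"}


def status_request_by_header(weaver_url: str, job_id: str, profile: str) -> Tuple[str, Dict[str, str]]:
    """Select the profile using a parameter of the Accept header."""
    url = f"{weaver_url.rstrip('/')}/jobs/{job_id}"
    return url, {"Accept": f"application/json; profile={profile}"}
```

In either case, a client can then inspect the Content-Schema header of the response to confirm which profile was applied.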
Obtaining Job Details and Metadata
All endpoints to retrieve any of the following information about a Job can either be requested directly
(i.e.: /jobs/{jobID}/...) or with equivalent Provider and/or Process prefixed endpoints,
if the requested Job did refer to those Provider and/or Process.
A local Process would have its Job references as /processes/{processId}/jobs/{jobID}/...
while a Remote Provider will use /provider/{providerName}/processes/{processId}/jobs/{jobID}/....
Job Outputs
Note
This endpoint is a Weaver-specific implementation provided for convenience. For the OGC API - Processes compliant endpoint, refer to Job Results.
In the case of successful Job execution, the outputs can be retrieved with GET {WEAVER_URL}/jobs/{jobID}/outputs (Outputs) request to list
each corresponding output id with the generated file reference URL. Keep in mind that the purpose of those URLs is
only to fetch the results (not persistent storage), and could therefore be purged after some reasonable amount of time.
The format should be similar to the following example, with minor variations according to Configuration
parameters for the base WPS output location:
{
"outputs": [
{
"id": "output-file",
"href": "https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output_netcdf.nc",
"type": "application/x-netcdf"
},
{
"id": "output-data",
"value": 3.1416
}
]
}
The Job Outputs endpoint can receive additional query parameters,
such as schema=OGC+strict (see weaver.processes.constants.JobInputsOutputsSchema for other values),
which allows it to return contents formatted slightly differently, to imitate the JSON mapping representation
(rather than the array) used by the Execution Results endpoint as if response=document was specified
during submission of the Process execution. However, this JSON mapping will still employ a
nested outputs property, as presented below.
{
"outputs": {
"output-file": {
"href": "https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output_netcdf.nc",
"type": "application/x-netcdf"
},
"output-data": 3.1416
}
}
Because these responses nest the results under outputs (in contrast to Job Results
returning {outputID} directly at the root), other information can be returned, such as relevant links
with references to Job Inputs, Job Logs, Job Status,
or the source Process Description that produced returned Job outputs.
In the event of a Job executed with response=document or Prefer: return=minimal, the contents
of Job Results will be very similar to the above Output Mapping contents,
but with respective {outputID} returned directly at the root, instead of nesting them under outputs.
On the other hand, a Job submitted with response=raw, Prefer: return=representation or other
combinations of Accept headers and transmissionMode parameters, can produce
many alternative content variations (see Execution Results) to respect OGC compliance requirements.
The structure of contents received from Job Results responses can also surprisingly vary according to
the number of requested outputs, the submitted request parameters, and the alternative Media-Type, schema
or literal data supported by each respective output.
For this reason, the Job Outputs endpoint will always provide all data and links in the
response body using the minimal representation as shown by above JSON examples,
no matter which request parameters were originally submitted to execute the Job.
In other words, the contents of a complex File Reference (such as the “output_netcdf.nc”)
will never be directly returned inline/by-value in the JSON response when using the Job Outputs
endpoint, and will always use the document/minimal file link. Similarly, a literal data value will never be
returned by link reference, nor be returned directly as the response contents. An output of literal data will
always have its value included inline in the JSON document. This behavior is performed to offer a
simplified data access mechanism without having to deal with all possible combinations of data representations
potentially returned by Execution Results.
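As a minimal sketch of this guarantee, the helper below separates the File References from the literal values found in a Job Outputs document. The function name `split_job_outputs` is hypothetical (it is not part of Weaver), and the sketch assumes that a File Reference is any JSON object holding an `href` field, as in the example above.

```python
def split_job_outputs(document):
    """Separate File References from literal values in a Job Outputs body."""
    references, literals = {}, {}
    # Job Outputs always nests the results under the 'outputs' property.
    for output_id, output in document.get("outputs", {}).items():
        if isinstance(output, dict) and "href" in output:
            # Assumption: an object with 'href' is a File Reference.
            references[output_id] = output["href"]
        else:
            # Literal data is always included inline.
            literals[output_id] = output
    return references, literals


sample = {
    "outputs": {
        "output-file": {
            "href": "https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output_netcdf.nc",
            "type": "application/x-netcdf",
        },
        "output-data": 3.1416,
    }
}
references, literals = split_job_outputs(sample)
```

Because the representation is fixed for this endpoint, a client using such a helper does not need to inspect the parameters that were submitted with the Job.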
Job Results
This corresponds to the OGC API - Processes compliant endpoint, using the GET {WEAVER_URL}/jobs/{jobID}/results (Results) request.
Contrary to Job Outputs, where the JSON document representation is always enforced,
this endpoint will respond according to the submitted Job parameters.
See also
This section presents examples of the most typical result combinations. For an exhaustive list of expected content results and resolution behaviors, according to submitted execution parameters, refer to the Execution Results section.
In the event of a Job executed with response=document or Prefer: return=minimal with multiple outputs,
the contents will typically be a JSON mapping representation, where each requested {outputID} can be
found either as value or reference, depending on how they were requested or resolved according
to Execution Results. An example of such results is presented below.
document response with minimal representation
{
"output-file": {
"href": "https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output_netcdf.nc",
"type": "application/x-netcdf"
},
"output-data": 3.1416
}
Note
The {outputID} are returned at the root of the contents using this representation,
contrary to the Job Outputs endpoint that nests them under outputs.
When a Job is executed with response=raw, or when the requested outputs [1] consisted only of
a single {outputID}, the returned data will directly
be the contents of the produced file, or literal value, as applicable according to the schema definition of the
corresponding output in the Process Description.
The following result will be obtained if any of the following conditions are encountered:
- The result is a complex File Reference and the Prefer: return=representation header was used.
- The result is a complex File Reference and the transmissionMode: value parameter was used.
- The result is a literal data type, whether or not Prefer/transmissionMode were specified with above values.
HTTP/1.1 200 OK
Host: weaver.example.com
Content-ID: <output@f93a15be-6e16-11ea-b667-08002752172a>
Content-Type: application/x-netcdf
<netcdf data>
The following result will be obtained if any of the following conditions are encountered:
- The result is a complex File Reference and the Prefer: return=minimal header was used.
- The result is a complex File Reference and the transmissionMode: reference parameter was used.
- The result is a literal data type, and any above Prefer/transmissionMode value is explicitly requested.
HTTP/1.1 204 No Content
Host: weaver.example.com
Content-Length: 0
Content-Type: application/x-netcdf
Content-ID: <output@f93a15be-6e16-11ea-b667-08002752172a>
Content-Location: https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output/output_netcdf.nc
Link: <https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output/output_netcdf.nc>; rel="output"; type="application/x-netcdf"
When results are resolved as transmissionMode: reference, either using Prefer: return=minimal
or response: raw parameters, leading to the creation of a File Reference link
directly returned as above (rather than embedded in a Document Result),
the generated reference will be reported using an HTTP Link header, for each applicable output, in order to fulfill
OGC API - Processes v1.0 Requirement 30.
However, given that such Link headers can result in conflicting rel: {outputID} with other Link
entries found in the response, and that they require additional parsing of the value to extract the URL,
a combination of Content-ID, Content-Type and Content-Location will also be provided.
Note
For cases where an output would represent an array of File References, returned Link
headers for each of these links will employ rel: "{outputID}.{index}" with their respective index from
the array.
To respect RFC 2392 definitions, Content-ID will use pattern <{outputID}@{jobID}> as unique identifier,
and <{outputID}.{index}@{jobID}> in the case of an array of File References.
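The Content-ID patterns above can be sketched with the following helpers. Both function names are hypothetical and only illustrate the documented `<{outputID}@{jobID}>` and `<{outputID}.{index}@{jobID}>` patterns. They are not Weaver's implementation. The parser assumes that an output ID does not itself end with a dot followed by digits, since that suffix would be interpreted as an array index.

```python
import re

# Matches '<{outputID}@{jobID}>' and '<{outputID}.{index}@{jobID}>'.
CONTENT_ID = re.compile(r"^<(?P<output>.+?)(?:\.(?P<index>\d+))?@(?P<job>[^@>]+)>$")


def make_content_id(output_id, job_id, index=None):
    """Build the Content-ID of an output, or of one item of an array output."""
    suffix = "" if index is None else f".{index}"
    return f"<{output_id}{suffix}@{job_id}>"


def parse_content_id(value):
    """Return (output_id, index, job_id); index is None for a single output."""
    match = CONTENT_ID.match(value)
    if match is None:
        raise ValueError(f"Unexpected Content-ID: {value!r}")
    index = match.group("index")
    return match.group("output"), None if index is None else int(index), match.group("job")


JOB_ID = "f93a15be-6e16-11ea-b667-08002752172a"
single = make_content_id("output", JOB_ID)
array_item = make_content_id("output", JOB_ID, index=2)
```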
When the number of requested outputs [1] is more than one, the obtained response will depend
on the negotiated Accept content header and the data/link resolution of each output.
- If all outputs are File References and no Accept header was specified, a no-content response with a Link for each output is returned, similarly to the above Results for a single output returned directly by reference.
- If a response=document or Prefer: return=minimal resolution is requested, outputs are embedded in the Document Result contents.
- If either multipart contents (RFC 2046 Section 5.1) are explicitly requested by Accept header, or the above cases were not encountered, a multipart content response as shown below is returned [3].
The resolution of the nested outputs within each boundary, either by value or reference, will resolve for each respective output according to the same rule conditions specified above for single output.
Multipart results (raw) with minimal preference
HTTP/1.1 200 OK
Host: weaver.example.com
Content-Type: multipart/mixed; boundary=43003e2f205a180ace9cd34d98f911ff
--43003e2f205a180ace9cd34d98f911ff
Content-Type: application/x-netcdf
Content-ID: <output-file@f93a15be-6e16-11ea-b667-08002752172a>
Content-Location: https://example.com/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output-file/output_netcdf.nc
--43003e2f205a180ace9cd34d98f911ff
Content-Type: text/plain
Content-ID: <output-data@f93a15be-6e16-11ea-b667-08002752172a>
3.1416
--43003e2f205a180ace9cd34d98f911ff--
Note that, in the above response, the Content-Location is used for the output-file, whereas the data
is directly returned for the output-data. This is based on Weaver auto-resolving transmissionMode: reference
for a File Reference result, while using transmissionMode: value by default for literal
data types. This also assumes that response: raw was requested, and that no transmissionMode was specified.
If transmissionMode: value under output-file in the requested outputs [1] was used
(or alternatively, if Prefer: return=representation was specified),
the data of the file would be directly included inline within the response instead of using Content-Location,
similarly to the Single Output Value example,
but with its contents nested within its respective boundaries for the corresponding Content-ID.
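To illustrate how a client could read such a multipart response, the sketch below parses the example with the Python standard library `email` package. The function name `read_multipart_results` is hypothetical, and the raw contents are rebuilt from the example above rather than fetched from a server. Parts that provide a Content-Location are treated as references, and other parts as inline values.

```python
from email import message_from_string

JOB_ID = "f93a15be-6e16-11ea-b667-08002752172a"
BOUNDARY = "43003e2f205a180ace9cd34d98f911ff"
# Example response (headers and body) joined with CRLF line separators.
RAW_RESPONSE = "\r\n".join([
    f"Content-Type: multipart/mixed; boundary={BOUNDARY}",
    "",
    f"--{BOUNDARY}",
    "Content-Type: application/x-netcdf",
    f"Content-ID: <output-file@{JOB_ID}>",
    f"Content-Location: https://example.com/wpsoutputs/{JOB_ID}/output-file/output_netcdf.nc",
    "",
    f"--{BOUNDARY}",
    "Content-Type: text/plain",
    f"Content-ID: <output-data@{JOB_ID}>",
    "",
    "3.1416",
    f"--{BOUNDARY}--",
    "",
])


def read_multipart_results(raw):
    """Map each output ID to its reference (href/type) or its inline value."""
    message = message_from_string(raw)
    if not message.is_multipart():
        raise ValueError("Expected a multipart response.")
    results = {}
    for part in message.get_payload():
        # Content-ID uses the '<{outputID}@{jobID}>' pattern.
        output_id = part["Content-ID"].strip("<>").split("@", 1)[0]
        location = part["Content-Location"]
        if location:
            results[output_id] = {"href": location, "type": part.get_content_type()}
        else:
            results[output_id] = part.get_payload()
    return results


results = read_multipart_results(RAW_RESPONSE)
```

Note that the inline literal is obtained as text in this sketch. Converting it to a number or another type is left to the caller, based on the Process Description.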
Job Inputs
In order to better understand the parameters that were originally submitted during Job creation,
the GET {WEAVER_URL}/jobs/{jobID}/inputs (Inputs) request can be employed. This will return both the data and reference inputs that were submitted,
as well as the requested outputs [1] to retrieve any relevant transmissionMode, format, etc.
parameters that were specified during submission of the Execution Body, and any other relevant headers
that can affect the Execution Mode and Execution Results.
For convenience, this endpoint also returns relevant links applicable for the requested Job.
{
"inputs": {
"calc": "4.26 * ((C / A) ** 3.94)",
"band_a": {
"href": "https://example.com/wpsoutputs/weaver/users/23/6f197568-38f5-42f4-851c-0c56d446094c/product/T29SPC_20190601T110621_B02_10m.jp2",
"type": "image/jp2"
},
"band_c": {
"href": "https://example.com/wpsoutputs/weaver/users/23/977799a0-bf63-4406-a419-6d686c9a8fc9/product/T29SPC_20190601T110621_B04_10m.jp2",
"type": "image/jp2"
},
"name": "output"
},
"outputs": {
"result": {
"transmissionMode": "reference"
}
},
"links": [
{
"title": "Job status.",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96",
"type": "application/json",
"rel": "status"
},
{
"title": "Job status generic endpoint.",
"hreflang": "en-CA",
"href": "https://example.com/weaver/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96",
"type": "application/json",
"rel": "alternate"
},
{
"title": "New job submission endpoint for the corresponding process.",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/execution",
"type": "application/json",
"rel": "http://www.opengis.net/def/rel/ogc/1.0/execute"
},
{
"title": "Submitted job inputs for process execution.",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96/inputs",
"type": "application/json",
"rel": "inputs"
},
{
"title": "Job outputs of successful process execution (extended outputs with metadata).",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96/outputs",
"type": "application/json",
"rel": "outputs"
},
{
"title": "Job results of successful process execution (direct output values mapping).",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96/results",
"type": "application/json",
"rel": "http://www.opengis.net/def/rel/ogc/1.0/results"
},
{
"title": "Job statistics collected following process execution.",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96/statistics",
"type": "application/json",
"rel": "statistics"
},
{
"title": "List of collected job logs during process execution.",
"hreflang": "en-CA",
"href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96/logs",
"type": "application/json",
"rel": "logs"
}
]
}
Note
The links presented above are not an exhaustive list to keep the example relatively small.
If the Job is still pending execution, the parameters returned by this endpoint can be modified using the Updating a Job operation before submitting it.
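The links returned by this endpoint can be used to navigate to the other Job resources. The sketch below uses a hypothetical `find_link` helper (not part of Weaver) to look up a link by its `rel` value in an abbreviated copy of the response shown above.

```python
def find_link(document, rel):
    """Return the 'href' of the first link matching the relation, or None."""
    for link in document.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Abbreviated copy of the Job Inputs response presented above.
job_inputs = {
    "inputs": {"calc": "4.26 * ((C / A) ** 3.94)", "name": "output"},
    "outputs": {"result": {"transmissionMode": "reference"}},
    "links": [
        {
            "href": "https://example.com/weaver/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96",
            "type": "application/json",
            "rel": "alternate",
        },
        {
            "href": "https://example.com/weaver/processes/calculate-band/jobs/034151ec-a87e-41ed-8ab4-8afb22b48e96/logs",
            "type": "application/json",
            "rel": "logs",
        },
    ],
}
logs_url = find_link(job_inputs, "logs")
```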
Job Exceptions
In situations where the Job resulted in a failed status, the GET {WEAVER_URL}/processes/{processID}/jobs/{jobID}/exceptions (GetLogs) request can be used to retrieve
the potential cause of failure, by capturing any raised exception. Below is an example of such exception details.
[
"builtins.Exception: Could not read status document after 5 retries. Giving up."
]
The returned exceptions are often better understood when compared against, or in conjunction with, the logs that provide details over each step of the operation.
Job Logs
Any Job executed by Weaver will provide minimal log information, such as the operation setup, the moment when it started execution, and its latest status. The extent of other log entries will more often than not depend on the verbosity of the underlying process being executed. When executing an Application Package, Weaver tries as best as possible to collect standard output and error streams to report them through log and exception lists.
Since Weaver can only report as many details as provided by the running application, it is recommended that
Application Package implementers provide progressive status updates when developing their package
in order to help understand problematic steps in the event of process execution failures. In the case of remote WPS
processes monitored by Weaver for example, this means gradually reporting process status updates
(e.g.: calling WPSResponse.update_status if you are using PyWPS, see: Progress and Status Report), while using print
and/or logging operations for scripts or Docker images executed through CWL CommandLineTool.
Note
Job logs and exceptions are a Weaver-specific implementation. They are not part of traditional OGC API - Processes.
A minimalistic example of logging output is presented below. This can be retrieved using the GET {WEAVER_URL}/processes/{processID}/jobs/{jobID}/logs (GetLogs) request, at any
moment during Job execution (with logs up to that point in time) or after its completion (for full output).
Note again that the more the Process is verbose, the more tracking will be provided here.
[
"[2021-03-02 03:32:39] INFO [weaver.datatype.Job] 0:00:00 1% accepted Job task setup completed.",
"[2021-03-02 03:32:39] INFO [weaver.datatype.Job] 0:00:00 2% accepted Execute WPS request for process [ncdump]",
"[2021-03-02 03:32:39] INFO [weaver.datatype.Job] 0:00:01 3% accepted Fetching job input definitions.",
"[2021-03-02 03:32:39] INFO [weaver.datatype.Job] 0:00:01 4% accepted Fetching job output definitions.",
"[2021-03-02 03:32:41] INFO [weaver.datatype.Job] 0:00:02 5% accepted Starting job process execution.",
"[2021-03-02 03:32:41] INFO [weaver.datatype.Job] 0:00:02 5% accepted Following updates could take a while until the Application Package answers...",
"[2021-03-02 03:32:43] DEBUG [weaver.datatype.Job] 0:00:05 6% accepted Updated job status location: [/tmp/14c68477-c3ed-4784-9c0f-a4c9e1344db5.xml].",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 7% running Starting monitoring of job execution.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 8% running [2021-03-02 03:32:41] INFO [weaver.processes.wps_package.ncdump] 1% running Preparing package logs done.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 9% running [2021-03-02 03:32:41] INFO [weaver.processes.wps_package.ncdump] 2% running Launching package...",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 11% running [2021-03-02 03:32:41] INFO [weaver.processes.wps_package.ncdump] Visible application CWL euid:egid [1000:1000]",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 13% running [2021-03-02 03:32:41] DEBUG [weaver.processes.wps_package.ncdump] Using cwltool.RuntimeContext args:",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 15% running {",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 17% running \"no_read_only\": false,",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 19% running \"no_match_user\": false,",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 21% running \"tmpdir_prefix\": \"/tmp/weaver-hybrid/cwltool_tmp_\",",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 23% running \"tmp_outdir_prefix\": \"/tmp/weaver-hybrid/cwltool_out_\",",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 25% running \"outdir\": \"/tmp/weaver-hybrid/pywps_process_ughz7_oc\",",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 27% running \"debug\": true",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 29% running }",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 31% running [2021-03-02 03:32:41] INFO [cwltool] Resolved '/tmp/tmpdkg7lj26/ncdump' to 'file:///tmp/tmpdkg7lj26/ncdump'",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 33% running [2021-03-02 03:32:41] INFO [cwltool] ../../../../tmp/tmpdkg7lj26/ncdump:1:1: Unknown hint file:///tmp/tmpdkg7lj26/WPS1Requirement",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 35% running [2021-03-02 03:32:41] INFO [weaver.processes.wps_package.ncdump] 5% running Loading package content done.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 37% running [2021-03-02 03:32:41] INFO [weaver.processes.wps_package.ncdump] 6% running Retrieve package inputs done.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 39% running [2021-03-02 03:32:41] INFO [weaver.processes.wps_package.ncdump] File input (dataset) SKIPPED fetch: [https://schema-example.com/data/test.nc]",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 41% running [2021-03-02 03:32:42] INFO [weaver.processes.wps_package.ncdump] 8% running Convert package inputs done.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 43% running [2021-03-02 03:32:42] INFO [weaver.processes.wps_package.ncdump] 10% running Running package...",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 45% running [2021-03-02 03:32:42] DEBUG [weaver.processes.wps_package.ncdump] Launching process package with inputs:",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 47% running {",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 49% running \"dataset\": {",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 51% running \"location\": \"https://schema-example.com/data/test.nc\",",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 52% running \"class\": \"File\",",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 54% running \"format\": \"http://edamontology.org/format_3650\"",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 56% running }",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 58% running }",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 60% running [2021-03-02 03:32:42] INFO [weaver.processes.wps_package.ncdump] 10% running Preparing to launch package ncdump.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 62% running [2021-03-02 03:32:42] INFO [weaver.processes.wps_package.ncdump] WPS-1 Package resolved from requirement/hint: WPS1Requirement",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 64% running [2021-03-02 03:32:42] INFO [weaver.processes.wps_package.ncdump] 11% running https://schema-example.com/remote-wps [ncdump] - Preparing execute request for remote WPS1 provider.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 66% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 14% running https://schema-example.com/remote-wps [ncdump] - Executing job on remote WPS1 provider.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 68% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 18% running https://schema-example.com/remote-wps [ncdump] - Monitoring job on remote WPS1 provider : [https://schema-example.com/remote-wps]",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 70% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 86% running https://schema-example.com/remote-wps [ncdump] - 100% succeeded PyWPS Process NCDump finished",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 72% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 86% running https://schema-example.com/remote-wps [ncdump] - Fetching job outputs from remote WPS1 provider.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 74% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 95% succeeded https://schema-example.com/remote-wps [ncdump] - Execution on remote WPS1 provider completed.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 76% running [2021-03-02 03:32:43] DEBUG [cwltool] Moving /tmp/weaver-hybrid/cwltool_out_fitllvxx/output.txt to /tmp/weaver-hybrid/pywps_process_ughz7_oc/output.txt",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 78% running [2021-03-02 03:32:43] DEBUG [cwltool] Moving /tmp/weaver-hybrid/cwltool_out_fitllvxx/stderr.log to /tmp/weaver-hybrid/pywps_process_ughz7_oc/stderr.log",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 80% running [2021-03-02 03:32:43] DEBUG [cwltool] Moving /tmp/weaver-hybrid/cwltool_out_fitllvxx/stdout.log to /tmp/weaver-hybrid/pywps_process_ughz7_oc/stdout.log",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 82% running [2021-03-02 03:32:43] DEBUG [cwltool] Removing intermediate output directory /tmp/weaver-hybrid/cwltool_out_7mmdwd6w",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 84% running [2021-03-02 03:32:43] DEBUG [cwltool] Removing intermediate output directory /tmp/weaver-hybrid/cwltool_out_fitllvxx",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 86% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 95% running Package execution done.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 88% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 95% running Nothing captured from internal application logs.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 90% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] Resolved WPS output [output] as file reference: [/tmp/weaver-hybrid/pywps_process_ughz7_oc/output.txt]",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 92% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 98% running Generate package outputs done.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 94% running [2021-03-02 03:32:43] INFO [weaver.processes.wps_package.ncdump] 100% succeeded Package complete.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 96% succeeded Job succeeded (status: Package complete.).",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 98% succeeded Job succeeded.",
"[2021-03-02 03:32:43] INFO [weaver.datatype.Job] 0:00:05 100% succeeded Job task complete."
]
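For clients that need to filter or summarize these entries, the following sketch splits a log line into its fields. The regular expression and the field names are assumptions derived only from the sample lines above (timestamp, level, logger, elapsed duration, progress, status, message). They are not a documented log format, so entries from other sources may not match.

```python
import re

# Pattern inferred from the sample log entries above (an assumption).
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<duration>\d+:\d{2}:\d{2})\s+"
    r"(?P<progress>\d+)%\s+"
    r"(?P<status>\w+)\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a Job log entry, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["progress"] = int(fields["progress"])
    return fields


entry = parse_log_line(
    "[2021-03-02 03:32:39] INFO [weaver.datatype.Job] 0:00:00 1% accepted Job task setup completed."
)
```

The nested package log lines (for example those emitted by `weaver.processes.wps_package.ncdump`) are kept whole in the `message` field by this sketch.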
Job Provenance
Added in version 6.1.
The provenance endpoints allow obtaining W3C PROV metadata from a successfully completed Job using various representations. This provenance information can help identify traceability information such as the input data sources, validate output checksums, and understand all internal Process data transformations that were involved within an executed Workflow.
The PROV metadata consists of information records about entities, activities, and people involved in producing a piece of data or thing [^], which can be used to form assessments about its quality, reliability or trustworthiness.
Provenance Resource Relationships [PROV-O: The PROV Ontology]
The provenance endpoints are provided in alignment with the OGC API - Processes - Part 4: Job Management provenance class requirement. However, Weaver also provides additional functionalities in comparison to the minimal requirements from the OGC specification.
Following is a table of available formats and corresponding endpoints offered by Weaver.
| Endpoint | PROV Format | Description |
|---|---|---|
| | | Provenance metadata using JSON representation. |
| | | Provenance metadata using JSON Linked Data representation. |
| | | Provenance metadata using XML representation. |
| | | Provenance metadata using the main PROV notation representation. |
| | PROV-NT | Provenance metadata using RDF N-Triples (NT) representation. |
| | PROV-TURTLE | Provenance metadata using RDF Turtle (TTL) representation. |
| | n/a | Metadata about the Research Object packaging information. |
| | n/a | Metadata of who ran the Job. |
| | n/a | Obtain the list of |
| | n/a | Metadata of the main Job and any nested step runs in the case of a Workflow. |
| | n/a | Metadata about the Job input IDs. |
| | n/a | Metadata about the Job output IDs. |
| | n/a | Same as their respective definitions above, but for a specific step of a Workflow. |
See also
This feature is enabled by default. Its functionality and the corresponding API endpoints
can be controlled using Configuration Option weaver.cwl_prov.
Resulting metadata that is collected from Job Provenance will be stored under a similar endpoint
as the Outputs Location, except with an additional -prov suffix applied after the Job UUID,
as shown below.
This location is selected to conveniently offer the PROV metadata with a different parent directory than
the Job outputs, therefore allowing different endpoint access control schemes between the PROV metadata
and actual output data, while also reusing the configured Outputs Location that can be used to quickly
serve Provenance contents without any additional configuration.
{WPS_OUTPUT_URL}[/{WPS_OUTPUT_CONTEXT}]/{JOB_UUID}-prov
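The pattern above can be sketched as a small helper. The function name `provenance_location` and its parameters are hypothetical. They simply mirror the placeholders of the pattern, with the output context being optional.

```python
def provenance_location(wps_output_url, job_uuid, wps_output_context=None):
    """Build '{WPS_OUTPUT_URL}[/{WPS_OUTPUT_CONTEXT}]/{JOB_UUID}-prov'."""
    parts = [wps_output_url.rstrip("/")]
    if wps_output_context:
        # The context is only inserted when one applies to the Job.
        parts.append(wps_output_context.strip("/"))
    parts.append(f"{job_uuid}-prov")
    return "/".join(parts)


location = provenance_location(
    "https://example.com/wpsoutputs",
    "f93a15be-6e16-11ea-b667-08002752172a",
)
```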
Job Statistics
Note
This feature is specific to Weaver.
The GET {WEAVER_URL}/jobs/{jobID}/statistics request can be performed to obtain runtime statistics from the Job.
This content is only available when a Job has successfully completed.
Below is a sample of a possible response. Some parts might be omitted according to the
internal Application Package of the Process represented by the Job execution.
{
"application": {
"usedMemory": "3.000 MiB",
"usedMemoryBytes": 3145728
},
"process": {
"rss": "139.531 MiB",
"rssBytes": 146309120,
"uss": "84.535 MiB",
"ussBytes": 88641536,
"vms": "1.388 GiB",
"vmsBytes": 1490432000,
"usedThreads": 11,
"usedCPU": 5,
"usedHandles": 0,
"usedMemory": "13.734 MiB",
"usedMemoryBytes": 14401536,
"totalSize": "3.000 B",
"totalSizeBytes": 3
},
"outputs": {
"output": {
"size": "3.000 B",
"sizeBytes": 3
}
}
}
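In the sample above, each human-readable field is paired with a raw `...Bytes` field. The sketch below shows one way the two could relate, assuming binary units (multiples of 1024) and three decimals. This is inferred from the sample values only and is not a description of Weaver's actual formatting code. The `human_size` name is hypothetical.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def human_size(size_bytes):
    """Format a byte count with binary units and three decimals."""
    value = float(size_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            break
        value /= 1024
    return f"{value:.3f} {unit}"


used_memory = human_size(3145728)
```

Clients that need exact values should rely on the `...Bytes` fields rather than parsing the formatted strings.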
Content Negotiation
Content negotiation can be performed using multiple query parameters and headers when submitting a Job execution request, when retrieving the Job results, or when inspecting the Process description to execute. Content negotiation also happens within the submitted Execution Body contents for respective Process inputs and outputs that should be employed or produced by the corresponding Job execution. Following is a summary of relevant parameters impacting content negotiation.
| Parameter | Location | Description | Allowed Encoding (see Notes) | Example |
|---|---|---|---|---|
| | Query | Requested content Media-Type (explicitly or implicitly using a shorthand notation). Equivalent to the | | |
| | Query | Requested Profile representation of the contents in the response. Equivalent to the various headers below involving a Profile resolution, but with a higher precedence. See Profile Resolution Order for details. | | |
| | Query | Alternate parameter to | | |
| | Header | Requested content Media-Type for the response. Can include an optional | Media-Type with optional | |
| | Header | Alternate Profile representation to request for the response when the Media-Type does not support the | | |
| | Header | Typically involved in Execution Mode and Execution Results request strategies, but it can simultaneously be used to request a specific Profile representation of the response, as per RFC 7240 that allows additional parameters to be specified. Because it is a preference, its application is not guaranteed. The | Shorthand Notation or URI (for | |
| | Header | Alternate method to request a specific Profile as per RFC 6906. See Profile Resolution Order for details. | | |
Notes
Depending on the location and the specific name of the query parameters or headers, certain encodings are enforced.
For example, the Link header is defined by RFC 6906, which requires its href to be a URI.
In such a case, shorthand notation like rel=profile, href=cloud-optimized is not allowed.
Requesting this Profile would require using the corresponding full URI if available, or requires
using another header that allows the shorthand notation. See Shorthand Notation Identifiers for Content Negotiation for
common values.
See also
Shorthand Notation Identifiers for Content Negotiation
Following is a non-exhaustive list of common shorthand notations that can be used when involving content negotiation and their corresponding fully-defined URI or Media-Type representation.
Note
The shorthand notations are case-insensitive. Therefore, json and JSON are equivalent.
However, the corresponding URI or Media-Type representations must match exactly.
| Shorthand Notation | Specific Definition | Description and Usage |
|---|---|---|
| | URI of the relevant schema indicated by OGC API - Processes - Profile Identifiers. | Typically employed with |
| | n/a | Same as the above |
| | n/a | Typically employed with |
| | n/a | Same as previous elements by combining the |
| | URI of the schema under | Typically employed with |
| | URI of the relevant | Typically employed with |
| | Media-Type | Typically employed with |
| | Media-Type | Typically employed with |
| | Media-Type | Typically employed with |
| | Media-Type | Typically employed with |
Profile Resolution Order
Possible locations where Profile can be specified are, in order of precedence:
1. profile query parameter.
2. Accept-Profile header directly providing the profile URI.
3. Accept Media-Type with a profile parameter.
4. Prefer header including a profile parameter.
5. Link header including a profile parameter.
See also
Implementation of the resolution order in Weaver is provided in
weaver.utils.get_response_profile().
Note
The precedence requirement mostly concerns the use of a profile query parameter, which takes priority
over any header variant. This is simply due to the fact that inserting a query parameter is the simplest
method to provide a Profile, especially in the case of Web browsers where headers are more
complicated to include in the request. Beyond that, combining multiple header approaches simultaneously with
distinct Profile values is considered undefined or random behavior by referenced standards.
In Weaver, the prioritization strategy is defined in terms of most explicit and most common header names to
least probable ones regarding where the Profile is potentially located. Another consideration for the order
is the “strictness requirement” aspect of each header. The Accept header imposes a strict refusal of the
request (406 Not Acceptable) if the Profile is not supported for a given endpoint, while the Prefer
header is more relaxed and fulfillment is optional (the server is allowed to ignore it and respond successfully).
The Link header is placed last, to potentially allow Prefer priority if a given Profile can be
respected, and revert back to Profile specified by Link otherwise. This allows the simultaneous
submission of Prefer: profile=... and Link: profile=... headers in a request with flexible outcomes between
clients and servers supporting different Profiles interoperability. In this case, the Link header can be
used to provide a fallback if the Profile in Prefer header cannot be respected or resolved by the server
for the given request context. Fulfilling the Profile in Link header is “more important” in this fallback
scenario, but still NOT mandatory, contrary to the Accept and Accept-Profile headers.
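The precedence order can be sketched as follows. This is a simplified illustration and not the implementation of `weaver.utils.get_response_profile()`. The `resolve_profile` name, the regular expressions, and the assumed header syntaxes (an `Accept-Profile` value optionally wrapped in `<>`, and a `Link` value of the form `<URI>; rel="profile"`) are assumptions. The sketch only returns the first Profile found and does not model the fallback from Prefer to Link when a Profile cannot be fulfilled.

```python
import re

# Assumed syntaxes for the 'profile' parameter and the Link 'profile' relation.
PROFILE_PARAM = re.compile(r'profile\s*=\s*"?([^";,\s]+)"?', re.IGNORECASE)
LINK_PROFILE = re.compile(r'<([^>]+)>\s*;\s*rel\s*=\s*"?profile"?', re.IGNORECASE)


def resolve_profile(query, headers):
    """Return the first Profile found following the documented precedence."""
    headers = {name.lower(): value for name, value in headers.items()}
    # 1. 'profile' query parameter.
    if query.get("profile"):
        return query["profile"]
    # 2. 'Accept-Profile' header directly providing the profile URI.
    if headers.get("accept-profile"):
        return headers["accept-profile"].strip().strip("<>")
    # 3. 'Accept' Media-Type parameter, then 4. 'Prefer' parameter.
    for name in ("accept", "prefer"):
        match = PROFILE_PARAM.search(headers.get(name, ""))
        if match:
            return match.group(1)
    # 5. 'Link' header with a 'profile' relation.
    match = LINK_PROFILE.search(headers.get("link", ""))
    return match.group(1) if match else None


profile = resolve_profile(
    {"profile": "from-query"},
    {"Prefer": "return=minimal; profile=from-prefer"},
)
```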
Uploading File to the Vault
The Vault is available as secured storage for uploading files to be employed later for Process execution (see also File Vault Inputs).
Note
The Vault is a specific feature of Weaver. Other ADES, EMS and OGC API - Processes
servers are not expected to provide this endpoint nor support the vault://<UUID> reference format.
See also
Refer to Configuration of File Vault for applicable settings for this feature.
When upload succeeds, the response will return a Vault UUID and an access_token to access the file.
Uploaded files cannot be accessed unless the proper credentials are provided. Requests toward the Vault should
therefore include an X-Auth-Vault: token {access_token} header in combination with the provided Vault UUID in
the request path to retrieve the file contents. The upload response will also include a file_href field formatted
with a vault://<UUID> reference to be used for File Vault Inputs, as well as a Content-Location header of the
contextual Vault endpoint for that file.
Download of the file is accomplished using the Vault File Download (GET) request.
In order to either obtain the file metadata without downloading it, or simply to validate its existence,
the Vault File Details (HEAD) request can be used. This HEAD request can be queried any number of times without affecting
the file from the Vault. For both HTTP methods, the X-Auth-Vault header is required.
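The sketch below prepares the parameters of such a request without sending it. The `vault_request` helper is hypothetical, and the `/vault/{UUID}` path is an assumption based on the upload endpoint and on the statement that the Vault UUID is provided in the request path. The actual path should be taken from the Content-Location header of the upload response.

```python
def vault_request(weaver_url, vault_uuid, access_token, method="GET"):
    """Prepare the method, URL and headers to access a file in the Vault."""
    if method not in ("GET", "HEAD"):
        raise ValueError("Only GET (download) and HEAD (details) are described.")
    # Assumption: the file is located under '/vault/{UUID}'.
    url = f"{weaver_url.rstrip('/')}/vault/{vault_uuid}"
    # The token is required for both HTTP methods.
    headers = {"X-Auth-Vault": f"token {access_token}"}
    return method, url, headers


method, url, headers = vault_request(
    "https://weaver.example.com",
    "f93a15be-6e16-11ea-b667-08002752172a",
    "example-token",
    method="HEAD",
)
```

The returned values can then be passed to an HTTP client, for example `requests.request(method, url, headers=headers, timeout=5)`.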
Note
The Vault acts only as temporary file storage. For this reason, once the file has been downloaded, it is immediately deleted. Download can only occur once. It is assumed that the resource that must employ it will have created a local copy from the download, and that the Vault no longer needs to preserve it. This behaviour intends to limit the duration for which potentially sensitive data remains available in the Vault, as well as to perform cleanup that limits storage space.
Using the Weaver CLI or Python client, it is possible to upload local files automatically to the
Vault of a remote Weaver server. This can help users host their local file for remote Process
execution. By default, the Weaver CLI and Client will automatically convert any local file path provided as execution input into
a vault://<UUID> reference to make use of the Vault self-hosting from the target Weaver instance. It will also
update the provided inputs or execution body to apply any transformed vault://<UUID> references transparently. This will
allow the executed Process to securely retrieve the files using File Vault Inputs behaviour. Transmission
of any required authorization headers is also handled automatically when using this approach.
It is also possible to manually provide vault://<UUID> references or endpoints if those were uploaded beforehand using
the upload operation, but the user must then also generate the X-Auth-Vault header manually.
See also
Section File Vault Inputs provides more details about the format of X-Auth-Vault for submission
of multiple inputs.
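For a single input, a manually assembled Execute request could look like the sketch below. Several names here are assumptions rather than values taken from this document: the process identifier, the input identifier, the /processes/{id}/execution path (as defined by OGC API - Processes), and the use of the same single-token header form as for direct Vault requests. The File Vault Inputs section remains the authoritative description of the header format.

```python
import json

WEAVER_URL = "https://weaver.example.com"  # placeholder host
PROCESS_ID = "my-process"                  # hypothetical process identifier
INPUT_ID = "input-file"                    # hypothetical input identifier


def build_execute_request(vault_uuid: str, access_token: str) -> tuple:
    """Return (url, headers, body) for an Execute request using a Vault file."""
    url = f"{WEAVER_URL}/processes/{PROCESS_ID}/execution"
    headers = {
        "Content-Type": "application/json",
        # assumed single-input form of the header
        "X-Auth-Vault": f"token {access_token}",
    }
    body = {"inputs": {INPUT_ID: {"href": f"vault://{vault_uuid}"}}}
    return url, headers, body


url, headers, body = build_execute_request(
    "0b1c2d3e-0000-4000-8000-000000000000",  # placeholder UUID
    "example-token",                         # placeholder token
)
print(json.dumps(body, indent=2))

# To actually submit (not done here), for example with the requests package:
#   requests.post(url, headers=headers, json=body, timeout=5)
```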
In order to manually upload files, the following code snippet can be used.
import json
import requests
PATH = "/path/to/local/file.json"
# create a sample JSON file to upload
with open(PATH, "w", encoding="utf-8") as file:
    json.dump({"input": "data"}, file)
# provide the desired name and format Media-Type
# (binary mode lets requests send the file bytes unmodified)
with open(PATH, "rb") as file:
    files = {
        "file": (
            "desired-name.json",
            file,
            "application/json; charset=UTF-8"
        )
    }
    response = requests.post("https://weaver.example.com/vault", files=files, timeout=5)
# fail early if the upload was rejected
response.raise_for_status()
This should automatically generate a request similar to the one below.
POST /vault HTTP/1.1
Host: weaver.example.com
Content-Type: multipart/form-data; boundary=43003e2f205a180ace9cd34d98f911ff
Content-Length: 202

--43003e2f205a180ace9cd34d98f911ff
Content-Disposition: form-data; name="file"; filename="desired-name.json"
Content-Type: application/json; charset=UTF-8

{"input": "data"}
--43003e2f205a180ace9cd34d98f911ff--
Warning
When providing literal HTTP request contents as above, make sure to employ CRLF instead of plain LF for
separating the data using the boundary. Also, make sure to omit any additional LF between the data and each
boundary if this could impact parsing of the data itself (e.g.: as in the case of non-text base64 data)
to avoid modifying the file contents during upload. Some additional newlines are presented in the above example
only for readability purposes. It is recommended to use utilities like the Python example or
the Weaver CLI to avoid such issues during request content generation.
Please refer to RFC 7578 Section 4.1 for more details regarding multipart content separators.
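To illustrate the CRLF requirement, the sketch below assembles an equivalent multipart body by hand using only the standard library. It is meant to show where the separators go, not to replace the utilities recommended above. The boundary, file name and data are copied from the example request; the variable names are arbitrary.

```python
BOUNDARY = "43003e2f205a180ace9cd34d98f911ff"
CRLF = b"\r\n"
data = b'{"input": "data"}'

# Each part is: boundary line, part headers, empty line, raw data.
# Every separator is CRLF, and nothing extra is added around the data.
body = CRLF.join([
    f"--{BOUNDARY}".encode("ascii"),
    b'Content-Disposition: form-data; name="file"; filename="desired-name.json"',
    b"Content-Type: application/json; charset=UTF-8",
    b"",                               # empty line ends the part headers
    data,                              # file contents, unmodified
    f"--{BOUNDARY}--".encode("ascii"),  # closing boundary
    b"",
])

headers = {
    "Content-Type": f"multipart/form-data; boundary={BOUNDARY}",
    "Content-Length": str(len(body)),
}
```

The Content-Length computed this way depends on the exact part headers, so it will not necessarily match the value shown in the example request.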
Note that the Content-Type embedded within the multipart content in the above example (not to be confused with the
actual Content-Type header of the request for uploading the file) can be important if the destination input of
the Process that will consume that Vault file for execution supports multiple Media-Types and requires a specific
one to be selected. This value can be employed to generate the explicit format portion of the
input when it cannot be resolved automatically from the file contents, unless the format is explicitly provided
again for that input within the Execute request body.
WPS Endpoint
This endpoint is available if the weaver.wps setting is enabled (true by default).
The specific location where it will be accessible for WPS requests depends on the resolution
of relevant Configuration Settings, namely weaver.wps_path and weaver.wps_url.
Details regarding the contents of each request are provided in schemas under WPS Endpoint Requests.
Note
Using the WPS endpoint offers less control over functionalities than the corresponding OGC API - Processes (WPS-REST) endpoints since it is the preceding standard.
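As a rough illustration, WPS requests in key-value-pair form can be built against the resolved endpoint as sketched below. The /ows/wps path, the 1.0.0 version and the process identifier are assumptions made for this example; substitute the location actually resolved from weaver.wps_path and weaver.wps_url on the target instance.

```python
from urllib.parse import urlencode

# assumed resolved WPS endpoint location (placeholder host and path)
WPS_URL = "https://weaver.example.com/ows/wps"


def wps_kvp_url(request: str, **params: str) -> str:
    """Build a WPS key-value-pair request URL for the given operation."""
    query = {"service": "WPS", "request": request, "version": "1.0.0"}
    query.update(params)
    return f"{WPS_URL}?{urlencode(query)}"


capabilities_url = wps_kvp_url("GetCapabilities")
# 'my-process' is a hypothetical process identifier
describe_url = wps_kvp_url("DescribeProcess", identifier="my-process")
```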
Special Weaver EMS use-cases
This section highlights the additional behaviour available only through an EMS-configured Weaver instance. Some points are already described in other sections, and are therefore only briefly indicated here for conciseness.
ADES dispatching using Data Sources
When using either the EMS or HYBRID [16] configurations, Process executions are dispatched to the relevant ADES or another HYBRID server supporting Process Deployment when inputs are matched against one of the configured Data Sources. Minimal implementations of OGC API - Processes can also serve as external Providers to which executions are dispatched, but in the case of core implementations, the Process must already be available since it cannot be deployed.
In more detail, when an Execute request is received, Weaver will analyse any file references in the specified inputs and try to match them against the specified Data Source configuration. When a match is found and the corresponding File Reference Types indicate that the reference is located remotely in a known Data Source provider that should take care of its processing, Weaver will attempt to Deploy the targeted Process (and the underlying Application Package) followed by its remote execution. It will then monitor the Job until completion and retrieve the results if the full operation was successful.
The Data Source configuration therefore indicates to Weaver how to map a given data reference to a specific instance or server where that data is expected to reside. This procedure effectively allows Weaver to deliver applications close to the data, which can be considerably more efficient (in terms of both time and data volume) than pulling the data locally when Data Sources become substantial. Furthermore, it allows Data Source providers to define custom or private data retrieval mechanisms, where data cannot be exposed or offered externally, but is still available for use when requested.
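The mapping idea can be illustrated with the deliberately simplified sketch below. It is not Weaver's actual matching logic or configuration schema (see Configuration of Data Sources for that): the DATA_SOURCES dictionary, the host names and the resolve_dispatch_target function are all hypothetical, and matching is reduced to comparing the host of each file reference.

```python
from urllib.parse import urlparse

# hypothetical mapping: host where the data resides -> ADES that should process it
DATA_SOURCES = {
    "data.example.com": "https://ades.example.com",
}


def resolve_dispatch_target(inputs: dict, data_sources: dict = DATA_SOURCES):
    """Return the ADES URL to dispatch to, or None when no Data Source matches."""
    for value in inputs.values():
        # only file references (objects with 'href') are considered
        href = value.get("href") if isinstance(value, dict) else None
        if not href:
            continue
        host = urlparse(href).netloc
        if host in data_sources:
            return data_sources[host]
    return None


remote_inputs = {"file": {"href": "https://data.example.com/files/a.nc"}, "level": 3}
other_inputs = {"file": {"href": "https://elsewhere.example.org/files/b.nc"}}

remote_target = resolve_dispatch_target(remote_inputs)
other_target = resolve_dispatch_target(other_inputs)
```

In this sketch, a matched reference yields the server where the Process would be deployed and executed, while an unmatched reference yields None, meaning no remote dispatch is selected.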
Details
The HYBRID configuration applies here in cases where Weaver acts as an EMS for remote dispatch of Process execution based on applicable File Reference Types.
See also
Specific details about configuration of Data Source are provided in the Configuration of Data Sources section.
See also
Details regarding OpenSearch Data Source are also relevant for resolving possible matches of a Data Source provider when the applicable File Reference Types are detected.
Workflow (Chaining Step Processes)
Todo
add details, explanation done in below reference
See also
Workflow process type