weaver.utils
Module Contents
- class weaver.utils.LoggerHandler[source]
Minimalistic logger interface (typically logging.Logger) intended to be used only with log method.
- class weaver.utils.Lazify(func: Callable[[], weaver.typedefs.Return])[source]
Wraps the callable for evaluation only on explicit call or string formatting.
Once string representation has been computed, it will be cached to avoid regenerating it on following calls.
Initialize the lazy-string representation.
- Parameters:
func – Callable that should return the computed string formatting.
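The lazy evaluation and caching described for Lazify can be illustrated with a minimal sketch. The class name LazifySketch and its attributes are hypothetical and are not taken from Weaver; only the described behavior (evaluate on call or string formatting, then reuse the cached result) is reproduced.

```python
class LazifySketch:
    """Hypothetical stand-in showing deferred, cached string evaluation."""

    def __init__(self, func):
        self.func = func
        self.value = None
        self.computed = False

    def __call__(self):
        # Evaluate the wrapped callable only once, then reuse the result.
        if not self.computed:
            self.value = str(self.func())
            self.computed = True
        return self.value

    def __str__(self):
        # String formatting ("%s", f-strings, str()) triggers the evaluation.
        return self()


calls = []


def expensive():
    calls.append(1)
    return "big report"


lazy = LazifySketch(expensive)
```

Passing such an object to a logging call defers the costly formatting until a handler actually emits the record.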
- class weaver.utils.CaseInsensitive(_str: str)[source]
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to ‘strict’.
Initialize self. See help(type(self)) for accurate signature.
- class weaver.utils.HashList[source]
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.
Initialize self. See help(type(self)) for accurate signature.
- class weaver.utils.HashDict[source]
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)
Initialize self. See help(type(self)) for accurate signature.
- weaver.utils.json_hashable(func: weaver.typedefs.AnyCallableAnyArgs) Callable[[weaver.typedefs.AnyCallableAnyArgs], weaver.typedefs.Return] [source]
Decorator that will transform JSON-like dictionary and list arguments to a hashable variant.
By making the structure hashable, it can safely be cached with functools.lru_cache() or functools.cache(). The decorator ignores other argument types expected to be already hashable.
    @json_hashable
    @functools.cache
    def function(json_data): ...
See also
Original inspiration: https://stackoverflow.com/a/44776960 The code is extended to allow recursively supporting JSON-like structures.
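A minimal sketch of the technique follows, assuming the general approach of wrapping dictionaries and lists in hashable subclasses before they reach the cache. All names here (HashableDict, HashableList, to_hashable, json_hashable_sketch) are hypothetical and this is not Weaver's implementation.

```python
import functools


class HashableDict(dict):
    def __hash__(self):
        return hash(frozenset(self.items()))


class HashableList(list):
    def __hash__(self):
        return hash(tuple(self))


def to_hashable(value):
    # Recursively convert JSON-like containers; leave other types untouched.
    if isinstance(value, dict):
        return HashableDict({k: to_hashable(v) for k, v in value.items()})
    if isinstance(value, list):
        return HashableList(to_hashable(v) for v in value)
    return value


def json_hashable_sketch(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = tuple(to_hashable(arg) for arg in args)
        kwargs = {key: to_hashable(val) for key, val in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper


calls = []


@json_hashable_sketch
@functools.lru_cache(maxsize=None)
def count_keys(json_data):
    calls.append(1)
    return len(json_data)


first = count_keys({"a": [1, 2], "b": {"c": 3}})
second = count_keys({"a": [1, 2], "b": {"c": 3}})
```

The second call is served from the cache because both arguments hash and compare equal after conversion.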
- class weaver.utils.SchemaRefResolver(base_uri: str, referrer: weaver.typedefs.OpenAPISchema, *_: Any, **__: Any)[source]
Reference resolver that supports both JSON and YAML files from a remote location.
- resolve_remote(uri: str) weaver.typedefs.OpenAPISchema [source]
Resolve a remote uri.
If called directly, does not check the store first, but after retrieving the document at the specified URI it will be saved in the store if cache_remote is True.
Note
If the requests library is present, jsonschema will use it to request the remote uri, so that the correct encoding is detected and used.
If it isn't, or if the scheme of the uri is not http or https, UTF-8 is assumed.
Arguments:
uri (str):
The URI to resolve
Returns:
The retrieved document
- weaver.utils.get_weaver_url(container: weaver.typedefs.AnySettingsContainer) str [source]
Retrieves the home URL of the Weaver application.
- weaver.utils.get_any_id(info: MutableMapping[str, Any], default: weaver.typedefs.Default = None, pop: bool = False, key: bool = False) str | weaver.typedefs.Default [source]
Retrieves a dictionary id-like key using multiple common variations [id, identifier, _id].
- Parameters:
info – dictionary that potentially contains an id-like key.
default – Default identifier to be returned if none of the known keys were matched.
pop – If enabled, remove the matched key from the input mapping.
key – If enabled, return the matched key instead of the value.
- Returns:
value of the matched id-like key or
None
if not found.
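The lookup described above can be sketched as follows. The function name get_any_id_sketch is hypothetical, and the search order simply follows the variations listed in the description, which is an assumption about the real function.

```python
def get_any_id_sketch(info, default=None, pop=False, key=False):
    # Try each documented id-like key in turn (order is an assumption).
    for field in ("id", "identifier", "_id"):
        if field in info:
            value = info.pop(field) if pop else info[field]
            return field if key else value
    return default


process = {"identifier": "my-process", "title": "Demo"}
found_value = get_any_id_sketch(process)
found_key = get_any_id_sketch(process, key=True)
popped = get_any_id_sketch(process, pop=True)
```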
- weaver.utils.get_any_value(info: MutableMapping[str, Any], default: Any = None, file: bool = True, data: bool = True, pop: bool = False, key: bool = False, extras: List[str] | None = None) weaver.typedefs.AnyValueType [source]
Retrieves a dictionary value-like key using multiple common variations [href, value, reference, data].
- Parameters:
info – Dictionary that potentially contains a value-like key.
default – Default value to be returned if none of the known keys were matched.
file – If enabled, file-related key names will be considered.
data – If enabled, data-related key names will be considered.
pop – If enabled, remove the matched key from the input mapping.
key – If enabled, return the matched key instead of the value.
extras – If provided, additional key names to be considered.
- Returns:
Value (or key if requested) of the matched value-like key or
None
if not found.
- weaver.utils.get_any_message(info: weaver.typedefs.JSON, default: str = '') str [source]
Retrieves a dictionary ‘value’-like key using multiple common variations [message, description, detail].
- Parameters:
info – Dictionary that potentially contains a ‘message’-like key.
default – Default message if no variation could be matched.
- Returns:
value of the matched ‘message’-like key or the default string if not found.
- weaver.utils.is_celery() bool [source]
Detect if the current application was executed as a
celery
command.
- weaver.utils.get_registry(container: weaver.typedefs.AnyRegistryContainer | None = None, nothrow: bool = False) pyramid.registry.Registry | None [source]
Retrieves the application
registry
from various containers referencing to it.
- weaver.utils.get_settings(container: weaver.typedefs.AnySettingsContainer | None = None) weaver.typedefs.SettingsType [source]
Retrieves the application
settings
from various containers referencing to it.
- weaver.utils.get_header(header_name: str, header_container: weaver.typedefs.AnyHeadersContainer, default: str | None, str | List[str] | None, bool = None, pop: bool = False, concat: bool = False) str | List[str] | None [source]
Find the specified header within a header container.
Retrieves header_name by fuzzy match (independently of upper/lower-case and underscore/dash) from various framework implementations of Headers.
- Parameters:
header_name – Header to find.
header_container – Where to look for header_name.
default – Returned value if header_container is invalid or header_name is not found.
pop – Remove the matched header(s) by name from the input container.
concat – Allow parts of the header name to be concatenated without hyphens/underscores. This can be the case in some S3 responses. Disabled by default to avoid unexpected mismatches, notably for shorter named headers.
- Returns:
Found header if applicable, or the default value.
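The fuzzy match on case and underscore/dash can be sketched as below. The helper find_header_sketch is hypothetical and only handles plain mappings, whereas the documented function accepts several header container types and supports pop and concat.

```python
def find_header_sketch(header_name, header_container, default=None):
    def normalize(name):
        # Compare names independently of case and of "_" versus "-".
        return name.lower().replace("_", "-")

    wanted = normalize(header_name)
    for name, value in dict(header_container).items():
        if normalize(name) == wanted:
            return value
    return default


headers = {"Content-Type": "application/json", "X_Request_ID": "abc"}
```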
- weaver.utils.get_cookie_headers(header_container: weaver.typedefs.AnyHeadersContainer, cookie_header_name: str | None = 'Cookie') weaver.typedefs.HeadersType [source]
Looks for cookie_header_name header within header_container.
- Returns:
new header container in the form {'Cookie': <found_cookie>} if it was matched, or empty otherwise.
- weaver.utils.get_request_args(request: weaver.typedefs.AnyRequestType) weaver.typedefs.AnyRequestQueryMultiDict [source]
Extracts the parsed query string arguments from the appropriate request object strategy.
Depending on the request implementation, the query_string attribute is expected as bytes (werkzeug) or str (pyramid, webob). The query_string attribute is then used by args and params for the respective implementations, assuming their string-like formats are respected.
- weaver.utils.parse_kvp(query: str, key_value_sep: str = '=', pair_sep: str = ';', nested_pair_sep: str | None = '', multi_value_sep: str | None = ',', accumulate_keys: bool = True, unescape_quotes: bool = True, strip_spaces: bool = True, case_insensitive: bool = True) weaver.typedefs.KVP [source]
Parse key-value pairs using specified separators.
All values are normalized under a list, whether they have a unique or multi-value definition. When a key is by itself (without separator and value), the resulting value will be an empty list.
When accumulate_keys is enabled, entries such as {key}={val};{key}={val} will be joined together under the same list as if they were specified using directly {key}={val},{val} (default separators employed only for demonstration purpose). Both nomenclatures can also be employed simultaneously.
When nested_pair_sep is provided, definitions that contain nested key_value_sep character within an already established KVP will be parsed once again. This will parse {key}={subkey1}={val1},{subkey2}={val2} into a nested KVP dictionary as value under the top level KVP entry {key}. Separators are passed down for nested parsing, except pair_sep that is replaced by nested_pair_sep.
    >> parse_kvp("format=json&inputs=key1=value1;key2=val2,val3", pair_sep="&", nested_pair_sep=";")
    {
        'format': ['json'],
        'inputs': {
            'key1': ['value1'],
            'key2': ['val2', 'val3']
        }
    }
- Parameters:
query – Definition to be parsed as KVP.
key_value_sep – Separator that delimits the keys from their values.
pair_sep – Separator that distinguishes between different (key, value) entries.
nested_pair_sep – Separator to parse values of pairs containing nested KVP definition.
multi_value_sep – Separator that delimits multiple values associated to the same key. If empty or None, values will be left as a single entry in the list under the key.
accumulate_keys – Whether replicated keys should be considered equivalent to multi-value entries.
unescape_quotes – Whether to remove single and double quotes around values.
strip_spaces – Whether to remove spaces around values after splitting them.
case_insensitive – Whether to consider keys as case-insensitive. If
True
, resulting keys will be normalized to lowercase. Otherwise, original keys are employed.
- Returns:
Parsed KVP.
- Raises:
HTTPBadRequest – If parsing cannot be accomplished based on parsing conditions.
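A simplified sketch of the flat (non-nested) parsing rules is given below. The function parse_kvp_sketch is hypothetical: it always accumulates repeated keys and lowercases them, and it omits nested pairs, quote unescaping and error handling that the documented function provides.

```python
def parse_kvp_sketch(query, key_value_sep="=", pair_sep=";", multi_value_sep=","):
    result = {}
    for pair in query.split(pair_sep):
        pair = pair.strip()
        if not pair:
            continue
        key, found, raw = pair.partition(key_value_sep)
        key = key.strip().lower()
        # A key by itself (no separator) yields an empty list of values.
        values = [val.strip() for val in raw.split(multi_value_sep)] if found else []
        # Repeated keys are accumulated under the same list.
        result.setdefault(key, []).extend(values)
    return result


parsed = parse_kvp_sketch("A=1,2;b;a=3")
```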
- weaver.utils.get_url_without_query(url: str | urllib.parse.ParseResult) str [source]
Removes the query string part of a URL.
- class weaver.utils.VersionLevel[source]
Constants container that provides similar functionalities to
ExtendedEnum
without explicit Enum membership.
- class weaver.utils.VersionFormat[source]
Constants container that provides similar functionalities to
ExtendedEnum
without explicit Enum membership.
- weaver.utils.as_version_major_minor_patch(version: weaver.typedefs.AnyVersion, version_format: weaver.typedefs.Literal[VersionFormat]) weaver.compat.Version [source]
- weaver.utils.as_version_major_minor_patch(version: weaver.typedefs.AnyVersion, version_format: weaver.typedefs.Literal[VersionFormat]) str
- weaver.utils.as_version_major_minor_patch(version: weaver.typedefs.AnyVersion, version_format: weaver.typedefs.Literal[VersionFormat]) Tuple[int, int, int]
- weaver.utils.as_version_major_minor_patch(version: weaver.typedefs.AnyVersion) Tuple[int, int, int]
Generates a
MAJOR.MINOR.PATCH
version with padded zeros for any missing parts.
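The zero-padding can be sketched as follows for the tuple output form. The helper pad_version_sketch is hypothetical and assumes purely numeric dot-separated parts, which the real function may not require.

```python
def pad_version_sketch(version):
    # Keep at most three numeric parts and pad missing ones with zeros.
    parts = [int(part) for part in str(version).split(".")][:3]
    parts += [0] * (3 - len(parts))
    return tuple(parts)
```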
- weaver.utils.is_update_version(version: weaver.typedefs.AnyVersion, taken_versions: Iterable[weaver.typedefs.AnyVersion], version_level: VersionLevel = VersionLevel.PATCH) typing_extensions.TypeGuard[weaver.typedefs.AnyVersion] [source]
Determines if the version corresponds to an available update version of specified level compared to existing ones.
If the specified version corresponds to an older version compared to available ones (i.e.: a taken more recent version also exists), the specified version will have to fit within the version level range to be considered valid.
For example, requesting PATCH level will require that the specified version is greater than the last available version against other existing versions with equivalent MAJOR.MINOR parts. If 1.2.0 and 2.0.0 were taken versions, and 1.2.3 has to be verified as the update version, it will be considered valid since its PATCH number 3 is greater than all other 1.2.x versions (it falls within the [1.2.x, 1.3.x[ range).
Requesting instead MINOR level will require that the specified version is greater than the last available version against existing versions of same MAJOR part only. Using again the same example values, 1.3.0 would be valid since its MINOR part 3 is greater than any other 1.x taken versions. On the other hand, version 1.2.4 would not be valid as x = 2 is already taken by other versions considering same 1.x portion (PATCH part is ignored in this case since MINOR is requested, and 2.0.0 is ignored as not the same MAJOR portion of 1 as the tested version).
Finally, requesting a MAJOR level will require necessarily that the specified version is greater than all other existing versions for update, since MAJOR is the highest possible semantic part, and higher parts are not available to define an upper version bound.
Note
As long as the version level is respected, the actual number of this level and all following ones can be anything as long as they are not taken. For example, PATCH with existing 1.2.3 does not require that the update version be 1.2.4. It can be 1.2.5, 1.2.24, etc. as long as 1.2.x is respected. Similarly, MINOR update can provide any PATCH number, since 1.x only must be respected. From existing 1.2.3, MINOR update could specify 1.4.99 as valid version. The PATCH part does not need to start back at 0.
- Parameters:
version – Version to validate as potential update revision.
taken_versions – Existing versions that cannot be reused.
version_level – Minimum level to consider availability of versions as valid revision number for update.
- Returns:
Status of availability of the version.
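One possible reading of the rules above is sketched below, using plain (major, minor, patch) tuples and string level names. The function is_update_version_sketch and its level values are hypothetical, and the real implementation may treat edge cases differently; the sketch only reproduces the worked examples from the description.

```python
def is_update_version_sketch(version, taken_versions, level="patch"):
    version = tuple(version)
    taken = [tuple(item) for item in taken_versions]
    if version in taken:
        return False
    if level == "major":
        # The MAJOR part must exceed every taken MAJOR part.
        return all(version[:1] > item[:1] for item in taken)
    if level == "minor":
        # Only versions sharing the same MAJOR part are compared.
        same_major = [item for item in taken if item[:1] == version[:1]]
        return all(version[:2] > item[:2] for item in same_major)
    # PATCH: only versions sharing the same MAJOR.MINOR parts are compared.
    same_minor = [item for item in taken if item[:2] == version[:2]]
    return all(version > item for item in same_minor)


taken = [(1, 2, 0), (2, 0, 0)]
```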
- weaver.utils.is_uuid(maybe_uuid: Any) typing_extensions.TypeGuard[weaver.typedefs.AnyUUID] [source]
Evaluates if the provided input is a UUID-like string.
- weaver.utils.parse_extra_options(option_str: str, sep: str = ',') Dict[str, str | None] [source]
Parses the extra options parameter.
The option_str is a string with comma-separated opt=value pairs.
tempdir=/path/to/tempdir,archive_root=/path/to/archive
- Parameters:
option_str – A string parameter with the extra options.
sep – separator to employ in order to split the multiple values within the option string.
- Returns:
A dict with the parsed extra options.
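A minimal sketch of this parsing follows. The function parse_extra_options_sketch is hypothetical; mapping an option without "=" to None is an assumption inferred from the Dict[str, str | None] return annotation.

```python
def parse_extra_options_sketch(option_str, sep=","):
    options = {}
    for item in option_str.split(sep):
        item = item.strip()
        if not item:
            continue
        name, found, value = item.partition("=")
        # Assumption: an option given without "=" maps to None.
        options[name.strip()] = value.strip() if found else None
    return options


extra = parse_extra_options_sketch("tempdir=/path/to/tempdir,archive_root=/path/to/archive")
```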
- weaver.utils.extend_instance(obj: OriginalClass, cls: Type[ExtenderMixin]) ExtendedClass [source]
Extend an existing instance of a given class by applying new definitions from the specified mixin class type.
- weaver.utils.fully_qualified_name(obj: Any | Type[Any]) str [source]
Obtains the full path definition of the object to allow finding and importing it.
For classes, functions and exceptions, the following format is returned:
module.name
The module is omitted if it is a builtin object or type.
For methods, the class is also represented, resulting in the following format:
module.class.name
- weaver.utils.import_target(target: str, default_root: str | None = None) Any | None [source]
Imports a target resource class or function from a Python script as module or directly from a module reference.
The Python script does not need to be defined within a module directory (i.e.: with __init__.py). Files can be imported from virtually anywhere. To avoid name conflicts in generated module references, each imported target employs its full escaped file path as module name.
Formats expected as follows:
    "path/to/script.py:function"
    "path/to/script.py:Class"
    "module.path.function"
    "module.path.Class"
- Parameters:
target – Resource to be imported.
default_root – Root directory to employ if target is relative (default
magpie.constants.MAGPIE_ROOT
).
- Returns:
Found and imported resource or None.
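Both target forms can be resolved with the standard importlib machinery, as in the sketch below. The function import_target_sketch is hypothetical; it ignores default_root, and the way it derives a module name from the file path is only one possible escaping scheme.

```python
import importlib
import importlib.util
import os


def import_target_sketch(target):
    """Return the referenced object, or None when it cannot be resolved."""
    if ":" in target:
        # Script form: "path/to/script.py:name".
        path, name = target.rsplit(":", 1)
        # Derive a module name from the absolute path to limit name clashes.
        abs_path = os.path.abspath(path)
        module_name = "".join(c if c.isalnum() else "_" for c in abs_path)
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            return None
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except OSError:
            return None
    else:
        # Module form: "module.path.name".
        module_path, _, name = target.rpartition(".")
        try:
            module = importlib.import_module(module_path)
        except (ImportError, ValueError):
            return None
    return getattr(module, name, None)
```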
- weaver.utils.open_module_resource_file(module: str | types.ModuleType, file_path: str) IO[str] [source]
Opens a resource (data file) from an installed module.
- Returns:
File stream handler to read contents as needed.
- weaver.utils.now(tz_name: str | None = None) datetime.datetime [source]
Obtain the current time with timezone-awareness.
- Parameters:
tz_name – If specified, returned current time will be localized to specified timezone.
- weaver.utils.wait_secs(run_step: int = -1) int [source]
Obtain a wait time in seconds within increasing delta intervals based on iteration index.
- weaver.utils.localize_datetime(dt: datetime.datetime, tz_name: str | None = None) datetime.datetime [source]
Provide a timezone-aware datetime for a given datetime and timezone name.
Warning
Any datetime provided as input that is not already timezone-aware will be assumed to be relative to the current locale timezone. This is the default returned by naive datetime.datetime instances.
If no timezone name is provided, the timezone-aware datetime will be localized with locale timezone offset. Otherwise, the desired localization will be applied with the specified timezone offset.
- weaver.utils.get_file_header_datetime(dt: datetime.datetime) str [source]
Obtains the standard header datetime representation.
See also
Format of the date defined in RFC 5322#section-3.3.
- weaver.utils.get_href_headers(path: str, download_headers: bool = False, location_headers: bool = True, content_headers: bool = False, content_type: str | None = None, content_disposition_type: weaver.typedefs.Literal[attachment, inline] = 'attachment', content_location: str | None = None, content_name: str | None = None, content_id: str | None = None, missing_ok: bool = False, settings: weaver.typedefs.SettingsType | None = None, **option_kwargs) MetadataResult [source]
Obtain headers applicable for the provided file or directory reference.
- Return type:
MetadataResult
- Parameters:
path – File to describe. Either a local path or remote URL.
download_headers – If enabled, add the Content-Disposition header with attachment/inline filename for downloading the file. If the reference is a directory, this parameter is ignored, since files must be retrieved individually.
location_headers – If enabled, add the Content-Location header referring to the input location.
content_headers – If enabled, add other relevant Content- prefixed headers.
content_type – Explicit Content-Type to provide. Otherwise, use default guessed by file system (often application/octet-stream). If the reference is a directory, this parameter is ignored and application/directory will be enforced. Requires that content_headers is enabled.
content_disposition_type – Whether inline or attachment should be used. Requires that content_headers and download_headers are enabled.
content_location – Override Content-Location to include in headers. Otherwise, defaults to the path. Requires that location_headers and content_headers are enabled in each case.
content_name – Optional name parameter to assign in the Content-Disposition header. Requires that content_headers and download_headers are enabled.
content_id – Optional Content-ID to include in the headers. Requires that content_headers is enabled. This should be a uniquely identifiable reference across the server (not just within a specific response), which can be used for cross-referencing by {cid:<>} within and between multipart document contents. For a generic ID or field name, employ content_name instead.
missing_ok – If the referenced resource does not exist (locally or remotely as applicable), and that content information to describe it cannot be retrieved, either raise an error (default) or resume with the minimal information details that could be resolved.
settings – Application settings to pass down to relevant utility functions.
- Returns:
Headers for the reference.
- weaver.utils.make_link_header(href: str | weaver.typedefs.Link, hreflang: str | None = None, rel: str | None = None, type: str | None = None, title: str | None = None, charset: str | None = None) str [source]
Creates the HTTP Link (RFC 8288) header value from input parameters or a dictionary representation.
Parameter names are specifically selected to allow direct unpacking from the dictionary representation. Otherwise, a dictionary can be passed as the first parameter, allowing other parameters to act as override values. Alternatively, all parameters can be supplied individually.
Note
Parameter rel is optional to allow unpacking with a single parameter, but its value is required to form a valid Link header.
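The two calling conventions (individual parameters, or a dictionary with overrides) can be sketched as below. The function make_link_header_sketch is hypothetical, and the parameter ordering and quoting of the produced value are assumptions rather than the documented function's exact output.

```python
def make_link_header_sketch(href, **params):
    if isinstance(href, dict):
        # Dictionary form: explicit keyword parameters act as overrides.
        merged = dict(href)
        merged.update({key: val for key, val in params.items() if val is not None})
        href = merged.pop("href")
        params = merged
    parts = [f"<{href}>"]
    for name in ("rel", "type", "title", "hreflang", "charset"):
        value = params.get(name)
        if value:
            parts.append(f'{name}="{value}"')
    return "; ".join(parts)


link = {"href": "https://example.com/jobs?page=2", "rel": "next", "type": "application/json"}
from_dict = make_link_header_sketch(link)
overridden = make_link_header_sketch(link, rel="last")
```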
- weaver.utils.parse_link_header(link_header: str) weaver.typedefs.Link [source]
Parses the parameters of the
Link
header.
- weaver.utils.explode_headers(headers: weaver.typedefs.AnyHeadersContainer | None) webob.headers.ResponseHeaders [source]
Explodes comma-separated headers in containers to a flattened list with repeated header names for each value.
- weaver.utils.ows_context_href(href: str, partial: bool | None = False) weaver.typedefs.JSON [source]
Retrieves the complete or partial dictionary defining an
OWSContext
from a reference.
- weaver.utils.pass_http_error(exception: Exception, expected_http_error: Type[pyramid.httpexceptions.HTTPError] | Iterable[Type[pyramid.httpexceptions.HTTPError]]) None [source]
Silently ignore a raised HTTP error that matches the specified error code of the reference exception class.
Given an HTTPError of any type (pyramid, requests), ignores the exception if the actual error matches the status code. Other exceptions are re-raised. This is equivalent to capturing a specific Exception within an except block and calling pass to drop it.
- Parameters:
exception – Any Exception instance.
expected_http_error – Single or list of specific pyramid HTTPError to handle and ignore.
- Raises:
exception – If it doesn’t match the status code or is not an HTTPError of any module.
- weaver.utils.raise_on_xml_exception(xml_node: weaver.xml_util.XML) NoReturn | None [source]
Raises an exception with the description if the XML response document defines an ExceptionReport.
- Parameters:
xml_node – instance of
XML
- Raises:
Exception – on found ExceptionReport document.
- weaver.utils.str2bytes(string: AnyStr) bytes [source]
Obtains the bytes representation of the string.
- weaver.utils.bytes2str(string: AnyStr) str [source]
Obtains the unicode representation of the string.
- weaver.utils.data2str(data: weaver.typedefs.AnyValueType | io.IOBase) str [source]
Converts literal data to a plain string representation.
- weaver.utils.get_path_kvp(path: str, sep: str = ',', **params: weaver.typedefs.AnyValueType | Sequence[weaver.typedefs.AnyValueType]) str [source]
Generates the URL with Key-Value-Pairs (KVP) query parameters.
- Parameters:
path – WPS URL or Path
sep – separator to employ when multiple values are provided.
params – keyword parameters and their corresponding single or multi values to generate KVP.
- Returns:
combined path and query parameters as KVP.
- weaver.utils.get_log_date_fmt() str [source]
Logging date format employed for job output reporting.
- weaver.utils.get_log_monitor_msg(job_id: str, status: str, percent: weaver.typedefs.Number, message: str, location: str) str [source]
- weaver.utils.get_job_log_msg(status: weaver.status.Status | str, message: str, progress: weaver.typedefs.Number | None = 0, duration: str | None = None) str [source]
- weaver.utils.setup_loggers(settings: weaver.typedefs.AnySettingsContainer | None = None, level: int | str | None = None, force_stdout: bool = False, message_format: str | None = None, datetime_format: str | None = None, log_file: str | None = None) logging.Logger [source]
Update logging configuration of known loggers based on application settings.
When weaver.log_level exists in settings, it overrides any other INI configuration logging levels. Otherwise, undefined logger levels will be set according to whichever is found first between weaver.log_level, the level parameter or default logging.INFO.
- weaver.utils.make_dirs(path: str, mode: int = 493, exist_ok: bool = False) None [source]
Backward compatible make_dirs with reduced set of default mode flags.
Alternative to os.makedirs with exist_ok parameter only available for python>3.5. Also, using a reduced set of permissions 755 instead of original default 777.
Note
The method employed in this function is safer than an if os.path.exists or if os.path.isdir pre-check before calling os.makedirs, as such a pre-check can result in a race condition (between evaluation and actual creation).
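The race-free approach mentioned in the note is usually written as "attempt the creation, then inspect the error", as in the sketch below. The function make_dirs_sketch is hypothetical and may differ from the real implementation. Note that the default mode 493 shown in the signature is the decimal form of octal 0o755.

```python
import errno
import os
import tempfile


def make_dirs_sketch(path, mode=0o755, exist_ok=False):
    try:
        # Attempt creation directly instead of checking for existence first.
        os.makedirs(path, mode=mode)
    except OSError as exc:
        already_dir = exc.errno == errno.EEXIST and os.path.isdir(path)
        if not (exist_ok and already_dir):
            raise


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "a", "b")
    make_dirs_sketch(target)
    make_dirs_sketch(target, exist_ok=True)  # second call is a no-op
    created = os.path.isdir(target)
    try:
        make_dirs_sketch(target)  # exist_ok=False re-raises the error
        raised = False
    except OSError:
        raised = True
```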
- weaver.utils.get_caller_name(skip: int = 0, base_class: bool = False, unwrap: bool = True) str [source]
Find the name of a parent caller function or method.
The name is returned with respective formats module.class.method or module.function.
Supposing the following call stack main -> func2 -> func1 -> func0 -> get_caller_name. Calling get_caller_name() or get_caller_name(skip=0) would return the full package location of func0 because it is the function where get_caller_name is called from. Using get_caller_name(skip=1) would return func1 directly (parent 1-level above func0), and func2 for get_caller_name(skip=2).
- Parameters:
skip – Specifies how many levels of stack to skip for getting the caller. By default, uses skip=0 to obtain the immediate function that called get_caller_name().
base_class – Specifies if the base class should be returned or the top-most class in case of inheritance. If the caller is not a class, this doesn't do anything.
unwrap – If the caller matching the skip position is detected to be a function decorated by functools.wraps(), its parent function will be returned instead to reflect the function that was decorated rather than the decorator itself.
- Returns:
An empty string if skipped levels exceed stack height; otherwise, the requested caller name.
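The skip semantics can be sketched with the standard inspect module. The function caller_name_sketch is hypothetical and only returns the module.function form, without the class resolution, base_class or unwrap handling of the documented function.

```python
import inspect


def caller_name_sketch(skip=0):
    stack = inspect.stack()
    # Index 0 is this function itself, so skip=0 maps to index 1.
    index = skip + 1
    if index >= len(stack):
        return ""
    frame_info = stack[index]
    module = frame_info.frame.f_globals.get("__name__", "")
    return f"{module}.{frame_info.function}"


def func0(skip=0):
    return caller_name_sketch(skip)


def func1(skip=0):
    return func0(skip)
```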
- weaver.utils.setup_cache(settings: weaver.typedefs.SettingsType, reset: bool = True) None [source]
Prepares the settings with default caching options.
- weaver.utils.reset_cache(regions: List[str] | None = None) None [source]
Invalidates caches for all regions and functions decorated by beaker.cache.cache_region() or manually cached.
- Parameters:
regions – List of specific regions to reset. Others are unmodified. If omitted, clear all caches regardless of regions.
- weaver.utils.invalidate_region(caching_args: Tuple[Callable, str, Tuple[Any]]) None [source]
Caching region invalidation with handling to ignore errors generated by unknown regions.
- Parameters:
caching_args – tuple of
(function, region, *function-args)
representing caching key to invalidate.
- weaver.utils.get_ssl_verify_option(method: str, url: str, settings: weaver.typedefs.AnySettingsContainer, request_options: RequestOptions | None = None) bool [source]
Obtains the SSL verification option considering multiple setting definitions and the provided request context.
Obtains the SSL verification option from combined settings from weaver.ssl_verify and parsed weaver.request_options file for the corresponding request.
- Parameters:
method – request method (GET, POST, etc.).
url – request URL.
settings – application setting container with preloaded request options specifications.
request_options – preprocessed request options for method/URL to avoid parsing the settings again.
- Returns:
SSL verify option to be passed down to some request function.
- weaver.utils.get_no_cache_option(request_headers: weaver.typedefs.HeadersType, **cache_options: bool | RequestOptions) bool [source]
Obtains the No-Cache result from request headers and configured Request Options.
See also
Request.headers()
- Parameters:
request_headers – specific request headers that could indicate Cache-Control: no-cache.
cache_options – specific request options that could define cache[_enabled]: True|False.
- Returns:
whether to disable cache or not
- weaver.utils.get_request_options(method: str, url: str, settings: weaver.typedefs.AnySettingsContainer) RequestOptions [source]
Obtains the Request Options corresponding to the request from the configuration file.
The configuration file specified is expected to be pre-loaded within setting weaver.request_options. If no file was pre-loaded or no match is found for the request, an empty options dictionary is returned.
- Parameters:
method – request method (GET, POST, etc.).
url – request URL.
settings – application setting container with pre-loaded request options specifications.
- Returns:
dictionary with keyword options to be applied to the corresponding request if matched.
- weaver.utils.retry_on_condition(operation: weaver.typedefs.AnyCallableAnyArgs, *args: RetryCondition, condition: int = Exception, retries: weaver.typedefs.Number = 1, interval=0, **kwargs) weaver.typedefs.Return [source]
Retries the operation call up to the amount of specified retries if the condition is encountered.
- Parameters:
operation – Any callable lambda, function, method, class that sporadically raises an exception to catch.
condition – Exception(s) to catch or callable that takes the raised exception to handle it with more conditions. In case of a callable, success/failure result should be returned to indicate if retry is needed. If retry is not requested by the handler for the specified exception, it is raised directly.
retries – Amount of retries to perform. If retries are exhausted, the final exception is re-raised.
interval – wait time interval (seconds) between retries.
- Returns:
Expected normal operation return value if it was handled within the specified amount of retries.
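The retry logic described by these parameters can be sketched as follows. The function retry_sketch is hypothetical: it accepts either exception classes or a callable handler as condition, as described above, but does not claim to match the real function's logging or edge-case behavior.

```python
import time


def retry_sketch(operation, *args, condition=Exception, retries=1, interval=0, **kwargs):
    attempt = 0
    while True:
        try:
            return operation(*args, **kwargs)
        except Exception as exc:
            if isinstance(condition, (type, tuple)):
                # Exception class(es): retry only when the error matches.
                should_retry = isinstance(exc, condition)
            else:
                # Callable handler: its result tells whether to retry.
                should_retry = bool(condition(exc))
            if not should_retry or attempt >= retries:
                raise
            attempt += 1
            time.sleep(interval)


calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("temporary failure")
    return "ok"


result = retry_sketch(flaky, condition=ValueError, retries=2)
```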
- weaver.utils.retry_on_cache_error(func: weaver.typedefs.AnyCallable) weaver.typedefs.AnyCallable [source]
Decorator to handle invalid cache setup.
Any function wrapped with this decorator will retry execution once if missing cache setup was the cause of error.
- weaver.utils._request_call(method: weaver.typedefs.AnyRequestMethod, url: str, kwargs: RequestCachingKeywords) requests.Response [source]
Request operation employed by
request_extra()
without caching.
- weaver.utils._request_cached(method: weaver.typedefs.AnyRequestMethod, url: str, kwargs: RequestCachingKeywords) requests.Response [source]
Cache-enabled request operation employed by request_extra().
- weaver.utils._patch_cached_request_stream(response: weaver.typedefs.AnyResponseType, stream: bool = False) None [source]
Preserves a cached copy of a streamed response contents to allow resolution when reloaded from cache.
When response contents are streamed, the resulting Response object does not contain the contents until the aggregated result is obtained by calling Response.contents(), Response.text() or Response.json() methods. If no function ends up being called to aggregate the chunks with Response.contents(), and instead makes use of one of the iterator Response.iter_contents(), Response.iter_lines() or Response.__iter__() methods, the object stored in cache ends up in an invalid state where it believes contents were already consumed (cannot re-iterate), but are not available anymore to provide them on following request calls that reload it from cache. This patches the object by caching the contents after iterating the chunks to allow them to be retrieved for future cached requests.
- weaver.utils.request_extra(method: weaver.typedefs.AnyRequestMethod, url: str, retries: int | None = None, backoff: weaver.typedefs.Number | None = None, intervals: List[weaver.typedefs.Number] | None = None, retry_after: bool = True, allowed_codes: List[int] | None = None, only_server_errors: bool = True, ssl_verify: bool | None = None, cache_request: RequestCachingFunction = _request_cached, cache_enabled: bool = True, settings: weaver.typedefs.AnySettingsContainer | None = None, **request_kwargs) weaver.typedefs.AnyResponseType [source]
Standard library requests with additional functional utilities.
Retry operation
Implements request retry if the previous request failed, up to the specified number of retries. Using backoff factor, you can control the interval between request attempts such as:
delay = backoff * (2 ^ retry)
Alternatively, you can explicitly define intervals=[...] with the list values being the number of seconds to wait between each request attempt. In this case, backoff is ignored and retries is overridden accordingly with the number of items specified in the list.
Furthermore, retry_after (default: True) indicates if HTTP status code 429 (Too Many Requests) should be automatically handled during retries. If enabled and provided in the previously failed request response through the Retry-After header, the next request attempt will be executed only after the server-specified delay instead of following the calculated delay from retries and backoff, or from corresponding index of intervals, according to the specified parameters. This avoids uselessly calling the server and automatically receiving a denied response. You can disable this feature by passing False, which will result in requests being retried blindly without consideration of the called server instruction.
Because different request implementations use different parameter naming conventions, all following keywords are looked for:
Both variants of
backoff
andbackoff_factor
are accepted.All variants of
retires
,retry
andmax_retries
are accepted.
Note
Total amount of executed request attempts will be +1 the number of retries or intervals items, as the first request is done immediately and following attempts are done with the appropriate delay.
File Transport Scheme
Any request with file:// scheme or empty scheme (no scheme specified) will be automatically handled as a potential local file path. The path should be absolute to ensure it is correctly resolved.
All access errors due to file permissions return 403 status code, and missing file returns 404. Any other IOError types are converted to 400 responses.
See also
FileAdapter
SSL Verification
Allows SSL verify option to be enabled or disabled according to configuration settings or explicit parameters. Any variation of verify or ssl_verify keyword arguments are considered. If they all resolve to True, then application settings are retrieved from weaver.ini to parse additional SSL options that could disable it.
- Following weaver settings are considered:
weaver.ssl_verify = True|False
weaver.request_options = request_options.yml
Note
Argument settings must also be provided through any supported container by get_settings() to retrieve and apply any weaver-specific configurations.
- param method:
HTTP method to set request.
- param url:
URL of the request to execute.
- param retries:
Number of request retries to attempt if first attempt failed (according to allowed codes or error).
- param backoff:
Factor by which to multiply delays between retries.
- param intervals:
Explicit intervals in seconds between retries.
- param retry_after:
If enabled, honor the Retry-After response header provided by a failing request attempt.
- param allowed_codes:
HTTP status codes that are considered valid to stop retrying (default: any non-4xx/5xx code).
- param ssl_verify:
Explicit parameter to disable SSL verification (overrides any settings, default: True).
- param cache_request:
Decorated function with cache_region() to perform the request if cache was not hit.
- param cache_enabled:
Whether caching must be used for this request. Disable overrides request options and headers.
- param settings:
Additional settings from which to retrieve configuration details for requests.
- param only_server_errors:
Only HTTP status codes in the 5xx values will be considered for retrying the request (default: True). This catches sporadic server timeout, connection error, etc., but 4xx errors are still considered valid results. This parameter is ignored if allowed codes are explicitly specified.
- param request_kwargs:
All other keyword arguments are passed down to the request call.
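The delay rules described above for request_extra can be illustrated with a minimal sketch. The helper name resolve_retry_delays is hypothetical and not part of weaver. It assumes that "2 ^ retry" denotes exponentiation with the retry counter starting at 0, and it leaves out the Retry-After handling entirely.

```python
from typing import List, Optional


def resolve_retry_delays(
    retries: Optional[int] = None,
    backoff: Optional[float] = None,
    intervals: Optional[List[float]] = None,
) -> List[float]:
    """Hypothetical helper: list the waits (in seconds) between request attempts."""
    if intervals:
        # Explicit intervals take precedence: backoff is ignored and the
        # number of retries becomes the number of listed items.
        return [float(delay) for delay in intervals]
    retries = retries or 0
    backoff = backoff or 0.0
    # Assumption: "2 ^ retry" means exponentiation, with retry starting at 0.
    return [backoff * (2 ** retry) for retry in range(retries)]


delays = resolve_retry_delays(retries=3, backoff=0.5)
attempts = len(delays) + 1  # the first request is immediate, then one per delay
print(delays, attempts)  # [0.5, 1.0, 2.0] 4
```

The attempt count follows the note above: one immediate request plus one attempt per delay.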
- weaver.utils.get_secure_filename(file_name: str) str [source]
Obtain a secure file name.
Preserves leading and trailing underscores contrary to secure_filename().
- weaver.utils.get_secure_directory_name(location: str) str [source]
Obtain a secure directory name from a full path location.
Takes a location path and finds the first secure base name available from path. If no secure base name is found, a random UUID value will be returned.
- weaver.utils.get_secure_path(location: str) str [source]
Obtain a secure path location with validation of each nested component.
- weaver.utils.download_file_http(file_reference: str, file_outdir: str, settings: weaver.typedefs.AnySettingsContainer | None = None, callback: Callable[[str], None] | None = None, **request_kwargs: Any) str [source]
Downloads the file referenced by an HTTP URL location.
Respects RFC 2183, RFC 5987 and RFC 6266 regarding Content-Disposition header handling to resolve any preferred file name. This value is employed if it fulfills validation criteria. Otherwise, the name is extracted from the last part of the URL path.
- Parameters:
file_reference – HTTP URL where the file is hosted.
file_outdir – Output local directory path under which to place the downloaded file.
settings – Additional request-related settings from the application configuration (notably request-options).
callback – Function that gets called progressively with incoming chunks from downloaded file. Can be used to monitor download progress or raise an exception to abort it.
request_kwargs – Additional keywords to forward to request call (if needed).
- Returns:
Path of the local copy of the fetched file.
- Raises:
HTTPException – applicable HTTP-based exception if any unrecoverable problem occurred during fetch request.
ValueError – when resulting file name value is considered invalid.
- weaver.utils.validate_s3(*, region: Any, bucket: str) None [source]
Validate patterns and allowed values for AWS S3 client configuration.
- weaver.utils.resolve_s3_from_http(reference: str) Tuple[str, mypy_boto3_s3.literals.RegionName] [source]
Resolve an HTTP URL reference pointing to an S3 Bucket into the shorthand URL notation with S3 scheme.
The expected reference should be formatted with one of the following supported formats.
# Path-style URL
https://s3.{Region}.amazonaws.com/{Bucket}/[{dirs}/][{file-key}]
# Virtual-hosted–style URL
https://{Bucket}.s3.{Region}.amazonaws.com/[{dirs}/][{file-key}]
# Access-Point-style URL
https://{AccessPointName}-{AccountId}.s3-accesspoint.{Region}.amazonaws.com/[{dirs}/][{file-key}]
# Outposts-style URL
https://{AccessPointName}-{AccountId}.{outpostID}.s3-outposts.{Region}.amazonaws.com/[{dirs}/][{file-key}]
See also
References on formats:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-access-points.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/S3onOutposts.html
See also
References on resolution:
- Parameters:
reference – HTTP-S3 URL reference.
- Returns:
Updated S3 reference and applicable S3 Region name.
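As an illustration of the kind of mapping that resolve_s3_from_http describes, the following sketch converts the two simplest formats (path-style and virtual-hosted-style) into an s3:// reference and a region. The function name http_to_s3_reference and both regular expressions are assumptions made for this example; they are not the patterns used by weaver, and the Access-Point and Outposts formats are intentionally left out.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Assumption: simplified host patterns covering only two of the listed formats.
PATH_STYLE = re.compile(r"^s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$")
VIRTUAL_STYLE = re.compile(
    r"^(?P<bucket>[a-z0-9.-]+)\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def http_to_s3_reference(reference: str) -> Tuple[str, str]:
    """Hypothetical helper returning the (s3:// reference, region) pair."""
    parsed = urlparse(reference)
    host = parsed.netloc
    path = parsed.path.lstrip("/")
    match = PATH_STYLE.match(host)
    if match:
        # Path-style: the bucket is the first segment of the URL path.
        bucket, _, key = path.partition("/")
        return f"s3://{bucket}/{key}", match.group("region")
    match = VIRTUAL_STYLE.match(host)
    if match:
        # Virtual-hosted-style: the bucket is the first part of the host name.
        return f"s3://{match.group('bucket')}/{path}", match.group("region")
    raise ValueError(f"Unsupported S3 HTTP reference: {reference}")


print(http_to_s3_reference("https://s3.ca-central-1.amazonaws.com/my-bucket/dir/file.txt"))
print(http_to_s3_reference("https://my-bucket.s3.ca-central-1.amazonaws.com/dir/file.txt"))
```

Both calls resolve to the same bucket, key and region, which is the point of normalizing the HTTP forms into the shorthand notation.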
- weaver.utils.resolve_s3_reference(s3_reference: str) Tuple[str, str, mypy_boto3_s3.literals.RegionName | None] [source]
Resolve a reference of S3 scheme into the appropriate formats expected by boto3.
- Parameters:
s3_reference – Reference with s3:// scheme with an ARN or literal Bucket/Object path.
- Returns:
Tuple of resolved Bucket name, Object path and S3 Region.
- weaver.utils.resolve_s3_http_options(**request_kwargs: Any) Dict[str, botocore.config.Config | weaver.typedefs.JSON] [source]
Converts HTTP requests options to corresponding S3 configuration definitions.
Resolved parameters will only preserve valid options that can be passed directly to botocore.client.S3 when initialized with boto3.client() in combination with the "s3" service. Valid HTTP requests options that have been resolved will be nested under config with a S3Config where applicable.
- Parameters:
request_kwargs – Request keywords to attempt mapping to S3 configuration.
- Returns:
Resolved S3 client parameters.
- weaver.utils.resolve_scheme_options(**kwargs: Any) Tuple[SchemeOptions, RequestOptions] [source]
Splits options into their relevant group by scheme prefix.
Handled schemes are defined by SUPPORTED_FILE_SCHEMES. HTTP and HTTPS are grouped together and share the same options.
- Parameters:
kwargs – Keywords to categorise by scheme.
- Returns:
Categorised options by scheme and all other remaining keywords.
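The grouping by scheme prefix can be sketched as follows. This is only an approximation of the described behavior: the function name split_scheme_options, the SCHEMES tuple standing in for SUPPORTED_FILE_SCHEMES, and the choice to strip the prefix from grouped keys are all assumptions of this example.

```python
from typing import Any, Dict, Tuple

# Assumption: illustrative scheme list; the real values come from SUPPORTED_FILE_SCHEMES.
SCHEMES = ("file", "http", "https", "s3")


def split_scheme_options(**kwargs: Any) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical helper splitting '{scheme}_{option}' keywords from the others."""
    scheme_options: Dict[str, Dict[str, Any]] = {"file": {}, "http": {}, "s3": {}}
    other_options: Dict[str, Any] = {}
    for key, value in kwargs.items():
        prefix, sep, option = key.partition("_")
        if sep and option and prefix in SCHEMES:
            # HTTP and HTTPS share the same group of options.
            group = "http" if prefix == "https" else prefix
            scheme_options[group][option] = value
        else:
            # Keywords without a known scheme prefix apply to all handlers.
            other_options[key] = value
    return scheme_options, other_options


by_scheme, others = split_scheme_options(s3_region_name="us-east-1", https_timeout=5, retries=3)
print(by_scheme)
print(others)
```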
- class weaver.utils.OutputMethod[source]
Methodology employed to handle generation of a file or directory output that was fetched.
- weaver.utils.fetch_file(file_reference: str, file_outdir: str, *, out_method: OutputMethod = OutputMethod.AUTO, settings: weaver.typedefs.AnySettingsContainer | None = None, callback: Callable[[str], None] | None = None, **option_kwargs) weaver.typedefs.Path [source]
Fetches a file from local path, AWS-S3 bucket or remote URL, and dumps its content to the output directory.
The output directory is expected to exist prior to this function call. The file reference scheme (protocol) determines from where to fetch the content. Output file name and extension will be the same as the original (after link resolution if applicable). Requests will consider weaver.request_options when using http(s):// scheme.
- Parameters:
file_reference – Local filesystem path (optionally prefixed with file://), s3:// bucket location or http(s):// remote URL file reference. References of the form https://s3.[...] are also considered as s3://.
file_outdir – Output local directory path under which to place the fetched file.
settings – Additional request-related settings from the application configuration (notably request-options).
callback – Function that gets called progressively with incoming chunks from downloaded file. Only applicable when download occurs (remote file reference). Can be used to monitor download progress or raise an exception to abort it.
out_method – Method employed to handle the generation of the output file. Only applicable when the file reference is local. Remote location always generates a local copy.
option_kwargs – Additional keywords to forward to the relevant handling method by scheme. Keywords should be defined as {scheme}_{option} with one of the known SUPPORTED_FILE_SCHEMES. If not prefixed by any scheme, the option will apply to all handling methods (if applicable).
- Returns:
Path of the local copy of the fetched file.
- Raises:
HTTPException – applicable HTTP-based exception if any occurred during the operation.
ValueError – when the reference scheme cannot be identified.
- weaver.utils.adjust_file_local(file_reference: str, file_outdir: str, out_method: OutputMethod) weaver.typedefs.Path [source]
Adjusts the input file reference to the output location with the requested handling method.
Handling Methods
- OutputMethod.LINK: Force generation of a symbolic link instead of hard copy, regardless if source is directly a file or a link to one.
- OutputMethod.COPY: Force hard copy of the file to destination, regardless if source is directly a file or a link to one.
- OutputMethod.MOVE: Move the local file to the output directory instead of copying or linking it. If the output directory already contains the local file, raises an OSError.
- OutputMethod.AUTO (default): Resolve conditionally as follows.
When the source is a symbolic link itself, the destination will also be a link.
When the source is a direct file reference, the destination will be a hard copy of the file.
- param file_reference:
Original location of the file.
- param file_outdir:
Target directory of the file.
- param out_method:
Method employed to handle the generation of the output.
- returns:
Output file location or metadata.
- weaver.utils.filter_directory_forbidden(listing: Iterable[FilterType], key: Callable[[Ellipsis], str] | None = None) Iterator[FilterType] [source]
Filters out items that should always be removed from directory listing results.
- class weaver.utils.PathMatchingMethod[source]
Utility enum.Enum methods.
Create an extended enum with these utilities as follows.
class CustomEnum(ExtendedEnum):
    ItemA = "A"
    ItemB = "B"
Warning
Must not define any enum value here to allow inheritance by subclasses.
- weaver.utils.filter_directory_patterns(listing: Iterable[FilterType], include: Iterable[str] | None, exclude: Iterable[str] | None, matcher: PathMatchingMethod, key: Callable[[Ellipsis], str] | None = None) List[FilterType] [source]
Filters a list of files according to a set of include/exclude patterns.
If a file is matched against an include pattern, it will take precedence over matches on exclude patterns. By default, any file that is not matched by an excluded pattern will remain in the resulting filtered set. Include patterns are only intended to “add back” previously excluded matches. They are NOT for defining “only desired items”. Adding include patterns without exclude patterns is redundant, as all files would be retained by default anyway.
Patterns can use regular expression definitions or Unix shell-style wildcards. The matcher should be selected according to the matching method of the provided patterns. Potential functions are re.match(), re.fullmatch(), fnmatch.fnmatch() and fnmatch.fnmatchcase(). Literal strings for exact matches are also valid.
Note
Provided patterns are applied directly without modifications. If the file listing contains different root directories than patterns, such as if patterns are specified with relative paths, obtained results could mismatch the intended behavior. Make sure to align paths accordingly for the expected filtering context.
- Parameters:
listing – Files to filter.
include – Any matching patterns for files that should be explicitly included.
exclude – Any matching patterns for files that should be excluded unless included.
matcher – Pattern matching method to evaluate if a file path matches include and exclude definitions.
key – Function to retrieve the file key (path) from objects containing it to be filtered.
- Returns:
Filtered files.
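The precedence between include and exclude patterns can be shown with a small sketch. It is not the weaver implementation: the function name filter_paths is hypothetical, and the example hardcodes fnmatch.fnmatchcase() as the matcher rather than accepting a PathMatchingMethod.

```python
import fnmatch
from typing import Iterable, List, Optional


def filter_paths(
    listing: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper applying include/exclude glob patterns to paths."""
    include = list(include or [])
    exclude = list(exclude or [])
    kept = []
    for path in listing:
        excluded = any(fnmatch.fnmatchcase(path, pattern) for pattern in exclude)
        included = any(fnmatch.fnmatchcase(path, pattern) for pattern in include)
        # Kept by default; an exclude match drops the item unless an
        # include match adds it back.
        if not excluded or included:
            kept.append(path)
    return kept


files = ["data/a.txt", "data/b.log", "data/keep.log"]
print(filter_paths(files, include=["*keep*"], exclude=["*.log"]))
# ['data/a.txt', 'data/keep.log']
print(filter_paths(files, include=["*.log"]))
# ['data/a.txt', 'data/b.log', 'data/keep.log']
```

The second call shows why include patterns alone are redundant: without any exclude pattern, every file is retained anyway.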
- weaver.utils.fetch_files_s3(location: str, out_dir: weaver.typedefs.Path, out_method: AnyMetadataOutputMethod, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.SettingsType | None = None, **option_kwargs) List[MetadataResult] [source]
- weaver.utils.fetch_files_s3(location: str, out_dir: weaver.typedefs.Path, out_method: AnyDownloadOutputMethod, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.SettingsType | None = None, **option_kwargs) List[DownloadResult]
Download all listed S3 files references under the output directory using the provided S3 bucket and client.
If nested directories are employed in the file paths, they will be downloaded with the same directory hierarchy under the requested output directory.
See also
Filtering is subject to filter_directory_patterns() and filter_directory_forbidden().
- Parameters:
location – S3 bucket location (with s3:// scheme) targeted to retrieve files.
out_dir – Desired output location of downloaded files.
out_method – Method employed to handle the generation of the output.
include – Any matching patterns for files that should be explicitly included.
exclude – Any matching patterns for files that should be excluded unless included.
matcher – Pattern matching method to evaluate if a file path matches include and exclude definitions.
settings – Additional request-related settings from the application configuration (notably request-options).
option_kwargs – Additional keywords to forward to the relevant handling method by scheme. Keywords should be defined as {scheme}_{option} with one of the known SUPPORTED_FILE_SCHEMES. If not prefixed by any scheme, the option will apply to all handling methods (if applicable).
- Returns:
Output locations of downloaded files.
- weaver.utils.fetch_files_url(file_references: Iterable[str], out_dir: weaver.typedefs.Path, out_method: AnyOutputMethod, base_url: str, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.SettingsType | None = None, **option_kwargs) Iterator[AnyOutputResult] [source]
Download all listed files references under the output directory.
If nested directories are employed in file paths, relative to base_url, they will be downloaded with the same directory hierarchy under the requested output directory. If the base_url differs, they will simply be downloaded at the root of the output directory. If any conflict occurs in such case, an OSError will be raised.
See also
Use fetch_files_s3() instead if all files share the same S3 bucket.
- Parameters:
file_references – Relative or full URL paths of the files to download.
out_dir – Desired output location of downloaded files.
out_method – Method employed to handle the generation of the output.
base_url – If full URLs are specified, corresponding files will be retrieved using the appropriate scheme per file, allowing flexible data sources. Otherwise, any relative locations use this base URL to resolve the full URL prior to downloading the file.
include – Any matching patterns for files that should be explicitly included.
exclude – Any matching patterns for files that should be excluded unless included.
matcher – Pattern matching method to evaluate if a file path matches include and exclude definitions.
settings – Additional request-related settings from the application configuration (notably request-options).
option_kwargs – Additional keywords to forward to the relevant handling method by scheme. Keywords should be defined as {scheme}_{option} with one of the known SUPPORTED_FILE_SCHEMES. If not prefixed by any scheme, the option will apply to all handling methods (if applicable).
- Returns:
Output locations of downloaded files.
- weaver.utils.fetch_files_html(html_data: str, out_dir: weaver.typedefs.Path, out_method: AnyMetadataOutputMethod, base_url: str, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.AnySettingsContainer | None = None, **option_kwargs) Iterator[MetadataResult] [source]
- weaver.utils.fetch_files_html(html_data: str, out_dir: weaver.typedefs.Path, out_method: AnyDownloadOutputMethod, base_url: str, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.AnySettingsContainer | None = None, **option_kwargs) Iterator[DownloadResult]
Retrieves files from a directory listing provided as an index of plain HTML with file references.
If the index itself provides directories that can be browsed down, the tree hierarchy will be downloaded recursively by following links. In such case, links are ignored if they cannot be resolved as nested index pages.
Retrieval of file references from directory listing attempts to be as flexible as possible to the HTML response format, by ignoring style tags and looking only for <a href=""/> references. Examples of different supported format representations are presented at following locations:
https://anaconda.org/anaconda/python/files (raw listing with text code style and minimal file metadata)
https://mirrors.edge.kernel.org/pub/ (listing within a formatted table with multiple other metadata fields)
See also
- Parameters:
html_data – HTML data contents with files references to download.
out_dir – Desired output location of retrieved files.
out_method – Method employed to handle the generation of the output.
base_url – If full URLs are specified, corresponding files will be retrieved using the appropriate scheme per file, allowing flexible data sources. Otherwise, any relative locations use this base URL to resolve the full URL prior to downloading the file.
include – Any matching patterns for files that should be explicitly included.
exclude – Any matching patterns for files that should be excluded unless included.
matcher – Pattern matching method to evaluate if a file path matches include and exclude definitions.
settings – Additional request-related settings from the application configuration (notably request-options).
option_kwargs – Additional keywords to forward to the relevant handling method by scheme. Keywords should be defined as {scheme}_{option} with one of the known SUPPORTED_FILE_SCHEMES. If not prefixed by any scheme, the option will apply to all handling methods (if applicable).
- Returns:
Output locations of downloaded files.
- weaver.utils.adjust_directory_local(location: weaver.typedefs.Path, out_dir: weaver.typedefs.Path, out_method: AnyMetadataOutputMethod, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB) List[MetadataResult] [source]
- weaver.utils.adjust_directory_local(location: weaver.typedefs.Path, out_dir: weaver.typedefs.Path, out_method: AnyDownloadOutputMethod, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB) List[DownloadResult]
Adjusts the input directory reference to the output location with the requested handling method.
Handling Methods
Source location is the output directory:
If the source location is exactly the same location as the output (after link resolution), nothing is applied, unless filtered listing produces a different set of files. In that case, files to be excluded will be removed from the file system. In other situations, below handling methods are considered.
- OutputMethod.LINK: Force generation of the output directory as a symbolic link pointing to the original location, without any copy, regardless if the source location is directly a directory or a link to one. Not applicable if filtered listing does not match exactly the original source location listing. In such case, resolution will use the second OutputMethod.AUTO handling approach instead.
- OutputMethod.COPY: Force hard copy of the directory to the destination, and hard copy of all its underlying contents by resolving any symbolic link along the way, regardless if the source location is directly a directory or a link to one.
- OutputMethod.MOVE: Move the local directory’s contents under the output directory instead of copying or linking it. If the output directory already contains anything, raises an OSError. If exclusion filters yield any item to be omitted, those items will be deleted entirely from the file system.
- OutputMethod.AUTO (default): Resolve conditionally as follows.
When the source is a symbolic link itself, the destination will be a link to it (handled as OutputMethod.LINK), unless its restriction regarding filtered listing applies. In that case, switches to the other handling method below.
When the source is a direct directory reference (or a link with differing listing after filter), the destination will be a recursive copy of the source directory, but any encountered links will remain links instead of resolving them and creating a copy (as accomplished by OutputMethod.COPY).
See also
- param location:
Local reference to the source directory.
- param out_dir:
Local reference to the output directory.
- param out_method:
Method employed to handle the generation of the output.
- param include:
Any matching patterns for files that should be explicitly included.
- param exclude:
Any matching patterns for files that should be excluded unless included.
- param matcher:
Pattern matching method to evaluate if a file path matches include and exclude definitions.
- returns:
Listing of files after resolution and filtering if applicable.
- weaver.utils.list_directory_recursive(directory: weaver.typedefs.Path, relative: bool = False) Iterator[weaver.typedefs.Path] [source]
Obtain a flat list of files recursively contained within a local directory.
- weaver.utils.fetch_directory(location: str, out_dir: weaver.typedefs.Path, *, out_method: AnyMetadataOutputMethod = OutputMethod.AUTO, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.AnySettingsContainer | None = None, **option_kwargs) List[MetadataResult] [source]
- weaver.utils.fetch_directory(location: str, out_dir: weaver.typedefs.Path, *, out_method: AnyDownloadOutputMethod = OutputMethod.AUTO, include: List[str] | None = None, exclude: List[str] | None = None, matcher: PathMatchingMethod = PathMatchingMethod.GLOB, settings: weaver.typedefs.AnySettingsContainer | None = None, **option_kwargs) List[DownloadResult]
Fetches all files that can be listed from a directory in local or remote location.
See also
Note
When using include/exclude filters, items that do not match a valid entry from the real listing are ignored. Special directories such as .. and . for navigation purpose are always excluded regardless of filters.
- Parameters:
location – Directory reference (URL, S3, local). Trailing slash required.
out_dir – Output local directory path under which to place fetched files.
out_method – Method employed to handle the generation of the output directory. Only applicable when the file reference is local. Remote location always generates a local copy.
include – Any matching patterns for files that should be explicitly included.
exclude – Any matching patterns for files that should be excluded unless included.
matcher – Pattern matching method to evaluate if a file path matches include and exclude definitions.
settings – Additional request-related settings from the application configuration (notably request-options).
option_kwargs – Additional keywords to forward to the relevant handling method by scheme. Keywords should be defined as {scheme}_{option} with one of the known SUPPORTED_FILE_SCHEMES. If not prefixed by any scheme, the option will apply to all handling methods (if applicable).
- Returns:
File locations retrieved from directory listing.
- weaver.utils.fetch_reference(reference: str, out_dir: weaver.typedefs.Path, *, out_listing: weaver.typedefs.Literal[False] = False, out_method: OutputMethod = OutputMethod.AUTO, settings: weaver.typedefs.AnySettingsContainer | None = None, **option_kwargs) str [source]
- weaver.utils.fetch_reference(reference: str, out_dir: weaver.typedefs.Path, *, out_listing: weaver.typedefs.Literal[True] = False, out_method: OutputMethod = OutputMethod.AUTO, settings: weaver.typedefs.AnySettingsContainer | None = None, **option_kwargs) List[str]
Fetches the single file or nested directory files from a local or remote location.
The appropriate method depends on the format of the location. If conditions from Directory Type are met, the reference will be considered a Directory. In every other situation, a single File reference will be considered.
See also
See the relevant handling methods below for other optional arguments.
- Parameters:
reference – Local filesystem path (optionally prefixed with file://), s3:// bucket location or http(s):// remote URL file or directory reference. References of the form https://s3.[...] are also considered as s3://.
out_dir – Output local directory path under which to place the fetched file or directory.
out_listing – Request that the complete file listing of the directory reference is returned. Otherwise, return the local directory reference itself. In the event of a file reference as input, the returned path will always be the fetched file itself, but it will be contained within a single-item list if listing was True for consistency in the returned type with the corresponding call for a directory reference.
settings – Additional request-related settings from the application configuration (notably request-options).
out_method – Method employed to handle the generation of the output file or directory. Only applicable when the reference is local. Remote location always generates a local copy.
option_kwargs – Additional keywords to forward to the relevant handling method by scheme. Keywords should be defined as {scheme}_{option} with one of the known SUPPORTED_FILE_SCHEMES. If not prefixed by any scheme, the option will apply to all handling methods (if applicable).
- Returns:
Path of the local copy of the fetched file, the directory, or the listing of the directory files.
- Raises:
HTTPException – applicable HTTP-based exception if any occurred during the operation.
ValueError – when the reference scheme cannot be identified.
- class weaver.utils.SizedUrlHandler(value, ref)[source]
Avoids an unnecessary request to obtain the content size if the expected file is already available locally.
- weaver.utils.create_metalink(files: List[FileLink], version: weaver.typedefs.Literal[3, 4] = 4, name: str | None = None, workdir: weaver.typedefs.Path | None = None) AnyMetalink [source]
Generates a MetaLink definition with provided link references.
If the link includes a local file path, or when the href itself is a local path, the IO handler will employ those references to avoid the usual behavior performed by pywps that auto-fetches the remote file. To retain that behavior, simply make sure that href is a remote file and that path is unset or does not exist.
- Parameters:
files – File link, and optionally, with additional name, local path, media-type and encoding.
version – Desired metalink content as defined by the corresponding version.
name – Global name identifier for the metalink file.
workdir – Location where to store files when auto-fetching them.
- Returns:
Metalink object with appropriate template generation utilities.
Note
It is always preferable to use MetaLink V4 over V3 as it adds support for mediaType which can be critical for validating and/or mapping output formats in some cases. V3 also enforces “type=http” in the pywps XML template, which is erroneous when other schemes such as file:// or s3:// are employed.
Warning
Regardless of MetaLink V3 or V4, encoding is not reported. This is a limitation of the MetaLink specification itself.
- weaver.utils.load_file(file_path: weaver.typedefs.Path, text: bool = False) weaver.typedefs.JSON | str [source]
Load JSON or YAML file contents from local path or remote URL.
If URL, get the content and validate it by loading, otherwise load file directly.
- Parameters:
- Returns:
loaded contents either parsed and converted to Python objects or as plain text.
- Raises:
ValueError – if YAML or JSON cannot be parsed or loaded from location.
- weaver.utils.is_remote_file(file_location: str) typing_extensions.TypeGuard[str] [source]
Parses the file location to figure out if it is remotely available or a local path.
- weaver.utils.get_sane_name(name: str, min_len: int | None = 3, max_len: int | None | None = None, assert_invalid: bool | None = True, replace_character: str = '_') str | None [source]
Cleans up the name to allow only specified characters and conditions.
Returns a cleaned-up version of the name, replacing invalid characters not matched with REGEX_SEARCH_INVALID_CHARACTERS by replace_character. Also, ensure that the resulting name respects specified length conditions.
- Parameters:
name – Value to clean.
min_len – Minimal length of name to be respected, raises or returns None on fail according to assert_invalid.
max_len – Maximum length of name to be respected, raises or returns trimmed name on fail according to assert_invalid. If None, condition is ignored for assertion or full name is returned respectively.
assert_invalid – If True, fail conditions or invalid characters will raise an error instead of replacing.
replace_character – Single character to use for replacement of invalid ones if assert_invalid is False.
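The interaction between the length conditions and assert_invalid can be approximated with the sketch below. The function name sanitize_name is hypothetical, and the INVALID_CHARACTERS pattern is only a stand-in, since the actual definition of REGEX_SEARCH_INVALID_CHARACTERS is not shown here and may differ.

```python
import re
from typing import Optional

# Assumption: stand-in pattern; the actual REGEX_SEARCH_INVALID_CHARACTERS may differ.
INVALID_CHARACTERS = re.compile(r"[^a-zA-Z0-9_\-]")


def sanitize_name(
    name: str,
    min_len: Optional[int] = 3,
    max_len: Optional[int] = None,
    assert_invalid: bool = True,
    replace_character: str = "_",
) -> Optional[str]:
    """Hypothetical sketch of the strict (raise) and lenient (repair) modes."""
    if assert_invalid:
        # Strict mode: any violation raises instead of being repaired.
        if INVALID_CHARACTERS.search(name):
            raise ValueError(f"Invalid characters in name: {name!r}")
        if min_len is not None and len(name) < min_len:
            raise ValueError(f"Name shorter than {min_len} characters: {name!r}")
        if max_len is not None and len(name) > max_len:
            raise ValueError(f"Name longer than {max_len} characters: {name!r}")
        return name
    # Lenient mode: a too short name yields None, invalid characters are
    # replaced, and an overly long name is trimmed.
    if min_len is not None and len(name) < min_len:
        return None
    name = INVALID_CHARACTERS.sub(replace_character, name)
    return name if max_len is None else name[:max_len]


print(sanitize_name("my process!", assert_invalid=False))  # my_process_
print(sanitize_name("ab", assert_invalid=False))  # None
```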
- weaver.utils.assert_sane_name(name: str, min_len: int = 3, max_len: int | None = None) None [source]
Asserts that the sane name respects conditions.
See also
Argument details in get_sane_name().
- weaver.utils.clean_json_text_body(body: str, remove_newlines: bool = True, remove_indents: bool = True, convert_quotes: bool = True) str [source]
Cleans a textual body field of superfluous characters to provide a better human-readable text in a JSON response.
- weaver.utils.transform_json(json_data: Dict[str, weaver.typedefs.JSON], rename: Dict[weaver.typedefs.AnyKey, Any] | None = None, remove: List[weaver.typedefs.AnyKey] | None = None, add: Dict[weaver.typedefs.AnyKey, Any] | None = None, extend: Dict[weaver.typedefs.AnyKey, Any] | None = None, replace_values: Dict[weaver.typedefs.AnyKey, Any] | None = None, replace_func: Dict[weaver.typedefs.AnyKey, Callable[[Any], Any]] | None = None) Dict[str, weaver.typedefs.JSON] [source]
Transforms the input JSON with different methods.
The transformations are applied in-place and in the same order as the arguments (rename, remove, add, etc.). All operations are applied onto the top-level fields of the mapping. No nested operations are applied, unless handled by replace functions.
Note
Because fields and values are iterated over the provided mappings, replacements of previous iterations could be re-replaced by following ones if the renamed item corresponds to a following item to match. For example, renaming field1 -> field2 and field2 -> field3 within the same operation type would result in successive replacements, with field3 as the result. The parameter order is important in this case: swapping the definitions would not find field2 on the first iteration (not in the mapping yet), and would then find field1, making the result field2.
- Parameters:
json_data – JSON mapping structure to transform.
rename – Rename the key of each matched field to the associated value.
remove – Remove matched fields by name.
add – Add or override the named fields with the associated values.
extend – Add or extend the named fields with the associated values.
replace_values – Replace matched values with the associated new values, regardless of field names.
replace_func – Replace values under matched fields by name with the returned value from the associated function. Mapping functions will receive the original value as input. If the result is to be serialized to JSON, they should return a valid JSON-serializable value.
- Returns:
Transformed JSON (the same object as the input JSON, modified in place).
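The ordering effect described in the note can be reproduced with a plain dictionary. The helper below is a hypothetical sketch of an in-place rename step only (rename_fields_sketch is not part of Weaver), assuming the rename mapping is visited in insertion order.

```python
def rename_fields_sketch(data, rename):
    """Rename top-level keys in place, visiting `rename` in insertion order."""
    for old_key, new_key in rename.items():
        if old_key in data:
            data[new_key] = data.pop(old_key)
    return data


# field1 becomes field2, which the next entry then renames to field3.
chained = rename_fields_sketch({"field1": 1}, {"field1": "field2", "field2": "field3"})

# field2 is not present yet on the first iteration, so only field1 -> field2 applies.
swapped = rename_fields_sketch({"field1": 1}, {"field2": "field3", "field1": "field2"})

print(chained)  # {'field3': 1}
print(swapped)  # {'field2': 1}
```

When chained renames are not intended, listing the mappings so that no target name appears as a later source name avoids the double replacement.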
- weaver.utils.generate_diff(val: Any, ref: Any, val_name: str = 'Test', ref_name: str = 'Reference', val_show: bool = False, ref_show: bool = False, json: bool = True, indent: int | None = 2) str [source]
Generates a line-by-line diff result of the test value against the reference value.
Attempts to parse the contents as JSON to provide better diff of matched/sorted lines, and falls back to plain line-based string representations otherwise.
- Parameters:
val – Test input value.
ref – Reference input value.
val_name – Name to apply in diff for test input value.
ref_name – Name to apply in diff for reference input value.
val_show – Whether to include full contents of test value.
ref_show – Whether to include full contents of reference value.
json – Whether to consider contents as JSON for diff evaluation.
indent – Indentation to employ when using JSON contents.
- Returns:
Formatted multiline diff.
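The idea of normalizing JSON contents before diffing can be sketched with the standard library. The function diff_sketch below is hypothetical and is not Weaver's implementation: it assumes a unified diff format, treats the test value as the "from" side, and uses sorted keys so that equivalent mappings with different key order produce no diff.

```python
import difflib
import json


def diff_sketch(val, ref, val_name="Test", ref_name="Reference", indent=2):
    """Line-by-line diff of two values, normalizing JSON-like contents first."""

    def to_lines(value):
        if isinstance(value, str):
            # Plain strings are compared line by line as-is.
            return value.splitlines()
        try:
            # Sorted keys and fixed indentation make equivalent JSON line up.
            text = json.dumps(value, indent=indent, sort_keys=True)
        except (TypeError, ValueError):
            # Fall back to the plain string representation.
            text = str(value)
        return text.splitlines()

    lines = difflib.unified_diff(
        to_lines(val), to_lines(ref),
        fromfile=val_name, tofile=ref_name, lineterm="",
    )
    return "\n".join(lines)
```

Calling diff_sketch({"b": 1, "a": 2}, {"a": 2, "b": 3}) reports only the changed "b" line, since key order is normalized before comparison.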
- weaver.utils.apply_number_with_unit(number: weaver.typedefs.Number, unit: str = '', binary: bool = False, decimals: int = 3) str [source]
Apply the relevant unit and prefix factor to the specified number to create a human-readable value.
- Parameters:
number – Numeric value with no unit.
unit – Unit to be applied. Auto-resolved to ‘B’ if binary requested. Factor applied accordingly to number.
binary – Use binary multiplier (powers of 2) instead of SI decimal multipliers (powers of 10).
decimals – Number of decimals to preserve after unit is applied.
- Returns:
String of the numeric value with appropriate unit.
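A minimal sketch of this kind of formatting follows. It is an illustration under assumptions, not Weaver's code: format_number_sketch is a hypothetical name, only multiples of one or greater are handled (no "milli" or smaller prefixes), and the exact output layout (fixed decimals, a space before the unit) is a guess.

```python
def format_number_sketch(number, unit="", binary=False, decimals=3):
    """Scale a number by powers of 1000 (SI) or 1024 (binary) and append a unit."""
    base = 1024 if binary else 1000
    prefixes = ["", "Ki", "Mi", "Gi", "Ti"] if binary else ["", "k", "M", "G", "T"]
    # As documented, the unit resolves to 'B' when binary is requested without one.
    unit = unit or ("B" if binary else "")
    value = float(number)
    index = 0
    # Divide until the value fits under the next multiplier or prefixes run out.
    while abs(value) >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.{decimals}f} {prefixes[index]}{unit}"


print(format_number_sketch(1536, binary=True))    # 1.500 KiB
print(format_number_sketch(2500000, unit="B"))    # 2.500 MB
```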
- weaver.utils.parse_number_with_unit(number: str, binary: bool | None = None) weaver.typedefs.Number [source]
Parses a numeric value accompanied by a unit to generate the unit-less value without prefix factor.
- Parameters:
number – Numerical value and unit. Unit is dissociated from value with first non-numerical match. Unit is assumed to be present (not only the multiplier by itself). This is important to avoid confusion (e.g.: m used for meters vs m prefix for “milli”).
binary – Force use (True) or non-use (False) of binary multiplier (powers of 2) instead of SI decimal multipliers (powers of 10) for converting value (with applicable unit multiplier if available). If unspecified (None), auto-detect from unit (e.g.: powers of 2 for MiB, powers of 10 for MB). When unspecified, the B character is used to auto-detect if binary should apply, SI multipliers are otherwise assumed.
- Returns:
Literal value.
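The auto-detection rule can be sketched as follows. This is a simplified illustration, not Weaver's implementation: parse_number_sketch is a hypothetical name, the unit itself is assumed to be a single trailing character, only the k, M, G and T prefixes are recognized, and the result is always returned as a float.

```python
import re

# Power applied to the base (1000 or 1024) for each recognized prefix letter.
POWERS = {"k": 1, "K": 1, "M": 2, "G": 3, "T": 4}


def parse_number_sketch(number, binary=None):
    """Convert a string such as '1.5 MiB' into its unit-less numeric value."""
    match = re.fullmatch(r"\s*([-+]?[0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", number)
    if not match:
        raise ValueError(f"cannot parse number with unit: {number!r}")
    value = float(match.group(1))
    symbol = match.group(2)
    # Assumption: the last character is the unit, anything before it is the prefix.
    # A lone 'm' is therefore read as a unit (meters), not as the 'milli' prefix.
    prefix = symbol[:-1]
    if binary is None:
        # Auto-detect: 'MiB' selects powers of 2, 'MB' keeps powers of 10.
        binary = symbol.endswith("iB")
    letter = prefix[:-1] if prefix.endswith("i") else prefix
    power = POWERS.get(letter, 0)
    base = 1024 if binary else 1000
    return value * base ** power


print(parse_number_sketch("1.5 MiB"))           # 1572864.0
print(parse_number_sketch("2MB"))               # 2000000.0
print(parse_number_sketch("2MB", binary=True))  # 2097152.0
```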