satorbis_kit.vector_operation package

Vector Operations Module

This module contains Wherobots Cloud operations for vector data processing.

satorbis_kit.vector_operation.buffer_points_wherobots(input_path: str, output_path: str, buffer_distance: float, distance_unit: str = 'meter', api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'buffer-points') → dict

Submit a buffer generation job to Wherobots Cloud.

Parameters:

input_path – Input GeoParquet file path (must contain point geometries)
output_path – Output GeoParquet file path
buffer_distance – Buffer distance in the specified unit
distance_unit – Unit for buffer distance (meter, kilometer, foot, mile)
api_key – Wherobots API key (optional, uses hardcoded default if None)
region – Wherobots region (optional, uses hardcoded default if None)
script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)
runtime – Runtime size (optional, defaults to “tiny”)
timeout_seconds – Job timeout (optional, defaults to 3600)
job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = buffer_points_wherobots(
...     input_path="s3://bucket/points.parquet",
...     output_path="s3://bucket/buffered.parquet",
...     buffer_distance=1000,
...     distance_unit="meter",
... )
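
The four accepted distance_unit values can be related to one another through meters. The sketch below assumes the international definitions of the foot and the mile; to_meters and METERS_PER_UNIT are hypothetical names that are not part of satorbis_kit, and how the submitted job converts units internally is not documented here.

```python
# Hypothetical helper, not part of satorbis_kit: normalise a buffer distance
# to meters using the international definition of each unit.
METERS_PER_UNIT = {
    "meter": 1.0,
    "kilometer": 1000.0,
    "foot": 0.3048,    # international foot
    "mile": 1609.344,  # international mile
}


def to_meters(distance: float, unit: str = "meter") -> float:
    """Convert `distance` expressed in `unit` to meters."""
    try:
        return distance * METERS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported distance_unit: {unit!r}") from None


print(to_meters(1, "kilometer"))
```

A check like this can help confirm that buffer_distance=1000 with distance_unit="meter" and buffer_distance=1 with distance_unit="kilometer" describe the same radius before a job is submitted.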

satorbis_kit.vector_operation.cancel_job(api_key: str, run_id: str) → None

Cancel a Wherobots job run.

Parameters:

api_key – Wherobots API key
run_id – Job run ID

Raises:

ImportError – If the requests library is not available
requests.HTTPError – If the API request fails

satorbis_kit.vector_operation.dissolve_simplify_wherobots(input_path: str, output_path: str, dissolve_by: str | None = None, simplify_tolerance: float | None = None, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'dissolve-simplify') → dict

Submit a dissolve and/or simplify job to Wherobots Cloud.

Parameters:

input_path – Input GeoParquet file path
output_path – Output GeoParquet file path
dissolve_by – Column name to dissolve by (optional)
simplify_tolerance – Simplify tolerance (optional, Douglas-Peucker algorithm)
api_key – Wherobots API key (optional, uses hardcoded default if None)
region – Wherobots region (optional, uses hardcoded default if None)
script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)
runtime – Runtime size (optional, defaults to “tiny”)
timeout_seconds – Job timeout (optional, defaults to 3600)
job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = dissolve_simplify_wherobots(
...     input_path="s3://bucket/input.parquet",
...     output_path="s3://bucket/output.parquet",
...     dissolve_by="region_name",
...     simplify_tolerance=0.001,
... )
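
To give a feel for what simplify_tolerance controls, here is a minimal pure-Python sketch of the Douglas-Peucker algorithm on a 2D polyline. It illustrates the general technique only; it is not the code the Wherobots job runs, and the names douglas_peucker and _point_segment_distance are hypothetical. The sketch measures distance to the chord segment, a common variant of the perpendicular-distance formulation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord joining the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep that vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
print(douglas_peucker(line, tolerance=0.1))
```

In this variant the tolerance is expressed in the same units as the coordinates, so a larger value removes more vertices. Whether the job interprets simplify_tolerance the same way should be checked against its output.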

satorbis_kit.vector_operation.geojson_to_geoparquet_wherobots(input_paths: List[str], output_path: str, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'geojson-to-geoparquet') → dict

Submit a GeoJSON to GeoParquet conversion job to Wherobots Cloud.

Parameters:

input_paths – List of input GeoJSON file paths (can include wildcards)
output_path – Output GeoParquet file path
api_key – Wherobots API key (optional, uses hardcoded default if None)
region – Wherobots region (optional, uses hardcoded default if None)
script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)
runtime – Runtime size (optional, defaults to “tiny”)
timeout_seconds – Job timeout (optional, defaults to 3600)
job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = geojson_to_geoparquet_wherobots(
...     input_paths=["s3://bucket/input.geojson"],
...     output_path="s3://bucket/output.parquet",
... )

satorbis_kit.vector_operation.get_job_logs(api_key: str, run_id: str, cursor: int = 0, size: int = 100) → dict

Get logs for a Wherobots job run.

Parameters:

api_key – Wherobots API key
run_id – Job run ID
cursor – Pagination cursor
size – Number of log entries to fetch

Returns:

Logs dictionary with items, current_page, and next_page

Raises:

ImportError – If the requests library is not available
requests.HTTPError – If the API request fails
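
Because logs are returned one page at a time, a caller usually loops over the cursor. The sketch below shows one way to do that. It assumes that next_page holds the cursor for the following call and is None (or absent) once the logs are exhausted, which is not confirmed by the documentation above. iter_log_items is a hypothetical helper, and the page fetcher is passed in as a callable so the sketch runs against fake data without network access.

```python
from typing import Callable, Iterator, Optional


def iter_log_items(fetch_page: Callable[[int], dict], start_cursor: int = 0) -> Iterator:
    """Yield log items page by page until no next page is reported."""
    cursor: Optional[int] = start_cursor
    while cursor is not None:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        next_cursor = page.get("next_page")
        # Assumption: a missing or unchanged next_page means there is nothing more to read.
        if next_cursor is None or next_cursor == cursor:
            break
        cursor = next_cursor


# Fake pages standing in for real API responses (illustrative data only).
_fake_pages = {
    0: {"items": ["line 1", "line 2"], "current_page": 0, "next_page": 2},
    2: {"items": ["line 3"], "current_page": 2, "next_page": None},
}
print(list(iter_log_items(_fake_pages.__getitem__)))
```

With the real function, fetch_page could be written as lambda c: get_job_logs(api_key, run_id, cursor=c, size=100).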

satorbis_kit.vector_operation.get_job_status(api_key: str, run_id: str) → dict

Get the status of a Wherobots job run.

Parameters:

api_key – Wherobots API key
run_id – Job run ID

Returns:

Job run details dictionary

Raises:

ImportError – If the requests library is not available
requests.HTTPError – If the API request fails
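
A common use of the status call is to poll until a run finishes. The following is a minimal sketch of that loop, not the library's own implementation. It assumes the returned dictionary carries a "status" key and that "COMPLETED", "FAILED", and "CANCELLED" are the terminal values; both assumptions should be checked against real responses. wait_for_job is a hypothetical helper that takes the status getter as a callable so the sketch can run against fake responses.

```python
import time
from typing import Callable

# Assumed terminal states; verify them against the statuses the API returns.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(
    get_status: Callable[[], dict],
    poll_interval: float = 20.0,
    max_polls: int = 180,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the run reports a terminal status or max_polls is reached."""
    for _ in range(max_polls):
        details = get_status()
        if details.get("status") in TERMINAL_STATUSES:
            return details
        sleep(poll_interval)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Fake responses standing in for successive API calls (illustrative data only).
_responses = iter([{"status": "PENDING"}, {"status": "RUNNING"}, {"status": "COMPLETED"}])
final = wait_for_job(lambda: next(_responses), poll_interval=0, sleep=lambda s: None)
print(final["status"])
```

With the real function, get_status could be written as lambda: get_job_status(api_key, run_id). Bounding the number of polls keeps a stuck run from blocking the caller indefinitely.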

satorbis_kit.vector_operation.list_jobs(api_key: str, region: str | None = None, status: List[str] | None = None, name: str | None = None, size: int = 50) → dict

List Wherobots job runs.

Parameters:

api_key – Wherobots API key
region – Filter by region (optional)
status – Filter by status list (optional)
name – Filter by name pattern (optional)
size – Number of results per page

Returns:

Dictionary with items list and pagination info

Raises:

ImportError – If the requests library is not available
requests.HTTPError – If the API request fails

satorbis_kit.vector_operation.merge_vectors_wherobots(input_base_paths: List[str], output_base_path: str, vector_types: List[str] | None = None, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'vector-merge') → dict

Submit a vector merge job to Wherobots Cloud with a simplified interface.

This function abstracts away API configuration details. Users only need to provide input/output paths.

Parameters:

input_base_paths – List of base path patterns for input data (e.g., [“s3://bucket/path1/*/*/*”, “s3://bucket/path2/*/*/*”])
output_base_path – Base output path (e.g., “s3://bucket/output/”)
vector_types – List of vector types to process, e.g. [“building”, “habitation”, “imaged_area”]. If None, all types are processed.
api_key – Wherobots API key. If None, uses hardcoded default.
region – Wherobots region. If None, uses hardcoded default.
script_base_uri – Base URI where merge scripts are stored. If None, uses hardcoded default.
runtime – Runtime size. Default: “large”
timeout_seconds – Job timeout in seconds. Default: 14400 (4 hours)
job_name_prefix – Prefix for job name. Default: “vector-merge”

Returns:

Dictionary with job submission result, including ‘id’ (run_id)

Example

>>> result = merge_vectors_wherobots(
...     input_base_paths=[
...         "s3://bucket/QC_PASSED/matched/*/*/*",
...         "s3://bucket/QC_PASSED/unmatched/*/*/*",
...     ],
...     output_base_path="s3://bucket/merged/",
... )
>>> run_id = result["id"]

satorbis_kit.vector_operation.merge_vectors_wherobots_simple(input_paths: List[str], output_path: str, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'vector-merge') → dict

Submit a vector merge job to Wherobots Cloud with any list of input paths.

This function accepts any list of file paths (including regex/wildcards) and merges them.

Parameters:

input_paths – List of input file paths or patterns, which can include wildcards/regex (e.g., [“s3://bucket/path1/*.parquet”, “s3://bucket/path2/*/*.parquet”])
output_path – Full output path where merged result will be saved
api_key – Wherobots API key (optional, uses hardcoded default if None)
region – Wherobots region (optional, uses hardcoded default if None)
script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)
runtime – Runtime size (optional, defaults to “medium”)
timeout_seconds – Job timeout (optional, defaults to 7200)
job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = merge_vectors_wherobots_simple(
...     input_paths=[
...         "s3://bucket/path1/*/*/*_building.parquet",
...         "s3://bucket/path2/*/*/*_building.parquet",
...     ],
...     output_path="s3://bucket/merged/building_footprint_polygon/",
... )

satorbis_kit.vector_operation.submit_job(api_key: str, region: str, script_uri: str, script_args: List[str] | None = None, runtime: str = 'tiny', name: str = 'wherobots-job', version: str = 'latest', timeout_seconds: int = 3600, dependencies: List[dict] | None = None, spark_configs: dict | None = None) → dict

Submit a Python job to Wherobots Cloud.

Parameters:

api_key – Wherobots API key
region – Compute region (e.g., ‘aws-ap-south-1’, ‘aws-us-west-2’)
script_uri – S3 URI to the Python script
script_args – List of command-line arguments for the script
runtime – Runtime size (tiny, small, medium, large, etc.)
name – Job run name
version – Wherobots version (‘latest’ or ‘preview’)
timeout_seconds – Job timeout in seconds
dependencies – List of dependency objects (PyPI or FILE)
spark_configs – Dictionary of Spark configuration key-value pairs

Returns:

Response dictionary from Wherobots API

Raises:

ImportError – If the requests library is not available
requests.HTTPError – If the API request fails

satorbis_kit.vector_operation.vector_data_ingestion(s3_path: str, database: str, table: str, partition_column: str | None, unique_columns: List[str], region: str | None = None, column_renames: List[str] | None = None, zorder_columns: List[str] | None = None, format_version: str = '3', geohash_precision: int = 2, wait_for_completion: bool = False, poll_interval: int = 20, log_page_size: int = 200, job_name_prefix: str = 'vector-data-ingestion') → dict

Submit a vector data ingestion job to Wherobots Cloud.

Parameters:

s3_path – S3 prefix containing shapefile components.
database – Destination database name. Must be vector_catalog.
table – Destination table name within vector_catalog.
partition_column – Column to partition by. If None, geohash is used.
unique_columns – Columns used as the MERGE key.
region – Wherobots region override (defaults to configured region).
column_renames – Optional column renames in key=value format.
zorder_columns – Optional columns for Z-order rewrite.
format_version – Iceberg table format version.
geohash_precision – Precision for geohash partitioning.
wait_for_completion – If True, stream logs and wait for completion.
poll_interval – Poll interval in seconds for status/logs.
log_page_size – Log page size per API call.
job_name_prefix – Prefix for the Wherobots job name.

Returns:

Response dictionary from the Wherobots API.
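
Since column_renames expects strings in key=value format, it can be convenient to build that list from a mapping. The sketch below does so under the assumption that the key is the existing column name and the value is the new name; the documentation above does not say which side is which. format_column_renames is a hypothetical helper, not part of satorbis_kit, and the column names are made up.

```python
from typing import Dict, List


def format_column_renames(renames: Dict[str, str]) -> List[str]:
    """Turn {"old": "new"} into ["old=new"], rejecting ambiguous names."""
    formatted = []
    for old, new in renames.items():
        # An empty name or an embedded "=" would make the pair impossible to parse back.
        if not old or not new or "=" in old or "=" in new:
            raise ValueError(f"invalid rename: {old!r} -> {new!r}")
        formatted.append(f"{old}={new}")
    return formatted


print(format_column_renames({"NAME_1": "state_name", "geom": "geometry"}))
```

The resulting list can then be passed as the column_renames argument.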
