satorbis_kit.vector_operation package

Vector Operations Module

This module contains Wherobots Cloud operations for vector data processing.

satorbis_kit.vector_operation.buffer_points_wherobots(input_path: str, output_path: str, buffer_distance: float, distance_unit: str = 'meter', api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'buffer-points') -> dict

Submit a buffer generation job to Wherobots Cloud.

Parameters:
  • input_path – Input GeoParquet file path (must contain point geometries)

  • output_path – Output GeoParquet file path

  • buffer_distance – Buffer distance in the specified unit

  • distance_unit – Unit for buffer distance (meter, kilometer, foot, mile)

  • api_key – Wherobots API key (optional, uses hardcoded default if None)

  • region – Wherobots region (optional, uses hardcoded default if None)

  • script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)

  • runtime – Runtime size (optional, defaults to “tiny”)

  • timeout_seconds – Job timeout (optional, defaults to 3600)

  • job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = buffer_points_wherobots(
...     input_path="s3://bucket/points.parquet",
...     output_path="s3://bucket/buffered.parquet",
...     buffer_distance=1000,
...     distance_unit="meter",
... )
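
The four accepted distance_unit values can be checked locally before a job is submitted. The sketch below is a hypothetical helper and not part of satorbis_kit: it validates the unit name and converts the distance to meters using the standard definition of each unit. How the Wherobots job itself interprets the unit is not documented here, so the conversion is illustrative only.

```python
# Meters per unit, from the standard unit definitions.
# These factors are NOT read from satorbis_kit; they are an assumption
# about what each documented unit name means.
METERS_PER_UNIT = {
    "meter": 1.0,
    "kilometer": 1000.0,
    "foot": 0.3048,
    "mile": 1609.344,
}


def buffer_distance_in_meters(buffer_distance: float, distance_unit: str = "meter") -> float:
    """Validate the unit name and return the distance in meters (hypothetical helper)."""
    if distance_unit not in METERS_PER_UNIT:
        allowed = ", ".join(sorted(METERS_PER_UNIT))
        raise ValueError(f"Unknown distance_unit {distance_unit!r}; expected one of: {allowed}")
    return buffer_distance * METERS_PER_UNIT[distance_unit]
```

Calling a check like this first catches a misspelled unit on the client side instead of after a cloud job has started.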

satorbis_kit.vector_operation.cancel_job(api_key: str, run_id: str) -> None

Cancel a Wherobots job run.

Parameters:
  • api_key – Wherobots API key

  • run_id – Job run ID

Raises:
  • ImportError – If requests library is not available

  • requests.HTTPError – If the API request fails

satorbis_kit.vector_operation.dissolve_simplify_wherobots(input_path: str, output_path: str, dissolve_by: str | None = None, simplify_tolerance: float | None = None, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'dissolve-simplify') -> dict

Submit a dissolve and/or simplify job to Wherobots Cloud.

Parameters:
  • input_path – Input GeoParquet file path

  • output_path – Output GeoParquet file path

  • dissolve_by – Column name to dissolve by (optional)

  • simplify_tolerance – Simplification tolerance for the Douglas-Peucker algorithm (optional)

  • api_key – Wherobots API key (optional, uses hardcoded default if None)

  • region – Wherobots region (optional, uses hardcoded default if None)

  • script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)

  • runtime – Runtime size (optional, defaults to “tiny”)

  • timeout_seconds – Job timeout (optional, defaults to 3600)

  • job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = dissolve_simplify_wherobots(
...     input_path="s3://bucket/input.parquet",
...     output_path="s3://bucket/output.parquet",
...     dissolve_by="region_name",
...     simplify_tolerance=0.001,
... )

satorbis_kit.vector_operation.geojson_to_geoparquet_wherobots(input_paths: List[str], output_path: str, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'geojson-to-geoparquet') -> dict

Submit a GeoJSON to GeoParquet conversion job to Wherobots Cloud.

Parameters:
  • input_paths – List of input GeoJSON file paths (can include wildcards)

  • output_path – Output GeoParquet file path

  • api_key – Wherobots API key (optional, uses hardcoded default if None)

  • region – Wherobots region (optional, uses hardcoded default if None)

  • script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)

  • runtime – Runtime size (optional, defaults to “tiny”)

  • timeout_seconds – Job timeout (optional, defaults to 3600)

  • job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = geojson_to_geoparquet_wherobots(
...     input_paths=["s3://bucket/input.geojson"],
...     output_path="s3://bucket/output.parquet",
... )

satorbis_kit.vector_operation.get_job_logs(api_key: str, run_id: str, cursor: int = 0, size: int = 100) -> dict

Get logs for a Wherobots job run.

Parameters:
  • api_key – Wherobots API key

  • run_id – Job run ID

  • cursor – Pagination cursor

  • size – Number of log entries to fetch

Returns:

Logs dictionary with items, current_page, and next_page

Raises:
  • ImportError – If requests library is not available

  • requests.HTTPError – If the API request fails
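
Because the returned dictionary carries items, current_page, and next_page, a caller can walk the log page by page. The following is a minimal sketch of such a loop. It assumes that the next_page value can be passed back as cursor and that a missing or None next_page marks the last page; neither behavior is confirmed by this page. fetch_page is a hypothetical callable standing in for a get_job_logs call with api_key and run_id already bound.

```python
from typing import Any, Callable, Iterator


def iter_log_items(
    fetch_page: Callable[[int], dict],
    start_cursor: int = 0,
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield log items page by page (hypothetical helper).

    Assumption: page["next_page"] is the cursor for the following page,
    and None (or an unchanged cursor) means there is nothing more to read.
    """
    cursor = start_cursor
    for _ in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        page = fetch_page(cursor)
        yield from page.get("items", [])
        next_cursor = page.get("next_page")
        if next_cursor is None or next_cursor == cursor:
            return
        cursor = next_cursor
```

With the real function, fetch_page could be written as lambda cursor: get_job_logs(api_key, run_id, cursor=cursor, size=100).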

satorbis_kit.vector_operation.get_job_status(api_key: str, run_id: str) -> dict

Get the status of a Wherobots job run.

Parameters:
  • api_key – Wherobots API key

  • run_id – Job run ID

Returns:

Job run details dictionary

Raises:
  • ImportError – If requests library is not available

  • requests.HTTPError – If the API request fails
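
A common use of a status call is to poll until the run finishes. The sketch below shows one way to do that under stated assumptions: the job run details dictionary is assumed to expose a "status" key, and the terminal status names listed in the code are guesses, since this page does not document either. fetch_status is a hypothetical zero-argument callable wrapping get_job_status.

```python
import time
from typing import Callable, Collection

# Assumed terminal status names; check the Wherobots API for the real values.
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def wait_for_run(
    fetch_status: Callable[[], dict],
    poll_interval: float = 20.0,
    max_wait: float = 3600.0,
    terminal: Collection[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the run reports a terminal status, then return its details."""
    waited = 0.0
    while True:
        details = fetch_status()
        # Assumption: the details dictionary has a "status" key.
        if details.get("status") in terminal:
            return details
        if waited >= max_wait:
            raise TimeoutError(f"run still not finished after {waited:.0f} seconds")
        sleep(poll_interval)
        waited += poll_interval
```

With the real function, fetch_status could be written as lambda: get_job_status(api_key, run_id). Passing sleep as a parameter keeps the loop testable without real delays.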

satorbis_kit.vector_operation.list_jobs(api_key: str, region: str | None = None, status: List[str] | None = None, name: str | None = None, size: int = 50) -> dict

List Wherobots job runs.

Parameters:
  • api_key – Wherobots API key

  • region – Filter by region (optional)

  • status – Filter by status list (optional)

  • name – Filter by name pattern (optional)

  • size – Number of results per page

Returns:

Dictionary with items list and pagination info

Raises:
  • ImportError – If requests library is not available

  • requests.HTTPError – If the API request fails

satorbis_kit.vector_operation.merge_vectors_wherobots(input_base_paths: List[str], output_base_path: str, vector_types: List[str] | None = None, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'vector-merge') -> dict

Submit a vector merge job to Wherobots Cloud with a simplified interface.

This function abstracts away API configuration details. Users only need to provide input/output paths.

Parameters:
  • input_base_paths – List of base path patterns for input data (e.g., [“s3://bucket/path1/*/*/*”, “s3://bucket/path2/*/*/*”])

  • output_base_path – Base output path (e.g., “s3://bucket/output/”)

  • vector_types – List of vector types to process (e.g., “building”, “habitation”, “imaged_area”). If None, all types are processed.

  • api_key – Wherobots API key. If None, uses hardcoded default.

  • region – Wherobots region. If None, uses hardcoded default.

  • script_base_uri – Base URI where merge scripts are stored. If None, uses hardcoded default.

  • runtime – Runtime size. Default: “large”

  • timeout_seconds – Job timeout in seconds. Default: 14400 (4 hours)

  • job_name_prefix – Prefix for job name. Default: “vector-merge”

Returns:

Dictionary with job submission result, including ‘id’ (run_id)

Example

>>> result = merge_vectors_wherobots(
...     input_base_paths=[
...         "s3://bucket/QC_PASSED/matched/*/*/*",
...         "s3://bucket/QC_PASSED/unmatched/*/*/*",
...     ],
...     output_base_path="s3://bucket/merged/",
... )
>>> run_id = result["id"]

satorbis_kit.vector_operation.merge_vectors_wherobots_simple(input_paths: List[str], output_path: str, api_key: str | None = None, region: str | None = None, script_base_uri: str | None = None, runtime: str | None = None, timeout_seconds: int | None = None, job_name_prefix: str = 'vector-merge') -> dict

Submit a vector merge job to Wherobots Cloud with any list of input paths.

This function accepts any list of file paths (including regex/wildcards) and merges them.

Parameters:
  • input_paths – List of input file paths or patterns, which can include wildcards/regex (e.g., [“s3://bucket/path1/*.parquet”, “s3://bucket/path2/*/*.parquet”])

  • output_path – Full output path where merged result will be saved

  • api_key – Wherobots API key (optional, uses hardcoded default if None)

  • region – Wherobots region (optional, uses hardcoded default if None)

  • script_base_uri – Base URI for scripts (optional, uses hardcoded default if None)

  • runtime – Runtime size (optional, defaults to “medium”)

  • timeout_seconds – Job timeout (optional, defaults to 7200)

  • job_name_prefix – Job name prefix (optional)

Returns:

Dictionary with job submission result

Example

>>> result = merge_vectors_wherobots_simple(
...     input_paths=[
...         "s3://bucket/path1/*/*/*_building.parquet",
...         "s3://bucket/path2/*/*/*_building.parquet",
...     ],
...     output_path="s3://bucket/merged/building_footprint_polygon/",
... )
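
The example above repeats the same wildcard suffix for every base path. A small helper can generate that list. This is a hypothetical sketch and not part of satorbis_kit; the *_<type>.parquet file naming and the two wildcard directory levels are taken from the example and may not match other bucket layouts.

```python
from typing import List


def build_input_patterns(base_paths: List[str], vector_type: str, dir_depth: int = 2) -> List[str]:
    """Build one wildcard pattern per base path (hypothetical helper).

    Assumptions: files are named "*_<vector_type>.parquet" and sit
    dir_depth directory levels below each base path.
    """
    patterns = []
    for base in base_paths:
        parts = [base.rstrip("/")] + ["*"] * dir_depth + [f"*_{vector_type}.parquet"]
        patterns.append("/".join(parts))
    return patterns
```

The result can be passed directly as input_paths, for example build_input_patterns(["s3://bucket/path1", "s3://bucket/path2"], "building").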

satorbis_kit.vector_operation.submit_job(api_key: str, region: str, script_uri: str, script_args: List[str] | None = None, runtime: str = 'tiny', name: str = 'wherobots-job', version: str = 'latest', timeout_seconds: int = 3600, dependencies: List[dict] | None = None, spark_configs: dict | None = None) -> dict

Submit a Python job to Wherobots Cloud.

Parameters:
  • api_key – Wherobots API key

  • region – Compute region (e.g., ‘aws-ap-south-1’, ‘aws-us-west-2’)

  • script_uri – S3 URI to the Python script

  • script_args – List of command-line arguments for the script

  • runtime – Runtime size (tiny, small, medium, large, etc.)

  • name – Job run name

  • version – Wherobots version (‘latest’ or ‘preview’)

  • timeout_seconds – Job timeout in seconds

  • dependencies – List of dependency objects (PyPI or FILE)

  • spark_configs – Dictionary of Spark configuration key-value pairs

Returns:

Response dictionary from Wherobots API

Raises:
  • ImportError – If requests library is not available

  • requests.HTTPError – If the API request fails

satorbis_kit.vector_operation.vector_data_ingestion(s3_path: str, database: str, table: str, partition_column: str | None, unique_columns: List[str], region: str | None = None, column_renames: List[str] | None = None, zorder_columns: List[str] | None = None, format_version: str = '3', geohash_precision: int = 2, wait_for_completion: bool = False, poll_interval: int = 20, log_page_size: int = 200, job_name_prefix: str = 'vector-data-ingestion') -> dict

Submit a vector data ingestion job to Wherobots Cloud.

Parameters:
  • s3_path – S3 prefix containing shapefile components.

  • database – Destination database name. Must be vector_catalog.

  • table – Destination table name within vector_catalog.

  • partition_column – Column to partition by. If None, geohash is used.

  • unique_columns – Columns used as the MERGE key.

  • region – Wherobots region override (defaults to configured region).

  • column_renames – Optional column renames in key=value format.

  • zorder_columns – Optional columns for Z-order rewrite.

  • format_version – Iceberg table format version.

  • geohash_precision – Precision for geohash partitioning.

  • wait_for_completion – If True, stream logs and wait for completion.

  • poll_interval – Poll interval in seconds for status/logs.

  • log_page_size – Log page size per API call.

  • job_name_prefix – Prefix for the Wherobots job name.

Returns:

Response dictionary from the Wherobots API.
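
Since column_renames is a list of key=value strings, it can be convenient to keep the mapping as a dictionary and format it just before the call. The helper below is a hypothetical sketch. It assumes the key is the existing column name and the value is the new name, which this page does not state explicitly.

```python
from typing import Dict, List


def format_column_renames(renames: Dict[str, str]) -> List[str]:
    """Turn a rename mapping into "key=value" strings (hypothetical helper).

    Assumption: key is the existing column name, value is the new name.
    """
    formatted = []
    for old_name, new_name in renames.items():
        if not old_name or not new_name:
            raise ValueError("column names must be non-empty")
        if "=" in old_name or "=" in new_name:
            # An "=" inside a name would make the key=value string ambiguous.
            raise ValueError(f"column names must not contain '=': {old_name!r}, {new_name!r}")
        formatted.append(f"{old_name}={new_name}")
    return formatted
```

The returned list can be passed as column_renames, for example format_column_renames({"NAME": "name", "POP_EST": "population"}).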

Submodules