worldpoppy.raster module

This is the main module of WorldPopPy. It provides logic to fetch raster data from WorldPop through several alternative specifications for the geographic area of interest.

Main methods

  • wp_raster()

    Retrieve WorldPop data for arbitrary geographical areas and multiple years (where applicable).

  • wp_warp()

    Reproject or resample a WorldPop raster.

  • merge_rasters()

    Merge multiple raster files and optionally clip the result.

  • bbox_from_location()

    Generate a bounding box from a location name or GPS coordinate. The result can be used to specify the AoI for wp_raster.

exception worldpoppy.raster.IncompatibleRasterError

Bases: Exception

Raised when trying to merge incompatible WorldPop source rasters.

exception worldpoppy.raster.RasterReadError

Bases: Exception

Raised when reading a WorldPop source raster fails.

worldpoppy.raster.bbox_from_location(centre, width_degrees=None, width_km=None)

Construct a bounding box centered on a given geographic location.

The centre argument can be either a place name (which is geocoded using geolocate_name) or a (longitude, latitude) coordinate pair.

If width_km is specified, the bounding box is computed in a local Azimuthal Equidistant projection centered on the specified location, and then reprojected back to WGS84 longitude/latitude coordinates.

Parameters:
  • centre (str or Tuple[float, float]) – Either a human-readable location name (e.g., “Nairobi, Kenya”) or a tuple of (longitude, latitude).

  • width_degrees (float, optional) – Width/height of the bounding box in decimal degrees. Must be None if width_km is specified.

  • width_km (float, optional) – Width/height of the bounding box in kilometers. Must be None if width_degrees is specified.

Returns:

Geo-coordinates of the bounding box using the format (min_lon, min_lat, max_lon, max_lat) [WGS84].

Return type:

Tuple[float, float, float, float]

Raises:

ValueError – If either both or neither of width_degrees and width_km are specified.
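The argument rules above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: sketch_bbox and KM_PER_DEGREE are hypothetical names, the centre must already be a (longitude, latitude) pair (no geocoding), and the width_km branch uses a simple flat-earth approximation rather than the Azimuthal Equidistant projection described above.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)


def sketch_bbox(centre, width_degrees=None, width_km=None):
    """Return (min_lon, min_lat, max_lon, max_lat) around a (lon, lat) centre."""
    # Exactly one of the two width arguments must be given.
    if (width_degrees is None) == (width_km is None):
        raise ValueError("Specify exactly one of width_degrees and width_km.")

    lon, lat = centre
    if width_degrees is not None:
        half_lon = half_lat = width_degrees / 2
    else:
        # Flat-earth approximation, NOT the projection-based method of the library.
        half_lat = (width_km / 2) / KM_PER_DEGREE
        half_lon = half_lat / math.cos(math.radians(lat))
    return (lon - half_lon, lat - half_lat, lon + half_lon, lat + half_lat)


box = sketch_bbox((0.0, 0.0), width_degrees=2)
```

The approximation degrades towards the poles and for large boxes, which is presumably why the library projects into a local coordinate system instead.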

worldpoppy.raster.merge_rasters(raster_fpaths, *, name=None, chunks=None, masked=True, mask_and_scale=True, other_read_kwargs=None, pre_clip_bbox=None, clipping_gdf=None, suppress_pre_clip=False)

Merge multiple rasters.

This function validates that all input rasters share the same critical metadata (CRS, FillValue, etc.) and then creates a new, synthetic set of metadata for the final merged raster.

Parameters:
  • raster_fpaths (List[Path] or List[str]) – List of paths to the input raster files that are to be merged.

  • name (str, optional) – A custom name for the returned DataArray.

  • chunks (str, int, dict or None, optional, default='auto') –

    If chunks is provided, the raster data is loaded into a Dask array for better memory management.

    • If ‘auto’ (default), Dask chooses the chunk size.

    • If int K, the data is loaded in chunks of size (K, K). Equivalent to passing {‘x’: K, ‘y’: K}.

    • If dict (e.g., {‘x’: 1024, ‘y’: 1024}), that specific chunking is used.

    • If None, data loading with Dask is disabled.

  • masked (bool, optional, default=True) – If True, read the mask of all input rasters and set masked values to NaN. This argument is passed to rioxarray.open_rasterio when reading input rasters. Note: The default here is True, unlike in rioxarray.open_rasterio.

  • mask_and_scale (bool, default=True) – Lazily scale (using the scales and offsets from rasterio) all input rasters and mask them. If the _Unsigned attribute is present treat integer arrays as unsigned. This argument is passed to rioxarray.open_rasterio when reading input rasters. Note: The default here is True, unlike in rioxarray.open_rasterio.

  • other_read_kwargs (dict, optional) – Dictionary with additional keyword arguments that are passed to rioxarray.open_rasterio when reading input rasters (e.g., lock or band_as_variable).

  • pre_clip_bbox (Tuple[float, float, float, float], optional) – An explicit bounding box to use for pre-clipping the input rasters. If provided, this overrides the default buffered pre-clipping.

  • clipping_gdf (geopandas.GeoDataFrame, optional) – GeoDataFrame with geometries used to clip the merged raster. This is used for the final precise clip.

  • suppress_pre_clip (bool, optional, default=False) – If True, disables all pre-clipping optimisations.

Returns:

The merged and optionally clipped raster.

Return type:

xarray.DataArray

Raises:
  • RasterReadError – If reading an input raster fails.

  • IncompatibleRasterError – If the input rasters have incompatible attributes. Attributes are validated before merging as follows:

    • crs is always validated.

    • _FillValue is validated only if masked=False and mask_and_scale=False.

    • scale_factor and add_offset are validated only if mask_and_scale=False.

    This function thus trusts rioxarray to correctly normalise input rasters whenever mask_and_scale=True is passed, even if the underlying source files have different _FillValue, scale_factor or add_offset attributes.
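The validation rules for IncompatibleRasterError can be summarised in a small sketch. The helper names attributes_to_validate and find_incompatible are hypothetical, and plain dictionaries stand in for raster attributes; the actual checks inside merge_rasters may be organised differently.

```python
def attributes_to_validate(masked=True, mask_and_scale=True):
    """Return the attribute names that would be compared across input rasters."""
    checked = {"crs"}  # always validated
    if not masked and not mask_and_scale:
        checked.add("_FillValue")
    if not mask_and_scale:
        checked.update({"scale_factor", "add_offset"})
    return checked


def find_incompatible(attrs_per_raster, masked=True, mask_and_scale=True):
    """Return the validated attribute names whose values differ between rasters."""
    keys = attributes_to_validate(masked, mask_and_scale)
    return sorted(
        key for key in keys
        if len({repr(attrs.get(key)) for attrs in attrs_per_raster}) > 1
    )


# Two hypothetical rasters that differ only in their fill value.
raster_a = {"crs": "EPSG:4326", "_FillValue": -99999.0}
raster_b = {"crs": "EPSG:4326", "_FillValue": 0.0}

with_defaults = find_incompatible([raster_a, raster_b])
without_masking = find_incompatible(
    [raster_a, raster_b], masked=False, mask_and_scale=False
)
```

With the default arguments the differing fill values are tolerated; they are only flagged once both masking options are switched off.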

Notes

Performance Warning: This function uses xarray.combine_first iteratively to merge rasters lazily. While this preserves memory, it builds a nested Dask task graph whose depth is proportional to the number of input files.

Merging a large number of files may result in a RecursionError or significant overhead during graph construction. If you are processing a large number of raster tiles, consider merging them in smaller batches first.
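A minimal sketch of the batching idea, using only the standard library. The helper split_into_batches and the tile file names are hypothetical. Each batch could then be passed to merge_rasters separately; how the per-batch results are stored and combined afterwards is left open here.

```python
def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical tile paths; in practice these would be real raster files.
tile_paths = [f"tile_{i:02d}.tif" for i in range(10)]
batches = split_into_batches(tile_paths, batch_size=4)
batch_sizes = [len(batch) for batch in batches]
```

Merging ten files in one call chains ten rasters together, whereas no batch above chains more than four.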

worldpoppy.raster.wp_raster(product_name, aoi, years=None, *, name=None, chunks=None, pre_clip_bbox=None, cache_downloads=True, skip_download_if_exists=True, masked=True, mask_and_scale=True, other_read_kwargs=None, suppress_pre_clip=False, download_chunk_size=4194304, download_dry_run=False)

Return WorldPop data for the user-defined area of interest (AoI) and the specified years (where applicable).

Note that WorldPop organises its raster files by country. If the AoI spans multiple countries, this function will automatically merge all corresponding raster files. If multiple years are requested, the raster data is stacked along a new ‘year’ dimension.

By default, this function returns a regular xarray.DataArray. If users provide the chunks argument, a lazy-loaded Dask array is returned instead.

This function implements several optimisation techniques to minimise the memory footprint involved when working with raster data from large countries.

Parameters:
  • product_name (str) – The name of the WorldPop data product of interest.

  • aoi (str, List[str], List[float], Tuple[float], or geopandas.GeoDataFrame) –

    The area of interest (AoI) for which to obtain the raster data. Users can specify this area using:

    • one or more three-letter country codes (alpha-3 ISO codes);

    • a GeoDataFrame with one or more polygonal geometries; or

    • a bounding box of the format (min_lon, min_lat, max_lon, max_lat).

    In the latter two cases, WorldPop data is first downloaded and merged for all countries that intersect the area of interest, regardless of how large this intersection is. The merged raster is then clipped using the AoI.

  • name (str, optional) – A custom name for the returned DataArray. This is useful for plotting (it appears as the title/label) or when converting to a Dataset. Defaults to product_name if not provided.

  • chunks (str, int, dict or None, optional, default=None) –

    If chunks is provided, the raster data is loaded into a Dask array for better memory management.

    • If ‘auto’, Dask chooses the chunk size.

    • If int K, the data is loaded in chunks of size (K, K). Equivalent to passing {‘x’: K, ‘y’: K}.

    • If dict (e.g., {‘x’: 1024, ‘y’: 1024}), that specific chunking is used.

  • years (int or List[Union[int, str]] or str or None, optional) –

    One or more years of interest or a keyword string. For static data products, this argument is usually None (default).

    • ’all’: Retrieve all available data for the specified product.
      • For multi-year products, this returns a 3D array stacked along the year dimension (unless only one year exists, in which case it returns a 2D array).

      • For static products, returns the single available raster.

    • ’first’: Retrieve data for the earliest available year.

    • ’last’: Retrieve data for the most recent available year.

    • List: A list containing integers and/or keywords (e.g., [2010, 'last']).

    • None: (Default) for static data products.

  • pre_clip_bbox (Tuple[float, float, float, float], optional) – A bounding box (min_lon, min_lat, max_lon, max_lat) to which input rasters are clipped immediately after they are loaded from disk. This is the manual pre-clipping boundary. If provided, it overrides the automatic buffered pre-clipping mechanism (which is applied by default when users pass the AoI as either a GeoDataFrame or a bounding box). Manual pre-clipping is useful when working with country-code AoIs like Chile, where remote outlying islands result in a merged raster that is largely empty and therefore consumes excessive memory. Mutually exclusive with suppress_pre_clip.

  • cache_downloads (bool, optional, default=True) – Whether to cache downloaded source rasters.

  • skip_download_if_exists (bool, optional, default=True) – Whether to skip downloading source rasters that already exist in the local cache.

  • masked (bool, optional, default=True) – If True, read the mask of all input rasters and set masked values to NaN. This argument is passed to rioxarray.open_rasterio when reading input rasters. Note: The default here is True, unlike in rioxarray.open_rasterio.

  • mask_and_scale (bool, default=True) – Lazily scale (using the scales and offsets from rasterio) all input rasters and mask them. If the _Unsigned attribute is present treat integer arrays as unsigned. This argument is passed to rioxarray.open_rasterio when reading input rasters. Note: The default here is True, unlike in rioxarray.open_rasterio.

  • other_read_kwargs (dict, optional) – Dictionary with additional keyword arguments that are passed to rioxarray.open_rasterio when reading input rasters (e.g., lock or band_as_variable). Note that chunks passed here will be ignored in favour of the explicit chunks argument.

  • suppress_pre_clip (bool, optional, default=False) – If True, no automatic or manual pre-clipping is ever applied when loading input rasters. Mutually exclusive with pre_clip_bbox.

  • download_chunk_size (int, optional, default=4MB) – The size (in bytes) of chunks to read/write during raster downloads. The large, default chunk size aims to improve performance on systems with real-time file scanning (e.g., antivirus).

  • download_dry_run (bool, optional, default=False) – If True, only check how many raster files would need to be downloaded from WorldPop if download_dry_run was False. Report the number and size of required file downloads, but do not actually fetch or process any files.

Returns:

The combined raster data.

  • For static products, dimensions are (y, x).

  • For multi-year products, dimensions are likewise (y, x) if users request only a single year.

  • If multiple years are requested, dimensions are (year, y, x).

Returns None if download_dry_run is True.

Return type:

xarray.DataArray or None

Raises:
  • RasterReadError – If reading an input raster fails.

  • IncompatibleRasterError – If the input rasters have incompatible attributes. Attributes are validated before merging as follows:

    • crs is always validated.

    • _FillValue is validated only if masked=False and mask_and_scale=False.

    • scale_factor and add_offset are validated only if mask_and_scale=False.

    This function thus trusts rioxarray to correctly normalise input rasters whenever mask_and_scale=True is passed, even if the underlying source files have different _FillValue, scale_factor or add_offset attributes.
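The accepted forms of the chunks argument described under Parameters can be illustrated as follows. This is only a sketch of the documented equivalences; normalise_chunks is a hypothetical helper, not a function exported by worldpoppy.

```python
def normalise_chunks(chunks):
    """Map the documented forms of `chunks` onto a single representation."""
    if chunks is None or chunks == "auto":
        # None leaves Dask disabled; 'auto' lets Dask choose the chunk size.
        return chunks
    if isinstance(chunks, bool):
        raise TypeError("chunks must be 'auto', an int, a dict, or None")
    if isinstance(chunks, int):
        # An int K is equivalent to {'x': K, 'y': K}.
        return {"x": chunks, "y": chunks}
    if isinstance(chunks, dict):
        return dict(chunks)
    raise TypeError("chunks must be 'auto', an int, a dict, or None")


square = normalise_chunks(1024)
explicit = normalise_chunks({"x": 512, "y": 2048})
```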

worldpoppy.raster.wp_warp(da, to_crs=None, res=None, resampling=None, **kwargs)

Reproject or resample a raster.

This is a convenience wrapper around rioxarray.reproject that handles Nodata values, memory materialisation (handling Dask vs Eager), and Enum-based “resampling” arguments.

WARNING: This function is EAGER. It triggers immediate computation. If ‘da’ is a lazy Dask array, it will be loaded into memory (materialised) before processing.

It supports three modes:

  1. Reprojection: Change CRS (provide to_crs).

  2. Resampling: Change resolution in current CRS (provide res, leave to_crs=None).

  3. Both: Change both CRS and resolution.

Parameters:
  • da (xarray.DataArray) – The input raster data (usually from wp_raster).

  • to_crs (str or pyproj.CRS, optional) – The target Coordinate Reference System (e.g., “EPSG:3035”). If None, the raster’s current CRS is used (enabling pure resampling).

  • res (tuple or float, optional) – Target resolution in the units of to_crs (if provided) or in the units of the source CRS (if to_crs is not provided). If a single float is provided, it is used for both X and Y axes. If None, the resolution is determined automatically by rioxarray.

  • resampling (str or rasterio.enums.Resampling, optional) – The resampling method to use (e.g., ‘nearest’, ‘bilinear’, ‘sum’). If None, defaults to ‘nearest’.

  • **kwargs (dict) – Additional keyword arguments passed directly to rioxarray.reproject. Useful for passing specific GDAL warp options.

Returns:

The warped raster.

Return type:

xarray.DataArray
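How the to_crs and res arguments select one of the three modes can be sketched without any geospatial dependencies. The helper describe_warp is hypothetical and performs no warping. The case where neither argument is given is not covered by the description above, so the sketch simply labels it "unspecified".

```python
def describe_warp(to_crs=None, res=None):
    """Return the selected mode and the resolution as an (x, y) tuple or None."""
    # A single number is used for both the X and the Y axis.
    if isinstance(res, (int, float)):
        res = (float(res), float(res))

    if to_crs is not None and res is not None:
        mode = "reproject and resample"
    elif to_crs is not None:
        mode = "reproject"
    elif res is not None:
        mode = "resample"
    else:
        mode = "unspecified"  # behaviour not stated in the documentation
    return mode, res


only_crs = describe_warp(to_crs="EPSG:3035")
only_res = describe_warp(res=1000)
both = describe_warp(to_crs="EPSG:3035", res=(100, 200))
```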

worldpoppy.download module

This module provides logic to download WorldPop data asynchronously, with support for automatic retry, file caching, and a preview of required download sizes (dry run).

Note

The implementation of this module draws on the “download.py” module from the blackmarblepy package by Gabriel Stefanini Vicente and Robert Marty. blackmarblepy is licensed under the Mozilla Public License (MPL-2.0), as is WorldPopPy.

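Main classes

  • WorldPopDownloader

    An HTTP downloader to retrieve country-specific raster data from the WorldPop project.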

Main methods

  • purge_cache()

    Delete all files in the WorldPop local cache directory, with optional dry-run.

exception worldpoppy.download.DownloadError

Bases: Exception

Raised when one or more files fail to download.

exception worldpoppy.download.DownloadSizeCheckError

Bases: DownloadError

Raised when one or more HEAD requests fail during dry-run size checking.

class worldpoppy.download.WorldPopDownloader(directory=None)

Bases: object

An HTTP downloader to retrieve country-specific raster data from the WorldPop project.

directory

Local directory to which to download the data.

Type:

Path

URL: ClassVar[str] = 'https://data.worldpop.org'

download(product_name, iso3_codes, years=None, skip_download_if_exists=True, dry_run=False, chunk_size=4194304)

Asynchronously download a collection of country-specific WorldPop rasters.

Parameters:
  • product_name (str) – The name of the WorldPop data product of interest.

  • iso3_codes (str or List[str]) – One or more three-letter ISO codes, denoting the countries of interest.

  • years (int or List[int] or str, optional) – For annual data products, the year (or years) of interest, or the ‘all’ keyword (str) indicating that all available years for the requested data product should be downloaded. For static data products, this argument must be None (default).

  • skip_download_if_exists (bool, optional, default=True) – Whether to skip downloading raster files that already exist locally.

  • dry_run (bool, optional, default=False) – If True, only check how many files would need to be downloaded if dry_run was False. Report the number and size of required file downloads, but do not actually fetch or return any data.

  • chunk_size (int, optional, default=4MB) – The size (in bytes) of chunks to read/write during download. Larger chunks may improve performance, especially on systems with real-time file scanning (e.g., antivirus).

Returns:

A lexically sorted list of local download paths.

Return type:

list of pathlib.Path

Raises:

RuntimeError – If not all requested files were successfully downloaded.
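The role of chunk_size can be illustrated with a synchronous, standard-library sketch that copies a byte stream in fixed-size chunks. The real downloader works asynchronously over HTTP; copy_in_chunks and DEFAULT_CHUNK_SIZE are hypothetical names used only for this illustration.

```python
import io

DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024  # 4194304 bytes, i.e. the documented 4MB default


def copy_in_chunks(source, target, chunk_size=DEFAULT_CHUNK_SIZE):
    """Copy `source` to `target` chunk by chunk and return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        target.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the HTTP response and the local file.
payload = b"x" * 10_000
target = io.BytesIO()
written = copy_in_chunks(io.BytesIO(payload), target, chunk_size=4096)
```

Fewer, larger writes mean fewer occasions on which a real-time file scanner can intervene, which is the motivation given for the large default.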

worldpoppy.download.purge_cache(dry_run=True, keep_country_borders=False)

Purge the local cache directory and any of its subdirectories.

Parameters:
  • dry_run (bool, optional) – If True (default), do not delete any files and simply report what would be deleted without the dry_run flag.

  • keep_country_borders (bool, optional, default=False) – If True, do not delete any cached data related to country borders. This data is assumed to be the only one which includes the ‘level0’ keyword in a file name.

Returns:

Summary of how many files and total size (bytes) would be or were deleted.

Return type:

dict
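The interplay of dry_run and keep_country_borders can be sketched as follows. This is an illustration under stated assumptions: sketch_purge is a hypothetical stand-in that takes the cache directory explicitly, the keys of the returned dictionary are invented for this example, and empty subdirectories are not removed.

```python
import tempfile
from pathlib import Path


def sketch_purge(cache_dir, dry_run=True, keep_country_borders=False):
    """Count (and optionally delete) all files below cache_dir."""
    n_files, n_bytes = 0, 0
    # Materialise the listing first so that deleting does not disturb iteration.
    for path in sorted(Path(cache_dir).rglob("*")):
        if not path.is_file():
            continue
        if keep_country_borders and "level0" in path.name:
            continue  # border data is identified by the 'level0' keyword
        n_files += 1
        n_bytes += path.stat().st_size
        if not dry_run:
            path.unlink()
    return {"n_files": n_files, "total_bytes": n_bytes}  # hypothetical keys


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "afg_pop.tif").write_bytes(b"12345")
    (root / "level0_borders.feather").write_bytes(b"123")

    preview = sketch_purge(root)  # dry run: nothing is deleted
    untouched = (root / "sub" / "afg_pop.tif").exists()
    result = sketch_purge(root, dry_run=False, keep_country_borders=True)
    remaining = sorted(p.name for p in root.rglob("*") if p.is_file())
```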

worldpoppy.manifest_builder

Core “engine” for building a raw data manifest for the worldpoppy library.

This module contains all the logic for traversing the WorldPop metadata API, parsing the results, and saving them to a new local cache file.

Main methods

  • build_raw_manifest_from_api()

    Query WorldPop’s meta-data API and analyse the results to build a new, raw manifest of raster datasets for worldpoppy. Note: Users will rarely need to import this function directly. Instead, it is called by the separate manifest_loader module whenever a cached version of the raw manifest does not exist.

For a detailed, high-level explanation of this module’s API traversal strategy, related terminology (e.g., “Leaf Node”, “Sample Payload”), and data-parsing logic, please see the manifest_build_strategy.md document in the project root.


A note on the complexity of this module:

The module’s size is a result of two challenges:

  1. To minimise API calls, this module implements a “sample and infer” strategy, in which a general template for download URLs of a raster-data series is inferred from only a single example per series.

  2. We must parse and “flatten” three different data organisation schemes used by the WorldPop project: ‘flat’ (1-to-1), ‘multi-year’ (1-to-N by year), and ‘multi-band’ (1-to-N by class).

When processing operations fail, an entire raster-data series is dropped from the raw manifest and will hence not be supported by the worldpoppy package.
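The “sample and infer” idea from point 1 can be sketched as a simple string substitution: one example URL is turned into a template that can then be filled in for other countries and years. The functions and the URL path below are hypothetical, and the module's actual inference logic is likely more careful (for instance, about country codes that also occur inside other words).

```python
def infer_url_template(sample_url, iso3, year=None):
    """Replace the sample's country code and year with named placeholders."""
    template = sample_url.replace(iso3.upper(), "{ISO3}")
    template = template.replace(iso3.lower(), "{iso3}")
    if year is not None:
        template = template.replace(str(year), "{year}")
    return template


def fill_template(template, iso3, year=None):
    """Build a concrete URL for another country (and year) from the template."""
    return template.format(ISO3=iso3.upper(), iso3=iso3.lower(), year=year)


# Hypothetical sample URL; only the host name is taken from the documentation.
sample = "https://data.worldpop.org/example/2000/AFG/afg_example_2000.tif"
template = infer_url_template(sample, "afg", 2000)
kenya_2015 = fill_template(template, "ken", 2015)
```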

exception worldpoppy.manifest_builder.APIRequestError

Bases: Exception

Raised when an API request fails permanently or after all retries.

worldpoppy.manifest_builder.build_raw_manifest_from_api(force_rebuild=False)

Query WorldPop’s meta-data API and analyse the results to build a new, raw manifest of raster datasets for worldpoppy. We call this manifest “raw” because it will be further checked and filtered (where needed) by the manifest_loader module.

This is a SERIAL (single-threaded) implementation.

Phase 1: Discover “Leaf Nodes” by recursively crawling the API hierarchy (using _discover_leaf_nodes).

Phase 2: Processes each discovered “Leaf Node” (using _process_leaf_node) by applying a “Sample -> Analyse -> Parse” strategy that generates our final list of raw manifest rows. This phase is “robust”, meaning a failure on one Leaf Node will be logged and skipped, allowing the raw manifest build to continue.

Parameters:

force_rebuild (bool, optional) – If True, forces a full re-crawl and re-processing of WorldPop’s meta-data API, even if cached results from a previous run exist on disk. Default is False.
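The “robust” behaviour of Phase 2 (log a failure, skip the Leaf Node, continue) follows a common pattern that can be sketched as below. process_all and fake_process are hypothetical stand-ins; the private helpers named above are not called here.

```python
import logging

logger = logging.getLogger("manifest_sketch")


def process_all(leaf_nodes, process_one):
    """Process every leaf node serially, skipping those that raise."""
    rows, failed = [], []
    for node in leaf_nodes:
        try:
            rows.extend(process_one(node))
        except Exception as exc:
            # A failure on one node is logged and must not abort the build.
            logger.warning("Skipping leaf node %r: %s", node, exc)
            failed.append(node)
    return rows, failed


def fake_process(node):
    """Stand-in for the real per-node processing."""
    if node == "broken/node":
        raise RuntimeError("unexpected payload")
    return [{"api_path": node}]


rows, failed = process_all(
    ["pop/wpgp", "broken/node", "covariates/G2_NT_lights"], fake_process
)
```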

worldpoppy.manifest_loader

Main public API for loading and interacting with the data manifest for worldpoppy.

This module is the primary entry point for both the end-user and other parts of the worldpoppy package. It provides the public-facing functions (e.g., wp_manifest, show_supported_data_products) for discovering and filtering supported raster datasets.

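Main methods

  • wp_manifest()

    Load the worldpoppy raster-data manifest and filter it.

  • show_supported_data_products()

    Display supported WorldPop data products, with optional filtering.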

The module is responsible for:

  1. Loading the raw manifest from disk (which is created by the separate manifest_builder module).

  2. Triggering the manifest_builder to run if the raw manifest cache is missing.

  3. Cleaning and validating the raw manifest data.

  4. Assigning a curated product_name to raster datasets for easier user queries.

Note on Terminology

In worldpoppy’s terminology, a “dataset” is always a single downloadable raster file with a single band. Each such dataset conveys a concrete, thematic measurement and is uniquely identified either by a single country or a single country-year.

We speak of a “data product” to denote a collection of the same measurement across a larger set of countries (=static data product) or country-years (=multi-year data product).

The Cleaned Manifest DataFrame Schema

The primary output of this module (from wp_manifest) is a pandas.DataFrame with one row per supported, downloadable raster dataset. It contains the following columns:

Curated Columns:

  • wpy_id (str):

    The unique worldpoppy ID for this specific dataset entry. It is identical to WorldPop’s API ‘id’ only for “flat” (1-to-1) data products (e.g., “12345”). For “multi-year” or “multi-band” products, wpy_id is a synthetic ID generated by worldpoppy to ensure uniqueness (e.g., “12345_2020” or “67890_dst011”).

  • product_name (str):

    The unique, curated name of the data product in which a specific raster dataset is conceptually nested (see also our separate note on terminology). This is a worldpoppy-specific alias (e.g., “pop_g1”, “dist_esalc_g1_cultivated”) and based on definitions contained in the ‘product_definitions.toml’ file.

  • iso3 (str):

    The 3-letter ISO code for the country for which the raster dataset (=file) contains data. Currently, worldpoppy only supports country-specific raster data, but not files with continental or global coverage.

  • year (Int64):

    The 4-digit year of the dataset, or <NA> for static datasets that are not associated with a specific year of measurement.

  • multi_year (bool):

    True if the dataset is part of a multi-year time series. False for all static datasets.

  • product_notes (str):

    The curated, human-readable note for the data product to which the raster file belongs.

  • data_series (str):

    The larger data series to which the raster file belongs (either “global1” or “global2”).

  • arcsecs (Int64):

    The inferred resolution in arc-seconds (e.g., 3, 30) or <NA>.

Download & File Columns:

  • dataset_name (str):

    The raw filename stem without file-type suffix (e.g., “afg_wpgp_2000_100m”).

  • remote_path (str):

    The full URL to download the .tif file.

  • remote_name (str):

    The filename of the remote .tif file.

  • summary_url (str):

    A sample URL to the WorldPop summary page associated with this dataset.

Raw API Columns (for advanced use):

  • api_path (str):

    The path to the “Leaf Node” in WorldPop’s meta-data API that is associated with this dataset (e.g., “pop/wpgp” or “covariates/G2_NT_lights”). This is a powerful field for debugging and advanced use. You can use it to explore the raw meta-data API data directly.

    (For a full explanation of these terms, see the ‘manifest_build_strategy.md’ documentation file).

  • api_slug (str):

    The final part of the api_path (e.g., “G2_NT_lights”).

  • api_entry_title (str):

    The raw title text from the meta-data API.

  • api_project (str):

    The raw project name from the meta-data API (e.g., “Covariates”).

  • api_series_category (str):

    The raw category text from the meta-data API.

  • api_series_desc (str):

    The raw long-form description from the meta-data API.

  • api_source (str):

    The raw source text from the meta-data API.
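The wpy_id examples given under Curated Columns suggest a simple construction rule, sketched below. make_wpy_id is a hypothetical helper; how worldpoppy actually derives the suffix for multi-year and multi-band entries is not specified here.

```python
def make_wpy_id(api_id, suffix=None):
    """Return the API id itself, or the id joined with a year or band suffix."""
    if suffix is None:
        return str(api_id)  # "flat" products keep WorldPop's own id
    return f"{api_id}_{suffix}"


flat_id = make_wpy_id("12345")
yearly_id = make_wpy_id("12345", 2020)
band_id = make_wpy_id("67890", "dst011")
```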

worldpoppy.manifest_loader.get_all_dataset_names()

Gets all unique dataset names (filename stems) from the manifest.

worldpoppy.manifest_loader.get_all_isos()

Gets all unique ISO3 codes from the manifest.

worldpoppy.manifest_loader.get_all_product_names()

Gets all unique product names from the manifest.

worldpoppy.manifest_loader.get_product_info(product_name)

Retrieve coverage and type information for a specific data product.

This function queries the manifest to determine if a product is part of a multi-year time series and returns the sets of available years and countries (ISO3 codes).

Parameters:

product_name (str) – The exact, curated name of the data product to query (e.g., ‘pop_g1’).

Returns:

A dictionary containing the following keys:

  • is_multi_year (bool): True if the product is a multi-year time series, False otherwise.

  • years (set[int]): A set of all unique years (non-nullable integers) for which this product has data. For static products, this set may be empty or contain a single year (the year of data recording).

  • iso3_codes (set[str]): A set of all unique ISO3 codes indicating the countries for which this product has data.

Return type:

dict

Note

This function returns the union of all available years and countries. It does not guarantee that data exists for every combination of year and country (i.e., the data matrix may be sparse).

worldpoppy.manifest_loader.get_static_product_names()

Gets all unique static product names from the manifest.

worldpoppy.manifest_loader.resolve_product_years(product_name, years)

Resolve a user’s ‘years’ argument (including keywords) into a concrete list of years or None, consistent with the worldpoppy manifest.

Logic:

  • If years is None: Return None.

  • If years is ‘all’:
    • Multi-year product: Return all available years (List[int]).

    • Static product (no year): Return None.

  • If years is a list/tuple (e.g. [2010, ‘last’]):
    • Resolves integers as-is.

    • Resolves ‘first’/’last’ to concrete years.

    • Returns a sorted, unique list of integers.

  • If years is a single int or ‘first’/’last’: Treated as a list of length 1.

Parameters:
  • product_name (str) – The exact name of the WorldPop product.

  • years (int, List[Union[int, str]], str or None) – The years argument to resolve. Supports integers and keywords (‘all’, ‘first’, ‘last’).

Returns:

The concrete list of years for the downloader, or None for static products without a recorded year.

Return type:

List[int] or None
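The resolution logic can be sketched in plain Python. As a simplification, the hypothetical helper resolve_years takes the list of available years directly instead of looking them up in the manifest, and an empty list stands for a static product without a recorded year.

```python
def resolve_years(years, available_years):
    """Turn a `years` argument (ints and/or keywords) into a sorted list or None."""
    available = sorted(set(available_years))
    if years is None:
        return None
    if years == "all":
        return available or None  # static product without a year: None
    if isinstance(years, (int, str)):
        years = [years]  # a single value is treated as a list of length 1

    resolved = set()
    for item in years:
        if item in ("first", "last"):
            if not available:
                raise ValueError(f"Cannot resolve {item!r}: no years available.")
            resolved.add(available[0] if item == "first" else available[-1])
        elif isinstance(item, int):
            resolved.add(item)  # integers are kept as-is
        else:
            raise ValueError(f"Unsupported years entry: {item!r}")
    return sorted(resolved)


available = [2000, 2010, 2020]
mixed = resolve_years([2010, "last"], available)
everything = resolve_years("all", available)
```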

worldpoppy.manifest_loader.show_supported_data_products(iso3_codes=None, years=None, keywords=None, static_only=False)

Display supported WorldPop data products, with optional filtering.

Parameters:
  • iso3_codes (str or List[str], optional) – One or more three-letter ISO codes indicating the countries for which to show supported data products.

  • years (int or List[int] or str, optional) –

    For annual data products, either one or more years (int or List[int]) for which to show results, or the ‘all’ keyword (str).

    • If specific years are provided, products are filtered to match those years.

    • If ‘all’ or None is provided, all products (including static ones) are considered.

  • keywords (str or List[str], optional) – A single search term (str) or a list of search terms. If None or empty, the original DataFrame is returned.

  • static_only (bool, optional) – If True, only static data products will be shown.

Returns:

This function prints to the console or notebook.

Return type:

None

worldpoppy.manifest_loader.wp_manifest(product_name=None, iso3_codes=None, years=None, static_only=False, multi_year_only=False, keywords=None)

Load the worldpoppy raster-data manifest and filter it.

Together with show_supported_data_products, this function offers users a way of discovering and selecting datasets for download. While show_supported_data_products only provides a high-level overview of available data, this function will return the full raster-data manifest for worldpoppy (when called without any filter arguments).

Parameters:
  • product_name (str, optional) – The exact, curated name of the data product (e.g., ‘pop_g1’) for which to return manifest entries. Mutually exclusive with ‘keywords’.

  • iso3_codes (str or List[str], optional) – One or more three-letter ISO codes indicating the countries of interest.

  • years (int, List[Union[int, str]], or str, optional) –

    One or more years of interest or a keyword string.

    • If years are provided (as ints or keywords), filters the manifest to those specific years.

    • If ‘all’: Returns entries for all available years.

    • If ‘first’: Returns the earliest available year (requires product_name).

    • If ‘last’: Returns the most recent available year (requires product_name).

    • If None (default), no filtering on year is performed.

  • static_only (bool, optional) – If True, only return manifest entries for static datasets (i.e., those not part of a multi-year time series). Mutually exclusive with ‘multi_year_only’.

  • multi_year_only (bool, optional) – If True, only return manifest entries for multi-year time series. Mutually exclusive with ‘static_only’.

  • keywords (str or List[str], optional) – A single search term (str) or a list of search terms by which to perform a case-insensitive search of manifest entries. Mutually exclusive with ‘product_name’. Note that the search is performed using certain product-level fields only: ‘product_name’, ‘product_notes’, ‘project’, and ‘data_series’.

Returns:

A DataFrame containing the (optionally filtered) manifest, with one row per downloadable raster dataset.

For a detailed description of all columns, see the manifest_loader module documentation.

Return type:

pandas.DataFrame
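The case-insensitive keyword search can be sketched on plain dictionaries instead of a DataFrame. Two points are assumptions rather than documented behaviour: terms are matched as substrings, and a row is kept if any of the terms matches. The helper filter_by_keywords and the product notes in the sample rows are hypothetical.

```python
SEARCH_FIELDS = ("product_name", "product_notes", "project", "data_series")


def filter_by_keywords(rows, keywords):
    """Keep rows whose product-level fields contain at least one search term."""
    if not keywords:
        return list(rows)  # no keyword filtering requested
    if isinstance(keywords, str):
        keywords = [keywords]
    terms = [term.lower() for term in keywords]

    def matches(row):
        text = " ".join(str(row.get(field, "")) for field in SEARCH_FIELDS).lower()
        return any(term in text for term in terms)  # assumption: OR semantics

    return [row for row in rows if matches(row)]


rows = [
    {"product_name": "pop_g1", "product_notes": "Population counts",
     "data_series": "global1"},
    {"product_name": "dist_esalc_g1_cultivated",
     "product_notes": "Distance to cultivated areas", "data_series": "global1"},
]
population_rows = filter_by_keywords(rows, "POP")
```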

worldpoppy.manifest_loader.wp_manifest_constrained(product_name, iso3_codes, years=None)

A strict wrapper for wp_manifest that validates a specific download request.

This function is NOT for data discovery. Its purpose is to validate that a download request is unambiguous and that all requested data is available, based on the WorldPopPy manifest.

Parameters:
  • product_name (str) – The exact name of the WorldPop product of interest.

  • iso3_codes (str or List[str]) – One or more three-letter ISO codes indicating the countries of interest.

  • years (int, List[Union[int, str]], or str, optional) –

    The specific year(s) of interest.

    • If years are provided, validates availability for those specific years. (Supports lists mixed with keywords, e.g. [2015, 'last']).

    • If ‘all’: Checks availability for every year recorded in the manifest.

    • If ‘first’: Checks availability for the earliest year.

    • If ‘last’: Checks availability for the most recent year.

    • If None, validates that the product is static and has no year dimension.

Returns:

The filtered manifest, if the request is valid.

Return type:

pandas.DataFrame

Raises:

ValueError – If the request is flawed (e.g., missing years for a multi-year product) or if data is not available for all requested countries and/or years.
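The availability check can be sketched as follows: every requested country-year combination must be present, otherwise a ValueError is raised. check_request is a hypothetical helper operating on a set of (iso3, year) pairs rather than on the manifest itself, and it covers only multi-year requests.

```python
def check_request(available, iso3_codes, years):
    """Return the requested (iso3, year) pairs, or raise if any is unavailable."""
    if isinstance(iso3_codes, str):
        iso3_codes = [iso3_codes]
    requested = [(code, year) for code in iso3_codes for year in years]
    missing = [pair for pair in requested if pair not in available]
    if missing:
        raise ValueError(f"No data available for: {missing}")
    return requested


# Hypothetical coverage: Uganda has no data for 2015.
available = {("KEN", 2015), ("KEN", 2020), ("UGA", 2020)}
valid_request = check_request(available, "KEN", [2015, 2020])
```

A request for both countries in 2015 would fail, because coverage is sparse even though each country and each year appears somewhere in the data.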

worldpoppy.borders module

This auxiliary module provides helper functions to build and load simplified country polygons for the whole world, based on down-sampled level0_100m rasters from WorldPop.

worldpoppy.borders.build_country_borders(overwrite=False)

Build a GeoDataFrame with country borders for the whole world by converting WorldPop admin0 rasters into simplified vector polygons. The output is saved to disk as a Feather file for future use.

Notes

  • The border data generated by this function is not intended for display or any geo-data analysis. Its sole purpose is to power raster-data queries in worldpoppy.WorldPopDownloader.

  • This function requires an installation of osgeo.gdal.

Parameters:

overwrite (bool, optional) – If True, force regeneration of the border polygons even if they already exist on disk. Default is False.

Raises:

ModuleNotFoundError – If an installation of the optional osgeo.gdal library is not available.

worldpoppy.borders.load_country_borders()

Return a GeoDataFrame with simplified, buffered country polygons extracted from WorldPop level0_100m rasters.

If the cached border file does not exist, this function will trigger the build process.

Returns:

A GeoDataFrame with simplified, buffered polygons for all countries and an ‘iso3’ column.

Return type:

geopandas.GeoDataFrame

worldpoppy.func_utils module

Collection of various helper functions.

Note: Plotting utilities are located in a separate module.

worldpoppy.func_utils.geolocate_name(nominatim_query, to_crs=None)

Return the geo-coordinate associated with a given location name, based on search results from OSM’s ‘Nominatim’ service.

Parameters:
  • nominatim_query (str) – A location name to be geocoded.

  • to_crs (pyproj.CRS or str, optional) – If specified, transforms the returned coordinate from (lon, lat) to this CRS.

Returns:

The (x, y) coordinate in the target CRS, or (lon, lat) in WGS84 if to_crs is None.

Return type:

Tuple[float, float]

Raises:

NominatimSearchEmptyError – If the Nominatim query crashed or returned None.
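
Since geocoding can fail, callers may want a fallback coordinate. The sketch below is illustrative only: locate_with_fallback is a hypothetical helper, and because the import path of NominatimSearchEmptyError is not stated in this section, it catches Exception broadly rather than the specific error. The geocoder argument exists so that the logic can be exercised without network access.

```python
from typing import Callable, Optional, Tuple

Coord = Tuple[float, float]  # (lon, lat) in WGS84


def locate_with_fallback(
    query: str,
    fallback: Coord,
    geocoder: Optional[Callable[[str], Coord]] = None,
) -> Coord:
    """Hypothetical helper: geocode `query`, returning `fallback` on failure."""
    if geocoder is None:
        # Lazy import, so this sketch can be loaded without worldpoppy installed.
        from worldpoppy.func_utils import geolocate_name as geocoder
    try:
        return geocoder(query)
    except Exception:
        # Assumption: any failure, including an empty Nominatim result,
        # should lead to the fallback coordinate.
        return fallback
```

In real code it would be preferable to catch NominatimSearchEmptyError specifically once its import path is known, so that unrelated bugs are not silently hidden.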

worldpoppy.plot_utils module

Collection of various plotting utility functions for worldpoppy.

This module provides helpers to visualise geospatial data, including plotting country borders, marking locations, and cleaning up map axes.

Main methods

worldpoppy.plot_utils.clean_axes(ax=None, title=None, remove_xy_ticks=True, **title_kwargs)

Clean up matplotlib axes by setting equal aspect and removing labels.

This function is polymorphic: it accepts a single Axes object, a list of Axes, a numpy array of Axes, or an xarray FacetGrid.

Parameters:
  • ax (matplotlib.axes.Axes, array-like, or FacetGrid, optional) – The axis or collection of axes to clean. If None, uses current axis.

  • title (str, optional) – Title to set. If ‘ax’ is a collection, this sets the title for the first axis only (often effectively titling the figure), to avoid repeating the title on every subplot.

  • remove_xy_ticks (bool, optional, default=True) – If True, remove both x and y ticks.

  • **title_kwargs – Additional keyword arguments passed to set_title.
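
To illustrate what accepting "a single Axes object, a list of Axes, a numpy array of Axes, or an xarray FacetGrid" can mean in practice, here is a minimal sketch of how such inputs might be normalised into a flat list. This is not WorldPopPy's implementation: as_axes_list is a hypothetical helper, and it assumes that a FacetGrid-like object exposes its axes through an `axs` attribute and that array-like containers expose a `flat` iterator.

```python
def as_axes_list(obj):
    """Hypothetical helper: normalise one axis or a collection into a flat list."""
    # Assumption: FacetGrid-like containers expose their axes array as `.axs`.
    if hasattr(obj, "axs"):
        obj = obj.axs
    # Array-like containers (e.g. numpy arrays) offer a flat iterator.
    if hasattr(obj, "flat"):
        return list(obj.flat)
    # Plain Python sequences are copied into a new list.
    if isinstance(obj, (list, tuple)):
        return list(obj)
    # Anything else is treated as a single axis.
    return [obj]
```

With a list like this, a title could be applied to the first element only, which matches the behaviour described for the title parameter.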

worldpoppy.plot_utils.plot_country_borders(iso3_codes, ax=None, to_crs=None, **kwargs)

Plot country borders on a matplotlib axis.

Parameters:
  • iso3_codes (str or list of str) – One or more ISO3 country codes, or the ‘all’ keyword.

  • ax (matplotlib.axes.Axes, optional) – Axis on which to plot. If None, uses current axis.

  • to_crs (pyproj.CRS or str, optional) – If specified, projects the country borders from WGS84 to this CRS.

  • **kwargs – Additional keywords passed to GeoDataFrame.plot.

worldpoppy.plot_utils.plot_location_markers(locations, ax=None, annotate=True, color='k', fontsize=None, fontweight=None, textcoords='offset points', xytext=(0, -10), ha='left', va='center', other_annotate_kwargs=None, to_crs=None, **scatter_kwargs)

Plot markers for geolocated place names or raw coordinates on a matplotlib axis. Optionally annotate the location markers as well.

Parameters:
  • locations (str, tuple, or list of (str or tuple)) –

    The locations to plot. Can be:

    • A location name string (e.g., “Nairobi”).

    • A tuple of (location_name, display_label) to search for “location_name” but annotate with “display_label”.

    • A coordinate tuple (longitude, latitude) in WGS84.

    • A coordinate tuple with a label (longitude, latitude, label).

    • A mixed list of strings and tuples.

  • ax (matplotlib.axes.Axes, optional) – Axis on which to plot. If None, uses current axis.

  • annotate (bool, default=True) – Whether to annotate points with their names (or coordinates).

  • color (str, default='k') – Colour to use for both the scatter marker and the annotation text.

  • fontsize (int or str, optional) – Font size for the annotation text (e.g., 10, ‘small’, ‘medium’).

  • fontweight (int or str, optional) – Font weight for the annotation text (e.g., ‘bold’, ‘normal’, 700).

  • textcoords (str, default="offset points") – Coordinate system for annotation positioning.

  • xytext (tuple of int, default=(0, -10)) – Offset of annotation text from the marker.

  • ha (str, default='left') – Horizontal alignment of the annotation text.

  • va (str, default='center') – Vertical alignment of the annotation text.

  • other_annotate_kwargs (dict, optional) – Additional keyword arguments passed to annotate (e.g., rotation, bbox).

  • to_crs (pyproj.CRS or str, optional) – If specified, projects the geo-coordinate from WGS84 to this CRS.

  • **scatter_kwargs – Additional keywords passed to scatter.

Notes

Geocoding Reliability Warning: When users pass a location name, this function tries to resolve it into a GPS coordinate via OpenStreetMap’s Nominatim service. Nominatim may occasionally return coordinates for a different location than intended.

For precise control over the plotted location, it is strongly recommended to pass explicit GPS coordinates as tuples: (longitude, latitude, label) or (longitude, latitude).
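
The accepted forms of the locations argument can be summarised with a small sketch. The helper describe_location below is hypothetical and is not how the library parses its input; it only shows how the four documented tuple/string shapes differ. The coordinates are approximate, illustrative values.

```python
def describe_location(entry) -> str:
    """Hypothetical helper: classify one entry of a `locations` list."""
    if isinstance(entry, str):
        return "name"
    if isinstance(entry, tuple):
        if len(entry) == 2 and all(isinstance(v, str) for v in entry):
            return "name with label"
        if len(entry) == 2 and all(isinstance(v, (int, float)) for v in entry):
            return "coordinate"
        if (
            len(entry) == 3
            and all(isinstance(v, (int, float)) for v in entry[:2])
            and isinstance(entry[2], str)
        ):
            return "coordinate with label"
    raise ValueError(f"Unrecognised location entry: {entry!r}")


# A mixed list covering every documented form.
locations = [
    "Nairobi",                      # geocoded by name
    ("Mombasa, Kenya", "Mombasa"),  # search string plus display label
    (34.75, -0.09),                 # (longitude, latitude), approximate
    (36.07, -0.30, "Nakuru"),       # (longitude, latitude, label), approximate
]

kinds = [describe_location(entry) for entry in locations]
```

A list of this shape could then be passed as plot_location_markers(locations, ax=ax). Only the first two entries would involve a Nominatim lookup.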

worldpoppy.config module

worldpoppy.config.get_cache_dir()

Return the local cache directory for downloaded WorldPop datasets.

Note

You can override the default cache directory by setting the “WORLDPOPPY_CACHE_DIR” environment variable.

worldpoppy.config.get_max_concurrency()

Return the maximum concurrency for parallel raster downloads.

Note

You can override the default concurrency limit by setting the “WORLDPOPPY_MAX_CONCURRENCY” environment variable.
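
As a minimal sketch of how these overrides might be applied from Python, the two environment variables can be set before worldpoppy is imported. The directory name and the concurrency value are arbitrary examples. It is an assumption that the concurrency variable expects an integer string, and setting both variables before the import is a precaution in case they are read at import time.

```python
import os
import tempfile

# Example values only; pick a location and limit that suit your system.
cache_dir = os.path.join(tempfile.gettempdir(), "worldpoppy_cache_demo")
os.environ["WORLDPOPPY_CACHE_DIR"] = cache_dir
os.environ["WORLDPOPPY_MAX_CONCURRENCY"] = "4"  # assumed to be an integer string


def show_effective_config() -> None:
    """Print the settings that worldpoppy reports after the overrides."""
    # Lazy import, so this sketch can be loaded without worldpoppy installed.
    from worldpoppy.config import get_cache_dir, get_max_concurrency

    print(get_cache_dir())
    print(get_max_concurrency())
```

The same effect can be achieved by exporting the variables in the shell before starting Python.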