discovery

napt.discovery.base

Discovery strategy base protocol and registry for NAPT.

This module defines the foundational components for the discovery system:

  • DiscoveryStrategy protocol: Interface that all strategies must implement
  • Strategy registry: Global dict mapping strategy names to implementations
  • Registration and lookup functions: register_strategy() and get_strategy()

The discovery system uses a strategy pattern to support multiple ways of obtaining application installers and their versions:

  • url_download: Direct download from a static URL (FILE-FIRST)
  • web_scrape: Scrape vendor download pages to find links and extract versions (VERSION-FIRST)
  • api_github: Fetch from GitHub releases API (VERSION-FIRST)
  • api_json: Query JSON API endpoints for version and download URL (VERSION-FIRST)

Design Philosophy
  • Strategies are Protocol classes (structural subtyping, not inheritance)
  • Registration happens at module import time (strategies self-register)
  • Registry is a simple dict (no complex dependency injection needed)
  • Each strategy is stateless and can be instantiated on-demand

Protocol Benefits:

Using typing.Protocol instead of ABC allows:

  • Duck typing: Classes don't need explicit inheritance
  • Better IDE support: Type checkers verify interface compliance
  • Flexibility: Third-party code can add strategies without touching base

Example

Implementing a custom strategy:

from napt.discovery.base import register_strategy, DiscoveryStrategy
from pathlib import Path
from typing import Any
from napt.versioning.keys import DiscoveredVersion

class MyCustomStrategy:
    def discover_version(
        self, app_config: dict[str, Any], output_dir: Path
    ) -> tuple[DiscoveredVersion, Path, str, dict]:
        # Implement your discovery logic here
        ...

# Register it (typically at module import)
register_strategy("my_custom", MyCustomStrategy)

# Now it can be used in recipes:
# source:
#   strategy: my_custom
#   ...

DiscoveryStrategy

Bases: Protocol

Protocol for version discovery strategies.

Each strategy must implement discover_version() which downloads and extracts version information based on the app config.

Strategies may optionally implement validate_config() to provide strategy-specific configuration validation without network calls.

Source code in napt/discovery/base.py
class DiscoveryStrategy(Protocol):
    """Protocol for version discovery strategies.

    Each strategy must implement discover_version() which downloads
    and extracts version information based on the app config.

    Strategies may optionally implement validate_config() to provide
    strategy-specific configuration validation without network calls.
    """

    def discover_version(
        self, app_config: dict[str, Any], output_dir: Path
    ) -> tuple[DiscoveredVersion, Path, str, dict]:
        """Discover and download an application version.

        Args:
            app_config: The app configuration from the recipe
                (`config["app"]`).
            output_dir: Directory to download the installer to.

        Returns:
            A tuple (discovered_version, file_path, sha256, headers), where
                discovered_version is the version information, file_path is
                the path to the downloaded file, sha256 is the SHA-256 hash,
                and headers contains HTTP response headers for caching.

        Raises:
            ValueError: On discovery or download failures.
            RuntimeError: On discovery or download failures.

        """
        ...

    def validate_config(self, app_config: dict[str, Any]) -> list[str]:
        """Validate strategy-specific configuration (optional).

        This method validates the app configuration for strategy-specific
        requirements without making network calls or downloading files.
        Useful for quick feedback during recipe development.

        Args:
            app_config: The app configuration from the recipe
                (`config["app"]`).

        Returns:
            List of error messages. Empty list if configuration is valid.
            Each error should be a human-readable description of the issue.

        Example:
            Check required fields:
                ```python
                def validate_config(self, app_config):
                    errors = []
                    source = app_config.get("source", {})
                    if "url" not in source:
                        errors.append("Missing required field: source.url")
                    return errors
                ```

        Note:
            This method is optional; strategies without it will skip validation.
            Should NOT make network calls or download files. Should check field
            presence, types, and format only. Used by 'napt validate' command
            for fast recipe checking.

        """
        ...

discover_version

discover_version(app_config: dict[str, Any], output_dir: Path) -> tuple[DiscoveredVersion, Path, str, dict]

Discover and download an application version.

Parameters:

  • app_config (dict[str, Any], required): The app configuration from the recipe (config["app"]).
  • output_dir (Path, required): Directory to download the installer to.

Returns:

  • tuple[DiscoveredVersion, Path, str, dict]: A tuple (discovered_version, file_path, sha256, headers), where discovered_version is the version information, file_path is the path to the downloaded file, sha256 is the SHA-256 hash, and headers contains HTTP response headers for caching.

Raises:

  • ValueError: On discovery or download failures.
  • RuntimeError: On discovery or download failures.

Source code in napt/discovery/base.py
def discover_version(
    self, app_config: dict[str, Any], output_dir: Path
) -> tuple[DiscoveredVersion, Path, str, dict]:
    """Discover and download an application version.

    Args:
        app_config: The app configuration from the recipe
            (`config["app"]`).
        output_dir: Directory to download the installer to.

    Returns:
        A tuple (discovered_version, file_path, sha256, headers), where
            discovered_version is the version information, file_path is
            the path to the downloaded file, sha256 is the SHA-256 hash,
            and headers contains HTTP response headers for caching.

    Raises:
        ValueError: On discovery or download failures.
        RuntimeError: On discovery or download failures.

    """
    ...

validate_config

validate_config(app_config: dict[str, Any]) -> list[str]

Validate strategy-specific configuration (optional).

This method validates the app configuration for strategy-specific requirements without making network calls or downloading files. Useful for quick feedback during recipe development.

Parameters:

  • app_config (dict[str, Any], required): The app configuration from the recipe (config["app"]).

Returns:

  • list[str]: List of error messages; empty if the configuration is valid. Each error should be a human-readable description of the issue.

Example

Check required fields:

def validate_config(self, app_config):
    errors = []
    source = app_config.get("source", {})
    if "url" not in source:
        errors.append("Missing required field: source.url")
    return errors

Note

This method is optional; strategies without it will skip validation. Should NOT make network calls or download files. Should check field presence, types, and format only. Used by 'napt validate' command for fast recipe checking.

Source code in napt/discovery/base.py
def validate_config(self, app_config: dict[str, Any]) -> list[str]:
    """Validate strategy-specific configuration (optional).

    This method validates the app configuration for strategy-specific
    requirements without making network calls or downloading files.
    Useful for quick feedback during recipe development.

    Args:
        app_config: The app configuration from the recipe
            (`config["app"]`).

    Returns:
        List of error messages. Empty list if configuration is valid.
        Each error should be a human-readable description of the issue.

    Example:
        Check required fields:
            ```python
            def validate_config(self, app_config):
                errors = []
                source = app_config.get("source", {})
                if "url" not in source:
                    errors.append("Missing required field: source.url")
                return errors
            ```

    Note:
        This method is optional; strategies without it will skip validation.
        Should NOT make network calls or download files. Should check field
        presence, types, and format only. Used by 'napt validate' command
        for fast recipe checking.

    """
    ...

register_strategy

register_strategy(name: str, strategy_class: type[DiscoveryStrategy]) -> None

Register a discovery strategy by name in the global registry.

This function should be called when a strategy module is imported, typically at module level. Registering the same name twice will overwrite the previous registration (allows monkey-patching for tests).

Parameters:

  • name (str, required): Strategy name (e.g., "url_download"). This is the value used in recipe YAML files under source.strategy. Names should be lowercase with underscores for readability.
  • strategy_class (type[DiscoveryStrategy], required): The strategy class to register. Must implement the DiscoveryStrategy protocol (have a discover_version method with the correct signature).

Example

Register at module import time:

# In discovery/my_strategy.py
from .base import register_strategy

class MyStrategy:
    def discover_version(self, app_config, output_dir):
        ...

register_strategy("my_strategy", MyStrategy)

Note

No validation is performed at registration time. Type checkers will verify protocol compliance at static analysis time. Runtime errors occur at strategy instantiation or invocation.

Source code in napt/discovery/base.py
def register_strategy(name: str, strategy_class: type[DiscoveryStrategy]) -> None:
    """Register a discovery strategy by name in the global registry.

    This function should be called when a strategy module is imported,
    typically at module level. Registering the same name twice will
    overwrite the previous registration (allows monkey-patching for tests).

    Args:
        name: Strategy name (e.g., "url_download"). This is the value
            used in recipe YAML files under source.strategy. Names should be
            lowercase with underscores for readability.
        strategy_class: The strategy class to
            register. Must implement the DiscoveryStrategy protocol (have a
            discover_version method with the correct signature).

    Example:
        Register at module import time:
            ```python
            # In discovery/my_strategy.py
            from .base import register_strategy

            class MyStrategy:
                def discover_version(self, app_config, output_dir):
                    ...

            register_strategy("my_strategy", MyStrategy)
            ```

    Note:
        No validation is performed at registration time. Type checkers will
        verify protocol compliance at static analysis time. Runtime errors
        occur at strategy instantiation or invocation.

    """
    _STRATEGY_REGISTRY[name] = strategy_class

get_strategy

get_strategy(name: str) -> DiscoveryStrategy

Get a discovery strategy instance by name from the global registry.

The strategy is instantiated on-demand (strategies are stateless, so a new instance is created for each call). The strategy module must have been imported first for registration to occur.

Parameters:

  • name (str, required): Strategy name (e.g., "url_download"). Must exactly match a name registered via register_strategy(). Case-sensitive.

Returns:

  • DiscoveryStrategy: A new instance of the requested strategy, ready to use.

Raises:

  • ConfigError: If the strategy name is not registered. The error message includes a list of available strategies for troubleshooting.

Example

Get and use a strategy:

from napt.discovery import get_strategy
strategy = get_strategy("url_download")
# Use strategy.discover_version(...)

Handle unknown strategy:

try:
    strategy = get_strategy("nonexistent")
except ConfigError as e:
    print(f"Strategy not found: {e}")

Note

Strategies must be registered before they can be retrieved. The url_download strategy is auto-registered when imported. New strategies can be added by creating a module and registering.

Source code in napt/discovery/base.py
def get_strategy(name: str) -> DiscoveryStrategy:
    """Get a discovery strategy instance by name from the global registry.

    The strategy is instantiated on-demand (strategies are stateless, so
    a new instance is created for each call). The strategy module must
    have been imported first for registration to occur.

    Args:
        name: Strategy name (e.g., "url_download"). Must exactly match
            a name registered via register_strategy(). Case-sensitive.

    Returns:
        A new instance of the requested strategy, ready
            to use.

    Raises:
        ConfigError: If the strategy name is not registered. The error message
            includes a list of available strategies for troubleshooting.

    Example:
        Get and use a strategy:
            ```python
            from napt.discovery import get_strategy
            strategy = get_strategy("url_download")
            # Use strategy.discover_version(...)
            ```

        Handle unknown strategy:
            ```python
            try:
                strategy = get_strategy("nonexistent")
            except ConfigError as e:
                print(f"Strategy not found: {e}")
            ```

    Note:
        Strategies must be registered before they can be retrieved. The
        url_download strategy is auto-registered when imported. New strategies
        can be added by creating a module and registering.

    """
    if name not in _STRATEGY_REGISTRY:
        available = ", ".join(_STRATEGY_REGISTRY.keys())
        raise ConfigError(
            f"Unknown discovery strategy: {name!r}. Available: {available or '(none)'}"
        )
    return _STRATEGY_REGISTRY[name]()

napt.discovery.url_download

URL download discovery strategy for NAPT.

This is a FILE-FIRST strategy that downloads an installer from a fixed HTTP(S) URL and extracts version information from the downloaded file. Uses HTTP ETag conditional requests to avoid re-downloading unchanged files.

Key Advantages:

  • Works with any fixed URL (version not required in URL)
  • Extracts accurate version directly from installer metadata
  • Uses ETag-based conditional requests for efficiency (~500ms vs full download)
  • Simple and reliable for vendors with stable download URLs
  • Fallback strategy when version not available via API/URL pattern

Supported Version Extraction:

  • MSI files (.msi extension): Automatically detected, extracts ProductVersion property from MSI files
  • Other file types: Not supported. Use a version-first strategy (api_github, api_json, web_scrape) or ensure file is an MSI installer.
  • (Future) EXE files: Auto-detect and extract FileVersion from PE headers
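
The extension-based auto-detection above amounts to a small dispatch on the file suffix. A hedged sketch (pick_extractor is a hypothetical helper, not part of napt's public API):

```python
from pathlib import Path

def pick_extractor(file_path: Path) -> str:
    # Hypothetical dispatcher mirroring the auto-detection described above.
    suffix = file_path.suffix.lower()
    if suffix == ".msi":
        return "msi_product_version"  # read ProductVersion from the MSI
    raise ValueError(
        f"Cannot extract version from file type: {suffix!r}; "
        "use a version-first strategy instead"
    )
```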

Use Cases:

  • Google Chrome: Fixed enterprise MSI URL, version embedded in MSI
  • Mozilla Firefox: Fixed enterprise MSI URL, version embedded in MSI
  • Vendors with stable download URLs and embedded version metadata
  • When version not available via API, URL pattern, or GitHub tags

Recipe Configuration:

source:
  strategy: url_download
  url: "https://vendor.com/installer.msi"          # Required: download URL

Configuration Fields:

  • url (str, required): HTTP(S) URL to download the installer from. The URL should be stable and point to the latest version.

Version Extraction: Automatically detected by file extension. MSI files (.msi extension) have versions extracted from ProductVersion property. Other file types are not supported for version extraction - use a version-first strategy (api_github, api_json, web_scrape) instead.

Error Handling:

  • ConfigError: Missing or invalid configuration fields
  • NetworkError: Download failures, version extraction errors
  • Errors are chained with 'from err' for better debugging
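
The 'from err' chaining above preserves the low-level cause on the raised error. A minimal sketch with a stand-in NetworkError (napt's real exception class lives elsewhere, and the failure here is simulated):

```python
class NetworkError(Exception):
    """Stand-in for napt's NetworkError."""

def fetch(url: str) -> bytes:
    try:
        raise OSError("connection reset")  # simulated low-level failure
    except OSError as err:
        # 'from err' keeps the OSError as __cause__ for debugging.
        raise NetworkError(f"Failed to download {url}") from err
```

Tracebacks then show both exceptions, joined by "The above exception was the direct cause of the following exception".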

Example

In a recipe YAML:

apps:
  - name: "My App"
    id: "my-app"
    source:
      strategy: url_download
      url: "https://example.com/myapp-setup.msi"

From Python:

from pathlib import Path
from napt.discovery.url_download import UrlDownloadStrategy

strategy = UrlDownloadStrategy()
app_config = {
    "source": {
        "url": "https://example.com/app.msi",
    }
}

# With cache for ETag optimization
cache = {"etag": 'W/"abc123"', "sha256": "..."}
discovered, file_path, sha256, headers = strategy.discover_version(
    app_config, Path("./downloads"), cache=cache
)
print(f"Version {discovered.version} at {file_path}")

From Python (using core orchestration):

from pathlib import Path
from napt.core import discover_recipe

# Automatically uses ETag optimization
result = discover_recipe(Path("recipe.yaml"), Path("./downloads"))
print(f"Version {result.version} at {result.file_path}")

Note
  • Must download file to extract version (architectural constraint)
  • ETag optimization reduces bandwidth but still requires network round-trip
  • Core orchestration automatically provides cached ETag if available
  • Server must support ETag or Last-Modified headers for optimization
  • If server doesn't support conditional requests, full download occurs every time
  • Consider version-first strategies (web_scrape, api_github, api_json) for better performance when version available via web scraping or API

UrlDownloadStrategy

Discovery strategy for static HTTP(S) URLs.

Configuration example

source:
  strategy: url_download
  url: "https://example.com/installer.msi"

Source code in napt/discovery/url_download.py
class UrlDownloadStrategy:
    """Discovery strategy for static HTTP(S) URLs.

    Configuration example:
        source:
          strategy: url_download
          url: "https://example.com/installer.msi"
    """

    def discover_version(
        self,
        app_config: dict[str, Any],
        output_dir: Path,
        cache: dict[str, Any] | None = None,
    ) -> tuple[DiscoveredVersion, Path, str, dict]:
        """Download from static URL and extract version from the file.

        Args:
            app_config: App configuration containing source.url and
                source.version.
            output_dir: Directory to save the downloaded file.
            cache: Cached state with etag, last_modified,
                file_path, and sha256 for conditional requests. If provided
                and file is unchanged (HTTP 304), the cached file is returned.

        Returns:
            A tuple (version_info, file_path, sha256, headers), where
                version_info contains the discovered version information,
                file_path is the Path to the downloaded file, sha256 is the
                SHA-256 hash, and headers contains HTTP response headers.

        Raises:
            ConfigError: If required config fields are missing or invalid.
            NetworkError: If download or version extraction fails.

        """
        from napt.logging import get_global_logger

        logger = get_global_logger()
        source = app_config.get("source", {})
        url = source.get("url")
        if not url:
            raise ConfigError("url_download strategy requires 'source.url' in config")

        app_id = app_config.get("id", "")

        logger.verbose("DISCOVERY", "Strategy: url_download (file-first)")
        logger.verbose("DISCOVERY", f"Source URL: {url}")

        # Extract ETag/Last-Modified from cache if available
        etag = cache.get("etag") if cache else None
        last_modified = cache.get("last_modified") if cache else None

        if etag:
            logger.verbose("DISCOVERY", f"Using cached ETag: {etag}")
        if last_modified:
            logger.verbose("DISCOVERY", f"Using cached Last-Modified: {last_modified}")

        # Download the file (with conditional request if cache available)
        try:
            dl = download_file(
                url,
                output_dir / app_id,
                etag=etag,
                last_modified=last_modified,
            )
            file_path, sha256, headers = dl.file_path, dl.sha256, dl.headers
        except NotModifiedError:
            # File unchanged (HTTP 304), use cached version
            # Use convention-based path: derive filename from URL
            logger.info(
                "CACHE", "File not modified (HTTP 304), using cached version"
            )

            if not cache or "sha256" not in cache:
                raise NetworkError(
                    "Cache indicates file not modified, but missing SHA-256. "
                    "Try running with --stateless to force re-download."
                ) from None

            # Derive file path from URL (convention-based, schema v2)
            from urllib.parse import urlparse

            filename = Path(urlparse(url).path).name
            cached_file = output_dir / app_id / filename

            if not cached_file.exists():
                raise NetworkError(
                    f"Cached file {cached_file} not found. "
                    f"File may have been deleted. Try running with --stateless."
                ) from None

            # Extract version from cached file (auto-detect by extension)
            if cached_file.suffix.lower() == ".msi":
                logger.verbose(
                    "DISCOVERY", "Auto-detected MSI file, extracting version"
                )
                try:
                    discovered = version_from_msi_product_version(cached_file)
                except Exception as err:
                    raise NetworkError(
                        f"Failed to extract MSI ProductVersion from cached "
                        f"file {cached_file}: {err}"
                    ) from err
            else:
                raise ConfigError(
                    f"Cannot extract version from file type: {cached_file.suffix!r}. "
                    f"url_download strategy currently supports MSI files only. "
                    f"For other file types, use a version-first strategy (api_github, "
                    f"api_json, web_scrape) or ensure the file is an MSI installer."
                ) from None

            # Return cached info with preserved headers (prevents overwriting ETag)
            # When 304, no new headers received, so return cached values to
            # preserve them
            preserved_headers = {}
            if cache.get("etag"):
                preserved_headers["ETag"] = cache["etag"]
            if cache.get("last_modified"):
                preserved_headers["Last-Modified"] = cache["last_modified"]

            return discovered, cached_file, cache["sha256"], preserved_headers
        except Exception as err:
            if isinstance(err, (NetworkError, ConfigError)):
                raise
            raise NetworkError(f"Failed to download {url}: {err}") from err

        # File was downloaded (not cached), extract version from it (auto-detect by extension)
        if file_path.suffix.lower() == ".msi":
            logger.verbose("DISCOVERY", "Auto-detected MSI file, extracting version")
            try:
                discovered = version_from_msi_product_version(file_path)
            except Exception as err:
                raise NetworkError(
                    f"Failed to extract MSI ProductVersion from {file_path}: {err}"
                ) from err
        else:
            raise ConfigError(
                f"Cannot extract version from file type: {file_path.suffix!r}. "
                f"url_download strategy currently supports MSI files only. "
                f"For other file types, use a version-first strategy (api_github, "
                f"api_json, web_scrape) or ensure the file is an MSI installer."
            )

        return discovered, file_path, sha256, headers

    def validate_config(self, app_config: dict[str, Any]) -> list[str]:
        """Validate url_download strategy configuration.

        Checks for required fields and correct types without making network calls.

        Args:
            app_config: The app configuration from the recipe.

        Returns:
            List of error messages (empty if valid).

        """
        errors = []
        source = app_config.get("source", {})

        # Check required fields
        if "url" not in source:
            errors.append("Missing required field: source.url")
        elif not isinstance(source["url"], str):
            errors.append("source.url must be a string")
        elif not source["url"].strip():
            errors.append("source.url cannot be empty")

        # Version extraction is now auto-detected by file extension
        # No version configuration validation needed

        return errors

discover_version

discover_version(app_config: dict[str, Any], output_dir: Path, cache: dict[str, Any] | None = None) -> tuple[DiscoveredVersion, Path, str, dict]

Download from static URL and extract version from the file.

Parameters:

  • app_config (dict[str, Any], required): App configuration containing source.url.
  • output_dir (Path, required): Directory to save the downloaded file.
  • cache (dict[str, Any] | None, default None): Cached state with etag, last_modified, file_path, and sha256 for conditional requests. If provided and the file is unchanged (HTTP 304), the cached file is returned.

Returns:

  • tuple[DiscoveredVersion, Path, str, dict]: A tuple (version_info, file_path, sha256, headers), where version_info contains the discovered version information, file_path is the Path to the downloaded file, sha256 is the SHA-256 hash, and headers contains HTTP response headers.

Raises:

  • ConfigError: If required config fields are missing or invalid.
  • NetworkError: If download or version extraction fails.

Source code in napt/discovery/url_download.py
def discover_version(
    self,
    app_config: dict[str, Any],
    output_dir: Path,
    cache: dict[str, Any] | None = None,
) -> tuple[DiscoveredVersion, Path, str, dict]:
    """Download from static URL and extract version from the file.

    Args:
        app_config: App configuration containing source.url and
            source.version.
        output_dir: Directory to save the downloaded file.
        cache: Cached state with etag, last_modified,
            file_path, and sha256 for conditional requests. If provided
            and file is unchanged (HTTP 304), the cached file is returned.

    Returns:
        A tuple (version_info, file_path, sha256, headers), where
            version_info contains the discovered version information,
            file_path is the Path to the downloaded file, sha256 is the
            SHA-256 hash, and headers contains HTTP response headers.

    Raises:
        ConfigError: If required config fields are missing or invalid.
        NetworkError: If download or version extraction fails.

    """
    from napt.logging import get_global_logger

    logger = get_global_logger()
    source = app_config.get("source", {})
    url = source.get("url")
    if not url:
        raise ConfigError("url_download strategy requires 'source.url' in config")

    app_id = app_config.get("id", "")

    logger.verbose("DISCOVERY", "Strategy: url_download (file-first)")
    logger.verbose("DISCOVERY", f"Source URL: {url}")

    # Extract ETag/Last-Modified from cache if available
    etag = cache.get("etag") if cache else None
    last_modified = cache.get("last_modified") if cache else None

    if etag:
        logger.verbose("DISCOVERY", f"Using cached ETag: {etag}")
    if last_modified:
        logger.verbose("DISCOVERY", f"Using cached Last-Modified: {last_modified}")

    # Download the file (with conditional request if cache available)
    try:
        dl = download_file(
            url,
            output_dir / app_id,
            etag=etag,
            last_modified=last_modified,
        )
        file_path, sha256, headers = dl.file_path, dl.sha256, dl.headers
    except NotModifiedError:
        # File unchanged (HTTP 304), use cached version
        # Use convention-based path: derive filename from URL
        logger.info(
            "CACHE", "File not modified (HTTP 304), using cached version"
        )

        if not cache or "sha256" not in cache:
            raise NetworkError(
                "Cache indicates file not modified, but missing SHA-256. "
                "Try running with --stateless to force re-download."
            ) from None

        # Derive file path from URL (convention-based, schema v2)
        from urllib.parse import urlparse

        filename = Path(urlparse(url).path).name
        cached_file = output_dir / app_id / filename

        if not cached_file.exists():
            raise NetworkError(
                f"Cached file {cached_file} not found. "
                f"File may have been deleted. Try running with --stateless."
            ) from None

        # Extract version from cached file (auto-detect by extension)
        if cached_file.suffix.lower() == ".msi":
            logger.verbose(
                "DISCOVERY", "Auto-detected MSI file, extracting version"
            )
            try:
                discovered = version_from_msi_product_version(cached_file)
            except Exception as err:
                raise NetworkError(
                    f"Failed to extract MSI ProductVersion from cached "
                    f"file {cached_file}: {err}"
                ) from err
        else:
            raise ConfigError(
                f"Cannot extract version from file type: {cached_file.suffix!r}. "
                f"url_download strategy currently supports MSI files only. "
                f"For other file types, use a version-first strategy (api_github, "
                f"api_json, web_scrape) or ensure the file is an MSI installer."
            ) from None

        # Return cached info with preserved headers (prevents overwriting ETag)
        # When 304, no new headers received, so return cached values to
        # preserve them
        preserved_headers = {}
        if cache.get("etag"):
            preserved_headers["ETag"] = cache["etag"]
        if cache.get("last_modified"):
            preserved_headers["Last-Modified"] = cache["last_modified"]

        return discovered, cached_file, cache["sha256"], preserved_headers
    except Exception as err:
        if isinstance(err, (NetworkError, ConfigError)):
            raise
        raise NetworkError(f"Failed to download {url}: {err}") from err

    # File was downloaded (not cached), extract version from it (auto-detect by extension)
    if file_path.suffix.lower() == ".msi":
        logger.verbose("DISCOVERY", "Auto-detected MSI file, extracting version")
        try:
            discovered = version_from_msi_product_version(file_path)
        except Exception as err:
            raise NetworkError(
                f"Failed to extract MSI ProductVersion from {file_path}: {err}"
            ) from err
    else:
        raise ConfigError(
            f"Cannot extract version from file type: {file_path.suffix!r}. "
            f"url_download strategy currently supports MSI files only. "
            f"For other file types, use a version-first strategy (api_github, "
            f"api_json, web_scrape) or ensure the file is an MSI installer."
        )

    return discovered, file_path, sha256, headers
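The cache-header handling above follows the standard HTTP conditional-request pattern. A minimal standalone sketch (the cache field names mirror the dict used in the listing; the function names are illustrative, not NAPT's API):

```python
def build_conditional_headers(cache: dict) -> dict:
    """Build If-None-Match / If-Modified-Since headers from cached validators."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def preserve_cached_headers(cache: dict) -> dict:
    """On a 304 response no new validators arrive; re-emit the cached ones
    so a later state write does not overwrite them with empty values."""
    preserved = {}
    if cache.get("etag"):
        preserved["ETag"] = cache["etag"]
    if cache.get("last_modified"):
        preserved["Last-Modified"] = cache["last_modified"]
    return preserved
```

Sending the cached validators back lets the server answer 304 Not Modified, which is what makes the cached-file path above possible.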

validate_config

validate_config(app_config: dict[str, Any]) -> list[str]

Validate url_download strategy configuration.

Checks for required fields and correct types without making network calls.

Parameters:

Name Type Description Default
app_config dict[str, Any]

The app configuration from the recipe.

required

Returns:

Type Description
list[str]

List of error messages (empty if valid).

Source code in napt/discovery/url_download.py
def validate_config(self, app_config: dict[str, Any]) -> list[str]:
    """Validate url_download strategy configuration.

    Checks for required fields and correct types without making network calls.

    Args:
        app_config: The app configuration from the recipe.

    Returns:
        List of error messages (empty if valid).

    """
    errors = []
    source = app_config.get("source", {})

    # Check required fields
    if "url" not in source:
        errors.append("Missing required field: source.url")
    elif not isinstance(source["url"], str):
        errors.append("source.url must be a string")
    elif not source["url"].strip():
        errors.append("source.url cannot be empty")

    # Version extraction is now auto-detected by file extension
    # No version configuration validation needed

    return errors

napt.discovery.web_scrape

Web scraping discovery strategy for NAPT.

This is a VERSION-FIRST strategy that scrapes vendor download pages to find download links and extract version information from those links. This enables version discovery for vendors that don't provide APIs or static URLs.

Key Advantages:

  • Discovers versions from vendor download pages
  • Works for vendors without APIs or GitHub releases
  • Version-first caching (can skip downloads when version unchanged)
  • Supports both CSS selectors (recommended) and regex (fallback)
  • No dependency on HTML structure stability (with good selectors)
  • Handles relative and absolute URLs automatically

Supported Link Finding:

  • CSS selectors: Modern, robust, recommended approach
  • Regex patterns: Fallback for edge cases or when CSS won't work

Version Extraction:

  • Extract version from the discovered download URL using regex
  • Support for captured groups with formatting
  • Transform version numbers (e.g., "2501" -> "25.01")

Use Cases:

  • Vendors with download pages listing multiple versions (7-Zip, etc.)
  • Legacy software without modern APIs
  • Small vendors with simple download pages
  • When GitHub releases and JSON APIs aren't available
Recipe Configuration
source:
  strategy: web_scrape
  page_url: "https://www.7-zip.org/download.html"
  link_selector: 'a[href$="-x64.msi"]'        # CSS (recommended)
  version_pattern: '7z(\d{2})(\d{2})-x64'   # Extract from URL
  version_format: "{0}.{1}"                    # Transform to "25.01"
Alternative with regex
source:
  strategy: web_scrape
  page_url: "https://vendor.com/downloads"
  link_pattern: 'href="(/files/app-v[0-9.]+-x64\.msi)"'
  version_pattern: "app-v([0-9.]+)-x64"
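The regex alternative boils down to a pattern search over the raw HTML followed by `urljoin()` to resolve relative hrefs. A minimal sketch against a hypothetical page snippet (the vendor URL and filename are illustrative):

```python
import re
from urllib.parse import urljoin

# Hypothetical page fragment; the pattern mirrors the link_pattern example above
html = '<a href="/files/app-v1.2.3-x64.msi">Download</a>'
pattern = re.compile(r'href="(/files/app-v[0-9.]+-x64\.msi)"')
match = pattern.search(html)
href = match.group(1)  # the single capture group holds the (possibly relative) URL

# Resolve against the page URL to get an absolute download URL
download_url = urljoin("https://vendor.com/downloads", href)
print(download_url)  # https://vendor.com/files/app-v1.2.3-x64.msi
```

Because `urljoin()` handles both relative and absolute hrefs, the same code works whether the vendor links to `/files/...` or to a full `https://...` URL.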

Configuration Fields:

  • page_url (str, required): URL of the page to scrape for download links
  • link_selector (str, optional): CSS selector to find download link. Recommended approach. Example: 'a[href$=".msi"]' finds links ending with .msi
  • link_pattern (str, optional): Regex pattern as fallback when CSS won't work. Must have one capture group for the URL. Example: 'href="([^"]*\.msi)"' (note the escaped dot, so "." matches a literal period)
  • version_pattern (str, required): Regex pattern to extract version from the discovered URL. Use capture groups to extract version parts. Example: "app-(\d+\.\d+)" or "7z(\d{2})(\d{2})"
  • version_format (str, optional): Python format string to combine captured groups. Use {0}, {1}, etc. for groups. Example: "{0}.{1}" transforms captures "25", "01" into "25.01". Defaults to "{0}" (first capture group only).
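The version_pattern / version_format pair can be exercised directly with the standard re module; the 7-Zip-style URL below is illustrative:

```python
import re

# Illustrative URL; pattern and format string match the fields described above
url = "https://www.7-zip.org/a/7z2501-x64.msi"
match = re.search(r"7z(\d{2})(\d{2})-x64", url)
groups = match.groups()                 # ('25', '01')
version = "{0}.{1}".format(*groups)     # format string combines the captures
print(version)  # 25.01
```

This is how a compact vendor version like "2501" becomes the dotted "25.01" form.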

Error Handling:

  • ConfigError: Missing or invalid configuration fields; selector/pattern not found or not matching
  • NetworkError: Page download failures
  • Errors are chained with 'from err' for better debugging

Finding CSS Selectors:

Use browser DevTools:

1. Open download page in Chrome/Edge/Firefox
2. Right-click download link -> Inspect
3. Right-click highlighted element -> Copy -> Copy selector
4. Simplify selector (e.g., 'a[href$=".msi"]' instead of complex nth-child)

Common CSS Patterns:

  • 'a[href$=".msi"]' - Links ending with .msi
  • 'a[href*="x64"]' - Links containing "x64"
  • 'a.download' - Links with class="download"
  • 'a[href$="-x64.msi"]:first-of-type' - Also requires the link to be the first <a> among its siblings (select_one() already returns the first match, so this is rarely needed)
Example

In a recipe YAML:

apps:
  - name: "7-Zip"
    id: "napt-7zip"
    source:
      strategy: web_scrape
      page_url: "https://www.7-zip.org/download.html"
      link_selector: 'a[href$="-x64.msi"]'
      version_pattern: '7z(\d{2})(\d{2})-x64'
      version_format: "{0}.{1}"

From Python (version-first approach):

from napt.discovery.web_scrape import WebScrapeStrategy
from napt.download import download_file

strategy = WebScrapeStrategy()
app_config = {
    "source": {
        "page_url": "https://www.7-zip.org/download.html",
        "link_selector": 'a[href$="-x64.msi"]',
        "version_pattern": r"7z(\d{2})(\d{2})-x64",
        "version_format": "{0}.{1}",
    }
}

# Get version WITHOUT downloading installer
version_info = strategy.get_version_info(app_config)
print(f"Latest version: {version_info.version}")

# Download only if needed
if need_to_download:
    result = download_file(
        version_info.download_url, Path("./downloads/my-app")
    )
    print(f"Downloaded to {result.file_path}")

From Python (using core orchestration):

from pathlib import Path
from napt.core import discover_recipe

# Automatically uses version-first optimization
result = discover_recipe(Path("recipe.yaml"), Path("./downloads"))
print(f"Version {result.version} at {result.file_path}")

Note
  • Version discovery via web scraping (no installer download required)
  • Core orchestration automatically skips download if version unchanged
  • CSS selectors are recommended (more robust than regex)
  • Use browser DevTools to find selectors easily
  • Selector should match exactly one link (first match is used)
  • BeautifulSoup4 required for CSS selectors
  • Regex fallback works without BeautifulSoup
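The version-first skip decision that core orchestration makes after get_version_info() returns can be sketched as follows (the state-dict shape here is illustrative, not NAPT's actual cache format):

```python
def should_download(discovered_version: str, cached_state: dict) -> bool:
    """Download only when the discovered version differs from the cached one."""
    return cached_state.get("version") != discovered_version


# Unchanged version: the installer download is skipped entirely
print(should_download("25.01", {"version": "25.01"}))  # False
# New version (or empty cache): the download proceeds
print(should_download("25.01", {"version": "24.09"}))  # True
```

This is why the strategy only needs a cheap page fetch on most runs: the expensive installer download happens only when the comparison returns True.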

WebScrapeStrategy

Discovery strategy for web scraping download pages.

Configuration example
source:
  strategy: web_scrape
  page_url: "https://vendor.com/download.html"
  link_selector: 'a[href$=".msi"]'
  version_pattern: "app-v([0-9.]+)"
Source code in napt/discovery/web_scrape.py
class WebScrapeStrategy:
    """Discovery strategy for web scraping download pages.

    Configuration example:
        ```yaml
        source:
          strategy: web_scrape
          page_url: "https://vendor.com/download.html"
          link_selector: 'a[href$=".msi"]'
          version_pattern: "app-v([0-9.]+)"
        ```
    """

    def get_version_info(
        self,
        app_config: dict[str, Any],
    ) -> VersionInfo:
        """Scrape download page for version and URL without downloading
        (version-first path).

        This method scrapes an HTML page, finds a download link using CSS selector
        or regex, extracts the version from that link, and returns version info.
        If the version matches cached state, the download can be skipped entirely.

        Args:
            app_config: App configuration containing source.page_url,
                source.link_selector or source.link_pattern, and
                source.version_pattern.

        Returns:
            Version info with version string, download URL, and
                source name.

        Raises:
            ConfigError: If required config fields are missing, invalid, or if
                selectors/patterns don't match anything.
            NetworkError: If page download fails (chained with 'from err').

        Example:
            Scrape 7-Zip download page:
                ```python
                strategy = WebScrapeStrategy()
                config = {
                    "source": {
                        "page_url": "https://www.7-zip.org/download.html",
                        "link_selector": 'a[href$="-x64.msi"]',
                        "version_pattern": "7z(\\d{2})(\\d{2})-x64",
                        "version_format": "{0}.{1}"
                    }
                }
                version_info = strategy.get_version_info(config)
                # version_info.version returns: '25.01'
                ```

        """
        from napt.logging import get_global_logger

        logger = get_global_logger()
        # Validate configuration
        source = app_config.get("source", {})
        page_url = source.get("page_url")
        if not page_url:
            raise ConfigError(
                "web_scrape strategy requires 'source.page_url' in config"
            )

        link_selector = source.get("link_selector")
        link_pattern = source.get("link_pattern")

        if not link_selector and not link_pattern:
            raise ConfigError(
                "web_scrape strategy requires either 'source.link_selector' or "
                "'source.link_pattern' in config"
            )

        version_pattern = source.get("version_pattern")
        if not version_pattern:
            raise ConfigError(
                "web_scrape strategy requires 'source.version_pattern' in config"
            )

        version_format = source.get("version_format", "{0}")

        logger.verbose("DISCOVERY", "Strategy: web_scrape (version-first)")
        logger.verbose("DISCOVERY", f"Page URL: {page_url}")
        if link_selector:
            logger.verbose("DISCOVERY", f"Link selector (CSS): {link_selector}")
        if link_pattern:
            logger.verbose("DISCOVERY", f"Link pattern (regex): {link_pattern}")
        logger.verbose("DISCOVERY", f"Version pattern: {version_pattern}")

        # Download the HTML page
        logger.verbose("DISCOVERY", f"Fetching page: {page_url}")
        try:
            response = requests.get(page_url, timeout=30)
            response.raise_for_status()
        except requests.exceptions.HTTPError as err:
            raise NetworkError(
                f"Failed to fetch page: {response.status_code} {response.reason}"
            ) from err
        except requests.exceptions.RequestException as err:
            raise NetworkError(f"Failed to fetch page: {err}") from err

        html_content = response.text
        logger.verbose("DISCOVERY", f"Page fetched ({len(html_content)} bytes)")

        # Find download link using CSS selector or regex
        download_url = None

        if link_selector:
            # Use CSS selector with BeautifulSoup4
            soup = BeautifulSoup(html_content, "html.parser")
            element = soup.select_one(link_selector)

            if not element:
                raise ConfigError(
                    f"CSS selector {link_selector!r} did not match any elements on page"
                )

            # Get href attribute
            href = element.get("href")
            if not href:
                raise ConfigError(
                    f"Element matched by {link_selector!r} has no href attribute"
                )

            logger.verbose("DISCOVERY", f"Found link via CSS: {href}")

            # Build absolute URL
            download_url = urljoin(page_url, href)

        elif link_pattern:
            # Use regex fallback
            try:
                pattern = re.compile(link_pattern)
                match = pattern.search(html_content)

                if not match:
                    raise ConfigError(
                        f"Regex pattern {link_pattern!r} did not match anything on page"
                    )

                # Get first capture group or full match
                if pattern.groups > 0:
                    href = match.group(1)
                else:
                    href = match.group(0)

                logger.verbose("DISCOVERY", f"Found link via regex: {href}")

                # Build absolute URL
                download_url = urljoin(page_url, href)

            except re.error as err:
                raise ConfigError(
                    f"Invalid link_pattern regex: {link_pattern!r}"
                ) from err

        logger.verbose("DISCOVERY", f"Download URL: {download_url}")

        # Extract version from the download URL
        try:
            version_regex = re.compile(version_pattern)
            match = version_regex.search(download_url)

            if not match:
                raise ConfigError(
                    f"Version pattern {version_pattern!r} did not match "
                    f"URL {download_url!r}"
                )

            # Get captured groups
            groups = match.groups()

            if not groups:
                # No capture groups, use full match
                version_str = match.group(0)
            else:
                # Format using captured groups
                try:
                    version_str = version_format.format(*groups)
                except (IndexError, KeyError) as err:
                    raise ConfigError(
                        f"version_format {version_format!r} failed with "
                        f"groups {groups}: {err}"
                    ) from err

        except re.error as err:
            raise ConfigError(
                f"Invalid version_pattern regex: {version_pattern!r}"
            ) from err

        logger.verbose("DISCOVERY", f"Extracted version: {version_str}")

        return VersionInfo(
            version=version_str,
            download_url=download_url,
            source="web_scrape",
        )

    def validate_config(self, app_config: dict[str, Any]) -> list[str]:
        """Validate web_scrape strategy configuration.

        Checks for required fields and correct types without making network calls.

        Args:
            app_config: The app configuration from the recipe.

        Returns:
            List of error messages (empty if valid).

        """
        errors = []
        source = app_config.get("source", {})

        # Check page_url
        if "page_url" not in source:
            errors.append("Missing required field: source.page_url")
        elif not isinstance(source["page_url"], str):
            errors.append("source.page_url must be a string")
        elif not source["page_url"].strip():
            errors.append("source.page_url cannot be empty")

        # Check that at least one link finding method is provided
        link_selector = source.get("link_selector")
        link_pattern = source.get("link_pattern")

        if not link_selector and not link_pattern:
            errors.append(
                "Missing required field: must provide either "
                "source.link_selector or source.link_pattern"
            )

        # Validate link_selector if provided
        if link_selector:
            if not isinstance(link_selector, str):
                errors.append("source.link_selector must be a string")
            elif not link_selector.strip():
                errors.append("source.link_selector cannot be empty")
            else:
                # Try to validate CSS selector syntax
                try:
                    # Test if selector is parseable
                    soup = BeautifulSoup("<html></html>", "html.parser")
                    soup.select_one(link_selector)  # Will raise if invalid
                except Exception as err:
                    errors.append(f"Invalid CSS selector: {err}")

        # Validate link_pattern if provided
        if link_pattern:
            if not isinstance(link_pattern, str):
                errors.append("source.link_pattern must be a string")
            elif not link_pattern.strip():
                errors.append("source.link_pattern cannot be empty")
            else:
                # Validate regex compiles
                try:
                    re.compile(link_pattern)
                except re.error as err:
                    errors.append(f"Invalid link_pattern regex: {err}")

        # Check version_pattern
        if "version_pattern" not in source:
            errors.append("Missing required field: source.version_pattern")
        elif not isinstance(source["version_pattern"], str):
            errors.append("source.version_pattern must be a string")
        elif not source["version_pattern"].strip():
            errors.append("source.version_pattern cannot be empty")
        else:
            # Validate regex compiles
            try:
                re.compile(source["version_pattern"])
            except re.error as err:
                errors.append(f"Invalid version_pattern regex: {err}")

        # Validate version_format if provided
        if "version_format" in source:
            if not isinstance(source["version_format"], str):
                errors.append("source.version_format must be a string")
            elif not source["version_format"].strip():
                errors.append("source.version_format cannot be empty")

        return errors

get_version_info

get_version_info(app_config: dict[str, Any]) -> VersionInfo

Scrape download page for version and URL without downloading (version-first path).

This method scrapes an HTML page, finds a download link using CSS selector or regex, extracts the version from that link, and returns version info. If the version matches cached state, the download can be skipped entirely.

Parameters:

Name Type Description Default
app_config dict[str, Any]

App configuration containing source.page_url, source.link_selector or source.link_pattern, and source.version_pattern.

required

Returns:

Type Description
VersionInfo

Version info with version string, download URL, and source name.

Raises:

Type Description
ConfigError

If required config fields are missing, invalid, or if selectors/patterns don't match anything.

NetworkError

If page download fails (chained with 'from err').

Example

Scrape 7-Zip download page:

strategy = WebScrapeStrategy()
config = {
    "source": {
        "page_url": "https://www.7-zip.org/download.html",
        "link_selector": 'a[href$="-x64.msi"]',
        "version_pattern": r"7z(\d{2})(\d{2})-x64",
        "version_format": "{0}.{1}"
    }
}
version_info = strategy.get_version_info(config)
# version_info.version returns: '25.01'

Source code in napt/discovery/web_scrape.py
def get_version_info(
    self,
    app_config: dict[str, Any],
) -> VersionInfo:
    """Scrape download page for version and URL without downloading
    (version-first path).

    This method scrapes an HTML page, finds a download link using CSS selector
    or regex, extracts the version from that link, and returns version info.
    If the version matches cached state, the download can be skipped entirely.

    Args:
        app_config: App configuration containing source.page_url,
            source.link_selector or source.link_pattern, and
            source.version_pattern.

    Returns:
        Version info with version string, download URL, and
            source name.

    Raises:
        ConfigError: If required config fields are missing, invalid, or if
            selectors/patterns don't match anything.
        NetworkError: If page download fails (chained with 'from err').

    Example:
        Scrape 7-Zip download page:
            ```python
            strategy = WebScrapeStrategy()
            config = {
                "source": {
                    "page_url": "https://www.7-zip.org/download.html",
                    "link_selector": 'a[href$="-x64.msi"]',
                    "version_pattern": "7z(\\d{2})(\\d{2})-x64",
                    "version_format": "{0}.{1}"
                }
            }
            version_info = strategy.get_version_info(config)
            # version_info.version returns: '25.01'
            ```

    """
    from napt.logging import get_global_logger

    logger = get_global_logger()
    # Validate configuration
    source = app_config.get("source", {})
    page_url = source.get("page_url")
    if not page_url:
        raise ConfigError(
            "web_scrape strategy requires 'source.page_url' in config"
        )

    link_selector = source.get("link_selector")
    link_pattern = source.get("link_pattern")

    if not link_selector and not link_pattern:
        raise ConfigError(
            "web_scrape strategy requires either 'source.link_selector' or "
            "'source.link_pattern' in config"
        )

    version_pattern = source.get("version_pattern")
    if not version_pattern:
        raise ConfigError(
            "web_scrape strategy requires 'source.version_pattern' in config"
        )

    version_format = source.get("version_format", "{0}")

    logger.verbose("DISCOVERY", "Strategy: web_scrape (version-first)")
    logger.verbose("DISCOVERY", f"Page URL: {page_url}")
    if link_selector:
        logger.verbose("DISCOVERY", f"Link selector (CSS): {link_selector}")
    if link_pattern:
        logger.verbose("DISCOVERY", f"Link pattern (regex): {link_pattern}")
    logger.verbose("DISCOVERY", f"Version pattern: {version_pattern}")

    # Download the HTML page
    logger.verbose("DISCOVERY", f"Fetching page: {page_url}")
    try:
        response = requests.get(page_url, timeout=30)
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        raise NetworkError(
            f"Failed to fetch page: {response.status_code} {response.reason}"
        ) from err
    except requests.exceptions.RequestException as err:
        raise NetworkError(f"Failed to fetch page: {err}") from err

    html_content = response.text
    logger.verbose("DISCOVERY", f"Page fetched ({len(html_content)} bytes)")

    # Find download link using CSS selector or regex
    download_url = None

    if link_selector:
        # Use CSS selector with BeautifulSoup4
        soup = BeautifulSoup(html_content, "html.parser")
        element = soup.select_one(link_selector)

        if not element:
            raise ConfigError(
                f"CSS selector {link_selector!r} did not match any elements on page"
            )

        # Get href attribute
        href = element.get("href")
        if not href:
            raise ConfigError(
                f"Element matched by {link_selector!r} has no href attribute"
            )

        logger.verbose("DISCOVERY", f"Found link via CSS: {href}")

        # Build absolute URL
        download_url = urljoin(page_url, href)

    elif link_pattern:
        # Use regex fallback
        try:
            pattern = re.compile(link_pattern)
            match = pattern.search(html_content)

            if not match:
                raise ConfigError(
                    f"Regex pattern {link_pattern!r} did not match anything on page"
                )

            # Get first capture group or full match
            if pattern.groups > 0:
                href = match.group(1)
            else:
                href = match.group(0)

            logger.verbose("DISCOVERY", f"Found link via regex: {href}")

            # Build absolute URL
            download_url = urljoin(page_url, href)

        except re.error as err:
            raise ConfigError(
                f"Invalid link_pattern regex: {link_pattern!r}"
            ) from err

    logger.verbose("DISCOVERY", f"Download URL: {download_url}")

    # Extract version from the download URL
    try:
        version_regex = re.compile(version_pattern)
        match = version_regex.search(download_url)

        if not match:
            raise ConfigError(
                f"Version pattern {version_pattern!r} did not match "
                f"URL {download_url!r}"
            )

        # Get captured groups
        groups = match.groups()

        if not groups:
            # No capture groups, use full match
            version_str = match.group(0)
        else:
            # Format using captured groups
            try:
                version_str = version_format.format(*groups)
            except (IndexError, KeyError) as err:
                raise ConfigError(
                    f"version_format {version_format!r} failed with "
                    f"groups {groups}: {err}"
                ) from err

    except re.error as err:
        raise ConfigError(
            f"Invalid version_pattern regex: {version_pattern!r}"
        ) from err

    logger.verbose("DISCOVERY", f"Extracted version: {version_str}")

    return VersionInfo(
        version=version_str,
        download_url=download_url,
        source="web_scrape",
    )

validate_config

validate_config(app_config: dict[str, Any]) -> list[str]

Validate web_scrape strategy configuration.

Checks for required fields and correct types without making network calls.

Parameters:

Name Type Description Default
app_config dict[str, Any]

The app configuration from the recipe.

required

Returns:

Type Description
list[str]

List of error messages (empty if valid).

Source code in napt/discovery/web_scrape.py
def validate_config(self, app_config: dict[str, Any]) -> list[str]:
    """Validate web_scrape strategy configuration.

    Checks for required fields and correct types without making network calls.

    Args:
        app_config: The app configuration from the recipe.

    Returns:
        List of error messages (empty if valid).

    """
    errors = []
    source = app_config.get("source", {})

    # Check page_url
    if "page_url" not in source:
        errors.append("Missing required field: source.page_url")
    elif not isinstance(source["page_url"], str):
        errors.append("source.page_url must be a string")
    elif not source["page_url"].strip():
        errors.append("source.page_url cannot be empty")

    # Check that at least one link finding method is provided
    link_selector = source.get("link_selector")
    link_pattern = source.get("link_pattern")

    if not link_selector and not link_pattern:
        errors.append(
            "Missing required field: must provide either "
            "source.link_selector or source.link_pattern"
        )

    # Validate link_selector if provided
    if link_selector:
        if not isinstance(link_selector, str):
            errors.append("source.link_selector must be a string")
        elif not link_selector.strip():
            errors.append("source.link_selector cannot be empty")
        else:
            # Try to validate CSS selector syntax
            try:
                # Test if selector is parseable
                soup = BeautifulSoup("<html></html>", "html.parser")
                soup.select_one(link_selector)  # Will raise if invalid
            except Exception as err:
                errors.append(f"Invalid CSS selector: {err}")

    # Validate link_pattern if provided
    if link_pattern:
        if not isinstance(link_pattern, str):
            errors.append("source.link_pattern must be a string")
        elif not link_pattern.strip():
            errors.append("source.link_pattern cannot be empty")
        else:
            # Validate regex compiles
            try:
                re.compile(link_pattern)
            except re.error as err:
                errors.append(f"Invalid link_pattern regex: {err}")

    # Check version_pattern
    if "version_pattern" not in source:
        errors.append("Missing required field: source.version_pattern")
    elif not isinstance(source["version_pattern"], str):
        errors.append("source.version_pattern must be a string")
    elif not source["version_pattern"].strip():
        errors.append("source.version_pattern cannot be empty")
    else:
        # Validate regex compiles
        try:
            re.compile(source["version_pattern"])
        except re.error as err:
            errors.append(f"Invalid version_pattern regex: {err}")

    # Validate version_format if provided
    if "version_format" in source:
        if not isinstance(source["version_format"], str):
            errors.append("source.version_format must be a string")
        elif not source["version_format"].strip():
            errors.append("source.version_format cannot be empty")

    return errors
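validate_config deliberately accumulates error messages rather than raising on the first problem, so a recipe author sees every issue in one pass. A condensed standalone sketch of the same style (checks reduced to two fields; the function name is illustrative):

```python
import re


def validate_source(source: dict) -> list[str]:
    """Collect all configuration problems instead of raising on the first."""
    errors = []
    if "page_url" not in source:
        errors.append("Missing required field: source.page_url")
    pattern = source.get("version_pattern")
    if pattern is None:
        errors.append("Missing required field: source.version_pattern")
    else:
        try:
            re.compile(pattern)  # regex must at least compile
        except re.error as err:
            errors.append(f"Invalid version_pattern regex: {err}")
    return errors


print(validate_source({}))  # reports both missing fields at once
```

An empty list means the (partial) configuration is valid, matching the contract documented above.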

napt.discovery.api_github

GitHub API discovery strategy for NAPT.

This is a VERSION-FIRST strategy that queries the GitHub API to get version and download URL WITHOUT downloading the installer. This enables fast version checks and efficient caching.

Key Advantages:

  • Fast version discovery (GitHub API call ~100ms)
  • Can skip downloads entirely when version unchanged
  • Direct access to latest releases via stable GitHub API
  • Version extraction from Git tags (semantic versioning friendly)
  • Asset pattern matching for multi-platform releases
  • Optional authentication for higher rate limits
  • No web scraping required
  • Ideal for CI/CD with scheduled checks

Supported Version Extraction:

  • Tag-based: Extract version from release tag names
    • Supports named capture groups: (?P<name>...)
    • Default pattern strips "v" prefix: v1.2.3 -> 1.2.3
    • Falls back to full tag if no pattern match
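The tag-based extraction can be sketched with the default pattern; this simplified version covers the numbered-group case and the full-tag fallback (named-group handling omitted):

```python
import re


def version_from_tag(tag: str, pattern: str = r"v?([0-9.]+)") -> str:
    """Extract a version from a release tag; fall back to the full tag."""
    match = re.search(pattern, tag)
    if match is None:
        return tag  # no pattern match: use the tag as-is
    return match.group(1) if match.groups() else match.group(0)


print(version_from_tag("v1.2.3"))                               # 1.2.3
print(version_from_tag("release-1.2.3", r"release-([0-9.]+)"))  # 1.2.3
```

The default pattern strips the conventional "v" prefix, so semantic-versioned tags come out as plain version strings.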

Use Cases:

  • Open-source projects (Git, VS Code, Node.js, etc.)
  • Projects with GitHub releases (Firefox, Chrome alternatives)
  • Vendors who publish installers as release assets
  • Projects with semantic versioned tags
  • CI/CD pipelines with frequent version checks
Recipe Configuration
source:
    strategy: api_github
    repo: "git-for-windows/git"                    # Required: owner/repo
    asset_pattern: 'Git-.*-64-bit\.exe$'          # Required: regex for asset
    version_pattern: "v?([0-9.]+)"                 # Optional: version extraction
    prerelease: false                              # Optional: include prereleases
    token: "${GITHUB_TOKEN}"                       # Optional: auth token

Configuration Fields:

  • repo (str, required): GitHub repository in "owner/name" format (e.g., "git-for-windows/git")
  • asset_pattern (str, required): Regular expression to match asset filename. If multiple assets match, the first match is used. Example: ".*-x64\.msi$" matches assets ending with "-x64.msi"
  • version_pattern (str, optional): Regular expression to extract version from the release tag name. Use a named capture group (?P<name>...) or the entire match. Default: "v?([0-9.]+)" strips optional "v" prefix. Example: "release-([0-9.]+)" for tags like "release-1.2.3".
  • prerelease (bool, optional): If True, include pre-release versions. If False (default), only stable releases are considered. Uses GitHub's prerelease flag.
  • token (str, optional): GitHub personal access token for authentication. Increases rate limit from 60 to 5000 requests per hour. Can use environment variable substitution: "${GITHUB_TOKEN}". No special permissions needed for public repositories.

Error Handling:

  • ValueError: Missing or invalid configuration fields
  • RuntimeError: API failures, no releases, no matching assets
  • Errors are chained with 'from err' for better debugging

Rate Limits:

  • Unauthenticated: 60 requests/hour per IP
  • Authenticated: 5000 requests/hour per token
  • Tip: Use a token for production use or frequent checks
Example

In a recipe YAML:

apps:
  - name: "Git for Windows"
    id: "git"
    source:
      strategy: api_github
      repo: "git-for-windows/git"
      asset_pattern: 'Git-.*-64-bit\.exe$'

From Python (version-first approach):

from napt.discovery.api_github import ApiGithubStrategy
from napt.download import download_file

strategy = ApiGithubStrategy()
app_config = {
    "source": {
        "repo": "git-for-windows/git",
        "asset_pattern": r".*-64-bit\.exe$",
    }
}

# Get version WITHOUT downloading
version_info = strategy.get_version_info(app_config)
print(f"Latest version: {version_info.version}")

# Download only if needed
if need_to_download:
    result = download_file(
        version_info.download_url, Path("./downloads/my-app")
    )
    print(f"Downloaded to {result.file_path}")

From Python (using core orchestration):

from pathlib import Path
from napt.core import discover_recipe

# Automatically uses version-first optimization
result = discover_recipe(Path("recipe.yaml"), Path("./downloads"))
print(f"Version {result.version} at {result.file_path}")

Note

  • Version discovery via API only (no download required)
  • Core orchestration automatically skips the download if the version is unchanged
  • The GitHub API is stable and well-documented; releases are fetched in order (latest first)
  • Asset matching is case-sensitive by default (use (?i) for case-insensitive matching)
  • Consider url_download if the vendor provides a stable direct download URL instead
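For example, an asset pattern made case-insensitive with the inline (?i) flag (the filename here is illustrative):

```python
import re

# Case-sensitive by default: misses upper-case extensions.
strict = re.compile(r".*-x64\.msi$")
# A leading (?i) makes the whole pattern case-insensitive.
relaxed = re.compile(r"(?i).*-x64\.msi$")

print(bool(strict.search("App-1.2-X64.MSI")))   # False
print(bool(relaxed.search("App-1.2-X64.MSI")))  # True
```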

ApiGithubStrategy

Discovery strategy for GitHub releases.

Configuration example

source:
  strategy: api_github
  repo: "owner/repository"
  asset_pattern: '.*\.msi$'
  version_pattern: "v?([0-9.]+)"
  prerelease: false
  token: "${GITHUB_TOKEN}"

Source code in napt/discovery/api_github.py
class ApiGithubStrategy:
    """Discovery strategy for GitHub releases.

    Configuration example:
        source:
          strategy: api_github
          repo: "owner/repository"
          asset_pattern: ".*\\.msi$"
          version_pattern: "v?([0-9.]+)"
          prerelease: false
          token: "${GITHUB_TOKEN}"
    """

    def get_version_info(
        self,
        app_config: dict[str, Any],
    ) -> VersionInfo:
        """Fetch latest release from GitHub API without downloading
        (version-first path).

        This method queries the GitHub API for the latest release and extracts
        the version from the tag name and the download URL from matching assets.
        If the version matches cached state, the download can be skipped entirely.

        Args:
            app_config: App configuration containing source.repo and
                optional fields.

        Returns:
            Version info with version string, download URL, and
                source name.

        Raises:
            ConfigError: If required config fields are missing, invalid, or if
                no matching assets are found.
            NetworkError: If the API call fails or the release has no assets.

        Example:
            Get version from GitHub releases:
                ```python
                strategy = ApiGithubStrategy()
                config = {
                    "source": {
                        "repo": "owner/repo",
                        "asset_pattern": ".*\\.msi$"
                    }
                }
                version_info = strategy.get_version_info(config)
                # version_info.version returns: '1.0.0'
                ```

        """
        from napt.logging import get_global_logger

        logger = get_global_logger()
        # Validate configuration
        source = app_config.get("source", {})
        repo = source.get("repo")
        if not repo:
            raise ConfigError("api_github strategy requires 'source.repo' in config")

        # Validate repo format
        if "/" not in repo or repo.count("/") != 1:
            raise ConfigError(
                f"Invalid repo format: {repo!r}. Expected 'owner/repository'"
            )

        # Optional configuration
        asset_pattern = source.get("asset_pattern")
        if not asset_pattern:
            raise ConfigError(
                "api_github strategy requires 'source.asset_pattern' in config"
            )

        version_pattern = source.get("version_pattern", r"v?([0-9.]+)")
        prerelease = source.get("prerelease", False)
        token = source.get("token")

        # Expand environment variables in token (e.g., ${GITHUB_TOKEN})
        if token:
            if token.startswith("${") and token.endswith("}"):
                env_var = token[2:-1]
                token = os.environ.get(env_var)
                if not token:
                    logger.verbose(
                        "DISCOVERY",
                        f"Warning: Environment variable {env_var} not set",
                    )

        logger.verbose("DISCOVERY", "Strategy: api_github (version-first)")
        logger.verbose("DISCOVERY", f"Repository: {repo}")
        logger.verbose("DISCOVERY", f"Version pattern: {version_pattern}")
        if asset_pattern:
            logger.verbose("DISCOVERY", f"Asset pattern: {asset_pattern}")
        if prerelease:
            logger.verbose("DISCOVERY", "Including pre-releases")

        # Fetch latest release from GitHub API
        api_url = f"https://api.github.com/repos/{repo}/releases/latest"
        headers = {
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        }

        # Add authentication if token provided
        if token:
            headers["Authorization"] = f"token {token}"
            logger.verbose("DISCOVERY", "Using authenticated API request")

        logger.verbose("DISCOVERY", f"Fetching release from: {api_url}")

        try:
            response = requests.get(api_url, headers=headers, timeout=30)
            response.raise_for_status()
        except requests.exceptions.HTTPError as err:
            if response.status_code == 404:
                raise NetworkError(
                    f"Repository {repo!r} not found or has no releases"
                ) from err
            elif response.status_code == 403:
                raise NetworkError(
                    f"GitHub API rate limit exceeded. Consider using a token. "
                    f"Status: {response.status_code}"
                ) from err
            else:
                raise NetworkError(
                    f"GitHub API request failed: {response.status_code} "
                    f"{response.reason}"
                ) from err
        except requests.exceptions.RequestException as err:
            raise NetworkError(f"Failed to fetch GitHub release: {err}") from err

        release_data = response.json()

        # Check if this is a prerelease and we don't want those
        if release_data.get("prerelease", False) and not prerelease:
            raise NetworkError(
                f"Latest release is a pre-release and prerelease=false. "
                f"Tag: {release_data.get('tag_name')}"
            )

        # Extract version from tag name
        tag_name = release_data.get("tag_name", "")
        if not tag_name:
            raise NetworkError("Release has no tag_name field")

        logger.verbose("DISCOVERY", f"Release tag: {tag_name}")

        try:
            pattern = re.compile(version_pattern)
            match = pattern.search(tag_name)
            if not match:
                raise ConfigError(
                    f"Version pattern {version_pattern!r} did not match "
                    f"tag {tag_name!r}"
                )

            # Try to get named capture group 'version' first, else use group 1,
            # else full match
            if "version" in pattern.groupindex:
                version_str = match.group("version")
            elif pattern.groups > 0:
                version_str = match.group(1)
            else:
                version_str = match.group(0)

        except re.error as err:
            raise ConfigError(
                f"Invalid version_pattern regex: {version_pattern!r}"
            ) from err
        except (ValueError, IndexError) as err:
            raise ConfigError(
                f"Failed to extract version from tag {tag_name!r} "
                f"using pattern {version_pattern!r}: {err}"
            ) from err

        logger.verbose("DISCOVERY", f"Extracted version: {version_str}")

        # Find matching asset
        assets = release_data.get("assets", [])
        if not assets:
            raise NetworkError(
                f"Release {tag_name} has no assets. "
                f"Check if assets were uploaded to the release."
            )

        logger.verbose("DISCOVERY", f"Release has {len(assets)} asset(s)")

        # Match asset by pattern
        matched_asset = None
        try:
            pattern = re.compile(asset_pattern)
        except re.error as err:
            raise ConfigError(
                f"Invalid asset_pattern regex: {asset_pattern!r}"
            ) from err

        for asset in assets:
            asset_name = asset.get("name", "")
            if pattern.search(asset_name):
                matched_asset = asset
                logger.verbose("DISCOVERY", f"Matched asset: {asset_name}")
                break

        if not matched_asset:
            available = [a.get("name", "(unnamed)") for a in assets]
            raise ConfigError(
                f"No assets matched pattern {asset_pattern!r}. "
                f"Available assets: {', '.join(available)}"
            )

        # Get download URL
        download_url = matched_asset.get("browser_download_url")
        if not download_url:
            raise NetworkError(f"Asset {matched_asset.get('name')} has no download URL")

        logger.verbose("DISCOVERY", f"Download URL: {download_url}")

        return VersionInfo(
            version=version_str,
            download_url=download_url,
            source="api_github",
        )

    def validate_config(self, app_config: dict[str, Any]) -> list[str]:
        """Validate api_github strategy configuration.

        Checks for required fields and correct types without making network calls.

        Args:
            app_config: The app configuration from the recipe.

        Returns:
            List of error messages (empty if valid).

        """
        errors = []
        source = app_config.get("source", {})

        # Check required fields
        if "repo" not in source:
            errors.append("Missing required field: source.repo")
        elif not isinstance(source["repo"], str):
            errors.append("source.repo must be a string")
        elif not source["repo"].strip():
            errors.append("source.repo cannot be empty")
        else:
            # Validate repo format
            repo = source["repo"]
            if repo.count("/") != 1:
                errors.append(
                    "source.repo must be in format 'owner/repo' (e.g., 'git/git')"
                )

        if "asset_pattern" not in source:
            errors.append("Missing required field: source.asset_pattern")
        elif not isinstance(source["asset_pattern"], str):
            errors.append("source.asset_pattern must be a string")
        elif not source["asset_pattern"].strip():
            errors.append("source.asset_pattern cannot be empty")
        else:
            # Validate regex pattern syntax
            pattern = source["asset_pattern"]
            import re

            try:
                re.compile(pattern)
            except re.error as err:
                errors.append(f"Invalid asset_pattern regex: {err}")

        # Optional fields validation
        if "version_pattern" in source:
            if not isinstance(source["version_pattern"], str):
                errors.append("source.version_pattern must be a string")
            else:
                pattern = source["version_pattern"]
                import re

                try:
                    re.compile(pattern)
                except re.error as err:
                    errors.append(f"Invalid version_pattern regex: {err}")

        return errors

get_version_info

get_version_info(app_config: dict[str, Any]) -> VersionInfo

Fetch latest release from GitHub API without downloading (version-first path).

This method queries the GitHub API for the latest release and extracts the version from the tag name and the download URL from matching assets. If the version matches cached state, the download can be skipped entirely.

Parameters:

Name Type Description Default
app_config dict[str, Any]

App configuration containing source.repo and optional fields.

required

Returns:

Type Description
VersionInfo

Version info with version string, download URL, and source name.

Raises:

Type Description
ConfigError

If required config fields are missing, invalid, or if no matching assets are found.

NetworkError

If the API call fails or the release has no assets.

Example

Get version from GitHub releases:

strategy = ApiGithubStrategy()
config = {
    "source": {
        "repo": "owner/repo",
        "asset_pattern": r".*\.msi$"
    }
}
version_info = strategy.get_version_info(config)
# version_info.version returns: '1.0.0'

Source code in napt/discovery/api_github.py
def get_version_info(
    self,
    app_config: dict[str, Any],
) -> VersionInfo:
    """Fetch latest release from GitHub API without downloading
    (version-first path).

    This method queries the GitHub API for the latest release and extracts
    the version from the tag name and the download URL from matching assets.
    If the version matches cached state, the download can be skipped entirely.

    Args:
        app_config: App configuration containing source.repo and
            optional fields.

    Returns:
        Version info with version string, download URL, and
            source name.

    Raises:
        ConfigError: If required config fields are missing, invalid, or if
            no matching assets are found.
        NetworkError: If the API call fails or the release has no assets.

    Example:
        Get version from GitHub releases:
            ```python
            strategy = ApiGithubStrategy()
            config = {
                "source": {
                    "repo": "owner/repo",
                    "asset_pattern": ".*\\.msi$"
                }
            }
            version_info = strategy.get_version_info(config)
            # version_info.version returns: '1.0.0'
            ```

    """
    from napt.logging import get_global_logger

    logger = get_global_logger()
    # Validate configuration
    source = app_config.get("source", {})
    repo = source.get("repo")
    if not repo:
        raise ConfigError("api_github strategy requires 'source.repo' in config")

    # Validate repo format
    if "/" not in repo or repo.count("/") != 1:
        raise ConfigError(
            f"Invalid repo format: {repo!r}. Expected 'owner/repository'"
        )

    # Optional configuration
    asset_pattern = source.get("asset_pattern")
    if not asset_pattern:
        raise ConfigError(
            "api_github strategy requires 'source.asset_pattern' in config"
        )

    version_pattern = source.get("version_pattern", r"v?([0-9.]+)")
    prerelease = source.get("prerelease", False)
    token = source.get("token")

    # Expand environment variables in token (e.g., ${GITHUB_TOKEN})
    if token:
        if token.startswith("${") and token.endswith("}"):
            env_var = token[2:-1]
            token = os.environ.get(env_var)
            if not token:
                logger.verbose(
                    "DISCOVERY",
                    f"Warning: Environment variable {env_var} not set",
                )

    logger.verbose("DISCOVERY", "Strategy: api_github (version-first)")
    logger.verbose("DISCOVERY", f"Repository: {repo}")
    logger.verbose("DISCOVERY", f"Version pattern: {version_pattern}")
    if asset_pattern:
        logger.verbose("DISCOVERY", f"Asset pattern: {asset_pattern}")
    if prerelease:
        logger.verbose("DISCOVERY", "Including pre-releases")

    # Fetch latest release from GitHub API
    api_url = f"https://api.github.com/repos/{repo}/releases/latest"
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }

    # Add authentication if token provided
    if token:
        headers["Authorization"] = f"token {token}"
        logger.verbose("DISCOVERY", "Using authenticated API request")

    logger.verbose("DISCOVERY", f"Fetching release from: {api_url}")

    try:
        response = requests.get(api_url, headers=headers, timeout=30)
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        if response.status_code == 404:
            raise NetworkError(
                f"Repository {repo!r} not found or has no releases"
            ) from err
        elif response.status_code == 403:
            raise NetworkError(
                f"GitHub API rate limit exceeded. Consider using a token. "
                f"Status: {response.status_code}"
            ) from err
        else:
            raise NetworkError(
                f"GitHub API request failed: {response.status_code} "
                f"{response.reason}"
            ) from err
    except requests.exceptions.RequestException as err:
        raise NetworkError(f"Failed to fetch GitHub release: {err}") from err

    release_data = response.json()

    # Check if this is a prerelease and we don't want those
    if release_data.get("prerelease", False) and not prerelease:
        raise NetworkError(
            f"Latest release is a pre-release and prerelease=false. "
            f"Tag: {release_data.get('tag_name')}"
        )

    # Extract version from tag name
    tag_name = release_data.get("tag_name", "")
    if not tag_name:
        raise NetworkError("Release has no tag_name field")

    logger.verbose("DISCOVERY", f"Release tag: {tag_name}")

    try:
        pattern = re.compile(version_pattern)
        match = pattern.search(tag_name)
        if not match:
            raise ConfigError(
                f"Version pattern {version_pattern!r} did not match "
                f"tag {tag_name!r}"
            )

        # Try to get named capture group 'version' first, else use group 1,
        # else full match
        if "version" in pattern.groupindex:
            version_str = match.group("version")
        elif pattern.groups > 0:
            version_str = match.group(1)
        else:
            version_str = match.group(0)

    except re.error as err:
        raise ConfigError(
            f"Invalid version_pattern regex: {version_pattern!r}"
        ) from err
    except (ValueError, IndexError) as err:
        raise ConfigError(
            f"Failed to extract version from tag {tag_name!r} "
            f"using pattern {version_pattern!r}: {err}"
        ) from err

    logger.verbose("DISCOVERY", f"Extracted version: {version_str}")

    # Find matching asset
    assets = release_data.get("assets", [])
    if not assets:
        raise NetworkError(
            f"Release {tag_name} has no assets. "
            f"Check if assets were uploaded to the release."
        )

    logger.verbose("DISCOVERY", f"Release has {len(assets)} asset(s)")

    # Match asset by pattern
    matched_asset = None
    try:
        pattern = re.compile(asset_pattern)
    except re.error as err:
        raise ConfigError(
            f"Invalid asset_pattern regex: {asset_pattern!r}"
        ) from err

    for asset in assets:
        asset_name = asset.get("name", "")
        if pattern.search(asset_name):
            matched_asset = asset
            logger.verbose("DISCOVERY", f"Matched asset: {asset_name}")
            break

    if not matched_asset:
        available = [a.get("name", "(unnamed)") for a in assets]
        raise ConfigError(
            f"No assets matched pattern {asset_pattern!r}. "
            f"Available assets: {', '.join(available)}"
        )

    # Get download URL
    download_url = matched_asset.get("browser_download_url")
    if not download_url:
        raise NetworkError(f"Asset {matched_asset.get('name')} has no download URL")

    logger.verbose("DISCOVERY", f"Download URL: {download_url}")

    return VersionInfo(
        version=version_str,
        download_url=download_url,
        source="api_github",
    )

validate_config

validate_config(app_config: dict[str, Any]) -> list[str]

Validate api_github strategy configuration.

Checks for required fields and correct types without making network calls.

Parameters:

Name Type Description Default
app_config dict[str, Any]

The app configuration from the recipe.

required

Returns:

Type Description
list[str]

List of error messages (empty if valid).

Source code in napt/discovery/api_github.py
def validate_config(self, app_config: dict[str, Any]) -> list[str]:
    """Validate api_github strategy configuration.

    Checks for required fields and correct types without making network calls.

    Args:
        app_config: The app configuration from the recipe.

    Returns:
        List of error messages (empty if valid).

    """
    errors = []
    source = app_config.get("source", {})

    # Check required fields
    if "repo" not in source:
        errors.append("Missing required field: source.repo")
    elif not isinstance(source["repo"], str):
        errors.append("source.repo must be a string")
    elif not source["repo"].strip():
        errors.append("source.repo cannot be empty")
    else:
        # Validate repo format
        repo = source["repo"]
        if repo.count("/") != 1:
            errors.append(
                "source.repo must be in format 'owner/repo' (e.g., 'git/git')"
            )

    if "asset_pattern" not in source:
        errors.append("Missing required field: source.asset_pattern")
    elif not isinstance(source["asset_pattern"], str):
        errors.append("source.asset_pattern must be a string")
    elif not source["asset_pattern"].strip():
        errors.append("source.asset_pattern cannot be empty")
    else:
        # Validate regex pattern syntax
        pattern = source["asset_pattern"]
        import re

        try:
            re.compile(pattern)
        except re.error as err:
            errors.append(f"Invalid asset_pattern regex: {err}")

    # Optional fields validation
    if "version_pattern" in source:
        if not isinstance(source["version_pattern"], str):
            errors.append("source.version_pattern must be a string")
        else:
            pattern = source["version_pattern"]
            import re

            try:
                re.compile(pattern)
            except re.error as err:
                errors.append(f"Invalid version_pattern regex: {err}")

    return errors

napt.discovery.api_json

JSON API discovery strategy for NAPT.

This is a VERSION-FIRST strategy that queries JSON API endpoints to get version and download URL WITHOUT downloading the installer. This enables fast version checks and efficient caching.

Key Advantages:

  • Fast version discovery (API call ~100ms)
  • Can skip downloads entirely when version unchanged
  • Direct API access for version and download URL
  • Support for complex JSON structures with JSONPath
  • Custom headers for authentication
  • Support for GET and POST requests
  • No file parsing required
  • Ideal for CI/CD with scheduled checks

Supported Features:

  • JSONPath navigation for nested structures
  • Array indexing and filtering
  • Custom HTTP headers (Authorization, etc.)
  • POST requests with JSON body
  • Environment variable expansion in values

Use Cases:

  • Vendors with JSON APIs (Microsoft, Mozilla, etc.)
  • Cloud services with version endpoints
  • CDNs that provide metadata APIs
  • Applications with update check APIs
  • APIs requiring authentication or custom headers
  • CI/CD pipelines with frequent version checks
Recipe Configuration
source:
    strategy: api_json
    api_url: "https://vendor.com/api/latest"
    version_path: "version"                      # JSONPath to version
    download_url_path: "download_url"            # JSONPath to URL
    method: "GET"                                # Optional: GET or POST
    headers:                                     # Optional: custom headers
        Authorization: "Bearer ${API_TOKEN}"
        Accept: "application/json"
    body:                                        # Optional: POST body
        platform: "windows"
        arch: "x64"
    timeout: 30                                  # Optional: timeout in seconds

Configuration Fields:

  • api_url (str, required): API endpoint URL that returns JSON with version and download information
  • version_path (str, required): JSONPath expression to extract version from the API response. Examples: "version", "release.version", "data.version"
  • download_url_path (str, required): JSONPath expression to extract download URL from the API response. Examples: "download_url", "assets.url", "platforms.windows.x64"
  • method (str, optional): HTTP method to use. Either "GET" or "POST". Default is "GET"
  • headers (dict, optional): Custom HTTP headers to send with the request. Useful for authentication or setting Accept headers. Values support environment variable expansion. Example: {"Authorization": "Bearer ${API_TOKEN}"}
  • body (dict, optional): Request body for POST requests. Sent as JSON. Only used when method="POST". Example: {"platform": "windows", "arch": "x64"}
  • timeout (int, optional): Request timeout in seconds. Default is 30.

JSONPath Syntax:

  • Simple paths: "version", "release.version"
  • Array indexing: "releases[0].version", "assets[0].url"
  • Nested paths: "data.latest.download.url", "stable.platforms.windows.x64"
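As an illustration of how such dotted paths resolve against a nested response (a simplified stand-in; the strategy itself delegates to the jsonpath-ng library, and the data below is invented):

```python
import re

def resolve_path(data, path: str):
    """Resolve a dotted JSONPath-style expression such as
    "stable.platforms.windows.x64" or "releases[0].version" against
    nested dicts/lists. Simplified illustration only."""
    current = data
    for part in path.split("."):
        m = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        key, index = m.group(1), m.group(2)
        current = current[key]           # descend into the dict
        if index is not None:
            current = current[int(index)]  # then index into the list
    return current

api_response = {
    "stable": {
        "version": "2.1.0",
        "platforms": {"windows": {"x64": "https://example.com/app-x64.msi"}},
    },
    "releases": [{"version": "2.1.0"}],
}
print(resolve_path(api_response, "stable.version"))       # 2.1.0
print(resolve_path(api_response, "releases[0].version"))  # 2.1.0
```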

Error Handling:

  • ConfigError: Missing or invalid configuration, invalid JSONPath, path not found
  • NetworkError: API failures, invalid JSON response
  • Errors are chained with 'from err' for better debugging
Example

In a recipe YAML (simple API):

apps:
  - name: "My App"
    id: "my-app"
    source:
      strategy: api_json
      api_url: "https://api.vendor.com/latest"
      version_path: "version"
      download_url_path: "download_url"

In a recipe YAML (nested structure):

apps:
  - name: "My App"
    id: "my-app"
    source:
      strategy: api_json
      api_url: "https://api.vendor.com/releases"
      version_path: "stable.version"
      download_url_path: "stable.platforms.windows.x64"
      headers:
        Authorization: "Bearer ${API_TOKEN}"

From Python (version-first approach):

from pathlib import Path

from napt.discovery.api_json import ApiJsonStrategy
from napt.download import download_file

strategy = ApiJsonStrategy()
app_config = {
    "source": {
        "api_url": "https://api.vendor.com/latest",
        "version_path": "version",
        "download_url_path": "download_url",
    }
}

# Get version WITHOUT downloading
version_info = strategy.get_version_info(app_config)
print(f"Latest version: {version_info.version}")

# Download only if needed (e.g., the version changed since the last run)
if need_to_download:
    result = download_file(
        version_info.download_url, Path("./downloads/my-app")
    )
    print(f"Downloaded to {result.file_path}")

From Python (using core orchestration):

from pathlib import Path
from napt.core import discover_recipe

# Automatically uses version-first optimization
result = discover_recipe(Path("recipe.yaml"), Path("./downloads"))
print(f"Version {result.version} at {result.file_path}")

Note
  • Version discovery via API only (no download required)
  • Core orchestration automatically skips download if version unchanged
  • JSONPath uses jsonpath-ng library for robust parsing
  • Environment variable expansion works in headers and other string values
  • POST body is sent as JSON (Content-Type: application/json)
  • Timeout defaults to 30 seconds to prevent hanging on slow APIs
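The ${VAR} handling in header values can be sketched as follows. Note that, as written, the source expands only values that are exactly of the form "${VAR}"; a composite value such as "Bearer ${API_TOKEN}" passes through unchanged. This simplified version returns an empty string where the strategy would omit the header:

```python
import os

def expand_env(value: str) -> str:
    """Whole-value environment expansion, mirroring the strategies:
    "${VAR}" becomes os.environ["VAR"]; anything else passes through."""
    if value.startswith("${") and value.endswith("}"):
        return os.environ.get(value[2:-1], "")
    return value

os.environ["API_TOKEN"] = "s3cret"  # illustration only
print(expand_env("${API_TOKEN}"))      # s3cret
print(expand_env("application/json"))  # application/json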

ApiJsonStrategy

Discovery strategy for JSON API endpoints.

Configuration example

source:
  strategy: api_json
  api_url: "https://api.vendor.com/latest"
  version_path: "version"
  download_url_path: "download_url"
  method: "GET"
  headers:
    Authorization: "Bearer ${API_TOKEN}"

Source code in napt/discovery/api_json.py
class ApiJsonStrategy:
    """Discovery strategy for JSON API endpoints.

    Configuration example:
        source:
          strategy: api_json
          api_url: "https://api.vendor.com/latest"
          version_path: "version"
          download_url_path: "download_url"
          method: "GET"
          headers:
            Authorization: "Bearer ${API_TOKEN}"
    """

    def get_version_info(
        self,
        app_config: dict[str, Any],
    ) -> VersionInfo:
        """Query JSON API for version and download URL without downloading
        (version-first path).

        This method calls a JSON API, extracts version and download URL using
        JSONPath expressions. If the version matches cached state, the download
        can be skipped entirely.

        Args:
            app_config: App configuration containing source.api_url,
                source.version_path, and source.download_url_path.

        Returns:
            Version info with version string, download URL, and
                source name.

        Raises:
            ConfigError: If required config fields are missing, invalid, or if
                JSONPath expressions don't match anything in the response.
            NetworkError: If the API call fails (chained with 'from err').

        Example:
            Get version info from JSON API:
                ```python
                strategy = ApiJsonStrategy()
                config = {
                    "source": {
                        "api_url": "https://api.vendor.com/latest",
                        "version_path": "version",
                        "download_url_path": "download_url"
                    }
                }
                version_info = strategy.get_version_info(config)
                # version_info.version returns: '1.0.0'
                ```

        """
        from napt.logging import get_global_logger

        logger = get_global_logger()
        # Validate configuration
        source = app_config.get("source", {})
        api_url = source.get("api_url")
        if not api_url:
            raise ConfigError("api_json strategy requires 'source.api_url' in config")

        version_path = source.get("version_path")
        if not version_path:
            raise ConfigError(
                "api_json strategy requires 'source.version_path' in config"
            )

        download_url_path = source.get("download_url_path")
        if not download_url_path:
            raise ConfigError(
                "api_json strategy requires 'source.download_url_path' in config"
            )

        # Optional configuration
        method = source.get("method", "GET").upper()
        if method not in ("GET", "POST"):
            raise ConfigError(f"Invalid method: {method!r}. Must be 'GET' or 'POST'")

        headers = source.get("headers", {})
        body = source.get("body", {})
        timeout = source.get("timeout", 30)

        logger.verbose("DISCOVERY", "Strategy: api_json (version-first)")
        logger.verbose("DISCOVERY", f"API URL: {api_url}")
        logger.verbose("DISCOVERY", f"Method: {method}")
        logger.verbose("DISCOVERY", f"Version path: {version_path}")
        logger.verbose("DISCOVERY", f"Download URL path: {download_url_path}")

        # Expand environment variables in headers
        expanded_headers = {}
        for key, value in headers.items():
            if (
                isinstance(value, str)
                and value.startswith("${")
                and value.endswith("}")
            ):
                env_var = value[2:-1]
                env_value = os.environ.get(env_var)
                if not env_value:
                    logger.verbose(
                        "DISCOVERY",
                        f"Warning: Environment variable {env_var} not set",
                    )
                else:
                    expanded_headers[key] = env_value
            else:
                expanded_headers[key] = value

        # Make API request
        logger.verbose("DISCOVERY", f"Calling API: {method} {api_url}")
        try:
            if method == "GET":
                response = requests.get(
                    api_url, headers=expanded_headers, timeout=timeout
                )
            else:  # POST
                response = requests.post(
                    api_url,
                    headers=expanded_headers,
                    json=body,
                    timeout=timeout,
                )
            response.raise_for_status()
        except requests.exceptions.HTTPError as err:
            raise NetworkError(
                f"API request failed: {response.status_code} {response.reason}"
            ) from err
        except requests.exceptions.RequestException as err:
            raise NetworkError(f"Failed to call API: {err}") from err

        logger.verbose("DISCOVERY", f"API response: {response.status_code} OK")

        # Parse JSON response
        try:
            json_data = response.json()
        except json.JSONDecodeError as err:
            raise NetworkError(
                f"Invalid JSON response from API. Response: {response.text[:200]}"
            ) from err

        logger.debug("DISCOVERY", f"JSON response: {json.dumps(json_data, indent=2)}")

        # Extract version using JSONPath
        logger.verbose("DISCOVERY", f"Extracting version from path: {version_path}")
        try:
            version_expr = jsonpath_parse(version_path)
            version_matches = version_expr.find(json_data)

            if not version_matches:
                raise ConfigError(
                    f"Version path {version_path!r} did not match anything "
                    f"in API response"
                )

            version_str = str(version_matches[0].value)
        except Exception as err:
            if isinstance(err, ConfigError):
                raise
            raise ConfigError(
                f"Failed to extract version using path {version_path!r}: {err}"
            ) from err

        logger.verbose("DISCOVERY", f"Extracted version: {version_str}")

        # Extract download URL using JSONPath
        logger.verbose(
            "DISCOVERY", f"Extracting download URL from path: {download_url_path}"
        )
        try:
            url_expr = jsonpath_parse(download_url_path)
            url_matches = url_expr.find(json_data)

            if not url_matches:
                raise ConfigError(
                    f"Download URL path {download_url_path!r} did not match "
                    f"anything in API response"
                )

            download_url = str(url_matches[0].value)
        except Exception as err:
            if isinstance(err, ConfigError):
                raise
            raise ConfigError(
                f"Failed to extract download URL using path "
                f"{download_url_path!r}: {err}"
            ) from err

        logger.verbose("DISCOVERY", f"Download URL: {download_url}")

        return VersionInfo(
            version=version_str,
            download_url=download_url,
            source="api_json",
        )

    def validate_config(self, app_config: dict[str, Any]) -> list[str]:
        """Validate api_json strategy configuration.

        Checks for required fields and correct types without making network calls.

        Args:
            app_config: The app configuration from the recipe.

        Returns:
            List of error messages (empty if valid).

        """
        errors = []
        source = app_config.get("source", {})

        # Check required fields
        if "api_url" not in source:
            errors.append("Missing required field: source.api_url")
        elif not isinstance(source["api_url"], str):
            errors.append("source.api_url must be a string")
        elif not source["api_url"].strip():
            errors.append("source.api_url cannot be empty")

        if "version_path" not in source:
            errors.append("Missing required field: source.version_path")
        elif not isinstance(source["version_path"], str):
            errors.append("source.version_path must be a string")
        elif not source["version_path"].strip():
            errors.append("source.version_path cannot be empty")
        else:
            # Validate JSONPath syntax
            from jsonpath_ng import parse as jsonpath_parse

            try:
                jsonpath_parse(source["version_path"])
            except Exception as err:
                errors.append(f"Invalid version_path JSONPath: {err}")

        if "download_url_path" not in source:
            errors.append("Missing required field: source.download_url_path")
        elif not isinstance(source["download_url_path"], str):
            errors.append("source.download_url_path must be a string")
        elif not source["download_url_path"].strip():
            errors.append("source.download_url_path cannot be empty")
        else:
            # Validate JSONPath syntax
            from jsonpath_ng import parse as jsonpath_parse

            try:
                jsonpath_parse(source["download_url_path"])
            except Exception as err:
                errors.append(f"Invalid download_url_path JSONPath: {err}")

        # Optional fields validation
        if "method" in source:
            method = source["method"]
            if not isinstance(method, str):
                errors.append("source.method must be a string")
            elif method.upper() not in ["GET", "POST"]:
                errors.append("source.method must be 'GET' or 'POST'")

        if "headers" in source and not isinstance(source["headers"], dict):
            errors.append("source.headers must be a dictionary")

        if "body" in source and not isinstance(source["body"], dict):
            errors.append("source.body must be a dictionary")

        return errors
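
Taken together, the required and optional fields checked above map to a recipe fragment like the following (the URL and JSONPath values are illustrative, not a real vendor API):

```yaml
source:
  strategy: api_json
  api_url: https://api.vendor.com/latest   # required
  version_path: version                    # required, JSONPath expression
  download_url_path: download_url          # required, JSONPath expression
  method: GET                              # optional, GET or POST (default GET)
  headers:                                 # optional; ${VAR} expands from the environment
    Authorization: ${VENDOR_API_TOKEN}
  timeout: 30                              # optional, seconds (default 30)
```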

get_version_info

get_version_info(app_config: dict[str, Any]) -> VersionInfo

Query JSON API for version and download URL without downloading (version-first path).

This method calls a JSON API and extracts the version and download URL using JSONPath expressions. If the version matches cached state, the download can be skipped entirely.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `app_config` | `dict[str, Any]` | App configuration containing `source.api_url`, `source.version_path`, and `source.download_url_path`. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `VersionInfo` | Version info with version string, download URL, and source name. |

Raises:

| Type | Description |
| --- | --- |
| `ConfigError` | If required config fields are missing, invalid, or if JSONPath expressions don't match anything in the response. |
| `NetworkError` | If the API call fails (chained with `from err`). |

Example

Get version info from JSON API:

strategy = ApiJsonStrategy()
config = {
    "source": {
        "api_url": "https://api.vendor.com/latest",
        "version_path": "version",
        "download_url_path": "download_url"
    }
}
version_info = strategy.get_version_info(config)
# version_info.version returns: '1.0.0'
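
The optional `headers` field also supports `${ENV_VAR}` placeholders, which are expanded from the environment before the request is made; a header whose variable is unset is dropped with a warning. A self-contained sketch of that expansion logic, mirroring the loop in the source below:

```python
import os

def expand_headers(headers: dict[str, str]) -> dict[str, str]:
    # Values of the exact form "${NAME}" are replaced with os.environ["NAME"];
    # headers whose variable is unset are omitted, everything else passes through.
    expanded: dict[str, str] = {}
    for key, value in headers.items():
        if isinstance(value, str) and value.startswith("${") and value.endswith("}"):
            env_value = os.environ.get(value[2:-1])
            if env_value:
                expanded[key] = env_value
        else:
            expanded[key] = value
    return expanded

os.environ["NAPT_TOKEN"] = "secret"
print(expand_headers({"Authorization": "${NAPT_TOKEN}", "Accept": "application/json"}))
# {'Authorization': 'secret', 'Accept': 'application/json'}
```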

Source code in napt/discovery/api_json.py
def get_version_info(
    self,
    app_config: dict[str, Any],
) -> VersionInfo:
    """Query JSON API for version and download URL without downloading
    (version-first path).

    This method calls a JSON API and extracts the version and download URL
    using JSONPath expressions. If the version matches cached state, the
    download can be skipped entirely.

    Args:
        app_config: App configuration containing source.api_url,
            source.version_path, and source.download_url_path.

    Returns:
        Version info with version string, download URL, and
            source name.

    Raises:
        ConfigError: If required config fields are missing, invalid, or if
            JSONPath expressions don't match anything in the response.
        NetworkError: If the API call fails (chained with 'from err').

    Example:
        Get version info from JSON API:
            ```python
            strategy = ApiJsonStrategy()
            config = {
                "source": {
                    "api_url": "https://api.vendor.com/latest",
                    "version_path": "version",
                    "download_url_path": "download_url"
                }
            }
            version_info = strategy.get_version_info(config)
            # version_info.version returns: '1.0.0'
            ```

    """
    from napt.logging import get_global_logger

    logger = get_global_logger()
    # Validate configuration
    source = app_config.get("source", {})
    api_url = source.get("api_url")
    if not api_url:
        raise ConfigError("api_json strategy requires 'source.api_url' in config")

    version_path = source.get("version_path")
    if not version_path:
        raise ConfigError(
            "api_json strategy requires 'source.version_path' in config"
        )

    download_url_path = source.get("download_url_path")
    if not download_url_path:
        raise ConfigError(
            "api_json strategy requires 'source.download_url_path' in config"
        )

    # Optional configuration
    method = source.get("method", "GET").upper()
    if method not in ("GET", "POST"):
        raise ConfigError(f"Invalid method: {method!r}. Must be 'GET' or 'POST'")

    headers = source.get("headers", {})
    body = source.get("body", {})
    timeout = source.get("timeout", 30)

    logger.verbose("DISCOVERY", "Strategy: api_json (version-first)")
    logger.verbose("DISCOVERY", f"API URL: {api_url}")
    logger.verbose("DISCOVERY", f"Method: {method}")
    logger.verbose("DISCOVERY", f"Version path: {version_path}")
    logger.verbose("DISCOVERY", f"Download URL path: {download_url_path}")

    # Expand environment variables in headers
    expanded_headers = {}
    for key, value in headers.items():
        if (
            isinstance(value, str)
            and value.startswith("${")
            and value.endswith("}")
        ):
            env_var = value[2:-1]
            env_value = os.environ.get(env_var)
            if not env_value:
                logger.verbose(
                    "DISCOVERY",
                    f"Warning: Environment variable {env_var} not set",
                )
            else:
                expanded_headers[key] = env_value
        else:
            expanded_headers[key] = value

    # Make API request
    logger.verbose("DISCOVERY", f"Calling API: {method} {api_url}")
    try:
        if method == "GET":
            response = requests.get(
                api_url, headers=expanded_headers, timeout=timeout
            )
        else:  # POST
            response = requests.post(
                api_url,
                headers=expanded_headers,
                json=body,
                timeout=timeout,
            )
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        raise NetworkError(
            f"API request failed: {response.status_code} {response.reason}"
        ) from err
    except requests.exceptions.RequestException as err:
        raise NetworkError(f"Failed to call API: {err}") from err

    logger.verbose("DISCOVERY", f"API response: {response.status_code} OK")

    # Parse JSON response
    try:
        json_data = response.json()
    except json.JSONDecodeError as err:
        raise NetworkError(
            f"Invalid JSON response from API. Response: {response.text[:200]}"
        ) from err

    logger.debug("DISCOVERY", f"JSON response: {json.dumps(json_data, indent=2)}")

    # Extract version using JSONPath
    logger.verbose("DISCOVERY", f"Extracting version from path: {version_path}")
    try:
        version_expr = jsonpath_parse(version_path)
        version_matches = version_expr.find(json_data)

        if not version_matches:
            raise ConfigError(
                f"Version path {version_path!r} did not match anything "
                f"in API response"
            )

        version_str = str(version_matches[0].value)
    except Exception as err:
        if isinstance(err, ConfigError):
            raise
        raise ConfigError(
            f"Failed to extract version using path {version_path!r}: {err}"
        ) from err

    logger.verbose("DISCOVERY", f"Extracted version: {version_str}")

    # Extract download URL using JSONPath
    logger.verbose(
        "DISCOVERY", f"Extracting download URL from path: {download_url_path}"
    )
    try:
        url_expr = jsonpath_parse(download_url_path)
        url_matches = url_expr.find(json_data)

        if not url_matches:
            raise ConfigError(
                f"Download URL path {download_url_path!r} did not match "
                f"anything in API response"
            )

        download_url = str(url_matches[0].value)
    except Exception as err:
        if isinstance(err, ConfigError):
            raise
        raise ConfigError(
            f"Failed to extract download URL using path "
            f"{download_url_path!r}: {err}"
        ) from err

    logger.verbose("DISCOVERY", f"Download URL: {download_url}")

    return VersionInfo(
        version=version_str,
        download_url=download_url,
        source="api_json",
    )

validate_config

validate_config(app_config: dict[str, Any]) -> list[str]

Validate api_json strategy configuration.

Checks for required fields and correct types without making network calls.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `app_config` | `dict[str, Any]` | The app configuration from the recipe. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of error messages (empty if valid). |

Source code in napt/discovery/api_json.py
def validate_config(self, app_config: dict[str, Any]) -> list[str]:
    """Validate api_json strategy configuration.

    Checks for required fields and correct types without making network calls.

    Args:
        app_config: The app configuration from the recipe.

    Returns:
        List of error messages (empty if valid).

    """
    errors = []
    source = app_config.get("source", {})

    # Check required fields
    if "api_url" not in source:
        errors.append("Missing required field: source.api_url")
    elif not isinstance(source["api_url"], str):
        errors.append("source.api_url must be a string")
    elif not source["api_url"].strip():
        errors.append("source.api_url cannot be empty")

    if "version_path" not in source:
        errors.append("Missing required field: source.version_path")
    elif not isinstance(source["version_path"], str):
        errors.append("source.version_path must be a string")
    elif not source["version_path"].strip():
        errors.append("source.version_path cannot be empty")
    else:
        # Validate JSONPath syntax
        from jsonpath_ng import parse as jsonpath_parse

        try:
            jsonpath_parse(source["version_path"])
        except Exception as err:
            errors.append(f"Invalid version_path JSONPath: {err}")

    if "download_url_path" not in source:
        errors.append("Missing required field: source.download_url_path")
    elif not isinstance(source["download_url_path"], str):
        errors.append("source.download_url_path must be a string")
    elif not source["download_url_path"].strip():
        errors.append("source.download_url_path cannot be empty")
    else:
        # Validate JSONPath syntax
        from jsonpath_ng import parse as jsonpath_parse

        try:
            jsonpath_parse(source["download_url_path"])
        except Exception as err:
            errors.append(f"Invalid download_url_path JSONPath: {err}")

    # Optional fields validation
    if "method" in source:
        method = source["method"]
        if not isinstance(method, str):
            errors.append("source.method must be a string")
        elif method.upper() not in ["GET", "POST"]:
            errors.append("source.method must be 'GET' or 'POST'")

    if "headers" in source and not isinstance(source["headers"], dict):
        errors.append("source.headers must be a dictionary")

    if "body" in source and not isinstance(source["body"], dict):
        errors.append("source.body must be a dictionary")

    return errors
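
Rather than raising on the first problem, `validate_config` accumulates every error so a recipe author sees all mistakes in one pass. A condensed, self-contained sketch of that pattern (`check_required_string` is an illustrative helper, not part of napt):

```python
from typing import Any

def check_required_string(source: dict[str, Any], field: str) -> list[str]:
    # Same three-step check validate_config applies to api_url,
    # version_path, and download_url_path: present, a string, non-empty.
    if field not in source:
        return [f"Missing required field: source.{field}"]
    if not isinstance(source[field], str):
        return [f"source.{field} must be a string"]
    if not source[field].strip():
        return [f"source.{field} cannot be empty"]
    return []

errors: list[str] = []
source = {"api_url": "", "version_path": "version"}
for field in ("api_url", "version_path", "download_url_path"):
    errors.extend(check_required_string(source, field))
print(errors)
# ['source.api_url cannot be empty', 'Missing required field: source.download_url_path']
```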