discovery
napt.discovery.base
Discovery strategy protocol, registry, and shared helpers.
A discovery strategy answers a single question: "what is the latest version of this app, and where can it be downloaded from?" Strategies return that answer as a RemoteVersion dataclass. They do not download files or touch the cache themselves; the orchestrator does.
Built-in strategies
- api_github: queries the GitHub releases API for the latest tag.
- api_json: extracts version and download URL from a JSON endpoint.
- web_scrape: parses a vendor download page for both fields.
The fourth flow (url_download) is not a registered strategy. It
downloads a fixed URL and extracts the version from the file itself,
which is a different shape than the strategies in this module. The
discovery orchestrator dispatches to that flow directly when a recipe
uses strategy: url_download.
Design Philosophy
- Strategies are typing.Protocol types. Implementations are matched structurally; no inheritance is required.
- Strategies are pure functions of configuration. They have no state, no file I/O, and no awareness of the cache.
- Registration is a side effect of importing each strategy module.
- The resolve_with_cache helper turns a RemoteVersion into a StrategyResult by checking the cache and downloading if needed. Strategies don't call it themselves; the orchestrator does.
Example
Adding a new strategy to the codebase:
from napt.discovery.base import (
    RemoteVersion, register_strategy,
)


class GitlabReleasesStrategy:
    def discover(self, app_config):
        # Query GitLab API and parse the response...
        return RemoteVersion(
            version="1.2.3",
            download_url="https://gitlab.example.com/.../installer.msi",
            source="gitlab_releases",
        )

    def validate_config(self, app_config):
        errors = []
        if "project" not in app_config.get("discovery", {}):
            errors.append("Missing required field: discovery.project")
        return errors


register_strategy("gitlab_releases", GitlabReleasesStrategy)
RemoteVersion
dataclass
Version and download URL discovered from a remote source.
Returned by every DiscoveryStrategy implementation. The orchestrator passes this to resolve_with_cache to decide whether the file needs to be re-downloaded.
Attributes:
| Name | Type | Description |
|---|---|---|
| version | str | Raw version string extracted from the remote source (for example, |
| download_url | str | URL the installer can be fetched from. |
| source | str | Name of the strategy that produced this result, used for logging and result reporting (for example, |
Source code in napt/discovery/base.py
StrategyResult
dataclass
Resolved discovery result, ready to be saved to state.
Returned by both the version-first flow (via resolve_with_cache) and the url_download flow. Captures everything the orchestrator needs to update the state cache and build a public DiscoverResult.
Attributes:
| Name | Type | Description |
|---|---|---|
| version | str | Version string for the resolved file. |
| version_source | str | Strategy name that produced this version (for example, |
| file_path | Path | Path to the resolved installer on disk. This is either a freshly downloaded file or a previously cached file when the cache was reused. |
| sha256 | str | SHA-256 hex digest of the resolved file. |
| headers | dict[str, str] | HTTP response headers from the download. Empty when the cache was reused without a network call. Used to persist |
| download_url | str | URL the file came from. Stored in state so that future runs know where to re-fetch from if needed. |
| cached | bool | True when the file was reused from cache; False when it was downloaded. |
Source code in napt/discovery/base.py
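As an illustration of the sha256 field, the following is a minimal sketch of hashing an installer in chunks with the standard library. The helper name sha256_of_file and the chunk size are assumptions made for this example; the page does not show how napt computes the digest.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "installer.msi"
    sample.write_bytes(b"abc")
    sample_digest = sha256_of_file(sample)
```

Reading in chunks keeps memory use flat for large installers, which is why the sketch avoids reading the whole file at once.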
DiscoveryStrategy
Bases: Protocol
Protocol for version discovery strategies.
A strategy queries a remote source (API, web page, etc.) and returns the latest version plus its download URL. Strategies do not download files, touch the cache, or write to disk. Those concerns belong to the orchestrator.
Implementations need only a discover and a validate_config
method with the signatures below.
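To make the structural matching concrete, here is a minimal sketch. It redefines RemoteVersion and DiscoveryStrategy locally as stand-ins so that it runs on its own. StaticStrategy and its version / url fields are hypothetical, and runtime_checkable is added only so the isinstance check works in the demo; the real protocol may not use it.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class RemoteVersion:
    """Local stand-in mirroring the three documented fields."""

    version: str
    download_url: str
    source: str


@runtime_checkable  # assumption: present here only to allow isinstance() below
class DiscoveryStrategy(Protocol):
    def discover(self, app_config: dict[str, Any]) -> RemoteVersion: ...

    def validate_config(self, app_config: dict[str, Any]) -> list[str]: ...


class StaticStrategy:
    """Hypothetical strategy. It does not inherit from the protocol."""

    def discover(self, app_config: dict[str, Any]) -> RemoteVersion:
        cfg = app_config["discovery"]
        return RemoteVersion(cfg["version"], cfg["url"], "static")

    def validate_config(self, app_config: dict[str, Any]) -> list[str]:
        cfg = app_config.get("discovery", {})
        return [
            f"Missing required field: discovery.{key}"
            for key in ("version", "url")
            if key not in cfg
        ]


strategy = StaticStrategy()
# True because both methods are present, not because of any base class.
matches_protocol = isinstance(strategy, DiscoveryStrategy)
```

A runtime isinstance check only confirms that the methods exist; signature compatibility is checked by a static type checker.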
Source code in napt/discovery/base.py
discover
Discovers the latest version and its download URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict. | required |
Returns:
| Type | Description |
|---|---|
| RemoteVersion | Latest version, the URL it can be downloaded from, and the strategy's own name as the source identifier. |
Raises:
| Type | Description |
|---|---|
| ConfigError | On missing or invalid required configuration. |
| NetworkError | On HTTP failures or version-extraction errors. |
Source code in napt/discovery/base.py
validate_config
Validates strategy-specific configuration fields without network calls.
Implementations should check field presence, types, and format only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | Human-readable error messages. Empty when configuration is valid. |
Source code in napt/discovery/base.py
register_strategy
Registers a discovery strategy by name in the global registry.
Strategies call this at module import time so they're available when the orchestrator looks them up. Registering the same name twice overwrites the previous entry (intentional, to allow test monkey-patching).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Strategy name. This is the value used in recipe YAML files under | required |
| strategy_class | type[DiscoveryStrategy] | Class implementing DiscoveryStrategy. Type checkers verify protocol compliance statically. | required |
Note
url_download is intentionally not registered here. It runs
through a separate code path in the orchestrator because it
downloads the file before it can determine the version, which
does not fit the version-first contract.
Source code in napt/discovery/base.py
get_strategy
Returns a discovery strategy instance by name from the registry.
Strategies are instantiated on-demand because they are stateless. The strategy's module must already be imported for registration to have happened.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Registered strategy name. Case-sensitive. | required |
Returns:
| Type | Description |
|---|---|
| DiscoveryStrategy | New instance of the requested strategy. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the name is not registered. The message lists the available strategies for troubleshooting. |
Source code in napt/discovery/base.py
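The registry behaviour described for register_strategy and get_strategy can be sketched as a module-level dict. This is an illustrative reimplementation, not the code in napt/discovery/base.py: the _REGISTRY name, the error message wording, and the local ConfigError class are assumptions.

```python
from typing import Any


class ConfigError(Exception):
    """Local stand-in for napt's ConfigError."""


# Hypothetical module-level registry: strategy name -> strategy class.
_REGISTRY: dict[str, type] = {}


def register_strategy(name: str, strategy_class: type) -> None:
    # Registering an existing name overwrites the previous entry.
    _REGISTRY[name] = strategy_class


def get_strategy(name: str) -> Any:
    try:
        strategy_class = _REGISTRY[name]  # lookup is case-sensitive
    except KeyError:
        available = ", ".join(sorted(_REGISTRY)) or "(none)"
        raise ConfigError(
            f"Unknown discovery strategy {name!r}. Available: {available}"
        ) from None
    # Strategies are stateless, so a fresh instance per call is cheap.
    return strategy_class()
```

Because lookups read the dict at call time, a test can register a fake class under an existing name and have the orchestrator pick it up.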
resolve_with_cache
resolve_with_cache(
    info: RemoteVersion,
    app_config: dict[str, Any],
    output_dir: Path,
    cache: dict[str, Any] | None,
) -> StrategyResult
Resolves a RemoteVersion to a StrategyResult.
Implements the version-first fast path: when the discovered version
matches the cached version and the cached file still exists on disk,
the download is skipped entirely. Otherwise the file is downloaded
fresh from info.download_url.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| info | RemoteVersion | Version and download URL produced by a strategy's discover call. | required |
| app_config | dict[str, Any] | Merged recipe configuration. Used to read | required |
| output_dir | Path | Base directory to download into. Files land in | required |
| cache | dict[str, Any] \| None | Cached state for this recipe ( | required |
Returns:
| Type | Description |
|---|---|
| StrategyResult | Resolved version, file path, and download metadata. The |
Raises:
| Type | Description |
|---|---|
| NetworkError | On download failures. |
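The version-first fast path boils down to one decision: reuse the cached file, or download again. A minimal sketch of that decision follows. The helper name can_skip_download and the cache keys "version" and "file_path" are assumptions for illustration; the actual state layout is not shown on this page.

```python
import tempfile
from pathlib import Path
from typing import Any, Optional


def can_skip_download(
    discovered_version: str, cache: Optional[dict[str, Any]]
) -> bool:
    """Return True when the cached file can stand in for a fresh download."""
    if not cache:
        return False  # first run, or no state for this recipe
    cached_path = cache.get("file_path")  # hypothetical key name
    if cache.get("version") != discovered_version or not cached_path:
        return False  # version changed, or no file recorded
    # The cached file must still exist on disk.
    return Path(cached_path).is_file()


# Demonstration with a throwaway file standing in for a cached installer.
with tempfile.TemporaryDirectory() as tmp:
    installer = Path(tmp) / "app-1.2.3.msi"
    installer.write_bytes(b"placeholder")
    state = {"version": "1.2.3", "file_path": str(installer)}

    same_version = can_skip_download("1.2.3", state)
    new_version = can_skip_download("1.2.4", state)
    installer.unlink()
    file_missing = can_skip_download("1.2.3", state)
```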
Source code in napt/discovery/base.py
napt.discovery.url_download
url_download discovery flow.
This module is intentionally not a
DiscoveryStrategy. The
strategies in napt.discovery.base produce a
RemoteVersion from configuration
alone (version-first). url_download cannot do that — it has no
remote endpoint to query for the version, so it must download the
installer and extract the version from the file's metadata. The
discovery orchestrator special-cases strategy: url_download and
dispatches to
run_url_download directly.
Cache Strategy
Uses HTTP conditional requests. If a previous run stored an ETag
or Last-Modified header in state, those are sent as
If-None-Match / If-Modified-Since on the next request. A
server response of HTTP 304 reuses the cached file without a
re-download. This is a different mechanism than the version-first
strategies, which compare version strings (no HTTP round-trip
required to detect "no change" beyond the initial discovery query).
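The conditional-request step can be sketched as a small function that turns stored validators into request headers. The cache keys "etag" and "last_modified" are assumptions for this example; only the header names If-None-Match and If-Modified-Since come from the description above.

```python
from typing import Any, Optional


def conditional_headers(cache: Optional[dict[str, Any]]) -> dict[str, str]:
    """Build conditional request headers from previously stored validators."""
    headers: dict[str, str] = {}
    if not cache:
        return headers  # nothing stored: plain, unconditional request
    etag = cache.get("etag")  # hypothetical key name
    last_modified = cache.get("last_modified")  # hypothetical key name
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


request_headers = conditional_headers(
    {"etag": '"abc123"', "last_modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
)
```

A server that honours these headers answers 304 with no body when the resource is unchanged, which is the signal to reuse the cached file.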
Supported File Types
- .msi: version is read from the MSI ProductVersion property.
- Other extensions raise ConfigError. For non-MSI installers, use a version-first strategy.
run_url_download
run_url_download(
    app_config: dict[str, Any],
    output_dir: Path,
    cache: dict[str, Any] | None = None,
) -> StrategyResult
Downloads a fixed URL and extracts the version from the resulting file.
Issues a conditional HTTP request when cache carries an ETag
or Last-Modified. On HTTP 304 the cached file is reused; otherwise
the fresh download is used. Either way, the version is extracted from
the file (MSI ProductVersion today).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict containing | required |
| output_dir | Path | Base directory to download into. The file lands in | required |
| cache | dict[str, Any] \| None | Cached state for this recipe ( | None |
Returns:
| Type | Description |
|---|---|
| StrategyResult | Resolved version, file path, and download metadata. The |
| StrategyResult | previously downloaded file. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If |
| NetworkError | On download or version-extraction failures. |
Source code in napt/discovery/url_download.py
validate_url_download_config
Validates url_download configuration fields.
Called by napt.validation.validate_config to compose the url_download field rules into the overall recipe validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | Human-readable error messages. Empty when configuration is valid. |
Source code in napt/discovery/url_download.py
napt.discovery.web_scrape
Web scraping discovery strategy.
Fetches a vendor download page, locates a download link, and extracts the version from that link's URL. Use this when a vendor has neither a JSON API nor a GitHub releases feed.
Recipe Example (CSS selector — recommended):
discovery:
  strategy: web_scrape
  page_url: "https://www.7-zip.org/download.html"
  link_selector: 'a[href$="-x64.msi"]'
  version_pattern: "7z(\\d{2})(\\d{2})-x64"
  version_format: "{0}.{1}"  # transforms ("25", "01") -> "25.01"
Recipe Example (regex fallback):
discovery:
  strategy: web_scrape
  page_url: "https://vendor.example.com/downloads"
  link_pattern: 'href="(/files/app-v[0-9.]+-x64\.msi)"'  # single-quoted YAML: backslashes are literal
  version_pattern: "app-v([0-9.]+)-x64"
Configuration Fields
- page_url (required): URL of the page to scrape.
- link_selector (optional): CSS selector identifying the download link's <a> element. Recommended over regex.
- link_pattern (optional): Regex with one capture group around the link URL. Used when a CSS selector cannot pin the link down. Exactly one of link_selector / link_pattern is required.
- version_pattern (required): Regex applied to the discovered link URL to extract the version. Capture groups are pulled out and combined with version_format.
- version_format (optional, default "{0}"): Python format string referencing capture groups by index ({0}, {1}, ...). Use this when a single version field needs to be assembled from multiple captures.
Finding a CSS Selector
- Open the download page in Chrome / Edge / Firefox.
- Right-click the download link -> Inspect.
- Right-click the highlighted element -> Copy -> Copy selector.
- Simplify the result. Common shapes:
  - a[href$=".msi"] (links ending in .msi)
  - a[href*="x64"] (links containing "x64")
  - a.download (links with class="download")
Note
The selector / pattern should match exactly one link; if it matches
several, the first match is used. Relative URLs in the page are resolved against
page_url. CSS selector support requires BeautifulSoup4; the
regex fallback does not.
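The regex fallback path can be sketched end to end with the standard library: find the link, resolve it against page_url, then assemble the version from the capture groups. The function names, the sample HTML, and the use of ValueError (the strategy itself is documented to raise ConfigError) are assumptions for this illustration.

```python
import re
from urllib.parse import urljoin


def find_link(html: str, page_url: str, link_pattern: str) -> str:
    """Return the first link matched by link_pattern as an absolute URL."""
    match = re.search(link_pattern, html)  # first match wins
    if match is None:
        raise ValueError(f"link_pattern matched nothing on {page_url}")
    # Capture group 1 holds the link; relative links resolve against page_url.
    return urljoin(page_url, match.group(1))


def extract_version(
    link_url: str, version_pattern: str, version_format: str = "{0}"
) -> str:
    """Combine the capture groups of version_pattern using version_format."""
    match = re.search(version_pattern, link_url)
    if match is None:
        raise ValueError(f"version_pattern did not match {link_url}")
    # {0}, {1}, ... refer to capture groups in order, so the pattern
    # needs at least as many groups as the format string references.
    return version_format.format(*match.groups())


# Hypothetical page fragment with a relative download link.
html = '<a href="a/7z2501-x64.msi">Download</a>'
link = find_link(html, "https://www.7-zip.org/download.html", r'href="([^"]+-x64\.msi)"')
version = extract_version(link, r"7z(\d{2})(\d{2})-x64", "{0}.{1}")
```

Here link resolves to an absolute URL under www.7-zip.org and version is assembled from the two captures, mirroring the ("25", "01") -> "25.01" transformation in the recipe example.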
WebScrapeStrategy
Discovery strategy for scraping vendor download pages.
Source code in napt/discovery/web_scrape.py
discover
Discovers version and download URL by scraping a vendor page.
Fetches discovery.page_url, locates a download link with
either link_selector (CSS) or link_pattern (regex),
and extracts the version from the matched link using
version_pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict containing | required |
Returns:
| Type | Description |
|---|---|
| RemoteVersion | Discovered version, the matched link's URL, and |
Raises:
| Type | Description |
|---|---|
| ConfigError | On missing required configuration or when a selector / pattern matches nothing. |
| NetworkError | On page fetch failure. |
Source code in napt/discovery/web_scrape.py
validate_config
Validate web_scrape strategy configuration.
Checks for required fields and correct types without making network calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | The app configuration from the recipe. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of error messages (empty if valid). |
Source code in napt/discovery/web_scrape.py
napt.discovery.api_github
GitHub releases discovery strategy.
Queries the GitHub releases API for the latest tag and the download URL
of a matching asset. The version comes from the release tag (parsed
with a regex); the download URL comes from the first asset whose
filename matches asset_pattern.
Recipe Example
discovery:
  strategy: api_github
  repo: "git-for-windows/git"  # required, "owner/name"
  asset_pattern: "Git-.*-64-bit\\.exe$"  # required, regex on asset filename
  version_pattern: "v?([0-9.]+)"  # optional, default strips "v"
  prerelease: false  # optional, default false
  token: "${GITHUB_TOKEN}"  # optional, supports env expansion
Configuration Fields
- repo (required): GitHub repo as "owner/name".
- asset_pattern (required): Regex matched against asset filename. First match wins. Case-sensitive by default; prefix with (?i) for case-insensitive matching.
- version_pattern (optional): Regex for extracting the version from the release tag. Uses a named group (?P<version>...) or capture group 1 if present, otherwise the full match. Default: v?([0-9.]+).
- prerelease (optional, default false): When true, includes pre-release versions; otherwise the latest release must be stable.
- token (optional): GitHub personal access token. Raises the API rate limit from 60 to 5000 requests/hour. Supports ${ENV_VAR} expansion. For public repos, the token needs no special permissions.
Note
GitHub returns the most recent release first. If no asset matches,
or the latest release is a pre-release while prerelease: false,
discovery raises an error rather than walking back through history.
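The two matching rules (version from the tag, first asset whose filename matches) can be sketched as follows. The function names are hypothetical, and the use of re.search rather than re.match or re.fullmatch is an assumption; the page states only the group-selection order and that the first match wins.

```python
import re
from typing import Optional

DEFAULT_VERSION_PATTERN = r"v?([0-9.]+)"


def version_from_tag(tag: str, version_pattern: str = DEFAULT_VERSION_PATTERN) -> str:
    """Pick the version: named group, else capture group 1, else full match."""
    match = re.search(version_pattern, tag)
    if match is None:
        raise ValueError(f"version_pattern did not match tag {tag!r}")
    if "version" in match.re.groupindex:
        return match.group("version")  # (?P<version>...) takes priority
    if match.re.groups >= 1:
        return match.group(1)  # first positional capture group
    return match.group(0)  # no groups at all: use the whole match


def first_matching_asset(asset_names: list[str], asset_pattern: str) -> Optional[str]:
    """Return the first asset filename matching asset_pattern, or None."""
    compiled = re.compile(asset_pattern)  # case-sensitive unless (?i) is used
    for name in asset_names:
        if compiled.search(name):
            return name
    return None


assets = [
    "Git-2.47.1-32-bit.exe",
    "Git-2.47.1-64-bit.exe",
    "Git-2.47.1-64-bit.tar.bz2",
]
chosen = first_matching_asset(assets, r"Git-.*-64-bit\.exe$")
tag_version = version_from_tag("v2.47.1")
```

With the default pattern the leading "v" sits outside the capture group, which is how it gets stripped from the tag.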
ApiGithubStrategy
Discovery strategy for GitHub releases.
Source code in napt/discovery/api_github.py
discover
Discovers the latest GitHub release version and asset download URL.
Queries the GitHub releases API for the latest release of the
configured repository. Extracts the version from the release tag
(via version_pattern) and the download URL from the first
asset matching asset_pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict containing | required |
Returns:
| Type | Description |
|---|---|
| RemoteVersion | Latest version, the matched asset's download URL, and |
Raises:
| Type | Description |
|---|---|
| ConfigError | On missing or malformed required configuration, or when patterns do not match the release. |
| NetworkError | On API failure, missing assets, or rejected pre-releases. |
Source code in napt/discovery/api_github.py
validate_config
Validate api_github strategy configuration.
Checks for required fields and correct types without making network calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | The app configuration from the recipe. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of error messages (empty if valid). |
Source code in napt/discovery/api_github.py
napt.discovery.api_json
JSON API discovery strategy.
Queries a JSON API endpoint for the latest version and download URL. Both fields are extracted from the response using JSONPath expressions.
Recipe Example
discovery:
  strategy: api_json
  api_url: "https://vendor.example.com/api/latest"  # required
  version_path: "version"  # required, JSONPath
  download_url_path: "download_url"  # required, JSONPath
  method: "GET"  # optional, GET or POST
  headers:  # optional
    Authorization: "Bearer ${API_TOKEN}"
    Accept: "application/json"
  body:  # optional, POST only
    platform: "windows"
    arch: "x64"
  timeout: 30  # optional, seconds
Nested response, with auth header:
Configuration Fields
- api_url (required): JSON endpoint URL.
- version_path (required): JSONPath expression locating the version string in the response (e.g. "version", "release.version").
- download_url_path (required): JSONPath expression locating the installer download URL in the response.
- method (optional, default "GET"): "GET" or "POST".
- headers (optional): HTTP headers to send. Values support ${ENV_VAR} expansion.
- body (optional): Dict sent as a JSON body. Only used when method: POST.
- timeout (optional, default 30): Request timeout in seconds.
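To show what a path such as "release.version" selects, here is a plain-Python stand-in that walks dotted keys through nested dicts. It is not JSONPath and does not use jsonpath-ng; it covers only the simple dotted-key case, and the lookup helper and sample response are hypothetical.

```python
from typing import Any


def lookup(data: Any, dotted_path: str) -> Any:
    """Follow a dotted key path through nested dicts (simplified stand-in)."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"Path {dotted_path!r} not found at {key!r}")
        current = current[key]
    return current


# Hypothetical nested API response.
response = {
    "release": {
        "version": "2.0.1",
        "download_url": "https://vendor.example.com/app-2.0.1.msi",
    }
}
found_version = lookup(response, "release.version")
found_url = lookup(response, "release.download_url")
```

A recipe for this response shape would set version_path to "release.version" and download_url_path to "release.download_url".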
Note
JSONPath uses the jsonpath-ng library. Environment-variable
expansion (${VAR}) is applied to string values in headers.
POST bodies are always sent as application/json.
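Header expansion of ${VAR} references can be sketched with a single regular expression. The helper names are hypothetical, and leaving undefined variables untouched is an assumption; the page does not say how the strategy treats a missing variable.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} with env[NAME]; unknown names are left as-is."""
    return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def expand_headers(
    headers: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Expand ${VAR} references in every header value."""
    source = os.environ if env is None else env
    return {name: expand_env(value, source) for name, value in headers.items()}


expanded = expand_headers(
    {"Authorization": "Bearer ${API_TOKEN}", "Accept": "application/json"},
    env={"API_TOKEN": "s3cret"},
)
```

Passing env explicitly keeps the function easy to test; omitting it falls back to the process environment.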
ApiJsonStrategy
Discovery strategy for JSON API endpoints.
Source code in napt/discovery/api_json.py
discover
Discovers version and download URL from a JSON API endpoint.
Calls the configured api_url and extracts the version and
download URL using JSONPath expressions. The HTTP method,
headers, and body are configurable so the same strategy works
for GET and POST endpoints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | Merged recipe configuration dict containing | required |
Returns:
| Type | Description |
|---|---|
| RemoteVersion | Discovered version, download URL, and the source identifier. |
Raises:
| Type | Description |
|---|---|
| ConfigError | On missing required configuration or when the JSONPath expressions do not match the response. |
| NetworkError | On API request failure. |
Source code in napt/discovery/api_json.py
validate_config
Validate api_json strategy configuration.
Checks for required fields and correct types without making network calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| app_config | dict[str, Any] | The app configuration from the recipe. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of error messages (empty if valid). |