Some formulae/casks contain duplicate checkable URLs or contain
URLs that become duplicates after `#preprocess_url` is used. With
the former, a formula's `stable` and `head` URLs can be the same if
they both use the Git repository. With the latter, a formula's
`homepage` and `stable` URLs are different but `#preprocess_url` can
return the same Git repository URL for each (which can also be
a duplicate of the `head` URL).
The `fabric-completion` formula is a worst case scenario, as it
contains both of these conditions but the repository has no tags.
By default, livecheck would needlessly check the repository three
times (i.e., no versions are found so livecheck tries the next URL,
which happens to be the same URL).
This commit avoids duplicate URLs in `#checkable_urls` and keeps
track of checked URLs in `#latest_version` to avoid a duplicate
caused by `#preprocess_url`. This should effectively resolve the
issue of checking the same URL more than once for a given
formula/cask. Checking the same URL only once across all the
formulae/casks in a given livecheck run will be handled in a later
commit (though I've done most of the groundwork already in previous
PRs).
This modifies cask-related livecheck strategies to allow passing a
regex into a `strategy` block, when appropriate. These strategies
were outliers that explicitly rejected a regex even if a `strategy`
block was used, forcing any regex to be inlined in the `strategy`
block (instead of being defined using `#regex`).
With these changes, all `strategy` blocks will be able to accept a
regex, further simplifying the mental model. This also helps to
better align the various `find_versions` and `versions_from_*`
methods across strategies.
Some Apache URLs use a `filename` query string parameter instead of
`path` and the `Apache` strategy isn't applied. The parameter value
is the same as `path` (i.e., a path to a file), so this updates the
strategy's `URL_MATCH_REGEX` to handle this URL format as well.
Valid `strategy` block return types currently vary between
strategies. Some only accept a string whereas others accept a string
or array of strings. [`strategy` blocks also accept a `nil` return
(to simplify early returns) but this was already standardized across
strategies.]
While some strategies only identify one version by default (where a
string is an appropriate return type), it could be that a strategy
block identifies more than one version. In this situation, the
strategy would need to be modified to accept (and work with) an
array from a `strategy` block.
Rather than waiting for this to become a problem, this modifies all
strategies to standardize on allowing `strategy` blocks to return a
string or array of strings (even if only one of these is currently
used in practice). Standardizing valid return types helps to further
simplify the mental model for `strategy` blocks and reduce cognitive
load.
This commit extracts related logic from `#find_versions` into
methods like `#versions_from_content`, which is conceptually similar
to `PageMatch#page_matches` (renamed to `#versions_from_content`
for consistency). This allows us to write tests for the related code
without having to make network requests (or stub them) at this point.
In general, this also helps to better align the structure of
strategies and how the various `#find_versions` methods work with
versions.
There's still more planned work to be done here but this is a step
in the right direction.
- For some of these I changed `context` to `describe` as it fit better
rather than contriving a "when", "with" or "without", or massively
restructuring the tests.