`curl` follows up to 50 redirections by default, but livecheck should
use a more reasonable limit. It's rare, but a misconfigured server
with an endless redirection loop will cause `curl` to keep following
redirections until it hits that limit. We've encountered this in the
wild (e.g., the server for `getmail` and `memtester` endlessly
redirects), so it's not an idle concern. This commit adds
`--max-redirs 5` to `Livecheck::Strategy::DEFAULT_CURL_ARGS` to
enforce a lower redirection maximum.

To be clear, the `max_iterations` logic in `#parse_curl_output`
(which was previously found in `Strategy#page_content`) doesn't
restrict the number of redirections that `curl` follows. At the point
the `curl` output is being parsed, the requests have already been
made and `max_iterations` simply restricts the number of responses
`#parse_curl_output` is willing to parse. If we use `--max-redirs`
and properly set `max_iterations` to `max-redirs + 1`, we shouldn't
encounter the "Too many redirects" error in `#parse_curl_output`.
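
As a rough illustration of how the two limits relate, the sketch
below parses `curl` output that contains one header block per
response. The names `MAX_REDIRECTIONS`, `MAX_PARSE_ITERATIONS`, and
`parse_responses` are assumptions made for this example and aren't
necessarily what livecheck uses. The parsing is also simplified: it
assumes the body doesn't begin with `HTTP/`.

```ruby
# Hypothetical constants for illustration only.
MAX_REDIRECTIONS = 5
# One response per followed redirection, plus the final response.
MAX_PARSE_ITERATIONS = MAX_REDIRECTIONS + 1
HTTP_HEAD_BODY_SEPARATOR = "\r\n\r\n"

# Splits raw output into response heads and the final body. The
# requests have already been made by the time this runs, so
# `max_iterations` only limits how many responses we parse.
def parse_responses(output, max_iterations: MAX_PARSE_ITERATIONS)
  heads = []
  iterations = 0

  while output.start_with?("HTTP/")
    iterations += 1
    raise "Too many redirects (max = #{max_iterations})" if iterations > max_iterations

    head, _, output = output.partition(HTTP_HEAD_BODY_SEPARATOR)
    heads << head
  end

  { heads: heads, body: output }
end
```

With `--max-redirs 5`, `curl` produces at most six responses (the
initial one plus one per followed redirection), so a parse limit of
six is never exceeded.
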

Currently, only `Livecheck::Strategy::PAGE_HEADERS_CURL_ARGS` uses
the `--silent` option; `PAGE_CONTENT_CURL_ARGS` does not (though the
omission isn't intentional). The `#page_content` method should also
use `--silent`, to prevent progress bar text (`#=#=#`, etc.) from
appearing in output.
This matters because the regex that's used to identify `curl` error
messages in `stderr` (`/^curl:.+$/`) will fail if leading progress
bar text is present. This leads to an ambiguous "cURL failed without
a detectable error" message instead of the actual error message(s)
from `curl`.

This commit addresses the issue by adding `--silent` to
`Livecheck::Strategy::DEFAULT_CURL_ARGS`, which both
`PAGE_HEADERS_CURL_ARGS` and `PAGE_CONTENT_CURL_ARGS` inherit.

The existing regex wasn't able to match errors like:

    curl: option --something: is unknown

Additionally, the existing approach wouldn't capture multi-line
errors, whereas this captures all the `curl:` lines from `stderr`.
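
A minimal sketch of collecting every `curl:` line from `stderr` might
look like the following. The method name `curl_error_messages` is
hypothetical, and the sample `stderr` text is made up for the
example rather than copied from a real run.

```ruby
# Returns every line in `stderr` that begins with "curl:".
# `to_s` lets a nil `stderr` produce an empty array.
def curl_error_messages(stderr)
  stderr.to_s.scan(/^curl:.+$/)
end

# Made-up sample containing two error lines.
sample_stderr = <<~EOS
  curl: option --something: is unknown
  curl: (6) Could not resolve host: example.invalid
EOS

messages = curl_error_messages(sample_stderr)
# messages now holds both lines, in order.
```

A line with other text before `curl:` isn't matched by this pattern,
which is why progress bar output needs to be suppressed with
`--silent`.
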

Valid `strategy` block return types currently vary between
strategies. Some only accept a string whereas others accept a string
or array of strings. [`strategy` blocks also accept a `nil` return
(to simplify early returns) but this was already standardized across
strategies.]

While some strategies only identify one version by default (where a
string is an appropriate return type), it could be that a strategy
block identifies more than one version. In this situation, the
strategy would need to be modified to accept (and work with) an
array from a `strategy` block.

Rather than waiting for this to become a problem, this modifies all
strategies to standardize on allowing `strategy` blocks to return a
string or array of strings (even if only one of these is currently
used in practice). Standardizing valid return types helps to further
simplify the mental model for `strategy` blocks and reduce cognitive
load.
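
One way to picture the standardized handling is a single helper that
turns any valid block return value into an array of strings. This is
only a sketch: the method name `normalize_block_return` and the error
text are assumptions, not the names used in livecheck.

```ruby
# Normalizes a `strategy` block return value to an array of strings.
# Accepts a string, an array of strings, or nil (for early returns).
def normalize_block_return(value)
  case value
  when String then [value]
  when Array then value.compact.uniq
  when nil then []
  else
    raise TypeError, "Return value of a strategy block must be a string or array of strings."
  end
end
```

With a helper like this, each strategy can treat the result the same
way regardless of whether the block found one version or several.
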

This commit extracts related logic from `#find_versions` into
methods like `#versions_from_content`, which is conceptually similar
to `PageMatch#page_matches` (renamed to `#versions_from_content`
for consistency). This allows us to write tests for the related code
without having to make network requests (or stub them) at this point.
In general, this also helps to better align the structure of
strategies and how the various `#find_versions` methods work with
versions.

There's still more planned work to be done here but this is a step
in the right direction.
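
To show why this separation makes testing easier, here is a
simplified sketch of a content-only method. The signature and body
are assumptions for illustration; the real `#versions_from_content`
methods may differ between strategies.

```ruby
# Identifies versions in already-fetched content, so no network
# request is involved. If a block is given, its return value (string,
# array of strings, or nil) is used instead of the regex matches.
def versions_from_content(content, regex, &block)
  if block
    result = block.call(content, regex)
    return Array(result).compact.uniq
  end

  content.scan(regex).map do |match|
    # `scan` returns arrays when the regex has capture groups.
    match.is_a?(Array) ? match.first : match
  end.compact.uniq
end

content = '<a href="foo-1.2.3.tar.gz">foo</a> <a href="foo-1.2.4.tar.gz">foo</a>'
regex = /foo-v?(\d+(?:\.\d+)+)\.t/i

from_regex = versions_from_content(content, regex)
from_block = versions_from_content(content, regex) { |page, re| page[re, 1] }
```

Here `from_regex` contains both versions found in the string, while
`from_block` contains only the first, since the block returns a
single string.
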

Up to this point, we've had to rely on making `Strategy` constants
private to ensure that the only available constants are strategies.
With the current setup, the existence of a constant that's not a
strategy would break `Strategy#strategies` and
`Livecheck#livecheck_strategy_names`.

Instead, we can achieve the same goal by skipping over constants
that aren't a class. Other than saving us from having to make these
constants private, this is necessary to be able to create a
`Strategy` constant that can be used in all strategies.
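
The idea can be sketched with a stand-in module. `DemoStrategy`, its
constants, and the key conversion below are all made up for the
example; this isn't the actual `Strategy#strategies` implementation.

```ruby
module DemoStrategy
  # A public constant that isn't a strategy.
  DEFAULT_TIMEOUT = 10

  class Git; end
  class PageMatch; end

  # Builds a hash of strategy classes keyed by a snake_case symbol,
  # skipping any constant that isn't a class.
  def self.strategies
    constants.sort.each_with_object({}) do |const_symbol, hash|
      constant = const_get(const_symbol)
      next unless constant.is_a?(Class)

      key = const_symbol.to_s.gsub(/([a-z])([A-Z])/, '\1_\2').downcase.to_sym
      hash[key] = constant
    end
  end
end

strategy_map = DemoStrategy.strategies
```

`DEFAULT_TIMEOUT` is ignored without needing `private_constant`, so
`strategy_map` only contains the `Git` and `PageMatch` classes.
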

The simple approach here caches all header or body content from
responses, so memory usage continually grows with each fetch. This
becomes more of a notable issue with long livecheck runs (e.g.,
`--tap homebrew/core`).

Instead, we should only cache the header/body for URLs that we know
will be fetched more than once in a given run. Being able to
determine which URLs will be fetched more than once requires
structural changes within livecheck strategies, so this will take a
bit of work to implement.
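
Purely to illustrate the idea (this is not the planned
implementation), a cache that only stores content for URLs known to
repeat could look like the sketch below. The `SelectiveCache` class
is hypothetical, and it assumes the full list of URLs for a run is
available up front, which is the part that requires the structural
changes mentioned above.

```ruby
# Caches fetched content only for URLs that appear more than once in
# the list of URLs planned for a run. Requires Ruby 2.7+ for `tally`.
class SelectiveCache
  def initialize(planned_urls)
    @cacheable = planned_urls.tally.select { |_url, count| count > 1 }.keys
    @store = {}
  end

  # Yields the URL to the block to perform the actual fetch, storing
  # the result only when the URL is expected to be fetched again.
  def fetch(url)
    return yield(url) unless @cacheable.include?(url)
    return @store[url] if @store.key?(url)

    @store[url] = yield(url)
  end

  # Number of cached entries.
  def size
    @store.size
  end
end
```

Memory usage then grows with the number of repeated URLs rather than
with the total number of fetches.
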

I've been working on this off and on, and I'll introduce a more
sophisticated method of livecheck-wide caching in a later PR. In the
meantime, it's best to remove this caching behavior until I've
finished working on an approach that provides benefits (reducing
duplicate fetches) while minimizing detriments (increased memory
usage).