When its `try_partial` argument is `true`, `#curl_download` makes a
`HEAD` request before downloading the file using `#curl`. Currently
`try_partial` defaults to `true`, so any `#curl_download` call that
doesn't explicitly specify `try_partial: false` will make a `HEAD`
request first. This can involve several requests if the URL redirects,
so it adds unnecessary overhead when a partial download isn't needed.
Partial downloads are generally only useful when we're working with
larger files, and there's currently only one place in brew where
`#curl_download` is used that fits this description:
`CurlDownloadStrategy`. The other `#curl_download` calls fetch
smaller text files and don't need to support partial downloads.
This commit changes the default `try_partial` value to `false`,
making partial downloads opt-in rather than opt-out.
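As a rough sketch, the change amounts to flipping the keyword default;
the real `Utils::Curl#curl_download` takes more parameters and the body
here is only indicative:

```ruby
# Sketch of the updated default; the download logic itself is elided.
def curl_download(*args, to: nil, try_partial: false, **options)
  if try_partial
    # Only probe with a HEAD request (which may itself follow
    # redirects) when the caller opts in to partial downloads.
    # ... check whether an existing partial file can be resumed ...
  end
  # ... download the file with #curl ...
end
```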
We want `try_partial` to continue to default to `true` in
`CurlDownloadStrategy`, and there are various ways to accomplish this.
In this commit, I've chosen to update its `#initialize` method to
accept a `try_partial` argument that defaults to `true`, as this
value can also be used by classes that inherit from
`CurlDownloadStrategy` (e.g., `HomebrewCurlDownloadStrategy`). The
value is stored in an instance variable that's passed to
`#curl_download` in the related methods, effectively maintaining the
previous `try_partial: true` behavior while also allowing it to be
overridden when necessary.
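A minimal sketch of that approach, assuming the strategy's
`#initialize` receives keyword metadata (the exact parameter plumbing
and the `_fetch` signature may differ in brew):

```ruby
class CurlDownloadStrategy < AbstractFileDownloadStrategy
  def initialize(url, name, version, **meta)
    # Default to true here so this strategy (and subclasses like
    # HomebrewCurlDownloadStrategy) keeps the previous behavior.
    @try_partial = meta.fetch(:try_partial, true)
    super
  end

  def _fetch(url:, resolved_url:, timeout:)
    # Forward the stored value, so callers can still opt out by
    # passing try_partial: false to the strategy.
    curl_download(resolved_url, to: temporary_path,
                  try_partial: @try_partial, timeout: timeout)
  end
end
```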
Other uses of `#curl_download` in brew are
`Formulary::FromUrlLoader#load_file` and
`Cask::CaskLoader::FromURILoader#load`, which did not provide a
`try_partial` argument but should have been using
`try_partial: false`. With the `try_partial: false` default in this
commit, these calls are now fine without a `try_partial` argument.
`#curl_download` is also used in `SPDX#download_latest_license_data!`.
These calls previously used `try_partial: false` explicitly, but we
can now omit that argument with the new `false` default (aligning
with the above).
The default redirection maximum for `curl` is 50, but we should use
something more reasonable in livecheck. It's rare, but a misconfigured
server with an endless redirection loop will hit the 50-redirection
limit. Unfortunately, we've encountered this in the wild (e.g., the
servers for `getmail` and `memtester` redirect endlessly), so it's
not an idle concern. This commit adds `--max-redirs 5` to
`Livecheck::Strategy::DEFAULT_CURL_ARGS` to enforce a more reasonable
redirection maximum.
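Sketched out, the constant might look something like this; the
neighboring arguments are assumptions, and the relevant addition is
`--max-redirs`:

```ruby
module Homebrew
  module Livecheck
    module Strategy
      # Baseline curl arguments shared by livecheck strategies (sketch).
      DEFAULT_CURL_ARGS = [
        # Follow redirections...
        "--location",
        # ...but cap them at something more reasonable than curl's
        # default of 50, so endless redirection loops fail fast.
        "--max-redirs", "5",
      ].freeze
    end
  end
end
```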
To be clear, the `max_iterations` logic in `#parse_curl_output`
(which was previously found in `Strategy#page_content`) doesn't
restrict the number of redirections that `curl` follows. By the time
the `curl` output is parsed, the requests have already been made and
`max_iterations` simply restricts the number of responses
`#parse_curl_output` is willing to parse. If we use `--max-redirs`
and properly set `max_iterations` to `max-redirs + 1`, we shouldn't
encounter the "Too many redirects" error in `#parse_curl_output`.
In cases where there may be more than five responses in the `curl`
output, we need to be able to control the `max_iterations` of the
`while` loop in `#parse_curl_output` to properly parse all the
responses.
For example, if we pass `--max-redirs 5` to `curl` and there are
exactly five redirections before the final response, the output will
contain a total of six responses and `#parse_curl_output` wouldn't
handle this properly (it would give a `Too many redirects` error).
`max_iterations` should be the maximum number of redirections + 1
(to account for the final response after the redirections), so we
need to be able to override this value when necessary.
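To illustrate the arithmetic, here's a hedged sketch of the guard; the
real `#parse_curl_output` also separates each response's status line
and headers into structured data, and the splitting here is simplified:

```ruby
# With --max-redirs 5, a full redirection chain yields up to six
# responses, so the default here is max redirections + 1.
def parse_curl_output(output, max_iterations: 6)
  responses = []
  blocks = output.split(/\r?\n\r?\n/) # naive split into response blocks (assumption)
  iterations = 0

  while (block = blocks.shift)
    next unless block.start_with?("HTTP/")

    iterations += 1
    raise "Too many redirects (max = #{max_iterations})" if iterations > max_iterations

    responses << block
  end

  responses
end
```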
- These were the defaults generated when I clicked the "enable Code
Scanning" button on GitHub, but...
- Since we only have Ruby in this repo, we don't need a matrix; we can
just specify `languages: ruby`.
- And this repo gets enough usage that the schedule isn't very useful:
who would look at a scheduled run when the workflow already runs on
every PR?
> This regular expression has an unrestricted wildcard '.+?' which may cause 'googlecode\.com/svn' to be matched anywhere in the URL, outside the hostname.
> This regular expression has an unrestricted wildcard '.*' which may cause 'googlecode\.com/files' to be matched anywhere in the URL, outside the hostname.
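These are CodeQL alerts about URL-matching regexes. As a hedged
illustration of the kind of fix they call for (the actual patterns in
brew may differ), the idea is to anchor the match to the hostname
rather than letting it match anywhere in the URL:

```ruby
url = "https://example.googlecode.com/svn/trunk"

# Unanchored: the wildcard lets "googlecode.com/svn" match anywhere
# in the URL, including the path or query string.
loose = %r{.+?googlecode\.com/svn}

# Anchored: restrict the match to the host portion of the URL.
strict = %r{\Ahttps?://[^/]*\.?googlecode\.com/svn}

url.match?(loose)                                            #=> true
url.match?(strict)                                           #=> true
"https://evil.example.com/googlecode.com/svn".match?(loose)  #=> true
"https://evil.example.com/googlecode.com/svn".match?(strict) #=> false
```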