122 lines
4.2 KiB
Ruby
Raw Permalink Normal View History

# typed: strict
# frozen_string_literal: true
require "livecheck/strategic"
module Homebrew
module Livecheck
module Strategy
# The {PageMatch} strategy fetches content at a URL and scans it for
# matching text using the provided regex.
#
# This strategy can be used in a `livecheck` block when no specific
# strategies apply to a given URL. Though {PageMatch} will technically
# match any HTTP URL, the strategy also requires a regex to function.
#
# The {find_versions} method can be used within other strategies, to
# handle the process of identifying version text in content.
#
# @api public
class PageMatch
2025-02-22 21:51:41 -08:00
extend Strategic
NICE_NAME = "Page match"
# A priority of zero causes livecheck to skip the strategy. We do this
2021-08-10 18:38:21 -04:00
# for {PageMatch} so we can selectively apply it only when a regex is
# provided in a `livecheck` block.
PRIORITY = 0
# The `Regexp` used to determine if the strategy applies to the URL.
URL_MATCH_REGEX = %r{^https?://}i
# Whether the strategy can be applied to the provided URL.
2021-08-10 18:38:21 -04:00
# {PageMatch} will technically match any HTTP URL but is only
# usable with a `livecheck` block containing a regex.
2021-08-10 18:38:21 -04:00
#
# @param url [String] the URL to match against
# @return [Boolean]
2025-02-22 21:51:41 -08:00
sig { override.params(url: String).returns(T::Boolean) }
def self.match?(url)
URL_MATCH_REGEX.match?(url)
end
2020-12-22 22:46:52 -05:00
# Uses the regex to match text in the content or, if a block is
# provided, passes the page content to the block to handle matching.
# With either approach, an array of unique matches is returned.
#
2020-12-22 22:46:52 -05:00
# @param content [String] the page content to check
2021-08-19 10:20:11 -04:00
# @param regex [Regexp, nil] a regex used for matching versions in the
# content
# @return [Array]
sig {
params(
content: String,
2021-08-19 10:20:11 -04:00
regex: T.nilable(Regexp),
2023-04-04 22:40:31 -07:00
block: T.nilable(Proc),
).returns(T::Array[String])
}
Standardize valid strategy block return types Valid `strategy` block return types currently vary between strategies. Some only accept a string whereas others accept a string or array of strings. [`strategy` blocks also accept a `nil` return (to simplify early returns) but this was already standardized across strategies.] While some strategies only identify one version by default (where a string is an appropriate return type), it could be that a strategy block identifies more than one version. In this situation, the strategy would need to be modified to accept (and work with) an array from a `strategy` block. Rather than waiting for this to become a problem, this modifies all strategies to standardize on allowing `strategy` blocks to return a string or array of strings (even if only one of these is currently used in practice). Standardizing valid return types helps to further simplify the mental model for `strategy` blocks and reduce cognitive load. This commit extracts related logic from `#find_versions` into methods like `#versions_from_content`, which is conceptually similar to `PageMatch#page_matches` (renamed to `#versions_from_content` for consistency). This allows us to write tests for the related code without having to make network requests (or stub them) at this point. In general, this also helps to better align the structure of strategies and how the various `#find_versions` methods work with versions. There's still more planned work to be done here but this is a step in the right direction.
2021-08-10 11:09:55 -04:00
def self.versions_from_content(content, regex, &block)
if block
block_return_value = regex.present? ? yield(content, regex) : yield(content)
return Strategy.handle_block_return(block_return_value)
end
2021-08-19 10:20:11 -04:00
return [] if regex.blank?
2020-12-14 02:07:07 +01:00
content.scan(regex).filter_map do |match|
2020-12-15 20:08:53 +01:00
case match
when String
match
when Array
2020-12-15 20:08:53 +01:00
match.first
end
end.uniq
end
# Checks the content at the URL for new versions, using the provided
# regex for matching.
2020-12-23 09:12:53 -05:00
#
# @param url [String] the URL of the content to check
2021-08-19 10:20:11 -04:00
# @param regex [Regexp, nil] a regex used for matching versions
2021-08-10 18:38:21 -04:00
# @param provided_content [String, nil] page content to use in place of
# fetching via `Strategy#page_content`
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
# @param options [Options] options to modify behavior
2020-12-23 09:12:53 -05:00
# @return [Hash]
sig {
2025-02-22 21:51:41 -08:00
override.params(
2020-12-23 09:12:53 -05:00
url: String,
2021-08-19 10:20:11 -04:00
regex: T.nilable(Regexp),
2020-12-23 09:12:53 -05:00
provided_content: T.nilable(String),
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
options: Options,
2023-04-04 22:40:31 -07:00
block: T.nilable(Proc),
2020-12-23 09:12:53 -05:00
).returns(T::Hash[Symbol, T.untyped])
}
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
def self.find_versions(url:, regex: nil, provided_content: nil, options: Options.new, &block)
2025-02-22 21:51:41 -08:00
if regex.blank? && !block_given?
2025-02-23 11:08:00 -08:00
raise ArgumentError, "#{Utils.demodulize(name)} requires a regex or `strategy` block"
end
2024-03-07 16:20:20 +00:00
match_data = { matches: {}, regex:, url: }
2025-02-22 21:51:41 -08:00
return match_data if url.blank?
content = if provided_content.is_a?(String)
match_data[:cached] = true
2020-12-23 09:12:53 -05:00
provided_content
else
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
match_data.merge!(Strategy.page_content(url, options:))
match_data[:content]
2020-12-23 09:12:53 -05:00
end
2020-12-22 22:46:52 -05:00
return match_data if content.blank?
Standardize valid strategy block return types Valid `strategy` block return types currently vary between strategies. Some only accept a string whereas others accept a string or array of strings. [`strategy` blocks also accept a `nil` return (to simplify early returns) but this was already standardized across strategies.] While some strategies only identify one version by default (where a string is an appropriate return type), it could be that a strategy block identifies more than one version. In this situation, the strategy would need to be modified to accept (and work with) an array from a `strategy` block. Rather than waiting for this to become a problem, this modifies all strategies to standardize on allowing `strategy` blocks to return a string or array of strings (even if only one of these is currently used in practice). Standardizing valid return types helps to further simplify the mental model for `strategy` blocks and reduce cognitive load. This commit extracts related logic from `#find_versions` into methods like `#versions_from_content`, which is conceptually similar to `PageMatch#page_matches` (renamed to `#versions_from_content` for consistency). This allows us to write tests for the related code without having to make network requests (or stub them) at this point. In general, this also helps to better align the structure of strategies and how the various `#find_versions` methods work with versions. There's still more planned work to be done here but this is a step in the right direction.
2021-08-10 11:09:55 -04:00
versions_from_content(content, regex, &block).each do |match_text|
2020-12-22 22:46:52 -05:00
match_data[:matches][match_text] = Version.new(match_text)
end
match_data
end
end
end
end
end