177 lines
6.9 KiB
Ruby
Raw Normal View History

# typed: strict
# frozen_string_literal: true
require "livecheck/strategic"
module Homebrew
module Livecheck
module Strategy
# The {Xml} strategy fetches content at a URL, parses it as XML using
2024-04-30 11:10:23 +02:00
# `REXML` and provides the `REXML::Document` to a `strategy` block.
# If a regex is present in the `livecheck` block, it should be passed
# as the second argument to the `strategy` block.
#
# This is a generic strategy that doesn't contain any logic for finding
# versions, as the structure of XML data varies. Instead, a `strategy`
# block must be used to extract version information from the XML data.
# For more information on how to work with an `REXML::Document` object,
# please refer to the [`REXML::Document`](https://ruby.github.io/rexml/REXML/Document.html)
# and [`REXML::Element`](https://ruby.github.io/rexml/REXML/Element.html)
# documentation.
#
# This strategy is not applied automatically and it is necessary to use
# `strategy :xml` in a `livecheck` block (in conjunction with a
# `strategy` block) to use it.
#
# This strategy's {find_versions} method can be used in other strategies
# that work with XML content, so it should only be necessary to write
# the version-finding logic that works with the parsed XML data.
#
# @api public
class Xml
2025-02-22 21:51:41 -08:00
extend Strategic
NICE_NAME = "XML"
# A priority of zero causes livecheck to skip the strategy. We do this
# for {Xml} so we can selectively apply it only when a strategy block
# is provided in a `livecheck` block.
PRIORITY = 0
# The `Regexp` used to determine if the strategy applies to the URL.
URL_MATCH_REGEX = %r{^https?://}i
# Whether the strategy can be applied to the provided URL.
# {Xml} will technically match any HTTP URL but is only usable with
# a `livecheck` block containing a `strategy` block.
#
# @param url [String] the URL to match against
# @return [Boolean]
2025-02-22 21:51:41 -08:00
sig { override.params(url: String).returns(T::Boolean) }
def self.match?(url)
URL_MATCH_REGEX.match?(url)
end
# Parses XML text and returns an `REXML::Document` object.
# @param content [String] the XML text to parse
# @return [REXML::Document, nil]
sig { params(content: String).returns(T.nilable(REXML::Document)) }
def self.parse_xml(content)
parsing_tries = 0
begin
REXML::Document.new(content)
rescue REXML::UndefinedNamespaceException => e
undefined_prefix = e.to_s[/Undefined prefix ([^ ]+) found/i, 1]
raise "Could not identify undefined prefix." if undefined_prefix.blank?
# Only retry parsing once after removing prefix from content
parsing_tries += 1
raise "Could not parse XML after removing undefined prefix." if parsing_tries > 1
# When an XML document contains a prefix without a corresponding
# namespace, it's necessary to remove the prefix from the content
# to be able to successfully parse it using REXML
content = content.gsub(%r{(</?| )#{Regexp.escape(undefined_prefix)}:}, '\1')
retry
end
end
# Retrieves the stripped inner text of an `REXML` element. Returns
# `nil` if the optional child element doesn't exist or the text is
# blank.
# @param element [REXML::Element] an `REXML` element to retrieve text
# from, either directly or from a child element
# @param child_path [String, nil] the XPath of a child element to
# retrieve text from
# @return [String, nil]
sig {
params(
element: REXML::Element,
child_path: T.nilable(String),
).returns(T.nilable(String))
}
def self.element_text(element, child_path = nil)
element = element.get_elements(child_path).first if child_path.present?
return if element.nil?
text = element.text
return if text.blank?
text.strip
end
# Parses XML text and identifies versions using a `strategy` block.
# If a regex is provided, it will be passed as the second argument to
2025-08-05 12:47:04 -04:00
# the `strategy` block (after the parsed XML data).
# @param content [String] the XML text to parse and check
# @param regex [Regexp, nil] a regex used for matching versions in the
# content
# @return [Array]
sig {
params(
content: String,
regex: T.nilable(Regexp),
2023-04-04 22:40:31 -07:00
block: T.nilable(Proc),
).returns(T::Array[String])
}
def self.versions_from_content(content, regex = nil, &block)
2025-02-22 21:51:41 -08:00
return [] if content.blank? || !block_given?
require "rexml"
xml = parse_xml(content)
return [] if xml.blank?
block_return_value = if regex.present?
yield(xml, regex)
elsif block.arity == 2
raise "Two arguments found in `strategy` block but no regex provided."
else
yield(xml)
end
Strategy.handle_block_return(block_return_value)
end
# Checks the XML content at the URL for versions, using the provided
# `strategy` block to extract version information.
#
# @param url [String] the URL of the content to check
# @param regex [Regexp, nil] a regex used for matching versions
# @param provided_content [String, nil] page content to use in place of
# fetching via `Strategy#page_content`
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
# @param options [Options] options to modify behavior
# @return [Hash]
sig {
2025-02-22 21:51:41 -08:00
override.params(
url: String,
regex: T.nilable(Regexp),
provided_content: T.nilable(String),
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
options: Options,
2023-04-04 22:40:31 -07:00
block: T.nilable(Proc),
2025-02-22 21:51:41 -08:00
).returns(T::Hash[Symbol, T.anything])
}
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
def self.find_versions(url:, regex: nil, provided_content: nil, options: Options.new, &block)
2025-02-22 21:51:41 -08:00
raise ArgumentError, "#{Utils.demodulize(name)} requires a `strategy` block" unless block_given?
2024-03-07 16:20:20 +00:00
match_data = { matches: {}, regex:, url: }
2025-02-22 21:51:41 -08:00
return match_data if url.blank?
content = if provided_content.is_a?(String)
match_data[:cached] = true
provided_content
else
livecheck: Add Options class This adds a `Livecheck::Options` class, which is intended to house various configuration options that are set in `livecheck` blocks, conditionally set by livecheck at runtime, etc. The general idea is that when we add features involving configurations options (e.g., for livecheck, strategies, curl, etc.), we can make changes to `Options` without needing to modify parameters for strategy `find_versions` methods, `Strategy` methods like `page_headers` and `page_content`, etc. This is something that I've been trying to improve over the years and `Options` should help to reduce maintenance overhead in this area while also strengthening type signatures. `Options` replaces the existing `homebrew_curl` option (which related strategies pass to `Strategy` methods and on to `curl_args`) and the new `url_options` (which contains `post_form` or `post_json` values that are used to make `POST` requests). I recently added `url_options` as a temporary way of enabling `POST` support without `Options` but this restores the original `Options`-based implementation. Along the way, I added a `homebrew_curl` parameter to the `url` DSL method, allowing us to set an explicit value in `livecheck` blocks. This is something that we've needed in some cases but I also intend to replace implicit/inferred `homebrew_curl` usage with explicit values in `livecheck` blocks once this is available for use. My intention is to eventually remove the implicit behavior and only rely on explicit values. That will align with how `homebrew_curl` options work for other URLs and makes the behavior clear just from looking at the `livecheck` block. Lastly, this removes the `unused` rest parameter from `find_versions` methods. I originally added `unused` as a way of handling parameters that some `find_versions` methods have but others don't (e.g., `cask` in `ExtractPlist`), as this allowed us to pass various arguments to `find_versions` methods without worrying about whether a particular parameter is available. This isn't an ideal solution and I originally wanted to handle this situation by only passing expected arguments to `find_versions` methods but there was a technical issue standing in the way. I recently found an answer to the issue, so this also replaces the existing `ExtractPlist` special case with generic logic that checks the parameters for a strategy's `find_versions` method and only passes expected arguments. Replacing the aforementioned `find_versions` parameters with `Options` ensures that the remaining parameters are fairly consistent across strategies and any differences are handled by the aforementioned logic. Outside of `ExtractPlist`, the only other difference is that some `find_versions` methods have a `provided_content` parameter but that's currently only used by tests (though it's intended for caching support in the future). I will be renaming that parameter to `content` in an upcoming PR and expanding it to the other strategies, which should make them all consistent outside of `ExtractPlist`.
2025-02-11 18:04:38 -05:00
match_data.merge!(Strategy.page_content(url, options:))
match_data[:content]
end
return match_data if content.blank?
versions_from_content(content, regex, &block).each do |match_text|
match_data[:matches][match_text] = Version.new(match_text)
end
match_data
end
end
end
end
end