# typed: strict
# frozen_string_literal: true
require "livecheck/strategic"
module Homebrew
module Livecheck
module Strategy
# The {Xorg} strategy identifies versions of software at x.org by
# checking directory listing pages.
#
# X.Org URLs take one of the following formats, among several others:
#
# * `https://www.x.org/archive/individual/app/example-1.2.3.tar.bz2`
# * `https://www.x.org/archive/individual/font/example-1.2.3.tar.bz2`
# * `https://www.x.org/archive/individual/lib/libexample-1.2.3.tar.bz2`
# * `https://ftp.x.org/archive/individual/lib/libexample-1.2.3.tar.bz2`
# * `https://www.x.org/pub/individual/doc/example-1.2.3.tar.gz`
# * `https://xorg.freedesktop.org/archive/individual/util/example-1.2.3.tar.xz`
#
# The notable differences between URLs are as follows:
#
# * `www.x.org` and `ftp.x.org` seem to be interchangeable (we prefer
# `www.x.org`).
# * `/archive/` is the current top-level directory and `/pub/` will
# redirect to the same URL using `/archive/` instead. (The strategy
# handles this replacement to avoid the redirection.)
# * The `/individual/` directory contains a number of directories (e.g.
# app, data, doc, driver, font, lib, etc.) which contain a number of
# different archive files.
#
# Since this strategy ends up checking the same directory listing pages
# for multiple formulae, we've included a simple method of page caching.
# This prevents livecheck from fetching the same page more than once and
# also dramatically speeds up these checks. Eventually we hope to
# implement a more sophisticated page cache that all strategies using
# {PageMatch} can use (allowing us to simplify this strategy accordingly).
#
# The default regex identifies versions in archive files found in `href`
# attributes.
#
# @api public
class Xorg
extend Strategic
NICE_NAME = "X.Org"
# A `Regexp` used in determining if the strategy applies to the URL and
# also as part of extracting the module name from the URL basename.
MODULE_REGEX = /(?<module_name>.+)-\d+/i
# A `Regexp` used to extract the module name from the URL basename.
FILENAME_REGEX = /^#{MODULE_REGEX.source.strip}/i
# The `Regexp` used to determine if the strategy applies to the URL.
URL_MATCH_REGEX = %r{
^https?://(?:[^/]+?\.)* # Scheme and any leading subdomains
(?:x\.org/(?:[^/]+/)*individual
|freedesktop\.org/(?:archive|dist|software)
|archive\.mesa3d\.org)
/(?:[^/]+/)*#{MODULE_REGEX.source.strip}
}ix
# Used to cache page content, so we don't fetch the same pages
# repeatedly.
@page_data = T.let({}, T::Hash[String, String])
# Whether the strategy can be applied to the provided URL.
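#
# A brief illustration, using a URL of the shape listed in the class
# documentation (`example` is a placeholder module name and the second
# URL is a hypothetical non-X.Org host):
#
# ```ruby
# Xorg.match?("https://www.x.org/archive/individual/app/example-1.2.3.tar.bz2")
# # => true
# Xorg.match?("https://example.com/example-1.2.3.tar.bz2")
# # => false
# ```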
#
# @param url [String] the URL to match against
# @return [Boolean]
sig { override.params(url: String).returns(T::Boolean) }
def self.match?(url)
URL_MATCH_REGEX.match?(url)
end
# Extracts information from a provided URL and uses it to generate
# various input values used by the strategy to check for new versions.
# Some of these values act as defaults and can be overridden in a
# `livecheck` block.
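#
# A minimal standalone sketch of the transformation, assuming a
# hypothetical `example` module URL. It mirrors the steps of this method
# in plain Ruby and is illustrative only:
#
# ```ruby
# url = "https://www.x.org/pub/individual/app/example-1.2.3.tar.bz2"
# file_name = File.basename(url) # "example-1.2.3.tar.bz2"
# module_name = file_name[/^(?<module_name>.+)-\d+/i, "module_name"] # "example"
#
# # Swap `/pub/` for `/archive/` and drop the filename to get the
# # directory listing URL.
# listing_url = url.sub("x.org/pub/", "x.org/archive/").delete_suffix(file_name)
# # => "https://www.x.org/archive/individual/app/"
#
# # Build the default regex and apply it to a sample `href` attribute.
# regex = /href=.*?#{Regexp.escape(module_name)}[._-]v?(\d+(?:\.\d+)+)\.t/i
# 'href="example-1.2.4.tar.xz"'[regex, 1] # => "1.2.4"
# ```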
#
# @param url [String] the URL used to generate values
# @return [Hash]
sig { params(url: String).returns(T::Hash[Symbol, T.untyped]) }
def self.generate_input_values(url)
values = {}
file_name = File.basename(url)
match = file_name.match(FILENAME_REGEX)
return values if match.blank?
# /pub/ URLs redirect to the same URL with /archive/, so we replace
# it to avoid the redirection. Removing the filename from the end of
# the URL gives us the relevant directory listing page.
values[:url] = url.sub("x.org/pub/", "x.org/archive/").delete_suffix(file_name)
regex_name = Regexp.escape(T.must(match[:module_name])).gsub("\\-", "-")
# Example regex: `/href=.*?example[._-]v?(\d+(?:\.\d+)+)\.t/i`
values[:regex] = /href=.*?#{regex_name}[._-]v?(\d+(?:\.\d+)+)\.t/i
values
end
# Generates a URL and regex (if one isn't provided) and checks the
# content at the URL for new versions (using the regex for matching).
#
# The behavior in this method for matching text in the content using a
# regex is copied and modified from the {PageMatch} strategy, so that
# we can add some simple page caching. If this behavior is expanded to
# apply to all strategies that use {PageMatch} to identify versions,
# then this strategy can be brought in line with the others.
#
# @param url [String] the URL of the content to check
# @param regex [Regexp] a regex used for matching versions in content
# @param options [Options] options to modify behavior
# @return [Hash]
sig {
override(allow_incompatible: true).params(
url: String,
regex: T.nilable(Regexp),
options: Options,
block: T.nilable(Proc),
).returns(T::Hash[Symbol, T.anything])
}
def self.find_versions(url:, regex: nil, options: Options.new, &block)
generated = generate_input_values(url)
generated_url = generated[:url]
# Use the cached page content to avoid duplicate fetches
cached_content = @page_data[generated_url]
match_data = PageMatch.find_versions(
url: generated_url,
regex: regex || generated[:regex],
provided_content: cached_content,
options:,
&block
)
content = match_data[:content]
return match_data if content.blank?
# Cache any new page content
@page_data[generated_url] = content unless cached_content
match_data
end
end
end
end
end