Prohibit non-ASCII characters in URLs, nudge toward punycode

Inspired by curl's blog post, [Detecting malicious Unicode][1], this check likely catches most, if not all, cases and nudges the user toward supplying IDNs in Punycode.

A possible improvement would be to tell the user exactly which Punycode domain to use instead, but that may require another library, as I couldn't quickly find anything in the Ruby standard library that handles Punycode encoding.
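
To give a sense of what such a suggestion would involve, here is a minimal pure-Ruby sketch of the Punycode algorithm from RFC 3492. The names `PunycodeSketch`, `encode`, and `host_to_ascii` are hypothetical and are not part of Homebrew, RuboCop, or the Ruby standard library. The sketch only lowercases each label; it skips the IDNA mapping, normalization, and validation steps that a real converter would apply before encoding.

```ruby
# Hypothetical sketch of RFC 3492 Punycode encoding (no IDNA mapping/validation).
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  module_function

  # Digit values 0..25 map to "a".."z", 26..35 map to "0".."9".
  def digit_char(d)
    (d < 26 ? d + 97 : d + 22).chr
  end

  # Bias adaptation function (RFC 3492, section 6.1).
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + (((BASE - TMIN + 1) * delta) / (delta + SKEW))
  end

  # Encode one Unicode label as Punycode (without the "xn--" prefix).
  def encode(label)
    code_points = label.codepoints
    # Copy the basic (ASCII) code points first, followed by a delimiter.
    output = code_points.select { |c| c < 0x80 }.map(&:chr).join
    basic_count = output.length
    handled = basic_count
    output << "-" if basic_count.positive?

    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS
    while handled < code_points.length
      # Next smallest code point that still has to be encoded.
      m = code_points.select { |c| c >= n }.min
      delta += (m - n) * (handled + 1)
      n = m
      code_points.each do |c|
        delta += 1 if c < n
        next unless c == n

        # Emit delta as a generalized variable-length integer.
        q = delta
        k = BASE
        loop do
          t = (k - bias).clamp(TMIN, TMAX) # threshold for this digit position
          break if q < t

          output << digit_char(t + ((q - t) % (BASE - t)))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit_char(q)
        bias = adapt(delta, handled + 1, handled == basic_count)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    output
  end

  # Punycode-encode only the labels that contain non-ASCII characters.
  def host_to_ascii(host)
    host.split(".", -1).map do |label|
      label.ascii_only? ? label : "xn--#{encode(label.downcase)}"
    end.join(".")
  end
end
```

For example, `PunycodeSketch.host_to_ascii("bücher.example")` returns `"xn--bcher-kva.example"`, while an all-ASCII host such as `"brew.sh"` passes through unchanged.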

[1]: https://daniel.haxx.se/blog/2025/05/16/detecting-malicious-unicode/

Co-authored-by: Štefan Baebler <319826+stefanb@users.noreply.github.com>
Colin Dean 2025-05-20 11:06:20 -04:00 committed by Mike McQuaid
parent 33a6d21eef
commit d5b3ae095c
2 changed files with 14 additions and 0 deletions

@ -35,6 +35,12 @@ module RuboCop
def audit_url(type, urls, mirrors, livecheck_url: false)
  @type = type
  # URLs must be ASCII; IDNs must be punycode
  ascii_pattern = /[^\p{ASCII}]+/
  audit_urls(urls, ascii_pattern) do |_, url|
    problem "Please use the ASCII (Punycode encoded host, URL-encoded path and query) version of #{url}."
  end
  # GNU URLs; doesn't apply to mirrors
  gnu_pattern = %r{^(?:https?|ftp)://ftpmirror\.gnu\.org/(.*)}
  audit_urls(urls, gnu_pattern) do |match, url|
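
The new check rests on a single regular expression: `/[^\p{ASCII}]+/` matches any run of characters outside the ASCII range, anywhere in the URL (scheme, host, path, or query). A standalone sketch of that behavior outside RuboCop follows; `non_ascii_problems` is a hypothetical stand-in for the cop's `audit_urls` block, not a Homebrew method, and it simply reuses the same pattern and message text.

```ruby
# Same pattern as the cop: one or more characters outside the ASCII range.
ASCII_PATTERN = /[^\p{ASCII}]+/

# Hypothetical stand-in for the cop: one problem message per URL that
# contains at least one non-ASCII character.
def non_ascii_problems(urls)
  urls.grep(ASCII_PATTERN).map do |url|
    "Please use the ASCII (Punycode encoded host, URL-encoded path and query) version of #{url}."
  end
end

urls = ["https://brew.sh/foo/bar", "https://🫠.sh/foo/bar", "https://ßre.sh/foo/bar"]
# Prints messages for the 🫠 and ßre URLs only; brew.sh is plain ASCII.
non_ascii_problems(urls).each { |msg| puts msg }
```

Because the pattern is unanchored, it flags a non-ASCII character wherever it appears; `url[ASCII_PATTERN]` returns the first offending run (for example `"ß"`), which could be echoed back in a more specific message.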

@ -177,6 +177,14 @@ RSpec.describe RuboCop::Cop::FormulaAudit::Urls do
    "url" => "svn+http://brew.sh/foo/bar",
    "msg" => "Use of the svn+http:// scheme is deprecated, pass `:using => :svn` instead",
    "col" => 2,
  }, {
    "url" => "https://🫠.sh/foo/bar",
    "msg" => "Please use the ASCII (Punycode encoded host, URL-encoded path and query) version of https://🫠.sh/foo/bar.",
    "col" => 2,
  }, {
    "url" => "https://ßre.sh/foo/bar",
    "msg" => "Please use the ASCII (Punycode encoded host, URL-encoded path and query) version of https://ßre.sh/foo/bar.",
    "col" => 2,
  }]
end
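
Both spec cases put the non-ASCII character in the host, but the message also asks for a URL-encoded path and query. A minimal sketch of that half, assuming the input is a UTF-8 path-and-query string (the host still needs Punycode, not percent-encoding): `percent_encode_non_ascii` is a hypothetical helper that replaces each non-ASCII character with the percent-encoded form of its UTF-8 bytes, following the usual RFC 3986 convention.

```ruby
# Hypothetical helper: percent-encode each non-ASCII character as its UTF-8
# bytes (%XX), leaving ASCII characters, including existing %XX escapes, as-is.
def percent_encode_non_ascii(str)
  str.gsub(/[^\p{ASCII}]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

puts percent_encode_non_ascii("/foo/café?q=ß")
# → /foo/caf%C3%A9?q=%C3%9F
```

Touching only non-ASCII characters keeps reserved ASCII delimiters such as `/`, `?`, and `&` meaningful and avoids double-encoding escapes that are already present.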