Don't choke on invalid UTF-8 in file output
Sometimes `file` output contains data from the file under examination, which may include binary data that is not valid UTF-8. `String#split` raises if the string's bytes are invalid for its encoding, so tell Ruby to treat `file` output as a bytestring.
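A minimal sketch of the failure and the fix, using a made-up line of `file --print0` output (the path and description are illustrative, not taken from a real run). It assumes a Ruby in which `String#split` rejects a string with invalid bytes by raising `ArgumentError`, which is the behavior this message refers to.

```ruby
# Hypothetical `file --print0` line: path, NUL, then a description that
# contains 0xFF, a byte that can never appear in valid UTF-8.
raw = "bin/tool\0data, title \xFF text\n".dup

# Tagged as UTF-8, the string is reported as invalid.
raw.force_encoding(Encoding::UTF_8)
utf8_valid = raw.valid_encoding? # false

# Splitting the invalid string is expected to raise; capture the error
# instead of letting it abort the script.
split_error = begin
  raw.split("\0")
  nil
rescue ArgumentError => e
  e
end

# Re-tag the same bytes as binary. No bytes change; Ruby simply stops
# interpreting them as UTF-8, so split and include? work byte-wise.
raw.force_encoding(Encoding::ASCII_8BIT)
path, info = raw.split("\0", 2)
is_text = info.include?("text")
```

`force_encoding` only changes the encoding tag on the string, so the fix costs nothing beyond the call itself and leaves the `file` output untouched.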
parent 7ac90613fd
commit 371cd0dd3e
@@ -84,9 +84,12 @@ class Keg
     }
     output, _status = Open3.capture2("/usr/bin/xargs -0 /usr/bin/file --no-dereference --print0",
                                      stdin_data: files.to_a.join("\0"))
+    # `file` output sometimes contains data from the file, which may include
+    # invalid UTF-8 entities, so tell Ruby this is just a bytestring
+    output.force_encoding(Encoding::ASCII_8BIT)
     output.each_line do |line|
-      path, info = line.split("\0")
-      next unless info.to_s.include?("text")
+      path, info = line.split("\0", 2)
+      next unless info.include?("text")
       path = Pathname.new(path)
       next unless files.include?(path)
       text_files << path
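The diff also adds a limit of `2` to `split`. The commit message does not say why; presumably it keeps the whole description in `info` even when the description itself contains a NUL byte. A small sketch of the difference, using a made-up input line:

```ruby
# Hypothetical line whose description contains an embedded NUL byte.
line = "lib/blob\0ASCII text, with \0 embedded\n".b

# Without a limit the line splits into three fields, and the two-variable
# assignment silently drops the third.
_path, truncated = line.split("\0")

# With a limit of 2 everything after the first NUL stays in one field.
path, info = line.split("\0", 2)
```

Note that a line with no NUL at all still yields a single field under either call, so `info` would be `nil` in that case.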