* [PATCH] player: support guessing encodings for comments
@ 2018-01-29 0:58 14% ` Eric Wong
From: Eric Wong @ 2018-01-29 0:58 UTC (permalink / raw)
To: Rene Maurer; +Cc: dtas-all
Eric Wong wrote:
> Ugh, this is taking a while. I have a mix of UTF-8 and
> ISO-8859-1 and probably some totally bogus filenames available to me :x
Maybe the following patch is alright; there are a few other things
I want to work on around mlib before I release.
---8<---
Subject: [PATCH] player: support guessing encodings for comments
This can be helpful for end users and is close to what other
players do. We first try Encoding.default_external (typically
UTF-8) and, if the bytes are not valid in that encoding, try
again using `charlock_holmes' if it is installed.
Note: path names remain binary, because that's how proper
filesystems operate.
---
lib/dtas.rb | 2 ++
lib/dtas/encoding.rb | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
lib/dtas/source/sox.rb | 4 +++-
test/test_encoding.rb | 20 +++++++++++++++++
4 files changed, 83 insertions(+), 1 deletion(-)
create mode 100644 lib/dtas/encoding.rb
create mode 100644 test/test_encoding.rb
diff --git a/lib/dtas.rb b/lib/dtas.rb
index ac416d7..3c2cdb4 100644
--- a/lib/dtas.rb
+++ b/lib/dtas.rb
@@ -42,3 +42,5 @@ def self.dedupe_str(str)
require_relative 'dtas/compat_onenine'
require_relative 'dtas/spawn_fix'
+require_relative 'dtas/encoding'
+DTAS.extend(DTAS::Encoding)
diff --git a/lib/dtas/encoding.rb b/lib/dtas/encoding.rb
new file mode 100644
index 0000000..71c877f
--- /dev/null
+++ b/lib/dtas/encoding.rb
@@ -0,0 +1,58 @@
+# Copyright (C) 2018 all contributors <dtas-all@nongnu.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+# frozen_string_literal: true
+
+# This module gets included in DTAS
+module DTAS::Encoding # :nodoc:
+  def self.extended(mod)
+    mod.instance_eval { @charlock_holmes = nil}
+  end
+
+private
+
+  def try_enc_harder(str, enc, old) # :nodoc:
+    case @charlock_holmes
+    when nil
+      begin
+        require 'charlock_holmes'
+        @charlock_holmes = CharlockHolmes::EncodingDetector.new
+      rescue LoadError
+        warn "`charlock_holmes` gem not available for encoding detection"
+        @charlock_holmes = false
+      end
+    when false
+      enc_fallback(str, enc, old)
+    else
+      res = @charlock_holmes.detect(str)
+      if det = res[:ruby_encoding]
+        str.force_encoding(det)
+        warn "charlock_holmes detected #{str.inspect} as #{det}..."
+        str.valid_encoding? or enc_fallback(str, det, old)
+      else
+        enc_fallback(str, enc, old)
+      end
+    end
+    str
+  end
+
+  def enc_fallback(str, enc, old) # :nodoc:
+    str.force_encoding(old)
+    warn "could not detect encoding for #{str.inspect} (not #{enc})"
+  end
+
+public
+
+  def try_enc(str, enc, harder = true) # :nodoc:
+    old = str.encoding
+    return str if old == enc
+    str.force_encoding(enc)
+    unless str.valid_encoding?
+      if harder
+        try_enc_harder(str, enc, old)
+      else
+        enc_fallback(str, enc, old)
+      end
+    end
+    str
+  end
+end
diff --git a/lib/dtas/source/sox.rb b/lib/dtas/source/sox.rb
index f702b41..03487fe 100644
--- a/lib/dtas/source/sox.rb
+++ b/lib/dtas/source/sox.rb
@@ -50,17 +50,19 @@ def mcache_lookup(infile)
out =~ /^Sample Rate\s*:\s*(\d+)/n and dst['rate'] = $1.to_i
out =~ /^Precision\s*:\s*(\d+)-bit/n and dst['bits'] = $1.to_i
+ enc = Encoding.default_external
if out =~ /\nComments\s*:[ \t]*\n?(.*)\z/mn
comments = dst['comments'] = {}
key = nil
$1.split(/\n/n).each do |line|
if line.sub!(/^([^=]+)=/ni, '')
- key = DTAS.dedupe_str($1.upcase)
+ key = DTAS.dedupe_str(DTAS.try_enc($1.upcase, enc))
end
(comments[key] ||= ''.b) << "#{line}\n" unless line.empty?
end
comments.each do |k,v|
v.chomp!
+ DTAS.try_enc(v, enc)
comments[k] = DTAS.dedupe_str(v)
end
end
diff --git a/test/test_encoding.rb b/test/test_encoding.rb
new file mode 100644
index 0000000..d9af968
--- /dev/null
+++ b/test/test_encoding.rb
@@ -0,0 +1,20 @@
+# Copyright (C) 2018 all contributors <dtas-all@nongnu.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+# frozen_string_literal: true
+require './test/helper'
+require 'dtas'
+require 'yaml'
+
+class TestEncoding < Testcase
+  def test_encoding
+    data = <<EOD # <20180111114546.77906b35@cumparsita.ch>
+---
+comments:
+  ARTIST: !binary |-
+    RW5yaXF1ZSBSb2Ryw61ndWV6
+EOD
+    hash = YAML.load(data)
+    artist = DTAS.try_enc(hash['comments']['ARTIST'], Encoding::UTF_8)
+    assert_equal 'Enrique Rodríguez', artist
+  end
+end
--
EW
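For readers following the flow of the patch above, here is a minimal
standalone sketch of the same idea: relabel the bytes, check validity,
optionally ask a detector, and otherwise restore the original label.
DemoEncoding and its method names are hypothetical stand-ins and not
part of dtas; warnings are omitted. One deliberate difference: in the
patch as posted, the `when nil' branch of try_enc_harder appears to
only load the detector on the first call, without running detection or
the fallback for that string, whereas this sketch runs detection as
soon as the detector is loaded.

```ruby
# Sketch only: DemoEncoding is a hypothetical stand-in, not dtas code.
module DemoEncoding
  # Tri-state cache, as in the patch: nil = not tried yet,
  # false = gem unavailable, anything else = a detector instance.
  @detector = nil

  def self.detector
    if @detector.nil?
      begin
        require 'charlock_holmes'
        @detector = CharlockHolmes::EncodingDetector.new
      rescue LoadError
        @detector = false
      end
    end
    @detector
  end

  # Relabel +str+ in place as +enc+ when its bytes are valid in +enc+.
  # Otherwise consult the detector (if available) and finally restore
  # the original label. No transcoding happens here.
  def self.try_enc(str, enc)
    old = str.encoding
    return str if old == enc
    return str if str.force_encoding(enc).valid_encoding?
    det = detector
    guess = det && det.detect(str)
    name = guess && guess[:ruby_encoding]
    if name
      begin
        str.force_encoding(name)
        return str if str.valid_encoding?
      rescue ArgumentError
        # Ruby does not know this encoding name; fall through
      end
    end
    str.force_encoding(old)
  end
end

utf8_bytes = "Enrique Rodr\xC3\xADguez".b   # binary-labelled UTF-8 bytes
puts DemoEncoding.try_enc(utf8_bytes, Encoding::UTF_8).encoding # prints UTF-8
```

Because force_encoding only changes the label, the byte content of the
string is the same before and after; only a successful relabel makes
it compare equal to a UTF-8 literal.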
* [PATCH 2/4] mlib: remove redundant tag massaging and encoding
@ 2018-01-30 9:17 8% ` Eric Wong
From: Eric Wong @ 2018-01-30 9:17 UTC (permalink / raw)
To: dtas-all
Redundant since ("player: support guessing encodings for comments")
---
lib/dtas/mlib.rb | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/lib/dtas/mlib.rb b/lib/dtas/mlib.rb
index e217b59..d1707fb 100644
--- a/lib/dtas/mlib.rb
+++ b/lib/dtas/mlib.rb
@@ -106,18 +106,10 @@ def worker_work(job)
return ignore(job) if tlen < 0
tlen = tlen.round
tmp = {}
- found.comments.each do |tag, value|
- tag_id = @tag_map[tag] or next
- value.strip!
-
- # FIXME: this fallback needs testing
- [ Encoding::UTF_8, Encoding::ISO_8859_1 ].each do |enc|
- value.force_encoding(enc)
- if value.valid_encoding?
- value.encode!(Encoding::UTF_8) if enc != Encoding::UTF_8
- tmp[tag_id] = value
- break
- end
+ if comments = found.comments
+ comments.each do |tag, value|
+ tag_id = @tag_map[tag] or next
+ tmp[tag_id] = value if value.valid_encoding?
end
end
@db.transaction do
--
EW
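For comparison, the loop removed by the patch above can be sketched as
a standalone helper. legacy_to_utf8 is a hypothetical name, and the
tag-map lookup and strip! call from the original are left out. Since
Ruby treats every byte sequence as valid ISO-8859-1, the second
candidate in practice acts as a catch-all, and its result is
transcoded to UTF-8.

```ruby
# Sketch only: legacy_to_utf8 is a hypothetical name, not dtas code.
# Try UTF-8 first, then ISO-8859-1, transcoding the latter to UTF-8.
def legacy_to_utf8(value)
  [Encoding::UTF_8, Encoding::ISO_8859_1].each do |enc|
    value.force_encoding(enc)            # relabel only, bytes unchanged
    next unless value.valid_encoding?
    value.encode!(Encoding::UTF_8) if enc != Encoding::UTF_8
    return value
  end
  nil # reached only if no candidate encoding fits
end

latin1_bytes = "Rodr\xEDguez".b          # 0xED is i-acute in ISO-8859-1
puts legacy_to_utf8(latin1_bytes).encoding # prints UTF-8
```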
* [ANN] dtas 0.16.0 - duct tape audio suite for *nix
@ 2019-01-02 21:35 7% Eric Wong
From: Eric Wong @ 2019-01-02 21:35 UTC (permalink / raw)
To: ruby-talk, dtas-all
Free Software command-line tools for audio playback, mastering, and
whatever else related to audio. dtas follows the worse-is-better
philosophy and acts as duct tape to combine existing command-line tools
for flexibility and ease-of-development. dtas is currently implemented
in Ruby (and some embedded shell), but may use other languages in the
future.
Changes:
A bunch of minor fixes and cleanups accumulating for the past
two years since the last release. It's tough to remember to
make releases when I'm always running the latest version from
git :x
Most notably, "io_splice" is no longer used for dtas-linux
users since "sleepy_penguin" includes all the functionality
we use. This is to reduce memory overhead from extra DSOs(*).
There are also some deprecation-warning fixes for the
still-undocumented "dtas-mlib" command.
12 changes since v0.15.0 (2017-04-07):
pipeline: new module for running process pipelines
console: ensure time calculations are done in UTC
Rakefile: update path for uploads
player: support guessing encodings for comments
get rid of Windows-31J regexps
mlib: compatibility with Sequel 5.x
mlib: remove redundant tag massaging and encoding
mlib: use flock to get around SQLite busy errors
mlib: ignore files with nil times
dtas/watchable: check SystemCallError
mlib: fix unused variable warning
use sleepy_penguin 3.5+ for splice and tee support
(*) https://udrepper.livejournal.com/8790.html
* homepage: https://80x24.org/dtas/README
* https://80x24.org/dtas/INSTALL
* https://80x24.org/dtas/dtas-player.txt
* https://80x24.org/dtas/NEWS.atom
* git clone https://80x24.org/dtas.git
* dtas-all@nongnu.org (plain-text only, no HTML mail, please)
* mailing list archives: https://80x24.org/dtas-all/
nntp://news.public-inbox.org/inbox.comp.audio.dtas
https://80x24.org/dtas-all/new.atom
2018-01-11 10:45 dtas-0.15.0 "!binary" in yaml file Rene Maurer
2018-01-11 17:38 ` Eric Wong
2018-01-11 19:43 ` Eric Wong
2018-01-29 0:58 14% ` [PATCH] player: support guessing encodings for comments Eric Wong
2018-01-30 9:17 [PATCH 0/4] mlib: misc updates Eric Wong
2018-01-30 9:17 8% ` [PATCH 2/4] mlib: remove redundant tag massaging and encoding Eric Wong
2019-01-02 21:35 7% [ANN] dtas 0.16.0 - duct tape audio suite for *nix Eric Wong