everything related to duct tape audio suite (dtas)
* [PATCH] player: support guessing encodings for comments
  @ 2018-01-29  0:58 14%     ` Eric Wong
From: Eric Wong @ 2018-01-29  0:58 UTC (permalink / raw)
  To: Rene Maurer; +Cc: dtas-all

Eric Wong wrote:
> Ugh, this is taking a while.  I have a mix of UTF-8 and
> ISO-8859-1 and probably some totally bogus filenames available to me :x

Maybe the following patch is alright; there are a few other things
I want to work on around mlib before I release.

---8<---
Subject: [PATCH] player: support guessing encodings for comments

This can be helpful for end users and is close to what other
players do.  We first try Encoding.default_external (typically
UTF-8), and if the result is not valid, we fall back to detection
via `charlock_holmes' if that gem is installed.

Note: path names remain binary, because that's how proper
filesystems operate.
---
 lib/dtas.rb            |  2 ++
 lib/dtas/encoding.rb   | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/dtas/source/sox.rb |  4 +++-
 test/test_encoding.rb  | 20 +++++++++++++++++
 4 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 lib/dtas/encoding.rb
 create mode 100644 test/test_encoding.rb

diff --git a/lib/dtas.rb b/lib/dtas.rb
index ac416d7..3c2cdb4 100644
--- a/lib/dtas.rb
+++ b/lib/dtas.rb
@@ -42,3 +42,5 @@ def self.dedupe_str(str)
 
 require_relative 'dtas/compat_onenine'
 require_relative 'dtas/spawn_fix'
+require_relative 'dtas/encoding'
+DTAS.extend(DTAS::Encoding)
diff --git a/lib/dtas/encoding.rb b/lib/dtas/encoding.rb
new file mode 100644
index 0000000..71c877f
--- /dev/null
+++ b/lib/dtas/encoding.rb
@@ -0,0 +1,58 @@
+# Copyright (C) 2018 all contributors <dtas-all@nongnu.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+# frozen_string_literal: true
+
+# This module gets extended into DTAS (via DTAS.extend in lib/dtas.rb)
+module DTAS::Encoding # :nodoc:
+  def self.extended(mod)
+    mod.instance_eval { @charlock_holmes = nil}
+  end
+
+private
+
+  def try_enc_harder(str, enc, old) # :nodoc:
+    case @charlock_holmes
+    when nil
+      begin
+        require 'charlock_holmes'
+        @charlock_holmes = CharlockHolmes::EncodingDetector.new
+      rescue LoadError
+        warn "`charlock_holmes` gem not available for encoding detection"
+        @charlock_holmes = false
+      end
+    when false
+      enc_fallback(str, enc, old)
+    else
+      res = @charlock_holmes.detect(str)
+      if det = res[:ruby_encoding]
+        str.force_encoding(det)
+        warn "charlock_holmes detected #{str.inspect} as #{det}..."
+        str.valid_encoding? or enc_fallback(str, det, old)
+      else
+        enc_fallback(str, enc, old)
+      end
+    end
+    str
+  end
+
+  def enc_fallback(str, enc, old) # :nodoc:
+    str.force_encoding(old)
+    warn "could not detect encoding for #{str.inspect} (not #{enc})"
+  end
+
+public
+
+  def try_enc(str, enc, harder = true) # :nodoc:
+    old = str.encoding
+    return str if old == enc
+    str.force_encoding(enc)
+    unless str.valid_encoding?
+      if harder
+        try_enc_harder(str, enc, old)
+      else
+        enc_fallback(str, enc, old)
+      end
+    end
+    str
+  end
+end
diff --git a/lib/dtas/source/sox.rb b/lib/dtas/source/sox.rb
index f702b41..03487fe 100644
--- a/lib/dtas/source/sox.rb
+++ b/lib/dtas/source/sox.rb
@@ -50,17 +50,19 @@ def mcache_lookup(infile)
       out =~ /^Sample Rate\s*:\s*(\d+)/n and dst['rate'] = $1.to_i
       out =~ /^Precision\s*:\s*(\d+)-bit/n and dst['bits'] = $1.to_i
 
+      enc = Encoding.default_external
       if out =~ /\nComments\s*:[ \t]*\n?(.*)\z/mn
         comments = dst['comments'] = {}
         key = nil
         $1.split(/\n/n).each do |line|
           if line.sub!(/^([^=]+)=/ni, '')
-            key = DTAS.dedupe_str($1.upcase)
+            key = DTAS.dedupe_str(DTAS.try_enc($1.upcase, enc))
           end
           (comments[key] ||= ''.b) << "#{line}\n" unless line.empty?
         end
         comments.each do |k,v|
           v.chomp!
+          DTAS.try_enc(v, enc)
           comments[k] = DTAS.dedupe_str(v)
         end
       end
diff --git a/test/test_encoding.rb b/test/test_encoding.rb
new file mode 100644
index 0000000..d9af968
--- /dev/null
+++ b/test/test_encoding.rb
@@ -0,0 +1,20 @@
+# Copyright (C) 2018 all contributors <dtas-all@nongnu.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+# frozen_string_literal: true
+require './test/helper'
+require 'dtas'
+require 'yaml'
+
+class TestEncoding < Testcase
+  def test_encoding
+    data = <<EOD # <20180111114546.77906b35@cumparsita.ch>
+---
+comments:
+  ARTIST: !binary |-
+    RW5yaXF1ZSBSb2Ryw61ndWV6
+EOD
+    hash = YAML.load(data)
+    artist = DTAS.try_enc(hash['comments']['ARTIST'], Encoding::UTF_8)
+    assert_equal 'Enrique Rodríguez', artist
+  end
+end
-- 
EW
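
The core of the fallback described in the commit message can be
sketched outside of dtas.  This is a minimal illustration only:
`try_enc_sketch' is a hypothetical name, it omits the optional
`charlock_holmes' step, and it assumes UTF-8 as the wanted encoding.
The Base64 payload is the ARTIST value from test/test_encoding.rb.

```ruby
# Hypothetical stand-in for DTAS.try_enc, without charlock_holmes.
# It relabels the bytes as +enc+ and restores the old label if the
# bytes are not valid in +enc+.  No transcoding takes place.
def try_enc_sketch(str, enc)
  old = str.encoding
  return str if old == enc
  str.force_encoding(enc)
  str.force_encoding(old) unless str.valid_encoding?
  str
end

# YAML "!binary" values are Base64; decoding yields a binary string.
raw = 'RW5yaXF1ZSBSb2Ryw61ndWV6'.unpack1('m')
utf8 = try_enc_sketch(raw.dup, Encoding::UTF_8)

# ISO-8859-1 bytes are not valid UTF-8, so the binary label is kept.
latin1 = "Rodr\xEDguez".b
kept = try_enc_sketch(latin1, Encoding::UTF_8)

puts utf8.encoding  # UTF-8
puts kept.encoding  # ASCII-8BIT
```

Relabeling in place (rather than calling String#encode) matches the
patch: the bytes read from sox are never modified, only their
encoding tag is.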



* [PATCH 2/4] mlib: remove redundant tag massaging and encoding
  @ 2018-01-30  9:17  8% ` Eric Wong
From: Eric Wong @ 2018-01-30  9:17 UTC (permalink / raw)
  To: dtas-all

The tag massaging and encoding fallback in mlib are redundant since
commit ("player: support guessing encodings for comments").
---
 lib/dtas/mlib.rb | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/lib/dtas/mlib.rb b/lib/dtas/mlib.rb
index e217b59..d1707fb 100644
--- a/lib/dtas/mlib.rb
+++ b/lib/dtas/mlib.rb
@@ -106,18 +106,10 @@ def worker_work(job)
     return ignore(job) if tlen < 0
     tlen = tlen.round
     tmp = {}
-    found.comments.each do |tag, value|
-      tag_id = @tag_map[tag] or next
-      value.strip!
-
-      # FIXME: this fallback needs testing
-      [ Encoding::UTF_8, Encoding::ISO_8859_1 ].each do |enc|
-        value.force_encoding(enc)
-        if value.valid_encoding?
-          value.encode!(Encoding::UTF_8) if enc != Encoding::UTF_8
-          tmp[tag_id] = value
-          break
-        end
+    if comments = found.comments
+      comments.each do |tag, value|
+        tag_id = @tag_map[tag] or next
+        tmp[tag_id] = value if value.valid_encoding?
       end
     end
     @db.transaction do
-- 
EW
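
For reference, the loop removed above behaves roughly as in the
sketch below.  `old_massage' is a hypothetical helper name used only
for illustration; the body mirrors the deleted lines.  In Ruby, any
byte sequence is valid ISO-8859-1, so the second iteration always
succeeds when the UTF-8 attempt fails.

```ruby
# Hypothetical helper mirroring the fallback loop deleted from
# DTAS::Mlib#worker_work: try UTF-8 first, then ISO-8859-1, and
# transcode to UTF-8 when the second candidate is used.
def old_massage(value)
  [ Encoding::UTF_8, Encoding::ISO_8859_1 ].each do |enc|
    value.force_encoding(enc)
    next unless value.valid_encoding?
    value.encode!(Encoding::UTF_8) if enc != Encoding::UTF_8
    return value
  end
  nil
end

latin1 = "Rodr\xEDguez".b      # ISO-8859-1 bytes with a binary label
out = old_massage(latin1)
puts out.encoding              # UTF-8
puts out.valid_encoding?       # true
```

With the player already labeling comments through DTAS.try_enc,
mlib only needs the valid_encoding? check kept in the new code.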




* [ANN] dtas 0.16.0 - duct tape audio suite for *nix
@ 2019-01-02 21:35  7% Eric Wong
From: Eric Wong @ 2019-01-02 21:35 UTC (permalink / raw)
  To: ruby-talk, dtas-all

Free Software command-line tools for audio playback, mastering, and
whatever else related to audio.  dtas follows the worse-is-better
philosophy and acts as duct tape to combine existing command-line tools
for flexibility and ease-of-development.  dtas is currently implemented
in Ruby (and some embedded shell), but may use other languages in the
future.

Changes:

    A bunch of minor fixes and cleanups have accumulated over the
    nearly two years since the last release.  It's tough to remember
    to make releases when I'm always running the latest version from
    git :x

    Most notably, "io_splice" is no longer used for dtas-linux
    users since "sleepy_penguin" includes all the functionality
    we use.  This is to reduce memory overhead from extra DSOs(*).

    There are also some deprecation warning fixes for the
    still-undocumented "dtas-mlib" command.

    12 changes since v0.15.0 (2017-04-07):

          pipeline: new module for running process pipelines
          console: ensure time calculations are done in UTC
          Rakefile: update path for uploads
          player: support guessing encodings for comments
          get rid of Windows-31J regexps
          mlib: compatibility with Sequel 5.x
          mlib: remove redundant tag massaging and encoding
          mlib: use flock to get around SQLite busy errors
          mlib: ignore files with nil times
          dtas/watchable: check SystemCallError
          mlib: fix unused variable warning
          use sleepy_penguin 3.5+ for splice and tee support

    (*) https://udrepper.livejournal.com/8790.html

* homepage: https://80x24.org/dtas/README
* https://80x24.org/dtas/INSTALL
* https://80x24.org/dtas/dtas-player.txt
* https://80x24.org/dtas/NEWS.atom
* git clone https://80x24.org/dtas.git
* dtas-all@nongnu.org (plain-text only, no HTML mail, please)
* mailing list archives: https://80x24.org/dtas-all/
  nntp://news.public-inbox.org/inbox.comp.audio.dtas
  https://80x24.org/dtas-all/new.atom



2018-01-11 10:45     dtas-0.15.0 "!binary" in yaml file Rene Maurer
2018-01-11 17:38     ` Eric Wong
2018-01-11 19:43       ` Eric Wong
2018-01-29  0:58 14%     ` [PATCH] player: support guessing encodings for comments Eric Wong
2018-01-30  9:17     [PATCH 0/4] mlib: misc updates Eric Wong
2018-01-30  9:17  8% ` [PATCH 2/4] mlib: remove redundant tag massaging and encoding Eric Wong
2019-01-02 21:35  7% [ANN] dtas 0.16.0 - duct tape audio suite for *nix Eric Wong
