* [PATCH] dtas-archive: paranoid archival script
@ 2015-04-07 7:47 3% Eric Wong
2015-09-18 8:54 7% ` Eric Wong
0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2015-04-07 7:47 UTC (permalink / raw)
To: dtas-all
This archives audio files (typically .wav from a portable devices)
as FLAC and performs a best-effort verification the file was
transferred succesfully without bit errors by dropping kernel caches
and rechecking the result.
---
Documentation/GNUmakefile | 1 +
Documentation/dtas-archive.txt | 61 ++++++++++++++
bin/dtas-archive | 187 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 249 insertions(+)
create mode 100644 Documentation/dtas-archive.txt
create mode 100755 bin/dtas-archive
diff --git a/Documentation/GNUmakefile b/Documentation/GNUmakefile
index 4f44cdc..537c659 100644
--- a/Documentation/GNUmakefile
+++ b/Documentation/GNUmakefile
@@ -18,6 +18,7 @@ m1 += dtas-sinkedit
m1 += dtas-sourceedit
m1 += dtas-tl
m1 += dtas-splitfx
+m1 += dtas-archive
m7 =
m7 += dtas-player_protocol
diff --git a/Documentation/dtas-archive.txt b/Documentation/dtas-archive.txt
new file mode 100644
index 0000000..9f2fe04
--- /dev/null
+++ b/Documentation/dtas-archive.txt
@@ -0,0 +1,61 @@
+% dtas-archive(1) dtas user manual
+%
+
+# NAME
+
+dtas-archive - paranoid audio file copy
+
+# SYNOPSYS
+
+dtas-archive [OPTIONS] SOURCE DESTINATION
+
+# DESCRIPTION
+
+dtas-archive is intended for archiving audio data to/from laptops and
+computers without ECC memory, attempting to read data multiple times in
+an attempt to detect memory corruption. dtas-archive may only be
+effective on machines running the Linux kernel where posix_fadvise(2)
+can be used to drop caches for a particular file after fsync(2).
+
+dtas-archive spawns sox(1) to archive audio data (likely uncompressed
+WAVE) to FLAC and verifies the result using sndfile-cmp(1), a tool
+implemented by different than sox and less likely to share the same bugs
+(if any) as sox.
+
+# OPTIONS
+
+-j, \--jobs [JOBS]
+: Number of jobs to run in parallel. Incrementing this may hurt
+ performance on slow storage devices. Default: 1
+
+-n, \--dry-run
+: Print, but do not run the commands to be executed
+
+-s, \--quiet, \--silent
+: Silent operation, commands are not printed as executed
+
+-S, \--stats
+: Run and save the text output of the sox "stats" effect as
+ $DESTINATION_FILE_WITHOUT_SUFFIX.stats next to the output file
+
+-k, \--keep-going
+: Continue after error
+
+-r, \--repeat [COUNT]
+: Number of times to repeat the sndfile-cmp(1) check. Default: 1
+
+# COPYRIGHT
+
+Copyright 2015 all contributors <dtas-all@nongnu.org>.\
+License: GPLv3 or later <http://www.gnu.org/licenses/gpl-3.0.txt>
+
+# CONTACT
+
+All feedback welcome via plain-text mail to: <dtas-all@nongnu.org>\
+Mailing list archives available at <http://80x24.org/dtas-all/> and
+<ftp://lists.gnu.org/dtas-all/>\
+No subscription is necessary to post to the mailing list.
+
+# SEE ALSO
+
+sndfile-cmp(1), sox(1)
diff --git a/bin/dtas-archive b/bin/dtas-archive
new file mode 100755
index 0000000..69fc40e
--- /dev/null
+++ b/bin/dtas-archive
@@ -0,0 +1,187 @@
+#!/usr/bin/env ruby
+# Copyright (C) 2015 all contributors <dtas-all@nongnu.org>
+# License: GPLv3 or later (https://www.gnu.org/licenses/gpl-3.0.txt)
+usage = "#$0 SOURCE DESTINATION"
+
+# We could use the equivalent sox command here, but some folks working on
+# dtas is more likely to write patches for sox (and thus introduce bugs
+# into it), so we'll use sndfile-cmp as it lives in a different source tree
+%w(sndfile-cmp sox).each do |cmd|
+ `which #{cmd} 2>/dev/null`.chomp.empty? and abort "#{cmd} not found in PATH"
+end
+
+RUBY_PLATFORM =~ /linux/ or
+ warn "#$0 is unproven without Linux kernel fadvise semantics"
+have_advise = IO.instance_methods.include?(:advise)
+have_advise or warn "#$0 does not work reliably without IO#advise support"
+
+require 'shellwords'
+require 'fileutils'
+require 'find'
+require 'optparse'
+Thread.abort_on_exception = true
+dry_run = false
+silent = false
+type = 'flac'
+jobs = 1
+repeat = 1
+stats = false
+keep_going = false
+
+OptionParser.new('', 24, ' ') do |op|
+ op.banner = usage
+ op.on('-t', '--type [TYPE]', 'FILE-TYPE (default: flac)') { |t| type = t }
+ op.on('-j', '--jobs [JOBS]', Integer) { |j| jobs = j }
+ op.on('-S', '--stats', 'save stats on the file') { stats = true }
+ op.on('-k', '--keep-going', 'continue after error') { keep_going = true }
+ op.on('-n', '--dry-run', 'only print commands, do not run them') do
+ dry_run = true
+ end
+ op.on('-r', '--repeat [COUNT]', 'number of times to check', Integer) do |r|
+ repeat = r
+ end
+ op.on('-s', '--quiet', '--silent') { silent = true }
+ op.on('-h', '--help') do
+ puts(op.to_s)
+ exit
+ end
+ op.parse!(ARGV)
+end
+
+dst = ARGV.pop
+src = ARGV.dup
+
+FileUtils.mkpath(dst) unless File.exist?(dst)
+src_files = Hash.new { |h,dest_dir| h[dest_dir] = [] }
+
+src.each do |s|
+ src_st = File.stat(s)
+ if src_st.directory?
+ Find.find(s) do |path|
+ File.file?(path) or next
+ dir = File.dirname(path)
+ dir_st = File.stat(dir)
+ if dir_st.ino == src_st.ino && dir_st.dev == src_st.dev
+ src_files['.'] << path
+ else
+ dir = File.basename(File.dirname(path))
+ src_files[dir] << path
+ end
+ end
+ else
+ src_files['.'] << s
+ end
+end
+
+pairs = []
+type = ".#{type}" unless type.start_with?('.')
+
+src_files.each do |dir, files|
+ dir = dir == '.' ? dst : File.join(dst, dir)
+ if dry_run || !silent
+ puts "mkdir -p #{Shellwords.escape(dir)}"
+ end
+ FileUtils.mkpath(dir) unless dry_run
+
+ files.each do |path|
+ base = File.basename(path).sub(/\.[^\.]+\z/, type)
+ out = File.join(dir, base)
+ pairs << [ path, out ]
+ end
+end
+
+mtx = Mutex.new # protects fails and pairs
+fails = []
+mismatches = []
+
+on_fail = lambda do |job, status|
+ mtx.synchronize do
+ pairs.clear unless keep_going
+ fails << [ job, status ]
+ end
+end
+
+on_mismatch = lambda do |job, status|
+ mtx.synchronize do
+ mismatches << [ job, status ]
+ end
+end
+
+exiting = false
+%w(INT TERM).each do |s|
+ trap(s) do
+ warn "Caught SIG#{s}, stopping gracefully..."
+ exiting = true
+ trap(s, 'DEFAULT') # non-graceful if signaled again
+ end
+end
+
+thrs = jobs.times.map do |i|
+ Thread.new do
+ while job = mtx.synchronize { pairs.shift }
+ break if exiting
+
+ input, output = *job
+
+ unless system('soxi', '-s', input, out: IO::NULL, err: IO::NULL)
+ warn "skipping #{input.inspect}, not an audio file"
+ next
+ end
+
+ stats_out = "#{output.sub(/\.[\.]+\z/, '')}.stats" if stats
+
+ if dry_run || !silent
+ names = job.map { |x| Shellwords.escape(x) }
+ cmd = [ 'sox', *names ]
+ if stats
+ cmd << 'stats'
+ cmd << "2>#{Shellwords.escape(stats_out)}"
+ end
+
+ puts cmd.join(' ')
+ cmpcmd = "sndfile-cmp #{names[0]} #{names[1]}"
+ if dry_run
+ puts cmpcmd
+ next
+ end
+ end
+
+ cmd = [ 'sox', input, output ]
+ if stats
+ cmd << 'stats'
+ cmd = [ *cmd, { err: stats_out } ]
+ end
+ system(*cmd) or on_fail.call(job, $?)
+
+ # clear kernel caches, this relies on Linux behavior
+ repeat.times do
+ if have_advise
+ th = Thread.new { File.open(input) { |fp| fp.advise(:dontneed) } }
+ File.open(output, 'ab') do |fp|
+ fp.fsync
+ fp.advise(:dontneed)
+ end
+ th.join
+ end
+
+ puts cmpcmd unless silent
+ system('sndfile-cmp', input, output) or on_mismatch.call(job, $?)
+ end
+ st = File.stat(input)
+ File.utime(st.atime, st.mtime, output)
+ end
+ end
+end
+
+thrs.each(&:join)
+ok = true
+fails.each do |job, status|
+ $stderr.puts "#{job.inspect} failed: #{status.inspect}"
+ ok = false
+end
+mismatches.each do |job, status|
+ $stderr.puts "#{job.inspect} mismatched: #{status.inspect}"
+ ok = false
+end
+
+exit ok
--
EW
^ permalink raw reply related [relevance 3%]
* [ANN] dtas 0.10.0 - major features and small fixes
@ 2015-04-13 6:20 5% Eric Wong
0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2015-04-13 6:20 UTC (permalink / raw)
To: dtas-all
Bug fixes:
* Exported INFILE environment variable is always shell-escaped
This prevent screw-ups when users are using funky filenames.
* dtas-player: enqueued commands cannot use audio format bypass
(the audio format cannot be known ahead-of-time from raw commands)
* YAML omap (ordered map) is explicitly used for all env hashes for
user editing. Normal (unordered hashes) are still allowed if loading
existing files. This does not affect Ruby 1.9+ users, but allows
easier processing for users of other languages.
New features (all platforms):
* dtas-player now plays dtas-splitfx YAML files support cue sheet
emulation based on the track list. Under Linux[1], changes to
the YAML file are reflected in real-time as the file is edited
and saved in an $EDITOR. This feature is useful for dialing
in EQ, compressor, and limiter effects on tracks.
* dtas-player supports the "source restart" command for restarting
playback on modified files for systems without inotify support.
* dtas-splitfx now exports the INDIR and INBASE environment variables
which are intended to act like `$(@D)' and `$(@F)' in GNU make(1).
It should ease managing temporary files for some effects
(e.g. noiseprof + noisered in sox)
* dtas-console supports '!' and '@' hotkeys keys for moving within
files with embedded cue sheets.
* dtas-player supports the "trim" command to focus on a particular
portion of a track. It may be useful when combined with the existing
"tl repeat" command for dialing in audio editing parameters
(via a splitfx YAML file):
To continuously repeat a 5 second part of the current track starting
at 1 minute into the track:
dtas-ctl tl repeat 1 && dtas-ctl trim 1:00 5
Passing "off" as the parameter disables trim:
dtas-ctl trim off
* dtas-env(7) manpage added for common environment variables across
the suite
* dtas-sinkedit shows default parameters in addition to user-changed
parameters
New features (Linux-only)
* dtas-sourceedit and dtas-sinkedit support inotify[1] when editing
the YAML text file. This allows real-time updates on $EDITOR
file save as the user edits the parameters of the commands used
for decoding and playback.
* dtas-archive - paranoid archival script for copying and (re-reading)
files. This is useful when transferring files from removable devices
to computers without ECC memory (or any other bit errors in transport
before main memory is accessed). This requires Ruby 1.9.3 or later
(no 3rd-party RubyGems) on Linux for IO#advise support.
There are also many internal cleanups and more work-in-progress
for dtas-splitfx features.
[1] feature requires the sleepy_penguin RubyGem to be installed.
Upgrade or install "dtas" via RubyGems or via tarball at:
http://dtas.80x24.org/2015/dtas-0.10.0.tar.gz
^ permalink raw reply [relevance 5%]
* Re: [PATCH] dtas-archive: paranoid archival script
2015-04-07 7:47 3% [PATCH] dtas-archive: paranoid archival script Eric Wong
@ 2015-09-18 8:54 7% ` Eric Wong
0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2015-09-18 8:54 UTC (permalink / raw)
To: dtas-all
Eric Wong <e@80x24.org> wrote:
> +dtas-archive is intended for archiving audio data to/from laptops and
> +computers without ECC memory, attempting to read data multiple times in
> +an attempt to detect memory corruption. dtas-archive may only be
> +effective on machines running the Linux kernel where posix_fadvise(2)
> +can be used to drop caches for a particular file after fsync(2).
>
> +dtas-archive spawns sox(1) to archive audio data (likely uncompressed
> +WAVE) to FLAC and verifies the result using sndfile-cmp(1), a tool
> +implemented by different than sox and less likely to share the same bugs
> +(if any) as sox.
Fwiw, I'm VERY happy to note this script actually just detected an error
the kernel + ECC RAM did not notice. The sox wav-to-flac copy was
corrupted, and sndfile-cmp caught the error after dropping the cache and
re-comparing. I tried remounting the device just in case, but running
sndfile-cmp by hand reproduced the error.
Again, this was from my workstation with ECC memory, even, so the error
probably happened on the USB or device level. I was just copying a WAV
file recording from a MicroSD card off a USB card reader.
Fwiw, I've been doing this cache-dropping + sndfile-cmp dance for nearly
3 years, now (with a different, unpublished script) now and this is my
first time detecting a real error.
I haven't listened, but stats off the corrupt copy did indicate
clipping.
So the paranoia with verifying copies really is justified :>
^ permalink raw reply [relevance 7%]
Results 1-3 of 3 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2015-04-13 6:20 5% [ANN] dtas 0.10.0 - major features and small fixes Eric Wong
2015-04-07 7:47 3% [PATCH] dtas-archive: paranoid archival script Eric Wong
2015-09-18 8:54 7% ` Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/dtas.git/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).