From: Eric Wong <e@80x24.org>
To: <olddoc-public@80x24.org>
Subject: [PATCH] add man2html generator
Date: Thu, 12 Dec 2019 20:01:14 +0000
Message-ID: <20191212200114.11738-1-e@80x24.org>

The HTML that man2html(1) and groff(1) generate isn't
anchor-compatible with what pandoc(1) generated. Both also
add too much styling for my liking.
---
 bin/olddoc             |   4 +-
 lib/olddoc.rb          |   1 +
 lib/olddoc/man2html.rb | 149 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 lib/olddoc/man2html.rb

diff --git a/bin/olddoc b/bin/olddoc
index f7e80fb..986cc87 100755
--- a/bin/olddoc
+++ b/bin/olddoc
@@ -1,5 +1,5 @@
 #!/usr/bin/env ruby
-# Copyright (C) 2015, all contributors <olddoc-public@80x24.org>
+# Copyright (C) 2015,2019 all contributors <olddoc-public@80x24.org>
 $stderr.sync = $stdout.sync = true
 tasks = %w(prepare merge)
 usage = "Usage: #{File.basename($0)} [#{tasks.join('|')}]"
@@ -10,6 +10,8 @@ when "prepare"
   Olddoc::Prepare.new(opts).run
 when "merge"
   Olddoc::Merge.new(opts).run
+when "man2html"
+  Olddoc::Man2HTML.new(opts).run(ARGV[1..-1])
 else
   warn "#{$0.inspect} #{ARGV.inspect} not understood"
   abort usage
diff --git a/lib/olddoc.rb b/lib/olddoc.rb
index d5d6b37..e4cd344 100644
--- a/lib/olddoc.rb
+++ b/lib/olddoc.rb
@@ -8,6 +8,7 @@ module Olddoc # :nodoc:
   autoload :NewsRdoc, 'olddoc/news_rdoc'
   autoload :Prepare, 'olddoc/prepare'
   autoload :Readme, 'olddoc/readme'
+  autoload :Man2HTML, 'olddoc/man2html'
 
   def self.config(path = ".olddoc.yml")
     File.readable?(path) and return YAML.load(File.read(path))
diff --git a/lib/olddoc/man2html.rb b/lib/olddoc/man2html.rb
new file mode 100644
index 0000000..82254d2
--- /dev/null
+++ b/lib/olddoc/man2html.rb
@@ -0,0 +1,149 @@
+# Copyright (C) 2019 all contributors <olddoc-public@80x24.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+# frozen_string_literal: true
+require 'digest'
+require 'optparse'
+
+# linkifier for manpages rendered to a terminal. man2html(1) and
+# groff generate too much style
+
+class Olddoc::Man2HTML # :nodoc:
+  SALT = rand
+  LINK_RE = %r{([\('!])?\b((?:ftps?|https?|nntps?|gopher)://
+     [\@:\w\.-]+(?:/
+     (?:[a-z0-9\-\._~!\$\&\';\(\)\*\+,;=:@/%]*)
+     (?:\?[a-z0-9\-\._~!\$\&\';\(\)\*\+,;=:@/%]+)?
+     (?:\#[a-z0-9\-\._~!\$\&\';\(\)\*\+,;=:@/%\?]+)?
+     )?
+    )}xi
+
+  PAIRS = {
+    "(" => %r/(\)[\.,;\+]?)\z/, # Markdown (,), Ruby (+) (, for arrays)
+    "'" => %r/('[\.,;\+]?)\z/, # Perl / Ruby
+    "!" => %r/(![\.,;\+]?)\z/, # Perl / Ruby
+  }
+
+  def initialize(opts) # :nodoc:
+  end
+
+  def run(argv) # :nodoc:
+    out = $stdout
+    OptionParser.new("", 24, ' ') do |opts|
+      opts.on('-o', '--output PATH', 'output to given file') { |path|
+        out = File.open(path, 'w')
+      }
+      opts.parse!(argv)
+    end
+    argv[0] or abort 'manpage required'
+    cols = '72'
+    env = ENV.to_hash
+    env.merge!({ 'COLUMNS' => cols, 'MANWIDTH' => cols, 'TERM' => 'dumb' })
+
+    # note: I don't care for the styles groff and man2html throw
+    # on us, I just want indented and wrapped text with <a hrefs>
+    # for URLs.
+
+    # try man-db options, first:
+    str = IO.popen(env, ['man', '--nh', '--nj', *argv], &:read)
+
+    if str.empty? || !$?.success?
+      str = IO.popen(env, ['man', *argv], &:read)
+    end
+    if $?.success?
+      sections = '[A-Z][A-Z ]+'
+      str = str.split(/^(#{sections})$/mo)
+
+      str = str.map! do |s|
+        case s
+        when /\A(#{sections})$/o
+          # this is to be compatible with HTML fragments pandoc used
+          sec = $1
+          anchor = sec.downcase.tr(' ', '-')
+          "<h1\nid=#{anchor.encode(xml: :attr)}>#{sec}</h1>"
+        else
+          state = linkify_1(s)
+          s.encode!(xml: :text)
+          linkify_2(state, s)
+          s.rstrip!
+          s.empty? ? '' : "<pre>#{s}</pre>"
+        end
+      end.join
+
+      out.print(str)
+
+      # use mtime of the original source
+      if out.is_a?(File) # not respond_to?(:path): $stdout has IO#path on Ruby 3.2+
+        path = out.path
+        out.close
+        stat = src_input_stat(argv)
+        File.utime(stat.atime, stat.mtime, path) if stat
+      end
+    end
+  end
+
+  def src_input_stat(argv)
+    argv.reverse_each do |f|
+      next unless File.file?(f)
+      return File.stat(f)
+    end
+
+    argv.reverse_each do |f|
+      path = IO.popen(%W(man -w #{f}), &:read)
+      path.chomp!
+      next unless File.file?(path)
+      return File.stat(path)
+    end
+    nil
+  end
+
+  def linkify_1(str) # :nodoc:
+    state = {}
+    str.gsub!(LINK_RE) do
+      head = $1 || ''
+      url = $2.dup
+      tail = ''.dup
+
+      # it's fairly common to end URLs in messages with
+      # '.', ',' or ';' to denote the end of a statement;
+      # assume the intent was to end the statement/sentence
+      # in English
+      if re = PAIRS[head]
+        url.sub!(re, '')
+        tail = $1
+      elsif url.sub!(/(\))?([\.,;])\z/, '')
+        tail = $2
+        # require ')' to be paired with '('
+        if $1 # ')'
+          if url.index('(').nil?
+            tail = ")#{tail}"
+          else
+            url += ')'
+          end
+        end
+      elsif url !~ /\(/ && url.sub!(/\)\z/, '')
+        tail = ')'
+      end
+
+      # salt this, as this could be exploited to show
+      # links in the HTML which don't show up in the raw mail.
+      key = Digest::MD5.hexdigest("#{url}#{SALT}").freeze
+      state[key] = url
+      "#{head}OLD-LINK-#{key}#{tail}"
+    end
+    state
+  end
+
+  def linkify_2(state, str) # :nodoc:
+    # Added "OLD-LINK-" prefix to avoid false-positives on git commits
+    str.gsub!(/\bOLD-LINK-([a-f0-9]{32})\b/) do
+      key = $1
+      url = state[key]
+      if url
+        %Q{<a\nhref=#{url.encode(xml: :attr)}>#{url.encode(xml: :text)}</a>}
+      else
+        # false positive or somebody tried to mess with us
+        key
+      end
+    end
+  end
+end
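
For anyone reviewing the linkify_1/linkify_2 split above, here is a
minimal standalone sketch of the same two-pass idea: stash each URL
behind a salted placeholder, XML-escape the text, then swap the
placeholders back for <a> tags. It is not part of the patch. The
names URL_RE, stash_urls, restore_urls and linkify are hypothetical,
and URL_RE is a deliberately simplified stand-in for LINK_RE, so it
does none of the trailing-punctuation handling that PAIRS provides.

```ruby
require 'digest'

# Per-process salt, as in the patch, so placeholders cannot be
# predicted and forged inside the input text.
SALT = rand

# Simplified URL matcher (assumption): much looser than LINK_RE.
URL_RE = %r{\bhttps?://[^\s<>"]+}

# Pass 1: replace every URL with an opaque placeholder and remember
# the original URL in +state+.
def stash_urls(str, state)
  str.gsub(URL_RE) do
    url = $&
    key = Digest::MD5.hexdigest("#{url}#{SALT}")
    state[key] = url
    "OLD-LINK-#{key}"
  end
end

# Pass 2: runs on already-escaped text and turns known placeholders
# into anchors; unknown keys are emitted without the prefix.
def restore_urls(str, state)
  str.gsub(/\bOLD-LINK-([a-f0-9]{32})\b/) do
    key = $1
    url = state[key]
    if url
      %Q{<a\nhref=#{url.encode(xml: :attr)}>#{url.encode(xml: :text)}</a>}
    else
      key
    end
  end
end

def linkify(text)
  state = {}
  escaped = stash_urls(text, state).encode(xml: :text)
  restore_urls(escaped, state)
end

puts linkify("see <http://example.com/?a=1&b=2> now")
```

Escaping between the two passes is what keeps the surrounding "<"
and ">" as text while the URL itself is escaped exactly once, for
the attribute and for the link text.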