From: Eric Wong <e@80x24.org>
To: spew@80x24.org
Subject: [PATCH] uri optimizations, too ugly
Date: Mon, 22 Dec 2014 21:53:55 +0000
Message-ID: <ugly-uri-optimizations@r48922>
Unfortunately, these make it more difficult to evaluate the effects of
any future compile.c optimizations.
[misc #10628] https://bugs.ruby-lang.org/issues/10628
* use "literal".freeze (compiled to the opt_str_freeze instruction) to
  avoid allocating duplicate strings
* avoid rebuilding the regexp hash on every call to parser.regexp
* reduce bytecode size of conditionals
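A minimal sketch of what the first two items buy, using two hypothetical
classes (PerCallParser and MemoParser are made up for illustration, and
their patterns are placeholders, not the real URI regexps). It assumes
CRuby 2.1 or later, where a "literal".freeze call site hands back the same
frozen String every time, and it mirrors the patch's idea of building the
regexp Hash once in initialize instead of on every call:

```ruby
# Hypothetical classes for illustration only; the patterns are placeholders,
# not the real URI::RFC3986_Parser regexps.

class PerCallParser
  # Returns a brand-new Hash on every call, like the old #regexp.
  def regexp
    { ABS_PATH: %r{\A/}, REL_PATH: %r{\A[^/]} }
  end
end

class MemoParser
  attr_reader :regexp

  def initialize
    # Build the Hash once. Hash#each_value returns the Hash itself,
    # so this freezes every value and then the Hash.
    @regexp = default_regexp.each_value(&:freeze).freeze
  end

  # On CRuby 2.1+, "literal".freeze compiles to opt_str_freeze and
  # returns the same frozen String object on every call.
  def separator
    ':'.freeze
  end

  private

  def default_regexp
    { ABS_PATH: %r{\A/}, REL_PATH: %r{\A[^/]} }
  end
end

per_call = PerCallParser.new
memo = MemoParser.new
p per_call.regexp.equal?(per_call.regexp) # false: rebuilt on each call
p memo.regexp.equal?(memo.regexp)         # true: built once
p memo.separator.equal?(memo.separator)   # true: one shared String
```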
benchmark results:
target 0: 2.1.5 (ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]) at "/home/ew/ruby-2.1/bin/ruby"
target 1: trunk (ruby 2.2.0dev (2014-12-22 trunk 48922) [x86_64-linux]) at "/home/ew/rrrr/b/i/bin/ruby"
target 2: built (ruby 2.2.0dev (2014-12-22 trunk 48922) [x86_64-linux]) at "/home/ew/ruby/b/i/bin/ruby"
-----------------------------------------------------------
raw data:
[["app_uri",
[[0.4858027193695307,
0.48909279331564903,
0.4869739431887865,
0.4856073558330536,
0.49060618318617344,
0.49414661154150963,
0.48784281872212887,
0.4851597473025322,
0.48379900865256786,
0.48618787340819836],
[0.589355481788516,
0.6005589235574007,
0.6023986879736185,
0.586976544931531,
0.6007280834019184,
0.5901837293058634,
0.5893201008439064,
0.5839062985032797,
0.5905469041317701,
0.6007170639932156],
[0.5123783405870199,
0.5250121373683214,
0.5028673857450485,
0.49962601624429226,
0.5074941627681255,
0.5039216671139002,
0.5183564182370901,
0.5083295572549105,
0.5006583165377378,
0.5104942582547665]]]]
Elapsed time: 15.90131543 (sec)
-----------------------------------------------------------
benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name     2.1.5  trunk  built
app_uri  0.484  0.584  0.500
Speedup ratio: compare with the result of `2.1.5' (greater is better)
name     trunk  built
app_uri  0.829  0.968
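The summary can be reproduced from the raw data: the driver keeps the
minimum of the 10 runs for each target, and the speedup ratio is the 2.1.5
minimum divided by each target's minimum. A small check, with the timings
copied verbatim from the raw data above:

```ruby
# Raw app_uri timings (seconds) copied from the raw data above.
RAW = {
  "2.1.5" => [0.4858027193695307, 0.48909279331564903, 0.4869739431887865,
              0.4856073558330536, 0.49060618318617344, 0.49414661154150963,
              0.48784281872212887, 0.4851597473025322, 0.48379900865256786,
              0.48618787340819836],
  "trunk" => [0.589355481788516, 0.6005589235574007, 0.6023986879736185,
              0.586976544931531, 0.6007280834019184, 0.5901837293058634,
              0.5893201008439064, 0.5839062985032797, 0.5905469041317701,
              0.6007170639932156],
  "built" => [0.5123783405870199, 0.5250121373683214, 0.5028673857450485,
              0.49962601624429226, 0.5074941627681255, 0.5039216671139002,
              0.5183564182370901, 0.5083295572549105, 0.5006583165377378,
              0.5104942582547665]
}.freeze

# Fastest of the 10 runs for each target.
mins = RAW.transform_values(&:min)
baseline = mins.fetch("2.1.5")

mins.each do |name, t|
  # Ratio relative to 2.1.5: baseline / target, so greater is better.
  printf("%-8s %.3f  %.3f\n", name, t, baseline / t)
end
```

This prints 0.584 / 0.829 for trunk and 0.500 / 0.968 for built, matching
the tables.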
---
lib/uri/generic.rb | 23 +++++-------
lib/uri/rfc3986_parser.rb | 93 +++++++++++++++++++++++++++--------------------
2 files changed, 63 insertions(+), 53 deletions(-)
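The set_port hunk in generic.rb below folds a nested conditional into one
modifier-unless line. A quick sketch to check that both forms return the
same values; old_port and new_port are hypothetical free-standing copies of
the method bodies, and Integer stands in for Fixnum, which was deprecated
in Ruby 2.4 and removed in 3.2:

```ruby
# Old form from the patch, with Integer in place of Fixnum.
def old_port(v)
  unless !v || v.kind_of?(Integer)
    if v.empty?
      v = nil
    else
      v = v.to_i
    end
  end
  v
end

# New one-liner: the whole assignment is guarded by the modifier unless.
def new_port(v)
  v = v.empty? ? nil : v.to_i unless !v || v.kind_of?(Integer)
  v
end

[nil, false, 80, "", "8080", "abc"].each do |input|
  a = old_port(input)
  b = new_port(input)
  puts "#{input.inspect}: #{a.inspect} #{a == b ? '==' : '!='} #{b.inspect}"
end
```

On CRuby, RubyVM::InstructionSequence.of(method(:new_port)).disasm shows
the compiled instructions if you want to compare the two forms' bytecode;
the exact listing varies between Ruby versions.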
diff --git a/lib/uri/generic.rb b/lib/uri/generic.rb
index c0b94a8..6559cd8 100644
--- a/lib/uri/generic.rb
+++ b/lib/uri/generic.rb
@@ -543,7 +543,7 @@ module URI
# if properly formatted as 'user:password'
def split_userinfo(ui)
return nil, nil unless ui
- user, password = ui.split(/:/, 2)
+ user, password = ui.split(':'.freeze, 2)
return user, password
end
@@ -695,13 +695,7 @@ module URI
# see also URI::Generic.port=
#
def set_port(v)
- unless !v || v.kind_of?(Fixnum)
- if v.empty?
- v = nil
- else
- v = v.to_i
- end
- end
+ v = v.empty? ? nil : v.to_i unless !v || v.kind_of?(Fixnum)
@port = v
end
protected :set_port
@@ -768,13 +762,14 @@ module URI
# If scheme is ftp, path may be relative.
# See RFC 1738 section 3.2.2, and RFC 2396.
- if @scheme && @scheme != "ftp"
- if v && v != '' && parser.regexp[:ABS_PATH] !~ v
+ if @scheme && @scheme != "ftp".freeze
+ if v && v != ''.freeze && parser.regexp[:ABS_PATH] !~ v
raise InvalidComponentError,
"bad component(expected absolute path component): #{v}"
end
else
- if v && v != '' && parser.regexp[:ABS_PATH] !~ v && parser.regexp[:REL_PATH] !~ v
+ if v && v != ''.freeze && parser.regexp[:ABS_PATH] !~ v &&
+ parser.regexp[:REL_PATH] !~ v
raise InvalidComponentError,
"bad component(expected relative path component): #{v}"
end
@@ -849,7 +844,7 @@ module URI
x = v.to_str
v = x.dup if x.equal? v
v.encode!(Encoding::UTF_8) rescue nil
- v.delete!("\t\r\n")
+ v.delete!("\t\r\n".freeze)
v.force_encoding(Encoding::ASCII_8BIT)
v.gsub!(/(?!%\h\h|[!$-&(-;=?-Z_a-~])./n.freeze){'%%%02X'.freeze % $&.ord}
v.force_encoding(Encoding::US_ASCII)
@@ -939,9 +934,9 @@ module URI
x = v.to_str
v = x.dup if x.equal? v
v.encode!(Encoding::UTF_8) rescue nil
- v.delete!("\t\r\n")
+ v.delete!("\t\r\n".freeze)
v.force_encoding(Encoding::ASCII_8BIT)
- v.gsub!(/(?!%\h\h|[!-~])./n){'%%%02X' % $&.ord}
+ v.gsub!(/(?!%\h\h|[!-~])./n){'%%%02X'.freeze % $&.ord}
v.force_encoding(Encoding::US_ASCII)
@fragment = v
end
diff --git a/lib/uri/rfc3986_parser.rb b/lib/uri/rfc3986_parser.rb
index 946f374..3923b06 100644
--- a/lib/uri/rfc3986_parser.rb
+++ b/lib/uri/rfc3986_parser.rb
@@ -4,6 +4,11 @@ module URI
# this regexp is modified not to host is not empty string
RFC3986_URI = /\A(?<URI>(?<scheme>[A-Za-z][+\-.0-9A-Za-z]*):(?<hier-part>\/\/(?<authority>(?:(?<userinfo>(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*)@)?(?<host>(?<IP-literal>\[(?:(?<IPv6address>(?:\h{1,4}:){6}(?<ls32>\h{1,4}:\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g<dec-octet>\.\g<dec-octet>\.\g<dec-octet>))|::(?:\h{1,4}:){5}\g<ls32>|\h{1,4}?::(?:\h{1,4}:){4}\g<ls32>|(?:(?:\h{1,4}:)?\h{1,4})?::(?:\h{1,4}:){3}\g<ls32>|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g<ls32>|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g<ls32>|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g<ls32>|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?<IPvFuture>v\h+\.[!$&-.0-;=A-Z_a-z~]+))\])|\g<IPv4address>|(?<reg-name>(?:%\h\h|[!$&-.0-9;=A-Z_a-z~])+))?(?::(?<port>\d*))?)(?<path-abempty>(?:\/(?<segment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])*))*)|(?<path-absolute>\/(?:(?<segment-nz>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])+)(?:\/\g<segment>)*)?)|(?<path-rootless>\g<segment-nz>(?:\/\g<segment>)*)|(?<path-empty>))(?:\?(?<query>[^#]*))?(?:\#(?<fragment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~\/?])*))?)\z/
RFC3986_relative_ref = /\A(?<relative-ref>(?<relative-part>\/\/(?<authority>(?:(?<userinfo>(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*)@)?(?<host>(?<IP-literal>\[(?<IPv6address>(?:\h{1,4}:){6}(?<ls32>\h{1,4}:\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g<dec-octet>\.\g<dec-octet>\.\g<dec-octet>))|::(?:\h{1,4}:){5}\g<ls32>|\h{1,4}?::(?:\h{1,4}:){4}\g<ls32>|(?:(?:\h{1,4}:){,1}\h{1,4})?::(?:\h{1,4}:){3}\g<ls32>|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g<ls32>|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g<ls32>|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g<ls32>|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?<IPvFuture>v\h+\.[!$&-.0-;=A-Z_a-z~]+)\])|\g<IPv4address>|(?<reg-name>(?:%\h\h|[!$&-.0-9;=A-Z_a-z~])+))?(?::(?<port>\d*))?)(?<path-abempty>(?:\/(?<segment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])*))*)|(?<path-absolute>\/(?:(?<segment-nz>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])+)(?:\/\g<segment>)*)?)|(?<path-noscheme>(?<segment-nz-nc>(?:%\h\h|[!$&-.0-9;=@-Z_a-z~])+)(?:\/\g<segment>)*)|(?<path-empty>))(?:\?(?<query>[^#]*))?(?:\#(?<fragment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~\/?])*))?)\z/
+ attr_reader :regexp
+
+ def initialize
+ @regexp = default_regexp.each_value(&:freeze).freeze
+ end
def split(uri) #:nodoc:
begin
@@ -11,42 +16,52 @@ module URI
rescue NoMethodError
raise InvalidURIError, "bad URI(is not URI?): #{uri}"
end
- unless uri.ascii_only?
+ uri.ascii_only? or
raise InvalidURIError, "URI must be ascii only #{uri.dump}"
- end
if m = RFC3986_URI.match(uri)
- ary = []
- ary << m["scheme"]
- if m["path-rootless"] # opaque
- ary << nil # userinfo
- ary << nil # host
- ary << nil # port
- ary << nil # registry
- ary << nil # path
- ary << m["path-rootless"]
- ary[-1] << '?' << m["query"] if m["query"]
- ary << nil # query
- ary << m["fragment"]
+ query = m["query".freeze]
+ scheme = m["scheme".freeze]
+ opaque = m["path-rootless".freeze]
+ if opaque
+ opaque << "?#{query}" if query
+ [ scheme,
+ nil, # userinfo
+ nil, # host
+ nil, # port
+ nil, # registry
+ nil, # path
+ opaque,
+ nil, # query
+ m["fragment".freeze]
+ ]
else # normal
- ary << m["userinfo"]
- ary << m["host"]
- ary << m["port"]
- ary << nil # registry
- ary << (m["path-abempty"] || m["path-absolute"] || m["path-empty"])
- ary << nil # opaque
- ary << m["query"]
- ary << m["fragment"]
+ [ scheme,
+ m["userinfo".freeze],
+ m["host".freeze],
+ m["port".freeze],
+ nil, # registry
+ (m["path-abempty".freeze] ||
+ m["path-absolute".freeze] ||
+ m["path-empty".freeze]),
+ nil, # opaque
+ query,
+ m["fragment".freeze]
+ ]
end
elsif m = RFC3986_relative_ref.match(uri)
- ary = [nil]
- ary << m["userinfo"]
- ary << m["host"]
- ary << m["port"]
- ary << nil # registry
- ary << (m["path-abempty"] || m["path-absolute"] || m["path-noscheme"] || m["path-empty"])
- ary << nil # opaque
- ary << m["query"]
- ary << m["fragment"]
+ [ nil, # scheme
+ m["userinfo".freeze],
+ m["host".freeze],
+ m["port".freeze],
+ nil, # registry
+ (m["path-abempty".freeze] ||
+ m["path-absolute".freeze] ||
+ m["path-noscheme".freeze] ||
+ m["path-empty".freeze]),
+ nil, # opaque
+ m["query".freeze],
+ m["fragment".freeze]
+ ]
else
raise InvalidURIError, "bad URI(is not URI?): #{uri}"
end
@@ -55,11 +70,11 @@ module URI
def parse(uri) # :nodoc:
scheme, userinfo, host, port,
registry, path, opaque, query, fragment = self.split(uri)
-
- if scheme && URI.scheme_list.include?(scheme.upcase)
- URI.scheme_list[scheme.upcase].new(scheme, userinfo, host, port,
- registry, path, opaque, query,
- fragment, self)
+ scheme_list = URI.scheme_list
+ if scheme && scheme_list.include?(uc = scheme.upcase)
+ scheme_list[uc].new(scheme, userinfo, host, port,
+ registry, path, opaque, query,
+ fragment, self)
else
Generic.new(scheme, userinfo, host, port,
registry, path, opaque, query,
@@ -78,7 +93,9 @@ module URI
@@to_s.bind(self).call
end
- def regexp
+ private
+
+ def default_regexp # :nodoc:
{
SCHEME: /\A[A-Za-z][A-Za-z0-9+\-.]*\z/,
USERINFO: /\A(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*\z/,
@@ -92,8 +109,6 @@ module URI
}
end
- private
-
def convert_to_uri(uri)
if uri.is_a?(URI::Generic)
uri
--
EW