80x24.org misc. Free Software, open data formats/protocols discussion
From: Eric Wong <e@80x24.org>
To: misc@80x24.org
Subject: disk might be cheap, everything else isn't
Date: Wed, 16 Sep 2015 20:16:33 +0000
Message-ID: <20150916-disk-might-be-cheap@everything-else-isnt>

TL;DR: Compress your data, and do it early.

Disk latency is high.
Disks (including SSDs) wear out faster the more you write to them.
Memory (for cache) is expensive.
Memory bandwidth is expensive.
Memory latency is high.
Network bandwidth is expensive.
Network latency is high.
Storage bus bandwidth (SAS, SATA, USB, etc.) is expensive.
Storage bus latency sucks.

The CPU overhead of common zlib-based compression is cheap compared
to any of these.
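To get a feel for that CPU overhead on your own machine, here is a
minimal sketch using Python's standard zlib module.  The sample
payload (repetitive, log-like lines) and the levels tried are
assumptions; real ratios and timings depend entirely on your data
and hardware.

```python
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int = 6) -> Tuple[float, float]:
    """Compress data once; return (compression ratio, seconds spent)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check so we never report numbers for a broken stream.
    if zlib.decompress(packed) != data:
        raise ValueError("zlib round trip failed")
    return len(data) / len(packed), elapsed


if __name__ == "__main__":
    # Hypothetical sample: repetitive, log-like text.
    sample = b"".join(
        b"2015-09-16 20:16:%02d GET /misc/ HTTP/1.1 200\n" % (i % 60)
        for i in range(100_000)
    )
    for level in (1, 6, 9):
        ratio, secs = measure(sample, level)
        print(f"level {level}: ratio {ratio:.1f}:1 in {secs * 1000:.1f} ms")
```

Level 6 is zlib's default; lower levels trade ratio for speed, which
is usually the right trade when the goal is saving I/O.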

Everything stored on disk is expected to be read back at some
point.  On just about any OS, reading it uses memory (for the page
cache) and memory bandwidth.  Memory used for caching is not cheap,
and neither are memory bandwidth and latency.  Compressed data
takes less of all of them.
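To make "compress early" concrete, a sketch using Python's gzip
module so that only compressed bytes are ever written; consequently
only compressed bytes are read back from disk and held in the OS
page cache, and decompression happens in the reading process.  The
file name and record format are made up for illustration.

```python
import gzip
import os
import tempfile
from typing import List


def write_compressed(path: str, lines: List[str]) -> None:
    # Text mode encodes and compresses as we write, so the
    # uncompressed form never touches the disk.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.writelines(lines)


def read_compressed(path: str) -> List[str]:
    # Only compressed bytes come off the disk (and into the page
    # cache); decompression happens here, in this process.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.readlines()


if __name__ == "__main__":
    lines = [f"record {i % 100}: status=ok\n" for i in range(50_000)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "records.txt.gz")  # hypothetical name
        write_compressed(path, lines)
        raw = sum(len(s.encode("utf-8")) for s in lines)
        print(f"{raw} bytes of text, {os.path.getsize(path)} on disk")
        assert read_compressed(path) == lines
```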

Sure, one could use O_DIRECT to avoid the caching, but it is an
interface designed by deranged monkeys[2], it is tricky to use
(buffers, offsets and lengths must be suitably aligned), and most
apps need to be modified to use it.
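To illustrate why O_DIRECT is tricky: on Linux the buffer, file
offset and transfer length generally all have to be aligned to the
underlying block size, and some filesystems reject the flag
outright.  The minimal sketch below assumes 4096-byte alignment (the
real requirement varies by device and filesystem) and falls back to
an ordinary cached read whenever O_DIRECT is missing or refused.

```python
import mmap
import os
import tempfile

ALIGN = 4096  # assumed block/page alignment; real requirements vary


def _read_aligned(path: str, flags: int, size: int) -> bytes:
    """Read up to size bytes from path into a page-aligned buffer."""
    # Round the transfer length up to whole aligned blocks (at least one).
    length = max(ALIGN, -(-size // ALIGN) * ALIGN)
    fd = os.open(path, flags)
    try:
        # Anonymous mmap memory is page-aligned, which satisfies the
        # buffer-alignment rule without manual pointer arithmetic.
        with mmap.mmap(-1, length) as buf, memoryview(buf) as view:
            got = 0
            while got < size:
                n = os.readv(fd, [view[got:]])
                if n == 0:  # file shrank underneath us
                    break
                got += n
            return bytes(view[:got])
    finally:
        os.close(fd)


def read_uncached(path: str) -> bytes:
    """Read a whole file, bypassing the page cache via O_DIRECT if possible."""
    size = os.path.getsize(path)
    direct = getattr(os, "O_DIRECT", 0)  # O_DIRECT is Linux-specific
    if direct:
        try:
            return _read_aligned(path, os.O_RDONLY | direct, size)
        except OSError:
            pass  # e.g. EINVAL: flag or alignment not accepted here
    # Fallback: an ordinary read that goes through the page cache.
    return _read_aligned(path, os.O_RDONLY, size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")  # hypothetical file
        payload = os.urandom(10_000)  # deliberately not a multiple of ALIGN
        with open(path, "wb") as f:
            f.write(payload)
        assert read_uncached(path) == payload
```

Note how much ceremony a plain "read this file" needs once the cache
is bypassed; that is the modification cost most apps would pay.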


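Transparent compression at the filesystem or virtual memory[1]
layers helps locally, but the benefit is lost once your data needs
to be transferred to other machines: it is decompressed when read,
so it crosses the network uncompressed, and machines which do not
compress transparently store it uncompressed, too.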
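Compressing in the application, before the data leaves the process,
keeps the savings on the wire.  A minimal sketch with Python's zlib
streaming API; a local socketpair() stands in for a real network
connection, and the chunk size is an arbitrary assumption.

```python
import socket
import threading
import zlib

CHUNK = 64 * 1024  # arbitrary receive size


def send_compressed(sock: socket.socket, chunks) -> int:
    """Compress chunks as they are produced; return bytes actually sent."""
    comp = zlib.compressobj()
    sent = 0
    for chunk in chunks:
        out = comp.compress(chunk)  # may buffer internally and return b""
        if out:
            sock.sendall(out)
            sent += len(out)
    out = comp.flush()  # emit whatever is still buffered
    sock.sendall(out)
    sent += len(out)
    sock.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
    return sent


def recv_decompressed(sock: socket.socket) -> bytes:
    """Receive until the peer shuts down, decompressing incrementally."""
    decomp = zlib.decompressobj()
    parts = []
    while True:
        data = sock.recv(CHUNK)
        if not data:
            break
        parts.append(decomp.decompress(data))
    parts.append(decomp.flush())
    return b"".join(parts)


if __name__ == "__main__":
    chunks = [b"GET /misc/ HTTP/1.1\r\n" * 500] * 20
    left, right = socket.socketpair()
    result = {}
    # Send from a thread so a full socket buffer cannot deadlock us.
    sender = threading.Thread(
        target=lambda: result.update(sent=send_compressed(left, chunks)))
    sender.start()
    received = recv_decompressed(right)
    sender.join()
    left.close()
    right.close()
    assert received == b"".join(chunks)
    print(f"{len(received)} bytes delivered, {result['sent']} on the wire")
```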


As a bonus, compression formats such as FLAC and gzip tend to come
with integrity checking, giving you extra peace of mind when you
have unreliable hardware.
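For gzip, the integrity check is a CRC-32 of the uncompressed data
plus its length, stored little-endian in the last 8 bytes of each
member (RFC 1952).  A small sketch showing that trailer and that
Python's gzip module refuses data whose check fails; the helper name
gzip_is_intact is made up for illustration.

```python
import gzip
import struct
import zlib


def gzip_is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses and passes gzip's CRC/length check."""
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (an OSError) covers CRC mismatches; the others
        # cover truncated or mangled deflate data.
        return False
    return True


if __name__ == "__main__":
    data = b"important bytes\n" * 1000
    blob = gzip.compress(data)
    # Trailer: CRC-32 then size mod 2**32, both little-endian.
    crc, size = struct.unpack("<II", blob[-8:])
    assert crc == zlib.crc32(data) and size == len(data)

    damaged = bytearray(blob)
    damaged[-8] ^= 0x01  # simulate one flipped bit in the stored CRC
    print(gzip_is_intact(blob), gzip_is_intact(bytes(damaged)))  # True False
```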


Sometimes compression does not even require special algorithms or
code.  It can be as simple as choosing tabs over spaces for
indentation, which gave a 16% improvement in grep performance in
the test below :)

   http://mid.gmane.org/20071018024553.GA5186@coredump.intra.peff.net
   ("Re: On Tabs and Spaces" - Jeff King on the git mailing list)
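The saving from tabs is simply fewer bytes to store, cache and scan.
A toy sketch, assuming 8-column tab stops and indentation made only
of leading spaces, that retabs text and counts the bytes saved; it
illustrates the byte count only and does not reproduce the grep
measurement above.

```python
def retab(line: str, width: int = 8) -> str:
    """Replace leading spaces with tabs, assuming tab stops every `width` columns."""
    stripped = line.lstrip(" ")
    n = len(line) - len(stripped)
    return "\t" * (n // width) + " " * (n % width) + stripped


def indent_savings(text: str, width: int = 8) -> int:
    """Bytes saved by retabbing every line of text."""
    retabbed = "".join(retab(ln, width) for ln in text.splitlines(keepends=True))
    return len(text.encode("utf-8")) - len(retabbed.encode("utf-8"))


if __name__ == "__main__":
    src = (
        "int main(int argc, char **argv)\n"
        "{\n"
        "        if (argc > 1)\n"
        "                return 1;\n"
        "        return 0;\n"
        "}\n"
    )
    print(indent_savings(src))  # prints 28
```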


Footnotes:
[1] https://en.wikipedia.org/wiki/Virtual_memory_compression
[2] http://man7.org/linux/man-pages/man2/open.2.html
