80x24.org misc. Free Software, open data formats/protocols discussion
From: Eric Wong <e@80x24.org>
To: misc@80x24.org
Subject: disk might be cheap, everything else isn't
Date: Wed, 16 Sep 2015 20:16:33 +0000
Message-ID: <20150916-disk-might-be-cheap@everything-else-isnt>

TL;DR: Compress your data, and do it early.

Disk latency is high.
Disks (including SSDs) wear out faster the more you write to them.
Memory (for cache) is expensive.
Memory bandwidth is expensive.
Memory latency is high.
Network bandwidth is expensive.
Network latency is high.
Storage bus bandwidth (SAS, SATA, USB, etc) is expensive.
Storage bus latency sucks.

The CPU overhead for common zlib-based compression is relatively
inexpensive compared to these things.
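
A rough way to check this on your own data is to time one zlib pass
and compare sizes.  This is only a sketch using Python's standard
zlib module; compress_stats() and the sample log line are made up
for illustration, and the numbers will vary a lot with your data
and hardware.

```python
import time
import zlib

def compress_stats(data: bytes, level: int = 6):
    """Return (compressed_size, seconds_spent) for one zlib pass."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip to confirm nothing was lost.
    assert zlib.decompress(packed) == data
    return len(packed), elapsed

# Hypothetical sample: repetitive log-like text, which compresses well.
sample = b"2015-09-16 20:16:33 GET /index.html 200 1234\n" * 20000
size, secs = compress_stats(sample)
print(f"{len(sample)} -> {size} bytes in {secs:.4f}s")
```

Every byte saved here is a byte that never touches the disk, the
page cache, the storage bus or the network.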

Everything that gets stored on disk is expected to be read at some
point.  Reading that will use memory and memory bandwidth on just
about any OS.  Memory used for caching is not cheap and neither is
memory bandwidth and latency.

Sure, one could use O_DIRECT, an interface designed by deranged
monkeys[2], to avoid the caching, but it is tricky to use and
most apps need to be modified to use it.


Transparent compression at the filesystem or virtual memory[1]
layers helps at some points, but becomes worthless once your
data needs to be transferred to other machines which do not
compress transparently.


As a bonus, compression formats such as FLAC and gzip tend to come
with integrity checking, too, giving you extra peace of mind when
you have unreliable hardware.
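
For gzip, the check is a CRC-32 and a length stored in the trailer
of each member.  A minimal sketch of leaning on it, using Python's
standard gzip module (is_intact() and the test payload are
hypothetical, only here for illustration):

```python
import gzip
import zlib

def is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses and passes gzip's own checks."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header, CRC or length);
        # EOFError covers truncation; zlib.error covers a broken stream.
        return False

original = gzip.compress(b"important data\n" * 1000)

# Simulate unreliable hardware by flipping one bit mid-stream.
damaged = bytearray(original)
damaged[len(damaged) // 2] ^= 0x01

print(is_intact(original), is_intact(bytes(damaged)))
```

This only detects damage, it does not repair it; you still need a
good copy somewhere else.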


Sometimes compression does not even require special algorithms or
code.  It could be as simple as choosing tabs over spaces for
indentation to get a 16% improvement in grep performance :)

   http://mid.gmane.org/20071018024553.GA5186@coredump.intra.peff.net
   ("Re: On Tabs and Spaces" - Jeff King on the git mailing list)
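
To get a feel for how many bytes indentation alone takes up, here
is a small sketch that rewrites leading spaces as tabs and compares
sizes.  indent_with_tabs(), the width of 8 and the C snippet are
assumptions for illustration; this measures bytes only and does not
reproduce the grep benchmark from the message above.

```python
def indent_with_tabs(text: str, width: int = 8) -> str:
    """Replace each full run of `width` leading spaces with one tab."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip(" ")
        lead = len(line) - len(stripped)
        # Leftover spaces (less than one full width) are kept as-is.
        tabs, rest = divmod(lead, width)
        out.append("\t" * tabs + " " * rest + stripped)
    return "".join(out)

# Hypothetical input: a tiny C function indented with spaces.
spaces = (
    "int main(void)\n"
    "{\n"
    + " " * 8 + "if (x) {\n"
    + " " * 16 + "return 1;\n"
    + " " * 8 + "}\n"
    "}\n"
)
tabbed = indent_with_tabs(spaces)
print(f"{len(spaces)} bytes with spaces, {len(tabbed)} bytes with tabs")
```

Fewer bytes on disk means fewer bytes for grep (or anything else)
to pull through memory.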


Footnotes:
[1] https://en.wikipedia.org/wiki/Virtual_memory_compression
[2] http://man7.org/linux/man-pages/man2/open.2.html
