80x24.org misc. Free Software, open data formats/protocols discussion
* disk might be cheap, everything else isn't
@ 2015-09-16 20:16 Eric Wong
  2016-08-09 19:09 ` Eric Wong
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Wong @ 2015-09-16 20:16 UTC
  To: misc

TL;DR: Compress your data, and do it early.

Disk latency is high.
Disks (including SSDs) wear out faster the more you write to them.
Memory (for cache) is expensive.
Memory bandwidth is expensive.
Memory latency is high.
Network bandwidth is expensive.
Network latency is high.
Storage bus bandwidth (SAS, SATA, USB, etc.) is expensive.
Storage bus latency sucks.

The CPU overhead for common zlib-based compression is relatively
inexpensive compared to these things.
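
To put a rough number on "relatively inexpensive" for your own
hardware, a quick Python sketch like the one below times a single
zlib pass.  The sample data and the zlib_cost helper are made up for
illustration.  Real ratios and speeds depend heavily on your data
and CPU, so compare the MB/s it prints against your own disk and
network numbers rather than against any fixed figure.

```python
import time
import zlib
from typing import Tuple


def zlib_cost(data: bytes, level: int = 6) -> Tuple[float, float]:
    """Return (compression ratio, throughput in MB/s) for one zlib pass.

    Level 6 is zlib's default trade-off between speed and size.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    # Guard against a zero reading on very small inputs.
    mb_per_s = (len(data) / 1e6) / max(elapsed, 1e-9)
    return ratio, mb_per_s


if __name__ == "__main__":
    # Hypothetical sample: repetitive, log-like text compresses well.
    sample = b"".join(
        b"2015-09-16 20:16:%02d GET /index.html 200\n" % (i % 60)
        for i in range(200_000)
    )
    ratio, speed = zlib_cost(sample)
    print(f"ratio {ratio:.1f}:1, {speed:.0f} MB/s on this machine")
```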

Everything that gets stored on disk is expected to be read at some
point.  Reading that will use memory and memory bandwidth on just
about any OS.  Memory used for caching is not cheap, and neither are
memory bandwidth and latency.
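
A back-of-the-envelope way to see the caching cost: file data is
cached in whole pages, so every byte you store turns fairly directly
into RAM once it is read back.  This sketch assumes a 4096-byte page
(common, but not universal), ignores per-file metadata, and uses a
hypothetical cache_pages helper on synthetic CSV-like data:

```python
import gzip

PAGE_SIZE = 4096  # assumption: a common page size; check your OS


def cache_pages(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Pages needed to cache nbytes of file data (ceiling division)."""
    return -(-nbytes // page_size)


# Synthetic data standing in for a real file.
raw = b"".join(b"user%06d,active\n" % i for i in range(100_000))
packed = gzip.compress(raw)
print("raw: ", cache_pages(len(raw)), "pages")
print("gzip:", cache_pages(len(packed)), "pages")
```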

Sure, one could use O_DIRECT, an interface designed by deranged
monkeys[2], to avoid the caching, but it is tricky to use and
most apps need to be modified to use it.


Transparent compression at the filesystem or virtual memory[1]
layers helps on the local machine, but becomes worthless once your
data needs to be transferred to other machines which do not
compress transparently.
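
One way to "do it early" is to compress at write time, so the bytes
on disk are already the bytes you would ship.  Here is a minimal
sketch using Python's gzip module.  The file names and the
write_records/read_records helpers are hypothetical, and
shutil.copyfile merely stands in for whatever transfer you actually
use:

```python
import gzip
import os
import shutil
import tempfile


def write_records(path: str, records) -> None:
    # Compress at write time: the file on disk is already gzip, so any
    # later copy moves the small version without recompressing.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(rec + "\n")


def read_records(path: str):
    # The receiver decompresses only when it actually reads the data.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "events.log.gz")
        dst = os.path.join(d, "copy-on-other-host.log.gz")
        write_records(src, (f"event {i}" for i in range(50_000)))
        # Stand-in for a network transfer: the compressed file is copied as-is.
        shutil.copyfile(src, dst)
        assert read_records(dst)[:2] == ["event 0", "event 1"]
        print(os.path.getsize(src), "bytes sent")
```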


As a bonus, compression formats such as FLAC and gzip tend to come
with integrity checking, too, giving you extra peace of mind when
you have unreliable hardware.
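
For gzip, the check is a CRC-32 of the uncompressed data plus its
length, both stored in the trailer.  Decompression fails if either
one does not match.  Below is a small sketch.  gzip_intact is a
hypothetical helper, and the corruption is faked by flipping bits
in the stored CRC-32:

```python
import gzip
import zlib


def gzip_intact(blob: bytes) -> bool:
    """True if blob decompresses cleanly, including the CRC-32 and length checks."""
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (an OSError subclass) covers CRC/length mismatches;
        # truncation raises EOFError; garbled deflate data raises zlib.error.
        return False
    return True


good = gzip.compress(b"precious data\n" * 1000)
bad = bytearray(good)
bad[-8] ^= 0xFF  # gzip trailer is CRC32 then ISIZE, so this hits the CRC
print(gzip_intact(good), gzip_intact(bytes(bad)))
```

Running it prints `True False`.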


Sometimes compression does not even require special algorithms or
code.  It could be as simple as choosing tabs over spaces for
indentation to get a 16% improvement in grep performance :)

   http://mid.gmane.org/20071018024553.GA5186@coredump.intra.peff.net
   ("Re: On Tabs and Spaces" - Jeff King on the git mailing list)
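
The tabs-vs-spaces saving is just fewer bytes for grep (and the page
cache) to chew through.  Here is a rough sketch of the conversion.
It assumes 8-column tab stops and touches only leading indentation,
similar in spirit to `unexpand`.  spaces_to_tabs is a hypothetical
helper, and this does not reproduce the grep timing from the linked
message:

```python
import re


def spaces_to_tabs(text: str, width: int = 8) -> str:
    """Replace each full run of `width` leading spaces with one tab."""
    pattern = re.compile(r"^(?:%s)+" % (" " * width), re.MULTILINE)
    return pattern.sub(lambda m: "\t" * (len(m.group(0)) // width), text)


src = (
    "int main(void)\n{\n"
    "        if (x)\n"
    "                return 1;\n"
    "        return 0;\n}\n"
)
out = spaces_to_tabs(src)
saved = len(src) - len(out)
print(f"{len(src)} -> {len(out)} bytes ({100 * saved / len(src):.0f}% smaller)")
```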


Footnotes:
[1] https://en.wikipedia.org/wiki/Virtual_memory_compression
[2] http://man7.org/linux/man-pages/man2/open.2.html
