From: Eric Wong
To: misc@80x24.org
Date: Wed, 16 Sep 2015 20:16:33 +0000
Subject: disk might be cheap, everything else isn't
Message-ID: <20150916-disk-might-be-cheap@everything-else-isnt>

TL;DR: Compress your data, and do it early.

Disk latency is high.
Disks (including SSDs) wear out faster the more you write to them.
Memory (for cache) is expensive.
Memory bandwidth is expensive.
Memory latency is high.
Network bandwidth is expensive.
Network latency is high.
Storage bus bandwidth (SAS, SATA, USB, etc) is expensive.
Storage bus latency sucks.

The CPU overhead for common zlib-based compression is relatively
inexpensive compared to these things.

Everything that gets stored on disk is expected to be read at some
point. Reading it back will use memory and memory bandwidth on just
about any OS. Memory used for caching is not cheap, and neither are
memory bandwidth and latency.

Sure, one could avoid the caching with O_DIRECT, an interface designed
by deranged monkeys[2], but it is tricky to use and most apps need to
be modified to use it.

Transparent compression at the filesystem or virtual memory[1] layers
helps in some places, but becomes worthless once your data needs to be
transferred to other machines which do not compress transparently.

As a bonus, compression formats such as FLAC and gzip tend to come
with integrity checking, too, giving you extra peace of mind when you
have unreliable hardware.
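To make the last two points concrete, here is a minimal sketch using Python's standard gzip module. It is only an illustration, not anything from a real application: the helper names (compress_early, flip_bit, is_intact) and the sample data are made up. It compresses a buffer before it would be written anywhere, then shows that a single flipped bit in the stored CRC32 is caught on read.

```python
import gzip
import zlib


def compress_early(data: bytes, level: int = 6) -> bytes:
    """Compress with gzip: a deflate stream plus a CRC32 and length trailer."""
    return gzip.compress(data, compresslevel=level)


def flip_bit(blob: bytes, offset: int) -> bytes:
    """Return a copy of blob with one bit flipped, simulating bad hardware."""
    damaged = bytearray(blob)
    damaged[offset] ^= 0x01
    return bytes(damaged)


def is_intact(blob: bytes) -> bool:
    """True if the blob decompresses and passes gzip's built-in checks."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, zlib.error, EOFError):
        # gzip.BadGzipFile (a subclass of OSError) covers CRC failures.
        return False


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample data.
    data = b"disk might be cheap, everything else isn't\n" * 1000
    blob = compress_early(data)
    print("raw bytes:", len(data), "compressed bytes:", len(blob))
    print("intact copy passes:", is_intact(blob))
    # The last 8 bytes of a gzip member are the CRC32 and the length.
    print("damaged copy passes:", is_intact(flip_bit(blob, -8)))
```

How much you save depends entirely on the data; repetitive text like the sample above is close to a best case.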
Sometimes compression does not even require special algorithms or
code. It could be as simple as choosing tabs over spaces for
indentation to get a 16% improvement in grep performance :)

http://mid.gmane.org/20071018024553.GA5186@coredump.intra.peff.net
("Re: On Tabs and Spaces" - Jeff King on the git mailing list)

Footnotes:
[1] https://en.wikipedia.org/wiki/Virtual_memory_compression
[2] http://man7.org/linux/man-pages/man2/open.2.html