Understanding ZFS: Compression

06 Nov '08 - 09:31 by benr
One of the most appealing features ZFS offers is built-in compression. The tradeoff is self-evident: consume additional CPU to conserve disk space. If you're running an OLTP database then compression probably isn't for you; however, if you are doing bulk data archiving it could be a huge win.
ZFS is built with the realization that modern systems typically have large amounts of memory and CPU available, and that we should be given the means to put those resources to work. Contrast this with the traditional logic that compression slows things down, because we stop and compress the data before flushing it out to disk, which takes time. Consider that in some situations you may have significantly faster CPU and memory than IO throughput, in which case it may in fact be faster to read and write compressed data because you're reducing the quantity of IO through the channel! Compression isn't just about saving disk space... keep an open mind.
The first important point about ZFS compression is that it's granular. Within ZFS we create datasets (some people call them "nested filesystems", but I find that confusing terminology), each of which has inherited properties. One of those properties is compression. Therefore, if we create a "home" dataset which mounts to "/home", and then create a "home/user" dataset for each user, we can do interesting things, such as apply per-user quotas (disk usage limits) or reservations (set aside space) or, in this context, enable, disable, or specify differing types of compression. Some users may want compression, others may not, or you may wish all to use it by default. ZFS gives us a wide range of flexible options. Most importantly, if we change our mind at some point we can change the setting: all new data is compressed, while the old uncompressed data is still read as expected. This means the change is non-disruptive; however, it also means that if you later want to reclaim all the disk you can, you'd need to enable compression and then slosh all the data off and back on.
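To make the inheritance idea concrete, here is a quick sketch. The pool name "tank" and the user datasets are made up for illustration; substitute your own layout:

```shell
# Hypothetical pool/dataset names; adjust to your own layout.
zfs create tank/home                     # parent dataset, mounts at /tank/home
zfs set compression=on tank/home         # children inherit this setting

zfs create tank/home/alice               # inherits compression=on
zfs create tank/home/bob
zfs set compression=off tank/home/bob    # bob opts out; alice is unaffected

zfs set quota=10G tank/home/alice        # per-user disk usage limit
zfs set reservation=5G tank/home/bob     # space set aside for bob
```

Running "zfs get -r compression tank/home" afterward shows which datasets inherit the setting and which override it locally.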
So, how do we enable compression? Simple: use the zfs set compression=on some/dataset command. If we then "get all" properties on a dataset we'll see some interesting information. Here is an example (pruned for length) from my home directory:
root@quadra ~$ zfs get all quadra/home/benr
NAME              PROPERTY       VALUE                  SOURCE
quadra/home/benr  type           filesystem             -
quadra/home/benr  creation       Thu Oct  9 11:33 2008  -
quadra/home/benr  used           122G                   -
quadra/home/benr  available      432G                   -
quadra/home/benr  referenced     122G                   -
quadra/home/benr  compressratio  1.19x                  -
quadra/home/benr  mounted        yes                    -
quadra/home/benr  quota          none                   default
quadra/home/benr  reservation    none                   default
quadra/home/benr  recordsize     128K                   default
quadra/home/benr  mountpoint     /quadra/home/benr      default
quadra/home/benr  checksum       on                     default
quadra/home/benr  compression    on                     inherited from quadra/home
...
Here we see that compression is "on", and was inherited automatically from its parent dataset "quadra/home". We can also see the compression ratio above: 1.19x.
But what are our options? Just on or off? Many ZFS properties have simple defaults; in this case "on" means the "lzjb" compression algorithm is used. We can instead specify the exact algorithm. Currently, in fairly modern releases of Nevada/OpenSolaris, we have available the default LZJB (a lossless compression algorithm created by Jeff Bonwick, which is extremely fast) and gzip at compression levels 1-9. If you set "compression=gzip" you'll get gzip level 6 compression; however, you can explicitly "set compression=gzip-9". More compression algorithms may be added in the future. (The source is out there, feel free to give us another!)
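To get a feel for what the gzip levels buy you, here's a stand-alone sketch using the ordinary gzip CLI on repetitive text. The sample file path and contents are made up for illustration; ZFS's gzip-N uses the same underlying levels, though on-disk results also depend on recordsize:

```shell
# Build a compressible sample file (path and contents just for illustration).
f=/tmp/sample.txt
yes "Call me Ishmael. Some years ago--never mind how long precisely" \
  | head -n 20000 > "$f"

echo "original: $(wc -c < "$f") bytes"
for level in 1 6 9; do
  echo "gzip -$level: $(gzip -c -"$level" "$f" | wc -c) bytes"
done
```

On highly repetitive input like this the level-to-level difference is small; on messier real data the gap between -1 and -9 is usually more visible.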
But how can you see the effect? Did you know the "du" command will show you the on-disk (compressed) size of a file? Let's experiment!
root@quadra ~$ zfs create quadra/test
root@quadra ~$ zfs get compression quadra/test
NAME         PROPERTY     VALUE  SOURCE
quadra/test  compression  off    default
Ok, we have a dataset to play with. I've downloaded Moby Dick and combined it into a single text file.
root@quadra test$ ls -lh moby-dick.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:38 moby-dick.txt
root@quadra test$ du -h moby-dick.txt
1.8M   moby-dick.txt
root@quadra test$ head -4 moby-dick.txt
..
< chapter I 2 LOOMINGS >
Call me Ishmael. Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
Alright, so here is Moby Dick in text, weighing in at 1.8M uncompressed. Let's now enable compression (LZJB), copy the file, and see how much benefit we get:
root@quadra test$ zfs set compression=on quadra/test
root@quadra test$ cp moby-dick.txt moby-dick-lzjb.txt
root@quadra test$ sync
root@quadra test$ ls -lh
total 3.5M
-rw-r--r-- 1 root root 1.8M Nov  6 01:40 moby-dick-lzjb.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:38 moby-dick.txt
root@quadra test$ du -ah
1.7M   ./moby-dick-lzjb.txt
1.8M   ./moby-dick.txt
3.5M   .
Nice, we're saving some space. Now let's repeat with gzip.
root@quadra test$ zfs set compression=gzip quadra/test
root@quadra test$ cp moby-dick.txt moby-dick-gzip.txt
root@quadra test$ ls -lh
total 4.6M
-rw-r--r-- 1 root root 1.8M Nov  6 01:44 moby-dick-gzip.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:40 moby-dick-lzjb.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:38 moby-dick.txt
root@quadra test$ du -ah
1.7M   ./moby-dick-lzjb.txt
1.8M   ./moby-dick.txt
1.1M   ./moby-dick-gzip.txt
4.6M   .
Ahhhh. Nice gain there. Remember that this is really gzip-6; let's crank it up to gzip-9!
root@quadra test$ zfs set compression=gzip-9 quadra/test
root@quadra test$ cp moby-dick.txt moby-dick-gzip9.txt
root@quadra test$ ls -lh
total 4.6M
-rw-r--r-- 1 root root 1.8M Nov  6 01:44 moby-dick-gzip.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:46 moby-dick-gzip9.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:40 moby-dick-lzjb.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:38 moby-dick.txt
root@quadra test$ du -ah
1.7M   ./moby-dick-lzjb.txt
1.8M   ./moby-dick.txt
1.1M   ./moby-dick-gzip.txt
512    ./moby-dick-gzip9.txt
4.6M   .
Wow! That's savings. Just to put this in context, I'll gzip the file like you're used to (using tmpfs, not ZFS):
root@quadra test$ cd /tmp
root@quadra tmp$ ls -alh moby-dick.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:47 moby-dick.txt
root@quadra tmp$ gzip moby-dick.txt
root@quadra tmp$ ls -alh moby-dick.txt.gz
-rw-r--r-- 1 root root 1.1M Nov  6 01:47 moby-dick.txt.gz
And so we see here that just gzip'ing the file matches the compression I got with gzip (gzip-6) enabled.
But before you get too excited, remember that this is consuming system CPU time. The more compression you do, the more CPU you'll consume. If this is a dedicated storage system you're working on, then consuming a ton of CPU for compression may well be worth it (many appliances have fast CPUs for just this reason); however, if you're running critical apps and CPU really counts, then notch it down or even turn it off. I highly recommend you dry-run your application workload and then load test it hard to see whether or not that extra CPU will be a problem. Whenever possible, try to determine these things before you deploy, not after.
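One crude way to get a feel for the cost before committing: time the gzip levels against a sample of your own data. The file path and contents below are illustrative, and the gzip CLI only approximates what the in-kernel implementation costs, but it shows the shape of the tradeoff:

```shell
# Stand-in workload: compressible but non-trivial data.
f=/tmp/workload-sample
yes "some moderately compressible application data 0123456789" \
  | head -n 200000 > "$f"

for level in 1 6 9; do
  start=$(date +%s%N)                    # GNU date: nanosecond timestamps
  gzip -c -"$level" "$f" > /dev/null
  end=$(date +%s%N)
  echo "gzip -$level took $(( (end - start) / 1000000 )) ms"
done
```

A real dry run should use a representative slice of your actual data, since compressibility drives both the CPU cost and the payoff.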
To follow up that idea, remember that we can set differing compression levels on different datasets. You may want to put your application data on an uncompressed dataset, but store less commonly used data or backups on a separate dataset where you've cranked compression up. Get creative!
ZFS is an amazing technology and compression is certainly one of its big attractions for the common user. Workstation always low on disk? Compression to the rescue, no stupid FUSE or loopback tricks required. :)
A Word Of Warning
At this point I do want to warn you of something. Notice that du displays actual disk consumption, not true file size. Now consider the way in which most admins actually use the command: to total up cumulative file sizes. On a typical filesystem, "du -sh ." will nicely total up all the files, which would be the same as if I tar'ed up the files and looked at the tarball's file size. When using compression you cannot use "du" in this way, because the files are larger than their actual disk usage. So you get into potentially confusing situations like this:
root@quadra test$ ls -alh
total 5.6M
drwxr-xr-x 2 root root    6 Nov  6 01:46 .
drwxr-xr-x 8 root root    8 Nov  6 01:33 ..
-rw-r--r-- 1 root root 1.8M Nov  6 01:44 moby-dick-gzip.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:46 moby-dick-gzip9.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:40 moby-dick-lzjb.txt
-rw-r--r-- 1 root root 1.8M Nov  6 01:38 moby-dick.txt
root@quadra test$ du -h .
5.6M   .
root@quadra test$ tar cfv md.tar moby-dick*
moby-dick-gzip.txt
moby-dick-gzip9.txt
moby-dick-lzjb.txt
moby-dick.txt
root@quadra test$ ls -lh md.tar
-rw-r--r-- 1 root root 7.0M Nov  6 02:11 md.tar
In the real world, this could come as a shock if you wanted to rsync a bunch of data, totalled it up using "du" to estimate the bits that need to move, and then got nervous when you moved far more bits than you initially expected because you forgot to take compression into the equation. So hopefully some of you can learn here not just how ZFS works, but appreciate "du" in a new way as well. :)
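If you're on a system with GNU coreutils, du's --apparent-size flag reports logical bytes instead of disk blocks. A sparse file makes a convenient stand-in for a compressed one here, since the principle is identical: ls and --apparent-size see the logical size, plain du sees the blocks. (The file path is just for illustration.)

```shell
# A sparse file: 10 MB of logical size, almost nothing on disk.
f=/tmp/sparse.dat
truncate -s 10M "$f"

du -h "$f"                    # on-disk usage: tiny
du -h --apparent-size "$f"    # logical size: 10M
```

For estimating an rsync or tar of a compressed dataset, the apparent size is the number you want.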
wow, this is a great post! I’d heard a lot of good things about ZFS, and the more I hear, the more I hope Sun releases it on Linux. I have my doubts about how soon that will happen, though.
If it weren’t for the lack of a Solaris brain trust here, I’d be tempted to switch. I just don’t have the experience with it. I just keep getting more and more incentive to work with it.
aZ - 06 November '08 - 14:47

Is there any way to determine which files have been compressed on the filesystem and which have not? I can see how you can determine the properties of the ZFS filesystems but I don’t see a way of determining the properties associated with a particular object (file, directory, etc.). If you could tell that a file was or was not created with compression, you could write a script which would only “slosh off and back on” files which needed their compression status changed. Similar considerations might apply when you need to adjust other properties (encryption, copies, etc.) as well.
Rand Huntzinger - 06 November '08 - 15:08

Good post as usual Ben!
Thanks for such a comprehensive post. I’ve mainly been using ZFS in-house, using it in anger for non-critical stuff so I feel more comfortable when I jump to deploying business critical apps on it.
But your post led me to thinking: I’ve also been considering throwing together a fat-ass drive array + mobo for HTPC storage accessed via iSCSI (that way I can have a huge quantity of video and audio content, and the noisy box can get stashed somewhere far away from the home theatre). Do you think that ZFS+compression would cope with that, provided the CPU had enough capacity? From the above, I can’t see why not... thus giving me even more “space” for my data!
Also, do you think (Open)Solaris needs to add a legacy option to du to “help” those of us sysadmins who regularly jump between say Solaris, Ubuntu and Mac OS X? And might just forget that we’re not necessarily looking at the actual total file size, which as you say is what we’re accustomed to?
Mark Glossop - 06 November '08 - 16:54

Mark,
I’m pretty sure that’s an ideal usage, because the CPU would otherwise be pretty idle. It’s where the processing and the storage occur in the same place that you’d get into trouble, such as if you were decoding/encoding data directly on the system with ZFS compression. It all depends on whether you’re I/O bound or CPU bound.
I’m glad to hear about the options for compression. I’ve previously only been using on/off, but I’m interested in trying to get some more space out of backup boxes, for which disk space is much more critical than CPU.
One question, how granular is the compression, really? That is, is compression less of an issue with large sequential accesses over small random accesses? Is the compression over each block?
Drew (Email) - 07 November '08 - 18:15
There are two reasons for this:
1. It is natural to think of compression in simple space/time tradeoff terms. However, every block of data that doesn’t have to be transferred also saves the CPU time required to manage the transfer. Furthermore, in terms of CPU time accounting, that time is often attributed to general system overhead costs and does not get credited to a specific user process, so accounting data will often show the additional CPU cost, but not the related CPU savings.
2. It is natural to think of files which aren’t human readable as random data, and hence not compressible or less compressible than they actually are.
Dave Hamaker (Email) - 04 December '08 - 21:19

> When using compression you can not use “du” in this way because the
> files are larger than the actual disk usage.
i don't have solaris around here at home, but why not use GNU's "du"? that one has "--apparent-size" as an option and iirc, it gave me the real size of compressed files on zfs-fuse.
here's the manpage: [[http://www.gnu.org/software/coreutils/..]]
roland (Email) - 01 January '09 - 18:03

Hey, uh, I downloaded moby.zip from the URL you gave, concatenated all the files inside except for moby.0 (which isn’t in your head output) and README, but the resulting file was only 1202893 bytes. Where are your other 600k coming from?
Also, what’s wrong with LZJB that it could only compress that file by 7% or so? I don’t have an LZJB implementation handy, but LZF (which is usually lightning fast but for some reason takes 100ms on this file) compresses it by 40% down to 737181 bytes, and gzip compresses it by 53% to 59% (569776 to 487367 bytes, -1 and -9), not the mere 49% you got. And what’s up with gzip compressing the file to 512 bytes the second time?