Can You Compress Berkeley DB Databases?
Paul Gear asks: "I want to create a database using Berkeley DB that will hold a lot of textual information. Given the bulk of the data and its obvious compressibility, I was wondering whether the DB libraries can automatically compress it at the file level, rather than me compressing each (rather small) data item before putting it in (which would yield much less compression). Section 4.1 of the paper "Challenges in Embedded Database System Administration" talks about automatic compression, but that is the only place in the documentation where it is mentioned. Can anyone point me in the right direction?"
OS (Score:1)
If you end up rolling your own solution ... (Score:1)
http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html
I've used it in an embedded app to decompress/overlay main applications from ROM to RAM, and I can vouch for its decompression speed.
Tradeoffs (Score:1)
Re: mifluz (Score:1)
You should consider contacting Loic Dachary--his address is on the Senga project pages.
from my experience (Score:1)
If you use zlib's replacements for fread, fseek, etc., things will be VERY slow. I tried this approach with WordNet [princeton.edu] databases and it sucked. Well, WordNet does a lot of seeks, but still.
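For what it's worth, the slowness of seeking is inherent to the format: a gzip stream can only be decompressed front to back, so a backward seek has to rewind and re-decompress from the start. A minimal Python sketch (using the stdlib gzip module as a stand-in for zlib's C-level gzread/gzseek wrappers, which is my assumption here):

```python
import gzip, os, tempfile

# Write a compressed file with a known marker in the middle.
path = os.path.join(tempfile.mkdtemp(), "data.gz")
with gzip.open(path, "wb") as f:
    f.write(b"A" * 100000 + b"MARKER" + b"B" * 100000)

with gzip.open(path, "rb") as f:
    f.seek(100000)    # forward seek: decompresses everything up to the offset
    marker = f.read(6)
    f.seek(0)         # backward seek: rewinds and re-decompresses from the start
    first = f.read(1)
```

Every one of those seeks costs decompression work proportional to the target offset, which is why a seek-heavy workload like WordNet crawls.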
Teaching the Berkeley DB functions to use libz would be extremely painful too. I'd say your best bet is the Linux filesystem compression attribute mentioned already by someone else, though I'm not sure how efficient it is at reading, uncompressing, and recompressing data that is read and/or mmapped. In any case, you won't be modifying any code -- just the file's attributes.
I'm afraid you'll defeat most of the DB's tricks, which rely on knowing the sector size and other filesystem details.
You could also just uncompress the file into memory and then give it to the database functions, but then you lose the automatic synchronization with the file on the filesystem, which for many is the main reason for using Berkeley DB (or gdbm) in the first place.
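A rough sketch of that load-it-all approach, with a pickled dict standing in for the database format (my substitution -- the real file would be a Berkeley DB or gdbm file): the on-disk copy is only current at the explicit save points, which is exactly the synchronization loss described above.

```python
import gzip, os, pickle, tempfile

path = os.path.join(tempfile.mkdtemp(), "db.pickle.gz")

def load(path):
    # Decompress the whole database into memory.
    if os.path.exists(path):
        with gzip.open(path, "rb") as f:
            return pickle.load(f)
    return {}

def save(db, path):
    # The only moment the on-disk copy is up to date.
    with gzip.open(path, "wb") as f:
        pickle.dump(db, f)

db = load(path)
db[b"key"] = b"value"   # lives only in memory until save() runs
save(db, path)
db2 = load(path)        # a fresh load sees the saved state
```

If the process dies between save() calls, every in-memory change is gone -- a real Berkeley DB handle would have written through to disk as you went.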
Database choice? (Score:1)
fs-level compression Re:OS (Score:2)
FWIW, Linux can do this on ext2fs as well, with chattr +c filename. There are analogs in other Unix operating systems and filesystems as well. :-) (man chattr for more info)
Performance bottleneck. (Score:2)
ZlibC (Score:3)
Zlibc is a read-only compressed file-system emulation. It allows executables to uncompress their data files on the fly. No kernel patch, no recompilation of the executables or libraries is needed. Using gzip -9, a compression ratio of 1:3 can easily be achieved! (See examples below.) This program has (almost) the same effect as a (read-only) compressed file system.
See the web page for more.
Baz
Re:Tradeoffs (Score:3)
The caching won't save you from uncompressing the blocks repeatedly. If you really want to compress the database metadata, you basically need a block-oriented compressed filesystem that allows random access within compressed files. I don't know if such a thing already exists, but it's effectively what you'd be writing to do it...
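The scheme such a filesystem would need is easy enough to sketch, even if no ready-made tool exists: compress fixed-size blocks independently and keep an index of compressed sizes, so a random read only decompresses the block(s) it actually touches. A toy Python illustration (all names here are mine, not any real library's):

```python
import zlib
from itertools import accumulate

BLOCK = 4096

def compress_blocks(data):
    """Compress each fixed-size block independently; return blob plus size index."""
    comp = [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]
    return b"".join(comp), [len(c) for c in comp]

def read_at(blob, index, offset, length):
    """Random access: decompress only the blocks the requested range touches."""
    starts = [0] + list(accumulate(index))
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    out = b"".join(zlib.decompress(blob[starts[b]:starts[b] + index[b]])
                   for b in range(first, last + 1))
    skip = offset - first * BLOCK
    return out[skip:skip + length]

data = bytes(range(256)) * 100   # 25,600 bytes of repetitive test data
blob, index = compress_blocks(data)
chunk = read_at(blob, index, 5000, 16)   # touches exactly one 4K block
```

The tradeoff is classic: smaller blocks mean cheaper random reads but worse compression ratios, since each block is compressed with no shared history.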
I'd just use zlib to compress the individual entries and not try to compress the entire database as a whole. I've done this before, and it actually works better than you'd think. Even with data entries as small as 50-100 bytes, you get reasonable compression. Yes, you'd get much better compression across the entire database, but you can't hope to access a fully-compressed database without uncompressing it or doing a lot of work to make random-access possible. (And like I said, at that point you might as well be making a compressed filesystem.)
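A quick illustration of that per-entry approach: wrap the store so every value is deflated on put and inflated on get. The dict-backed class below is a stand-in for a real Berkeley DB handle (my simplification), and whether a 50-byte entry actually shrinks depends on the data, so the size check uses a deliberately repetitive entry.

```python
import zlib

class CompressedStore:
    """Dict-backed stand-in for a DB handle that compresses each value with zlib."""
    def __init__(self):
        self._db = {}

    def put(self, key, value):
        # Compress each entry independently at the highest level.
        self._db[key] = zlib.compress(value, 9)

    def get(self, key):
        return zlib.decompress(self._db[key])

    def stored_size(self, key):
        return len(self._db[key])

store = CompressedStore()
entry = b"the quick brown fox jumps over the lazy dog " * 3   # 132 bytes, repetitive
store.put(b"k", entry)
```

Reads and writes stay random-access and the on-disk format stays a normal database; you just pay a small per-entry zlib header, which is why whole-file compression would win on ratio.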