
Comments


Wayland/Weston Gets Forked As Northfield/Norwood

psocccer Re:Explanation (252 comments)

3. It needs some kind of middle layer so that you can move applications between displays, and displays between consoles. Think something like screen or tmux. Once you launch an app on a display, it is stuck there.

I know I'm late to the party, but you can do this with xpra. It still works at the level of a whole X display, so you can't attach or detach individual windows; instead you start applications attached to an xpra display and then attach or detach your own X display to it. So it's very much like screen, though I think tmux has some more advanced features for moving windows between tmux sessions.
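Roughly, the workflow looks something like this (display number, user and hostname are just placeholders):

xpra start :100 --start-child=xterm      # launch an app against a detachable xpra display
xpra attach ssh:user@otherhost:100       # pull that display onto whatever X session you're at now
xpra detach :100                         # disconnect again and leave it running headless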

about a year ago

Google Upgrades Chrome To Beta For OS X, Linux

psocccer Re:Beware Google's penchant for auto-updates... (197 comments)

The OP might not be completely wrong. According to dpkg-query -L google-chrome-beta, the package installs a script at /etc/cron.daily/google-chrome that adds an extra source to your apt sources and then updates Google Chrome based on settings in /etc/default/google-chrome. It also adds the source to /etc/apt/sources.list.d. Seems a bit invasive to me.
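You can check it on your own box with something along these lines:

dpkg-query -L google-chrome-beta | grep -E 'cron|sources|default'
cat /etc/default/google-chrome          # the repo settings the cron job reads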

more than 4 years ago

ZFS Gets Built-In Deduplication

psocccer Been wanting something like this for a long time (386 comments)

A lot of the time these days I use rsync to do hard-linked backups, which works mostly well but has some shortcomings. For example, backups across multiple machines don't get their duplicate files hardlinked, and files that are mostly similar can't be hard linked at all, such as files that grow like log files. More specifically, we have some database files that grow with yearly detail information, and everything before the newly added records is identical, so backups eat up gigs of space every day when maybe a few megs have actually changed.
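For context, by hard-linked backups I mean the usual --link-dest style of rsync run, roughly like this (paths and dates are just placeholders):

rsync -a --link-dest=/backup/2009-11-02 user@host:/data/ /backup/2009-11-03/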

Initially I liked the way BackupPC handled the situation by pooling and compressing all the files, with duplicate files from different backups automatically linked together. So I wrote a little script that mainly duplicated the functionality of hardlinking duplicate files together regardless of file stat, running on top of fusecompress to get the compression too. The main problem, though, is the time it takes to crawl thousands and thousands of files and relink them. On top of that, rsync will not use those duplicate files as hardlink sources in the next backup if the stat info (mtime/owner/etc.) doesn't match, which means the next backup contains fresh new copies of files that have to be re-hardlinked by crawling everything again. Plus you don't get any elimination of partial file redundancy.

So I looked around some more for a system that would let you compress out redundant blocks, and the closest thing I could find is squashfs, but it's read-only. That's a problem because we occasionally need to purge daily local backups to make room for newer ones. We keep the last 6 months of daily backups available on a server and do daily offsite backups from that, so once a month we delete the oldest month's backups from the local backup server. With squashfs you'd have to recreate the whole squash archive, which would suck for a terabyte archive with millions of files in it.

At this point I knew what features I wanted but couldn't find anything that did it, so I went ahead and wrote a fuse daemon in Python that handles block-level deduplication and compression at the same time. I'm still playing around with it and testing different storage ideas; it's available in git if anyone wants to take a look. You can get it by doing:

git clone http://git.hoopajoo.net/projects/fusearchive.git fusearchive

(note the above command might be mangled because of the auto-linking in slashdot, there should be no [hoopajoo.net] in the actual clone command)

Currently it uses a storage directory with 2 subdirectories, store/ and tree/. Inside tree/ are files that contain a hash identifying the block list for the file contents, so 2 identical files only consume the size of a hash on disk plus inodes. That hash points to the block holding the file's data block list, which is itself a list of hashes of the data. This way any files that share identical blocks (on a block boundary) only pay the size of a hash for the redundant blocks. Blocks are currently 5M, which can be tuned, and they are compressed using zlib. So a bunch of small files get the benefit of compression and whole-file deduplication, while large growing files will at most use up one extra block of data plus the hash info for the rest of the file.

So far this seems to be working pretty well. The biggest issue I have is tracking block references so a block can be freed once it's no longer referenced by any file. It works fine currently, but since each block carries its own reference counter a crash could leave the ref counts incorrect, and unfortunately I can't think of a better, more atomic way to handle that. The other big drawback is speed: it's about 1/3 the speed of native file copying, and from profiling the code 80-90% of the time seems to be spent passing fuse messages in the main fuse-python library, with a little time taken up by zlib and the actual file writes.
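To make the layout concrete, here's a rough, stripped-down sketch of the storage scheme in plain Python. This is illustrative only, not the actual fusearchive code; the function names and the choice of sha1 are just for the example:

import hashlib, os, zlib

BLOCK_SIZE = 5 * 1024 * 1024  # 5M blocks, tunable

def put_block(root, data):
    # Store one zlib-compressed block under the hash of its uncompressed data;
    # identical blocks from any file hash to the same path and are stored once.
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(root, "store", digest)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return digest

def put_file(root, name, src_path):
    # Split a file into blocks, store each block, then store the block list
    # itself as a block and record its hash in tree/<name>.
    for d in ("store", "tree"):
        os.makedirs(os.path.join(root, d), exist_ok=True)
    hashes = []
    with open(src_path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(put_block(root, block))
    list_hash = put_block(root, "\n".join(hashes).encode())
    with open(os.path.join(root, "tree", name), "w") as f:
        f.write(list_hash)

Reading a file back is just the reverse: look up tree/<name>, fetch the block-list block, then decompress each data block in order.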

If I could get something like that from a native filesystem that also supported journaling so you didn't have the refcount mess that would be pretty sweet. Plus I wouldn't have to waste time developing and supporting it :p

more than 4 years ago

OLPC Spinoff Pixel Qi Merges E-ink With LCD

psocccer Re:e-Ink? (78 comments)

Or that search company...what's it called? Gloople? Gorgon? Giggle?

I don't know, maybe you should google it?

more than 5 years ago

Is the One-Size-Fits-All Database Dead?

psocccer Re:Dammit (208 comments)

Well, comments and replies naturally lend themselves to a tree, and the obvious way to store them is with a self-referential parent id on the same table. In practice this becomes difficult for exactly the reason you cited: no recursion. Recursion is hard for a database to optimize, which I presume is why it's not built into SQL, but the answer for modeling trees in SQL is nested sets. They let you extract part of a tree and determine the depth at the same time, and it's a very fast operation because you're simply selecting a range of numbers, which databases are very good at.
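To illustrate, here's a toy example I'm making up (using Python's sqlite3 just to have something runnable; the table and column names are arbitrary). Each node stores a left/right number, and pulling a subtree is a single range query:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, lft INTEGER, rgt INTEGER)")
# A small thread: root(1,8) -> reply A(2,5) -> reply A1(3,4), plus reply B(6,7)
db.executemany("INSERT INTO comments VALUES (?,?,?,?)", [
    (1, "root",     1, 8),
    (2, "reply A",  2, 5),
    (3, "reply A1", 3, 4),
    (4, "reply B",  6, 7),
])

# Select the whole subtree under "reply A" with one range scan, computing
# depth by counting how many ancestors enclose each row.
rows = db.execute("""
    SELECT node.body, COUNT(parent.id) - 1 AS depth
    FROM comments node
    JOIN comments parent ON node.lft BETWEEN parent.lft AND parent.rgt
    WHERE node.lft BETWEEN 2 AND 5
    GROUP BY node.id, node.body, node.lft
    ORDER BY node.lft
""").fetchall()
print(rows)   # [('reply A', 1), ('reply A1', 2)]

The trade-off is that inserts have to renumber part of the tree, but for read-heavy comment threads that's usually a fine deal.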

more than 7 years ago

Submissions

psocccer hasn't submitted any stories.

Journals

psocccer has no journal entries.
