File Organization — How Do You Do It In 2011?

timothy posted more than 3 years ago

Data Storage

siddesu writes "After 30 years of being around computers, I have, like everyone else, amassed a huge number of files in a huge number of formats about a huge number of topics. And it isn't only me: the family now has a ton of data that they want managed and easily accessible. Keeping all that information in order has always been a pain, but it has gotten harder as the storage has increased and people and files and sizes have multiplied. What do you folks use to keep your odd terabyte of document, picture, video and code files organized (that is, relatively uniformly tagged, versioned, searchable and ultimately findable) without 50 duplicates over your 50 devices and without typing arcane commands in a terminal window? I found this discussion from 2003 and this tangentially relevant post from 2006. How have things changed for you in 2011? And how satisfied is your extended family with the solution you have unleashed upon them?"

Directories (4, Informative)

Anrego (830717) | more than 3 years ago | (#35192644)

.. seriously.. they still work for me.

I’ve got a 12TB file server (~6TB filled). It’s arranged as follows:

incoming_downloads/ (before you ask.. yes.. _legit_ downloads)

That’s always been enough for me. Never got into all this tagging/meta data stuff. If there’s anything I’d ever want to search on... I put it in the file name. Indexed every night via slocate.

backup_links is part of my hacked together backup system.

The thing is raid6, setup so two drives can fail without loss of data. I see this as adequate “backup” for stuff that is replaceable (the large portion of my media is rips of DVDs I own... so although it would be a huge pain in the ass to re-rip them all... it’s not impossible). Stuff that is irreplaceable, I backup to separate hard drives (via hot swap trays).

I leave one backup drive plugged into the machine, and keep the other elsewhere. I periodically swap these drives. I have a script that just rsyncs the files and directories pointed to in backup_links (the irreplaceable ones) to the currently plugged in drive (and yes I verified that I’m not getting a backup of my links ;p). This way I always have one drive that has a pretty recent backup (runs nightly), and one drive that has at most a month or so old backup if the plugged in one fails for some reason.
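For anyone who wants to try the backup_links idea, here is a minimal sketch of the "follow the links, copy the real files" step. It is not the script described above: it uses plain Python copying as a stand-in for rsync, and the function name and arguments are made up for illustration.

```python
import os
import shutil
from pathlib import Path


def backup_link_targets(links_dir, dest_dir):
    """Copy whatever each symlink in links_dir points to into dest_dir.

    Links are followed, so the backup holds real files, not links.
    Returns the names that were copied.
    """
    copied = []
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for entry in sorted(Path(links_dir).iterdir()):
        if not entry.is_symlink():
            continue  # only back up things that were explicitly linked
        target = entry.resolve()
        if not target.exists():
            continue  # dangling link: skip it rather than fail the whole run
        out = dest / entry.name
        if target.is_dir():
            shutil.copytree(target, out, dirs_exist_ok=True)
        else:
            shutil.copy2(target, out)
        copied.append(entry.name)
    return copied
```

Unlike rsync, this recopies everything on each run, so it only shows the link-following logic and is no replacement for an incremental sync.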

backups is backed up files from other machines.

Keeping everything in one place helps with the organization I think. Most of the other machines on this network are basically just OS installs. All the real files are on the file server. My desktop runs off a small SSD, which is not even half filled.

No Porn? (4, Funny)

Anonymous Coward | more than 3 years ago | (#35192652)

I think you left a directory out. ;)

Re:No Porn? (1)

lordandmaker (960504) | more than 3 years ago | (#35192676)

What do you think is in media/video/movies/ ?

Joking aside, that's pretty much what I do. I've never had some turning-point moment where I've thought "I need to do this differently", and it does still Just Work.

Though my collection is way under 1TB at the minute, so I suppose I'm still where everybody else was about five years ago.

Re:No Porn? (1)

j_l_cgull (129101) | more than 3 years ago | (#35192876)

Joking aside, that's pretty much what I do. I've never had some turning-point moment where I've thought "I need to do this differently", and it does still Just Work.

Have been in this boat for a while and just out of curiosity tried to move into something more sophisticated. Using Mac OS X, I was unwilling to get into any organizing software that stores the data in a proprietary format, making it difficult/impossible to get the data out. Ended up with EagleFiler, which uses tags that can go into/come out of OpenMeta. Can also use directory structure if that's your style. Since the data is just regular file system contents, you can get to the data using any means.

Text searches are far better than Spotlight's performance.

Using two different external USB drives for backups using TimeMachine ensures some amount of safety from hardware failures (have had to restore once when the internal HD died).

Re:No Porn? (0)

Anonymous Coward | more than 3 years ago | (#35192698)

I think you left a directory out. ;)

/documents/incoming_downloads/ (before you ask..yes..._legit_ downloads)

Re:Directories (5, Interesting)

RuiFerreira (791654) | more than 3 years ago | (#35192666)

I basically use the same structure as you but I have an extra directory called "attic" where in practice I end up putting everything.

Re:Directories (1)

quoob (791191) | more than 3 years ago | (#35192756)

I also have an "attic" directory and I use it for files that don't need to be backed up in my regular backup schedule. It contains stuff that I've already archived or that wouldn't be badly missed if I lost it. I'm also considering adding a place for stuff I know won't be of any use or interest after a few years. I do this already with my dead tree files of receipts and such. I keep things in the order filed and throw out some of the oldest when the file drawer gets full. I should be able to do this for similar computer based data by occasionally purging files with old dates.

Re:Directories (2, Funny)

Anonymous Coward | more than 3 years ago | (#35192950)

I used to download stuff directly on my desktop (my habits have since then changed). One day, my desktop was completely full (meaning that there was no more iconspace, I know I could've just used an Explorer window but it was too much of a hassle), so I made a folder called "shit", and I dragged all of my stuff in there (most of the stuff was shit, so the folder was aptly named). It took a month or two to refill my desktop, then I made another folder called "more shit", and I again dragged everything on the desktop in, including the previous shit folder. When it was time to move my files to a new machine, all I had to move was about 10 documents in my Documents folder, my Music folder, and a folder called "megashit", which contained six nested levels of folders named after some variation of "shit", and with a total size of 200 gigabytes.

Re:Directories (0)

Anonymous Coward | more than 3 years ago | (#35192670)

I've also always wondered how people get used to tags. With tagging, it feels as if you put a tag on something then just release that something into the cloud-of-disorganization.

Re:Directories (0)

Anonymous Coward | more than 3 years ago | (#35192672)

Same here. Use directories to organize files, and some homebrew to backup files to external drives and other computers. At any one time I have about 7 copies of my home directory less than a month out of date from current.

Keep the irreplaceable stuff in a separate tree (3, Insightful)

traindirector (1001483) | more than 3 years ago | (#35192744)

I also still use a similar directory structure, but I've made one change in the past few years that makes it much easier to manage: I keep the special, personal, irreplaceable stuff in a separate hierarchy.

This negates the need for something like a backup_links directory, and makes it much easier to just share the "normal" media directory with everyone/thing on my home network and then handle permissions on the personal stuff with more granularity. It's also much easier when I know I'm looking for a photo I've taken or a document I've made that it'll be in the personal hierarchy under those categories rather than the main ones.

It's a small change, but keeping a separation between stuff I've made and the easily replaceable stuff I've acquired has gone a long way to making my personal data and treasures more secure--both from loss and accidental sharing.

Re:Directories (0)

Anonymous Coward | more than 3 years ago | (#35192748)

Yep, this is still the most straightforward way to do it. I have a similar layout, though I also have "porn" (obvious) and "software" (for installers, applications, utilities, etc). I also have a "holding_tank" that I just drop everything into; when it gets too full, I move everything out of there and into the proper directories. Also have "emulation" to hold all my emulators and ROMs (NES and SNES mostly). Finally, I have a separate "retail" directory that holds anything that I paid for and the related receipts, serial keys, and whatever else.

Same here, Directories are fine. (2)

guidryp (702488) | more than 3 years ago | (#35192786)

I have media drives that hold the bulk and they are easily organized into games/pictures/books/movies/tv/music. Smaller document/coding directories are on my C drive for source/text/spreadsheets I make myself.

I don't tag anything. For my pictures, I simply name the directories Year_date_mainContent (ex 2010_12_25_Xmas). Media names are self evident, but I also run XBMC for video, so I guess that has internal tagging. But it's still easy to find video outside of XBMC, which I only use about 50% of the time.
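A small sketch of that naming convention, assuming the pattern is four-digit year, two-digit month, two-digit day, then a label (the helper name is hypothetical):

```python
from datetime import date


def photo_dir_name(day, label):
    """Build a name like 2010_12_25_Xmas from a date and a short label."""
    return f"{day:%Y_%m_%d}_{label.replace(' ', '_')}"


print(photo_dir_name(date(2010, 12, 25), "Xmas"))
```

Because the zero-padded date comes first, a plain alphabetical listing of such directories is also chronological, which is most of the appeal.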

I almost never even use search to find things, because the layout is very logical and it is pretty much obvious where everything is.

Everything is online and in my computer, multiple TB drives. No raid.

For backup I simply use external esata multiple TB drives and FreeFileSync, that I run once/week.

Re:Directories (1)

dixonpete (1267776) | more than 3 years ago | (#35192806)

ya, and what happens if you have a fire and all your equipment melts?

Re:Directories (0)

Anonymous Coward | more than 3 years ago | (#35192928)

He easily restores from backups? Organized files are easy to back up. Disorganized files that are "organized" by means of meta tags and scattered all over a hard drive are much much more difficult to back up.

Re:Directories (0)

Anonymous Coward | more than 3 years ago | (#35192998)

ya, and what happens if you have a fire and all your equipment melts?

Then he restores his "irreplaceable" stuff from his offsite hard drive. Which he specified.

Jeez, I know people don't read TFA, but at least read the comment you are snarkily replying to.

Re:Directories / split over machines / crashplan (1)

rvw (755107) | more than 3 years ago | (#35192942)

Another option is to split data over machines. Use a media server, and keep media there. Maybe pictures and personal videos can be on your normal laptop as well, but your MP3 collection, movies and tv-shows don't need to be on your laptop.

Then I use one old desktop with Ubuntu on it as backup-machine. I use Crashplan for this. It has a free option to backup to your own machine, or to backup to a friend's machine. I backup several machines (from my parents as well) to this one machine. Then this one machine can be backed up online for $5/month, no limit. (You have unlimited storage, only limited by the upload speed. But as we've seen with Mozy, that can change very quickly.)

Re:Directories (1)

bazorg (911295) | more than 3 years ago | (#35192972)

... and what if your family add their own files and do not observe your directory rules? What if someone joins the household and brings their own 500GB of files with a different directory structure?

Re:Directories (2)

timeOday (582209) | more than 3 years ago | (#35193066)

I agree, his structure appeared to be for only 1 person. The top level of organization, really, is /home/username, with permissions set so people can't accidentally mis-file things under others' directory. I've still found that useful on my home computers even though there's no concept of enforced 'security' between users (they could all sudo to each other without a password, though they don't know it). This is because I have one desktop with a submenu for each person that launches their apps in their home directory as them. Trying to make them log in and out to run each program would never work, too much hassle, besides it would kill all the previous person's programs which they usually just leave running.

Of course, sharing is where it gets complicated. The music for my wife and I is in one directory structure. I used to just dump kid stuff under a "kids" subdirectory, but as they're developing individual tastes that doesn't really work either.

Re:Directories (0)

Anonymous Coward | more than 3 years ago | (#35192980)

I completely agree, a proper directory tree will help you find anything in seconds. How hard is it to create


This has worked for me with data from a few MB up to a couple of TB. And, for those odd cases where I'm having trouble finding something, there's the locate and find commands. As long as the files and folders are properly named life is easy.

Re:Directories (1)

sb98052 (857171) | more than 3 years ago | (#35192986)

I think what you mean is it still doesn't work for you. Some things suck less than they did in 2003, but not this.

Re:Directories (1)

isopropanol (1936936) | more than 3 years ago | (#35193000)

Similar here, but RAID is RAID1 for the partition where the OS, email (IMAP on Maildirs), and must-not-go-down files are, RAID0 where the must-be-recoverable and re-downloadable stuff goes. Backup is via rdiff-backup (plain files) and cp --sparse=always (sparse iSCSI shares) to an eSATA drive cradle.

Also, I ran multiple backup/recovery drills for scenarios such as accidental deletion or overwrite of files, partial failure (one mirror of the RAID1, all of the RAID0), and complete failure on a VM before I even ordered the hardware.

Re:Directories (1)

TeknoHog (164938) | more than 3 years ago | (#35193018)

Same here. I also use symlinks to organize music based on genre, even though all artist directories are under the music dir. This way it is also possible to file one band under multiple genres.
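A rough sketch of what such genre links could look like in code. The by_genre directory name and the function are assumptions for illustration, not the layout described above:

```python
import os
from pathlib import Path


def link_artist_to_genres(music_dir, artist, genres, genre_root="by_genre"):
    """Create music_dir/<genre_root>/<genre>/<artist> symlinks to music_dir/<artist>."""
    music = Path(music_dir)
    artist_dir = music / artist
    created = []
    for genre in genres:
        genre_dir = music / genre_root / genre
        genre_dir.mkdir(parents=True, exist_ok=True)
        link = genre_dir / artist
        if not link.is_symlink():
            # Relative target, so the links survive moving the whole music tree.
            os.symlink(os.path.relpath(artist_dir, genre_dir), link)
        created.append(link)
    return created
```

Calling it with two genres for the same artist gives two links to one directory, which is the "one band under multiple genres" case.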

Actually, pathnames are also metadata. Or at least they can be used to provide a lot of metadata, when properly used. Fancy metadata/tagging systems also need some discipline to work, and sloppy people can lose track of their data despite the fancy tools. I choose the old-fashioned way as it works for me, and is readily accessible with a variety of programs.

I also have a simple filing system [iki.fi] for music burned to DVDs. Basically, after burning a new disc I do "find > disc#". To find a filename, I simply grep for it in a directory with all these files. Usually it is enough to find just the disc number.
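The same find-then-grep catalogue can be sketched in Python, for those without a shell handy. The function names are made up, and two details differ from the commands above: only files are listed (find also lists directories), and the search ignores case (plain grep does not).

```python
import os
from pathlib import Path


def write_disc_index(disc_root, index_dir, disc_name):
    """Save one relative path per line for every file on the disc."""
    index_path = Path(index_dir) / disc_name
    lines = []
    for dirpath, _dirs, files in os.walk(disc_root):
        for name in files:
            lines.append(os.path.relpath(os.path.join(dirpath, name), disc_root))
    index_path.write_text("\n".join(sorted(lines)) + "\n", encoding="utf-8")
    return index_path


def find_in_indexes(index_dir, needle):
    """Return (disc_name, path) pairs whose path contains needle, ignoring case."""
    hits = []
    for index_path in sorted(Path(index_dir).iterdir()):
        for line in index_path.read_text(encoding="utf-8").splitlines():
            if needle.lower() in line.lower():
                hits.append((index_path.name, line))
    return hits
```

Since the index files are plain text, they stay readable with grep or any editor even if this script is long gone.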

Arcane commands (0)

Anonymous Coward | more than 3 years ago | (#35192658)

What's wrong with typing arcane commands at the terminal? And who said I unleashed that upon my extended family?

Re:Arcane commands (0)

Arker (91948) | more than 3 years ago | (#35192792)

That was the one part of the question that makes no sense to me. I have the same issues and was hoping to find some good answers posted, but then I notice that the question is pre-rigged to exclude the best answers!

Face it, searching a file system for relevant documents is one of those tasks where the best tool for the job is always going to be a command line. It doesn't have to be "arcane" but then that seems to be a word that is (mis)used solely by anti-command-line people to refer to any command line, not just those that are unnecessarily complex or difficult for humans to parse.

Home Server Solution (0)

Anonymous Coward | more than 3 years ago | (#35192674)

i use windows home server and fill the shares with nicely named subfolders. around 6tb by now.

one could use linux for that, too. just make sure it's something that hides physical disks away from you.

My solution (1)

Average_Joe_Sixpack (534373) | more than 3 years ago | (#35192680)

If it hasn't been accessed in 3 years then it gets deleted [porn files 6 months].

Re:My solution (0)

Anonymous Coward | more than 3 years ago | (#35192962)

If I adopted your solution then a good 50% of my Photo Archive would be gone forever.
Then I wouldn't have been able to sell that picture of the two puffins colliding in flight that I took in 2003 for $500.

Oh well.

Re:My solution (1)

woboyle (1044168) | more than 3 years ago | (#35192994)

Well, I have data that I want to keep, but haven't looked at in 5-10 years (or more). Just like a book on the shelf. I may not have read it in a dog's age, but I still want to keep it (when was the last time you read Little Women/Little Men?). So, we need a means to archive, but keep accessible, data that has value over time, but may not be currently relevant.

Mac Spotlight (0)

Anonymous Coward | more than 3 years ago | (#35192692)

For personal documents and such, I don't even worry about it any more. Just make sure that I don't put anything important in /tmp, and then let Spotlight find it for me. The only problem is when you have several versions of the same document, and they all hit on the Spotlight search. So you just need to discipline yourself to not have a zillion copies of them (let Time Machine do that for you).

Directories are fine until you forget what your taxonomy is, plus the "let's build an arbitrary genus on the fly" syndrome.

Nah. Ad hoc organizing when I feel like it, but Spotlight finds them for me no matter how badly I mess it up.

Re:Mac Spotlight (1)

sauge (930823) | more than 3 years ago | (#35192856)

I second this - I have come to find spotlight (Apple OS X's search tool) to be wonderful. It can search through everything - email entries, pdfs, text files, word processing files, etc. (And I am a find . -name "*" -exec grep -Hi searchterm {} \; user.)

Add to that, Apple's file system allows comments and tags to be added as file attributes that Spotlight searches on.

Re:Mac Spotlight (1)

newcastlejon (1483695) | more than 3 years ago | (#35193008)

I think my Spotlight must be broken... It used to work just fine but I typed in "Dracula" before I started this comment: two minutes later I gave up on moaning and decided to moan on slashdot. I store all my rips in either /Volumes/External/Movies or /Volumes/External/TV and "Dracula, Dead and Loving it" is in the former. Oddly, Spotlight doesn't find what I want when I search for "Dracula", but has no problems with "dead".

Frankly, I used to use spotlight all of the time but since realising it was often quicker just to find stuff myself I practically gave up. The problem wasn't a lack of speed per se, more that it randomly refuses to find some search terms.

Hierarchical Directories (1)

Dakiraun (1633747) | more than 3 years ago | (#35192702)

Hierarchical named directory structures are how I organize things. I've actually been relatively conservative with the data I keep around, and have about 600 GB of it, with maybe 100 GB being irreplaceable. Everything is organized via an appropriately named directory with appropriately named sub-directories, sub-sub-directories and so on. The files themselves are also given names that describe their contents. I was doing this long before libraries and "tags" and stuff came along, and I've just always kept doing it that way since I just don't have the time to go back and "tag" thousands if not tens of thousands of files. For me, this named-directory approach has been best due to its simplicity - the structure is easily transferable to any OS, and easily understood by anyone that sees it. It requires no application to handle it or interpret it either. I can't see myself deviating from this method even with 10 times more data as it would continue to be effective regardless of the amount of data I collected.

My approach (2)

mikael_j (106439) | more than 3 years ago | (#35192718)

I've tried forcing myself to use various schemes including relying completely on metadata and search. The last couple of years this is how I've ended up setting things up:

"Public" network storage

This is for data that should be accessible to the entire network at home. NFS mounted on all my machines, stored on ZFS volume on my file server.

  • Software/Applications - Application installers and ISOs.
  • Software/Games - Game installers and ISOs.
  • Video/compressed - Download directory.
  • Video/Movies - Hard links from Video/compressed, naming set to work with Plex (for looking up movie info from imdb).
  • Video/TV Shows - Hard links from Video/compressed, similar naming as for movies.
  • Music/Rips - Music I've ripped myself, organized by artist and album name.
  • Music/Downloads/Singles - Single songs downloaded, organized by genre.
  • Music/Downloads/Albums - Whole downloaded albums, organized by genre.

Private network storage

I use my home directory on the file server (also on the ZFS volume) for storing personal files and mirroring home directories from client machines in ~/Backup/homes/.

Local storage

On individual client machines I generally try to stick with whatever the operating system tries to make me use with an rsync script that syncs everything to the file server (automatically for desktops, run manually on portable machines).

This is what works for me. I would probably have stuck to the "just use metadata" approach if most user interfaces didn't seem to try and make it a major chore to edit and view metadata...

Ultrafast search and metadata filesystem (5, Interesting)

Twinbee (767046) | more than 3 years ago | (#35192724)

I have recently found an incredibly fast search tool called Everything [voidtools.com]. We're talking about Google-like searching where the results pop up as you type. It must be something on the order of a fifth of a second for my 1.5 million files. This kind of technology should be widespread - it makes searches actually *pleasant* to do. Anyway thanks to Everything, I worry less now about where I store my files, and I also try to pack keywords into the filename.

Anyway, this kind of program is just a glimpse of what a future OS would look like. Imagine a system where everything is stored in tags and where folders become obsolete or used far less often. What you have then is a database or metadata file-system. The relatively new Haiku OS uses such a system, and I wrote about the massive advantages on this old page:
http://www.skytopia.com/project/articles/filesystem.html [skytopia.com]

Honestly, we'll all be better off the sooner we switch.

Re:Ultrafast search and metadata filesystem (2)

JSG (82708) | more than 3 years ago | (#35192852)

Everything (just looked at the homepage) looks just like "locate" (slocate, mlocate etc) which is a long standing *nix system tool. Oh and with a GUI frontend. There's plenty of those for locate as well.

As to that sort of metadata based FS, it seems to be really hard to do properly and despite it seeming like a good idea, not many are screaming out for it. If they were we'd all have one by now.

My money is on it being hard to do whilst not sacrificing performance. FSs are bloody hard - watch the development of any of the new breed of FS eg BTRFS, EXT4 etc.

Re:Ultrafast search and metadata filesystem (2)

maxwell demon (590494) | more than 3 years ago | (#35192936)

Reiser was going to go into that direction (at least if I understood the description on the namesys web sites correctly). But then, development stopped because Reiser turned out to be a real killer ...

Re:Ultrafast search and metadata filesystem (0)

Anonymous Coward | more than 3 years ago | (#35193052)

I have recently found an incredibly fast search tool called Everything [voidtools.com]. We're talking about Google-like searching where the results pop up as you type...

So... like Spotlight, then?

Re:Ultrafast search and metadata filesystem (5, Interesting)

timeOday (582209) | more than 3 years ago | (#35193132)

Imagine a system where everything is stored in tags and where folders become obsolete or used far less often.

It bothers me when people think tags are fundamentally different from folders (directories) in the first place. I'm going to re-introduce directories as "hierarchical tags" and blow everybody's mind.

Maybe it's because people think of directory membership as exclusive? But it isn't. You can link a file into as many directories as you like with the 'ln' command. If that hasn't caught on, and if Windows Folders don't even really support that, it's because most people just don't bother... and the same is/will be true of tags by any other name.
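To make the "directories as tags" point concrete, here is a minimal sketch that files one photo under several directories with hard links, roughly what repeated 'ln' calls would do. The function name is invented for illustration, and it assumes all the directories live on the same filesystem, since hard links cannot cross filesystems.

```python
import os
from pathlib import Path


def file_under(path, *directories):
    """Hard-link path into each directory, keeping its file name."""
    src = Path(path)
    links = []
    for directory in directories:
        d = Path(directory)
        d.mkdir(parents=True, exist_ok=True)
        link = d / src.name
        if not link.exists():
            os.link(src, link)  # a new name for the same data, not a copy
        links.append(link)
    return links
```

Every name refers to the same data, so an in-place edit made through one directory shows up in all of them and no extra space is used. Programs that save by writing a new file and renaming it over the old one will quietly break that sharing.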

If you need to manage documents (0)

Anonymous Coward | more than 3 years ago | (#35192728)

consider using software like Zotero, Jabref, BibDesk or others. But from your problem description it sounds like you don't have the problem that you need to refer to documents quickly and keep a lot of pertinent meta data about them. For managing a few videos and photos for other people directories should be enough. And for anything resembling source code and configuration files you can also use directories plus a distributed revision control system like git.

Learn to delete (2)

zwei2stein (782480) | more than 3 years ago | (#35192738)

Simple: Delete stuff.

Do you need all those installation files for 10 year old shareware? Do you really need Gigabytes of movies you will never watch again? Music Collection so big that your playlist is months in length? Irrelevant TV shows? More ebooks than you can possibly read?

What you really need to keep are personal files - photos, home video, documents. Those can easily be managed - tag by occasion, file under year/month. done. (they do not take that much space either and people get tired of documenting everything sooner or later.).

Re:Learn to delete (1)

mikael_j (106439) | more than 3 years ago | (#35192768)

I have no problem deleting stuff that's easy to find again, the problem is all that hard-to-find data. The movie you can't find a DVD or Bluray of and just finding a torrent with seeds took days of searching (not to mention downloading it at a blazing 50 kB/s from the one person seeding it). Removing data like that means that if you ever want to watch it again it will take you days or weeks of preparation.

The problem is that data like that adds up. At first it's just a couple of movies, after a while half the movies you have are movies like that.

Re:Learn to delete (4, Insightful)

Hatta (162192) | more than 3 years ago | (#35192824)

Do you need all those installation files for 10 year old shareware?

Sure do. In fact I just installed StuffIt Deluxe on an SE/30 last weekend

Do you really need Gigabytes of movies you will never watch again? Music Collection so big that your playlist is months in length? Irrelevant TV shows?

The bigger the collection, the more fun shuffle is.

More ebooks than you can possibly read?

You never know which one you'll need to refer to.

For better or for worse... (0)

Anonymous Coward | more than 3 years ago | (#35192740)

I have my local folder structure the same as the project server, so that anyone who might need to, can get something off of it, and it's easier for me to make backups.

OpenGoo/Fengoffice (1)

IANAAC (692242) | more than 3 years ago | (#35192746)

For all my editable docs I create/edit for work, I use OpenGoo/FengOffice. I can keep track of versions, and apply any tagging, etc.

For my media files (on a separate server) I use Rhythmbox for audio and XBMC for video.

And I back up everything somewhere else, just in case. I don't have terabytes of stuff though. Close to a terabyte.

Delegation (1, Insightful)

rhendershot (46429) | more than 3 years ago | (#35192754)

I gave my son his own computer and, like many IT strategies, told him I'd back up what he asked me to. I made him responsible for his own collection, as am I. They may duplicate but hardware is so cheap. When we watch recorded TV shows sometimes we are both interested in keeping a copy, and that's ok. A gig here or there really doesn't matter when I can add 2TB for $100.

That's very different from the scenario we faced when his brothers were kids. A 100MB hard drive was then pretty significant. I had to consider floppies and temp spaces. Now I'm more concerned with the age of the hard drive.

I don't think I'm the best one to decide how he might like to find his information - who knows what innovation might bring. I DO care that the systems are stable and reliable. That means repairable, at least to me.

Organization, Tags and Smart Programs (2)

chill (34294) | more than 3 years ago | (#35192762)

My main file server, where anything not in immediate use is stored, is organized mostly for human convenience. That is, a tree-hierarchy of folders.

media/pictures/family (with various subfolders like "zoo", "picnic", "christmas 2010", etc.)
documents/work/[person's name]
documents/school/[person's name]
web/[site name]
programming/[person's name]/project
family history/

At the end of the year, or when I do a mass data import, I spend more time getting the meta-data and tags correct than anything else. All of my audio and video are properly tagged. Ditto for any documents.

Almost all video is accessed with "smart" programs, like Amarok or XBMC which automatically pull in things like lyrics, trailers, cover art, etc. That stuff is almost never accessed thru the directory tree. The interfaces on the programs are way too good -- assuming the stuff is properly tagged.

The web and programming folders are basically .tar.gz files that are backed up and copied over (drag-n-drop via smb mounted share). They're archives of whatever project someone is working on their local system. I've set up cron/scheduled tasks to update those daily on everyone's PCs, even the kids.
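For the curious, the archive step could look something like the sketch below. This is an assumption about the general shape, not the actual scheduled task: the function name and paths are hypothetical, and the cron or Task Scheduler entry that would call it daily is left out.

```python
import tarfile
from pathlib import Path


def archive_project(project_dir, archive_dir):
    """Write <archive_dir>/<project name>.tar.gz containing project_dir."""
    project = Path(project_dir)
    out_dir = Path(archive_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / (project.name + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the project name
        tar.add(str(project), arcname=project.name)
    return archive
```

Pointing archive_dir at the mounted share would replace the drag-n-drop step; each run overwrites the previous archive of the same project.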

Most media folders are read-only, to prevent accidental deletion. My account is the master and I can upload stuff there, but I don't want accidents from people wanting to just watch a movie. 600+ DVDs/BluRays, including movies, educational & television shows all on a 2 TB file server in h.264 format. All *music* is FLAC format, with Amarok auto-transcoding if people want to transfer to an iPod. All other audio, like drama/comedy/educational is 128 Kbps MP3 for ease of streaming. And old comedy albums aren't exactly THX-quality to begin with.

It's a mess (1)

SchizoDuckie (1051438) | more than 3 years ago | (#35192764)

In spite of what MS promised, we still have no SQL filesystems.. I'd love one of those by now. I have terabytes of data, photos, code, php, javascript, movies, chat logs all scattered throughout different disks backed up when needed, double copies everywhere. I want something to manage this properly! Any advice?

Re:It's a mess (1)

darkpixel2k (623900) | more than 3 years ago | (#35192850)

In spite of what MS promised, we still have no SQL filesystems.. I'd love one of those by now. I have terabytes of data, photos, code, php, javascript, movies, chat logs all scattered throughout different disks backed up when needed, double copies everywhere. I want something to manage this properly! Any advice?

mysql --user=root --password=s3cr3t
create database my_new_filesystem;
Have fun.

Re:It's a mess (1)

maxwell demon (590494) | more than 3 years ago | (#35193004)

OK, now you "only" have to write the file system driver for it (probably through fuse), so that normal applications can access the files in your SQL file system ...

Re:It's a mess (2)

rhendershot (46429) | more than 3 years ago | (#35192952)

I also was intrigued by the idea of a database-oriented file-system. A basic operation though is to get a file. how? by its name or id. what is its name? Something you have to define. It could have a category (eg. javascript development library) but that's something the schema would impose upon you. what if you're more interested in the files' Contributor (author, downloader, etc.) ?

By itself a file-system backed by a database engine doesn't make the problem smaller it adds overhead.

There's only one resolution that identifies one from another and that's the explicit bytes contained within its storage. That can be simplified by indexing schemes like md5sum but they all can have collisions. (rare but how much of a chance are you willing to take?)
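One way to hedge against collisions is to treat a matching checksum only as a hint and confirm with a full byte comparison. A minimal sketch of that idea, using SHA-256 instead of MD5 (my choice, not something from the thread) and made-up function names:

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_digest[sha256_of(path)].append(path)

    groups = []
    for paths in by_digest.values():
        # A shared digest only marks candidates; compare the bytes to be sure.
        remaining = sorted(paths)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                groups.append([first] + same)
            remaining = [p for p in rest if p not in same]
    return sorted(groups)
```

Note that this only reports exact copies. Two text files that differ only in their line endings hash differently and are not grouped.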

Is a file of bytes, ended by CR the same as the same file of bytes ended by CRLF? While the system itself might probably use null termination, other files from other systems won't.

The low-hanging fruit for file de-duplication is in backup storage. When you and another person need to retain the same file, it can easily be merged into the stream. When you have two files that are byte-comparable, that's not so easy, because you have probably defined some separation criteria (e.g. different file paths), so on your system they still need to remain discrete.

I've not heard much about how they would integrate this at the OS level but I think that's the trick.

Tags are useless for personal organization (3, Interesting)

icemaze (1865436) | more than 3 years ago | (#35192772)

Who has the time to hand-pick all the relevant tags for every file they download? Yeah, me neither.
Finding time to put things in their own directory, and not dumping them all in "downloads", is a great accomplishment.

However finding a meaningful, hierarchical structure is non-trivial. I'm still working on it.

Disagree (1)

IANAAC (692242) | more than 3 years ago | (#35192836)

Tags are no less useful than any other form of organization. They're more useful to me when used with software that keeps track of location for me.

Well thought out is well thought out, regardless of whatever system you use.

Re:Tags are useless for personal organization (1)

darkpixel2k (623900) | more than 3 years ago | (#35192868)

Who has the time to hand-pick all the relevant tags for every file they download? Yeah, me neither. Finding time to put things in their own directory, and not dumping them all in "downloads", is a great accomplishment.

However finding a meaningful, hierarchical structure is non-trivial. I'm still working on it.

I'd settle for being able to tag a file/folder with 'temp' and have the folder/file automatically delete $x days after I last touch it.

The reason I can't just 'rm -rf *' in my downloads directory is I don't want to delete the stuff I just downloaded a few minutes/days ago but haven't sorted properly yet.

I suppose I should just write a damn script.
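Such a script could look roughly like the sketch below. It is only an approximation of "last touched": it assumes modification time is good enough (access times are often unreliable), it looks only at top-level entries, and a folder's mtime changes only when entries are added or removed directly inside it. The names stale_paths and purge are made up, and purge defaults to a dry run so nothing is deleted by accident.

```python
import os
import shutil
import time


def stale_paths(directory, max_age_days, now=None):
    """Return top-level entries not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(directory):
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def purge(directory, max_age_days, dry_run=True):
    """Delete stale entries; with dry_run=True only report them."""
    doomed = stale_paths(directory, max_age_days)
    if not dry_run:
        for path in doomed:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    return doomed
```

Run from cron with something like purge("/home/u/Downloads", 30, dry_run=False), where the path is just an example, this leaves the stuff downloaded a few minutes or days ago untouched.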

Re:Tags are useless for personal organization (1)

maxwell demon (590494) | more than 3 years ago | (#35192892)

The nice thing about tags is that you don't need to apply all of them right away (yes, it's better if you do, but it's not mandatory). You can easily add tags later if you discover that some specific tag would be useful. In contrast, it's quite hard to retrofit an existing hierarchical structure.

That said, there's unfortunately no way I know to consistently tag arbitrary files without the danger of the tagging getting broken either by file system operations, or by programs to edit the files.

File organization/etc (0)

Anonymous Coward | more than 3 years ago | (#35192774)

For the most part, I use directory structure coupled with hard drive provisions that organize the content based on media type.

I also use http://bulkfilemanager.codeplex.com for managing names/relocation of mass files; it makes life a lot easier, especially in download sets where a uniform naming scheme for the media is nonexistent and having one implemented would be useful.

Google said it best.. (2)

xtal (49134) | more than 3 years ago | (#35192788)

Search, don't sort.

Re:Google said it best.. (1)

Rob the Bold (788862) | more than 3 years ago | (#35192884)

Search, don't sort.

Good idea in theory, but in practice -- at least for me -- most files aren't going to have enough data in the filename and tags (if any) to search for pictures from Uncle Bob's second wedding. Unless, of course, you've sorted all your pics and tagged/named all the pics from that event.

Some kind of picture/video/audio searching that would be effective seems a long way away, at least if you're looking for the aforementioned wedding as opposed to all pictures with blue flowers.

Re:Google said it best.. (0)

Anonymous Coward | more than 3 years ago | (#35192946)

Google created the illusion that everything can be searched, but lots of information is not accessible via Google and other search engines. Google is very similar to TV: if you are not shown on TV, you don't exist.

There is a new project for binary version control. (1)

mekberg (1942358) | more than 3 years ago | (#35192802)

This problem keeps popping up more and more often as people collect more and more data... I don't have the answer to the question of how to keep everything indexed and searchable, but I do have the answer to the question of how to safely version control and store/back up such large amounts of data... a little project of mine called Boar. I quote from the project front page:

"BOAR aims to be the perfect way to make sure your most important digital information, like pictures, movies and documents, are stored safely.

* BOAR prevents data loss due to human or machine error
* BOAR makes it possible for you to restore any or all of your files from any point in time.
* BOAR makes it easy to maintain verified backups of your data, including file history.
* BOAR will make it much more likely for your digital heirlooms to reach your grandchildren some day.

If you are familiar with vcs software such as Subversion, you might think of boar as "version control for large binary files". But keep reading, because there is more to it."

Please check it out at google code: http://code.google.com/p/boar/ [google.com]

Re:There is a new project for binary version contr (1)

woboyle (1044168) | more than 3 years ago | (#35193106)

Interesting, but most people need (at least I do) more of a data de-duplication tool than anything else. That, and a subject-based whereis tool, would deal with 95+% of the problems most people (and organizations) face in this realm. JMHO, but then I only have about 40 years experience in this field... :-)

Please tell us (1)

rossdee (243626) | more than 3 years ago | (#35192804)

Where you keep all your valuable data, so if we ever hack into your computer, we know where to steal (or at least make copies of) your pron collection

I use 'group directories' (1)

david.emery (127135) | more than 3 years ago | (#35192814)

I'm working in a Mac OS X environment, but this should work for Linux too: I have groups for the various classes of stuff, e.g. photos, household files (like taxes and Christmas letters), etc. Each group has a group home associated with it, and I mount those from my server as needed. (The server's a RAID 5 box.) Irreplaceable stuff like photos is copied a couple of times, once to a disk on a separate machine and periodically to a portable USB drive that I keep at a friend's house. (I have 2 of them and rotate them.) An advantage of the group-based approach is that I can use group privileges to limit access if required (e.g. my work-related stuff is not readable by the rest of the family; photos are updatable by my wife and me and readable by everyone else, etc.).

For sensitive materials, I actually use a Mac OS X encrypted disk image in the group home directory. One of these days I'll work out how to get whole-drive encryption on my Mac OS X Mini Server.

For my photos, I'm experimenting with various keyword Digital Asset Management schemes, inspired by "The DAM Book" http://oreilly.com/catalog/9780596523589/ [oreilly.com]

And as a side note, I'm seeing -50% failure rate- on Seagate 3.5" 1tb drives that are about 1-2 years old. The RAID enclosure is running Toshiba 1tb drives. One of my 2 USB backups (with a Seagate drive) failed, so I'll replace that with a Toshiba or WD drive. I'm really disgusted with Seagate reliability!

my solution (1)

StripedCow (776465) | more than 3 years ago | (#35192820)

1. Post it on the web, or run your own apache instance.
2. Use Google to find your data again.
3. ?
4. Let others also profit from your data.

Re:my solution (2)

woboyle (1044168) | more than 3 years ago | (#35193016)

Some savant once said "DON'T TRUST ANYONE!" with your money or your wife (or today, your data). I think that includes Google...

google desktop (RIP) (4, Interesting)

meeotch (524339) | more than 3 years ago | (#35192822)

I had great success with Google Desktop Search (on windoze) for a while. It would index my mail, files, and web history (if instructed to) - and the best part was hitting one key to get an instant, minimalist search box with auto-preview. From there, you could jump straight to what you were looking for, or open a further page to narrow the search.

Sadly, it doesn't work with Thunderbird 3.0, and Google doesn't appear to care, or even to be supporting it anymore. So now I'm on a hodgepodge of GDS, Windows built-in search, and the sucky T-bird search bar.

I honestly can't believe that nobody has duplicated this Spotlight-esque functionality yet. I realize there are other desktop search options, but none of the ones I've come across have that one-key mini search that goes away as easily as it is called up. For an operation that I'm performing dozens of times daily, that's pretty crucial. It even replaced the file browser for me - much easier to call up the GDS box & type a couple letters than to grab the mouse and drill down into some directory structure - even if I know exactly where I'm going.

File Organization How do you do it in 2011 (0)

Anonymous Coward | more than 3 years ago | (#35192874)

I have a huge NUMBER of files, in a huge NUMBER of formats about a huge NUMBER of topics....

My Secret Technique (1)

Snarf You (1285360) | more than 3 years ago | (#35192880)

Here is my highly effective file organization methodology:

  • I keep multimedia (audio, video, porn) files scattered around 37 or so different folders, each with several subfolders. The majority of these folders have 'temp' or 'download' in their names, except for porn files and folders which consist solely of simple combinations of letters and numbers, such as zzz/tt2.mpg, abc/xyz10.mpg, etc.
  • Programming projects (some for work, some for hobby, none of which are finished or ever will be) get split up among folders whose names convey increasing seriousness. For example, first there was simply 'Projects', then along came 'New Projects', which begat 'Really New Projects' and eventually 'Even More New Projects'.
  • Helper files (planning, documentation, spreadsheets, etc.) for each such project have strange filenames relating to whatever thoughts were racing through my mind at the time they were named.
  • Then, all you have to do is remember where and what everything is.

And that's pretty much all there is to it. This system hasn't failed me yet. Plus, it will stimulate the economy in approximately 0 to 60 years, because the investigator who has the pleasure of snooping through my computers after I croak will have job security for years.

Re:My Secret Technique (1)

woboyle (1044168) | more than 3 years ago | (#35193154)

And I keep all my tax-relevant papers in a single big cardboard box... For all years, of course! :-)

Let me give you one word...Volume. (0)

Anonymous Coward | more than 3 years ago | (#35192882)

One big disk, or multiple ones if you want redundancy for your data. It's the only way you'll keep anything.

Switching out media is a pain. There are people who do it. If you were one of those anal retentive types, you'd hardly be asking about it on Slashdot. So I'll assume you're a normal person.

So one big storage space. There are a number of ways to do that, from a USB or eSata drive, to a network share. Whatever choice you make, if you don't like it, change in another year or so. At the rate of data space increases, you won't run out anytime soon.

The real key is to keep your media up to date. Disks decay, drives break down. Yeah, you can buy a 5 1/4" drive, but is it worth the bother? Even 3 1/2" are fading, and it won't be long before some others hit the dust.

Of course, specifics are up to you, different people like different organizations, and who knows what you really have a use for? I remember some old games I've played once or twice. Should I keep them, or just forget about them because if I really cared, I'd still know their names and not just have to look for them.

A word for "lifestreams" and against livelink (3, Insightful)

rbrander (73222) | more than 3 years ago | (#35192888)

I'm pretty much a "have a lot of structured directories" guy myself; I don't see your complaint about rising file sizes, or even total number of files. They've pretty much increased linearly in number while the speed of the linux "locate" command has gone up exponentially with Moore's Law. It's the other way around from management trouble - with TB hard drives, I have so much space I leave around TV shows and other media files I'll likely never watch again, "just in case".

At work, the search problems are harder, because I've got quite the multi-tasking job where I may spend just minutes on some problem, then be asked for an update months later, totally skeptical that I ever addressed the issue. And my favourite file-management with that is the most insane-sounding of all: one big directory. I sort it by date and rely on the fact that I take time to write out helpful file names like "downtown_condition_assessment_newmall_4_ernie.xlsx" (not actually that long, I use abbrevs in RL). Only files that have a whole lot of subject-matter friends get their own subdirectory; lonely "one-off" files go in the Big Pile.

The "sort the directory by date" uses the theory behind "lifestreams" [wikipedia.org] promoted by Eric Freeman and David Gelernter at Yale. It really is the best thing I've found (same 30 years) to stimulate the memory - seeing the names of other things you did at the same time; you can actually sense yourself getting close to the file as you remember, "Oh yeah, I worked on that in the spring".
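For what it's worth, the date-sorted view of the Big Pile takes very little code. A minimal sketch, assuming modification time is the date of interest and using a made-up function name by_date (it is roughly what `ls -lt` shows):

```python
import os
import time


def by_date(directory):
    """Return (YYYY-MM-DD, name) pairs for files, newest first."""
    entries = [
        (entry.stat().st_mtime, entry.name)
        for entry in os.scandir(directory)
        if entry.is_file()
    ]
    return [
        (time.strftime("%Y-%m-%d", time.localtime(mtime)), name)
        for mtime, name in sorted(entries, reverse=True)
    ]
```

Printing the pairs one per line gives the neighbouring file names that do the memory-jogging.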

An additional word of Fear & Loathing for "document management systems" like LiveLink by Formark. Required to use this by work (shared directories are strictly for 'short-term' storage), it's awful. Terribly slow, the search function approaches useless, and it's hard (and slow, did I mention slow) to even re-sort a directory (sorry, that's a 'filter down' in Livelink's vocab) by name or date or whatever. After promising that photos would be displayed with thumbnails by the great new Version 4 for two years, it came, broke some stuff that was working, and did not provide thumbnails - all media files are unsearchable in any way. I suspect for long-term archiving, putting documents in a database would have advantages, but for active business usage, it's been crippling.

Directory orgs I use (1)

bzipitidoo (647217) | more than 3 years ago | (#35192908)

These are what I've come up with.

For Windows, I create C:\Software and C:\Hardware. Drivers, DirectX updates, and such all go in Hardware. Any software I install goes in Software. Games are the reason to use Windows, and are huge consumers of hard drive space, so they rate their own subdirectory, C:\Software\Game. (I've also decided to drop plurals from directory names I create. Was getting annoying having "pic", "pics", "pictures", "images", etc.) It doesn't have to be "Software", all it has to be is not C:\Program Files. That way I can tell at a glance what I put on there, and what else is there. Back in the days of dial up BBSes, I used C:\LOAD\DOWN and C:\LOAD\UP. When I installed Windows, I'd have it install into C:\W, figuring that would make various configuration files ever so slightly smaller.

For UNIX, of course I have /home mounted on its own partition. Makes upgrading and backing up a lot easier. I use 'u' (for "user") for my primary user name. (Some distros, such as SuSE, won't allow single char user names, so it's "u1" for those.) Besides keeping it as simple and short as possible, it also heads off any possibility of my real name being easily discovered from my chosen user name. As more and more crap has been stored in the home directories of users (directories like .mozilla, .gnome, .gnome2, Desktop, Documents, Downloads), I've recently taken to putting all my stuff in /home/u/own/, so I can easily tell them apart. I could live with it as long as they kept to hidden names, but when the desktop environments started pushing in with subdirs like /home/u/Documents, I decided to do something. Same idea with C:\HOME\U on Windows, when I have anything there. C:\My Whatever attracts too much junk from programs that take it upon themselves to save their ever so valuable configuration info there.

And lastly, I save configuration tweaks, with full path names, in /home/localconfig/. If I change, say, /etc/hosts.deny, I save the changed copy (not the original) in /home/localconfig/etc/hosts.deny. Really helps when I'm trying to remember what I had to do to get sshd, CUPS, XWindows, or whatever to work, or where the window manager du jour stores its global configuration and menus, or where the heck they moved DIRCOLORS functionality this time. Of course there is no user named "localconfig".
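The localconfig habit is easy to automate. Below is a small sketch, assuming a POSIX system and a hypothetical helper save_config_copy; it copies the changed file under the store directory at its full original path, so /etc/hosts.deny lands in /home/localconfig/etc/hosts.deny.

```python
import os
import shutil


def save_config_copy(path, store="/home/localconfig"):
    """Copy a changed config file into store, mirroring its full path."""
    src = os.path.abspath(path)
    # Drop the leading separator so the absolute path nests under store.
    dest = os.path.join(store, src.lstrip(os.sep))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also keeps timestamps and mode bits
    return dest
```

Calling save_config_copy("/etc/hosts.deny") after each edit keeps the record current, and since the store is an ordinary directory it can itself be put under version control.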

Ubuntu + Google Rainbow (0)

Anonymous Coward | more than 3 years ago | (#35192914)

Nexus One / Nexus S / Folio (Android Tablet = iPad, cheaper)

Then : Chrome / Picasa / Google Power

Because the Invisible Garfield @french_matt ... pushes Innovation instead of just staying @ home ;)

Everything on PC, Spotlight on Mac (4, Interesting)

Xian97 (714198) | more than 3 years ago | (#35192932)

Everything [voidtools.com] is what I use on the PC to quickly find any file I am looking for.

On the Mac I use Spotlight.

While it would be nice to be completely organized, these tools let me find my files anywhere they are located on my PC. I try to keep things organized into folders, but I am always falling behind so these are what I can use in the interim.

Re:Everything on PC, Spotlight on Mac (-1)

Anonymous Coward | more than 3 years ago | (#35193192)

Windows is not PC. GNU/Linux also runs on PC. Actually, nowadays even Macs are PCs. You silly little fop!

Besides, you didn't even begin to answer the question or then you just have FUCKING long descriptive file names.....

Just two for everyone I thought (0)

Anonymous Coward | more than 3 years ago | (#35192974)


A 2-tier backup system would be nice (1)

azgard (461476) | more than 3 years ago | (#35193002)

I don't use any tools. I just have all the content on two sets of external disks (copies of each other; I use external disks because I don't have a large enough computer and I don't like the idea of everything being powered on at all times). It's a pain to manage. I think Linux (or your favorite OS) desperately needs a 2-tier backup system with deduplication (but still making sure you have enough copies for recovery) and a good user interface.

Ideally, I would tell the file manager to unarchive a file, and it would look for the file, let me mount the proper disks/CDs required to get it, and then copy it to some cache area on the main hard disk. There I could play with the files (change, sort, rename, tag, whatever), and it would automatically back them up again once I was no longer working with them.

On a side note.. (2)

vondiggity (1038522) | more than 3 years ago | (#35193020)

What happened to Beagle for Linux? It used to work pretty well for me, and now it seems to have been abandoned.

Easy (1)

Smartcowboy (679871) | more than 3 years ago | (#35193026)

I put everything on the desktop. When there is no more place on the desktop, I create a subfolder named "temp" and then put everything in it, including the last "temp" subfolder.

Do what my boss does (0)

Anonymous Coward | more than 3 years ago | (#35193036)

All files on your desktop with subfolders - along with one or two random virus .exe files.
All emails sent to me have subject lines with "Re:" followed by some completely unrelated subject.
And all VBA code is commented at a ratio of 1 commented line for every 600 lines of code.

That should do it.

*smart* metadata filesystem (1)

itzdandy (183397) | more than 3 years ago | (#35193054)

We need a smart metadata filesystem. It needs to be a simple, automatic system in which file extensions and file headers are used to create the base-level tags. Other tags could be added for items like music and video, but the 'bread and butter' of the tag system needs to come from obvious information in the file and filename.
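The base-level part is simple enough to sketch. The example below is only an illustration: the tag names (ext:, format:, kind:), the helper base_tags and the tiny table of header signatures are all assumptions, and a real system would need a far longer table (or a library such as libmagic).

```python
import os

# A few well-known file signatures ("magic numbers"); deliberately short.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"ID3", "mp3"),  # only MP3 files that start with an ID3v2 tag
]
KIND = {
    "png": "image", "jpeg": "image", "gif": "image",
    "pdf": "document", "zip": "archive", "mp3": "audio",
}


def base_tags(path):
    """Derive tags from the file name extension and the file header."""
    tags = set()
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext:
        tags.add("ext:" + ext)
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, fmt in MAGIC:
        if head.startswith(magic):
            tags.add("format:" + fmt)
            tags.add("kind:" + KIND[fmt])
            break
    return sorted(tags)
```

Because the tags are recomputed from the file itself, they survive renames and copies, which hand-applied tags usually don't.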

Directories, b*tch (1)

tkprit (8581) | more than 3 years ago | (#35193062)

(waiting for someone to say 'CLOUD')

Not folders, "libraries", and sure as hell no tags (I tried that w/ Picasa for my pictures a few years ago; made a mess; deleted Picasa -- returned to sensible dir structure "pictures/TOPIC/year/month" and I'm fine). And I separate code by language ("code/4th/TOPIC" or "code/c++/TOPIC" ..I even keep a /fortran dir though I haven't used it since 80s, and a /48sx— some of my favorite code even though it's essentially unusable). Homebrew backup across local network drives and [for pics/video and code] solid state.

evil: I keep an /mp3 dir that's root-accessible only, with no subdirs whatsoever. ONE time I had a problem with this, and for that system alone I made a few subdirs. Learned hard way w/ iTunes. I despise programs that rearrange your files for you, make ridiculous subdirs w/out permissions, etc. I have to use iTunes, but I look forward to the day when I can get rid of anything apple and/or adobe. Hell, not even MS forces directories on you (not incl. the OS itself, I guess).

folders + wiki (2)

alexmagni (190839) | more than 3 years ago | (#35193116)

I finally dealt with this problem once and for all in the following way. I found the best personal wiki out there (Zim: http://zim-wiki.org/ [zim-wiki.org]), and wrote a simple Python script (http://www.inrim.it/~magni/zimDMS.htm [inrim.it]) that scans my folder structure nightly, keeping my wiki up to date. My wiki, therefore, is a perfect mirror of my folder structure, with the added bonuses that I can navigate to each folder, comment on it, describe its content, insert images, insert links to other folders, and finally open it in the file manager with a single click. My ~15000 folders are managed perfectly...

ls -lR works. Just add grep. (0)

Anonymous Coward | more than 3 years ago | (#35193140)

Arcane commands?? Terminal window?

Certainly you are confused, grasshopper.

I have a mix of online and offline files. Online files are stored across 20+ machines, but most live on a file server that runs 'updatedb' nightly. That means 'locate' can be used to find any file on that system efficiently.

For media files stored offline, it is all about building a text DB. Those offline files are (usually) stored on numbered optical media and the contents are stored with the equivalent of 'ls -lR' > nnnn.txt. If certain types of files are included in the media, additional information may be pulled from the internet and placed into another text DB with "additional" information. Egrep is used to find anything and the optical media number is shown in the results.
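The same text-DB idea can be sketched in a few lines for anyone not on a UNIX box. This is an assumption-laden stand-in for 'ls -lR' plus egrep, not the actual setup: the helper names write_catalog and search_catalogs are made up, each listing line holds only size and relative path, and the disc number is taken from the nnnn.txt file name.

```python
import os
import re


def write_catalog(mount_point, disc_number, catalog_dir):
    """Write one 'size<TAB>relative/path' line per file to nnnn.txt."""
    out = os.path.join(catalog_dir, "%04d.txt" % disc_number)
    with open(out, "w", encoding="utf-8") as f:
        for dirpath, dirs, files in os.walk(mount_point):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, mount_point)
                f.write("%d\t%s\n" % (os.path.getsize(full), rel))
    return out


def search_catalogs(pattern, catalog_dir):
    """Return (disc, line) pairs for listing lines matching the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for fname in sorted(os.listdir(catalog_dir)):
        if not fname.endswith(".txt"):
            continue
        disc = fname[:-4]
        with open(os.path.join(catalog_dir, fname), encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if rx.search(line):
                    hits.append((disc, line))
    return hits
```

Run write_catalog once when a disc is burned; after that the disc can stay on the shelf until search_catalogs says which number to pull.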

On Windows, you can use locate32 for similar capabilities to the UNIX 'locate' command. I think it will look inside files too, so the egrep step of finding which media disc a file is stored on would be easy.

I like that it is all TEXT files for efficiency, trivial access, and maintenance.

I've created web interfaces ... never use them. They just get in the way. Wife and kids use those, but only with limited searches based on filename. I search the additional metadata when that is desired.

Remember when you learned how to create a dictionary using a large text file as input?

The old ways are not necessarily bad.

Two level file structure (maybe three) (1)

Overzeetop (214511) | more than 3 years ago | (#35193174)

Everything I have is on an UnRaid box, so it's organized into shares (virtual drives)

Personal Files
    Photos (actually, it might be called Images)
    Music Archive (all my originals)
    iTunes (my working volume/set, compressed to mp4)
    Home Movies (Raw, In Process, & Finished folders)
    Files (everything else goes here, organized by what it is - health, rockets, cooking, guitar...etc)
Work Files
    My business stuff
Video
    Separated by type (DVD, HD, Recorded TV, Youtube)

This lets me back up my Personal and Work volumes with very little fanfare - I use LiveDrive remote storage and local external drives with SyncBack Pro. I don't back up the Video directory. Sure, it'd be a bummer to have to re-rip 400 discs, and lose some old TV shows, but it wouldn't be the end of the world, and it's not worth building a separate box (or two) with 4TB of storage to back up what I own on commercial media. That may change in the next year or two, and when it does, I'll back up with SyncBack just like the others.

The files are organized enough that most files I need can be narrowed down to 2-4 folders, tops. (side note...I have another 2-3 levels under my Files share..but that's just a second level of organization) All my images are cataloged using Picasa. I don't subscribe to the "dump it in a folder and tag it for searching." Even in Evernote I'm relatively organized. I've found that you can never remember the tag you used 4 years ago to search for your stuff. I keep a running log of my work jobs - about 1200+ in the past 8 years - and I still have problems finding specific jobs from too long ago.

Occasionally I'll clean stuff out and reorganize, and having the folders makes it easy. The biggest thing is that the master set is in ONE spot, and all my other machines sync to that spot. Sadly, the LiveDrive engineers are a bunch of useless hacks with an inflated view of their servers, so I don't sync any of my personal stuff there. I use their sync for work because it's the only service that seems to work reliably for less than $200/yr, and I've modified my workflow to use their ass-backwards system. As a result, I have to manually sync things to and from my server if I go remote for a while, but that's rare for my personal stuff; usually if I'm away from home and not on business, I want to be away from technology.

