# filesystem organization examples for large file collections

# general observations

a directory name can be seen as a category, with the directory entries belonging to that category. directories can also be seen as graph nodes or even relation labels.

when there is, for example, a directory "text-files", it is usually simpler to have it contain only text files. the directory can then act as an entry point to files of a homogeneous type, instead of a mix of types that might at some point require an extra step of differentiation.

the traditional filesystem can be used to build hierarchies of directories (nodes) and regular files (always leaf nodes). there is the root path, which is superior to all others. usually, circular relationships and more than one edge between two nodes are not allowed (hierarchy).

filesystem paths can contain information. the example path to an audio file "artist/artist.album/artist.album.1.flac" contains redundant information. it could be reduced to "artist/album/1.flac". the downside is that the parent directories have to be considered to get all the information.

## rating

sorting files by quality or another rating can be implemented with directories that have numbers as names. the files and paths they contain are considered to have a rating of that number. for example, "movies/1/good-movie.mkv" and "movies/2/less-good-movie.mkv".

this works well for dividing a set of files that are frequently accessed by some importance characteristic, for example frequently listened to music, or good and less good movies, images and so on.

nesting numeric directories can be confusing: the difference between nested ratings like 1/3 and 2/1 becomes less clear and less useful with each level of nesting, similar to decimal places.

for unrated files, a directory named 0 has the advantage that it is also numeric.
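
a minimal sketch of how such rating directories could be handled in a script, assuming the rating is always the first directory below the collection root. the function names `rating_of` and `with_rating` are made up for this illustration, and the code only computes paths without touching any files.

```python
from pathlib import PurePosixPath


def rating_of(path, root):
    """Return the numeric rating directory directly below root."""
    relative = PurePosixPath(path).relative_to(root)
    # assumption: the first component below root is the rating directory
    if not relative.parts or not relative.parts[0].isdecimal():
        raise ValueError(f"no rating directory in {path!r}")
    return int(relative.parts[0])


def with_rating(path, root, new_rating):
    """Return path with the rating directory replaced, keeping the rest."""
    rating_of(path, root)  # raises ValueError if there is no rating directory
    relative = PurePosixPath(path).relative_to(root)
    return str(PurePosixPath(root, str(new_rating), *relative.parts[1:]))


print(rating_of("movies/1/good-movie.mkv", "movies"))  # prints 1
print(with_rating("movies/1/x/y/z.mkv", "movies", 2))  # prints movies/2/x/y/z.mkv
```

unrated files under "0" are handled like any other rating, which is one practical benefit of keeping that directory name numeric.
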

special commands in file managers, for example custom right-click commands, can be particularly helpful for managing ratings. one example is a command that changes the rating while keeping the relative path, so that 1/x/y/z becomes 2/x/y/z. see also [rate](../../software/more/rate.html)

## automatic file sorting

files can be sorted into directories automatically based on characteristics like mime type, file name, video resolution, image colors and more.

# example for music files

this still works well for me for a music collection of 40000+ files.

~~~
rating > instrument-class > loudness/rhythm > artist > album.release-year > track-number.title
~~~

~~~
0
1
2
3
4
  electronic
  guitar
  orchestra
  jazz
  piano
    beat
    calm
    noisy
    other
      artist
        release-year.album
        other
          track-number.title
      various-artists
        album.release-year
          track-number.artist.title
~~~

~~~
music/1/electronic/calm/murcof/2007.cosmos/murcof.cosmos.03 cosmos i.flac
~~~

~~~
0
  unrated-album
  ...
1
  electronic
    beat
    calm
      murcof
        2002.martes
        2007.cosmos
      vangelis
        "opera sauvage"
    other
  guitar
  jazz
  orchestra
  piano
2
  electronic
  guitar
  jazz
  other
~~~

* instrument-class describes the general texture: music with guitar and band usually sounds fundamentally different from orchestral music or electronic music
* loudness/rhythm describes how "driving" the piece of music is: music with a beat usually has a different effect on the listener than ambient droning
* the aforementioned characteristics could also be used for automatic classification
* i put things like audiobooks and comedy albums into a different parent directory under "other" or into a separate directory outside of the collection

# example for video files

~~~
class > rating > title > season > episode
~~~

~~~
tv-show/1/futurama/s05/2.mkv
~~~

~~~
movie
  0
  1
  2
tv-show
  1
    curb your enthusiasm
    futurama
      s01
      s02
      s03
      s04
      s05
        1.mkv
        2.mkv
        3.mkv
        4.mkv
        5.mkv
        6.mkv
    monkey dust
  2
other
  clips
  stand-up
~~~

alternatively, a naming scheme that is common on the internet is s05e01.mkv, where s stands for
season and e for episode.

# programming projects

~~~
exe
license
other
readme.md
src
tmp
~~~

~~~
{project-name}
  exe
    compile
    install
    test
    compiled/
  other
  src
    {language-name}
    sc
      derivatives
      foreign
      main
      test
    scheme
      sph
      test
        sph
    pre-compiled
  modules
    sph
    test
      sph
  submodules
    {repository-name}
  tmp
    lib.so
  license
  readme.md
~~~

* "exe" is for executable files, that is, files that have the executable bit set. traditionally something like "bin" (from /usr/bin, binary files) or "scripts" tends to be used
* "readme" and "license" do not have to be uppercased
* submodules can be kept separate from the source and copied or linked in a compile step, so that a checked-out submodule directory does not have to be excluded when selecting files from "src"
* files not maintained in this project should be marked as such, as in this example with a "foreign" directory. otherwise it is difficult to tell what belongs to the project and what can be changed without synchronization issues
* for projects that only contain modules or libraries of a single type, for example modules of a scheme project, a single "modules" directory with the module files directly underneath seems most appropriate
* the directory hierarchy should only be as deep as is beneficial. for example, if a project does not use multiple source languages, src/{language-name} directories are not useful and therefore not necessary

# home directory

~~~
/home/username
  exe
  mnt
  tmp
  pp -> personal/projects/public
~~~

exe is in $PATH. mnt is for mounted directories, managed with a special script in my case.
this has the downside that recursive operations on the home directory might include other filesystems.

~~~
projects/versioned/repository-name
projects/unversioned/customer-name
~~~

# other

example directory names and structures for various other file types

~~~
audio
backup
documents
  editable
  uneditable
download
  other
other
  compressed
personal
  projects
    private
    public
    foreign
picture
tmp
text
  machine-readable
  plain
  programming
    {language-name}
video
~~~

# emacs configuration

~~~
.emacs
.emacs.d
  snippets
  elpa
  lisp
    lib
      color-theme
    mode
    local
      work.el
      home.el
      local.el -> work.el
    helpers.el
    mode.el
~~~

# about tagging filesystems

sometimes it makes sense to have the same file in multiple directories, if the directories correspond to overlapping categories. this can be the case with things like musical genres, where one album can belong to multiple genres. it also applies to accessing files by different facets, not just the ones that have been chosen for a path.

there are filesystems like tagsistant, but i have not looked into them in depth. filesystem paths for tagging filesystems do not map 1:1 to posix paths, so some applications might not work correctly. for example, when paths are tag filter queries and posix filesystem paths at the same time, a path that displays all files with tag-1 and tag-2 could look like "music/tag-1/tag-2" but could also be written "music/tag-2/tag-1". because of the number of possible combinations that come with this ambiguity, recursive directory search can become impractical.

another question is what happens if a file is copied, or if a program tries to create files with specific names (like hidden cache files when editors save) or tries to create symlinks. i do not know how current tag filesystems deal with this. maybe they are read-only and have a separate interface for file management.

# see also

[filesystem hierarchy standard](http://www.pathname.com/fhs/)