2018-10-29

file organisation

examples for large file collections

general observations

a directory name can be considered to represent a category with entries belonging to that category. directories can also be seen as graph nodes or even relation labels

when there is, for example, a directory "text-files", it is usually simpler to have it contain only text files. the directory can then act as an entry point to files of one homogeneous type, instead of a mix of types that might at some point require an extra differentiation step

the traditional filesystem can be used to build hierarchies of directories (nodes) and other files (always leaf nodes). there is the root path that is superior to all others, usually no circular relationships and not more than one edge between two nodes

filesystem paths can carry information. for example, an audio file path like "artist/artist.album/artist.album.1.flac" contains redundant information and could be simplified to "artist/album/1.flac". the downside is that the parent directories have to be considered to get the full information
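
a minimal python sketch of that trade-off, assuming the three-level layout "artist/album/1.flac" from the example. the function names are hypothetical. the short path only yields the full information once the parent directories are read:

```python
from pathlib import PurePosixPath


def describe_track(path: str) -> dict:
    """recover artist, album and track from a path like 'artist/album/1.flac'."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected artist/album/file, got {path!r}")
    # the last three components carry all the information
    artist, album, file_name = parts[-3:]
    return {"artist": artist, "album": album, "track": PurePosixPath(file_name).stem}


def redundant_name(path: str) -> str:
    """build the self-describing file name 'artist.album.1.flac' from the short path."""
    info = describe_track(path)
    suffix = PurePosixPath(path).suffix
    return f"{info['artist']}.{info['album']}.{info['track']}{suffix}"
```

the redundant form is what a file would need if it were moved out of its directories, for example when copied to a flat folder on a music player.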

rating

a rating of files can be implemented with directories named by numbers, where the contained files have a rating of that number. for example "movies/1/good-movie.mkv", "movies/2/less-good-movie.mkv". this works well for dividing a set of files that are frequently accessed by some importance characteristic, for example frequently listened to music, or good and less good movies/images/etc

nesting numeric directories can be confusing. nested ratings are similar to decimal places (1/3 and 2/1 read much like 1.3 and 2.1), and the difference between them becomes less clear with each level of nesting. for unrated files, a directory named 0 has the advantage that it is numeric like the other ratings

special commands in file managers, for example custom right-click commands, can be particularly useful for rating. one example is a command that changes the rating while keeping the relative path, so that 1/x/y/z becomes 2/x/y/z
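
a sketch of such a rating command in python, assuming the rating is the first component of a path relative to a collection root like "movies". the names `rerated_path` and `rerate` are hypothetical, and a file manager would call something like this from its custom command:

```python
import shutil
from pathlib import Path, PurePosixPath


def rerated_path(relative: str, new_rating: int) -> str:
    """replace the leading rating directory: '1/x/y/z' -> '2/x/y/z'."""
    old_rating, *rest = PurePosixPath(relative).parts
    if not old_rating.isdigit() or not rest:
        raise ValueError(f"expected rating/path, got {relative!r}")
    return str(PurePosixPath(str(new_rating), *rest))


def rerate(root: Path, relative: str, new_rating: int) -> Path:
    """move root/<old-rating>/<rest> to root/<new-rating>/<rest>."""
    source = root / relative
    target = root / rerated_path(relative, new_rating)
    if target.exists():
        # refuse to overwrite a file that is already rated differently
        raise FileExistsError(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

for example, `rerate(Path("movies"), "1/x/y/z", 2)` would move the file to "movies/2/x/y/z" and create the intermediate directories if needed.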

see also rate

automatic file sorting

files could be sorted into directories automatically based on characteristics like mime type, file name, video resolution, image colors and more
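
a minimal sketch of sorting by mime type with the python standard library. the type is guessed from the file name only, and the mapping to directory names like "picture" is an assumption chosen for illustration:

```python
import mimetypes
import shutil
from pathlib import Path

# hypothetical mapping from the major mime type to a directory name
DIRECTORY_BY_MAJOR_TYPE = {
    "text": "text",
    "image": "picture",
    "audio": "audio",
    "video": "video",
}


def category_for(file_name: str) -> str:
    """choose a directory name from the guessed mime type, or 'other'."""
    mime_type, _encoding = mimetypes.guess_type(file_name)
    if mime_type is None:
        return "other"
    major_type = mime_type.split("/")[0]
    return DIRECTORY_BY_MAJOR_TYPE.get(major_type, "other")


def sort_into_categories(source: Path, target: Path) -> None:
    """move every regular file directly inside source to target/<category>/."""
    for entry in sorted(source.iterdir()):
        if entry.is_file():
            destination = target / category_for(entry.name)
            destination.mkdir(parents=True, exist_ok=True)
            shutil.move(str(entry), str(destination / entry.name))
```

characteristics like video resolution or image colors would need to read the file contents, which this sketch does not do.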

music files

class > rating > instrument-class > loudness/rhythm > artist > album-release-date.album > track-number.track-title
music non-music
  0 1 2 3 4
    electronic guitar orchestra jazz piano
      beat calm noisy other
        artist-name
          release-date.album-name other
            track-number.track-name
          various-artists
            album-name.release-date
              track-number.artist-name.track-name
music/1/electronic/calm/murcof/2007.cosmos/03.cosmos i.flac
1
  electronic
    beat
    calm
      murcof
        2002.martes
        2007.cosmos
      vangelis
        "opera sauvage"
    other
  guitar
  jazz
  orchestra
  piano
2
  0
    unrated-album ...
  1
    electronic
    guitar
    jazz
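
because every level of this hierarchy has a fixed meaning, a path can be split back into fields. a python sketch, assuming exactly seven path components and file names of the form track-number.track-name. the `Track` type and the field names are hypothetical:

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class Track(NamedTuple):
    file_class: str
    rating: int
    instrument_class: str
    mood: str  # the loudness/rhythm level: beat, calm, noisy, other
    artist: str
    release_date: str
    album: str
    track_number: int
    title: str


def parse_track_path(path: str) -> Track:
    """split class/rating/instrument/mood/artist/date.album/number.title.ext."""
    parts = PurePosixPath(path).parts
    if len(parts) != 7:
        raise ValueError(f"expected 7 path components, got {len(parts)}")
    file_class, rating, instrument_class, mood, artist, album_dir, file_name = parts
    # only the first dot separates the fields, later dots stay in the name
    release_date, _, album = album_dir.partition(".")
    track_number, _, title = PurePosixPath(file_name).stem.partition(".")
    return Track(file_class, int(rating), instrument_class, mood, artist,
                 release_date, album, int(track_number), title)
```

the "various-artists" branch uses a different order of fields and would need its own parser.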

automatic classification

  • general texture by instruments used (jazz band, classical orchestra, synthesizer, etc)
  • pace, loudness or rhythm (calm ambient track, driving beat or chaotic noise)

video files

class > rating > title > season > episode
tv-show/1/futurama/s05/2.mkv
movie
  0
  1
  2
tv-show
  1
    curb your enthusiasm
    futurama
      s01
      s02
      s03
      s04
      s05
        1.mkv 2.mkv 3.mkv 4.mkv 5.mkv 6.mkv
    monkey dust
  2
other
  clips
  standup

it is also common to name the files like s05e01.mkv, where s stands for season and e for episode
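
the two conventions can be converted into each other. a python sketch, assuming season directories named like "s05" and episode files named by number only. the function names are hypothetical:

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

EPISODE_PATTERN = re.compile(r"s(\d+)e(\d+)", re.IGNORECASE)


def parse_episode_name(file_name: str) -> Tuple[int, int]:
    """extract (season, episode) from a name like 's05e01.mkv'."""
    match = EPISODE_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no season/episode marker in {file_name!r}")
    return int(match.group(1)), int(match.group(2))


def episode_name_from_path(path: str) -> str:
    """turn '.../s05/2.mkv' into the flat name 's05e02.mkv'."""
    file_path = PurePosixPath(path)
    # the season comes from the parent directory, the episode from the file
    season = int(file_path.parent.name.lstrip("sS"))
    episode = int(file_path.stem)
    return f"s{season:02d}e{episode:02d}{file_path.suffix}"
```

the flat name keeps the season information when a file is copied out of its directory, at the cost of repeating it in every file name.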

automatic classification

  • resolution: usually the most meaningful quality factor. when encoding videos, there is an incentive to make a higher-resolution video not look like a lower-resolution one, so resolution tends to track visual quality
  • duration: 30 second clip or 120 minute feature film

framerate, bitrate and a rating for the codec could also be used, but it seems difficult to derive a meaningful and standard rating from these
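
a sketch of how resolution and duration could be turned into directory names in python. the thresholds and class names are assumptions chosen for illustration, and reading width, height and duration from a file (for example with a tool like ffprobe) is not shown:

```python
def resolution_class(width: int, height: int) -> str:
    """bucket a video by frame size. either dimension can qualify, so that
    cropped widescreen encodes are not put into a lower class."""
    if width >= 3840 or height >= 2160:
        return "2160p"
    if width >= 1920 or height >= 1080:
        return "1080p"
    if width >= 1280 or height >= 720:
        return "720p"
    return "sd"


def duration_class(seconds: float) -> str:
    """bucket a video by running time. the limits are assumed values."""
    if seconds < 10 * 60:
        return "clip"
    if seconds < 60 * 60:
        return "episode"
    return "feature"
```

with these assumed limits, the 30 second clip from the list above is classified as "clip" and the 120 minute film as "feature".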

other

audio
backup
documents
  editable
  uneditable
download
other
  other
  compressed
personal
  authored
    projects
      private
      public
  foreign
picture
temp
text
  machine-readable
  plain
  programming
    {language-name}
video

programming projects

exe license other readme.md source temp
{project-name}
  exe
    compile
    install
    test
  other
  source
    {language-name}
    sc
      derivatives
      foreign
      main
      test
    scheme
      sph
      test
        sph
  modules
    sph
    test
      sph
  submodules
    {repository-name}
  tmp
    lib.so
  license
  readme.md
  • "exe" is for executable files - files that have the executable bit set. traditionally something like "bin" (from /usr/bin, binary files) or "scripts" tends to be used
  • "readme" and "license" do not have to be uppercased
  • submodules can be kept separate from "source" and copied or linked in during a compile step. that way a checked-out submodule directory does not have to be excluded when selecting files from "source"
  • files not maintained in this project should be marked as such, like in this example with a "foreign" directory. otherwise it is difficult to differentiate what belongs to the project and what can be changed without synchronisation issues
  • for projects that only contain modules or libraries of a single type, for example modules of a scheme project, a single "modules" directory with the module files directly underneath seems most appropriate

home directory

/home/username
  .exe
  .config
  mnt
  tmp
  pp -> personal/authored/projects/public

"mnt" is for mounted directories. this has the downside that recursive operations on the home directory might include other filesystems
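
one way to work around that downside is to compare device numbers while walking. a python sketch, where the function name is hypothetical and any directory on a different device than the starting directory is skipped:

```python
import os
from typing import Iterator


def walk_same_filesystem(top: str) -> Iterator[str]:
    """yield file paths below top without descending into other filesystems."""
    top_device = os.lstat(top).st_dev
    for directory, subdirectories, file_names in os.walk(top):
        # prune in place: os.walk does not descend into removed entries
        subdirectories[:] = [
            name for name in subdirectories
            if os.lstat(os.path.join(directory, name)).st_dev == top_device
        ]
        for name in file_names:
            yield os.path.join(directory, name)
```

the mount point directories themselves are skipped as well, because a mounted directory reports the device of the mounted filesystem.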

projects/versioned/repository-name
projects/unversioned/customer-name

emacs configuration

.emacs
.emacs.d
  snippets
  elpa
  lisp
    lib
      color-theme
      mode
    local
      work.el
      home.el
    local.el -> work.el
    helpers.el
    mode.el

note about tagging filesystems

sometimes it makes sense to have the same file in multiple directories, when directories correspond to overlapping categories. musical genres are an example, since one album can belong to multiple genres. it is also useful for accessing files by different facets, not just the ones that were chosen for the path

there are tagging filesystems like tagsistant, which i have not looked into in depth. paths in tagging filesystems do not map 1:1 to posix paths, so some applications might not work correctly. for example, when paths are tag filter queries as well as posix filesystem paths, a path that lists all files with tag-1 and tag-2 could be written "music/tag-1/tag-2" but also "music/tag-2/tag-1". because of the number of possible combinations created by this ambiguity, recursive directory search can become impractical

another question is what happens when a file is copied, when a program tries to create files with specific names (like hidden cache files when editors save), or when it tries to create symlinks. i do not know how current tag filesystems deal with this. maybe they are read-only and have a separate interface for file management
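
the ambiguity can be illustrated in python. this is a sketch under the assumption that a path is a prefix followed by tag names, and it does not describe how tagsistant or any other tagging filesystem works. the function names are hypothetical:

```python
from itertools import permutations
from typing import Iterable, List


def equivalent_tag_paths(prefix: str, tags: Iterable[str]) -> List[str]:
    """list every path that selects the same set of tags."""
    return ["/".join((prefix, *order)) for order in permutations(sorted(tags))]


def canonical_tag_path(path: str) -> str:
    """pick one representative by sorting the tags after the prefix."""
    prefix, *tags = path.split("/")
    return "/".join([prefix, *sorted(set(tags))])
```

the number of equivalent paths is the factorial of the number of tags, so four tags already give 24 paths for the same set of files. a recursive search would have to recognise them as equivalent, for example by comparing canonical paths.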

see also

filesystem hierarchy standard