2023-02-27

file organisation

examples for large file collections

general observations

a directory name can be considered to represent a category with entries belonging to that category. directories can also be seen as graph nodes or even relation labels

when there is, for example, a directory "text-files", it is usually simpler to have it contain only text files, so that the directory can act as a placeholder or entry point to files of a homogeneous type, and not a mix of types that might at some point require the extra step of differentiation

the traditional filesystem can be used to build hierarchies of directories (nodes) and regular files (always leaf nodes). the root path is superior to all others. usually, circular relationships are not allowed, and neither is more than one edge between two nodes (hierarchy)

filesystem paths can contain information. this example path to an audio file "artist/artist.album/artist.album.1.flac" contains redundant information. it could be reduced to "artist/album/1.flac". the downside to this is that the parent directories have to be considered to get all the information
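a minimal sketch of that downside: with the reduced path, artist and album have to be recovered from the parent directory names. the function name and the field names are assumptions made for illustration

```python
from pathlib import PurePosixPath


def audio_path_info(path):
    """Recover artist, album and track from a path like "artist/album/1.flac"."""
    p = PurePosixPath(path)
    # the last three components are assumed to be artist, album and file name
    artist, album = p.parts[-3], p.parts[-2]
    return {"artist": artist, "album": album, "track": p.stem}


print(audio_path_info("artist/album/1.flac"))
```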

rating

sorting files by quality or another rating can be implemented with directories that have numbers as names. the files and paths inside are considered to have that number as their rating. for example, "movies/1/good-movie.mkv", "movies/2/less-good-movie.mkv". this works well for dividing a set of files that are frequently accessed by some importance characteristic, for example frequently listened to music, or good and less good movies, images and so on

nesting numeric directories can be confusing. the difference between nested ratings like 1/3 and 2/1 becomes less clear and less useful with each level of nesting, similar to decimal places. for unrated files, a directory named 0 has the advantage that it is also numeric

special commands in file managers, for example custom right-click commands, can be particularly helpful for managing ratings. one example is a command that changes the rating while keeping the relative path, so that 1/x/y/z becomes 2/x/y/z
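the path rewriting behind such a rating command could be sketched as follows. this only computes the new path. the function name and the assumption that the rating is the first directory below a given root are mine

```python
from pathlib import PurePosixPath


def change_rating(root, path, new_rating):
    """Return path with its rating directory (first component below root) replaced."""
    relative = PurePosixPath(path).relative_to(root)
    rating, *rest = relative.parts
    if not rating.isdigit():
        raise ValueError(f"not a rating directory: {rating}")
    return str(PurePosixPath(root, str(new_rating), *rest))


print(change_rating("movies", "movies/1/x/y/z", 2))  # movies/2/x/y/z
```

the actual move could then be done with os.makedirs for the new parent directory and shutil.move for the file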

see also rate

automatic file sorting

files can be sorted into directories automatically based on characteristics like mime type, file name, video resolution, image colors and more
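a minimal sketch for the mime type case, using the file name extension only. the directory names ("text", "image", "other" and so on) are an assumption, and nothing is moved here, only the target path is computed

```python
import mimetypes
from pathlib import PurePosixPath


def target_directory(file_name):
    """Choose a directory name from the guessed mime type of a file name."""
    mime_type, _encoding = mimetypes.guess_type(file_name)
    if mime_type is None:
        return "other"
    # use the top-level type, for example "audio" from "audio/flac"
    return mime_type.split("/")[0]


for name in ["a.txt", "b.png", "c.unknown-extension"]:
    print(name, "->", str(PurePosixPath(target_directory(name), name)))
```

sorting by video resolution or image colors would need the file contents to be read, which is not shown here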

example for music files

this still works well for me for a music collection of 30000+ files

rating > instrument-class > loudness/rhythm > artist > release-year.album > track-number.title
0 1 2 3 4
  electronic guitar orchestra jazz piano
    beat calm noisy other
      artist
        release-year.album other
          track-number.title
        various-artists
          release-year.album
            track-number.artist.title
music/1/electronic/calm/murcof/2007.cosmos/murcof.cosmos.03 cosmos i.flac
1
  electronic
    beat
    calm
      murcof
        2002.martes
        2007.cosmos
      vangelis
        "opera sauvage"
    other
  guitar
  jazz
  orchestra
  piano
2
  0
    unrated-album ...
  1
    electronic
    guitar
    jazz
  • instrument-class describes the general texture: music with guitar and band usually sounds fundamentally different from orchestral music or electronic music
  • loudness/rhythm describes how "driving" the piece of music is: music with a beat usually has a different effect on the listener than ambient droning
  • the aforementioned characteristics could also be used for automatic classification
  • things like audiobooks and comedy albums i put into a different parent directory "non-music"
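as a sketch of how the information in such a path can be read back, the following splits a path into the levels of the scheme above. the field names, the "music" root and the handling of album directories without a release year are assumptions

```python
from pathlib import PurePosixPath

FIELDS = ("rating", "instrument_class", "loudness_rhythm", "artist", "album", "track")


def parse_music_path(path, root="music"):
    """Map the path components below root to the levels of the scheme."""
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} levels below {root}: {path}")
    info = dict(zip(FIELDS, parts))
    info["rating"] = int(info["rating"])
    # album directories are assumed to be named release-year.album;
    # directories without a numeric prefix keep their full name
    year, separator, album = info["album"].partition(".")
    if separator and year.isdigit():
        info["release_year"], info["album"] = int(year), album
    else:
        info["release_year"] = None
    return info


print(parse_music_path(
    "music/1/electronic/calm/murcof/2007.cosmos/murcof.cosmos.03 cosmos i.flac"))
```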

example for video files

class > rating > title > season > episode
tv-show/1/futurama/s05/2.mkv
movie
  0
  1
  2
tv-show
  1
    curb your enthusiasm
    futurama
      s01
      s02
      s03
      s04
      s05
        1.mkv 2.mkv 3.mkv 4.mkv 5.mkv 6.mkv
    monkey dust
  2
other
  clips
  standup

alternatively, a naming scheme common on the internet is s05e01.mkv, where s stands for season and e for episode
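the two schemes can be converted into each other. a minimal sketch for one direction, assuming a season directory named like "s05" and an episode file named by its number. the function name is hypothetical

```python
import re
from pathlib import PurePosixPath


def episode_name(path):
    """Build a name like "s05e02.mkv" from a path ending in "s05/2.mkv"."""
    p = PurePosixPath(path)
    season = re.fullmatch(r"s(\d+)", p.parent.name)
    if season is None or not p.stem.isdigit():
        raise ValueError(f"unexpected path layout: {path}")
    return f"s{int(season.group(1)):02d}e{int(p.stem):02d}{p.suffix}"


print(episode_name("tv-show/1/futurama/s05/2.mkv"))  # s05e02.mkv
```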

automatic classification

  • resolution: when encoding videos, there is some incentive not to make a higher resolution video look like a lower resolution one, but this is not always the case. even so, in practice, resolution still seems to be the most dependable quality aspect
  • duration: for example, 30 second clips versus 120 minute feature films

framerate, bitrate and a rating for the codec could also be used, but it seems difficult to create a meaningful, standard rating from these
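a sketch of a classification by resolution and duration. the thresholds and class names are arbitrary assumptions, not a standard, and reading height and duration from the file with a media tool is not shown

```python
def classify_video(height, duration_seconds):
    """Return (resolution_class, duration_class) for a video.

    The thresholds are arbitrary assumptions, not a standard.
    """
    if height >= 1080:
        resolution_class = "high"
    elif height >= 720:
        resolution_class = "medium"
    else:
        resolution_class = "low"
    # anything under ten minutes is treated as a clip here
    duration_class = "clip" if duration_seconds < 600 else "feature"
    return resolution_class, duration_class


print(classify_video(1080, 120 * 60))  # ('high', 'feature')
print(classify_video(480, 30))         # ('low', 'clip')
```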

programming projects

exe license other readme.md source tmp
{project-name}
  exe
    compile
    install
    test
  other
  source
    {language-name}
    sc
      derivatives
      foreign
      main
      test
    scheme
      sph
      test
        sph
  modules
    sph
    test
      sph
  submodules
    {repository-name}
  tmp
    lib.so
  license
  readme.md
  • "exe" is for executable files - files that have the executable bit set. traditionally something like "bin" (from /usr/bin, binary files) or "scripts" tends to be used
  • "readme" and "license" do not have to be uppercased
  • submodules can be kept separate from source and copied or linked in during a compile step. this way, a checked-out submodule directory does not have to be excluded when selecting files from "source"
  • files not maintained in this project should be marked as such, like in this example with a "foreign" directory. otherwise it is difficult to differentiate what belongs to the project and what can be changed without synchronisation issues
  • for projects that only contain modules or libraries of a single type, for example modules of a scheme project, a single "modules" directory with the module files directly underneath seems most appropriate

home directory

/home/username
  exe
  mnt
  tmp
  pp -> personal/projects/public

exe is in $PATH. mnt for mounted directories, with a special script in my case. this has the downside that recursive operations on the home directory might include other filesystems
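one way around that downside is to make recursive operations stay on one filesystem, similar to the -xdev option of find. a minimal sketch that compares device numbers. the function name is an assumption

```python
import os


def walk_one_filesystem(top):
    """Yield file paths below top without descending into other filesystems."""
    top_device = os.stat(top).st_dev
    for directory, subdirectories, files in os.walk(top):
        # pruning the list in place stops os.walk from descending further
        subdirectories[:] = [
            name for name in subdirectories
            if os.lstat(os.path.join(directory, name)).st_dev == top_device
        ]
        for name in files:
            yield os.path.join(directory, name)
```

called on the home directory, this would skip anything mounted under mnt, as long as the mounts are on a different device than the home directory itself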

projects/versioned/repository-name
projects/unversioned/customer-name

other

example directory names and structures for various other file types

audio
backup
documents
  editable
  uneditable
download
other
  other
  compressed
personal
  projects
    private
    public
  foreign
picture
temp
text
  machine-readable
  plain
  programming
    {language-name}
video

emacs configuration

.emacs
.emacs.d
  snippets
  elpa
  lisp
    lib
      color-theme
      mode
    local
      work.el
      home.el
    local.el -> work.el
    helpers.el
    mode.el

about tagging filesystems

sometimes it makes sense to have the same file in multiple directories, if the directories correspond to overlapping categories. this can be the case with musical genres, for example, where one album can belong to multiple genres. it is also useful for accessing files by different facets, not just the ones that have been chosen for a path

there are tagging filesystems like tagsistant, but i have not looked into them in depth. paths in tagging filesystems do not map 1:1 to posix paths, so some applications might not work correctly. for example, when paths are tag filter queries and posix filesystem paths at the same time, a path that displays all files with tag-1 and tag-2 could look like "music/tag-1/tag-2" but could also be written "music/tag-2/tag-1". because of the number of possible combinations that this ambiguity creates, recursive directory search can become impractical

another question is what happens when a file is copied, or when a program tries to create files with specific names (like hidden cache files when editors save) or tries to create symlinks. i do not know how current tag filesystems deal with this. maybe they are read-only and have a separate interface for file management
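the ambiguity can be illustrated with a small sketch. it lists all paths that select the same set of tags and picks one canonical form by sorting. this is only an illustration of the counting problem, not how tagsistant or any other tagging filesystem works, and the function names are hypothetical

```python
from itertools import permutations
from pathlib import PurePosixPath


def equivalent_tag_paths(root, tags):
    """List every path that selects the same set of tags."""
    return [str(PurePosixPath(root, *order)) for order in permutations(sorted(tags))]


def canonical_tag_path(root, tags):
    """Pick one representative path by sorting the tags."""
    return str(PurePosixPath(root, *sorted(set(tags))))


print(equivalent_tag_paths("music", ["tag-1", "tag-2"]))
# ['music/tag-1/tag-2', 'music/tag-2/tag-1']
print(len(equivalent_tag_paths("music", ["a", "b", "c", "d", "e"])))  # 120
print(canonical_tag_path("music", ["tag-2", "tag-1"]))  # music/tag-1/tag-2
```

the number of equivalent paths grows with the factorial of the number of tags, which is why a recursive search that visits every path quickly becomes impractical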

see also

filesystem hierarchy standard