2024-09-04

scheme variable naming

identifiers

identifier vocabulary: certain identifier names for specific use-cases, used consistently. finding names for variables is not necessarily easy. careful, time-consuming consideration of a name may pay off in the long run, but it is expensive to repeat for every new name. the compiler might not care about the names, humans do

f proc procedure

some procedure or function

loop

for inner parts of a procedure that are called recursively
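
a minimal sketch of this convention using a named let. the procedure name "list-sum" and its parameter "numbers" are hypothetical and only serve the illustration

```scheme
; sums the elements of a list. "loop" names the inner recursive part
(define (list-sum numbers)
  (let loop ((rest numbers) (result 0))
    (if (null? rest)
      result
      (loop (cdr rest) (+ result (car rest))))))

(list-sum (list 1 2 3))
```

the reader can tell at a glance that "loop" is not a separate, reusable procedure but the repeating part of "list-sum"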

a b c d

for unspecified, in some way related bindings or arguments in obvious scope

(list-sort (l (a b) (< a b)) books)

one line of reasoning is that if one does not give something a descriptive name, then the next best thing is to at least give it a generic name that can be recognised as such
keeping alphabetical order when introducing those bindings can help to find the places where they are bound to values
also used for elements of a collection in expressions where it has mostly structural relevance, like in (map (l (a) (+ 1 a)) books)
- it needs to be clear that "a" is a "book". in this example the identifier for the collection "books" is near
- naming it explicitly "book" would make the naming of the inner structure dependent on the outer structure and would require renaming if the expression is copied or the outside changes. in this case, dependent names create more trouble than the explicit name is worth
- it shortens the code significantly without using other invented abbreviations and makes the presumably more important structure more visible
"item" is not necessarily a better word to use (see english language dictionary)

c

for a continuation procedure, in procedures that use continuation-passing style. that means they pass their results on to a procedure that determines the final return value. conflicts with the a b c d variables though. example:

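(define (integer-and-fraction number c)
  ; split off the integer part, then pass both parts on to c
  (let* ((integer (truncate number)) (fraction (- number integer)))
    (c integer fraction)))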

(integer-and-fraction 3.72 (l (integer fraction) #t))

{data-type}-{action}

example: list-sort, vector-ref
list procedures might be ok without a data-type prefix because lists are a very fundamental data-type in scheme

{namespace}-{name}

using a "namespace" name, for example the last part of a module name, for a set of procedures

rest

for everything that can be considered the rest of something. might be used as a suffix (l (a . a-rest) #t)
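
a small sketch of both uses, written with standard lambda syntax. "number-max" is a hypothetical name. "a-rest" holds every argument after "a", and "rest" is what remains to be processed in the loop

```scheme
; returns the largest of one or more numbers
(define (number-max a . a-rest)
  (let loop ((result a) (rest a-rest))
    (cond
      ((null? rest) result)
      ((> (car rest) result) (loop (car rest) (cdr rest)))
      (else (loop result (cdr rest))))))

(number-max 3 9 4)
```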

result

the result of a procedure or similar. usually in cases where things have to happen before the result is returned but after it has been created
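
one possible illustration, assuming a hypothetical global "counter" and a hypothetical procedure "counter-next!". the value to return is captured first, then the side effect happens, then the captured value is returned

```scheme
(define counter 0)

; returns the current counter value and increments the counter afterwards
(define (counter-next!)
  (let ((result counter))
    (set! counter (+ 1 counter))
    result))
```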

identifier infixes

-

for word separation
separate words should be infixed. sometimes it can be difficult to decide where to separate. for example is it filename or file-name?
the minus to separate words in identifiers is widely employed in scheme (vector-* list-* hashtable-*) and it has a consistency benefit to use it
no "camelcase" - camelcase takes 26 additional symbols as 26 different word delimiters to make identifiers less readable, more difficult to type (shift key and the pressing of simultaneous keys every time, minus does not need the shift key), more difficult to split (automatic analysis, editor functions) and introduces new possible complexity about how and when a character will be uppercased or not (capitalisation, abbreviations in identifiers)
regarding typing effort: an identifier usually has at most one or two delimiters, which may be 10 percent of all characters, so the difference in typing effort is of little practical relevance. code is read much more often than it is written, and writing is easier, faster and more correct if the result can be checked with less effort while writing
separate words should use a delimiter. words are harder to read when there is no visible separation, where the delimiter is basically the empty-string
negative example: r6rs introduced a new, competing naming scheme, for example with the flonum procedures that all use a "fl" prefix. the actual name of the procedure is significantly harder to discern. when reading through binding name listings and looking for procedures of interest, it is more difficult to separate the parts of the word and mask out the "fl" from flatan, flodd?, flfinite?, fl/, flabs and so on. i am sure the inventor finds enough reasons for himself not to agree and to believe that this is optimal. could the type have been called float, so that it would be float-zero? float-odd? - it is a data type like others. could the procedures have been named fl-atan fl-odd? fl-finite? or flo-atan flo-odd? flo-finite?

->

for "to"
for mapping/transformation/conversion/dictionaries or similar. {data-type}->{data-type}
parameters: a procedure that does a conversion should not repeat its source-argument data-type in the variable-name and should use an unspecific identifier
example: number->string :: a
example with redundant information: number->string :: number
variable names can create semantic dependencies to the type for example and this should be avoided where practical
to group by prefix differently: {data-type}-from-{data-type}
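
a short sketch of a conversion procedure that follows this pattern. "boolean->integer" is a hypothetical example. the parameter is the unspecific "a" because the procedure name already states the source type

```scheme
; converts a boolean to 1 or 0
(define (boolean->integer a)
  (if a 1 0))

(boolean->integer #t)
```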

/

alternative, or
example: port/path
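
the same idea sketched with a different pair of types. "digit-count" and its parameter "number/string" are hypothetical. the parameter name tells the caller that either a number or a string is accepted

```scheme
; counts the characters of a number's written form, or of a string as is
(define (digit-count number/string)
  (string-length
    (if (number? number/string)
      (number->string number/string)
      number/string)))

(digit-count 120)
```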

with, and, if, or

call-with-output-file
integer-and-fraction
false-if-exception
first-or-false
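
a possible definition for the last name in the list. the body is an assumption about what such a procedure would do, only the name is given above

```scheme
; the first element of a list, or #f if the list is empty
(define (first-or-false a)
  (if (null? a) #f (car a)))

(first-or-false (list 1 2))
```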

identifier suffixes

?

for procedures that evaluate to booleans (asking a question)
examples: list? equal? contains?
is- prefix for self-evaluating variables (the question is already answered)
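
a brief sketch of the difference, with the hypothetical names "list-empty?" and "is-empty"

```scheme
; a procedure that asks a question gets the "?" suffix
(define (list-empty? a) (null? a))

; a variable that holds the answer gets the "is-" prefix
(define is-empty (list-empty? (list)))
```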

-{integer}

enumerations of similar variables
example: index-1 index-2 index-3
for short names this seems better: x1 x2 x3

!

for destructive or otherwise side-effecting procedures where the warning character of it might make sense
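
a minimal sketch of a destructive procedure. "vector-clear!" is a hypothetical name. it follows the pattern of standard procedures like vector-set! and modifies its argument in place

```scheme
; sets every element of a vector to zero and returns the same vector
(define (vector-clear! a)
  (let loop ((index 0))
    (if (< index (vector-length a))
      (begin
        (vector-set! a index 0)
        (loop (+ 1 index)))))
  a)
```

after a call like (vector-clear! scores), the previous contents of "scores" are gone, which is what the "!" warns about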

-p

for procedures that take an extra predicate procedure
the rnrs procedures "remove" and "remp", where "remp" takes a predicate and is therefore more general, are an example
suffixing the "p" without minus may lead to weird words
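
a sketch of such a pair with the minus in place. "exclude" and "exclude-p" are hypothetical names chosen to avoid clashes with existing bindings

```scheme
; the "-p" variant takes a predicate procedure
(define (exclude-p predicate a)
  (let loop ((rest a) (result (list)))
    (cond
      ((null? rest) (reverse result))
      ((predicate (car rest)) (loop (cdr rest) result))
      (else (loop (cdr rest) (cons (car rest) result))))))

; the plain variant takes a value and is built on the "-p" variant
(define (exclude value a)
  (exclude-p (lambda (b) (equal? value b)) a))
```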

-s

for bindings explicitly marked as syntax.
for example syntax/macro alternatives to a procedure
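
one way this could look, with the hypothetical names "default" and "default-s". the syntax variant differs in that its second argument is only evaluated when needed

```scheme
; procedure: both arguments are always evaluated before the call
(define (default a b) (if a a b))

; syntax alternative: "b" is only evaluated if "a" is false
(define-syntax default-s
  (syntax-rules ()
    ((_ a b) (let ((value a)) (if value value b)))))
```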

filename suffix

.scm

principles

abbreviations by shortening words

word abbreviations built by cutting words short, without removing in-between characters
the long words are then easier to guess. more arbitrary abbreviations can be confusing: they are much like new, unrelated words that have to be learnt as such, they can be ambiguous, and they allow different authors to come up with many different variations
an alternative is shortening combined with vowel removal, especially for words that start with a consonant

examples

"abbrev" instead of "abvtn" for the word "abbreviation"
"ele" instead of "elmt" for the word "element"

one-letter identifiers

one-letter identifiers only if it is easy to figure out, almost obvious, what they mean. one possible rule is that the place where they are introduced should be in sight.
possibly avoid them where full words used instead would save documentation
not more than a limited amount of one-letter variables in one scope, maybe 4 or so. otherwise there is too much lookup required from the reader

advantages of one-letter identifiers

structure is more visible. therefore potentially less noise
significantly shorter line widths

disadvantages

do not state their meaning

possible ambiguity

example: e: exception? element? event? error?

they are more difficult to search

individual variable names may be used as pointers for quick in-editor navigation by search
quick navigation to specific variables may also be necessary when errors have occurred

they are more difficult to replace

usually just the pattern as is needs to be searched and replaced. with one-letter identifiers, all possible delimiters of a language, and the scope of the identifiers, must be considered when searching

comments to describe the variables

the meaning of variables should at least be stated in a comment if it is not obvious
include the meaning, or long names of short variable names, somewhere quick to navigate to and see
i have seen extensive use of one-letter variables in implementations of algorithms from academic sources, with links to elaborate papers and books beneath them, papers that were too long and convoluted to be parsed for the actual meaning of the variables. the code could just as well have used monotonically numbered identifiers, it would have been equally difficult to understand. it should not be necessary to buy and read every related book to figure out the meaning of variables, that is much too time-consuming and financially costly
at least state in the code what each variable stands for. this could be done with a comment like so: "d: difference, i: intersection"
the most important thing is that the information is available - avoid letting the reader guess
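
a sketch of such a comment in use. "list-compare" is a hypothetical procedure. the legend sits directly above the code that uses the one-letter names

```scheme
; d: difference, elements of a that are not in b
; i: intersection, elements of a that are also in b
(define (list-compare a b)
  (let loop ((rest a) (d (list)) (i (list)))
    (cond
      ((null? rest) (list (reverse d) (reverse i)))
      ((member (car rest) b) (loop (cdr rest) d (cons (car rest) i)))
      (else (loop (cdr rest) (cons (car rest) d) i)))))

(list-compare (list 1 2 3) (list 2 3 4))
```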

filename suffix

.scm

sc(he)m(e), built by vowel removal
follows the convention of how other extensions are created, by being some abbreviation of the program's or language's name

other suffixes advocated for or in use by others

some scheme implementations specify additional filename extensions for plain scheme code for slightly different purposes. purposes which, for the programmer thinking on the language level and not the implementation level, are, i claim, irrelevantly different
i do not think it is a good idea for the same format/syntax to take up 6 or 7 different filename suffixes. all their meanings have to be learned and dealt with by a language user, and conflicts with other file formats are bound to happen. in most existing usage, the only thing a filename suffix gives information about is the general content format: one suffix corresponds to one format, which is much simpler
for example for audio stored in the flac format it would be "somefile.flac" instead of "somefile.jazz", "somefile.pop" and so on.
the proposed extra suffixes make several suffixes correspond to one format for different intended usages, usage by the interpreter, not so much by the user. on the implementation level this is likely to lead to more stat calls when searching for code to import

negative examples

more examples here: http://larceny.ccs.neu.edu/doc/user-manual.chunked/ar01s05.html

.ss

"scheme source". how about .sf "scheme file", .sc "scheme code"
and in what situation would it be beneficial to add an extra s to extensions, especially if not done with every source filename extension in use, which would look for example like this: shs jss csss rbs cs
and it is vague, everything can be a source of something else

.sls

the same about the dedicated "source" letter applies
why would we need an extra suffix stating it being a library? to have a non-library and library with the same name but different suffix in the same directory? is there any imaginable benefit in using "mycode.sls" and "mycode.ss" in the same directory that pays off more than the complication costs? i have since learned that, when defining libraries, code can be plainly included instead of imported as a module, and there this could have some use
i would be more accepting if there were restrictions on the name before the extension. it is still scheme content and the suffix should tell about the format/type. and leave the user the freedom to change contents without having to rename the suffix. sometimes non-libraries become libraries, files that use import become files that do not use import

.s

already widely used for other formats

specifically excluded naming styles

figurative speech

let us not speak fantasy languages of vagueness and arbitrary personal associations that can only make sense to the author or small groups of insiders
better to avoid creating bindings like "fairies that hop onto trains" for adding elements to a list, or "creating steam-boats on a river towards the evening sun" for sending data packets on a socket to the client. "(define boat (create-steam-boat 3)) (send-towards-sun (create-sun (quote evening)) boat)". i have seen such things in real-life code. on one hand, it is like reading a fantasy book that only makes sense if read from start to finish, on the other hand it needlessly obscures and creates a requirement for unnecessary knowledge

prefixes for data types

in a dynamically typed language, the type can change at any time and that should be no problem. a type prefix burdens the developer with repeated variable-name refactoring when trying things out, modifying the code or converting types. it is incredibly noisy syntactically. it is also not trustworthy and, like comments, quickly outdated and misleading

prefix for local variables

syntactic noise without a good reason. in this scenario you want "variable" but have to write "$variable" for example. one benefit of this is that routines can have the same name as variables, but count the years until you find use for that. in some languages the prefix is intended for replacement in strings
i do not see any net benefit in marking local variables - the variables most needed in context

underscore

the minus is usually enough

camelcase

avoid the camelcase craze. code that uses camelcase requires more documentation because the identifiers are not well readable and groupable based on their parts. systematic prefixing for identifier groups becomes nearly impractical. it requires upper-case characters and adds complexity and new rules concerning when something has to be uppercase and when not. sub-words begin with uppercase characters depending on context: only if they are in the middle or at the end of the identifier. two different patterns for the same word are two patterns to recognise for the same goal. the shift key is also usually not as easy to reach on keyboards as the minus