2018-10-29

scheme naming

identifiers

an identifier vocabulary: certain identifier names for specific use-cases, used consistently. finding names for variables is not necessarily easy, and lengthy deliberation about a name may be appropriate in the long run but is expensive to repeat every time. the compiler might not care about the names, but humans do

f proc procedure

some procedure or function

loop

for inner parts of a procedure that are called recursively
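
a minimal sketch of the "loop" convention, assuming a named let is used for the recursive inner part. digit-sum is a hypothetical example procedure and not taken from any library:

```scheme
; sums the decimal digits of a non-negative integer.
; "loop" names the inner part that calls itself
(define (digit-sum a)
  (let loop ((b a) (sum 0))
    (if (= 0 b) sum
      (loop (quotient b 10) (+ sum (remainder b 10))))))

(digit-sum 1234)
```

the last expression evaluates to 10.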

a b c d

for unspecified, in some way related bindings or arguments in obvious scope

(list-sort (l (a b) (< a b)) books)
  • one line of reasoning is that if one does not give something a descriptive name, then the next best thing is to at least give it a generic name that can be recognised as such
  • keeping alphabetical order when introducing those bindings can help to find the places where they are bound to values
  • also used for elements of a collection in expressions where it has mostly structural relevance, like in (map (l (a) (+ 1 a)) books)

    • it needs to be clear that "a" is a "book". in this example the identifier for the collection "books" is near
    • naming it explicitly "book" would make the naming of the inner structure dependent on the outer structure and require renaming if it is copied or the outside changes. in this case, dependent names create more trouble than the explicit name is worth
    • it shortens the code significantly without using other invented abbreviations and makes the supposedly more important structure more visible
  • "item" is not necessarily a better word to use (see english language dictionary)

c

for a continuation procedure, in procedures that use continuation-passing style. that means they pass their results on to a procedure that determines the final return value. example:

(define (integer-and-fraction number c)
  ; ...calculations...
  (c integer fraction))

(integer-and-fraction 3.72 (l (integer fraction) #t))
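
one way the elided calculations could be filled in, as a sketch only. it assumes that "l" in the example above is shorthand for lambda, which is written out here, and that the integer part is obtained by truncation. 3.75 is used instead of 3.72 because its fraction is exactly representable as a floating point number:

```scheme
; splits a number into its integer part and the remaining fraction
; and passes both to the continuation procedure "c"
(define (integer-and-fraction number c)
  (let* ((integer (truncate number))
         (fraction (- number integer)))
    (c integer fraction)))

; the continuation decides what the final return value is
(integer-and-fraction 3.75 (lambda (integer fraction) (list integer fraction)))
```

the call evaluates to the list (3.0 0.75). with a different continuation the same procedure could return only the fraction, or a boolean as in the example above.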

{data-type}-{action}

  • example: list-sort, vector-ref
  • list procedures might be ok without a data-type prefix because lists are a very fundamental data-type in scheme

{namespace}-{name}

using a "namespace" name, for example the last part of a module name, for a set of procedures
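
a small sketch of how such a prefix might look. the procedures are hypothetical and assume a module whose name ends in "stack", with a stack represented as a plain list:

```scheme
; hypothetical procedures of a module whose name ends in "stack".
; every exported name starts with the namespace name
(define (stack-new) '())
(define (stack-push a value) (cons value a))
(define (stack-top a) (car a))
(define (stack-pop a) (cdr a))
(define (stack-empty? a) (null? a))

(stack-top (stack-push (stack-push (stack-new) 1) 2))
```

the last expression evaluates to 2. the prefix groups the procedures in listings and makes their origin visible at the call site.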

rest

for everything that can be considered the rest of something. might be used as a suffix. example: (l (e . e-rest) #t)
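
a sketch that uses "rest" both as a suffix and as a standalone name. the procedure name "largest" is a hypothetical choice for illustration:

```scheme
; returns the largest of one or more numbers.
; "a-rest" holds every argument after the first,
; "rest" the numbers not yet compared
(define (largest a . a-rest)
  (let loop ((a a) (rest a-rest))
    (cond ((null? rest) a)
          ((> (car rest) a) (loop (car rest) (cdr rest)))
          (else (loop a (cdr rest))))))

(largest 3 9 4)
```

the call evaluates to 9.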

result

the result of a procedure or similar. usually in cases where things have to happen before the result is returned but after it has been created
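
a sketch of the typical situation, where the result is created first and other things happen before it is returned. vector-squares is a hypothetical example procedure:

```scheme
; creates a vector of the first "count" square numbers.
; "result" is created first, filled afterwards and returned last
(define (vector-squares count)
  (let ((result (make-vector count 0)))
    (let loop ((index 0))
      (if (< index count)
        (begin
          (vector-set! result index (* index index))
          (loop (+ 1 index)))
        result))))

(vector-squares 4)
```

the call evaluates to the vector #(0 1 4 9).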

identifier infixes

-

  • for word separation
  • separate words should be infixed. sometimes it can be difficult to decide where to separate. for example, is it filename or file-name?
  • the minus to separate words in identifiers is widely employed in scheme (vector-* list-* hashtable-*) and using it has a consistency benefit
  • no "camelcase" - camelcase takes 26 additional symbols as 26 different word delimiters and makes identifiers less readable, more difficult to type (the shift key has to be pressed together with another key every time, the minus does not need the shift key) and more difficult to split (automatic analysis, editor functions). it also introduces new possible complexity about how and when a character is uppercased (capitalisation, abbreviations in identifiers)
  • regarding typing effort: an identifier usually has at most one or two delimiters, which may be 10 percent of all characters, so the difference in keystrokes is of no practical relevance. code is much more often read than written, and writing is easier, faster and more correct if it can be checked with less effort while writing
  • separate words should use a visible delimiter, so no empty-string delimiter. words are much harder to read when there is no visible separation
  • negative example: r6rs introduced a new, competing naming scheme. for example with the flonum procedures that all use a "fl" prefix. the actual name of the procedure is significantly harder to discern - reading through the binding name listings, looking for procedures of interest, it is more difficult to discern the parts of the word and mask out the "fl" from flatan, flodd? flfinite? fl/ flabs and so on, to get the needed information. i am sure the inventor finds enough reasons for himself not to agree and to believe that this is optimal. could the type have been called float? so that it would be float-zero? float-odd? - it is a data type like others. could procedures have been named fl-atan fl-odd? fl-finite? or flo-atan flo-odd? flo-finite?
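
to illustrate the point about automatic splitting: with a single delimiter character, splitting an identifier needs one comparison per character and no rules about capitalisation. a sketch using only standard procedures, where identifier-words is a hypothetical name:

```scheme
; splits an identifier name into its words at every minus.
; "word" collects the characters of the current word in reverse order
(define (identifier-words a)
  (let loop ((rest (string->list a)) (word '()) (result '()))
    (cond
      ((null? rest)
        (reverse (cons (list->string (reverse word)) result)))
      ((char=? #\- (car rest))
        (loop (cdr rest) '() (cons (list->string (reverse word)) result)))
      (else (loop (cdr rest) (cons (car rest) word) result)))))

(identifier-words "call-with-output-file")
```

the call evaluates to the list ("call" "with" "output" "file"). a camelcase equivalent would additionally have to decide how to treat runs of uppercase characters, as in abbreviations.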

->

  • for "to"
  • for mapping/transformation/conversion/dictionaries or similar
  • parameters: a procedure that does a conversion should not repeat its source-argument data-type in the variable-name and should use an unspecific identifier
  • example: number->string :: a
  • example with redundant information: number->string :: number
  • variable names can create semantic dependencies, for example on the type, and this should be avoided where practical
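
a short sketch of the parameter rule. boolean->integer is a hypothetical conversion procedure:

```scheme
; the parameter is named "a" and not "boolean",
; because the source type is already stated in the procedure name
(define (boolean->integer a) (if a 1 0))

(boolean->integer #t)
```

the call evaluates to 1. if the procedure is later copied and adapted for another source type, the parameter does not need to be renamed.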

/

  • alternative, or
  • example: port/path

with, and, if, or

  • call-with-output-file
  • integer-and-fraction
  • false-if-exception
  • first-or-false
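
as a sketch, first-or-false from the list above might be defined like this. the definition is an assumption derived from the name:

```scheme
; returns the first element of a list, or false if the list is empty
(define (first-or-false a) (if (null? a) #f (car a)))

(first-or-false (list 1 2 3))
(first-or-false (list))
```

the first call evaluates to 1, the second to #f. the infix words let the name be read almost like a sentence describing the behaviour.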

identifier suffixes

?

  • for procedures that evaluate to booleans (asking a question)
  • examples: list? equal? contains?
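
a sketch of how contains? might be defined for lists. the definition and the argument order are assumptions for illustration:

```scheme
; answers whether "value" is an element of the list "a".
; member returns a sublist or false, so the result is converted to a boolean
(define (contains? a value) (if (member value a) #t #f))

(contains? (list 1 2 3) 2)
```

the call evaluates to #t.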

*

for, preferably temporary, derivations/extensions of existing bindings. (let* and-let* define* lambda*) are examples

-{integer}

enumerations of similar variables. example: x-1 x-2 x-3

!

for destructive or otherwise side-effecting procedures where the warning character of it might make sense
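
a sketch of a destructive procedure, following the pattern of the standard vector-set!. vector-increment! is a hypothetical name:

```scheme
; adds one to the element at "index" by modifying the given vector in place
(define (vector-increment! a index)
  (vector-set! a index (+ 1 (vector-ref a index))))

(define counts (vector 0 0 0))
(vector-increment! counts 1)
counts
```

afterwards counts is #(0 1 0). the exclamation mark warns the caller that the argument itself is changed.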

-p

for procedures that take an extra predicate procedure. the rnrs procedures "remove" and "remp" are an example: "remove" takes an object to compare with and "remp" takes a predicate. suffixing the "p" without a minus may lead to weird words
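
a sketch of a pair of procedures that follows the -p convention. list-position and list-position-p are hypothetical names:

```scheme
; returns the index of the first element satisfying "predicate", or false
(define (list-position-p predicate a)
  (let loop ((rest a) (index 0))
    (cond ((null? rest) #f)
          ((predicate (car rest)) index)
          (else (loop (cdr rest) (+ 1 index))))))

; the variant without suffix compares against a value
(define (list-position a value)
  (list-position-p (lambda (b) (equal? b value)) a))

(list-position (list 5 6 7) 7)
(list-position-p even? (list 5 6 7))
```

the first call evaluates to 2, the second to 1.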

-s

for bindings explicitly marked as syntax. for example syntax/macro alternatives to a procedure

-c

for "continue". when a mapping procedure for an iteration/morphism has a procedure parameter that allows controlling whether the iteration continues or not
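
one possible reading of this convention, as a sketch. map-c and its calling convention are assumptions for illustration and not a standard procedure:

```scheme
; like map, but "f" receives the element and a procedure "continue".
; calling continue with a value adds the value to the result and goes on
; with the next element. returning a list without calling continue
; ends the iteration with that list as the tail of the result
(define (map-c f a)
  (if (null? a) '()
    (f (car a)
      (lambda (result) (cons result (map-c f (cdr a)))))))

; stops at the first element that is not smaller than 3
(map-c (lambda (a continue) (if (< a 3) (continue (* 10 a)) '()))
  (list 1 2 3 4))
```

the call evaluates to the list (10 20). the elements 3 and 4 are never passed to the mapping procedure.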

identifier prefixes

call-with- with-

filename suffix

.scm

principles

no arbitrary abbreviations

only abbreviate words by cutting off their end, without removing characters in between. the full words are then easier to guess. more arbitrary abbreviations can be confusing because they are much more like new, unrelated words that have to be learnt as such, or because they are ambiguous and could mean multiple things. they also make many different variations by different authors possible

examples

  • "abbrev" instead of "abvtn" for the word "abbreviation"
  • "ele" instead of "elmt" for the word "element"

one-letter identifiers

no one-letter identifiers unless it is easy to figure out, almost obvious, what they mean. one possible rule is that the place where they are introduced should be in sight. possibly avoid them where words used instead would save documentation. no more than a limited number of one-letter variables in one scope, maybe 4 or so, because otherwise too much lookup is required from the reader

advantages of one-letter identifiers

  • structure is more visible. therefore potentially less noise
  • shorter line widths

disadvantages

do not state their meaning

possible ambiguity

example: e: exception? element? event? error?

they are more difficult to search

individual variable names may be used as pointers for quick in-editor navigation by search. quick navigation to specific variables may also be necessary when errors have occurred

they are more difficult to replace

sometimes an identifier needs to get a new name. usually just the pattern as is needs to be searched and replaced. with one-letter identifiers, all possible delimiters of a language must be considered when searching, as well as their scope

comments to describe the variables

  • the meaning of variables should at least be stated in a comment if it is not obvious otherwise
  • include the meaning or long names of short variable names somewhere quick to navigate to and see
  • i have seen extensive use of one-letter variables in implementations of algorithms of academic origin, with links to elaborate papers and books beneath them, papers that were too long and convoluted to be parsed for the actual meaning of the variables
  • the code could just as well have used monotonically numbered identifiers, it would have been the same. it should not be necessary to buy and read every related book to figure out the meaning of variables, that is much too time-consuming and financially costly. to translate the variable names while reading, one in the end needs a lookup table, maybe printed out. and because the names are so short, the variables are not visible enough to easily keep track of them
  • at least state what the variable stands for in the code. this could be done with a comment like so: "d: difference, i: intersection"
  • the most important thing is that the information is available - avoid letting the reader guess
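
a sketch of such a comment in context. the procedure is hypothetical and only serves to show one-letter variables whose meaning is stated where they are introduced:

```scheme
; passes two lists to the continuation "c":
; the elements of "a" that are not in "b", and those that are also in "b"
(define (difference-and-intersection a b c)
  ; d: difference, i: intersection
  (let loop ((rest a) (d '()) (i '()))
    (cond
      ((null? rest) (c (reverse d) (reverse i)))
      ((member (car rest) b) (loop (cdr rest) d (cons (car rest) i)))
      (else (loop (cdr rest) (cons (car rest) d) i)))))

(difference-and-intersection (list 1 2 3 4) (list 2 4 5) list)
```

the call evaluates to ((1 3) (2 4)). the comment costs one line and saves the reader from guessing.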

filename suffix

.scm

  • sc(he)m(e)
  • not a shorter-only abbreviation (instead of sch), but i get it
  • it is a combination that is quite distinct
  • follows the convention of how other extensions are created, by being some abbreviation of the program's or language's name

other suffixes advocated for or in use

  • some scheme implementations specify more filename extensions for plain scheme code for slightly different purposes. purposes which for the programmer thinking on the language - and not implementation - level are, i claim, irrelevantly different
  • i find it preposterous for one and the same format/syntax to take up 6 or 7 different suffixes. their meanings all have to be memorised and dealt with by the language user, and conflicts with other file formats are bound to happen. in most existing usage of filename extensions, the only thing a filename suffix gives information about is the format - one suffix corresponds to one format, which is much simpler
  • for example for audio stored in the flac format it is "somefile.flac", not "somefile.jazz", "somefile.pop" and so on.
  • the proposed extra suffixes are to make it so that several suffixes correspond to one format for different intended usages, usage by the interpreter, not so much by the user. on the implementation level it is likely to lead to more stat calls when searching for code to import. there is no appropriate benefit for dropping the simplicity and consistency in interpreting filename extensions by introducing new exceptional rules and complication.

negative examples

more examples here: http://larceny.ccs.neu.edu/doc/user-manual.chunked/ar01s05.html

.ss
  • "scheme source", really? how about .sf "scheme file", .sc "scheme code"
  • and in what situation would it be beneficial to add an extra s to extensions, especially if not done with every source filename extension in use, which would look for example like this: shs jss csss rbs cs
  • and it is vague, everything can be a source of something else
.sls
  • the same about the dedicated "source" letter applies
  • why would we need an extra suffix stating it being a library? to have a non-library and library with the same name but different suffix in the same directory? is there any imaginable benefit in using "mycode.sls" and "mycode.ss" in the same directory that pays off more than the complication costs? i have since learned of the possibility of plain including code instead of importing modules when defining libraries where this could have some use
  • i would be more accepting if there were restrictions on the name, for example before the last suffix, but on the last suffix? it is still scheme-content and the suffix should tell about the format/type. and leave the user the freedom to change contents without having to rename the suffix. sometimes non-libraries become libraries, files that use import become files that do not use import - and anyway, what is the difference and why should the programmer and scheme implementation have to be so concerned and deal with changing, learning about and managing those file-name extensions
.s
  • already widely used for other formats

specifically excluded naming styles

figurative speech

  • let us try to speak practically the same language, not fantasy languages of vagueness and arbitrary personal associations that can only make sense to the author or small groups of insiders
  • better to avoid creating bindings for "fairies that hop onto trains" for adding elements to a list, or "creating steam-boats on a river towards the evening sun" for sending data packets on a socket to the client or something. "(define boat (create-steam-boat 3)) (send-towards-sun (create-sun (quote evening)) boat)". i have seen such things in real-life production code. on one hand it is like reading a fantasy book, on the other hand it is needlessly obscuring and creates unnecessary knowledge

prefixes for data types

in a dynamically typed language the type can change at any time and that should be no problem. a type prefix burdens the developer with repeated variable-name refactoring when trying out things, modifying the code or type-casting. it is incredibly noisy syntactically. it is also not trustworthy and, like comments, quickly becomes outdated and misleading

prefix for local variables

  • syntactic noise without a good reason. in this scenario you want "variable" and get "$variable". one benefit of this is that routines can have the same name as variables, but count the years until you find use for that. in some languages the prefix is intended to use the identifiers in string interpolation, where it inserts its value into strings
  • it is not necessary to mark variable names to differentiate them from globals. marking them as function-local makes no sense either

underscore

no need, the minus is allowed

camelcase

avoid the camelcase craze. code that uses camelcase requires more documentation because the identifiers are not well readable and not easily grouped by their parts. systematic prefixing for identifier groups becomes nearly impractical. it requires upper-case characters and adds complexity and arbitrariness concerning the rules of when something has to be uppercase and when not. sub-words begin with uppercase characters depending on context, namely on whether they are at the beginning, inside or at the end of an identifier. two different patterns for the same word are two different patterns to recognise for the same goal. the shift key is also usually not as easy to reach on keyboards as the minus