2025-10-04

applied software engineering

maintenance

  • objective: reach productive change velocity in an unfamiliar codebase with minimal risk
  • scope: analysis, debugging, modification, verification
  • constraint: technology-agnostic

workflow overview

  • establish reproducibility: deterministic setup, stable failure reproduction
  • build a minimal working model: environment, system topology, data, behavior
  • locate the smallest change surface: isolate path, component, input
  • apply a safe patch: semi-automated where feasible
  • verify: targeted tests, invariants, operational signals
  • document deltas: facts, assumptions, open questions

codebase analysis

core questions

  • how do you run the system locally and reproduce its behavior
  • what inputs reproduce the observed bug or feature path
  • what are the main entry points (apis, commands, uis)
  • what is the routing map from entry point to handler
  • which components mutate shared state
  • what persistent state is involved (tables, files, indexes)
  • what identifiers map across code, database, user interface
  • what key invariants or contracts must hold
  • what are the main sources of nondeterminism (time, randomness, concurrency)
  • what feature toggles change behavior
  • how are errors surfaced (logs, codes)
  • what metrics or dashboards are monitored for this path
  • what are the most churned or defect-prone files
  • what test coverage exists for this path
  • who are the maintainers or owners of this code
  • what past fixes or designs are documented
  • what is the directory structure and naming convention
  • how do artifacts such as logs, errors, or ui strings map back to code
  • what types of structured facts can be derived automatically from this codebase
  • what automated methods exist to obtain those facts efficiently

environment and reproducibility

  • facts

    • execution commands, environment, ports, minimal request trace, seed strategy
    • dependencies: build, test, runtime
    • configuration sources and precedence
    • resource limits and variability sources
    • platform constraints: os, architecture, runtime, toolchain
  • outputs

    • one command to run and reproduce

    • fixture capturing inputs and expected outputs
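
a fixture that pairs captured inputs with expected outputs can be replayed by a small harness. a minimal sketch, assuming a hypothetical `handle_request` function stands in for the system entry point and that the fixture is stored as json (both are illustrative assumptions, not part of any specific codebase):

```python
import json

# Hypothetical stand-in for the system entry point under study.
def handle_request(payload: dict) -> dict:
    return {"total": sum(payload["items"])}

# A fixture pairs one captured input with the output expected for it.
FIXTURE_JSON = '{"input": {"items": [1, 2, 3]}, "expected": {"total": 6}}'

def replay(fixture_text: str) -> bool:
    """Run the captured input through the handler and compare with the expectation."""
    fixture = json.loads(fixture_text)
    actual = handle_request(fixture["input"])
    return actual == fixture["expected"]

# Single entry point for reproduction: True means the expected behavior was observed.
REPRODUCED = replay(FIXTURE_JSON)
```

wrapping `replay` in a script or task runner target gives the one-command reproduction; a failing fixture then doubles as the regression test for the fix.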

system topology

  • facts

    • components and roles, routing maps, interceptors
    • sync vs async edges with latency and capacity notes
    • singleton vs replicated elements, blast radii
  • outputs

    • minimal path diagram from entry to effect

    • list of mutators on the path

data and schema

  • facts

    • schemas, relations, indexes, migrations
    • identifier names across routes, dtos, configs, storage, i18n
    • serialization rules and compatibility strategy
    • localization bundles and fallbacks
    • retention and archival policies
  • outputs

    • entity-relationship slice for the path

    • example payloads before and after transformation

behavior and contracts

  • facts

    • authentication, authorization, permission rules
    • input schemas, normalization, server checks
    • preconditions, postconditions, concurrency model
    • isolation levels, timing assumptions, clock usage
    • randomness sources and seed handling
    • feature flags and rollout rules
  • outputs

    • contract checklist for the path

    • toggles affecting target behavior

operations and policy

  • facts

    • trace correlation across logs and metrics
    • error dictionary: codes, messages, status, owners
    • dashboards, alerts, slos, paging rules
    • compliance and audit scope
  • outputs

    • error-to-owner lookup for target codes

    • observable signals to validate a fix

evolution and history

  • facts

    • defect-prone files and co-change clusters
    • property tests and formal specs
    • end-of-life items, upgrade plans, release cadence
    • concept alignment: canonical terms vs aliases
  • outputs

    • risk map of files on the path

    • minimal test set to guard the change

socio-technical context

  • facts

    • maintainers, escalation paths
    • architectural decision records, design docs, runbooks
    • known socio-technical incongruences
  • outputs

    • stakeholder list for review

    • knowledge and documentation gaps

navigation aids

  • techniques

    • keyword search in source
    • structural search with symbol indexers or language servers
    • filesystem mapping of layout, naming, code-to-feature mapping
    • configuration mapping from descriptors to routes, entities, resources
    • call path tracing through handlers and modules
    • artifact linking from observed outputs to source locations
  • outputs

    • index of relevant files and configs

    • cross-links between observed behavior and code locations
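
artifact linking can start as a plain substring search from an observed output back to source lines. a minimal sketch, assuming the sources are already loaded into a hypothetical in-memory mapping of path to text (file names and contents are invented for illustration):

```python
# Hypothetical snapshot of source files: path -> text.
SOURCES = {
    "billing/invoice.py": 'def total(x):\n    log.error("invoice total mismatch")\n',
    "billing/tax.py": "RATE = 0.2\n",
}

def locate(fragment: str, sources: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line number) for every source line containing the fragment."""
    hits = []
    for path, text in sorted(sources.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            if fragment in line:
                hits.append((path, number))
    return hits

# Cross-link from an observed log message to candidate code locations.
HITS = locate("invoice total mismatch", SOURCES)
```

messages built from format strings or localization bundles will not match verbatim; searching for the longest static fragment of the message is usually the first step.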

fact extraction

  • categories

    • source symbols: functions, classes, variables, definitions, references
    • dependencies: package manifests, imports, version constraints
    • configuration descriptors in yaml, json, xml, ini
    • routing maps from paths to handlers
    • database schemas, fields, indexes, migrations
    • build and packaging artifacts
    • filesystem layout, file types, churn metrics
    • version control history, churn hotspots, author mapping
    • localization tables and string bundles
    • template and asset inclusion trees
  • outputs

    • structured fact tables

    • basis for system topology, data schema, and behavioral contracts
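
source symbols are among the cheapest facts to extract into a table. a minimal sketch for python sources using the standard `ast` module; the sample source and the `(kind, name, line)` row shape are assumptions chosen for illustration, and other languages would need their own parser or a language server:

```python
import ast

# Sample source to analyze (invented for this example).
SOURCE = '''
import os

class Repo:
    def load(self, path):
        return os.path.exists(path)

def main():
    return Repo().load(".")
'''

def extract_symbols(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line) rows for definitions and imports."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            rows.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            rows.append(("class", node.name, node.lineno))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                rows.append(("import", alias.name, node.lineno))
    # Sort by line so the table follows the file layout.
    return sorted(rows, key=lambda row: row[2])

SYMBOLS = extract_symbols(SOURCE)
```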

fact collection techniques

  • techniques

    • static analysis of source and configs
    • dynamic analysis with execution traces
    • symbol indexing via language servers
    • version history mining for churn and co-change
    • configuration parsing from manifests and descriptors
  • outputs

    • reproducible fact datasets

    • links between fact categories across models
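
version history mining reduces to counting once each commit is available as a set of changed files. a minimal sketch, assuming a hypothetical `COMMITS` list that has already been exported from version control (the file names are invented):

```python
from collections import Counter
from itertools import combinations

# Hypothetical history: each entry is the set of files changed in one commit.
COMMITS = [
    {"api/routes.py", "api/handlers.py"},
    {"api/handlers.py", "db/schema.sql"},
    {"api/routes.py", "api/handlers.py", "tests/test_api.py"},
]

def churn(commits) -> Counter:
    """Count how many commits touched each file."""
    return Counter(path for files in commits for path in files)

def co_change(commits) -> Counter:
    """Count how often each pair of files changed in the same commit."""
    pairs = Counter()
    for files in commits:
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        pairs.update(combinations(sorted(files), 2))
    return pairs

CHURN = churn(COMMITS)
CO_CHANGE = co_change(COMMITS)
```

high churn suggests defect-prone files; pairs with high co-change counts suggest hidden coupling worth checking before a patch touches only one of them.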

exploration and debugging techniques

core questions

  • can the problem be reproduced deterministically
  • what data can be observed at the failure point without altering behavior
  • what is the smallest input or path that still triggers the failure
  • where in the codebase is the failure first observable
  • what changes in history correlate with the failure
  • what invariants or contracts are violated
  • how can the problem be replayed or simulated to confirm a fix

replicate - observe - deduce

  • replicate: deterministic reproduction as precondition
  • observe: collect objective data only (logs, traces, snapshots)
  • deduce: eliminate impossibilities, converge on causes
  • properties: cyclic, convergent, tool-agnostic, domain-independent

isolation

  • path narrowing by disabling or bypassing segments
  • input minimization to find the smallest failing case
  • binary search across versions, commits, config flags
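
input minimization can be sketched as a greedy reduction loop. this is a simplified one-element-at-a-time variant, not full delta debugging, and the `fails` predicate is a hypothetical stand-in for running the reproduction:

```python
def minimize(failing_input: list, fails) -> list:
    """Greedily drop elements while the predicate `fails` still holds."""
    current = list(failing_input)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                # The failure survives without element i, so keep the smaller input.
                current = candidate
                changed = True
                break
    return current

# Hypothetical failure: the bug triggers whenever both 3 and 7 are present.
def fails(values) -> bool:
    return 3 in values and 7 in values

SMALLEST = minimize([1, 2, 3, 4, 5, 6, 7, 8], fails)
```

the result is minimal only in the sense that removing any single element makes the failure disappear; it also requires the reproduction to be deterministic, otherwise the predicate gives inconsistent answers.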

tracing and instrumentation

  • targeted logging on path boundaries
  • request identifiers for correlation
  • snapshots of state at invariant boundaries
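
request identifiers and boundary logging can be combined so that every log line on the path carries the same correlation key. a minimal sketch using python's standard `logging` and `contextvars` modules; the logger name, the `handle` function, and the log format are assumptions made for illustration:

```python
import contextvars
import io
import logging
import uuid

# Holds the identifier of the request currently being processed.
request_id = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Attach the current request identifier to every log record."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("path-trace")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

def handle(logger, payload, rid=None):
    """Hypothetical handler: logs at the entry and exit boundaries of the path."""
    token = request_id.set(rid or uuid.uuid4().hex)
    try:
        logger.info("enter handle payload=%r", payload)
        result = payload * 2
        logger.info("exit handle result=%r", result)
        return result
    finally:
        request_id.reset(token)

buffer = io.StringIO()
log = build_logger(buffer)
handle(log, 21, rid="req-1")
```

filtering the captured output by `req-1` then yields the full trace of that one request, even when other requests are interleaved.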

history-guided search

  • version bisection to locate defect introduction
  • spectrum-based fault localization using coverage differences between passing and failing runs
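
spectrum-based fault localization ranks code locations by how strongly their coverage is associated with failing tests. a minimal sketch using the ochiai score, ef / sqrt(total_failed * (ef + ep)), where ef and ep count the failing and passing tests that cover a line; ochiai is one common choice picked here as an assumption, and the coverage data is invented:

```python
import math

# Hypothetical coverage spectrum: test name -> (covered lines, passed?).
SPECTRUM = {
    "t1": ({1, 2, 3}, True),
    "t2": ({1, 2, 4}, False),
    "t3": ({1, 4}, False),
}

def ochiai(spectrum) -> dict[int, float]:
    """Score each line; a higher score means a stronger association with failures."""
    total_failed = sum(1 for _, passed in spectrum.values() if not passed)
    lines = set().union(*(covered for covered, _ in spectrum.values()))
    scores = {}
    for line in lines:
        ef = sum(1 for cov, ok in spectrum.values() if line in cov and not ok)
        ep = sum(1 for cov, ok in spectrum.values() if line in cov and ok)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    return scores

SCORES = ochiai(SPECTRUM)
MOST_SUSPICIOUS = max(SCORES, key=SCORES.get)
```

here line 4 is covered by both failing tests and by no passing test, so it ranks first; the ranking is a starting point for inspection, not a proof of the defect location.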

semantic debugging

  • algorithmic debugging via execution trees and developer confirmations
  • contract checks inserted at pre and post points
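
contract checks at pre and post points can be added without editing the function body. a minimal sketch using a decorator; the `contract` helper and the `withdraw` function are hypothetical names introduced for illustration:

```python
import functools

def contract(pre=None, post=None):
    """Wrap a function with optional precondition and postcondition checks."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition violated in {func.__name__}")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"postcondition violated in {func.__name__}")
            return result
        return wrapper
    return decorate

# Hypothetical function under investigation.
@contract(
    pre=lambda balance, amount: 0 < amount <= balance,
    post=lambda result, balance, amount: result == balance - amount,
)
def withdraw(balance, amount):
    return balance - amount
```

a violated precondition points at the caller, a violated postcondition at the function itself, which narrows where to look next.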

determinism aids

  • record and replay of inputs and nondeterministic sources
  • seed control for randomness
  • clock stubbing for time dependencies
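
seed control and clock stubbing both come down to passing the nondeterministic source in as a parameter. a minimal sketch, assuming a hypothetical `make_token` function that depends on randomness and the current date:

```python
import random
from datetime import datetime, timezone

def make_token(rng: random.Random, now) -> str:
    """Hypothetical function whose output depends on randomness and time."""
    stamp = now().strftime("%Y%m%d")
    return f"{stamp}-{rng.randint(0, 9999):04d}"

def fixed_clock() -> datetime:
    """Stubbed clock: always returns the same instant."""
    return datetime(2025, 1, 1, tzinfo=timezone.utc)

def reproduce(seed: int) -> str:
    # A dedicated Random instance keeps the seed local to this run.
    return make_token(random.Random(seed), fixed_clock)

FIRST = reproduce(42)
SECOND = reproduce(42)
```

with the seed and clock fixed, `FIRST` and `SECOND` are identical, so any remaining variation between runs must come from another source such as concurrency.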

patch application

core questions

  • what is the smallest safe surface for applying a change
  • how can the edit be constrained to preserve structure and contracts
  • what automated aids can reduce human error in applying the patch

semi-automated edits

  • ast patching for structure-preserving changes
  • function-level replacement where isolation is clear
  • llm-assisted edits constrained by anchors to avoid offset drift
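
anchoring an edit to a unique text fragment, rather than to a line or character offset, keeps it valid when surrounding code shifts. a minimal sketch, assuming a hypothetical `apply_anchored_edit` helper that refuses to act unless the anchor matches exactly once:

```python
def apply_anchored_edit(text: str, anchor: str, replacement: str) -> str:
    """Replace `anchor` with `replacement` only if the anchor occurs exactly once."""
    count = text.count(anchor)
    if count != 1:
        raise ValueError(f"anchor must match exactly once, found {count}")
    return text.replace(anchor, replacement, 1)

# Invented example source with a defect in the return expression.
SOURCE = "def area(w, h):\n    return w + h\n"
PATCHED = apply_anchored_edit(SOURCE, "return w + h", "return w * h")

# Cheap structural safeguard: the patched text must still parse as python.
compile(PATCHED, "<patched>", "exec")
```

rejecting missing or ambiguous anchors turns a silent misapplied edit into an explicit error that a human can resolve.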

safeguards

  • diff validation against contracts and invariants
  • regeneration of fixtures and replay traces

verification

core questions

  • how can the fix be confirmed against the original failure
  • what additional regressions must be guarded against
  • what signals prove correctness under operational conditions

methods

  • targeted tests aligned to the minimal path
  • negative tests for observed failures
  • operational checks on live signals within agreed slos
  • rollback and guardrail criteria defined before deploy
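
rollback criteria defined before deploy can be written down as data plus one decision function, so the check is mechanical under pressure. a minimal sketch; the `Guardrail` fields and the threshold values are hypothetical and would come from the agreed slos:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Guardrail:
    """Hypothetical rollback criteria agreed before deployment."""
    max_error_rate: float
    max_p95_latency_ms: float

def should_roll_back(guardrail: Guardrail, error_rate: float, p95_latency_ms: float) -> bool:
    """Return True when any observed signal exceeds its agreed limit."""
    return (
        error_rate > guardrail.max_error_rate
        or p95_latency_ms > guardrail.max_p95_latency_ms
    )

# Illustrative limits only.
LIMITS = Guardrail(max_error_rate=0.01, max_p95_latency_ms=250.0)
```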

pairwise-invertible triplets

  • definition: triplet in which fixing any two elements determines the third, or narrows it to a set that can be searched
  • examples:

    • program, input -> output
    • state0, transition -> state1
    • schema, instance -> validation result
    • contract, behavior -> verdict
  • tactics:

    • identify two controllable elements and compute the third

    • invert to synthesize inputs or states that expose discrepancies

    • compare computed vs observed third to locate violations
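
the three tactics can be sketched on the `state0, transition -> state1` triplet. the `transition` function below is a hypothetical saturating counter; the forward direction computes the third element, and the inverse direction searches a small set of candidate events:

```python
def transition(state: int, event: str) -> int:
    """Hypothetical transition function: a counter that saturates at 3."""
    if event == "inc":
        return min(state + 1, 3)
    if event == "reset":
        return 0
    return state

def check(state0: int, event: str, observed_state1: int):
    """Forward: compute the third element and compare it with the observation."""
    expected = transition(state0, event)
    return expected == observed_state1, expected

def find_events(state0: int, state1: int, events=("inc", "reset", "noop")):
    """Inverse: search for events that take state0 to state1."""
    return [event for event in events if transition(state0, event) == state1]

OK, EXPECTED = check(3, "inc", 4)   # observed 4, but the model saturates at 3
CANDIDATES = find_events(3, 3)      # more than one event explains this pair
```

a mismatch in `check` locates a violation in either the model or the system; an inverse search that returns several candidates shows where an extra observation is needed to disambiguate.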

fundamental concepts

  • side effects: external state transitions
  • mutability: bounds and ownership of changeable data
  • evaluation order: explicit sequencing where it matters
  • coupling: degree of interdependency between modules
  • referential transparency: an expression can be replaced by its value; outputs depend only on inputs
  • typing: tradeoffs between detection and flexibility
  • variables: pointers to state, not abstract values
  • idempotency: retry semantics and keys on mutating paths
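
idempotency on a mutating path is commonly achieved by recording the result under a caller-supplied key and returning it on retry. a minimal in-memory sketch; `PaymentService` and its fields are hypothetical, and a real system would persist the key table alongside the state it protects:

```python
class PaymentService:
    """Hypothetical mutating endpoint that deduplicates retries by key."""

    def __init__(self) -> None:
        self.balance = 100
        self._results: dict[str, int] = {}

    def charge(self, idempotency_key: str, amount: int) -> int:
        # A retry with a known key returns the stored result without mutating again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance -= amount
        self._results[idempotency_key] = self.balance
        return self.balance

service = PaymentService()
first = service.charge("k1", 30)
retry = service.charge("k1", 30)   # same key: side effect is not repeated
```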

outputs and artifacts

  • single-command reproduction with fixture inputs
  • path diagram and mutator list
  • contract checklist and toggle map
  • error dictionary slice with owners
  • minimal targeted test set
  • risk map and review stakeholders
  • patch diff with invariant checks and replay logs

scope boundaries

  • no technology-specific prescriptions
  • primitives compose across languages, runtimes, platforms
  • prefer orthogonal facts and reversible transformations