2025-10-04

applied software engineering

maintenance

objective: reach productive change velocity in an unfamiliar codebase with minimal risk
scope: analysis, debugging, modification, verification
constraint: technology-agnostic

workflow overview

establish reproducibility: deterministic setup, stable failure reproduction
build a minimal working model: environment, system topology, data, behavior
locate the smallest change surface: isolate path, component, input
apply a safe patch: semi-automated where feasible
verify: targeted tests, invariants, operational signals
document deltas: facts, assumptions, open questions

codebase analysis

core questions

how do you execute and reproduce the system locally
what inputs reproduce the observed bug or feature path
what are the main entry points (apis, commands, uis)
what is the routing map from entry point to handler
which components mutate shared state
what persistent state is involved (tables, files, indexes)
what identifiers map across code, database, user interface
what key invariants or contracts must hold
what are the main sources of nondeterminism (time, randomness, concurrency)
what feature toggles change behavior
how are errors surfaced (logs, codes)
what metrics or dashboards are monitored for this path
what are the most churned or defect-prone files
what test coverage exists for this path
who are the maintainers or owners of this code
what past fixes or designs are documented
what is the directory structure and naming convention
how do artifacts such as logs, errors, or ui strings map back to code
what types of structured facts can be derived automatically from this codebase
what automated methods exist to obtain those facts efficiently

environment and reproducibility

facts
- execution commands, environment, ports, minimal request trace, seed strategy
- dependencies: build, test, runtime
- configuration sources and precedence
- resource limits and variability sources
- platform constraints: os, architecture, runtime, toolchain
outputs
- one command to run and reproduce
- fixture capturing inputs and expected outputs
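
example: a minimal sketch of a fixture that captures inputs, seed, and expected output so a single call replays the behavior. `run_target` is a hypothetical stand-in for the path under investigation, and the JSON layout is an assumption, not a prescribed format.

```python
import json
import random

def run_target(payload, seed):
    """Hypothetical stand-in for the code path under investigation."""
    rng = random.Random(seed)            # local RNG: the seed is part of the fixture
    items = sorted(payload["items"])
    return {"total": sum(items), "sample": rng.choice(items)}

def capture_fixture(payload, seed):
    """Record input, seed, and observed output as one JSON document."""
    record = {"input": payload, "seed": seed, "expected": run_target(payload, seed)}
    return json.dumps(record, sort_keys=True)

def replay_fixture(fixture_text):
    """Re-run the target from a fixture; return (matches_expected, actual_output)."""
    fx = json.loads(fixture_text)
    actual = run_target(fx["input"], fx["seed"])
    return actual == fx["expected"], actual

fixture = capture_fixture({"items": [3, 1, 2]}, seed=7)
ok, actual = replay_fixture(fixture)
```

storing the seed next to the input keeps the randomness source inside the fixture, so the replay does not depend on global state.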

system topology

facts
- components and roles, routing maps, interceptors
- sync vs async edges with latency and capacity notes
- singleton vs replicated elements, blast radii
outputs
- minimal path diagram from entry to effect
- list of mutators on the path

data and schema

facts
- schemas, relations, indexes, migrations
- identifier names across routes, dtos, configs, storage, i18n
- serialization rules and compatibility strategy
- localization bundles and fallbacks
- retention and archival policies
outputs
- entity-relationship slice for the path
- example payloads before and after transformation

behavior and contracts

facts
- authentication, authorization, permission rules
- input schemas, normalization, server checks
- preconditions, postconditions, concurrency model
- isolation levels, timing assumptions, clock usage
- randomness sources and seed handling
- feature flags and rollout rules
outputs
- contract checklist for the path
- toggles affecting target behavior

operations and policy

facts
- trace correlation across logs and metrics
- error dictionary: codes, messages, status, owners
- dashboards, alerts, slos, paging rules
- compliance and audit scope
outputs
- error-to-owner lookup for target codes
- observable signals to validate a fix

evolution and history

facts
- defect-prone files and co-change clusters
- property tests and formal specs
- end-of-life items, upgrade plans, release cadence
- concept alignment: canonical terms vs aliases
outputs
- risk map of files on the path
- minimal test set to guard the change

socio-technical context

facts
- maintainers, escalation paths
- architectural decision records, design docs, runbooks
- known mismatches between team structure and system architecture
outputs
- stakeholder list for review
- knowledge and documentation gaps

navigation aids

techniques
- keyword search in source
- structural search with symbol indexers or language servers
- filesystem mapping of layout, naming, code-to-feature mapping
- configuration mapping from descriptors to routes, entities, resources
- call path tracing through handlers and modules
- artifact linking from observed outputs to source locations
outputs
- index of relevant files and configs
- cross-links between observed behavior and code locations
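
example: a sketch of artifact linking, assuming the observed output is a log line and the source is Python. string literals are turned into patterns by treating `%s`, `%d`, `%r`, and `{...}` placeholders as wildcards. the file names and messages are hypothetical.

```python
import ast
import re

PLACEHOLDER = re.compile(r"%[sdr]|\{[^{}]*\}")

def literal_to_regex(literal):
    """Turn a format string into a regex: static parts escaped, placeholders as wildcards."""
    parts = PLACEHOLDER.split(literal)
    return re.compile(".+?".join(re.escape(p) for p in parts))

def locate(observed, sources, min_length=8):
    """sources: path -> file text. Return sorted (path, line) pairs whose literal matches."""
    hits = []
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            is_text = isinstance(node, ast.Constant) and isinstance(node.value, str)
            if is_text and len(node.value) >= min_length:
                if literal_to_regex(node.value).fullmatch(observed):
                    hits.append((path, node.lineno))
    return sorted(hits)

# Hypothetical sources keyed by path.
sources = {
    "billing/charge.py": (
        "def charge(log, user, cents):\n"
        '    log.error("charge failed for user %s: %d cents", user, cents)\n'
    ),
    "billing/refund.py": (
        "def refund(log, user):\n"
        '    log.info("refund issued for user %s", user)\n'
    ),
}
hits = locate("charge failed for user u-17: 1250 cents", sources)
```

the `min_length` cutoff is an arbitrary guard against short literals matching everything; a keyword search over the static fragment is a simpler fallback.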

fact extraction

categories
- source symbols: functions, classes, variables, definitions, references
- dependencies: package manifests, imports, version constraints
- configuration descriptors in yaml, json, xml, ini
- routing maps from paths to handlers
- database schemas, fields, indexes, migrations
- build and packaging artifacts
- filesystem layout, file types, churn metrics
- version control history, churn hotspots, author mapping
- localization tables and string bundles
- template and asset inclusion trees
outputs
- structured fact tables
- basis for system topology, data schema, and behavioral contracts
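
example: a sketch of extracting source-symbol facts into a flat table, using Python's standard `ast` module on Python source. the row shape `(kind, name, line)` is an assumption chosen for illustration; other languages would need their own parser or a language server.

```python
import ast

def extract_symbols(source, filename="<memory>"):
    """Return rows (kind, name, line) for definitions and imports in Python source."""
    rows = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            rows.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            rows.append(("class", node.name, node.lineno))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                rows.append(("import", alias.name, node.lineno))
    # Sort by line so the table is stable across runs.
    return sorted(rows, key=lambda r: (r[2], r[0], r[1]))

sample = (
    "import os\n"
    "\n"
    "class Repo:\n"
    "    def load(self):\n"
    "        return os.getcwd()\n"
)
facts = extract_symbols(sample)
```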

fact collection techniques

techniques
- static analysis of source and configs
- dynamic analysis with execution traces
- symbol indexing via language servers
- version history mining for churn and co-change
- configuration parsing from manifests and descriptors
outputs
- reproducible fact datasets
- links between fact categories across models
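
example: a sketch of version history mining for churn and co-change. the input is assumed to be text produced by something like `git log --name-only --pretty=format:@@%H`, where the `@@` prefix is a marker chosen here to make commit lines easy to recognize. the sample log is invented.

```python
from collections import Counter
from itertools import combinations

def mine_history(log_text):
    """Return (churn, cochange): per-file commit counts and per-pair co-change counts."""
    commits, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("@@"):          # assumed commit marker
            current = set()
            commits.append(current)
        elif line and current is not None:
            current.add(line)              # a file touched by the current commit
    churn = Counter(f for files in commits for f in files)
    cochange = Counter(
        pair for files in commits for pair in combinations(sorted(files), 2)
    )
    return churn, cochange

sample_log = """@@a1
src/api.py
src/db.py

@@b2
src/api.py

@@c3
src/api.py
src/db.py
"""
churn, cochange = mine_history(sample_log)
```

high churn marks candidate hotspots; high co-change counts suggest files that should be reviewed together.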

exploration and debugging techniques

core questions

can the problem be reproduced deterministically
what data can be observed at the failure point without altering behavior
what is the smallest input or path that still triggers the failure
where in the codebase is the failure first observable
what changes in history correlate with the failure
what invariants or contracts are violated
how can the problem be replayed or simulated to confirm a fix

replicate - observe - deduce

replicate: deterministic reproduction as precondition
observe: collect objective data only (logs, traces, snapshots)
deduce: eliminate impossibilities, converge on causes
properties: cyclic, convergent, tool-agnostic, domain-independent

isolation

path narrowing by disabling or bypassing segments
input minimization to find the smallest failing case
binary search across versions, commits, config flags
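
example: a sketch of two isolation primitives. `first_bad` is a binary search that assumes the first version is good, the last is bad, and the failure persists once introduced. `minimize` is a simple greedy one-element-at-a-time reduction, not full delta debugging. the predicates are hypothetical stand-ins for running the real reproduction.

```python
def first_bad(versions, is_bad):
    """Binary search for the first failing version (assumes good -> bad is monotonic)."""
    lo, hi = 0, len(versions) - 1          # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]

def minimize(items, fails):
    """Drop elements one at a time while the failure still reproduces."""
    current = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate
                changed = True
                break                       # restart the scan on the smaller input
    return current

culprit = first_bad(list(range(10)), lambda v: v >= 6)
smallest = minimize([1, 2, 3, 4, 5, 6], lambda xs: 3 in xs and 5 in xs)
```

the same `first_bad` shape applies to commits, releases, or an ordered list of config flags, as long as the monotonicity assumption holds.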

tracing and instrumentation

targeted logging on path boundaries
request identifiers for correlation
snapshots of state at invariant boundaries
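
example: a sketch of request identifiers for log correlation, using the standard `logging` and `contextvars` modules. the logger name, the message format, and the `handle` function are assumptions for illustration.

```python
import contextvars
import io
import logging
import uuid

request_id = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Attach the current request id to every record passing through the handler."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

stream = io.StringIO()                      # in-memory sink so the example is self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(request_id)s %(name)s %(message)s"))
handler.addFilter(RequestIdFilter())

log = logging.getLogger("orders")           # hypothetical component name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

def handle(payload, rid=None):
    """Hypothetical handler: logs at entry and exit of the path boundary."""
    token = request_id.set(rid or uuid.uuid4().hex)
    try:
        log.info("enter handle items=%d", len(payload))
        total = sum(payload)
        log.info("exit handle total=%d", total)
        return total
    finally:
        request_id.reset(token)             # do not leak the id into the next request

handle([1, 2, 3], rid="req-42")
lines = stream.getvalue().splitlines()
```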

history-guided search

version bisection to locate defect introduction
spectrum-based fault localization using coverage of passing vs failing tests
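
example: a sketch of spectrum-based fault localization with the Ochiai score, one of several common ranking formulas. coverage is assumed to be available as a map from test name to executed line ids; the data below is invented.

```python
import math

def ochiai(coverage, outcomes):
    """coverage: test -> set of line ids; outcomes: test -> True if the test passed.
    Returns (line, score) pairs, most suspicious first."""
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    scores = {}
    for line in set().union(*coverage.values()):
        ef = sum(1 for t, ls in coverage.items() if line in ls and not outcomes[t])
        ep = sum(1 for t, ls in coverage.items() if line in ls and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

coverage = {
    "t_ok_1": {"f:1", "f:2"},
    "t_ok_2": {"f:1", "f:3"},
    "t_fail": {"f:1", "f:3", "f:4"},
}
outcomes = {"t_ok_1": True, "t_ok_2": True, "t_fail": False}
ranking = ochiai(coverage, outcomes)
```

lines executed only by failing tests rank highest; the ranking is a search heuristic, not proof of the defect location.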

semantic debugging

algorithmic debugging via execution trees and developer confirmations
contract checks inserted at pre and post points
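
example: a sketch of contract checks at pre and post points, written as a decorator. `ContractViolation`, `contract`, and `withdraw` are hypothetical names; the predicates encode whatever invariants the path is supposed to hold.

```python
import functools

class ContractViolation(AssertionError):
    """Raised when a precondition or postcondition does not hold."""

def contract(pre=None, post=None):
    """pre(*args, **kwargs) and post(result, *args, **kwargs) must return True."""
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ContractViolation(f"precondition failed: {fn.__name__}")
            result = fn(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise ContractViolation(f"postcondition failed: {fn.__name__}")
            return result
        return checked
    return wrap

@contract(
    pre=lambda balance, amount: 0 < amount <= balance,
    post=lambda result, balance, amount: result == balance - amount and result >= 0,
)
def withdraw(balance, amount):
    return balance - amount
```

a violated precondition points at the caller, a violated postcondition at the function itself, which narrows the search to one side of the boundary.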

determinism aids

record and replay of inputs and nondeterministic sources
seed control for randomness
clock stubbing for time dependencies
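
example: a sketch of seed control and clock stubbing by passing the clock and the random source in as parameters. `issue_token` is a hypothetical function under test; the technique assumes the code can be changed to accept these dependencies rather than reading global time and randomness.

```python
import random
from datetime import datetime, timedelta, timezone

def issue_token(now, rng, ttl_seconds=60):
    """Hypothetical function under test: time and randomness are both injected."""
    return {
        "id": rng.getrandbits(32),
        "expires": now() + timedelta(seconds=ttl_seconds),
    }

def fixed_clock(instant):
    """Return a clock function that always reports the same instant."""
    return lambda: instant

frozen = datetime(2025, 1, 1, tzinfo=timezone.utc)
first = issue_token(fixed_clock(frozen), random.Random(1234))
second = issue_token(fixed_clock(frozen), random.Random(1234))
```

with the same seed and the same frozen instant, two runs produce identical results, which is the precondition for a stable reproduction.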

patch application

core questions

what is the smallest safe surface for applying a change
how can the edit be constrained to preserve structure and contracts
what automated aids can reduce human error in applying the patch

semi-automated edits

ast patching for structure-preserving changes
function-level replacement where isolation is clear
llm-assisted edits constrained by anchors to avoid offset drift
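
example: a sketch of function-level replacement anchored by name instead of by text offset. the function is located with Python's `ast` module (`end_lineno` requires Python 3.8+), and only its line range is swapped, so surrounding comments and formatting stay untouched. `replace_function` and the sample source are hypothetical.

```python
import ast

def replace_function(source, name, new_def):
    """Replace the top-level function `name` with `new_def`; leave other text as-is."""
    tree = ast.parse(source)
    matches = [
        n for n in tree.body
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) and n.name == name
    ]
    if len(matches) != 1:
        raise ValueError(f"expected one top-level function {name!r}, found {len(matches)}")
    node = matches[0]
    ast.parse(new_def)                       # reject a replacement that does not parse
    lines = source.splitlines(keepends=True)
    # Start at the first decorator, if any, so decorators are replaced with the body.
    start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
    end = node.end_lineno
    patched = "".join(lines[:start]) + new_def.rstrip("\n") + "\n" + "".join(lines[end:])
    ast.parse(patched)                       # the patched file must still be valid syntax
    return patched

original = (
    "# pricing helpers\n"
    "def tax(amount):\n"
    "    return amount * 0.2  # bug: ignores rounding\n"
    "\n"
    "def total(amount):\n"
    "    return amount + tax(amount)\n"
)
fixed = replace_function(
    original, "tax", "def tax(amount):\n    return round(amount * 0.2, 2)\n"
)
```

because the anchor is the symbol name, the edit still lands correctly if unrelated lines were added above the function since the patch was drafted.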

safeguards

diff validation against contracts and invariants
regeneration of fixtures and replay traces

verification

core questions

how can the fix be confirmed against the original failure
what additional regressions must be guarded against
what signals prove correctness under operational conditions

methods

targeted tests aligned to the minimal path
negative tests for observed failures
operational checks on live signals within agreed slos
rollback and guardrail criteria defined before deploy

pairwise-invertible triplets

definition: a triplet in which any two elements determine, or at least constrain, the third
examples:
- program, input -> output
- state0, transition -> state1
- schema, instance -> validation result
- contract, behavior -> verdict
tactics:
- identify two controllable elements and compute the third
- invert to synthesize inputs or states that expose discrepancies
- compare computed vs observed third to locate violations
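
example: a sketch of the `state0, transition -> state1` triplet on a toy account balance, where each pair determines the third. the transition encoding `(kind, amount)` is an assumption; in real systems the inverse directions are often only partial or need search.

```python
def apply(state0, transition):
    """Forward: (state0, transition) -> state1."""
    kind, amount = transition
    return state0 + amount if kind == "deposit" else state0 - amount

def infer_transition(state0, state1):
    """Inverse: (state0, state1) -> transition."""
    delta = state1 - state0
    return ("deposit", delta) if delta >= 0 else ("withdraw", -delta)

def infer_state0(transition, state1):
    """Inverse: (transition, state1) -> state0."""
    kind, amount = transition
    return state1 - amount if kind == "deposit" else state1 + amount

def check(state0, transition, observed_state1):
    """Compare the computed third element with the observed one."""
    expected = apply(state0, transition)
    return {"expected": expected, "observed": observed_state1,
            "ok": expected == observed_state1}

verdict = check(100, ("withdraw", 30), observed_state1=75)
```

a failed check with trusted `state0` and `transition` localizes the discrepancy to the code that produced the observed state.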

fundamental concepts

side effects: external state transitions
mutability: bounds and ownership of changeable data
evaluation order: explicit sequencing where it matters
coupling: degree of interdependency between modules
referential transparency: an expression can be replaced by its value; outputs depend only on inputs
typing: tradeoffs between detection and flexibility
variables: pointers to state, not abstract values
idempotency: retry semantics and keys on mutating paths
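
example: a sketch of idempotency keys on a mutating path, so a retried request does not apply its side effect twice. `PaymentService` is hypothetical, and the in-memory dictionary stands in for whatever durable store a real system would need.

```python
class PaymentService:
    """Hypothetical service: deposits are deduplicated by idempotency key."""

    def __init__(self):
        self.balance = 0
        self._results = {}                  # idempotency key -> stored result

    def deposit(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # Retry of a request already applied: return the stored result, no mutation.
            return self._results[idempotency_key]
        self.balance += amount              # the side effect happens once per key
        result = {"balance": self.balance}
        self._results[idempotency_key] = result
        return result

svc = PaymentService()
first = svc.deposit("key-1", 50)
retry = svc.deposit("key-1", 50)            # simulated client retry
other = svc.deposit("key-2", 25)
```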

outputs and artifacts

single-command reproduction with fixture inputs
path diagram and mutator list
contract checklist and toggle map
error dictionary slice with owners
minimal targeted test set
risk map and review stakeholders
patch diff with invariant check results and replay logs

scope boundaries

no technology-specific prescriptions
primitives compose across languages, runtimes, platforms
prefer orthogonal facts and reversible transformations