# applied software engineering

## maintenance

* objective: reach productive change velocity in an unfamiliar codebase with minimal risk
* scope: analysis, debugging, modification, verification
* constraint: technology-agnostic

### workflow overview

* establish reproducibility: deterministic setup, stable failure reproduction
* build a minimal working model: environment, system topology, data, behavior
* locate the smallest change surface: isolate path, component, input
* apply a safe patch: half-automated where feasible
* verify: targeted tests, invariants, operational signals
* document deltas: facts, assumptions, open questions

### codebase analysis

#### core questions

* how do you execute and reproduce the system locally
* what inputs reproduce the observed bug or feature path
* what are the main entry points (apis, commands, uis)
* what is the routing map from entry point to handler
* which components mutate shared state
* what persistent state is involved (tables, files, indexes)
* what identifiers map across code, database, user interface
* what key invariants or contracts must hold
* what are the main sources of nondeterminism (time, randomness, concurrency)
* what feature toggles change behavior
* how are errors surfaced (logs, codes)
* what metrics or dashboards are monitored for this path
* what are the most churned or defect-prone files
* what test coverage exists for this path
* who are the maintainers or owners of this code
* what past fixes or designs are documented
* what is the directory structure and naming convention
* how do artifacts such as logs, errors, or ui strings map back to code
* what types of structured facts can be derived automatically from this codebase
* what automated methods exist to obtain those facts efficiently

#### environment and reproducibility

* facts
    * execution commands, environment, ports, minimal request trace, seed strategy
    * dependencies: build, test, runtime
    * configuration sources and precedence
    * resource limits and variability sources
    * platform constraints: os, architecture, runtime, toolchain
* outputs
    * one command to run and reproduce
    * fixture capturing inputs and expected outputs

#### system topology

* facts
    * components and roles, routing maps, interceptors
    * sync vs async edges with latency and capacity notes
    * singleton vs replicated elements, blast radii
* outputs
    * minimal path diagram from entry to effect
    * list of mutators on the path

#### data and schema

* facts
    * schemas, relations, indexes, migrations
    * identifier names across routes, dtos, configs, storage, i18n
    * serialization rules and compatibility strategy
    * localization bundles and fallbacks
    * retention and archival policies
* outputs
    * entity-relationship slice for the path
    * example payloads before and after transformation

#### behavior and contracts

* facts
    * authentication, authorization, permission rules
    * input schemas, normalization, server checks
    * preconditions, postconditions, concurrency model
    * isolation levels, timing assumptions, clock usage
    * randomness sources and seed handling
    * feature flags and rollout rules
* outputs
    * contract checklist for the path
    * toggles affecting target behavior

#### operations and policy

* facts
    * trace correlation across logs and metrics
    * error dictionary: codes, messages, status, owners
    * dashboards, alerts, slos, paging rules
    * compliance and audit scope
* outputs
    * error-to-owner lookup for target codes
    * observable signals to validate a fix

#### evolution and history

* facts
    * defect-prone files and co-change clusters
    * property tests and formal specs
    * end-of-life items, upgrade plans, release cadence
    * concept alignment: canonical terms vs aliases
* outputs
    * risk map of files on the path
    * minimal test set to guard the change

#### socio-technical context

* facts
    * maintainers, escalation paths
    * architectural decision records, design docs, runbooks
    * known socio-technical incongruences
* outputs
    * stakeholder list for review
    * knowledge and documentation gaps

#### navigation aids

* techniques
    * keyword search in source
    * structural search with symbol indexers or language servers
    * filesystem mapping of layout, naming, code-to-feature mapping
    * configuration mapping from descriptors to routes, entities, resources
    * call path tracing through handlers and modules
    * artifact linking from observed outputs to source locations
* outputs
    * index of relevant files and configs
    * cross-links between observed behavior and code locations

#### fact extraction

* categories
    * source symbols: functions, classes, variables, definitions, references
    * dependencies: package manifests, imports, version constraints
    * configuration descriptors in yaml, json, xml, ini
    * routing maps from paths to handlers
    * database schemas, fields, indexes, migrations
    * build and packaging artifacts
    * filesystem layout, file types, churn metrics
    * version control history, churn hotspots, author mapping
    * localization tables and string bundles
    * template and asset inclusion trees
* outputs
    * structured fact tables
    * basis for system topology, data schema, and behavioral contracts

#### fact collection techniques

* techniques
    * static analysis of source and configs
    * dynamic analysis with execution traces
    * symbol indexing via language servers
    * version history mining for churn and co-change
    * configuration parsing from manifests and descriptors
* outputs
    * reproducible fact datasets
    * links between fact categories across models

### exploration and debugging techniques

#### core questions

* can the problem be reproduced deterministically
* what data can be observed at the failure point without altering behavior
* what is the smallest input or path that still triggers the failure
* where in the codebase is the failure first observable
* what changes in history correlate with the failure
* what invariants or contracts are violated
* how can the problem be replayed or simulated to confirm a fix

#### replicate - observe - deduce

* replicate: deterministic reproduction as precondition
* observe: collect objective data only (logs, traces, snapshots)
* deduce: eliminate impossibilities, converge on causes
* properties: cyclic, convergent, tool-agnostic, domain-independent

#### isolation

* path narrowing by disabling or bypassing segments
* input minimization to find the smallest failing case
* binary search across versions, commits, config flags

#### tracing and instrumentation

* targeted logging on path boundaries
* request identifiers for correlation
* snapshots of state at invariant boundaries

#### history-guided search

* version bisection to locate defect introduction
* spectrum-based fault localization using coverage deltas

#### semantic debugging

* algorithmic debugging via execution trees and developer confirmations
* contract checks inserted at pre and post points

#### determinism aids

* record and replay of inputs and nondeterministic sources
* seed control for randomness
* clock stubbing for time dependencies

### patch application

#### core questions

* what is the smallest safe surface for applying a change
* how can the edit be constrained to preserve structure and contracts
* what automated aids can reduce human error in applying the patch

#### half-automated edits

* ast patching for structure-preserving changes
* function-level replacement where isolation is clear
* llm-assisted edits constrained by anchors to avoid offset drift

#### safeguards

* diff validation against contracts and invariants
* regeneration of fixtures and replay traces

### verification

#### core questions

* how can the fix be confirmed against the original failure
* what additional regressions must be guarded against
* what signals prove correctness under operational conditions

#### methods

* targeted tests aligned to the minimal path
* negative tests for observed failures
* operational checks on live signals within agreed slos
* rollback and guardrail criteria defined before deploy

### pairwise-invertible triplets

* definition: triplet where any two elements determine the third
* examples:
    * program, input -> output
    * state0, transition -> state1
    * schema, instance -> validation result
    * contract, behavior -> verdict
* tactics:
    * identify two controllable elements and compute the third
    * invert to synthesize inputs or states that expose discrepancies
    * compare computed vs observed third to locate violations

### fundamental concepts

* side effects: external state transitions
* mutability: bounds and ownership of changeable data
* evaluation order: explicit sequencing where it matters
* coupling: degree of interdependency between modules
* referential transparency: outputs depend only on inputs
* typing: tradeoffs between detection and flexibility
* variables: pointers to state, not abstract values
* idempotency: retry semantics and keys on mutating paths

### outputs and artifacts

* single-command reproduction with fixture inputs
* path diagram and mutator list
* contract checklist and toggle map
* error dictionary slice with owners
* minimal targeted test set
* risk map and review stakeholders
* patch diff with invariant proofs and replay logs

### scope boundaries

* no technology-specific prescriptions
* primitives compose across languages, runtimes, platforms
* prefer orthogonal facts and reversible transformations
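
The binary search named under isolation and history-guided search can be sketched in a few lines. This is a minimal illustration, not a prescription: Python is used only for concreteness, and `first_failing`, the commit labels, and the `fails` probe are all hypothetical names. The sketch assumes the first candidate passes, the last one fails, and the behavior switches from passing to failing exactly once.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_failing(candidates: Sequence[T], fails: Callable[[T], bool]) -> T:
    """Return the earliest candidate for which `fails` is true.

    Assumption: candidates are ordered, the first passes, the last fails,
    and every candidate after the first failing one also fails.
    """
    if fails(candidates[0]) or not fails(candidates[-1]):
        raise ValueError("need a passing first and a failing last candidate")
    lo, hi = 0, len(candidates) - 1  # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(candidates[mid]):
            hi = mid  # defect is at mid or earlier
        else:
            lo = mid  # defect was introduced after mid
    return candidates[hi]


# Hypothetical history: the defect is introduced in "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
probes: List[str] = []


def fails(commit: str) -> bool:
    """Stand-in for 'check out this commit and run the reproduction'."""
    probes.append(commit)
    return commits.index(commit) >= commits.index("c5")


culprit = first_failing(commits, fails)
print(culprit, len(probes))  # prints: c5 5
```

The same function applies to any ordered candidate list, such as release versions or an ordered set of config flags, provided the reproduction is deterministic; a flaky probe breaks the pass-to-fail assumption the search depends on.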
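
The determinism aids listed earlier (seed control and clock stubbing) usually come down to passing the nondeterministic sources in as parameters. The sketch below is one possible shape, assuming a hypothetical function `make_token` that depends on both the time and a random number; the names and the token format are illustrative only, and Python stands in for whatever language the codebase uses.

```python
import random
import time
from typing import Callable


def make_token(rng: random.Random, now: Callable[[], float]) -> str:
    """Hypothetical code under test: output depends on time and randomness.

    Both sources are injected, so a caller can pin them down.
    """
    return f"{int(now())}-{rng.randrange(10**6):06d}"


# Normal wiring: real clock and an unseeded generator, so output varies.
live = make_token(random.Random(), time.time)


def frozen_clock() -> float:
    """Stubbed clock that always reports the same instant."""
    return 1_700_000_000.0


# Reproduction wiring: fixed seed plus stubbed clock gives a stable result.
first = make_token(random.Random(42), frozen_clock)
second = make_token(random.Random(42), frozen_clock)
print(first == second)  # prints: True
```

Recording the seed and the stubbed instant alongside the failing input gives a replayable fixture: rerunning with the same three values should reproduce the same output.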