# applied software engineering

## maintenance
* objective: reach productive change velocity in an unfamiliar codebase with minimal risk
* scope: analysis, debugging, modification, verification
* constraint: technology-agnostic

### workflow overview
* establish reproducibility: deterministic setup, stable failure reproduction
* build a minimal working model: environment, system topology, data, behavior
* locate the smallest change surface: isolate path, component, input
* apply a safe patch: half-automated where feasible
* verify: targeted tests, invariants, operational signals
* document deltas: facts, assumptions, open questions

### codebase analysis

#### core questions
* how do you execute and reproduce the system locally
* what inputs reproduce the observed bug or feature path
* what are the main entry points (apis, commands, uis)
* what is the routing map from entry point to handler
* which components mutate shared state
* what persistent state is involved (tables, files, indexes)
* what identifiers map across code, database, user interface
* what key invariants or contracts must hold
* what are the main sources of nondeterminism (time, randomness, concurrency)
* what feature toggles change behavior
* how are errors surfaced (logs, codes)
* what metrics or dashboards are monitored for this path
* what are the most churned or defect-prone files
* what test coverage exists for this path
* who are the maintainers or owners of this code
* what past fixes or designs are documented
* what is the directory structure and naming convention
* how do artifacts such as logs, errors, or ui strings map back to code
* what types of structured facts can be derived automatically from this codebase
* what automated methods exist to obtain those facts efficiently

#### environment and reproducibility
* facts
  * execution commands, environment, ports, minimal request trace, seed strategy
  * dependencies: build, test, runtime
  * configuration sources and precedence
  * resource limits and variability sources
  * platform constraints: os, architecture, runtime, toolchain
* outputs
  * one command to run and reproduce
  * fixture capturing inputs and expected outputs
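
a minimal python sketch of the fixture output, assuming a hypothetical `apply_discount` entry point and an inline json fixture. python is used only for illustration; the shape (captured input, expected output, replay, compare) carries over to any stack.

```python
import json
from typing import Any, Callable

# hypothetical fixture: one captured input and the output expected for it
FIXTURE = json.loads("""
{
  "name": "discount_rounding",
  "input": {"price_cents": 1999, "discount_pct": 15},
  "expected": {"total_cents": 1699}
}
""")


def apply_discount(price_cents: int, discount_pct: int) -> dict:
    """hypothetical entry point standing in for the real system under test"""
    return {"total_cents": price_cents * (100 - discount_pct) // 100}


def replay(fixture: dict, handler: Callable[..., Any]) -> tuple:
    """run the handler on the captured input and compare with the expectation"""
    actual = handler(**fixture["input"])
    return actual == fixture["expected"], actual
```

wrapping `replay` in a script that exits nonzero on a mismatch yields the single reproduce command.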

#### system topology
* facts
  * components and roles, routing maps, interceptors
  * sync vs async edges with latency and capacity notes
  * singleton vs replicated elements, blast radii
* outputs
  * minimal path diagram from entry to effect
  * list of mutators on the path
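
the path diagram and mutator list can be derived mechanically once call edges are known. a sketch, assuming a hypothetical call graph `CALLS` and a hand-annotated `MUTATORS` set: breadth-first search gives the shortest entry-to-effect chain, and filtering that chain yields the mutators on the path.

```python
from collections import deque

# hypothetical call edges (caller -> callees), e.g. from routing maps and call tracing
CALLS = {
    "POST /orders": ["auth_middleware"],
    "auth_middleware": ["create_order"],
    "create_order": ["validate_order", "price_order", "save_order"],
    "price_order": ["load_rates"],
    "save_order": ["orders_table"],
}
# hypothetical annotation: nodes that mutate shared or persistent state
MUTATORS = {"save_order", "orders_table", "rates_cache"}


def shortest_path(graph, start, goal):
    """breadth-first search; returns the shortest start -> goal chain, or None"""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


path = shortest_path(CALLS, "POST /orders", "orders_table")
mutators_on_path = [node for node in path if node in MUTATORS]
```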

#### data and schema
* facts
  * schemas, relations, indexes, migrations
  * identifier names across routes, dtos, configs, storage, i18n
  * serialization rules and compatibility strategy
  * localization bundles and fallbacks
  * retention and archival policies
* outputs
  * entity-relationship slice for the path
  * example payloads before and after transformation

#### behavior and contracts
* facts
  * authentication, authorization, permission rules
  * input schemas, normalization, server checks
  * preconditions, postconditions, concurrency model
  * isolation levels, timing assumptions, clock usage
  * randomness sources and seed handling
  * feature flags and rollout rules
* outputs
  * contract checklist for the path
  * toggles affecting target behavior

#### operations and policy
* facts
  * trace correlation across logs and metrics
  * error dictionary: codes, messages, status, owners
  * dashboards, alerts, slos, paging rules
  * compliance and audit scope
* outputs
  * error-to-owner lookup for target codes
  * observable signals to validate a fix

#### evolution and history
* facts
  * defect-prone files and co-change clusters
  * property tests and formal specs
  * end-of-life items, upgrade plans, release cadence
  * concept alignment: canonical terms vs aliases
* outputs
  * risk map of files on the path
  * minimal test set to guard the change
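
churn and co-change can be computed from commit file sets alone. a sketch, assuming the hypothetical `COMMITS` list was already parsed from version control output such as `git log --name-only`; intersecting the top results with the files on the target path gives a first risk map.

```python
from collections import Counter
from itertools import combinations

# hypothetical history: each commit reduced to the set of files it touched
COMMITS = [
    {"orders/api.py", "orders/model.py"},
    {"orders/api.py", "orders/model.py", "tests/test_orders.py"},
    {"billing/rates.py"},
    {"orders/model.py", "billing/rates.py"},
    {"orders/api.py", "orders/model.py"},
]


def churn(commits):
    """number of commits touching each file"""
    return Counter(f for files in commits for f in files)


def co_change(commits, min_support=2):
    """file pairs changed together in at least min_support commits"""
    pairs = Counter(
        pair for files in commits for pair in combinations(sorted(files), 2)
    )
    return {pair: n for pair, n in pairs.items() if n >= min_support}
```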

#### socio-technical context
* facts
  * maintainers, escalation paths
  * architectural decision records, design docs, runbooks
  * known socio-technical incongruences
* outputs
  * stakeholder list for review
  * knowledge and documentation gaps

#### navigation aids
* techniques
  * keyword search in source
  * structural search with symbol indexers or language servers
  * filesystem mapping: layout, naming conventions, code-to-feature correspondence
  * configuration mapping from descriptors to routes, entities, resources
  * call path tracing through handlers and modules
  * artifact linking from observed outputs to source locations
* outputs
  * index of relevant files and configs
  * cross-links between observed behavior and code locations
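
artifact linking often reduces to searching for the literal part of an observed string. a sketch, assuming source templates contain the message text verbatim with placeholders for dynamic values; `link_artifact` and its suffix list are hypothetical.

```python
import re
from pathlib import Path


def link_artifact(message, root, suffixes=(".py", ".js", ".java", ".go")):
    """map an observed log, error, or ui string to candidate source locations.

    numbers and quoted values in the observed message are assumed to be runtime
    values absent from the source template, so the longest remaining literal
    fragment is what gets searched.
    """
    root = Path(root)
    fragments = re.split(r"\d+|'[^']*'|\"[^\"]*\"", message)
    needle = max((f.strip() for f in fragments), key=len)
    if not needle:
        return needle, []  # nothing literal left to search for
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((path.relative_to(root).as_posix(), lineno))
    return needle, hits
```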

#### fact extraction
* categories
  * source symbols: functions, classes, variables, definitions, references
  * dependencies: package manifests, imports, version constraints
  * configuration descriptors in yaml, json, xml, ini
  * routing maps from paths to handlers
  * database schemas, fields, indexes, migrations
  * build and packaging artifacts
  * filesystem layout and file types
  * version control history, churn hotspots, author mapping
  * localization tables and string bundles
  * template and asset inclusion trees
* outputs
  * structured fact tables
  * basis for system topology, data schema, and behavioral contracts
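
for source symbols, a parser already exposes the facts. a python-specific sketch using the standard `ast` module, emitting `(kind, name, line)` rows; other languages need their own parser or an indexer, and the tuple layout is an assumption.

```python
import ast


def extract_facts(source, filename="<memory>"):
    """derive (kind, name, line) fact rows from python source"""
    facts = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            facts.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            facts.append(("class", node.name, node.lineno))
        elif isinstance(node, ast.Import):
            facts.extend(("import", alias.name, node.lineno) for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # relative imports carry a level and may have no module name
            module = "." * node.level + (node.module or "")
            facts.append(("import", module, node.lineno))
    # ast.walk is breadth-first; sort by line for a stable table
    return sorted(facts, key=lambda fact: (fact[2], fact[0], fact[1]))
```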

#### fact collection techniques
* techniques
  * static analysis of source and configs
  * dynamic analysis with execution traces
  * symbol indexing via language servers
  * version history mining for churn and co-change
  * configuration parsing from manifests and descriptors
* outputs
  * reproducible fact datasets
  * links between fact categories across models
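
dynamic analysis complements static facts with what actually ran. a sketch using python's `sys.settrace` hook to count entered functions; `parse` and `total` are hypothetical targets, and the hook slows execution considerably, so it suits a reproduction run rather than production.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """run func and count which python functions were entered"""
    calls = Counter()

    def tracer(frame, event, arg):
        if event == "call":
            calls[frame.f_code.co_name] += 1
        return None  # no per-line tracing inside the entered frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any debugger or coverage tracer
    return result, calls


def parse(raw):
    return int(raw)  # builtins such as int are not reported by settrace


def total(items):
    acc = 0
    for raw in items:
        acc += parse(raw)
    return acc
```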

### exploration and debugging techniques

#### core questions
* can the problem be reproduced deterministically
* what data can be observed at the failure point without altering behavior
* what is the smallest input or path that still triggers the failure
* where in the codebase is the failure first observable
* what changes in history correlate with the failure
* what invariants or contracts are violated
* how can the problem be replayed or simulated to confirm a fix

#### replicate - observe - deduce
* replicate: deterministic reproduction as precondition
* observe: collect objective data only (logs, traces, snapshots)
* deduce: eliminate impossibilities, converge on causes
* properties: cyclic, convergent, tool-agnostic, domain-independent

#### isolation
* path narrowing by disabling or bypassing segments
* input minimization to find the smallest failing case
* binary search across versions, commits, config flags
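
input minimization can be automated with delta debugging. a simplified ddmin sketch (complement removal only, as in common textbook variants), assuming a `fails` predicate that reruns the reproduction on a candidate input and returns true while the failure persists.

```python
def ddmin(failing_input, fails):
    """simplified delta debugging: shrink a failing input to a 1-minimal one.

    fails(candidate) returns True while the failure persists. only complements
    are tested; full ddmin also tests the subsets themselves.
    """
    assert fails(failing_input), "start from an input that actually fails"
    current = failing_input
    n = 2  # number of chunks the input is split into
    while len(current) >= 2:
        chunk = len(current) // n
        reduced = False
        for start in range(0, len(current), chunk):
            complement = current[:start] + current[start + chunk:]
            if fails(complement):
                current = complement
                n = max(n - 1, 2)  # keeps the chunk size roughly the same
                reduced = True
                break
        if not reduced:
            if n == len(current):
                break  # every single element is needed: 1-minimal
            n = min(n * 2, len(current))
    return current
```

the result is 1-minimal: removing any single remaining element makes the failure disappear. it works on lists and strings alike, since only slicing and concatenation are used.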

#### tracing and instrumentation
* targeted logging on path boundaries
* request identifiers for correlation
* snapshots of state at invariant boundaries
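
request identifiers can be threaded through logs without changing call signatures. a sketch using python's `contextvars` and a `logging.Filter`; the logger name, the payload shape, and the boundary messages are hypothetical.

```python
import contextvars
import logging
import uuid

# current request id; "-" outside any request
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """copy the current request id onto every record so log lines can be joined"""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def configure_logging(stream=None):
    logger = logging.getLogger("svc.orders")  # hypothetical logger name
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, payload):
    # reuse an upstream id when present, otherwise mint one at the entry point
    token = request_id.set(payload.get("request_id") or uuid.uuid4().hex)
    try:
        logger.info("enter create_order")  # hypothetical path boundary
        logger.info("exit create_order")
    finally:
        request_id.reset(token)
```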

#### history-guided search
* version bisection to locate defect introduction
* spectrum-based fault localization: rank code elements by how often failing vs passing tests cover them
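
both history-guided techniques are short algorithms. a sketch, assuming an ordered version list whose first entry is known good and last is known bad (the same precondition `git bisect` relies on), and per-test coverage sets for the spectrum step; ochiai is one of several published suspiciousness formulas, chosen here for illustration.

```python
import math


def first_bad(versions, is_bad):
    """binary search for the first bad version.

    assumes versions[0] is good, versions[-1] is bad, and the defect persists
    once introduced, so is_bad is monotone over the ordered list.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


def ochiai(coverage, outcomes):
    """rank covered elements by ochiai suspiciousness.

    coverage: test name -> set of covered elements (lines, functions, ...)
    outcomes: test name -> True if the test passed
    score(e) = failed(e) / sqrt(total_failed * (failed(e) + passed(e)))
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    scores = {}
    for element in set().union(*coverage.values()):
        covering = [test for test, covered in coverage.items() if element in covered]
        failed_e = sum(1 for test in covering if not outcomes[test])
        passed_e = len(covering) - failed_e
        denom = math.sqrt(total_failed * (failed_e + passed_e))
        scores[element] = failed_e / denom if denom else 0.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```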

#### semantic debugging
* algorithmic debugging: walk the execution tree while the developer confirms or rejects each intermediate result
* contract checks inserted at precondition and postcondition points
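
contract checks can be attached without editing function bodies. a sketch of a decorator with hypothetical `pre`/`post` predicates; `withdraw` is a deliberately buggy example whose missing overdraft check the postcondition catches.

```python
import functools


class ContractViolation(AssertionError):
    pass


def contract(pre=None, post=None):
    """wrap a function with precondition and postcondition checks"""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ContractViolation(f"precondition failed: {func.__name__}{args}")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise ContractViolation(f"postcondition failed: {func.__name__} -> {result!r}")
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda balance, amount: amount > 0,
    post=lambda result, balance, amount: result == balance - amount and result >= 0,
)
def withdraw(balance, amount):
    # deliberately buggy: no overdraft check, so the postcondition catches it
    return balance - amount
```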

#### determinism aids
* record and replay of inputs and nondeterministic sources
* seed control for randomness
* clock stubbing for time dependencies
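
the three aids share one move: inject time and randomness instead of reading them globally. a sketch, assuming a hypothetical `make_session_token` under test; a seeded `random.Random` controls randomness, a fixed function stubs the clock, and the hypothetical `Recorder` captures a live source once and replays it.

```python
import random
import time
from typing import Callable, List


class Recorder:
    """record values from a nondeterministic source once, then replay them"""

    def __init__(self, source: Callable[[], float]):
        self.source = source
        self.tape: List[float] = []
        self.replaying = False
        self.pos = 0

    def __call__(self) -> float:
        if self.replaying:
            value = self.tape[self.pos]
            self.pos += 1
            return value
        value = self.source()
        self.tape.append(value)
        return value

    def replay(self) -> None:
        self.replaying, self.pos = True, 0


def make_session_token(now: Callable[[], float], rng: random.Random) -> str:
    """hypothetical code under test: time and randomness are injected, not global"""
    return f"{int(now())}-{rng.randint(1000, 9999)}"


def fixed_clock() -> float:
    return 1_700_000_000.0  # clock stub


# seed control plus clock stubbing: identical inputs give identical tokens
token_a = make_session_token(fixed_clock, random.Random(42))
token_b = make_session_token(fixed_clock, random.Random(42))

# record and replay: capture the real clock once, then replay the same reading
clock = Recorder(time.time)
recorded = make_session_token(clock, random.Random(7))
clock.replay()
replayed = make_session_token(clock, random.Random(7))
```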

### patch application

#### core questions
* what is the smallest safe surface for applying a change
* how can the edit be constrained to preserve structure and contracts
* what automated aids can reduce human error in applying the patch

#### half-automated edits
* ast patching for structure-preserving changes
* function-level replacement where isolation is clear
* llm-assisted edits constrained by anchors to avoid offset drift
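
anchor-constrained edits can be applied against the original line numbering if every edit is checked first and the batch is applied bottom-up. a sketch with a hypothetical `Edit` record: the anchor is the expected prefix of the first replaced line, and an anchor mismatch or an overlap aborts the batch instead of patching the wrong lines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edit:
    start: int           # first replaced line, 1-based, in the original numbering
    end: int             # last replaced line, inclusive
    anchor: str          # expected prefix of original line `start`
    replacement: List[str]


def apply_edits(lines: List[str], edits: List[Edit]) -> List[str]:
    """apply line-range edits addressed by original line numbers.

    every anchor is verified before anything changes; edits are then applied
    bottom-up, so a replacement that adds or removes lines cannot shift the
    positions of edits above it (no offset drift).
    """
    for e in edits:
        if not (1 <= e.start <= e.end <= len(lines)):
            raise ValueError(f"range {e.start}-{e.end} out of bounds")
        if not lines[e.start - 1].lstrip().startswith(e.anchor):
            raise ValueError(f"anchor mismatch at line {e.start}: {e.anchor!r}")
    ordered = sorted(edits, key=lambda e: e.start, reverse=True)
    for later, earlier in zip(ordered, ordered[1:]):
        if earlier.end >= later.start:
            raise ValueError(f"edits at lines {earlier.start} and {later.start} overlap")
    result = list(lines)
    for e in ordered:
        result[e.start - 1 : e.end] = e.replacement
    return result
```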

#### safeguards
* diff validation against contracts and invariants
* replay of recorded fixtures and traces against the patched code; regenerate them only where behavior changes by intent

### verification

#### core questions
* how can the fix be confirmed against the original failure
* what additional regressions must be guarded against
* what signals prove correctness under operational conditions

#### methods
* targeted tests aligned to the minimal path
* negative tests for observed failures
* operational checks on live signals within agreed slos
* rollback and guardrail criteria defined before deploy
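
guardrail criteria are easiest to honor when written down as data before deployment. a sketch with hypothetical thresholds, comparing the post-deploy error rate to an absolute ceiling and to the pre-deploy baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    # criteria agreed before deployment; all values are hypothetical
    max_error_rate: float         # absolute ceiling, e.g. derived from the slo
    max_relative_increase: float  # tolerated increase over the pre-deploy baseline
    min_requests: int             # below this, the sample is too small to judge


def decide(g: Guardrail, baseline_error_rate: float, requests: int, errors: int) -> str:
    """return "wait", "rollback", or "proceed" for the observed window"""
    if requests < g.min_requests:
        return "wait"
    rate = errors / requests
    if rate > g.max_error_rate or rate > baseline_error_rate * (1 + g.max_relative_increase):
        return "rollback"
    return "proceed"
```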

### pairwise-invertible triplets
* definition: triplet in which fixing any two elements determines, or at least constrains, the third (forward by computation, backward by search or synthesis)
* examples:
  * program, input -> output
  * state0, transition -> state1
  * schema, instance -> validation result
  * contract, behavior -> verdict
* tactics:
  * identify two controllable elements and compute the third
  * invert to synthesize inputs or states that expose discrepancies
  * compare computed vs observed third to locate violations
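
the tactics can be exercised on the program, input, output triplet. a sketch, assuming a hypothetical reference `spec` and a buggy `implementation`; inversion here is exhaustive search over a small bounded domain, whereas larger domains would need a constraint solver or a property-based generator.

```python
from typing import Callable, Iterable, List, Tuple


def spec(n: int) -> int:
    # hypothetical reference behavior: pages needed for n items at 10 per page
    return -(-n // 10)  # ceiling division


def implementation(n: int) -> int:
    # hypothetical code under investigation (deliberately buggy)
    return n // 10 + 1


def invert(program: Callable[[int], int], target: int, domain: Iterable[int]) -> List[int]:
    """program + output -> inputs: enumerate inputs that produce the target output"""
    return [x for x in domain if program(x) == target]


def discrepancies(domain: Iterable[int]) -> List[Tuple[int, int, int]]:
    """compare the computed third (spec output) with the observed third"""
    return [(x, spec(x), implementation(x)) for x in domain if spec(x) != implementation(x)]
```

inverting both programs for the same output shows the disagreement directly: `spec` maps inputs 11 to 20 onto 2 pages, while `implementation` maps 10 to 19.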

### fundamental concepts
* side effects: state changes observable outside an operation's return value
* mutability: bounds and ownership of changeable data
* evaluation order: explicit sequencing where it matters
* coupling: degree of interdependency between modules
* referential transparency: an expression can be replaced by its value; outputs depend only on inputs
* typing: tradeoff between early error detection and flexibility
* variables: names for mutable storage locations, not abstract mathematical values
* idempotency: safe retry semantics on mutating paths, typically via idempotency keys
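
idempotency keys turn retries of a mutating call into lookups. a sketch with an in-memory dictionary standing in for a durable store; `charge` and `ledger` are hypothetical, and holding the lock across the operation serializes all calls, which a real store would replace with per-key locking or a unique constraint.

```python
import threading


class IdempotentHandler:
    """make a mutating operation safe to retry under an idempotency key"""

    def __init__(self, operation):
        self.operation = operation
        self.results = {}  # key -> (request, result); a durable store in practice
        self.lock = threading.Lock()

    def handle(self, key, request):
        with self.lock:
            if key in self.results:
                stored_request, result = self.results[key]
                if stored_request != request:
                    # same key, different payload: a client bug, not a retry
                    raise ValueError(f"idempotency key {key!r} reused with a different request")
                return result  # retry: return the original result, no second mutation
            result = self.operation(request)
            self.results[key] = (request, result)
            return result


ledger = []  # hypothetical side-effect target


def charge(request):
    """hypothetical mutating operation"""
    ledger.append(request["amount"])
    return {"charge_id": len(ledger), "amount": request["amount"]}
```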

### outputs and artifacts
* single-command reproduction with fixture inputs
* path diagram and mutator list
* contract checklist and toggle map
* error dictionary slice with owners
* minimal targeted test set
* risk map and review stakeholders
* patch diff with invariant proofs and replay logs

### scope boundaries
* no technology-specific prescriptions
* primitives compose across languages, runtimes, platforms
* prefer orthogonal facts and reversible transformations