2026-05-15

character link score

the character link score measures how strongly a character can be connected to other characters through visual containment and pinyin similarity. the central relation is the ordered overlap between the pinyin of a character and the pinyin of characters that contain it, or are contained by it. higher overlap and more numerous linked characters produce a higher score.

pinyin is always represented in tone-number format, such as yi1, fan2, and nao3, and is split into comparison units before scoring. the multi-letter groups zh, ch, and sh are treated as single units. every other letter, and the tone number, is treated as an individual unit. for example, zhong1 splits into zh, o, n, g, 1.
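the splitting rule can be sketched in a few lines. this is a minimal sketch, not a fixed implementation: the function name split_units is hypothetical, and it assumes the input is already in tone-number format.

```python
import re

# zh, ch and sh are tried first so they stay single units; any other
# character (a letter or a tone digit) becomes its own unit.
_UNIT = re.compile(r"zh|ch|sh|.")


def split_units(pinyin: str) -> list[str]:
    """Split tone-number pinyin such as 'zhong1' into comparison units."""
    return _UNIT.findall(pinyin.lower())


print(split_units("zhong1"))  # ['zh', 'o', 'n', 'g', '1']
print(split_units("shi4"))    # ['sh', 'i', '4']
print(split_units("nao3"))    # ['n', 'a', 'o', '3']
```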

the pinyin comparison is computed as a weighted longest common subsequence: the maximum total unit similarity over all order-preserving alignments, divided by the number of units in the longer of the two pinyin. gaps are allowed, aligned units need not be contiguous, and each unit can be aligned at most once. this makes the score sensitive to ordered phonetic resemblance without requiring exact equality.
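a weighted longest common subsequence of this kind can be computed with the usual dynamic-programming table. the sketch below assumes the pinyin has already been split into units; the names overlap and exact are hypothetical, and exact is only a stand-in similarity (1 for identical units, 0 otherwise) that a fuller similarity function would replace.

```python
from typing import Callable, Sequence


def exact(a: str, b: str) -> float:
    """Stand-in unit similarity: 1 for identical units, 0 otherwise."""
    return 1.0 if a == b else 0.0


def overlap(
    xs: Sequence[str],
    ys: Sequence[str],
    sim: Callable[[str, str], float] = exact,
) -> float:
    """Weighted LCS of two unit sequences, divided by the longer length."""
    if not xs or not ys:
        return 0.0
    n, m = len(xs), len(ys)
    # best[i][j] = best total similarity aligning xs[:i] with ys[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],  # leave xs[i-1] unaligned (gap)
                best[i][j - 1],  # leave ys[j-1] unaligned (gap)
                best[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]),  # align the pair
            )
    return best[n][m] / max(n, m)


print(overlap(["f", "a", "n", "2"], ["f", "a", "n", "4"]))  # 0.75
print(overlap(["m", "a", "3"], ["m", "a", "o", "3"]))       # 0.75
```

the second call shows a gap: m, a, and 3 align across the extra o, giving 3 matched units out of 4.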

unit similarity defines the contribution of each aligned pair. identical units have full similarity, with value 1. similar phonetic units have partial similarity according to a fixed similarity table. tone 2 and tone 4 have a small contrast similarity, for example 0.25, which is the same in either direction.

tone contributes through the same unit comparison. identical tones strengthen the link. tone 2 and tone 4 can function as an opposition pair, so a 2/4 or 4/2 comparison contributes a small positive similarity.
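the unit comparison described above can be sketched as a symmetric lookup. the table entries here (z/zh, c/ch, s/sh at 0.5) are invented placeholders, since the actual similarity table is not listed in this note; only the identical-unit value 1 and the 2/4 contrast of about 0.25 come from the text. the name unit_similarity is hypothetical.

```python
# placeholder entries only: the real fixed table is not given in the note.
SIMILAR_UNITS = {
    frozenset(("z", "zh")): 0.5,
    frozenset(("c", "ch")): 0.5,
    frozenset(("s", "sh")): 0.5,
}

# tone 2 vs tone 4 contrast; "about 0.25" in the text.
TONE_CONTRAST = 0.25


def unit_similarity(a: str, b: str) -> float:
    """Similarity of two comparison units, symmetric in its arguments."""
    if a == b:
        return 1.0
    pair = frozenset((a, b))  # unordered, so 2/4 and 4/2 are the same key
    if pair == frozenset(("2", "4")):
        return TONE_CONTRAST
    return SIMILAR_UNITS.get(pair, 0.0)


print(unit_similarity("2", "4"))   # 0.25
print(unit_similarity("4", "2"))   # 0.25
print(unit_similarity("zh", "z"))  # 0.5
print(unit_similarity("1", "3"))   # 0.0
```

using frozenset keys keeps the table symmetric without listing each pair twice.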

outward

outward scoring measures the productivity of a character as a component in characters that contain it. productivity is not determined by raw containment frequency. a component that appears in many containing characters but shares little pinyin overlap with them is not productive. the useful quantity is the count of containing characters with sufficient ordered pinyin overlap.

the strongest outward case is when a containing character has identical pinyin to the component, because this creates a direct visual and phonetic link with minimal mediation. where pinyin overlap is low across all containing characters, the outward score is simply low; no mnemonic structure beyond the containment itself is implied.

useful outward orderings rank components by their count of phonetically overlapping containing characters, with identical-pinyin cases weighted most heavily.
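one way to turn this ordering into a number is sketched below. it assumes the pinyin overlap between a component and each of its containing characters has already been computed; the threshold of 0.5, the weight of 2.0 for identical pinyin, the function name outward_score, and the sample values for components A, B, and C are all illustrative assumptions, not values from this note.

```python
def outward_score(overlaps, threshold=0.5, identical_weight=2.0):
    """Weighted count of containing characters with sufficient overlap.

    `overlaps` holds one pinyin overlap value (0..1) per containing character.
    """
    score = 0.0
    for value in overlaps:
        if value >= 1.0:
            score += identical_weight  # identical pinyin counts most
        elif value >= threshold:
            score += 1.0               # sufficient ordered overlap
        # below threshold: containment alone adds nothing
    return score


# made-up overlap values, one list per component
components = {
    "A": [1.0, 0.75, 0.2],            # one identical, one overlapping
    "B": [0.1, 0.2, 0.0, 0.25, 0.1],  # frequent but phonetically unrelated
    "C": [0.75, 0.5],                 # two overlapping
}

ranking = sorted(components, key=lambda c: outward_score(components[c]), reverse=True)
print(ranking)  # ['A', 'C', 'B']
```

component B appears in the most containing characters but ranks last, which is the distinction between raw containment frequency and productivity.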

inward

inward scoring measures how well a character can be explained by the components inside it. components are considered recursively: both direct components and the components of those components contribute to the inward score. a character receives a strong inward score when one or more of its components, at any depth, have pinyin that overlaps with the character's own pinyin. the strongest case is an internal component with identical pinyin, since it gives a direct visual link from the target character back to a known component.
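the recursive component lookup can be sketched as follows. the names all_components, inward_score, PARTS, and PINYIN are hypothetical, taking the best overlap over all components is an assumption about how the evidence is combined, and same_pinyin is only a stand-in for the full pinyin comparison.

```python
def all_components(char, parts, seen=None):
    """Return every component of `char` at any depth."""
    seen = set() if seen is None else seen
    for part in parts.get(char, ()):
        if part not in seen:  # also guards against cyclic decomposition data
            seen.add(part)
            all_components(part, parts, seen)
    return seen


def inward_score(char, parts, pinyin, overlap):
    """Best pinyin overlap between `char` and any component inside it."""
    target = pinyin[char]
    scores = [
        overlap(target, pinyin[c])
        for c in all_components(char, parts)
        if c in pinyin  # skip components with no standalone reading
    ]
    return max(scores, default=0.0)


def same_pinyin(a, b):
    """Stand-in comparison: 1 for identical pinyin, 0 otherwise."""
    return 1.0 if a == b else 0.0


# small illustrative decomposition: 湖 = 氵 + 胡, 胡 = 古 + 月
PARTS = {"湖": ["氵", "胡"], "胡": ["古", "月"]}
PINYIN = {"湖": "hu2", "胡": "hu2", "古": "gu3", "月": "yue4"}

print(sorted(all_components("湖", PARTS)))
print(inward_score("湖", PARTS, PINYIN, same_pinyin))  # 1.0
```

here 胡 is an internal component with identical pinyin, so 湖 gets the strongest inward case.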

inward links are recall-oriented. they estimate whether the internal structure of a character contains enough phonetic evidence to cue its pronunciation. ordered pinyin overlap, similar phonetic units, tone agreement, and useful tone contrast all contribute according to their strength.

characters with direct visual links are especially tractable in this direction, because they can be memorized through a visible internal anchor rather than through an external mnemonic.

recall decision tree

shared component
  same pinyin -> hit
  ordered pinyin overlap -> cue
  similar phonetic unit -> cue
  same tone -> weak cue
  opposite tone (2/4) -> contrast cue
uniqueness (unique pinyin-tone combination) -> hit
same pinyin -> hit
ordered pinyin overlap -> cue
similar phonetic unit -> cue
same tone -> weak cue
opposite tone (2/4) -> contrast cue
visual similarity -> rough cue
same field -> weak cue
opposites -> weak cue
rarity (<3 contexts) -> ad-hoc mnemonic
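one possible reading of the tree is an ordered rule list where the first matching rule decides the outcome. the sketch below assumes that reading, assumes the features of a link are supplied as a set of labels, and uses the hypothetical name recall_outcome; how each feature is detected is left out.

```python
# phonetic rules, used both under "shared component" and on their own
PHONETIC = [
    ("same pinyin", "hit"),
    ("ordered pinyin overlap", "cue"),
    ("similar phonetic unit", "cue"),
    ("same tone", "weak cue"),
    ("opposite tone (2/4)", "contrast cue"),
]

# non-phonetic rules, tried after the phonetic ones
OTHER = [
    ("visual similarity", "rough cue"),
    ("same field", "weak cue"),
    ("opposites", "weak cue"),
]


def recall_outcome(features, contexts):
    """Return (rule, outcome) for the first rule that matches, else None.

    Assumption: rules are tried top to bottom and the first match wins.
    """
    if "shared component" in features:
        for name, outcome in PHONETIC:
            if name in features:
                return f"shared component / {name}", outcome
    if "uniqueness" in features:
        return "uniqueness", "hit"
    for name, outcome in PHONETIC + OTHER:
        if name in features:
            return name, outcome
    if contexts < 3:
        return "rarity", "ad-hoc mnemonic"
    return None


print(recall_outcome({"shared component", "same tone"}, contexts=10))
print(recall_outcome({"visual similarity"}, contexts=10))
print(recall_outcome(set(), contexts=2))
```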