2025-08-02

basics for learning chinese

phonology

  • vowels: a o e i u ü
  • a syllable consists of an optional initial sound (声母), a final sound (韵母), and a tone

    • initials: b c ch d f g h j k l m n p q r s sh t x z zh

    • finals: a ai an ang ao e ei en eng i ia ian iang iao ie in ing iong iu o ong ou u ü ua uai uan üan uang üe ueng ui un ün uo
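A minimal sketch of splitting a numbered pinyin syllable into initial, final and tone. It uses the 21 initials listed above and tries the two-letter initials first. `splitSyllable` is a hypothetical name. Syllables spelled with y or w (ya, wo) are returned with the y/w left in the final, because undoing those spelling conventions is out of scope here.

```javascript
// The 21 initials listed above; two-letter initials come first so that
// "zhang" splits as zh + ang rather than z + hang.
const PINYIN_INITIALS = ["zh", "ch", "sh", "b", "c", "d", "f", "g", "h", "j", "k",
  "l", "m", "n", "p", "q", "r", "s", "t", "x", "z"];

// Splits a numbered pinyin syllable such as "zhang1" into its parts.
// Assumptions: lowercase input; a missing tone number means the neutral
// tone and is returned as 0 (a written 5 is kept as 5).
function splitSyllable(syllable) {
  const match = /^([a-zü]+)([0-5]?)$/.exec(syllable);
  if (!match) throw new Error(`not a numbered pinyin syllable: ${syllable}`);
  const letters = match[1];
  const initial = PINYIN_INITIALS.find((i) => letters.startsWith(i)) ?? "";
  return {
    initial,
    final: letters.slice(initial.length),
    tone: Number(match[2] || 0),
  };
}
```

For example, `splitSyllable("zhang1")` returns `{ initial: "zh", final: "ang", tone: 1 }`, and `splitSyllable("an4")` returns an empty initial.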

tones

  • tones can be described as pitch movements across five levels that divide the speaker's comfortable voice range, from 1 (lowest) to 5 (highest)
  • four tones plus the neutral tone

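    • 1: high flat, 5, ā
    • 2: rising, 2-5, á
    • 3: low dipping, 2-1-3; often only 2-1 (half third tone) before other tones, ǎ
    • 4: falling, 5-2, à
    • 0: neutral, short and light, without a pitch movement of its own, a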
  • the pitch movement starts at the vowel and ends at the end of the syllable, so each syllable of a word carries its own tone
  • phrase combining all four tones in order: 三十九岁 sānshíjiǔ suì (39 years old)
  • tone changes (tone sandhi) not reflected in pinyin

    • a third tone followed by another third tone is pronounced as a second tone: 33 -> 23. in longer runs such as 333, the result depends on how the words group, typically 223 or 323

      • 很好看 23_, 老板娘 23_, 展览馆 223, 你很努力 323_ (_ marks a syllable that is not third tone)
    • 不 bu4 is pronounced as second tone before a fourth tone: 44 -> 24, e.g. 不是 bu2shi4

    • 一 yi1 keeps its first tone when it stands alone, ends a word, or is an ordinal (第一). otherwise it becomes second tone before a fourth tone, like 不, and fourth tone before the other tones: 13/12/11 -> 43/42/41, 14 -> 24
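The tone changes listed above are mechanical enough to sketch as a function over a list of syllables. This is a simplified sketch, not a complete model: `applyToneSandhi` is a hypothetical name, it ignores word grouping (so a run of third tones always comes out as 2...23, the 223 reading of 333), and it does not detect 一 used as a number or ordinal in the middle of a phrase.

```javascript
// Applies the tone changes listed above to syllables given as
// { char, tone } objects, where tone is 1-4 or 0 for the neutral tone.
// Simplifications (assumptions of this sketch):
//   - word grouping is ignored, so a run of third tones always becomes
//     2...23 (333 -> 223); the 323 reading needs phrase structure
//   - 一 used as a number or ordinal (第一天) is not detected
function applyToneSandhi(syllables) {
  // Copy the input; the rules below always read the original tones.
  const result = syllables.map((s) => ({ ...s }));
  for (let i = 0; i + 1 < syllables.length; i++) {
    const current = syllables[i];
    const nextTone = syllables[i + 1].tone;
    if (current.tone === 3 && nextTone === 3) {
      result[i].tone = 2; // 33 -> 23
    } else if (current.char === "不" && nextTone === 4) {
      result[i].tone = 2; // 44 -> 24
    } else if (current.char === "一" && nextTone === 4) {
      result[i].tone = 2; // 14 -> 24
    } else if (current.char === "一" && nextTone >= 1 && nextTone <= 3) {
      result[i].tone = 4; // 11/12/13 -> 41/42/43
    }
  }
  // The last syllable (including a word-final 一) keeps its tone.
  return result;
}
```

For example, 很好看 given as tones 3, 3, 4 comes back as 2, 3, 4, and 一天 given as 1, 1 comes back as 4, 1.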

sounds

  • sound examples for all common initials, finals, and tone pairs
  • pronunciation dictionary
  • distinguish "j q x zh ch sh r z c s"
  • five sounds of e
  • pinyin to ipa: a a, ai aɪ, ao aʊ, b p, ch ʈʂʰ, c tsʰ, d t, e ɤ, ei eɪ, en ən, eng əŋ, er ɚ, f f, g k, h x, -ian jɛn, -ie jɛ, -i i, -i- j, j tɕ, k kʰ, l l, m m, -ng ŋ, n n, -ong oŋ, -ou oʊ, ou oʊ, p pʰ, q tɕʰ, r ʐ, sh ʂ, s s, t tʰ, -üan yɛn, -üe yɛ, -uo wɔ, -u u, -ü- ɥ, -u- w, -ü y, wo wɔ, wu u, w- w, x ɕ, yan jɛn, ye jɛ, yi i, y- j, yuan yɛn, yue yɛ, yu- ɥ, yu y, zh ʈʂ, z ts
  • ipa to pinyin: a a, aɪ ai, aʊ ao, ɕ x, eɪ ei, ən en, əŋ eng, ɚ er, ɤ e, f f, i -i/yi, j -i-/y-, jɛ -ie/ye, jɛn -ian/yan, k g, kʰ k, l l, m m, n n, ŋ -ng, oŋ -ong, oʊ ou/-ou, ɔ o, p b, pʰ p, s s, ʂ sh, t d, tʰ t, tɕ j, tɕʰ q, ts z, tsʰ c, ʈʂ zh, ʈʂʰ ch, u -u/wu, w -u-/w-, wɔ -uo/wo, x h, y -ü/yu, yɛ -üe/yue, yɛn -üan/yuan, ɥ -ü-/yu-, ʐ r
  • pronunciation guide - vowels
  • pronunciation guide - tones
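Because one IPA value can be spelled several ways in pinyin, the ipa-to-pinyin list is easiest to derive from the pinyin-to-ipa list instead of maintaining both by hand. A minimal sketch, assuming the forward list is kept in the same "pinyin ipa, pinyin ipa, ..." text format as above; only a few entries are copied here, and `invertTable` and `formatInverse` are hypothetical names.

```javascript
// A few entries in the same "pinyin ipa" format as the forward list above.
const PINYIN_TO_IPA = "-ian jɛn, yan jɛn, -üan yɛn, yuan yɛn, -ü y, yu y, j tɕ, zh ʈʂ";

// Parses "pinyin ipa" pairs and groups all pinyin spellings by IPA value.
// A Map keeps the IPA values in the order they first appear.
function invertTable(text) {
  const byIpa = new Map();
  for (const entry of text.split(",")) {
    const [pinyin, ipa] = entry.trim().split(/\s+/);
    if (!byIpa.has(ipa)) byIpa.set(ipa, []);
    byIpa.get(ipa).push(pinyin);
  }
  return byIpa;
}

// Formats the result in the "ipa pinyin/pinyin" style used above.
function formatInverse(byIpa) {
  return [...byIpa].map(([ipa, spellings]) => `${ipa} ${spellings.join("/")}`).join(", ");
}

console.log(formatInverse(invertTable(PINYIN_TO_IPA)));
```

This prints `jɛn -ian/yan, yɛn -üan/yuan, y -ü/yu, tɕ j, ʈʂ zh`: each IPA value appears once, with every pinyin spelling that maps to it.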

script

  • the characters are logograms built from strokes and components, each written in a square area
  • 简体字, simplified characters: official since the 1950s in the people's republic of china, later also adopted in singapore and malaysia. used in 98% of new chinese publications worldwide
  • 繁体字, traditional characters: in common use in hong kong, macau and taiwan, as well as in south korea and japan to a certain extent

    • components differing between simplified and traditional: 纟糹 讠訁 钅釒 贝貝 马馬 页頁 车車 门門 饣飠 鸟鳥 见見
  • 36 strokes in unicode: ㇀㇁㇂㇃㇄㇅㇆㇇㇈㇉㇊㇋㇌㇍㇎㇏㇐㇑㇒㇓㇔㇕㇖㇗㇘㇙㇚㇛㇜㇝㇞㇟㇠㇡㇢㇣
  • stroke order
  • gb stroke-based character order standard for character sorting
  • chinese character description languages

character memorization: recall decision tree

shared component
  same pinyin -> hit
  same syllable
    same tone -> hit
    different tone -> contrast cue
  same syllable prefix -> cue
  same syllable suffix -> cue
  similar phonetic prefix (c/z, s/sh, j/q, zh/ch, k/g, t/d) -> cue
  same tone -> weak cue
  opposite tone (2/4) -> strong contrast cue (if same syllable)
uniqueness (unique pinyin-tone combination) -> hit
same pinyin -> hit
same syllable
  same syllable prefix -> cue
  same syllable suffix -> cue
  same tone -> weak cue
similar phonetic prefix -> cue
same tone -> weak cue
opposite tone (2/4) -> contrast cue (if tone contrast is known)
visual similarity -> rough cue
same field -> weak cue
opposites -> weak cue
rarity (<3 contexts) -> ad-hoc mnemonic
  • focus priority: shared component > syllable > tone > uniqueness > similar phonetic prefix > suffix/prefix > opposite tone
  • disambiguation: always look up confused characters
  • casually practice the tone series when characters are grouped by component
  • it can be useful to study the pinyin readings that belong to only one character (even if the reading is unique only within the 4000 most common characters)
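The pinyin checks in the "shared component" branch can be sketched as a function that tests them in order and returns the first cue that applies. This is one possible reading of the tree, not a definitive one: it assumes "syllable prefix" means the initial and "syllable suffix" means the final, takes readings as numbered pinyin, and uses the hypothetical names `parseReading` and `sharedComponentCue`.

```javascript
// Similar initials named in the tree.
const SIMILAR_INITIALS = [["c", "z"], ["s", "sh"], ["j", "q"], ["zh", "ch"], ["k", "g"], ["t", "d"]];

// Splits a numbered reading such as "zhong4" into syllable, initial, final, tone.
function parseReading(reading) {
  const match = /^([a-zü]+)([0-5]?)$/.exec(reading);
  if (!match) throw new Error(`not a numbered pinyin reading: ${reading}`);
  const syllable = match[1];
  // Two-letter initials are tried first so "zh" wins over "z".
  const initial = /^(zh|ch|sh|[bcdfghjklmnpqrstxz])/.exec(syllable)?.[1] ?? "";
  return { syllable, initial, final: syllable.slice(initial.length), tone: Number(match[2] || 0) };
}

// Returns the first cue from the "shared component" branch that applies
// to two characters sharing a component, given their readings.
function sharedComponentCue(readingA, readingB) {
  const a = parseReading(readingA);
  const b = parseReading(readingB);
  if (a.syllable === b.syllable) {
    if (a.tone === b.tone) return "hit";
    const opposite = (a.tone === 2 && b.tone === 4) || (a.tone === 4 && b.tone === 2);
    return opposite ? "strong contrast cue" : "contrast cue";
  }
  if (a.initial !== "" && a.initial === b.initial) return "cue (same initial)";
  if (a.final === b.final) return "cue (same final)";
  const similar = SIMILAR_INITIALS.some(([x, y]) =>
    (a.initial === x && b.initial === y) || (a.initial === y && b.initial === x));
  if (similar) return "cue (similar initial)";
  if (a.tone === b.tone) return "weak cue";
  return "no pinyin cue";
}
```

For example, 青 qing1 and 请 qing3 share a component and come back as a contrast cue. The non-pinyin checks (visual similarity, field, rarity) still need a human judgment.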

romanization: 汉语拼音, han4yu3 pin1yin1

  • tones are marked with numbers

    • a number for the tone, from 1 to 5, is placed after each syllable
    • easy to type
    • easy to read
    • examples: a1 a2 a3 a4 a bei3jing1
    • for the neutral/fifth tone, the number can be left out. some use 0 or 5
  • tones are alternatively marked with diacritics

    • more difficult to input on a computer
    • more difficult to read because the marks are small, especially on low-resolution displays, in small fonts, or from a distance
    • introduces the complication of where to place the diacritic. if the first vowel letter is i, u or ü and another vowel follows it, that letter is a medial and the tone mark goes on the next vowel (liù, guì, xióng). otherwise the tone mark goes on the first vowel letter (bǎo, běi, lǜ)
    • suggests that only the marked vowel carries the tone, whereas the tone applies to the whole syllable
    • more complicated to search, because a tone-insensitive search has to treat every marked variant of a vowel as equal
    • all characters with diacritics: āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü
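  • an apostrophe separates syllables where the boundary would otherwise be ambiguous: xi'an (西安, two syllables) vs. xian (先, one syllable)
  • input methods commonly use pinyin to type chinese characters. v is typed for ü, e.g. lv for lü
  • "silent" i: in chi, ci, ri, shi, si, zhi and zi the i is not read as [i] but as a short voiced continuation of the initial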
  • written u but read ü: yuan juan quan xuan, yue jue que xue, yu ju qu xu, yun jun qun xun
  • same pronunciation, spelled with y or w when the syllable has no initial and with i, u or ü after an initial: wa ua, wai uai, wan uan, wang uang, wei ui, wen un, wo uo, wu u, ya ia, yan ian, yang iang, ye ie, yi i, yin in, ying ing, yong iong, you iu, yuan üan uan, yue üe ue, yun ün un, yu ü u
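Converting numbered pinyin to diacritics, and stripping diacritics for tone-insensitive search, each take only a few lines. A minimal sketch with hypothetical function names: `numberedToDiacritic` applies the placement rule described above, treating a leading i, u or ü as a medial only when another vowel follows it (so bi, lin and lü keep the mark on their only vowel), and accepts v for ü as input methods do. `stripToneMarks` removes tone marks through Unicode canonical decomposition while keeping the ü diaeresis.

```javascript
// Tone-marked forms of each vowel, indexed by tone number minus one.
const MARKS = {
  a: ["ā", "á", "ǎ", "à"],
  e: ["ē", "é", "ě", "è"],
  i: ["ī", "í", "ǐ", "ì"],
  o: ["ō", "ó", "ǒ", "ò"],
  u: ["ū", "ú", "ǔ", "ù"],
  ü: ["ǖ", "ǘ", "ǚ", "ǜ"],
};
const VOWELS = new Set(Object.keys(MARKS));

// Converts one numbered syllable ("bei3", "lv4", "ma") to diacritic form.
function numberedToDiacritic(syllable) {
  const match = /^([a-zü:]+)([0-5]?)$/.exec(syllable);
  if (!match) throw new Error(`not a numbered pinyin syllable: ${syllable}`);
  // Input methods type ü as v (and some texts write u:).
  const letters = match[1].replace(/u:/g, "ü").replace(/v/g, "ü");
  const tone = Number(match[2] || 0);
  if (tone < 1 || tone > 4) return letters; // neutral tone: no mark
  const first = [...letters].findIndex((ch) => VOWELS.has(ch));
  if (first === -1) return letters; // no vowel letter to mark
  // A leading i, u or ü followed by another vowel is a medial:
  // the mark goes on the vowel after it.
  const medial = "iuü".includes(letters[first]) && VOWELS.has(letters[first + 1]);
  const target = medial ? first + 1 : first;
  return letters.slice(0, target) + MARKS[letters[target]][tone - 1] + letters.slice(target + 1);
}

// Removes macron, acute, caron and grave accents but keeps the ü diaeresis.
function stripToneMarks(text) {
  return text.normalize("NFD").replace(/[\u0300\u0301\u0304\u030C]/g, "").normalize("NFC");
}
```

For example, `numberedToDiacritic("bei3") + numberedToDiacritic("jing1")` gives běijīng, and `stripToneMarks("lǜ")` gives lü, so both sides of a search can be stripped before comparing.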

grammar

vocabulary

data

word classes

名词 noun, 代词 pronoun, 动词 verb, 形容词 adjective, 副词 adverb, 数词 numeral, 量词 measure word, 连词 conjunction, 介词 preposition, 助词 particle, 叹词 interjection, 拟声词 onomatopoeia

grammatical structures

主语 subject, 谓语 predicate, 宾语 object, 定语 attributive, 状语 adverbial, 补语 complement, 短语 phrase, 分句 clause, 句子 sentence


numerals

  • digits: 零 ling2 0, 一 yi1 1, 二 er4 2, 三 san1 3, 四 si4 4, 五 wu3 5, 六 liu4 6, 七 qi1 7, 八 ba1 8, 九 jiu3 9
  • months: 一月 yi1yue4 january
  • dates: 2021年10月30日 2021 nian2 10 yue4 30 ri4
  • time of day: 一点零二分 yi1dian3ling2er4fen1 1:02
  • ordinals: 第一 di4yi1 1st
  • fractions: 三分之二 san1fen1zhi1er4 2/3
  • percentages: 百分之二十五 bai3fen1zhi1er4shi2wu3 25%
  • decimals: 十六点九八 shi2liu4dian3jiu3ba1 16.98
  • negative numbers: 负一 fu4yi1 -1

pronounced zeros

  • within a number: a zero between non-zero digits is read as 零 ling2. 205 is er4bai3ling2wu3
  • consecutive zeros: only one 零 is read, however many zeros appear in a row. 1004 is yi1qian1ling2si4
  • ending zeros: zeros at the end of a number are not pronounced. 1200 is yi1qian1er4bai3
  • beginning zeros: a zero at the start of a number is typically omitted. 012 is shi2er4

patterns

_ 五 wu3 5
十 十 shi2 10
十_ 十四 shi2si4 14
_十 三十 san1shi2 30
_十_ 五十七 wu3shi2qi1 57
_百 三百 san1bai3 300
_百零_ 二百零二 er4bai3ling2er4 202
_百_十 一百二十 yi1bai3er4shi2 120
_百_十_ 一百三十五 yi1bai3san1shi2wu3 135
_千 一千 yi1qian1 1000
_千零_ 一千零一 yi1qian1ling2yi1 1001
_千_百 一千二百 yi1qian1er4bai3 1200
_千_百_十 一千二百三十 yi1qian1er4bai3san1shi2 1230
_千_百_十_ 一千二百三十五 yi1qian1er4bai3san1shi2wu3 1235
_万 一万 yi1wan4 10000
_万零_ 一万零一 yi1wan4ling2yi1 10001
_万零_十 一万零一十 yi1wan4ling2yi1shi2 10010
_万_千_百 一万二千三百 yi1wan4er4qian1san1bai3 12300
_万_千_百_十_ 一万二千三百四十五 yi1wan4er4qian1san1bai3si4shi2wu3 12345
十万 十万 shi2wan4 100000
十万零_ 十万零一 shi2wan4ling2yi1 100001
十万零_十 十万零一十 shi2wan4ling2yi1shi2 100010
十万_千 十万一千 shi2wan4yi1qian1 101000
十_万_千_百_十_ 十二万三千四百五十六 shi2er4wan4san1qian1si4bai3wu3shi2liu4 123456
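The patterns and zero rules above are regular enough to generate. A minimal sketch for whole numbers below 100,000,000 (one 万 group above the last four digits); `toChineseNumeral` and `readGroup` are hypothetical names, and like the table it writes 二 rather than 两 in every position.

```javascript
const DIGITS = "零一二三四五六七八九";
const UNITS = ["", "十", "百", "千"];

// Reads a group of up to four digits (1-9999): zeros between non-zero digits
// collapse into a single 零, leading and trailing zeros are not read.
function readGroup(group) {
  let out = "";
  let pendingZero = false;
  for (let pos = 3; pos >= 0; pos--) {
    const digit = Math.floor(group / 10 ** pos) % 10;
    if (digit === 0) {
      if (out !== "") pendingZero = true; // only zeros after a non-zero digit count
    } else {
      if (pendingZero) out += "零";
      pendingZero = false;
      out += DIGITS[digit] + UNITS[pos];
    }
  }
  return out;
}

function toChineseNumeral(n) {
  if (!Number.isInteger(n) || n < 0 || n >= 1e8) {
    throw new RangeError(`expected an integer in 0..99999999, got ${n}`);
  }
  if (n === 0) return "零";
  const wan = Math.floor(n / 10000);
  const rest = n % 10000;
  let out = "";
  if (wan > 0) {
    out += readGroup(wan) + "万";
    // A gap of one or more zeros after 万 is read as one 零 (10001, 100010).
    if (rest > 0 && rest < 1000) out += "零";
  }
  if (rest > 0) out += readGroup(rest);
  // 10-19 at the start of a number drop the 一: 十四, 十万 (but 一万零一十).
  return out.startsWith("一十") ? out.slice(1) : out;
}
```

For example, `toChineseNumeral(10010)` returns 一万零一十 and `toChineseNumeral(123456)` returns 十二万三千四百五十六. Working in four-digit groups mirrors how the readings themselves are built around 万.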

other

  • here is a formula for estimating reading difficulty: max(1, 10 * (unique_chars_length / all_possible_chars_length + median(last_10(unique_chars_frequency_indices)) / all_possible_chars_length))
  • here is a javascript regular expression for matching chinese characters: /[\u{2E80}-\u{2EFF}\u{31C0}-\u{31EF}\u{4E00}-\u{9FFF}\u{3400}-\u{4DBF}\u{20000}-\u{2A6DF}\u{2A700}-\u{2B73F}\u{2B740}-\u{2B81F}\u{2B820}-\u{2CEAF}\u{2CEB0}-\u{2EBEF}\u{30000}-\u{3134F}\u{31350}-\u{323AF}\u{2EBF0}-\u{2EE5F}]/gu
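The reading difficulty formula above can be sketched in JavaScript. Several details are assumptions of this sketch, not part of the original formula. Characters are picked out with the Unicode property escape `\p{Script=Han}` instead of the range list. `frequencyList` is a hypothetical array of characters ordered from most to least frequent. "last_10" is read as the 10 largest (rarest) frequency indices, and characters missing from the list get the index `frequencyList.length`.

```javascript
// Median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// max(1, 10 * (unique / total + median(last 10 frequency indices) / total))
function readingDifficulty(text, frequencyList) {
  const total = frequencyList.length;
  const index = new Map(frequencyList.map((ch, i) => [ch, i]));
  const unique = [...new Set(text.match(/\p{Script=Han}/gu) ?? [])];
  if (unique.length === 0 || total === 0) return 1;
  // Frequency index of every unique character; unknown characters count as rarest.
  const indices = unique.map((ch) => index.get(ch) ?? total).sort((x, y) => x - y);
  const lastTen = indices.slice(-10);
  return Math.max(1, 10 * (unique.length / total + median(lastTen) / total));
}
```

With a five-character list such as ["的", "一", "是", "不", "了"], the text 的一 scores 10 × (2/5 + 0.5/5) = 5. Real use would need a frequency list covering thousands of characters.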