
learning chinese



  • vowels: a o e i u ü
  • a syllable consists of an optional initial sound (声母), a final sound (韵母), and a tone

    • initials: b c ch d f g h j k l m n p q r s sh t x z zh

    • finals: a ai an ang ao e ei en eng er i ia ian iang iao ie in ing iong iu o ong ou u ü ua uai uan üan uang üe ueng ui un ün uo
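a minimal sketch of splitting a syllable into its initial and final. splitSyllable is a hypothetical helper: it tries the 21 initials longest first, so zh, ch and sh win over z, c and s, and rewrites a leading u as ü after j, q and x. it assumes lowercase, toneless input; syllables spelled with y- or w- come back with an empty initial and are not normalized.

```javascript
// The 21 initials, two-letter ones first so "zh" is tried before "z".
const INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
  "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"];

// Hypothetical helper: splits a lowercase, toneless pinyin syllable.
// y-/w- spellings are not normalized: their initial stays "".
function splitSyllable(syllable) {
  const initial = INITIALS.find((i) => syllable.startsWith(i)) ?? "";
  let final = syllable.slice(initial.length);
  // After j, q and x a written u stands for ü (ju = jü, xue = xüe).
  if (["j", "q", "x"].includes(initial) && final.startsWith("u")) {
    final = "ü" + final.slice(1);
  }
  return { initial, final };
}

console.log(splitSyllable("zhuang")); // { initial: 'zh', final: 'uang' }
console.log(splitSyllable("xue"));    // { initial: 'x', final: 'üe' }
```

longest-match is enough here because no single-letter initial is a prefix problem other than z/c/s versus zh/ch/sh.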


  • tones: pitch contours described on a five-level scale (1 = lowest, 5 = highest) that divides the speaker's natural, comfortable voice range
  • four tones plus a neutral tone

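    • 1 high level, 5-5: ā
    • 2 rising, 3-5: á
    • 3 low dipping, 2-1-4 in isolation, usually just low 2-1 before another syllable: ǎ
    • 4 falling, 5-1: à
    • 0 neutral: short and light, without a contour of its own (its pitch depends on the preceding tone), unmarked: a
  • the pitch contour spans the whole final of the syllable, not only the vowel that carries the tone mark
  • a phrase with all four tones in order: 三十九岁 sānshíjiǔ suì (39 years old)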
  • tone changes (tone sandhi) not reflected in pinyin

    • a third tone followed by another third tone becomes a second tone: 33 -> 23, and usually 333 -> 223

    • 不 bu4 becomes second tone before a fourth tone: 44 -> 24

    • 一 yi1 becomes fourth tone before tones 1, 2 and 3, and second tone before a fourth tone: 11/12/13 -> 41/42/43, 14 -> 24. it keeps the first tone when it stands alone or ends a word, and in numbers and ordinals such as 十一 and 第一
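the tone changes above can be sketched as a function from dictionary tones to spoken tones. applySandhi is a hypothetical helper; it assumes one tone (1-4, 0 for neutral) per character and always looks at the next syllable's dictionary tone. simplifications: in a run of third tones every tone but the last becomes second (the "usually" case), and 一 inside numbers or ordinals (十一, 第一) is not detected.

```javascript
// Hypothetical helper: applies the 3-3, 不 and 一 tone changes.
// chars: characters of one word or short phrase; tones: dictionary tones.
function applySandhi(chars, tones) {
  const hanzi = [...chars];
  return tones.map((tone, i) => {
    const next = i + 1 < tones.length ? tones[i + 1] : null;
    if (hanzi[i] === "不" && tone === 4 && next === 4) return 2; // 44 -> 24
    if (hanzi[i] === "一" && tone === 1 && next !== null && next !== 0) {
      return next === 4 ? 2 : 4; // 14 -> 24; 11/12/13 -> 41/42/43
    }
    if (tone === 3 && next === 3) return 2; // 33 -> 23, 333 -> 223
    return tone;
  });
}

console.log(applySandhi("你好", [3, 3]));      // [ 2, 3 ]
console.log(applySandhi("展览馆", [3, 3, 3])); // [ 2, 2, 3 ]
```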


  • sound examples for all common initials, finals, and tone pairs
  • pronunciation dictionary
  • distinguish "j q x zh ch sh r z c s"
  • five sounds of e
  • pinyin to ipa: a a, ai aɪ, ao aʊ, b p, ch ʈʂʰ, c tsʰ, d t, e ɤ, ei eɪ, en ən, eng əŋ, er ɚ, f f, g k, h x, -ian jɛn, -ie jɛ, -i i, -i- j, j tɕ, k kʰ, l l, m m, -ng ŋ, n n, -ong oŋ, -ou oʊ, ou oʊ, p pʰ, q tɕʰ, r ʐ, sh ʂ, s s, t tʰ, -üan yɛn, -üe yɛ, -uo wɔ, -u u, -ü- ɥ, -u- w, -ü y, wo wɔ, wu u, w- w, x ɕ, yan jɛn, ye jɛ, yi i, y- j, yuan yɛn, yue yɛ, yu- ɥ, yu y, zh ʈʂ, z ts
  • ipa to pinyin: a a, aɪ ai, aʊ ao, ɕ x, eɪ ei, ɤ e, ən en, əŋ eng, ɚ er, f f, i -i/yi, j -i-/y-, jɛ -ie/ye, jɛn -ian/yan, k g, kʰ k, l l, m m, n n, ŋ -ng, oŋ -ong, oʊ ou/-ou, ɔ o, p b, pʰ p, s s, ʂ sh, t d, tʰ t, tɕ j, tɕʰ q, ts z, tsʰ c, ʈʂ zh, ʈʂʰ ch, u -u/wu, w -u-/w-, wɔ -uo/wo, x h, ɥ -ü-/yu-, y -ü/yu, yɛ -üe/yue, yɛn -üan/yuan, ʐ r


  • chinese characters are logograms built from strokes and components, each written to fit a square space
  • 简体字, simplified characters: officially used in the people's republic of china since 1956, and also in singapore and malaysia. used in 98% of new chinese publications worldwide
  • 繁体字, traditional characters: in common use in hong kong, macau and taiwan, as well as in south korea and japan to a certain extent

    • components differing between simplified and traditional: 纟糹 讠訁 钅釒 贝貝 马馬 页頁 车車 门門 饣飠 鸟鳥 见見
  • 36 strokes in unicode: ㇀㇁㇂㇃㇄㇅㇆㇇㇈㇉㇊㇋㇌㇍㇎㇏㇐㇑㇒㇓㇔㇕㇖㇗㇘㇙㇚㇛㇜㇝㇞㇟㇠㇡㇢㇣
  • stroke order
  • gb stroke-based character order standard for character sorting
  • chinese character description languages
  • visually similar characters: 干千于 冉再 车专 部都 常堂 索素 休体 复夏 筒简 肯背 实买 船般 伦论 严产 广厂 幸辛亲 善喜 穴六 衣长以 似依 或咸 录隶 凶区冈 杯坏环 宜宣 习匀 官宫 束吏 卯卬叩 良艮 亮壳 夭天禾夫大牛午失矢未来本米夹朱

romanization: 汉语拼音, han4yu3 pin1yin1

  • tones are marked with numbers

    • a tone number from 1 to 5 is placed after each syllable
    • easy to type
    • easy to read
    • examples: a1 a2 a3 a4 a bei3jing1
    • for the neutral/fifth tone, the number can be left out. some use 0 or 5
  • tones are alternatively marked with diacritics

    • more difficult to input on a computer
    • harder to read because the marks are small, especially on low-resolution displays, in small fonts, or from a distance
    • complicates where to place the mark: if the first vowel letter is a medial (i, u or ü) and another vowel follows it, the mark goes on that following vowel (liú, guì, xué); otherwise it goes on the first vowel letter (hǎo, běi, xīng)
    • suggests that only the marked vowel carries the tone, whereas the tone applies to the whole syllable
    • harder to search: a tone-insensitive search has to match every diacritic variant of each vowel
    • all characters with diacritics: āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü
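  • an apostrophe separates syllables where the boundary would otherwise be ambiguous, before a syllable starting with a, o or e: xi'an (西安) vs xian (先)
  • input methods commonly use pinyin to type chinese characters, with v typed for ü: lv for lü
  • "silent" i: after zh, ch, sh, r, z, c and s, the i is not pronounced as [i] but as a continuation of the consonant: chi, ci, ri, shi, si, zhi, zi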
  • written u but read ü: yuan juan quan xuan, yue jue que xue, yu ju qu xu, yun jun qun xun
  • same final, spelled differently when it stands alone (first) and when it follows an initial (second): wai uai, wang uang, wa ua, wei ui, wen un, wo uo, wu u, ya ia, yang iang, ye ie, yi i, yong iong, you iu, yuan uan, yue üe ue, yun ün un, yu ü u
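a minimal sketch of converting tone-number pinyin to tone marks, following the mark-placement rule above with one guard: a lone i, u or ü (xi, qu, lü) takes the mark itself. numberedToMarked, markSyllable and stripToneMarks are hypothetical names. assumptions: lowercase pinyin-only input, v stands for ü, and every toned syllable ends in its digit. stripToneMarks shows the search problem: after unicode nfd decomposition the four tone diacritics are removed while the diaeresis of ü is kept.

```javascript
// Tone-marked forms of each vowel, indexed by tone - 1.
const TONE_MARKS = {
  a: "āáǎà", e: "ēéěè", i: "īíǐì", o: "ōóǒò", u: "ūúǔù", "ü": "ǖǘǚǜ",
};
const VOWELS = "aeiouü";

// Marks one syllable: on the vowel after a medial (i, u, ü) when another
// vowel follows it, otherwise on the first vowel.
function markSyllable(letters, tone) {
  if (tone < 1 || tone > 4) return letters; // 0 or 5: neutral, no mark
  const chars = [...letters];
  let target = chars.findIndex((c) => VOWELS.includes(c));
  if (target === -1) return letters; // no vowel letter (e.g. "ng")
  const nextIsVowel = target + 1 < chars.length && VOWELS.includes(chars[target + 1]);
  if ("iuü".includes(chars[target]) && nextIsVowel) target += 1;
  chars[target] = TONE_MARKS[chars[target]][tone - 1];
  return chars.join("");
}

// Hypothetical helper: "bei3jing1" -> "běijīng", "lv4" -> "lǜ".
function numberedToMarked(text) {
  return text
    .replace(/v/g, "ü") // input-method spelling of ü
    .replace(/([a-zü]+)([0-5])/g, (_, letters, digit) => markSyllable(letters, Number(digit)));
}

// Removes the four tone diacritics (macron, acute, caron, grave) but keeps ü.
function stripToneMarks(text) {
  return text.normalize("NFD").replace(/[\u0300\u0301\u0304\u030C]/g, "").normalize("NFC");
}

console.log(numberedToMarked("bei3jing1 lv4")); // běijīng lǜ
console.log(stripToneMarks("lǜ"));              // lü
```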




word classes

名词 noun, 代词 pronoun, 动词 verb, 形容词 adjective, 副词 adverb, 数词 numeral, 量词 measure word, 连词 conjunction, 介词 preposition, 助词 particle, 叹词 interjection, 拟声词 onomatopoeia

grammatical structures

主语 subject, 谓语 predicate, 宾语 object, 定语 attributive, 状语 adverbial, 补语 complement, 短语 phrase, 分句 clause, 句子 sentence





  • here is a formula for estimating reading difficulty: max(1, 10 * (unique_chars_length / all_possible_chars_length + median(last_10(unique_chars_frequency_indices)) / all_possible_chars_length))
  • here is a javascript regular expression for matching chinese characters (cjk radicals supplement, cjk strokes, cjk unified ideographs and extensions a to i): /[\u{2E80}-\u{2EFF}\u{31C0}-\u{31EF}\u{4E00}-\u{9FFF}\u{3400}-\u{4DBF}\u{20000}-\u{2A6DF}\u{2A700}-\u{2B73F}\u{2B740}-\u{2B81F}\u{2B820}-\u{2CEAF}\u{2CEB0}-\u{2EBEF}\u{30000}-\u{3134F}\u{31350}-\u{323AF}\u{2EBF0}-\u{2EE5F}]/gu. /\p{Script=Han}/gu is a shorter alternative based on the unicode script property
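a minimal sketch of the reading difficulty formula above, under stated assumptions: frequencyList (a hypothetical input) is an array of characters ordered from most to least frequent, and its length stands in for all_possible_chars_length; unique_chars_frequency_indices is read as each distinct character's position in that list, sorted ascending, so last_10 picks the ten rarest; characters missing from the list get the index frequencyList.length. it uses \p{Script=Han} rather than the range list for brevity. readingDifficulty and median are hypothetical names.

```javascript
// Median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: max(1, 10 * (unique / total + median(last 10 indices) / total)).
function readingDifficulty(text, frequencyList) {
  const total = frequencyList.length;
  const uniqueChars = [...new Set(text.match(/\p{Script=Han}/gu) ?? [])];
  if (uniqueChars.length === 0) return 1; // nothing to read
  const rank = new Map(frequencyList.map((ch, i) => [ch, i]));
  // Characters missing from the list count as rarer than every listed one.
  const indices = uniqueChars.map((ch) => rank.get(ch) ?? total).sort((a, b) => a - b);
  const rarestTen = indices.slice(-10); // last_10 of the ascending indices
  return Math.max(1, 10 * (uniqueChars.length / total + median(rarestTen) / total));
}

console.log(readingDifficulty("的", ["的", "一", "是", "不", "了"])); // 2
```

with a realistic frequency list of several thousand characters, short texts of common characters stay near the lower bound of 1.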


  • digits: 零 ling2 0, 一 yi1 1, 二 er4 2, 三 san1 3, 四 si4 4, 五 wu3 5, 六 liu4 6, 七 qi1 7, 八 ba1 8, 九 jiu3 9
  • patterns (each _ stands for a digit from 1 to 9):

    • _ 五 wu3 5
    • 十 十 shi2 10
    • 十_ 十四 shi2si4 14
    • _十 三十 san1shi2 30
    • _十_ 五十七 wu3shi2qi1 57
    • _百 三百 san1bai3 300
    • _百零_ 二百零二 er4bai3ling2er4 202
    • _百_十 一百二十 yi1bai3er4shi2 120
    • _百_十_ 一百三十五 yi1bai3san1shi2wu3 135
    • _千 一千 yi1qian1 1000
    • _千零_ 一千零一 yi1qian1ling2yi1 1001
    • _千_百 一千二百 yi1qian1er4bai3 1200
    • _千_百_十 一千二百三十 yi1qian1er4bai3san1shi2 1230
    • _千_百_十_ 一千二百三十五 yi1qian1er4bai3san1shi2wu3 1235
    • _万 一万 yi1wan4 10000
    • _万零_ 一万零一 yi1wan4ling2yi1 10001
    • _万零_十 一万零一十 yi1wan4ling2yi1shi2 10010
    • _万_千_百 一万二千三百 yi1wan4er4qian1san1bai3 12300
    • _万_千_百_十_ 一万二千三百四十五 yi1wan4er4qian1san1bai3si4shi2wu3 12345
    • 十万 十万 shi2wan4 100000
    • 十万零_ 十万零一 shi2wan4ling2yi1 100001
    • 十万零_十 十万零一十 shi2wan4ling2yi1shi2 100010
    • 十万_千 十万一千 shi2wan4yi1qian1 101000
    • 十_万_千_百_十_ 十二万三千四百五十六 shi2er4wan4san1qian1si4bai3wu3shi2liu4 123456
  • months: 一月 yi1yue4 january
  • dates: 2021年10月30日 er4ling2er4yi1 nian2 shi2 yue4 san1shi2 ri4 (the year is read digit by digit)
  • time of day: 一点零二分 yi1dian3ling2er4fen1 1:02
  • ordinals: 第一 di4yi1 1st
  • fractions: 三分之二 san1fen1zhi1er4 2/3
  • percentages: 百分之二十五 bai3fen1zhi1er4shi2wu3 25%
  • decimals: 十六点九八 shi2liu4dian3jiu3ba1 16.98
  • negative numbers: 负一 fu4yi1 -1

pronounced zeros

  • within a number: a single zero between non-zero digits is read as 零. 205 is 二百零五 er4bai3ling2wu3
  • consecutive zeros: only one 零 is pronounced, however many zeros appear in a row. 1004 is 一千零四 yi1qian1ling2si4
  • ending zeros: zeros at the end of a number are not pronounced. 1200 is 一千二百 yi1qian1er4bai3
  • beginning zeros: a zero at the start of a number is typically omitted. 012 is 十二 shi2er4
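the number patterns and zero rules above can be combined into a small converter. a minimal sketch, assuming integers from 0 to 99,999,999 (nothing beyond 万 is covered); toChineseNumber, readGroup and toPinyin are hypothetical names. it always writes 二, whereas speech often prefers 两 before 百, 千 and 万.

```javascript
const DIGITS = "零一二三四五六七八九";
const PLACES = ["千", "百", "十", ""]; // place words for a 4-digit group
const PINYIN = {
  "零": "ling2", "一": "yi1", "二": "er4", "三": "san1", "四": "si4", "五": "wu3",
  "六": "liu4", "七": "qi1", "八": "ba1", "九": "jiu3", "十": "shi2", "百": "bai3",
  "千": "qian1", "万": "wan4",
};

// Reads 1..9999: one 零 per run of inner zeros, nothing for leading/trailing zeros.
function readGroup(n) {
  const digits = String(n).padStart(4, "0");
  let out = "";
  let zeroPending = false;
  for (let i = 0; i < 4; i++) {
    const d = Number(digits[i]);
    if (d === 0) {
      if (out !== "") zeroPending = true; // leading zeros are never read
      continue;
    }
    if (zeroPending) out += "零"; // a run of zeros is read once
    zeroPending = false;
    out += DIGITS[d] + PLACES[i];
  }
  return out;
}

// Hypothetical helper for 0 <= n < 100000000.
function toChineseNumber(n) {
  if (!Number.isInteger(n) || n < 0 || n >= 1e8) {
    throw new RangeError("expected an integer from 0 to 99999999");
  }
  if (n === 0) return "零";
  const wan = Math.floor(n / 10000);
  const rest = n % 10000;
  let out = "";
  if (wan > 0) {
    out += readGroup(wan) + "万";
    if (rest > 0 && rest < 1000) out += "零"; // 一万零一, 十万零一十
  }
  if (rest > 0) out += readGroup(rest);
  // A leading 一十 is shortened to 十: 十四, 十万, 十二万.
  return out.startsWith("一十") ? out.slice(1) : out;
}

const toPinyin = (chinese) => [...chinese].map((c) => PINYIN[c]).join("");

console.log(toChineseNumber(100010), toPinyin(toChineseNumber(100010))); // 十万零一十 shi2wan4ling2yi1shi2
```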

the 410 common syllables

a ai an ang ao ba bai ban bang bao bei ben beng bi bian biang biao bie bin bing bo bu ca cai can cang cao ce cei cen ceng cha chai chan chang chao che chen cheng chi chong chou chu chua chuai chuan chuang chui chun chuo ci cong cou cu cuan cui cun cuo da dai dan dang dao de dei den deng di dian diao die ding diu dong dou du duan dui dun duo e ei en eng er fa fan fang fei fen feng fo fou fu ga gai gan gang gao ge gei gen geng gong gou gu gua guai guan guang gui gun guo ha hai han hang hao he hei hen heng hong hou hu hua huai huan huang hui hun huo ji jia jian jiang jiao jie jin jing jiong jiu ju juan jue jun ka kai kan kang kao ke kei ken keng kong kou ku kua kuai kuan kuang kui kun kuo la lai lan lang lao le lei leng li lia lian liang liao lie lin ling liu lo long lou lu luan lun luo lü lüe ma mai man mang mao me mei men meng mi mian miao mie min ming miu mo mou mu na nai nan nang nao ne nei nen neng ni nian niang niao nie nin ning niu nong nou nu nuan nuo nü nüe o ou pa pai pan pang pao pei pen peng pi pian piao pie pin ping po pou pu qi qia qian qiang qiao qie qin qing qiong qiu qu quan que qun ran rang rao re ren reng ri rong rou ru rua ruan rui run ruo sa sai san sang sao se sen seng sha shai shan shang shao she shei shen sheng shi shou shu shua shuai shuan shuang shui shun shuo si song sou su suan sui sun suo ta tai tan tang tao te teng ti tian tiao tie ting tong tou tu tuan tui tun tuo wa wai wan wang wei wen weng wo wu xi xia xian xiang xiao xie xin xing xiong xiu xu xuan xue xun ya yan yang yao ye yi yin ying yong you yu yuan yue yun za zai zan zang zao ze zei zen zeng zha zhai zhan zhang zhao zhe zhei zhen zheng zhi zhong zhou zhu zhua zhuai zhuan zhuang zhui zhun zhuo zi zong zou zu zuan zui zun zuo