# basics for learning chinese

# phonology

* vowels: a o e i u ü
* a syllable consists of an optional initial sound (声母), a final sound (韵母), and a tone
* initials: b c ch d f g h j k l m n p q r s sh t x z zh
* finals: a ai an ang ao e ei en eng i ia ian iang iao ie in ing iong iu o ong ou u ü ua uai uan üan uang üe ueng ui un ün uo

## tones

* can be conceptualized as pitch moving across five levels that divide the natural, comfortable voice range
* 4 tones and neutral
  * 1 flat 5 ā
  * 2 rising 3-5 á
  * 3 deep 2-1-4, often just 2-1 ǎ
  * 4 falling 5-1 à
  * 0 neutral: quick and light, no pitch change a
* pitch changes start with the tonal vowel and end at the end of the syllable
* word combining all tones: 三十九岁 sānshíjiǔ suì
* exceptions not reflected in pinyin
  * a third tone followed by another third tone becomes a second tone. 33 -> 23, 333 -> 223 (or 323, depending on how the syllables group into words)
  * 不 bu4 becomes second tone when followed by a fourth tone. 44 -> 24
  * 一 yi1 is pronounced like 不 (fourth tone, or second tone before a fourth tone), except that it keeps the first tone when it stands alone, ends a word, or is used in counting and ordinals such as 第一. 13/12/11 -> 43/42/41, 14 -> 24

## sounds

* [sound examples](https://yoyochinese.com/chinese-learning-tools/Mandarin-Chinese-pronunciation-lesson/pinyin-chart-table) for all common initials, finals, and tone pairs
* [pronunciation dictionary](https://forvo.com/languages/zh/)
* [distinguish "j q x zh ch sh r z c s"](https://www.youtube.com/watch?v=9WwfI_4v34Q&list=PLwFUKjRMEUxzyydLE5ll_hFArPDWOSxUy)
* [five sounds of e](https://www.youtube.com/watch?v=gtLNOAIdzsY)
* pinyin to [ipa](https://www.ipachart.com/): a a, ai aɪ, ao aʊ, b p, ch ʈʂʰ, c tsʰ, d t, e ɤ, ei eɪ, en ən, eng əŋ, er ɚ, f f, g k, h x, -ian jɛn, -ie jɛ, -i i, -i- j, j tɕ, k kʰ, l l, m m, -ng ŋ, n n, -ong oŋ, -ou oʊ, ou oʊ, p pʰ, q tɕʰ, r ʐ, sh ʂ, s s, t tʰ, -üan ɥɛn, -üe ɥɛ, -uo wɔ, -u u, -ü- ɥ, -u- w, -ü y, wo wɔ, wu u, w- w, x ɕ, yan jɛn, ye jɛ, yi i, y- j, yuan ɥɛn, yue ɥɛ, yu- ɥ, yu y, zh ʈʂ, z ts
* ipa to pinyin: a a, aɪ ai, aʊ ao, ɚ er, eɪ ei, ɤ e, ən en, əŋ eng, f f, i i/yi, j y-/-i-, jɛ ye/-ie, jɛn yan/-ian, k g, kʰ k, l l, m m, n n, ŋ -ng, oŋ -ong, oʊ ou/-ou, p b, pʰ p, ɕ x, s s, ʂ sh, t d, tʰ t, tɕ j, tɕʰ q, ts z, tsʰ c, ʈʂ zh, ʈʂʰ ch, u wu/-u, w w-/-u-, wɔ wo/-uo, x h, y yu/-ü, ɥ yu-/-ü-, ɥɛ yue/-üe, ɥɛn yuan/-üan, ʐ r

# script

* the [characters](https://en.wikipedia.org/wiki/Chinese_characters) are [logograms](https://en.wikipedia.org/wiki/Logogram) made of [strokes](https://en.wikipedia.org/wiki/Stroke_(CJK_character)) and [components](https://en.wikipedia.org/wiki/Chinese_character_components) arranged within a square area
* 简体字, simplified characters: officially introduced in the people's republic of china in the 1950s and later adopted in singapore and malaysia. used in 98% of new chinese publications worldwide
* 繁体字, traditional characters: in common use in hong kong, macau and taiwan, and to a certain extent in south korea and japan
* components differing between simplified and traditional: 纟糹 讠訁 钅釒 贝貝 马馬 页頁 车車 门門 饣飠 鸟鳥 见見
* 36 strokes in the unicode cjk strokes block: ㇀㇁㇂㇃㇄㇅㇆㇇㇈㇉㇊㇋㇌㇍㇎㇏㇐㇑㇒㇓㇔㇕㇖㇗㇘㇙㇚㇛㇜㇝㇞㇟㇠㇡㇢㇣
* [stroke order](https://en.wikipedia.org/wiki/Stroke_order)
* [gb stroke-based character order](https://en.wikipedia.org/wiki/GB_stroke-based_order) standard for character sorting
* [chinese character description languages](https://en.wikipedia.org/wiki/Chinese_character_description_languages)

## romanization: 汉语拼音, han4yu3 pin1yin1

* tones marked with numbers
  * a tone number from one to five is placed after each syllable
  * easy to type
  * easy to read
  * examples: a1 a2 a3 a4 a bei3jing1
  * for the neutral/fifth tone, the number can be left out. some use 0 or 5
* tones marked with diacritics
  * more difficult to input on a computer
  * more difficult to read because the marks are small, especially on lower resolution displays, with smaller fonts, or from a distance
  * introduces the complication of where to place the diacritic. if the first vowel is a medial (i, u or ü followed by another vowel), the tone mark goes on the vowel letter immediately after the medial. otherwise the tone mark goes on the first vowel letter. for example liù, guì, xué, běi
  * suggests that only the marked vowel carries the tone, whereas the tone applies to the whole syllable
  * more complicated to search, because tone-insensitive search has to match every diacritic variant of each vowel
  * all characters with diacritics: āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü
* an apostrophe separates syllables that would otherwise be ambiguous, for example xi'an (西安) versus xian (先)
* input methods commonly use pinyin to type chinese characters, with v typed for ü, for example lv for lü
* silent i: chi, ci, ri, shi, si, zhi, zi. the i here is not pronounced like the i in yi
* written u but read ü: yuan juan quan xuan, yue jue que xue, yu ju qu xu, yun jun qun xun
* same pronunciation but different spelling as a standalone syllable and as a final after an initial: wai uai, wang uang, wa ua, wei ui, wen un, wo uo, wu u, ya ia, yang iang, ye ie, yi i, yong iong, you iu, yuan üan uan, yue üe ue, yun ün un, yu ü u

# grammar

* [wikipedia: chinese grammar](https://en.wikipedia.org/wiki/Chinese_grammar)

# vocabulary

* [perapera](https://addons.mozilla.org/en-US/firefox/addon/perapera-chinese/) hover dictionary
* [zdic](https://www.zdic.net/)
* [vocabulary list generator](https://www.purpleculture.net/vocabulary-list-generator/) and other tools
* [colors](https://yoyochinese.com/blog/guide-infographic-reference-learn-colors-mandarin-chinese) with audio examples

## data

* [cc-cedict](https://www.mdbg.net/chinese/dictionary?page=cc-cedict) chinese to english dictionary database
* [ecdict](https://github.com/skywind3000/ECDICT) english to chinese dictionary database
* [hsk 3.0 word list](https://github.com/krmanik/HSK-3.0-words-list/tree/main)
* [make-me-a-hanzi](https://github.com/skishore/makemeahanzi) data for drawing characters and character graphics
* [hanyu](https://github.com/sph-mn/hanyu) source code for the dictionary mentioned above and other lists and data files
* frequency lists
  * [frequency list 1](https://github.com/ernop/anki-chinese-word-frequency/blob/master/internet-zh.num) 50000 entries
  * [frequency list 2](https://github.com/thyrlian/namedict/blob/master/data/Modern%20Chinese%20Character%20Frequency%20List) 9933 entries
  * [frequency list 3](https://en.wiktionary.org/wiki/Appendix:Mandarin_Frequency_lists) 10000 entries

## word classes

名词 noun, 代词 pronoun, 动词 verb, 形容词 adjective, 副词 adverb, 数词 numeral, 量词 measure word, 连词 conjunction, 介词 preposition, 助词 particle, 叹词 interjection, 拟声词 onomatopoeia

## grammatical structures

主语 subject, 谓语 predicate, 宾语 object, 定语 attributive, 状语 adverbial, 补语 complement, 短语 phrase, 分句 clause, 句子 sentence

# links

## 1

* [mandarin corner](https://www.youtube.com/@MandarinCorner2/videos) listening practice
  * [stories](https://www.youtube.com/watch?v=fwvouxo2Srw&list=PL7VdqFXO0LzfGbXAsnBqWcnIwMvAV1IWR)
  * [walk around](https://www.youtube.com/watch?v=b_d-Yf-Gzyw&list=PL7VdqFXO0LzeM0vkA5teOY9lEpSgZaQ3Z)
* [etgushi](https://www.etgushi.com) reading practice
* [chinese poetry on github](https://github.com/chinese-poetry/chinese-poetry)
* [chinese text project](https://ctext.org/ens) pre-modern chinese texts
* beginner
  * [slow & clear chinese channel](https://www.youtube.com/channel/UCdwdSGQsSbcapDmODtOr58g/videos)
  * [learn chinese with yi zhao](https://www.youtube.com/@YiZhaoHomemadeChinese/playlists)
  * [pronunciation guide - vowels](https://www.youtube.com/watch?v=-5x7SwWiZlE)
  * [pronunciation guide - tones](https://www.youtube.com/watch?v=SqI3BCMIhJc)
  * [duolingo](https://www.duolingo.com/course/zh/en/Learn-Chinese)

## 2

* [yoyo chinese](https://www.youtube.com/c/YangyangCheng/videos)
* [kids yay](https://www.youtube.com/channel/UCgD1_blj1w2ax1_s7IdgNsA/playlists)
* [children's stories and songs](https://www.youtube.com/watch?v=EReU1BKtAXo&list=PLZ27m2K2W5n5aR3sYflBT91Ujil1I8faP)
* [difference between simplified and traditional chinese in unicode](https://r12a.github.io/scripts/chinese/)
* [color-code characters by tone](https://www.purpleculture.net/color-code-chinese-by-tone/)
* [list of internet resources](https://ling-lingchinese.com/internet-resources/)
* [cjk unified ideographs](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) relevant unicode blocks
* programming
  * [pinyin-utils](https://github.com/pepebecker/pinyin-utils)
  * [hanzi-tools](https://github.com/peterolson/hanzi-tools)

# other

* here is a formula for estimating the reading difficulty of a text: ``max(1, 10 * (unique_chars_length / all_possible_chars_length + median(last_10(unique_chars_frequency_indices)) / all_possible_chars_length))``
* here is a javascript regular expression for matching chinese characters (cjk radicals, strokes, unified ideographs and extensions a to i): ``/[\u{2E80}-\u{2EFF}\u{31C0}-\u{31EF}\u{4E00}-\u{9FFF}\u{3400}-\u{4DBF}\u{20000}-\u{2A6DF}\u{2A700}-\u{2B73F}\u{2B740}-\u{2B81F}\u{2B820}-\u{2CEAF}\u{2CEB0}-\u{2EBEF}\u{30000}-\u{3134F}\u{31350}-\u{323AF}\u{2EBF0}-\u{2EE5F}]/gu``

# numerals

* digits: 零 ling2 0, 一 yi1 1, 二 er4 2, 三 san1 3, 四 si4 4, 五 wu3 5, 六 liu4 6, 七 qi1 7, 八 ba1 8, 九 jiu3 9
* patterns:
  * _ 五 wu3 5
  * 十 十 shi2 10
  * 十_ 十四 shi2si4 14
  * _十 三十 san1shi2 30
  * _十_ 五十七 wu3shi2qi1 57
  * _百 三百 san1bai3 300
  * _百零_ 二百零二 er4bai3ling2er4 202
  * _百_十 一百二十 yi1bai3er4shi2 120
  * _百_十_ 一百三十五 yi1bai3san1shi2wu3 135
  * _千 一千 yi1qian1 1000
  * _千零_ 一千零一 yi1qian1ling2yi1 1001
  * _千_百 一千二百 yi1qian1er4bai3 1200
  * _千_百_十 一千二百三十 yi1qian1er4bai3san1shi2 1230
  * _千_百_十_ 一千二百三十五 yi1qian1er4bai3san1shi2wu3 1235
  * _万 一万 yi1wan4 10000
  * _万零_ 一万零一 yi1wan4ling2yi1 10001
  * _万零_十 一万零一十 yi1wan4ling2yi1shi2 10010
  * _万_千_百 一万二千三百 yi1wan4er4qian1san1bai3 12300
  * _万_千_百_十_ 一万二千三百四十五 yi1wan4er4qian1san1bai3si4shi2wu3 12345
  * 十万 十万 shi2wan4 100000
  * 十万零_ 十万零一 shi2wan4ling2yi1 100001
  * 十万零_十 十万零一十 shi2wan4ling2yi1shi2 100010
  * 十万_千 十万一千 shi2wan4yi1qian1 101000
  * 十_万_千_百_十_ 十二万三千四百五十六 shi2er4wan4san1qian1si4bai3wu3shi2liu4 123456
* months: 一月 yi1yue4 january
* dates: 2021年10月30日 2021 nian2 10 yue4 30 ri4
* time of day: 一点零二分 yi1dian3ling2er4fen1 1:02
* ordinals: 第一 di4yi1 1st
* fractions: 三分之二 san1fen1zhi1er4 2/3
* percentages: 百分之二十五 bai3fen1zhi1er4shi2wu3 25%
* decimals: 十六点九八 shi2liu4dian3jiu3ba1 16.98
* negative numbers: 负一 fu4yi1 -1

## pronounced zeros

* within a number: a single zero between non-zero digits is read as 零 ling2. 205 is er4bai3ling2wu3
* consecutive zeros: only one zero is pronounced, even if several zeros appear in a row. 1004 is yi1qian1ling2si4
* ending zeros: zeros at the end of a number are not pronounced. 1200 is yi1qian1er4bai3
* beginning zeros: a zero at the start of a number is typically omitted. 012 is shi2er4
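
The numeral patterns and zero rules above can be combined into a small JavaScript sketch. It assumes whole numbers from 0 to 99999999 and always writes 二, never 两. Like the lists, it keeps the base tones in numbered pinyin (for example yi1), with no tone changes applied. `toChineseNumeral` and `convertSection` are hypothetical helper names.

```javascript
// Hanzi and numbered pinyin for the digits 0-9 and the place units within a 4-digit section.
const DIGITS = ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九"];
const DIGITS_PINYIN = ["ling2", "yi1", "er4", "san1", "si4", "wu3", "liu4", "qi1", "ba1", "jiu3"];
const UNITS = ["", "十", "百", "千"];
const UNITS_PINYIN = ["", "shi2", "bai3", "qian1"];

// Converts 0..9999 into [hanzi, pinyin]. `isLeading` is true when this section starts
// the whole number, the only place where 一十 is shortened to 十 (十四, 十二万).
function convertSection(section, isLeading) {
  let hanzi = "";
  let pinyin = "";
  let pendingZero = false;
  for (let place = 3; place >= 0; place -= 1) {
    const digit = Math.floor(section / 10 ** place) % 10;
    if (digit === 0) {
      // Zeros after an emitted digit collapse into one 零, written only if a
      // non-zero digit follows; trailing zeros are never written.
      if (hanzi !== "") pendingZero = true;
      continue;
    }
    if (pendingZero) {
      hanzi += DIGITS[0];
      pinyin += DIGITS_PINYIN[0];
      pendingZero = false;
    }
    const shortTen = place === 1 && digit === 1 && hanzi === "" && isLeading;
    if (!shortTen) {
      hanzi += DIGITS[digit];
      pinyin += DIGITS_PINYIN[digit];
    }
    hanzi += UNITS[place];
    pinyin += UNITS_PINYIN[place];
  }
  return [hanzi, pinyin];
}

function toChineseNumeral(n) {
  if (!Number.isInteger(n) || n < 0 || n >= 1e8) {
    throw new RangeError("expected an integer from 0 to 99999999");
  }
  if (n === 0) return { hanzi: DIGITS[0], pinyin: DIGITS_PINYIN[0] };
  const high = Math.floor(n / 10000); // the 万 section
  const low = n % 10000;
  let hanzi = "";
  let pinyin = "";
  if (high > 0) {
    const [h, p] = convertSection(high, true);
    hanzi += h + "万";
    pinyin += p + "wan4";
  }
  if (low > 0) {
    // A gap right below 万 (as in 10001 or 100010) is read as a single 零.
    if (high > 0 && low < 1000) {
      hanzi += DIGITS[0];
      pinyin += DIGITS_PINYIN[0];
    }
    const [h, p] = convertSection(low, high === 0);
    hanzi += h;
    pinyin += p;
  }
  return { hanzi, pinyin };
}

console.log(toChineseNumeral(10010)); // { hanzi: '一万零一十', pinyin: 'yi1wan4ling2yi1shi2' }
```

The split into 4-digit sections mirrors how the patterns repeat 千百十 below 万.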
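
The tone mark placement rule from the romanization section can be sketched as a converter from numbered pinyin to diacritics. This is a minimal sketch. It assumes lowercase input where each toned syllable ends in a digit and `v` stands for `ü`. `numberedToDiacritics` and `markSyllable` are hypothetical names, not the API of pinyin-utils or any other library.

```javascript
// Tone-marked form of each pinyin vowel, indexed by tone 1 to 4.
const TONE_MARKS = {
  a: ["ā", "á", "ǎ", "à"],
  e: ["ē", "é", "ě", "è"],
  i: ["ī", "í", "ǐ", "ì"],
  o: ["ō", "ó", "ǒ", "ò"],
  u: ["ū", "ú", "ǔ", "ù"],
  "ü": ["ǖ", "ǘ", "ǚ", "ǜ"],
};
const VOWELS = "aeiouü";
const isVowel = (ch) => VOWELS.includes(ch);

// Puts the tone mark on one syllable such as "liu" (tone 2) or "lv" (tone 4).
function markSyllable(letters, tone) {
  const chars = letters.replace(/v/g, "ü").split("");
  if (tone < 1 || tone > 4) return chars.join(""); // 0 or 5: neutral tone, no mark
  let target = chars.findIndex(isVowel);
  if (target === -1) return chars.join("");
  // A medial i, u or ü followed by another vowel passes the mark to that vowel.
  const hasNextVowel = target + 1 < chars.length && isVowel(chars[target + 1]);
  if ("iuü".includes(chars[target]) && hasNextVowel) target += 1;
  chars[target] = TONE_MARKS[chars[target]][tone - 1];
  return chars.join("");
}

// Converts every letters+digit syllable in a string and leaves other text untouched.
function numberedToDiacritics(text) {
  return text.replace(/([a-zü]+)([0-5])/g, (_, letters, tone) =>
    markSyllable(letters, Number(tone)),
  );
}

console.log(numberedToDiacritics("han4yu3 pin1yin1")); // hànyǔ pīnyīn
```

Because the digit always ends the syllable, the regular expression also splits run-together input such as bei3jing1 correctly.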
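
The romanization section notes that diacritics make tone-insensitive search harder. One common workaround is to strip the tone marks from both strings before comparing them. This sketch relies on Unicode normalization via `String.prototype.normalize`. `stripTones` and `tonelessIncludes` are hypothetical helpers. Only the four tone marks are removed, so ü keeps its diaeresis.

```javascript
// Combining marks that carry pinyin tones after NFD decomposition:
// U+0304 macron (1st), U+0301 acute (2nd), U+030C caron (3rd), U+0300 grave (4th).
const TONE_DIACRITICS = /[\u0304\u0301\u030C\u0300]/g;

function stripTones(text) {
  // Decompose, drop the tone marks (the diaeresis U+0308 of ü survives), then recompose.
  return text.normalize("NFD").replace(TONE_DIACRITICS, "").normalize("NFC");
}

// Substring search that ignores tones on both sides.
function tonelessIncludes(haystack, needle) {
  return stripTones(haystack).includes(stripTones(needle));
}

console.log(tonelessIncludes("běijīng dàxué", "beijing")); // true
```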
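
The tone changes listed under tones can be applied to a sequence of syllables to get the spoken tones. This is a simplified sketch. It looks only at the written tone of the next syllable and ignores word grouping, so every 333 becomes 223. It also does not keep the first tone for 一 in counting or ordinals when another syllable follows. The function name `spokenTones` and its input format (`[hanzi, numbered pinyin]` pairs, with 5 for neutral) are assumptions.

```javascript
// Returns the spoken tone of each syllable, given pairs such as [["不", "bu4"], ["是", "shi4"]].
function spokenTones(syllables) {
  const written = syllables.map(([, pinyin]) => Number(pinyin.slice(-1)));
  return written.map((tone, i) => {
    const next = written[i + 1]; // undefined after the last syllable
    const hanzi = syllables[i][0];
    if (hanzi === "不" && next === 4) return 2; // 44 -> 24
    if (hanzi === "一" && next >= 1 && next <= 4) {
      return next === 4 ? 2 : 4; // 14 -> 24, 11/12/13 -> 41/42/43
    }
    if (tone === 3 && next === 3) return 2; // 33 -> 23, and so 333 -> 223
    return tone; // includes 一 at the end, as in 第一
  });
}

console.log(spokenTones([["一", "yi1"], ["个", "ge4"]])); // [ 2, 4 ]
```

The hanzi is needed because other syllables written bu4 or yi1 (such as 步 or 衣) do not change.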