Vocabulary is another “sub-skill”. This post is about how vocabulary fits into Synergy.

How much vocabulary do you need?

Language consists of words and grammar, or at least that’s one way to think of it. While grammar can be challenging and take some time to learn, it takes much more time to learn vocabulary. That’s because there’s a lot of it. I find this post about vocabulary made by Alexander Arguelles very useful for addressing this. I have heard the 20,000 passive word figure for C2 from several sources. I used that number and cut it in half for each step on the CEFR scale, and I assumed active vocabulary is about 50% of passive to come up with the following very rough table:

A1: 300 active, 600 passive
A2: 600 active, 1,200 passive
B1: 1,200 active, 2,500 passive
B2: 2,500 active, 5,000 passive
C1: 5,000 active, 10,000 passive
C2: 10,000 active, 20,000 passive

I’m not saying you should keep track of the number of words you know, and stop when you reach the appropriate number. I just want to make the point that there is a lot of vocabulary, and therefore, a lot of work involved in learning a language to a high level due to vocabulary. On the other hand, there’s nothing wrong with having a rough idea about your word count. Some people can find this within the statistics of their SRS’s. If you aren’t one of those, you might be able to take a short test online designed to measure your vocabulary.

Try to get by without isolated vocabulary study

Now that you know the size of the task in front of you, you can imagine the amount of reviewing it would take to learn thousands of words in lists, flashcards or SRS’s. But there’s good news. Many people don’t require this type of study. Most of the really good YouTube polyglots, for example, don’t do it. Imo, having a balanced plan like Synergy, where you are attacking the language from all sides, decreases the need for isolated study.
This is because you are constantly reinforcing old vocabulary in many ways and improving your chances of remembering it.

Another thing that helps you remember vocabulary – learn only one language at a time. This supports what I suggested in the How to learn many languages to a high level post. When all the new material in your mind is from the same language, it’s much easier to remember words. For example: “Hmm… dog. It’s that word that starts with a p…” This type of grasping at straws that my mind does works better if there is only one language to choose from.

If you aren’t sure whether or not you need isolated vocabulary study, I suggest trying without it and seeing how it goes. This will save you a lot of time, which you can use to do more of your other studies.

My personal experiences with isolated vocabulary study

Unfortunately, as I mentioned in the post about writing, I am one of the many people who really need to memorize and review words in isolation. I’ve made many mistakes and discoveries over the years, and I believe I’m still evolving. Let me explain.

I don’t remember doing any vocabulary study in Spanish, although I’m sure I must have done some when I took it in middle school and high school. The first time I remember doing it was while studying Swahili in Tanzania. With Swahili, I actually memorized a small phrasebook before going to Africa. Grueling, but effective, to a point. After I got there, I started learning the language in earnest. I was with a group of about 50 people. In the beginning, I knew more than anybody. I quickly dropped to below average because I wasn’t studying much. I had a final exam coming up, which was a conversation with a native. The week before, I was sent away from the big city I was in to stay in my future site. Rather than getting a feel for my site, I stayed in and studied vocabulary most of my free time. I made lists, and for the first time in my life, figured out a systematic way to memorize them.
When I returned to the big city and took the exam, I did much better than expected. The interview wasn’t too bad, and I finished somewhere near the top. That was 3 months into my stay in Tanzania, and I stayed for a total of 3 years. I don’t want to give the false impression that I was one of the best speakers after 3 years; that was certainly not the case. I just wanted to illustrate that memorizing vocabulary really helped me at that time.

The next vocabulary study I did was for Thai. The textbook had word lists written for me, so I memorized them in much the same way as before. When I talked with tutors, I’d write the words, in transliteration, in a notebook, and memorize them.

Then I got to Japanese. Unlike the Thai script, the Japanese phonetic script, kana, is very simple. I had no excuse for avoiding it. It was my first foreign script, and even though it’s one of the easiest ones in the world, it was really hard for me. You have to understand that at this time I wasn’t reading. Up to this point, the words that I saw used a familiar script. Swahili uses the same alphabet as English, and the Thai transliteration I used was mostly from English letters. So there was a strong visual link with my vocabulary, which really helped me remember it. With kana, there was no link. It was so unfamiliar to me that I didn’t visualize how a word was spelled. There was no quick connection in my mind.

This made it very difficult to memorize vocabulary, so I started using mnemonics. Mnemonics are memory tricks. I could remember a few of the words without any help, but for all others I used memory tricks. The ones that worked best for me used a “sound alike” component. For example, the Japanese word for rock is pronounced ishi (いし in kana). “Is She Really Going Out with Him?” is a famous rock song. This is only one example of the many possibilities. They might seem far-fetched, but they work really well.
They disappear from your memory when you get comfortable with the word, after they have served their purpose. This method isn’t nearly as fast as having a strong visual connection, but with practice, it’s sufficient.

Eventually I got away from transliteration in all my languages, and relied more heavily on mnemonics. The more reading I do, the more comfortable I get with the scripts, and the stronger the visual aid is in helping me remember the word. So the more I read, the less I have to rely on mnemonics. I avoid them when they aren’t needed, but don’t hesitate to use them when they can be of help.

I liked word lists because they proved themselves to be useful. But I needed to review them on a regular basis to keep from forgetting the words. As I got more and more lists, the reviews got longer and longer. I remember reviewing lists for over 3 hours once. I got sick of it, and started doing random reviews, if any. I wasn’t happy with the results.

About the time I started learning Mandarin, I learned about SRS’s. This seemed to be the answer to my prayers. I could review all my words in a logical fashion in the minimal amount of time. It was great for the first few months, slowly loading more and more words into it. Then the review sessions got over an hour long, and I liked it less. About 2 years into using my SRS, I had something like 20,000 entries, and 3 hours of review. Now, these weren’t all unique words, and there were 3 languages involved, but it was still too much. I had a minor meltdown, and deleted the program from my computer. At first I was panicky, because I hadn’t missed a review session for 2 years, and I always considered my SRS to be priority one. But after several days, I felt a lot better. I then realized that I’d done the right thing.

I still use SRS’s, particularly for the beginning stages of a language and for grammar. But if a set of vocabulary gets out of hand, I delete it. Nothing is more liberating than cleaning out an SRS.
I have improved significantly in my vocabulary learning over the years. For example, I used to need to learn words in list form for a few days before putting them in the SRS; otherwise it would take forever for new words to pass. Now I don’t need to do that as much. Sometimes I memorize a list of words one time, then dump it into my SRS; sometimes I put words in without memorizing. Words seem to stick better in general. I think this is because I’m listening and reading more. I’m hoping that the increase in writing, the full implementation of Synergy, will allow me to reduce isolated word study, or even skip it completely.

Even though I suggested trying to get by without isolated vocabulary study, you may need it, especially in Steps 2 and 3 of Synergy. That being said, you don’t want to overload your word lists, etc. Here is a ranking of the most important words to learn, by where they come from, in case you have to pick and choose:

1. conversation, writing, Pimsleur
2. grammars, textbooks, audio programs
3. everything else
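For anyone who wants to play with the assumptions behind the very rough table near the top of this post, the arithmetic can be sketched in a few lines of Python. This is only a sketch under the post's own assumptions: 20,000 passive words at C2, passive vocabulary cut in half for each step down the CEFR scale, and active vocabulary at about 50% of passive. The function name `rough_table` and its parameters are made up for illustration. Exact halving gives figures such as 1,250 and 625, which the table above rounds down to 1,200 and 600.

```python
# Assumptions from the post: 20,000 passive words at C2, passive
# vocabulary halves at each step down the CEFR scale, and active
# vocabulary is about 50% of passive. Names here are hypothetical.
LEVELS = ["C2", "C1", "B2", "B1", "A2", "A1"]  # highest level first


def rough_table(c2_passive=20_000, active_ratio=0.5):
    """Return {level: (active, passive)} by halving down from C2."""
    table = {}
    passive = c2_passive
    for level in LEVELS:
        active = int(passive * active_ratio)  # truncate fractions of a word
        table[level] = (active, passive)
        passive //= 2  # integer halving for the next level down
    return table


if __name__ == "__main__":
    table = rough_table()
    for level in reversed(LEVELS):  # print A1 first, like the table above
        active, passive = table[level]
        print(f"{level}: {active:,} active, {passive:,} passive")
```

Changing `c2_passive` or `active_ratio` shows how sensitive the whole table is to those two guesses.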
In line with our discussions of vocabulary in the other thread I started, I thought I would comment on this scale. I obviously agree with your statement that you need a lot of vocabulary. How to specify it by CEFR level is obviously subjective. If you look at the study I linked to in the other thread, Kusseling & Decoo (2007): Europe and language learning: The challenges of comparable assessment, you can scroll down to table 6 on p. 7 for a comparison of the figures from various studies. All of them are lowball except for Instituto Cervantes, which is inflated as noted, and Nation (2006)/Schmitt (2008), which give a 15,000 figure for C2. On HTLAL, emk did a great analysis of the prime lowball study mentioned by many, i.e. Meara & Milton (2003), in which he demonstrated that they capped their analysis at 5,000 words and only studied what percentage of the most frequent 5,000 words test subjects knew in various level tests.

That 15K figure given by Nation is also given by Laufer's study on the lexical threshold. I would suggest that it is still a little low, and I prefer to err on the side of being conservative, so I accept your 20,000 figure for C2. The question then is how to divide this by level. I think it is overly simplistic to merely halve the top number at each step down to the lowest level. Rather, going from the stated competencies for each level, as well as various word lists I have seen for different CEFR level commercial courses, I think the big jumps of doubling up occur starting lower, and then diminish at the top levels. Speaking only as to passive vocabulary, I would propose the following scale:

A1: 1,250 passive
A2: 2,500 passive
B1: 5,000 passive
B2: 10,000 passive
C1: 15,000 passive
C2: 20,000 passive

I suspect that if I did a survey of courses created by commercial L2 language companies, as opposed to L1 companies (i.e. Pimsleur, Assimil, etc.), a vocabulary count would come close to the above by level.
If it is too conservative, then sliding the figures up one level puts 15,000 at C2 (and they might be too conservative or too liberal for a given language, since this is based on English). The figures above also would be in line with the assessments of *experienced* language learners, which dismiss the claims of Assimil and other course authors as to their CEFR level as being far too generous.

Pegging B2 at 10,000, which matches the passive vocabulary of non-college-educated speakers of English as an L1, would also seem right, in that the description of B2 to me more closely matches a competent high school graduate who did not go on to college. The fact that the C levels have a more academic rhetorical focus also argues for the difference between B2 and C1/C2 being more about other competencies than vocabulary, and the vocabulary differences being mainly of an academic/scientific/upper register nature.

Again going from reviews on HTLAL of Assimil products, a base course plus an advanced course only gets a learner to a little above A2. Elexi commented multiple times on this (and I note he is now a member of this forum so perhaps he will weigh in here). Being realistic about the vocabulary levels required for the lower CEFR levels should help new learners avoid disappointment upon finishing so-called "advanced" courses when they are not close to being able to read a newspaper or follow the course of a conversation among native speakers.
Excellent post, Peregrinus. It's amazing how the numbers in that linked study are all over the place - strange that vocabulary isn't nailed down for each level for each language. You have put a lot of thought into this, and I appreciate it. The assumptions I made for that table may not have been valid.