Is anyone familiar with the book "Fluent Forever" by Gabriel Wyner? While I have not read it, I have read a lot about it on the website. The vocabulary section of the website is particularly interesting. He basically says (and backs it up with some research) that by knowing the 1000 most commonly used words in a language, one could understand roughly 70% of the language. By knowing the 2000 most common words, understanding would increase to 80%. He has created a list of the 625 most commonly used words in English, which he says will correspond to the 1000 most common in other languages. How he came up with 625 I do not know. I will add that verbs in the word list only include the infinitives, and he assumes that you will learn the conjugations in the most commonly used tenses. Needless to say, I have decided to try this out. I am slowly creating a deck in Anki of these 625 words for my Spanish learning. I am drilling from this deck daily (or almost daily). We will see how it goes. Any thoughts?
Hi H.F. and welcome to the forum! Check out the review thread for Benny's Fluent in 3 Months; a lot of the same criticisms apply to Wyner. This has been discussed a lot, but while it is true that relatively few words account for the majority of those found in general texts and speech, there are another 15,000-20,000 words constantly rotating through the remaining 20% at various frequencies, and those are what keep you from getting much of the meaning.

The lexical threshold, where you understand 98% of the words you encounter, which is what allows comfortable listening and reading and gives you a shot at inferring the meaning of unknown words from context, is between 6000-7000 word families (10,200-11,900 words) for spoken English and 8000-9000 word families (13,600-15,300 words) for written English. I suspect that due to the high number of cognates and some relationship to English, the threshold for Spanish is somewhat lower, maybe 20% lower. If you just want to chit-chat, then 2000-3000 word families is probably sufficient. But if you wish to read widely in non-fiction and fiction, let alone for any professional or academic purpose, and understand most everything you encounter, then you need to hit the lexical threshold. Those 2000-3000 word families, though, have the most colloquial variations and uses, and they are the firm basis on which to build, so any time spent learning them is never wasted and is in fact absolutely necessary.
Does he mean you will understand 70%-80% at the word level, or the sentence level? If he's just noting that knowing 1000 words will mean you know 70% of the total number of words in most texts, it's a little misleading putting it the way he did. If I'm not understanding the entire sentence, I'm not understanding the language. I agree with this, and just wanted to add that there's a big vocabulary difference between 1) conversing about almost anything you want, one-on-one and 2) understanding natives when they talk to each other, movies, TV, radio, etc.
put on - put off - put through - put up - put up with - put behind you - put away - put away - put out - put down - put down - put down... and many more besides. All of these so-called "phrasal verbs" are made up of very common English words, but in most cases the words aren't enough. These "most common words" lists are useful, but as the others have already said, understanding words isn't the same as understanding phrases. Consider: I was fromblish that I had grimbled it without triblobding. 70% of the vocabulary is familiar to you... does that count as understanding 70% of the sentence...?

While the author has statistical sources, this isn't all that much more than a statistical curiosity. Furthermore, his rule of 625 English = 1000 <<other_language>> is pretty much baseless conjecture -- it really all depends on the language, and when you get away from English the question of "what is a word?" is more than just an amusing academic debate. I mean, how many words do Eskimos really have for snow? The Eskimo languages are agglutinative, which means words can be an arbitrary length, so Eskimos technically have many millions of possible words for snow. The author's assumption would presumably be Western European languages, but if we redefined our "word" to include English phrasal verbs as a minimal unit of meaning, we'd find that English "come back" and "go back" are one word in Spanish (volver), and the same is true of many other examples. So while it makes a lot of sense to learn the most common words first, there is none of the magic in it that writers like to ascribe to it.
In order to be fair to the author of the book, I will say that I have not read it. To answer your question, I think he means that by knowing the 1000 most common words in a language, along with understanding the grammar, sentence structure, etc., one should be able to understand approximately 70% of what is being said or written. The illustration he uses shows paragraphs, written in English, with the words that are among the 1000 most common in English left visible. Words that are not among the 1000 most common are blanked out. He states that even with those words blanked out, you can still follow in pretty good detail what is being said. My question is this: could we strategically focus our vocabulary learning (along with grammar, pronunciation, listening, etc.) on the most commonly used words to make our learning more efficient?
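If anyone wants to see what that illustration looks like with a text of their own, the idea is easy to reproduce. Here is a rough Python sketch; the file name "top1000.txt" and the crude tokenizer are my own placeholders, not anything taken from the book:

```python
import re

# Hypothetical file: one word per line, the 1000 most common words.
with open("top1000.txt", encoding="utf-8") as f:
    common = {line.strip().lower() for line in f if line.strip()}

def blank_uncommon(text, known=common):
    """Replace every word not on the frequency list with underscores,
    mimicking the blanked-out paragraphs described above."""
    def mask(match):
        word = match.group(0)
        return word if word.lower() in known else "_" * len(word)
    return re.sub(r"[A-Za-z']+", mask, text)

sample = "He states that even with the words left blank, you can follow the text."
print(blank_uncommon(sample))
```

Running it over a page or two of ordinary prose gives a feel for how much (or how little) those blanks actually interfere with comprehension.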
I agree. Learning a new language, just like any other task, requires time, practice and effort. There are no shortcuts, in my opinion.
Absolutely. But I think that to infer meaning you need to know how the language works (which is why we can fill in the blanks in English so well), and to do this you are probably going to have to look up a lot of words at first so that you can see how sentences work in your target language. I'll do 1000 or so words that are in whatever course I'm doing, then use the course to build up my understanding of how everything goes together; after that it's mainly a matter of continuing work to boost vocab again in one way or another. It's still possible to get blocked by two words in a sentence, and very often they are right next to each other.
Disregarding potential problems with frequency lists, if you learned the most frequent 5000 or so words, then you would be doing that. After that, though, the rest of the words are relatively infrequent apart from specialist language. The lexical threshold I mentioned above works mainly by exclusion, i.e. the exact words you know are not as important as the cumulative effect of knowing them all (and vice versa when you have not yet reached the threshold). Once you know such a base of vocabulary in a language, you can simply read what interests you, intensively at first, and extract unknown words and their definitions to put into Anki. Or you can "specialize" in various thematic areas in a newspaper, for instance, learn all the new words there, and soon be able to read those areas fairly well. General news is the hardest because it covers such a broad area. Or start reading novels and do the same. If you want to focus on conversational dialogue, then popular novels with lots of dialogue and soap operas provide that.
In my experience, where I see some beginners going wrong with learning the most common words on their own is by focusing so much on them that they neglect learning how to manipulate them. It can become a means without an end. What tends to slowly happen to some of these folks is that they end up concentrating so much on SRS and how many words they "know" that this becomes the core of their learning. It's an attractive trap. Statistics and word counts tend to encourage some beginners to keep pursuing this method while they lose sight of what language is actually used for: communication. "Yaaay, I now know 500 Russian words." That's nice. What can you do with them?
Nicely stated. Personally, I don't like memorizing words which I haven't encountered in context, although in the beginning it's a little hard to pull off. The OP is only talking about 625 words though, so he's probably ok.
To continue on Iguanamon's point: I fell right into that trap. Just last year, my only means of studying Russian consisted of doing SRS (Memrise). I learned about 3000 words with no context over the course of one year. Big surprise, I couldn't do or understand much. Then I discovered Anki and MCD (massive cloze deletion) and did another burst of SRS. But for nearly a year now I haven't done any SRS at all and my comprehension keeps improving (doing mainly L+R and watching TV series). A lot of things have changed since then. SRS probably did help to get me started in the language, but I think it gets counterproductive at some point. Eventually I wasn't learning new words, just reviewing the same ones that wouldn't stick. I guess they would have stuck better if seen in different contexts. To get back to the OP, I do believe a certain amount of SRS to learn 625 words is a good way to start. It just can't occupy 100% of your study time, and it should be used as a base from which to quickly move on to native material.
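For anyone who hasn't seen cloze deletion: Anki hides whatever you wrap in {{c1::...}} and shows the rest of the sentence as context. Here is a tiny Python sketch of turning one sentence into several single-cloze cards; the example sentence and the choice of target words are just made up for illustration, not part of any particular MCD workflow:

```python
def make_cloze_cards(sentence, target_words):
    """Produce one Anki-style cloze string per target word,
    each hiding a different word in the same sentence."""
    cards = []
    for i, word in enumerate(target_words, start=1):
        cards.append(sentence.replace(word, "{{c%d::%s}}" % (i, word), 1))
    return cards

sentence = "The cat sat on the mat."
for card in make_cloze_cards(sentence, ["cat", "mat"]):
    print(card)
# The {{c1::cat}} sat on the mat.
# The cat sat on the {{c2::mat}}.
```

The point, as above, is that the word is always reviewed inside a sentence rather than as an isolated item.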
A good way to start using Anki while also learning in context is simply, at first, to put in words from a substantial beginner course that has around 1500+ words. In doing so it will be hard not to get the vast majority of those 625 words or whatever. I personally like to pre-learn the vocabulary for a lesson from its associated word list before actually studying the lesson/dialogue.
I watched Gabe Wyner's videos and read most of the stuff on his fluentforever website a few weeks ago. Part of the idea behind the 625 of the 1000 words is that those 625 were words he could find sensible images for. He's got a clever Anki flashcard approach. It takes longer to build cards. I made 3 of them with his technique and they ended up being easy to learn. The cards are so cool that I always say "good" rather than "easy" when I get the card, because I want to see it again. Those cards matured very quickly, though. He incorporates: the word, a prompt to spell it, the sound (a short audio clip), and an image. Since you choose your own images for the cards, they make sense to you personally. I'm more on the Peregrinus (he's a teenage Anki whore) page. That is, I'm working on 5000 words in my SRS at the rate of 10 per day.
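For anyone who would rather script cards like that than build each one by hand in the Anki editor, here is a rough sketch using the genanki Python library. The field names, the template, and the media file names are my own guesses at a card with those four elements (word, spelling prompt, audio, picture), not Wyner's actual note type:

```python
import genanki

# Hypothetical note type with the elements mentioned above.
picture_word_model = genanki.Model(
    1843957201,  # arbitrary model ID
    "Picture Word (sketch)",
    fields=[
        {"name": "Word"},
        {"name": "Picture"},
        {"name": "Sound"},
    ],
    templates=[
        {
            "name": "Picture -> Word",
            "qfmt": "{{Picture}}<br>{{Sound}}<br>Spell the word.",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Word}}",
        },
    ],
)

note = genanki.Note(
    model=picture_word_model,
    fields=["perro", '<img src="perro.jpg">', "[sound:perro.mp3]"],
)

deck = genanki.Deck(2059834101, "Spanish 625 (sketch)")
deck.add_note(note)

package = genanki.Package(deck)
package.media_files = ["perro.jpg", "perro.mp3"]  # placeholder files; must exist on disk
package.write_to_file("spanish_625_sketch.apkg")
```

The resulting .apkg imports into Anki like any shared deck, with the image and audio bundled in; choosing your own images, as he recommends, is the part a script can't do for you.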
You're an underachiever. I'm over 13K at the rate of 30-60 a day. I've lapped you so many times I've quit counting. At the rate of 10/day it will take 17 months to reach 5K, and 3x that to get close to the lexical threshold. Of course, you are the actual teenage Anki whore, not me, who is in his 50s. So you've got time that I don't. At least if I want to be able to use a language before my teeth fall out.
Unless one intends to stop at 625/1000/whatever, it can't really matter much exactly which words are included in such lists, as long as the list as a whole is in the appropriate frequency band.
You are the true teenage Anki whore. I just used that phrase because I liked it. I'm astonished that you can put that many words into an SRS consistently. I don't want to spend much more than ten minutes per day on Anki. I'm not yet convinced of its power for me. If I remember right, you are very diligent and careful with your Anki. I use Anki as a "check and fill the gaps" method. Assuming I can continue adding words at the desired rate of about 10 per day, I should have a vocabulary of nearly double the official number. By that I mean, I learn most of my vocabulary from other sources and just use Anki to ensure I'm not missing some of the common words. I haven't convinced myself that as the words become less frequent I'll be able to maintain the pace. But at a high level, if I maintain a bit less than 10 Anki words per day, my vocabulary should be in the 7000-9000 range after about 3 years of study, which some call a lexical threshold. I'm doing French, so I've got the cognate discount. Also, there is the related-word discount, which helps a lot as your vocabulary grows, i.e. understanding a word because you know other words that use the same root. P.S. I'm as old as you are. I'm just not as mature.
When looking at numbers for the lexical threshold, one must distinguish whether word families or words are being discussed. For English, 7000 word families is the upper end of the lexical threshold for spoken text, and 9000 for written text. I was speaking of words above, and to get the equivalent number of words, one multiplies by 1.7. Thus, 7000 word families = 11,900 words, and 9000 word families = 15,300 words. So if you want to reach the threshold in 3 years, you need to learn 17 words per day instead of 10. The more words you learn, the easier it becomes to learn new ones, because you have not just related words but also more words/concepts to "hang" new ones on, like synonyms. The exception, for me at least, is more abstract words, or words where all members of a word family are relatively rare. I have mentioned before, and actually continue to be amazed, that the 40-60 minutes per day I spend on Anki reviews, which at first covered only a couple thousand words, now covers over 13K due to the algorithmic effect of well-learned words being constantly pushed further into the future for subsequent reviews. This is apart from time spent on creating cards for review and on learning new words via the hybrid Iversen list/Anki method I currently use. So probably 2 hours per day in total. If the reviews start to pile up too much, then I do cut back for a day or two on learning new words.
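Since the word-family multiplier and the daily-rate figure trip people up, here is the arithmetic spelled out in a few lines of Python. The 1.7 multiplier and the threshold figures are the ones quoted above; the per-day number that comes out is a bare floor, before allowing for missed days and for cards that refuse to stick:

```python
FAMILY_TO_WORD = 1.7  # multiplier quoted above for English

def words_needed(word_families):
    """Convert a word-family count into an approximate word count."""
    return round(word_families * FAMILY_TO_WORD)

def words_per_day(total_words, years=3):
    """Minimum new words per day to hit the total within the given time."""
    return total_words / (years * 365)

for families in (7000, 9000):
    total = words_needed(families)
    print(f"{families} families ~ {total} words "
          f"~ {words_per_day(total):.0f}/day over 3 years (floor)")
# 7000 families ~ 11900 words ~ 11/day over 3 years (floor)
# 9000 families ~ 15300 words ~ 14/day over 3 years (floor)
```

In practice the sustained rate has to sit a bit above that floor, which is why I put it at 17 rather than 14.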
The 70% and 80% figures refer to coverage: 70% of all word forms in a standard corpus are accounted for by approximately 1000 unique wordforms, and 80% by approximately 2000 unique wordforms. I know this because I recently analysed the Kilgarriff frequency list from around 1990, which is based on the 100 million wordforms in the British National Corpus. Counting word forms means that you count be, am, is as three distinct items ('s and figures are also counted in this case, which I find quite idiotic). Headwords or lemmas are dictionary words, i.e. you count everything in a paradigm as one item, but for instance be and the noun being would be two items. And finally you can count word families, where be and being count as one item.

Everybody has to learn the basic grammar words, but there are at most a few hundred word forms in this category (even with inflected pronouns and auxiliary verbs). There are a few thousand words which are so common that you can count on finding them in almost all 'normal' texts, and the rest are so rare that you have to make an effort to learn them when you see them. But these calculations are based on wordforms (or maybe headwords or word families), and they don't take expressions into account, which definitely is a problem for the whole frequency discussion. I haven't seen any decent frequency lists for expressions, but most expressions are just as rare as single rare words, and they should be learned as you find them; otherwise it may be a long time before you see them again.

As for the frequency bands, you shouldn't care too much about them. The variation in frequency among words above the 1000-2000 word thresholds is tiny compared to the variation between different genres, authors and topics. Read things which you find interesting and let the materials decide which words you learn.
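If anyone wants to reproduce that kind of coverage figure on a corpus of their own, the calculation is just a cumulative sum over a frequency count. A rough Python sketch follows; the file "corpus.txt" is a placeholder, and "word form" here is simply whatever the crude tokenizer produces:

```python
import re
from collections import Counter

# Hypothetical plain-text corpus file.
with open("corpus.txt", encoding="utf-8") as f:
    tokens = re.findall(r"[a-zA-Z']+", f.read().lower())

counts = Counter(tokens)
total = sum(counts.values())

# Walk the wordforms from most to least frequent, accumulating coverage.
covered = 0
for rank, (form, freq) in enumerate(counts.most_common(), start=1):
    covered += freq
    if rank in (1000, 2000, 5000):
        print(f"top {rank:>5} word forms cover {covered / total:.1%} of the running text")
```

Run on a large general corpus, the top-1000 and top-2000 lines come out near the 70% and 80% quoted above; on a narrow, specialised text the numbers shift, which is exactly the genre/topic variation mentioned at the end of the post.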
As Iversen mentions, we are talking about lemma forms, i.e. base forms and not all the conjugated and declined forms. However, regularly derived forms are counted separately and make up a word family. For myself, I do count and learn past participles separately (not applicable in all languages) if they have a separate dictionary entry. n-grams, or lexical chunks, which is what expressions are, i.e. collocations rather than mere words that commonly occur together, are indeed what I would like to find frequency lists for in various languages. Google does have available a huge database of files in Excel format for various languages, but the files are indeed huge (I mean HUGE), and there are dozens to hundreds of such files for each language. There is so much junk in the files that one would have to go through them manually, which would take ages. However, perhaps some professional-level (i.e. expensive) corpus analysis software exists that could sort such expressions by frequency and perhaps auto-delete some of the junk.

As I mentioned in another thread, the reason to learn 15-20,000 words is not because the words above a certain level are individually important frequency-wise, because as Iversen says they are not. Rather, it is the cumulative effect of knowing that many words that provides the operative effect of exclusion, reducing the number of unknowns to a reasonable level, i.e. 1 to 2% per page. Someone could probably learn their 2nd 10,000 words in the 40-50K range instead of the 10-20K range, but I think they would have to do it intentionally, or be concentrating on just one field of interest. That might matter, but it seems highly unlikely to occur outside of, say, concentrating on the medical field only and then not knowing rarer words found in newspaper articles, let alone literary fiction. And even then, one is simply going to run across more general terms found in lower frequency bands.

Echoing Iversen's advice above to read what interests one, Kato Lomb discussed several foreign language learners in her book who started with an introductory course or just a grammar and dictionary, and then proceeded to read stuff that interested them. All of them, despite widely varying primary interests, managed to reach fluency in reading at least. But they all read A LOT in their L2s.
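Without the expensive software, a crude first pass at an expression frequency list can be made in a few lines; this just counts raw word n-grams and would still be full of the junk mentioned above. The corpus file is again a placeholder:

```python
import re
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-zA-Z']+", f.read().lower())

def ngram_counts(tokens, n):
    """Count contiguous n-word sequences; most of the top hits will be
    grammatical filler rather than real lexical chunks."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Print the 20 most frequent three-word sequences.
for ngram, freq in ngram_counts(words, 3).most_common(20):
    print(freq, " ".join(ngram))
```

Separating genuine expressions like "put up with" from noise like "of the and" is exactly the manual (or expensive) part.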
If you look on Dr. Arguelles' website, you can scroll down and find a table of text coverage, based on a 400-word/page book. At 95% coverage, the point where you could, albeit with strain, try to read a book without looking up words, you are still missing 20 words per page, or maybe a dozen per page for a smaller-sized book. 95% coverage is what you get with around 4000 word families, or around 6800 lemma-form words in English (and, according to my own estimation, 20%+ more for German). So you then have to more than double that number of words just to advance to the 98% level of the lexical threshold. The people who claim spoken fluency with 3000 words, where they discuss nothing in depth, use awkward circumlocutions and control the conversation, are totally off base if they try to apply that standard to reading. Which is why they mostly avoid discussing reading fluency.
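The per-page numbers follow directly from the coverage percentage. Here is the arithmetic as a tiny Python sketch; 400 words/page is the figure from the table mentioned above, and the 250 words/page row is just my own stand-in for a "smaller-sized book":

```python
def unknown_per_page(coverage, words_per_page=400):
    """Unknown words encountered per page at a given coverage level."""
    return round((1 - coverage) * words_per_page)

for coverage in (0.95, 0.98):
    for page_size in (400, 250):
        print(f"{coverage:.0%} coverage, {page_size} words/page: "
              f"about {unknown_per_page(coverage, page_size)} unknown words per page")
# 95% coverage, 400 words/page: about 20 unknown words per page
# 95% coverage, 250 words/page: about 12 unknown words per page
# 98% coverage, 400 words/page: about 8 unknown words per page
# 98% coverage, 250 words/page: about 5 unknown words per page
```

Going from 95% to 98% coverage cuts the interruptions per page by more than half, which is what makes the difference between straining through a book and reading it comfortably.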