To get some sense of my progress, I've been trying to estimate how many words I know in Thai. I have approached this in a couple of different ways. First, I've thought through related sets of words. I know twelve months, ten colors, seven days of the week, forty-four consonants, etc. Second, I've glanced over word lists, such as the vocabulary index of a Thai textbook and the entries in a learner's dictionary. By using the size of a list and the percentage of words I recognize, I can produce an estimate. My best estimate is that I know about a thousand words.
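The word-list approach boils down to simple proportional scaling. Here is a minimal Python sketch of that arithmetic. The function name and the numbers in the example (a 2,500-entry list, a 100-word sample, 40 words recognized) are made up for illustration and are not my actual counts.

```python
def estimate_known_words(list_size, sample_size, recognized):
    """Scale the fraction of recognized words in a sample up to the whole list."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= recognized <= sample_size:
        raise ValueError("recognized must be between 0 and sample_size")
    return round(list_size * recognized / sample_size)


# Hypothetical figures: a 2,500-entry list, 100 entries sampled, 40 recognized.
print(estimate_known_words(list_size=2500, sample_size=100, recognized=40))
```

The estimate is only as good as the sample. Glancing at one page of a dictionary is much noisier than checking entries spread across the whole list.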
In doing this exercise, I've realized that it's actually not clear to me what it means to "know" a word. While very frequent words are clearly "known", and words I've never heard are "unknown", there is a whole range of other possibilities. There are words that I understand when listening but cannot correctly use. There are words that I recognize and understand only in context, and there are words for which my sense is still emerging and incomplete. Even taking into account this ambiguity, I think 1000 words is reasonably accurate.
Now that I know what I know, I'd like to find statistics for Thai showing that the most frequent 1000 words cover x% of spoken language, the most frequent 2000 words cover y%, etc. This lexical coverage information is easy to find for English, but I have been unable to find anything for Thai. So I've resigned myself to trying to estimate for Thai by using what is known for English.
One consideration in trying to apply English lexical coverage to Thai is that Thai morphology is not as productive as that of English. An ESL learner who acquires a word like "create" also acquires a whole family of words, including "creates", "created", "creative", "creation", and "recreate". In Thai, there are no such families of words. Other words function in place of morphology. For example, to say "created", a Thai speaker would say "create already". Word families in Thai are families of one.
I found some research on Marlise Horst's website showing that, with a vocabulary of the thousand most frequent word families in English, students understand about 85% of spoken language. To increase that comprehension to 98%, a vocabulary of 6000-7000 word families is needed. Due to the difference in morphology, statistics for word families in English might give a rough approximation of statistics for individual words in Thai. This jibes with my experience. With my thousand-word vocabulary, I think it's accurate that I understand about 85% of spoken Thai. This assumes an idealization where the only impediments to following a dialogue are vocabulary and grammar. The ability to listen to spoken dialogue at a normal rate of speed in a variety of regional accents is a separate issue.
The Linguist, an interesting ESL website, has another way to measure proficiency in a second language using the number of known words.
Beginner a) 2,000 b) 3,500
Intermediate a) 5,000 b) 7,500
Advanced a) 10,000 b) 12,500
(source: The Linguist blog)
This system is for English, and every word in a word family is counted, so an attempt to apply it to Thai would again require taking into account the difference in morphology. Playing with the numeric data from Horst's site, it appears that there is an average of two words in an English word family, with the most frequent families being the largest. Since Thai has word families of one word each, it seems reasonable to multiply the number of words in my vocabulary by a little more than 2 to get a rough estimate of an equivalent ESL vocabulary. With my thousand-word vocabulary, I'm the equivalent of an ESL student a little past "Beginner A". This seems about right.
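For anyone who wants to plug in their own numbers, the conversion can be sketched in a few lines of Python. Two things here are my assumptions rather than anything from the sources: the multiplier of 2.2 (standing in for "a little more than 2"), and reading each figure in The Linguist's list as the minimum word count for that level. The function name is made up for illustration.

```python
# Thresholds from The Linguist's list, read here (an assumption) as the
# minimum number of known English words needed to reach each level.
LEVELS = [
    ("Beginner A", 2000),
    ("Beginner B", 3500),
    ("Intermediate A", 5000),
    ("Intermediate B", 7500),
    ("Advanced A", 10000),
    ("Advanced B", 12500),
]


def equivalent_esl_level(thai_words, words_per_family=2.2):
    """Convert a Thai word count into a rough ESL word count and level.

    words_per_family=2.2 is an assumed stand-in for "a little more than 2".
    Returns (equivalent_word_count, highest_level_reached_or_None).
    """
    equivalent = round(thai_words * words_per_family)
    reached = None
    for name, threshold in LEVELS:
        if equivalent >= threshold:
            reached = name
    return equivalent, reached


print(equivalent_esl_level(1000))
```

With a thousand Thai words, this gives an equivalent of about 2,200 English words, which lands just past the "Beginner A" mark. Changing the multiplier slightly does not change that result.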