Building a wordbank/phrasebank...?

Discussion in 'Language Resources' started by Cainntear, Jun 8, 2014.

  1. Cainntear

    Cainntear Active Member VIP member

    Joined:
    Apr 29, 2014
    Messages:
    343
    Native Language:
    English
    Advanced Languages:
    Catalan, French, Italian, Scottish_Gaelic, Spanish
    Intermediate Languages:
    Corsican
    Basic Languages:
    Dutch, German, Irish, Polish, Russian, Welsh, Sicilian
    Peregrinus was discussing elsewhere the idea of a collaborative language course built by the forum membership, an idea that has been attempted several times at other forums and websites, but ultimately has always been dropped.

    The logistical problems in such a task are manifold:
    • design-by-committee rarely works
    • every language is different, and a "universal" course template does not and cannot exist
    • by the time such a project fails, many man-hours of work have typically been spent, and the result is three-to-five lessons that will never be of use to anyone.
    I would like to propose instead something that will provide immediate value, even if never completed.

    Quite simply: a bank of words and phrases. Imagine we all record the 100 most common verbs in our own languages -- it's an immediate resource that can be built into Anki decks. Numbers from one to one-hundred. Again, immediately useful. Most common nouns. Pronouns. Greetings, pleasantries, leave-taking phrases. Blah-blah-blah.
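
    As a rough illustration of how such a bank could feed an Anki deck, here is a minimal Python sketch. The `Entry` fields, the file names, and the language codes are assumptions made up for the example, not an agreed format. It relies only on Anki being able to import tab-separated text and on its `[sound:...]` field syntax for audio.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One item in the bank. All field names are hypothetical."""
    language: str  # e.g. "es" (assumed code scheme)
    category: str  # e.g. "numbers", "verbs", "greetings"
    text: str      # the word or phrase in the target language
    gloss: str     # a plain translation
    audio: str     # file name of the individual recording


def to_anki_tsv(entries):
    """Return tab-separated text: front (text plus audio tag), back (gloss)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for e in entries:
        writer.writerow([f"{e.text} [sound:{e.audio}]", e.gloss])
    return buf.getvalue()


# Made-up sample data, for illustration only.
BANK = [
    Entry("es", "numbers", "uno", "one", "es_numbers_001.mp3"),
    Entry("es", "numbers", "dos", "two", "es_numbers_002.mp3"),
]

if __name__ == "__main__":
    print(to_anki_tsv(BANK), end="")
```

    The point is only that one small record per word or phrase is enough to generate a deck; the audio files themselves would still need to be copied into Anki's media folder.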

    A few hours from each of us, and we've got something immediately useful and infinitely reusable. Then if anyone later wants to start building a structured course, they'll have all this material to build on.

    Most of this basic stuff is available free somewhere or other, but typically under a license that prohibits redistribution, so you can't create anything new with it or build on it.

    So I propose a directed word-bank/phrase-bank that covers roughly similar territory in each language, with a permissive open license to allow material to be reused in future projects.

    What do you think?
  2. Peregrinus

    Peregrinus Active Member

    Joined:
    May 27, 2014
    Messages:
    613
    Native Language:
    English
    Intermediate Languages:
    German
    Basic Languages:
    Spanish
    This is a good idea and more realistic than my suggestion, provided it doesn't end up being a typical phrase book. If it gives a workout in phrasal verbs and includes a lot of lexical chunks and discourse markers, then it could be hugely helpful.

    I am not sure, though, that as native speakers we could come up with frequency-based material off the tops of our heads. But if I took a frequency list of English verbs (easy to find), looked them up on wordreference.com for example sentences, and THEN used my judgment as a native speaker to select the most useful ones, that would be enormously helpful. What I am saying is that, for efficiency, I prefer to use ready-made sources as much as possible and then tweak them. There might be copyright issues, though, with pulling example sentences from such online dictionaries, even though they pulled them from somewhere themselves.

    This obviously has the most benefit for languages which are not well supported on the net in the form of extensive dictionaries and frequency lists.

    Discourse markers specifically, which were discussed in detail a few times on HTLAL, are hugely important, and the "glue" that can help learners put together what they have learned into smooth flowing discourse.
  3. Cainntear

    I would say start simple and then start increasing the complexity. Complexity means decisions, and decisions mean disagreements.

    Note that I'm not talking about a "phrasebook", but specifically a phrasebank. The problem with phrasebooks (including, e.g., Book2) is that the material is stuck in an ordered list, with only one way to use it (without time-intensive cut-and-paste work). A phrasebank of individual recordings could be immediately used with gradint, for example. Gradint's a great tool, but at the moment you have to spend a lot of time gathering your own material before you can start using it.

    If we start with complicated stuff, direct translation is difficult, and therefore SRSing it is difficult.

    Also, if you want really complicated stuff, you'd be better off heading to tatoeba.org, where there are sentences of arbitrary complexity just waiting for you.
  4. Peregrinus

    I used "phrasebook" to mean the typical kind of phrases found in one, i.e. not very advanced and easily available in most languages. When I say "easily", though, it is true that, as with the Book2 you mention, the material is typically available in only one arrangement. So you are proposing a very granular database of recorded audio that one may arrange at will.
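
    To make "arrange at will" concrete, here is a small Python sketch under assumed names: `Recording`, `select`, and `playlist` are hypothetical, and the sample rows are invented. Each recording is an independent record, so any subset can be picked out and put in any order, rather than being stuck in a fixed lesson sequence.

```python
import random
from typing import NamedTuple


class Recording(NamedTuple):
    """One individually recorded item. Field names are assumptions."""
    language: str
    category: str
    text: str
    audio: str


def select(bank, language, categories=None):
    """Keep recordings in one language, optionally limited to some categories."""
    return [
        r for r in bank
        if r.language == language
        and (categories is None or r.category in categories)
    ]


def playlist(recordings, seed=None):
    """Return the audio file names in a shuffled order (repeatable with a seed)."""
    rng = random.Random(seed)
    items = list(recordings)
    rng.shuffle(items)
    return [r.audio for r in items]


# Invented sample data, for illustration only.
BANK = [
    Recording("es", "numbers", "uno", "es_numbers_001.mp3"),
    Recording("es", "numbers", "dos", "es_numbers_002.mp3"),
    Recording("es", "greetings", "hola", "es_greetings_001.mp3"),
    Recording("de", "numbers", "eins", "de_numbers_001.mp3"),
]

if __name__ == "__main__":
    spanish_numbers = select(BANK, "es", {"numbers"})
    print(playlist(spanish_numbers, seed=1))
```

    The same list of file names could just as well be handed to whatever tool a learner prefers; nothing here is specific to one program.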

    Would it still not be more efficient if, with a verb frequency list to hand, I looked up each verb on wordreference.com and then individually recorded the example sentences under each entry, omitting any that I thought were too repetitive or infrequent? As opposed to making up my own?

    Also as to recording, would you advocate some short training/instructions for posters wishing to contribute, said training to stress enunciation and prosody?

    tatoeba is a good idea with a poor result. "Arbitrary complexity" is the right description. So many examples seem to be random, not particularly helpful sentences from textbooks and reports. Same goes for linguee (which is helpful for looking up words not found in dictionaries online).
  5. hrhenry

    hrhenry Member VIP member

    Joined:
    May 21, 2014
    Messages:
    51
    Native Language:
    English
    Advanced Languages:
    Catalan, Italian, Portuguese, Spanish, Galician
    Intermediate Languages:
    Norwegian, Turkish
    Basic Languages:
    Indonesian, Polish, Ojibwe
    One thing I like about tatoeba.org, though, is that you can usually find something in lesser-studied languages, even if it's just a handful of words/phrases. I was thrilled to find some Piedmontese phrases there when I was studying it.

    The larger, more popular languages are already quite well represented in courses and phrasebooks.

    R.
    ==
