Compete with LingQ?

Discussion in 'Technical Issues & Suggestions' started by SaraH, Jun 15, 2014.

  1. SaraH

    SaraH New Member

    Joined:
    May 14, 2014
    Messages:
    8
    Basic Languages:
    French
    In your mission statement I saw that you want to build a tool like LingQ. I just want to say I hope you do it soon, because it's really frustrating how often they're down. I have a membership there, but I want to quit and go somewhere more stable, so you'd have one customer right away. :p
    Big_Dog likes this.
  2. Big_Dog

    Big_Dog Administrator Staff Member

    Joined:
    Jan 11, 2014
    Messages:
    1,039
    Native Language:
    English
    Advanced Languages:
    Spanish
    Intermediate Languages:
    French, Japanese, Mandarin, Russian, Swahili, Thai
    Basic Languages:
    Korean
    Ok, this is what I wrote in the Mission Statement:
    My intention wasn't really to compete with LingQ. I just want a tool that lets me do this for free, and I want to do it in a community like this where I can share lessons and comments. Also, I want one that works with Thai. A big group of people got enough material together to qualify Thai as a supported language at LingQ, but after all that work LingQ turned us down, citing the difficulty of parsing it. I wasn't the one who submitted the work, but I was still pretty disappointed, and it made me promise myself that I'd do what I could to get this tool. There is at least one other member who feels that way.

    Anyway, thanks for your support of this idea. Do you know what kind of programmers this would take, and where we could find them? I'm willing to pay, but I want to get my money's worth, and I don't know what the hell I'm doing at this point.

    Btw - LingQ is down a lot, but I haven't had any problems for over a month.
  3. garyb

    garyb Member

    Joined:
    Jun 5, 2014
    Messages:
    35
    Native Language:
    English
    Advanced Languages:
    French
    Intermediate Languages:
    Italian
    Basic Languages:
    Spanish
    I've already heard of a couple of tools similar to LingQ that have been made: Learning With Texts and Readlang. I haven't used either very much, so I can't comment on how they compare; LWT seems a bit less shiny but very configurable, and, like forum software, it can be installed on your own server. I'm just worried it'll end up like language exchange sites, where people keep making more of them even though perfectly good ones already exist, and it just ends up fragmenting the user base.
    Big_Dog likes this.
  4. Cainntear

    Cainntear Active Member VIP member

    Joined:
    Apr 29, 2014
    Messages:
    343
    Native Language:
    English
    Advanced Languages:
    Catalan, French, Italian, Scottish_Gaelic, Spanish
    Intermediate Languages:
    Corsican
    Basic Languages:
    Dutch, German, Irish, Polish, Russian, Welsh, Sicilian
    If you're intending to do it for free, there's the question of what "free" really means. Would I have to use your tool to have access to the material, or would I be free to export it, perhaps under a CC or GNU FDL license? The downside of such a license is that others could copy your database and build a bigger competitor, potentially squeezing you out, but it would be more flexible and more useful in the long run if the collected material didn't live and die with the site.
    Big_Dog likes this.
  5. Peregrinus

    Peregrinus Active Member

    Joined:
    May 27, 2014
    Messages:
    613
    Native Language:
    English
    Intermediate Languages:
    German
    Basic Languages:
    Spanish
    Check out Foreign Language Text Reader, a now-discontinued Google project. I haven't downloaded it, so I can't say whether everything you need is in the download package or whether it's just the codebase. However, you don't need to install a personal Linux server on your desktop to use it. There is a Reddit discussion on this which suggests the installer package may no longer be findable.

    This parsing difficulty, not just for Thai but also for Chinese, Arabic, etc., seems to be a thorny one, and if you google it you can find research papers devoted to the topic. I think either LWT or Readlang gives you the option to extend a selection from one character to several more in order to find compound words and chengyu, but that would slow you down a lot. Thai seems to have the added complication of having no punctuation marks. Readlang still lists the hard-to-parse languages as betas. I guess, as a crude way of parsing, one could use Google Translate and experiment with adding line breaks to see how the translations change, but again that's slow.
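    A minimal sketch of the "extend a character to several more" idea, assuming a tiny hypothetical word list (DICTIONARY and candidates_at are illustrative names, not anything from LWT or Readlang): starting at one character, look up progressively longer strings and keep the ones that are real entries, so compounds and chengyu show up as options.

```python
# Hypothetical toy dictionary; a real tool would load a full word list.
DICTIONARY = {"中", "中国", "中国人", "国", "人", "画", "画蛇添足", "蛇", "添", "足"}


def candidates_at(text: str, start: int, max_len: int = 4) -> list[str]:
    """Return every dictionary entry that begins at text[start], shortest first."""
    found = []
    for length in range(1, max_len + 1):
        piece = text[start:start + length]
        if len(piece) < length:
            break  # ran off the end of the text
        if piece in DICTIONARY:
            found.append(piece)
    return found


print(candidates_at("中国人画蛇添足", 3))
```

    Starting at 画, both the single character and the four-character chengyu 画蛇添足 are offered. The learner (or some extra logic) still has to pick one, which is where the slowdown comes from.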

    In the end it might be faster to just SRS and drill a buttload of Thai/Chinese/etc. vocab and be your own parser, effectively giving up on parallel texts.
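    For anyone unfamiliar with SRS (spaced repetition), here is a minimal Leitner-box sketch of how the scheduling works. The interval lengths and the Card/review/due_cards names are assumptions for illustration; real SRS programs use more elaborate algorithms.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumed review interval (in days) for each Leitner box.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    word: str
    box: int = 0  # index into INTERVALS
    due: date = field(default_factory=date.today)


def review(card: Card, correct: bool, today: date) -> Card:
    """Move the card up one box if answered correctly, back to box 0 if not, then reschedule."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Cards whose review date has arrived."""
    return [card for card in cards if card.due <= today]
```

    Words you keep getting right drift into boxes with longer gaps, while misses come back the next day, so drilling time goes to the vocabulary you don't know yet.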

    One thing I have noticed with Google Translate, besides the difficulties it has with various languages (where knowing the grammar often tells me why), is that a tool based on the GT API, like the mouse-over translator I use in Firefox, will sometimes be unable to translate an individual word and just returns it unchanged. But if I highlight the entire phrase or sentence, it translates all of it correctly, including that word. GT matches against a huge database of prior translations as well as individual word dictionaries.

    How is GT for Thai in general? For Chinese you often get no more than the bare gist, and it never approaches the accuracy it reaches for languages like Spanish or German, where its translations are far from 100% accurate themselves.
    Big_Dog likes this.
  6. Big_Dog

    Big_Dog Administrator Staff Member

    Joined:
    Jan 11, 2014
    Messages:
    1,039
    Native Language:
    English
    Advanced Languages:
    Spanish
    Intermediate Languages:
    French, Japanese, Mandarin, Russian, Swahili, Thai
    Basic Languages:
    Korean
    Good tools, but I'm definitely not capable of integrating one of them into this site. I'm not a programmer. I fully expect to hire one or more people.
    Are you saying that you are worried about LingQ getting broken up?
    I actually hadn't planned on creating material, or at least not exclusively for use with the tool. The materials posted elsewhere on this site are free. Do you think I need to license them in some way?
    Parallel texts or mouse-over dictionaries? Why would one want to replace one with the other? Personally, I use them in conjunction.
    Terrible. It makes GT for Chinese look really advanced.
  7. Cainntear

    Cainntear Active Member VIP member

    Joined:
    Apr 29, 2014
    Messages:
    343
    Native Language:
    English
    Advanced Languages:
    Catalan, French, Italian, Scottish_Gaelic, Spanish
    Intermediate Languages:
    Corsican
    Basic Languages:
    Dutch, German, Irish, Polish, Russian, Welsh, Sicilian
    If you want to achieve critical mass, you'll need to host a lot of material, and you'll need to make sure it's material that you're allowed to host (so you won't have lawyers swamping you with takedown requests). If things are put up without any explicit license, then the implied license is that you're allowed to download them for your own use, but you're not allowed to redistribute them.

    I don't like tying material to one tool. If the tool dies, the material goes with it. If the tool proves inadequate, the material goes unused.
    The next guy who creates a tool needs to start from ground zero on collecting the material.
    A material database that can be built on later appeals more to me (as I said in this thread).
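    As a rough illustration of keeping material independent of any one tool, here is a sketch of a plain JSON export that carries the license with each lesson. The Lesson fields and the export_lessons/import_lessons helpers are hypothetical, not an existing format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Lesson:
    title: str
    language: str        # assumed ISO 639-1 code, e.g. "th"
    text: str
    license: str         # e.g. "CC BY-SA 4.0" or "GFDL-1.3"
    source_url: str = ""


def export_lessons(lessons: list[Lesson]) -> str:
    """Serialize lessons to plain JSON that any future tool could import."""
    return json.dumps([asdict(lesson) for lesson in lessons],
                      ensure_ascii=False, indent=2)


def import_lessons(data: str) -> list[Lesson]:
    """Rebuild Lesson objects from the JSON produced by export_lessons."""
    return [Lesson(**item) for item in json.loads(data)]
```

    Because the license travels with each record, the next person who builds a tool could import the collection and know what they are allowed to redistribute.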
  8. t123

    t123 New Member VIP member

    Joined:
    Jun 15, 2014
    Messages:
    25
    Native Language:
    English
    Advanced Languages:
    Afrikaans
    Intermediate Languages:
    German
    Basic Languages:
    Polish, Turkish
    If anyone is interested, I wrote one of these types of programs for myself, which by convenient coincidence is named ReadingTool. Basically it works similarly to LWT; here are some screenshots which probably explain it better. It's written in Python and PyQt, so in theory it should work on Windows/Linux/Mac (but I haven't tested on Mac yet). I'm still working on it and there's no installer yet, so you'll probably need a bit of computer know-how to install it. However, I'm reasonably confident it works, since I use it for my own data and haven't lost anything yet. The source code is here. (Note: if you use it for Turkish, it currently suffers from the common İi/Iı bug.)
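    For anyone curious about the İi/Iı bug: Turkish treats dotted İ/i and dotless I/ı as separate letters, but generic Unicode case mapping lowercases I to i and turns İ into i plus a combining dot, so word lookups silently fail. A minimal workaround sketch, assuming the text is known to be Turkish (turkish_lower and turkish_upper are hypothetical helpers, not ReadingTool's code):

```python
def turkish_lower(text: str) -> str:
    """Lowercase Turkish text, mapping İ -> i and I -> ı before generic lowering."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def turkish_upper(text: str) -> str:
    """Uppercase Turkish text, mapping i -> İ and ı -> I before generic uppercasing."""
    return text.replace("i", "İ").replace("ı", "I").upper()
```

    Python's built-in str.lower() ignores locale, which is why the special case has to be handled by hand (or by a locale-aware library such as ICU).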

    As for parsing the text, that is problematic. At the moment my program requires the text to be segmented already, which is a bit unfortunate because segmenters don't always do that great a job. I did come up with a plan to handle these types of languages in a generic way, but it wouldn't be compatible with languages that use spaces, so I never finished implementing it.
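    To illustrate why segmenters don't always do a great job, here is a sketch of forward maximum matching, one of the simplest dictionary-based segmentation methods (not necessarily what ReadingTool or any particular segmenter uses). The word list is a hypothetical toy; the example string is a classic ambiguity case.

```python
# Hypothetical toy word list; real segmenters use large lexicons and statistics.
WORDS = {"研究", "研究生", "生命", "命", "起源"}


def max_match(text: str, words: set[str] = WORDS) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    longest = max(len(word) for word in words)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in words:
                tokens.append(piece)
                i += length
                break
        else:
            # No dictionary word starts here: emit a single character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


print(max_match("研究生命起源"))
```

    On 研究生命起源 ("research the origin of life") the greedy choice grabs 研究生 ("graduate student") first and yields 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源. The same greedy strategy works on Thai, where it has the same kind of blind spots.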
    pat tou and Big_Dog like this.
  9. Bjorn

    Bjorn Active Member VIP member

    Joined:
    Apr 17, 2014
    Messages:
    165
    Native Language:
    Norwegian
    Intermediate Languages:
    English
    Basic Languages:
    French, German
    I have given up on LingQ. I export texts to my Kindle or to www.readlang.com
    and use an MP3 player. Good enough for me.
  10. Peregrinus

    Peregrinus Active Member

    Joined:
    May 27, 2014
    Messages:
    613
    Native Language:
    English
    Intermediate Languages:
    German
    Basic Languages:
    Spanish
    One aspect of LingQ or any such tool is not just the SRS portion but how words are actually counted. The way LingQ counts is a big turnoff for me and a large part of the reason I have never used it. They basically count all inflected and conjugated forms of a word as separate words, which grossly inflates the number of "known" words, and for highly inflected languages it is surely even uglier. Perhaps this is not an easily solvable problem, if it is solvable at all, at least for large amounts of data. There are lemmatizer tools online that you can use, but they usually limit the number of words you can input. And of course, lemmatization will vary by language.

    Iversen has always said that he counts dictionary headwords (lemmas), and that is what I do. So you count each word only once, not counting its inflected/conjugated forms, but you do count regularly derived words separately, such as related adjectives and adverbs. One thing I count separately is past participles that have their own dictionary entry, which isn't really an exception, since not all past participles have dictionary entries.

    So my point is that I believe any such tool that counts words (and it should) needs to count lemma forms.
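    A small sketch of the difference between counting surface forms and counting lemmas. The form-to-lemma table is a hypothetical stand-in for a real lemmatizer or a dictionary's list of forms.

```python
# Hypothetical form -> lemma table for a few German words.
LEMMAS = {
    "spricht": "sprechen",
    "sprach": "sprechen",
    "gesprochen": "sprechen",
    "haus": "haus",
    "häuser": "haus",
    "hauses": "haus",
}


def count_words(tokens: list[str]) -> tuple[int, int]:
    """Return (distinct surface forms, distinct lemmas) for a list of tokens."""
    forms = {token.lower() for token in tokens}
    # Forms missing from the table are treated as their own lemma.
    lemmas = {LEMMAS.get(form, form) for form in forms}
    return len(forms), len(lemmas)


print(count_words(["Er", "spricht", "sprach", "gesprochen", "Haus", "Häuser"]))
```

    Six surface forms collapse to three headwords (er, sprechen, haus). Under the rule above, a participle with its own dictionary entry would simply map to itself in the table and be counted separately.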
  11. edwin

    edwin New Member

    Joined:
    Jun 20, 2014
    Messages:
    1
    I have been frustrated with LingQ over the years, but I am still paying to use it month after month. The reason is that I cannot find another product like it!

    LWT is not a scalable product. It was never designed to be one. LingQ has performance issues from time to time, but overall it can scale.

    ReadLang and other similar products, in my opinion, are different from LingQ. They are more like online 'reader + dictionary' tools. LingQ provides a better 'engine' to manage my vocabulary.

    Like many LingQ users, I have a long list of complaints about LingQ's features and bugs. The LingQ product team seems to have its own priorities. But until another similar product arises, I am stuck with LingQ.
