Use a recognised scale for language proficiency

Issue #90 resolved
Kristian K created an issue

What scale is used now? Most likely the user will not know it. The European Language Portfolio (ELP) could be seen as the current standard for evaluating language skills. MT is not compatible with it, though, since passive and active skills are not separated and no distinction is made between understanding, speaking and writing proficiency.

The scale used by the ELP has six grades (A1, A2, B1, B2, C1 and C2), which I believe are already familiar to all of us.

Comments (28)

  1. Kristian K reporter

    A link to either scale should be shown in the preferences dialog as simple documentation and thus as user help.

  2. Peeter Tinits

    I did use a scale, and I described the logic behind it somewhere in these issues or the documentation. I can't find it just now, though. Using Wikipedia's guide is an idea (it would add more levels), though the page isn't translated into Estonian, for example, and it doesn't give names for each level.

  3. Peeter Tinits

    Copying the e-mail text on this feature here. If you think the scale should be improved, that's possible. But it should definitely include both the number (be it 1 or A1) and the name in the dropdown menu.

    Great to see things progressing! Language skills could have five levels; I based them on the ILR scale: https://en.wikipedia.org/wiki/ILR_scale 1) Elementary / Algeline 2) Limited / Piiratud 3) Good / Hea 4) Excellent / Suurepärane 5) Native / Emakeel

    I'm not 100% sure about the terms, but they should work. Of course, precise explanations of what each one means could also be added (e.g. in the quick start or help). Finding those explanations will take time, though (we can do without them at first).

    I propose adding one explanatory sentence under "languages" in the Preferences menu and under the "text corpora" menu.

  4. Peeter Tinits

    @keeleleek - I checked out the Wikipedia system, https://meta.wikimedia.org/wiki/User_language, and it seems a good system to adopt so as to stay compatible with Wikipedia in the long run. I can't find what their numbers are based on, though, and they lack one-word definitions for now, although maybe that can be fixed.

    I asked about the origins here too: https://meta.wikimedia.org/wiki/Talk:User_language, but I don't expect an answer. Actually, I have now found some discussion of it here: https://meta.wikimedia.org/wiki/Talk:User_language/2008. But the 0-5+N system seems to have simply emerged from practice; in 2008 some had preferred 1-3+N or 1-4+N.

    Also note that marking them up like this is not actually very popular: https://meta.wikimedia.org/wiki/Category:Users_by_language

    Still the template exists and may be useful for us in the long run.

  5. Kristian K reporter

    Since Wikipedia users do not often mark their language skills on Wikipedia, we can't rely on it being very popular within MT either. But remember also that not all cultures depend much on language for cultural identification or belonging.

    Simple proposal: use the 7-point scale used in Wikipedia. The translations can be found in Extension:Babel. Add a link to https://meta.wikimedia.org/wiki/User_language to the preferences pane. Since the translations are self-explanatory, this will not be the primary source of help for the user, but it is still a nice addition for clarity.

    As an example, the Estonian translation is this

  6. Kristian K reporter

    By using the same scale as Wikipedia, we can also add links to the "find other users knowing language X" or otherwise use the same infrastructure.

  7. Kristian K reporter

    Since Extension:Babel is used and it uses the 7 point scale, we can rely on it too. Peeter, you could leave a comment on the talk page about them not matching.

  8. Peeter Tinits

    Woah! It already has translations! Awesome. As for popularity, MT is a completely different context from regular Wikipedia. On WP you first have to find the template and decide that it is necessary; in MT it is just plain laziness if you don't mark the language skills that we ask you to mark at every step.

    Have you found any other info on 0-4+N vs 0-5+N? I personally find level 5 a bit weird to understand.

    The current translation tries to put it in words that are as simple as possible: hence "Basic" instead of "Elementary" and "Good" instead of "Intermediate". Any thoughts on this?

  9. Peeter Tinits

    Additional note: keep in mind that the translations can't be retrieved directly from Babel, as they are embedded in a sentence and are therefore sometimes too short or inflected. Four examples:

    "babel-0": "[[$2|$3]] — [[$1|oskus väga nõrk või puudub]]", "babel-1": "[[$2|$3]] — [[$1|algtase]]", "babel-2": "[[$2|$3]] — [[$1|keskmine tase]]", "babel-3": "[[$2|$3]] — [[$1|hea tase]]", "babel-4": "[[$2|$3]] — [[$1|emakeele lähedane tase]]", "babel-5": "[[$2|$3]] — [[$1|professionaalne tase]]", "babel-N": "[[$2|$3]] — [[$1|emakeel]]",

    "babel-0": "This user has [[$1|no]] knowledge of [[$2|$3]] (or understands it with considerable difficulty).", "babel-1": "This user has [[$1|basic]] knowledge of [[$2|$3]].", "babel-2": "This user has [[$1|intermediate]] knowledge of [[$2|$3]].", "babel-3": "This user has [[$1|advanced]] knowledge of [[$2|$3]].", "babel-4": "This user has [[$1|near native speaker]] knowledge of [[$2|$3]].", "babel-5": "This user has [[$1|professional]] knowledge of [[$2|$3]].", "babel-N": "This user has a [[$1|native]] understanding of [[$2|$3]].",

    "babel-0": "Taa pruukja mõist [[$1|väega veidüq vai ei sukugi]] [[$2|$3]] kiilt.", "babel-1": "Taa pruukja mõist [[$1|veidükese]] [[$2|$3]] kiilt.", "babel-2": "Taa pruukja mõist [[$1|küländ häste]] [[$2|$3]] kiilt.", "babel-3": "Taa pruukja mõist [[$1|väega häste]] [[$2|$3]] kiilt.", "babel-4": "Taa pruukja mõist [[$2|$3]] kiilt [[$1|pia nigu imäkiilt]].", "babel-5": "Taa pruukja om [[$2|$3]] keele pääle [[$1|vällä opnuq]].", "babel-N": "Taa pruukja [[$1|imäkiil]] om [[$2|$3]] kiil.",

    "babel-0": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} beherrscht [[$2|$3]] [[$1|nicht]] (oder versteht es nur mit beträchtlichen Schwierigkeiten).", "babel-1": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} beherrscht [[$2|$3]] auf [[$1|grundlegendem]] Niveau.", "babel-2": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} beherrscht [[$2|$3]] auf [[$1|fortgeschrittenem]] Niveau.", "babel-3": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} beherrscht [[$2|$3]] auf [[$1|hohem]] Niveau.", "babel-4": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} beherrscht [[$2|$3]] auf [[$1|muttersprachlichem Niveau]].", "babel-5": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} beherrscht [[$2|$3]] auf [[$1|professionellem]] Niveau.", "babel-N": "{{GENDER:$4|Dieser Benutzer|Diese Benutzerin}} spricht [[$2|$3]] als [[$1|Muttersprache]].",
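    To illustrate the point above, here is a minimal Python sketch that pulls the visible label out of the `[[$1|label]]` link in each Babel message. It assumes the messages are available as a dict of message key to string, as in the fragments quoted above; the name `level_label` is hypothetical. For Estonian the extracted label works as a level name on its own, while for English it yields bare words such as "no", which is why the labels can't be reused without adaptation.

```python
import re
from typing import Optional

# Matches a MediaWiki-style link whose target is the $1 placeholder,
# e.g. [[$1|algtase]], and captures the visible label after the pipe.
LEVEL_LINK = re.compile(r"\[\[\$1\|([^\]]*)\]\]")


def level_label(message: str) -> Optional[str]:
    """Return the label linked to $1 in a Babel message, or None if there is none."""
    match = LEVEL_LINK.search(message)
    return match.group(1) if match else None


# Two messages per language, copied from the fragments above.
estonian = {
    "babel-1": "[[$2|$3]] — [[$1|algtase]]",
    "babel-N": "[[$2|$3]] — [[$1|emakeel]]",
}
english = {
    "babel-0": "This user has [[$1|no]] knowledge of [[$2|$3]] (or understands it with considerable difficulty).",
    "babel-4": "This user has [[$1|near native speaker]] knowledge of [[$2|$3]].",
}

for messages in (estonian, english):
    for key, text in messages.items():
        print(key, "->", level_label(text))
```

    Applied to the German strings, the same function returns inflected forms such as "grundlegendem", which is the declension problem discussed in the next comment.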

  10. Peeter Tinits

    1) Estonian has great translations that don't use a sentence.

    2) English: the first "No" has to be changed to "No knowledge", I think. Also "Near native speaker" to "Near native"? I'm not sure about this one.

    3) Võro: The names are inflected; we need a native speaker to rephrase them, or to rephrase the context so that the inflected forms read naturally.

    4) German: Again, level 0 is complicated. The others are inflected or need extra context.

    Solutions? @keeleleek ? We may still have to rely on a skilled speaker coming up with one-word translations that are used only in MT. Other options?

  11. Kristian K reporter

    We need to use the translations as they are, since no one will translate or modify them. Instead, we need to adapt our preferences pane to suit the translations. I propose that the preferences page ask the user to "please mark which user best fits your language skills".

    There is no need to talk about adapting ILR or anything else, since we are not doing that. If we follow Wikipedia's language policies, then that's what we will say.

    Since Extension:Babel is versioned with Git, we can add it as a submodule and have a build script pull in the latest translation JSON files. They are not likely to change often, though; I think the Babel team aims for fairly stable release times.
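    A minimal sketch of such a build step in Python. It assumes the Babel submodule is checked out at a hypothetical path `vendor/Babel` and that its messages live in per-language files such as `i18n/et.json`, the usual layout for MediaWiki extensions; both the path and the layout should be checked against the actual repository. The mapping from `babel-0` to `babel-N` onto the `preferences.program.corpus.proficiency.*` keys follows the proposal in this thread, and the function names are illustrative.

```python
import json
from pathlib import Path

# Hypothetical location of the Extension:Babel git submodule; adjust to the real checkout.
BABEL_I18N_DIR = Path("vendor/Babel/i18n")

# Babel level suffixes in the order discussed here: 0-5, then N (native).
LEVELS = ["0", "1", "2", "3", "4", "5", "N"]
PREFIX = "preferences.program.corpus.proficiency"


def babel_to_properties(messages: dict) -> list:
    """Map babel-<level> messages to MT properties lines, skipping levels that are missing."""
    lines = []
    for level in LEVELS:
        text = messages.get(f"babel-{level}")
        if text is not None:
            lines.append(f"{PREFIX}.level{level} = {level} - {text}")
    return lines


def load_babel_messages(lang: str) -> dict:
    """Read one language file, assuming the i18n/<lang>.json layout."""
    with open(BABEL_I18N_DIR / f"{lang}.json", encoding="utf-8") as f:
        return json.load(f)
```

    A build script would then call `babel_to_properties(load_babel_messages("et"))` for each interface language and write the lines into the matching properties file. The values still contain Babel's wikitext markup at this point.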

  12. Peeter Tinits

    Yes, that's a solution. Use whatever is in the Babel translations as the content, the full sentence if it's there. So let's do this.

    Here's the solution, @andrjus: copy the sentences from Babel as they are: "Please mark which statement best matches the user's language skills." "1 - algtase" "1 - This user has basic knowledge of English."

    Since it currently depends on the interface language, we can just hardcode the translations for the 4 interface languages available. @andrjus, retrieving them from Git would be great, but if it's difficult, it can be left for future development.

    Small note to @keeleleek: whether the scale matches ILR is a separate issue from whether it matches Wikipedia's language policies. According to the documentation, the WP policies were not designed to match any theory or research in language learning. For linguistic analyses, however, such a match would be useful. This is not important at this stage, though; the two are just conveniently similar. We don't have to mention it in the corpus.

  13. Peeter Tinits

    The result would then be this. With no adaptation, the name of the language ought to be placed in the sentence too. Skill level 0 is the longest.

    preferences.program.corpus.proficiency.unspecified = Undefined

    preferences.program.corpus.proficiency.level0 = 0 - This user has no knowledge of [[$2|$3]] (or understands it with considerable difficulty).

    preferences.program.corpus.proficiency.level1 = 1 - This user has basic knowledge of [[$2|$3]].

    preferences.program.corpus.proficiency.level2 = 2 - This user has intermediate knowledge of [[$2|$3]].

    preferences.program.corpus.proficiency.level3 = 3 - This user has advanced knowledge of [[$2|$3]].

    preferences.program.corpus.proficiency.level4 = 4 - This user has near native speaker knowledge of [[$2|$3]].

    preferences.program.corpus.proficiency.level5 = 5 - This user has professional knowledge of [[$2|$3]].

    preferences.program.corpus.proficiency.levelN = N - This user has a native understanding of [[$2|$3]].
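    To place the language name in the sentence, the Babel wikitext has to be turned into plain text. A minimal Python sketch, assuming (as in the English example above) that `$3` stands for the language name and that links should simply show their label in the preferences pane. For `{{GENDER:...}}` it keeps the first form, which MediaWiki also uses when the gender is unknown. The function name `render_babel` is hypothetical.

```python
import re

# Any [[target|label]] link: keep only the visible label.
WIKILINK = re.compile(r"\[\[[^|\]]*\|([^\]]*)\]\]")
# {{GENDER:user|form1|form2...}}: keep the first form.
GENDER = re.compile(r"\{\{GENDER:[^|}]*\|([^|}]*)[^}]*\}\}")


def render_babel(message: str, language_name: str) -> str:
    """Turn a Babel message into plain text for a dropdown entry.

    $3 is assumed to be the language name; link targets ($1, $2) are dropped
    because the preferences pane shows plain text, not links.
    """
    text = GENDER.sub(r"\1", message)
    text = WIKILINK.sub(r"\1", text)
    return text.replace("$3", language_name)


level1 = "This user has [[$1|basic]] knowledge of [[$2|$3]]."
print("1 - " + render_babel(level1, "English"))
```

    The same function handles the German strings, e.g. turning the `babel-1` message into "Dieser Benutzer beherrscht Englisch auf grundlegendem Niveau." when given "Englisch".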

  14. Peeter Tinits

    I think we need manual involvement here, adapting each language while doing the interface translations (for now, a new language only becomes relevant together with a new interface language, and these may be slow to arrive). The following seems much more realistic:

    Maybe an explanation before the table = Choose the skill level the user has in the language. (This sentence should ask the question in a way that lets us most easily use few-word labels for each level.)

    preferences.program.corpus.proficiency = Skill level

    preferences.program.corpus.proficiency.unspecified = Undefined

    preferences.program.corpus.proficiency.level0 = 0 - no knowledge

    preferences.program.corpus.proficiency.level1 = 1 - basic knowledge

    preferences.program.corpus.proficiency.level2 = 2 - intermediate knowledge

    preferences.program.corpus.proficiency.level3 = 3 - advanced knowledge

    preferences.program.corpus.proficiency.level4 = 4 - near native speaker knowledge

    preferences.program.corpus.proficiency.level5 = 5 - professional knowledge

    preferences.program.corpus.proficiency.levelN = N - native understanding

  15. Peeter Tinits

    Thoughts, @keeleleek? Currently we need a solution for English, Võro, Estonian and Russian, I think. Making it easier to add more languages in the future is also a good goal.

  16. Peeter Tinits

    I think the solution for now, @andrjus, would be:

    (@keeleleek - do you agree?)

    1) Make levels: Undefined+[0-5]+Native

    2) Use the same system for display, and let's generate one-word or few-word names for the levels based on Wikipedia's language skill ranks.

    English: 1) Undefined 2) 0 - No knowledge (or limited understanding) 3) 1 - Basic knowledge 4) 2 - Intermediate knowledge 5) 3 - Advanced knowledge 6) 4 - Near native speaker 7) 5 - Professional knowledge 8) N - Native understanding

    Eesti 1) Määratlemata 2) 0 - Oskus väga nõrk või puudub 3) 1 - Algtase 4) 2 - Keskmine tase 5) 3 - Hea tase 6) 4 - Emakeele lähedane tase 7) 5 - Professionaalne tase 8) N - Emakeel

    Võro

    1) - 2) 0 - Väega veidüq vai ei sukugi 3) 1 - Veidükese 4) 2 - Küländ häste 5) 3 - Väega häste 6) 4 - Pia nigu imäkiilt 7) 5 - Vällä opnuq 8) N - Imäkiil

    Russian (anyone with a Cyrillic keyboard for capitalization?):

    1) неопределенным 2) 0 - не владеет (или с трудом понимает) 3) 1 - начальный уровень 4) 2 - средний уровень 5) 3 - хороший уровень 6) 4 - почти как родной 7) 5 - профессиональный уровень 8) N - родной

    Hopefully we can get them more automatically later, but for now, let's just migrate to Wikipedia language skills category system.
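    The "Undefined + [0-5] + Native" scheme from point 1) can be sketched as a fixed, ordered set of levels whose labels come from the properties keys listed in comment 14. A minimal Python sketch, assuming simple `key = value` lines (no escapes or line continuations); the names `Proficiency`, `parse_properties` and `dropdown_labels` are hypothetical. A missing translation falls back to the raw key, so gaps in a new interface language are easy to spot.

```python
from enum import Enum

PREFIX = "preferences.program.corpus.proficiency"


class Proficiency(Enum):
    """Undefined + levels 0-5 + Native, in dropdown order; values are key suffixes."""
    UNSPECIFIED = "unspecified"
    LEVEL_0 = "level0"
    LEVEL_1 = "level1"
    LEVEL_2 = "level2"
    LEVEL_3 = "level3"
    LEVEL_4 = "level4"
    LEVEL_5 = "level5"
    NATIVE = "levelN"


def parse_properties(text: str) -> dict:
    """Minimal 'key = value' reader; skips blank lines and lines starting with #."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def dropdown_labels(entries: dict) -> list:
    """Labels in enum order, falling back to the raw key when a translation is missing."""
    labels = []
    for level in Proficiency:
        key = f"{PREFIX}.{level.value}"
        labels.append(entries.get(key, key))
    return labels
```

    Keeping the order in one place means every interface language shows the same eight entries, and only the labels differ.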

  17. Peeter Tinits

    Added the new scale in a fork for English, Estonian and Russian. They still need translations for other updates, though.
