Description of subcorpora

Issue #23
Promme Bosken created an issue

It would be useful for us as users/content suppliers to have a list of all subcorpora, each with a brief description. For example:
ald = subcorpus in the Academy spelling (1937?-1980?)
int = subcorpus in the Selskip spelling (1870?-1937?)
(and so on)

The abbreviations appear in filenames, and they correspond to
- different time slices
- different degrees of annotation (e.g., with or without lemmatisation)
- different spelling conventions of the Frisian languages

So we need a brief description of each subcorpus.
This leads me to my second proposal. Apart from a FAQ for ordinary users, there should also be a FAQ2 (a repository of information) for the non-ICT experts who are involved in an ICT product. In this case, that means a FAQ2 for us, the linguists involved in the FLC. Both FAQ1 and FAQ2 should, of course, be co-authored by ICT (Eduard). The basic technical knowledge about the corpus, which we as linguists do not have, must be written down somewhere. At present, whenever we discuss the FLC, it is as if we are “writing in the sand”, because we lack shared, precise knowledge of the corpus.

Comments (1)

  1. Fryske Akademy (repo owner)

    Yes, but I wouldn’t call it a FAQ, because it is not one. Perhaps “corpus description”.