02 May
Serbian lexicon: srLex
srLex is an inflectional lexicon of Serbian.
The lexicon covers 108,829 lemmas, amounting to 5,326,726 surface forms.
Each entry in the lexicon is a quintuple (wordform, lemma, MSD, absolute frequency, frequency per million), where MSD is the morphosyntactic description, e.g. (ženu, žena, Ncfsa, 15838, 0.028556). The frequencies were estimated on the Serbian web corpus srWaC.
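To illustrate how such quintuples might be loaded, here is a minimal Python sketch that parses entries and builds a lookup index from wordform to its analyses. It assumes, hypothetically, that the distributed file holds one entry per line with the five fields tab-separated in the order listed above; check the actual release format before relying on it. The names `Entry`, `parse_line` and `build_index` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    """One srLex entry: (wordform, lemma, MSD, absolute frequency, per-million frequency)."""
    wordform: str
    lemma: str
    msd: str
    abs_freq: int
    rel_freq: float


def parse_line(line: str) -> Entry:
    # Assumption (hypothetical file layout): five tab-separated fields per line.
    wordform, lemma, msd, abs_freq, rel_freq = line.rstrip("\n").split("\t")
    return Entry(wordform, lemma, msd, int(abs_freq), float(rel_freq))


def build_index(lines: Iterable[str]) -> Dict[str, List[Entry]]:
    """Map each wordform to all of its analyses, most frequent first."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            entry = parse_line(line)
            index[entry.wordform].append(entry)
    for analyses in index.values():
        analyses.sort(key=lambda e: e.abs_freq, reverse=True)
    return dict(index)


# The single example entry quoted above, written in the assumed tab-separated form.
sample = ["ženu\tžena\tNcfsa\t15838\t0.028556\n"]
index = build_index(sample)
print(index["ženu"][0].lemma, index["ženu"][0].msd)  # žena Ncfsa
```

The index maps each wordform to a list rather than a single entry because, in an inflectional lexicon, one surface form can correspond to several (lemma, MSD) analyses.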
The morphosyntactic tags used in the lexicon follow the MULTEXT-East V5 tagset for Bosnian (and Serbian).
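MULTEXT-East MSDs are positional: the first character gives the part of speech and each following character encodes one attribute value. The sketch below decodes noun tags only, so that Ncfsa reads as a common, feminine, singular, accusative noun. The attribute order and value codes reflect our understanding of the MULTEXT-East noun specification and should be verified against the official tagset tables; `NOUN_ATTRIBUTES` and `decode_noun_msd` are hypothetical names.

```python
from typing import Dict

# Assumed positional layout for noun MSDs (characters after the leading "N").
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative", "a": "accusative",
              "v": "vocative", "l": "locative", "i": "instrumental"}),
    ("Animate", {"n": "no", "y": "yes"}),
]


def decode_noun_msd(msd: str) -> Dict[str, str]:
    """Expand a noun MSD such as 'Ncfsa' into attribute/value pairs."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun tag: {msd!r}")
    features = {"Category": "noun"}
    # zip stops at the shorter sequence, so omitted trailing attributes are simply skipped.
    for (name, values), code in zip(NOUN_ATTRIBUTES, msd[1:]):
        if code == "-":  # '-' marks an attribute that does not apply
            continue
        features[name] = values[code]
    return features


print(decode_noun_msd("Ncfsa"))
# {'Category': 'noun', 'Type': 'common', 'Gender': 'feminine', 'Number': 'singular', 'Case': 'accusative'}
```

Covering other parts of speech would require one attribute table per category, following the same positional scheme.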
srLex can also be queried through our online interface, which doubles as an API (application programming interface).
Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia.