Serbian lexicon: srLex

srLex is an inflectional lexicon of Serbian.
The lexicon contains 169,328 lemmas, corresponding to 6,905,941 surface forms.
Each entry in the lexicon is an 8-tuple: (wordform, lemma, MSD, MSD features, UPOS, morphological features, absolute frequency, per-million frequency). The frequencies were estimated on the Serbian web corpus srWaC.
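
The entry structure described above can be sketched in Python. This is a minimal sketch, not the official loader: it assumes the raw text file stores one entry per line with eight tab-separated columns in the order listed above, which should be checked against the downloaded file. The names `Entry` and `read_entries` are hypothetical, and the sample rows (including their tags, feature strings and frequencies) are invented for illustration and are not taken from srLex.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """One lexicon entry; field order mirrors the 8-tuple described in the text."""
    wordform: str
    lemma: str
    msd: str
    msd_features: str
    upos: str
    morph_features: str
    abs_freq: int
    per_million_freq: float


def read_entries(stream: Iterable[str]) -> Iterator[Entry]:
    """Yield an Entry per non-empty line.

    Assumption: columns are tab-separated and appear in the order above.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 8:
            raise ValueError(f"expected 8 columns, got {len(row)}: {row!r}")
        yield Entry(*row[:6], int(row[6]), float(row[7]))


# Invented sample rows, for illustration only (not real srLex data).
SAMPLE = (
    "kuća\tkuća\tNcfsn\tType=common|Gender=feminine\tNOUN\t"
    "Case=Nom|Gender=Fem|Number=Sing\t1200\t2.5\n"
    "kuće\tkuća\tNcfsg\tType=common|Gender=feminine\tNOUN\t"
    "Case=Gen|Gender=Fem|Number=Sing\t800\t1.7\n"
    "raditi\traditi\tVmn\tType=main|VForm=infinitive\tVERB\t"
    "VerbForm=Inf\t500\t1.0\n"
)

entries = list(read_entries(io.StringIO(SAMPLE)))
lemmas = {entry.lemma for entry in entries}
print(f"{len(lemmas)} lemmas, {len(entries)} surface forms")
```

Counting distinct lemmas against the number of rows, as in the last lines, is the same distinction the page draws between lemmas and surface forms. For the real file, `read_entries` could be given a file object opened with `encoding="utf-8"` (the encoding is also an assumption).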

The morphosyntactic tags used in the lexicon follow the MULTEXT-East V6 tagset for the Serbo-Croatian macrolanguage, available here.
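
MULTEXT-East morphosyntactic descriptions (MSDs) are positional strings: the first character encodes the word category, each later position encodes one attribute, and a hyphen marks an attribute that does not apply. The following is a minimal sketch of splitting such a tag. The category table reflects the general MULTEXT-East convention as understood here and should be verified against the V6 specification for Serbo-Croatian; the function names and the example tags are hypothetical.

```python
from typing import List, Tuple

# Assumed first-position category codes (general MULTEXT-East convention);
# verify against the tagset specification before relying on them.
MSD_CATEGORIES = {
    "N": "Noun",
    "V": "Verb",
    "A": "Adjective",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Adposition",
    "C": "Conjunction",
    "M": "Numeral",
    "Q": "Particle",
    "I": "Interjection",
    "Y": "Abbreviation",
    "X": "Residual",
    "Z": "Punctuation",
}


def msd_category(msd: str) -> str:
    """Return the category name for the first character of an MSD tag."""
    if not msd:
        raise ValueError("empty MSD tag")
    return MSD_CATEGORIES.get(msd[0], "Unknown")


def msd_attribute_codes(msd: str) -> List[Tuple[int, str]]:
    """Return (position, code) pairs for the attributes that are set.

    Position 0 is the category, so attribute positions start at 1.
    Hyphens mark non-applicable attributes and are skipped.
    """
    return [(i, code) for i, code in enumerate(msd) if i > 0 and code != "-"]


# Illustrative tags only; what each attribute position means depends on
# the category and is defined in the tagset specification.
print(msd_category("Ncfsn"), msd_attribute_codes("Ncfsn"))
print(msd_category("Px-sn"), msd_attribute_codes("Px-sn"))
```

The sketch deliberately stops at positions and codes: mapping them to attribute names such as gender or case needs the per-category tables from the tagset itself.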

Authors
Nikola Ljubešić
Availability
For local use, srLex can be downloaded as a raw text file here.
srLex can also be accessed and queried via our web services, which can also be used as an API (application programming interface).
Publications
The lexicon and its construction process have been described in detail in the following paper:
Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia. [Link] [.bib]


Licence and citation

The resource on this page is available under the Creative Commons Attribution-ShareAlike 4.0 International License. By downloading the resource, you agree to the terms of use defined by this license.

When using the resource, you must cite the papers listed with it as well as the ReLDI repository page.