Serbian lexicon: srLex

srLex is an inflectional lexicon of Serbian.
The lexicon covers 108,829 lemmas, corresponding to 5,326,726 surface forms.
Each entry in the lexicon is a (wordform, lemma, MSD, absolute frequency, per-million frequency) quintuple, e.g. (ženu, žena, Ncfsa, 15838, 0.028556). The frequencies were estimated on the Serbian web corpus srWaC.
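As a minimal sketch of how one entry could be read into a typed record, the Python snippet below assumes that the raw text file stores one entry per line with the five fields tab-separated in the order listed above. The field layout, the `Entry` type and the `parse_entry` function are illustrative assumptions, not part of the srLex distribution.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One srLex entry (hypothetical record type for illustration)."""
    wordform: str
    lemma: str
    msd: str
    abs_freq: int
    pm_freq: float


def parse_entry(line: str) -> Entry:
    # Assumption: five tab-separated fields in the order
    # wordform, lemma, MSD, absolute frequency, per-million frequency.
    wordform, lemma, msd, abs_freq, pm_freq = line.rstrip("\r\n").split("\t")
    return Entry(wordform, lemma, msd, int(abs_freq), float(pm_freq))


# The example entry from the description, written in the assumed layout.
entry = parse_entry("ženu\tžena\tNcfsa\t15838\t0.028556\n")
```

A `NamedTuple` keeps the record immutable and lets fields be accessed by name (`entry.lemma`, `entry.msd`) while still unpacking like the original quintuple.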

The set of morphosyntactic description tags (MSDs) used in the lexicon follows the MULTEXT-East V5 tagset for Bosnian (and Serbian), available here.
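MSD tags are positional: the first letter gives the part of speech and each following letter encodes one attribute, with `-` marking an attribute that does not apply. The sketch below decodes noun tags only, so that, for example, `Ncfsa` reads as a common feminine singular noun in the accusative. The attribute order and value codes are our reading of the MULTEXT-East noun specification and should be checked against the official tagset; `decode_noun_msd` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Assumed positional attributes for nouns (N), in tag order,
# per our reading of the MULTEXT-East V5 specification.
NOUN_ATTRIBUTES: List[Tuple[str, Dict[str, str]]] = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "v": "vocative", "l": "locative",
              "i": "instrumental"}),
    ("Animate", {"n": "no", "y": "yes"}),
]


def decode_noun_msd(msd: str) -> Dict[str, str]:
    """Expand a noun MSD such as 'Ncfsa' into named features."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun MSD: {msd!r}")
    features = {"POS": "noun"}
    # zip stops at the shorter sequence, so trailing attributes may be omitted.
    for (name, values), code in zip(NOUN_ATTRIBUTES, msd[1:]):
        if code == "-":
            continue  # attribute not applicable for this word
        value = values.get(code)
        if value is None:
            raise ValueError(f"unknown {name} code {code!r} in {msd!r}")
        features[name] = value
    return features
```

Restricting the sketch to nouns keeps it short; other parts of speech have their own attribute lists in the specification and would need their own tables.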

Nikola Ljubešić, Filip Klubička
For local use, srLex can be downloaded as a raw text file here.
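For lookups on the downloaded file, one simple approach is to index all analyses by wordform. The sketch below assumes the same one-entry-per-line, tab-separated layout as the quintuples described above; the filename `srLex.txt` and the `build_index` function are hypothetical. Analyses of ambiguous wordforms are sorted by absolute frequency, most frequent first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Analysis = Tuple[str, str, int]  # (lemma, MSD, absolute frequency)


def build_index(lines: Iterable[str]) -> Dict[str, List[Analysis]]:
    """Map each wordform to all of its (lemma, MSD, frequency) analyses."""
    index: Dict[str, List[Analysis]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Assumed field order: wordform, lemma, MSD, abs. freq., per-million freq.
        wordform, lemma, msd, abs_freq, _pm_freq = line.split("\t")
        index[wordform].append((lemma, msd, int(abs_freq)))
    # Most frequent analysis first for ambiguous wordforms.
    for analyses in index.values():
        analyses.sort(key=lambda a: a[2], reverse=True)
    return dict(index)


# Usage (hypothetical filename):
# with open("srLex.txt", encoding="utf-8") as f:
#     index = build_index(f)
# index.get("ženu")
```

Passing the open file object directly streams the lines, so the raw text is never held in memory as a single string; only the resulting index is.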
srLex can also be queried through our online interface, which also serves as an API (application programming interface); the interface is available here.
The lexicon and its construction process have been described in detail in the following paper:
Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia.

Licence and citation

The resource on this page is available under the GNU General Public License 3.0. By downloading the resource, you agree to the terms of use defined by this license.

When using the resource, it is necessary to cite the paper listed above as well as the ReLDI repository page.