Croatian lexicon: hrLex

hrLex is an inflectional lexicon of Croatian.
The lexicon contains 103,077 lemmas and 4,970,520 surface forms.
Each entry in the lexicon is a (word form, lemma, MSD, absolute frequency, in-million frequency) quintuple, e.g.: (ženu, žena, Ncfsa, 54158, 0.038746). The frequencies were estimated on the Croatian web corpus hrWaC.
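The entry structure described above can be read into typed records with a few lines of Python. This is only a minimal sketch: it assumes the raw text file stores one entry per line with tab-separated fields in the order given above, which this page does not state, and the names `LexEntry` and `parse_entries` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class LexEntry(NamedTuple):
    form: str        # inflected surface form, e.g. "ženu"
    lemma: str       # dictionary form, e.g. "žena"
    msd: str         # morphosyntactic description, e.g. "Ncfsa"
    abs_freq: int    # absolute frequency estimated on hrWaC
    rel_freq: float  # "in-million" frequency as listed in the lexicon


def parse_entries(lines: Iterable[str]) -> Iterator[LexEntry]:
    """Yield one LexEntry per non-empty line.

    ASSUMPTION: fields are tab-separated and appear in the order
    (form, lemma, MSD, absolute frequency, in-million frequency).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        form, lemma, msd, abs_freq, rel_freq = line.split("\t")
        yield LexEntry(form, lemma, msd, int(abs_freq), float(rel_freq))


# The example entry from this page, written in the assumed layout.
sample = ["ženu\tžena\tNcfsa\t54158\t0.038746\n"]
entries = list(parse_entries(sample))
print(entries[0].lemma, entries[0].msd)
```

If the downloaded file uses a different separator or column order, only the `split` line and the unpacking need to change.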

The set of morphosyntactic tags used in the lexicon follows the MULTEXT-East V5 tagset for Croatian, available here.
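MULTEXT-East tags are positional: the first character gives the part of speech and each following character encodes one attribute. As an illustration, the sketch below decodes noun tags such as `Ncfsa` from the example entry. The attribute table is a simplified assumption covering only the first four noun positions and should be checked against the tagset specification; `NOUN_POSITIONS` and `decode_noun_msd` are hypothetical names.

```python
# ASSUMPTION: simplified positional table for noun tags (category "N") only.
NOUN_POSITIONS = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "v": "vocative", "l": "locative",
        "i": "instrumental",
    }),
]


def decode_noun_msd(msd: str) -> dict:
    """Map a noun MSD string to a dictionary of readable features."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun tag: {msd!r}")
    features = {"Category": "Noun"}
    # Pair each attribute position with the corresponding tag character.
    for (name, values), code in zip(NOUN_POSITIONS, msd[1:]):
        if code != "-":  # "-" marks an attribute that does not apply
            features[name] = values.get(code, code)
    return features


decoded = decode_noun_msd("Ncfsa")
print(decoded)
```

Unknown codes are passed through unchanged rather than rejected, so the sketch degrades gracefully on tags that the simplified table does not cover.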

Nikola Ljubešić, Filip Klubička
For local use, hrLex can be downloaded as a raw text file here.
hrLex can also be queried through our online interface, which doubles as an API (application programming interface) and can be found here.
The lexicon and its construction are described in detail in the following paper:
Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia.

Licence and citation

The resource on this page is available under the GNU General Public License 3.0. By downloading the resource, you agree to the terms of use defined by this license.

When using the resource, you must cite the paper listed above as well as the ReLDI repository page.