Croatian lexicon: hrLex

hrLex is an inflectional lexicon of Croatian.
The size of the lexicon is 164,206 lemmas, or 6,427,709 4,970,520 surface forms.
Each entry in the lexicon consists of a (word form, lemma, MSD, MSD features, UPOS, morphological features, absolute frequency, per-million frequency) 8-tuple. The frequencies were estimated on the Croatian web corpus hrWaC.
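As an illustration of the 8-tuple structure described above, the following minimal Python sketch reads lexicon entries into a typed record. It assumes that the raw text file stores one entry per line with tab-separated fields in the order listed above; that layout, the names `LexEntry`, `parse_line` and `read_entries`, and the sample line (including its tag strings and frequency values) are illustrative assumptions, not taken from the distributed file.

```python
from typing import Iterable, Iterator, NamedTuple


class LexEntry(NamedTuple):
    """One lexicon entry; field order follows the 8-tuple described above."""
    form: str
    lemma: str
    msd: str
    msd_features: str
    upos: str
    morph_features: str
    abs_freq: int
    per_million_freq: float


def parse_line(line: str) -> LexEntry:
    """Parse one line, ASSUMING eight tab-separated fields (not confirmed by the source)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    form, lemma, msd, msd_feats, upos, feats, abs_freq, pm_freq = fields
    return LexEntry(form, lemma, msd, msd_feats, upos, feats,
                    int(abs_freq), float(pm_freq))


def read_entries(lines: Iterable[str]) -> Iterator[LexEntry]:
    """Yield entries from an iterable of lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_line(line)


# Invented sample line for illustration only; the values are not real lexicon data.
SAMPLE = (
    "kuće\tkuća\tNcfsg\t"
    "Type=common|Gender=feminine|Number=singular|Case=genitive\t"
    "NOUN\tCase=Gen|Gender=Fem|Number=Sing\t1000\t0.5"
)

entry = parse_line(SAMPLE)
print(entry.form, entry.lemma, entry.upos, entry.abs_freq)

# With a downloaded copy (the path is hypothetical), the same function could be used as:
# with open("hrLex.txt", encoding="utf-8") as handle:
#     for e in read_entries(handle):
#         ...
```

Reading the file lazily through a generator keeps memory use low, which matters for a lexicon with millions of entries.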

The set of morphosyntactic tags used in the lexicon follows the MULTEXT-East V6 tagset for the Serbo-Croatian macro-language, available here.

Nikola Ljubešić
For local use, hrLex can be downloaded as a raw text file here.
hrLex can also be accessed and queried via our web services, which can also be used as an API (application programming interface).
The lexicon and its construction process have been described in detail in the following paper:
Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia.

Licence and citation

The resource on this page is available under the Creative Commons Attribution-ShareAlike 4.0 International License. By downloading the resource, you agree to the terms of use defined by this license.

When using the resource, you must cite the paper listed above as well as the ReLDI repository page.