Croatian and Serbian lemmatiser [legacy]

This tool is considered legacy because the NLP pipeline achieves better results on the same task. The pipeline, however, is not yet available as a web service.

A tool for automatic lemmatisation (returning the base or dictionary form of an inflected word). The tool looks words up in the hrLex/srLex lexicons and, for out-of-vocabulary (OOV) words, uses a predictive model that was trained on available corpora and lexicons.
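The lookup-then-predict strategy described above can be illustrated with a minimal sketch. This is not the tool's actual code: the three-entry lexicon, the suffix rules that stand in for the trained predictive model, and the function names are all hypothetical and serve only to show the control flow.

```python
# Toy illustration only. TOY_LEXICON and TOY_SUFFIX_RULES are hypothetical
# stand-ins, not data taken from hrLex/srLex or from the trained model.

# Inflected form -> lemma, playing the role of the lexicon.
TOY_LEXICON = {
    "kuće": "kuća",
    "gradovima": "grad",
    "knjigama": "knjiga",
}

# Stand-in for the predictive model: (suffix, replacement) rules,
# tried in order, longest suffix first.
TOY_SUFFIX_RULES = [
    ("ovima", ""),
    ("ama", "a"),
    ("e", "a"),
]


def guess_lemma(token):
    """Guess a lemma for an out-of-vocabulary token by rewriting its suffix."""
    for suffix, replacement in TOY_SUFFIX_RULES:
        # Require at least two characters of stem to remain.
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)] + replacement
    return token  # no rule applies: return the token unchanged


def lemmatise(token):
    """Return (lemma, source): lexicon lookup first, guess for OOV tokens."""
    key = token.lower()
    if key in TOY_LEXICON:
        return TOY_LEXICON[key], "lexicon"
    return guess_lemma(key), "guess"


if __name__ == "__main__":
    for word in ["kuće", "Gradovima", "školama", "mostovima"]:
        print(word, lemmatise(word))
```

In this sketch the first two words are resolved by the lexicon, while "školama" and "mostovima" fall through to the guesser. A real predictive model would learn such transformations from corpora and lexicons rather than rely on hand-written rules.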

Nikola Ljubešić
The lemmatiser is freely available in three forms:
  1. For local use, the code and models of the lemmatiser can be downloaded from this GitHub repository.
  2. The lemmatiser web service can be used online via our web interface, which can be found here.
  3. Our web service can be accessed through our Python library, which can also be downloaded from the CLARIN.SI GitHub repository. Instructions on how to install the ReLDI library from GitHub can be found here (in Serbian). Alternatively, the easiest way to install it is through PyPI from the command line. (Detailed instructions are also on GitHub.)

The third option, i.e. using the ReLDI Python library, is the recommended one for handling larger amounts of data.

Licence and citation

The software on this page is available under the Apache License 2.0. By downloading the software, you agree to the terms of use defined by this license.

When using the software, you must cite the papers listed with it as well as the ReLDI repository page.