Croatian and Serbian part of speech (POS) and morphosyntactic (MSD) tagger [legacy]

This tool is considered legacy because the NLP pipeline achieves better results on the same task. The pipeline, however, is not yet available as a web service.

The tagger automatically annotates text at the morphosyntactic level. It can tag both Croatian and Serbian, as models for both languages are included in the tool.
The tagger is based on a conditional random field (CRF) model trained on a 500,000-token Croatian training corpus, together with the hrLex and srLex lexicons for the respective languages.

The set of morphosyntactic tags used in the corpus follows the revised MULTEXT-East V5 tagset for Croatian and Serbian, available here.
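
To illustrate how a positional MULTEXT-East tag is read, the sketch below decodes noun tags only. It is a minimal illustration and not part of the tagger: the function name `decode_noun_msd` and the lookup table are hypothetical, and the table covers only the first four noun attributes (type, gender, number, case). The tagset specification remains the authoritative reference for all categories and values.

```python
# Partial, illustrative lookup table for noun tags (category "N") only.
# Each entry describes one character position after the category letter.
NOUN_POSITIONS = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "v": "vocative", "l": "locative",
        "i": "instrumental",
    }),
]


def decode_noun_msd(tag):
    """Decode the first four attributes of a noun MSD tag such as 'Ncmsn'.

    Hypothetical helper: positions beyond the fourth attribute are ignored,
    '-' (attribute not specified) is skipped, and unknown codes are kept raw.
    """
    if not tag or tag[0] != "N":
        raise ValueError("this sketch only handles noun tags, got %r" % tag)
    features = {"Category": "Noun"}
    for (name, values), code in zip(NOUN_POSITIONS, tag[1:]):
        if code == "-":
            continue
        features[name] = values.get(code, code)
    return features


print(decode_noun_msd("Ncmsn"))
# {'Category': 'Noun', 'Type': 'common', 'Gender': 'masculine',
#  'Number': 'singular', 'Case': 'nominative'}
```

The same table-driven approach could be extended to other categories by adding one position list per category letter.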

Accuracies calculated on test sets for each language:
  • Croatian: 92.53%
  • Serbian: 92.33%
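
Assuming the figures above are token-level accuracies, i.e. the percentage of test-set tokens whose predicted tag equals the gold tag, the computation can be sketched as follows. The function name and the toy tag sequences are illustrative only and are not taken from the actual test sets.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Return the percentage of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return 100.0 * correct / len(gold_tags)


# Toy example: one of four tokens is tagged incorrectly.
gold = ["Ncmsn", "Vmr3s", "Rgp", "Z"]
predicted = ["Ncmsn", "Vmr3s", "Rgc", "Z"]
print(tagging_accuracy(gold, predicted))  # 75.0
```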

Nikola Ljubešić

The tagger is freely available in three forms:
  1. For local use, the code and models of the tagger can be downloaded from this GitHub repository.
  2. The tagger web service can be used online via our web interface, which can be found here.
  3. Our web service can be accessed through our Python library, which can also be downloaded from the CLARIN.SI GitHub repository. Instructions on how to install the ReLDI library from GitHub can be found here (in Serbian). Alternatively, the easiest way to install it is through PyPI from the command line; detailed instructions are also available on GitHub.

The third option, i.e. using the ReLDI Python library, is the recommended one for handling larger amounts of data.
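
When processing a large text programmatically, one possible approach is to split it into paragraph-aligned chunks and send each chunk to the tagger separately. The sketch below assumes this workflow and is not taken from the ReLDI library: `tag_fn` is a hypothetical placeholder for whatever tagging call you use, and the `max_chars` default is an arbitrary illustrative value, not a documented limit of the web service.

```python
def chunk_paragraphs(text, max_chars=5000):
    """Group non-empty lines (paragraphs) into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Count the newline that joins this paragraph to the previous one.
        added = len(paragraph) + (1 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(paragraph)
        current.append(paragraph)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


def tag_corpus(text, tag_fn, max_chars=5000):
    """Apply tag_fn (hypothetical tagging callable) to each chunk, in order."""
    return [tag_fn(chunk) for chunk in chunk_paragraphs(text, max_chars)]


# Usage with a stand-in function instead of a real tagger call.
print(tag_corpus("aaa\nbbb\n\nccc", str.upper, max_chars=7))
# ['AAA\nBBB', 'CCC']
```

Splitting on paragraph boundaries keeps sentences intact, so the tagger never sees a sentence cut in half.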

The tagger and its construction process have been described in detail in the following paper:
Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia. [Link] [.bib]

Licence and citation

The software on this page is available under the Apache License 2.0. By downloading the software, you agree to the terms of use defined by this license.

When using the software, you must cite the papers listed with it as well as the ReLDI repository page.