Stemmers for Serbian and Croatian: SCStemmers

This package is a Java reimplementation of four previously published stemming algorithms for Serbian and Croatian:

  • The greedy and the optimal subsumption-based stemmers for Serbian, by Vlado Kešelj and Danko Šipka
  • A refinement of the greedy subsumption-based stemmer, by Nikola Milošević
  • A “Simple stemmer for Croatian v0.1”, by Nikola Ljubešić and Ivan Pandžić

All the stemmers expect UTF-8 encoded input text and produce UTF-8 encoded output.
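Since both input and output are UTF-8, it is safest to state the charset explicitly when reading and writing files in Java rather than rely on the platform default. The following is a minimal sketch of that pattern, not the package's actual API: the class `Utf8StemmingSketch`, the method `processUtf8`, and the `stemText` parameter are hypothetical names introduced for illustration, and `stemText` stands in for whichever stemmer call you use (see the GitHub repository documentation for the real classes).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

public class Utf8StemmingSketch {

    /**
     * Reads inputFile as UTF-8, applies stemText to the whole text,
     * and writes the result to outputFile as UTF-8.
     * stemText is a hypothetical placeholder for a real stemmer call.
     */
    static void processUtf8(Path inputFile, Path outputFile,
                            UnaryOperator<String> stemText) throws IOException {
        // Decode explicitly as UTF-8 instead of using the platform default charset.
        String text = new String(Files.readAllBytes(inputFile), StandardCharsets.UTF_8);
        String stemmed = stemText.apply(text);
        // Encode explicitly as UTF-8 on the way out as well.
        Files.write(outputFile, stemmed.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("stemmer-input", ".txt");
        Path out = Files.createTempFile("stemmer-output", ".txt");

        // "Šišmiš leti noću", written with Unicode escapes so the source
        // file itself does not depend on the compiler's encoding setting.
        String sample = "\u0160i\u0161mi\u0161 leti no\u0107u";
        Files.write(in, sample.getBytes(StandardCharsets.UTF_8));

        // The identity function is used here only as a stand-in for a stemmer.
        processUtf8(in, out, UnaryOperator.identity());

        String roundTrip = new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        System.out.println("Round trip preserved text: " + sample.equals(roundTrip));
    }
}
```

Passing the charset explicitly matters mostly on systems whose default encoding is not UTF-8, where letters such as š, č, ć, ž, and đ would otherwise be corrupted before the stemmer ever sees them.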

Author

Vuk Batanović

Availability

The package and more extensive documentation can be downloaded from the SCStemmers GitHub repository.

Publications

The SCStemmers package was introduced in:

Vuk Batanović, Boško Nikolić, Milan Milosavljević (2016). Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 2688–2696, Portorož, Slovenia. [Link] [.bib]

The original papers describing each implemented stemming algorithm are:

  • For the greedy and the optimal subsumption-based stemmers for Serbian: Vlado Kešelj, Danko Šipka (2008). A Suffix Subsumption-Based Approach to Building Stemmers and Lemmatizers for Highly Inflectional Languages with Sparse Resources. Infotheca 9(1–2), pp. 23a–33a. [Link]
  • For the refinement of the greedy subsumption-based stemmer: Nikola Milošević (2012). Stemmer for Serbian language. arXiv preprint arXiv:1209.4471. [Link]
  • For the “Simple stemmer for Croatian v0.1”: Nikola Ljubešić, Damir Boras, Ozren Kubelka (2007). Retrieving Information in Croatian: Building a Simple and Efficient Rule-Based Stemmer. Digital Information and Heritage, pp. 313–320. [Link]


Licence and citation

The software on this page is available under the GNU General Public License 3.0. By downloading the software, you agree to the terms of use defined by this license.

When using the software, you must cite the papers listed on this page as well as the ReLDI repository page.