02 May Diacritic restoration tool
A tool for automatically restoring diacritics in text where they may be missing (e.g. it turns kuca into kuća where appropriate). Reported accuracy: 99.5% on standard language and 99.2% on non-standard language.
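This page does not describe how the tool works internally. To illustrate the task itself, here is a minimal, hypothetical sketch of the simplest possible approach: a context-free lexicon that maps each diacritic-stripped word to its most frequent diacritized form in a correctly written corpus. It is not the authors' method, and the names strip_diacritics, build_lexicon and restore are illustrative only.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word: str) -> str:
    """Remove diacritics, e.g. 'kuća' -> 'kuca'."""
    # 'đ'/'Đ' have no Unicode decomposition, so map them explicitly.
    word = word.translate({ord("đ"): "d", ord("Đ"): "D"})
    # NFD splits letters like 'ć' into 'c' + a combining accent; drop the accents.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_lexicon(corpus_tokens):
    """Map each stripped (lowercased) form to its most frequent diacritized form."""
    counts = defaultdict(Counter)
    for token in corpus_tokens:
        form = token.lower()
        counts[strip_diacritics(form)][form] += 1
    return {stripped: forms.most_common(1)[0][0] for stripped, forms in counts.items()}


def restore(tokens, lexicon):
    """Replace each token by its lexicon entry; unknown tokens are left unchanged.
    For simplicity, lookup is case-insensitive and restored forms are lowercase."""
    return [lexicon.get(t.lower(), t) for t in tokens]


if __name__ == "__main__":
    lexicon = build_lexicon(["kuća", "kuća", "kuca", "čovjek", "šuma"])
    print(restore(["kuca", "covjek", "suma", "i"], lexicon))
```

Such a baseline ignores context: in Croatian, kuca is itself a valid word (a form of kucati, "to knock"), so always rewriting it as kuća will sometimes be wrong. Deciding when a change is actually necessary is what makes the task non-trivial.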
Authors
Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Availability
The tool is freely available in two forms:
- The code and models of the tool can be downloaded from this GitHub repository.
- Our web service can be accessed through our Python library, which can also be downloaded from the CLARIN.SI GitHub repository. Instructions on how to install the ReLDI library from GitHub can be found here (in Serbian). Alternatively, the easiest way to install it is from PyPI via the command line (detailed instructions are also on GitHub).
The second option, i.e. using the ReLDI Python library, is recommended for processing larger amounts of data.
Publications