ReLDI-NormTagNER-sr 2.0

ReLDI-NormTagNER-sr 2.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation, and named entity recognition of non-standard Serbian. Each tweet is also annotated with its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). Version 2.0 updates version 1.1 by adding annotations for named entities.

Authors
Nikola Ljubešić, Tomaž Erjavec, Maja Miličević, Tanja Samardžić
Availability
For local use, a full-text version of the corpus can be downloaded from the CLARIN.SI repository.
Publication
The corpus construction is (partially) described in the following paper:
Miličević, M. and N. Ljubešić (2016). Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0 4(2).


License and citation

The resource on this page is available under the Creative Commons Attribution 4.0 International License. By downloading the resource, you agree to the terms of use defined by this license.

When using the resource, you must cite the papers listed with it as well as the ReLDI repository page.