Publications

Batanović, V., N. Ljubešić, and T. Samardžić (2018). SETimes.SR – A Reference Training Corpus of Serbian. In Proceedings of the Conference on Language Technologies & Digital Humanities 2018 (JT-DH 2018), Ljubljana, Slovenia, 11–17.

Ljubešić, N., Ž. Agić, F. Klubička, V. Batanović, and T. Erjavec (2018). hr500k – A Reference Training Corpus of Croatian. In Proceedings of the Conference on Language Technologies & Digital Humanities 2018 (JT-DH 2018), Ljubljana, Slovenia, 154–161.

Fišer, D., M. Miličević Petrović, and N. Ljubešić (2018). Zapisovalne prakse v spletni slovenščini. In D. Fišer (Ed.), Viri, orodja in metode za analizo spletne slovenščine. Ljubljana: Ljubljana University Press, Faculty of Arts. 124–139.

Samardžić, T. and P. Merlo (2018). Probability of external causation: an empirical account of cross-linguistic variation in lexical causatives. Linguistics 56(5), 895–939.

Lusetti, M., T. Ruzsics, A. Göhring, T. Samardžić, and E. Stark (2018). Encoder-decoder methods for text normalization. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), COLING 2018, Santa Fe, NM, USA, 18–28.

Vuković, T. and T. Samardžić (2018). Prostorna raspodela frekvencije postpozitivnog člana u timočkom govoru. In Timok. Folkloristička i lingvistička terenska istraživanja 2015–2017, Knjaževac, Serbia: Narodna biblioteka Knjaževac, 181–201.

Batanović, V., M. Cvetanović, and B. Nikolić (2018). Fine-grained Semantic Textual Similarity for Serbian. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, 1370–1378.

Batanović, V. and B. Nikolić (2017). Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization and Word Embeddings. Telfor Journal 9(2). 104–109.

Miličević Petrović, M., N. Ljubešić, and D. Fišer (2017). Nestandardno zapisivanje srpskog jezika na Tviteru: mnogo buke oko malo odstupanja? Anali Filološkog fakulteta 29(2). 111–136.

Miličević, M., N. Ljubešić, and D. Fišer (2017). Birds of a feather don’t quite tweet together: An analysis of spelling variation in Slovene, Croatian and Serbian Twitterese. In D. Fišer and M. Beißwenger (Eds), Investigating Computer-Mediated Communication: Corpus-based Approaches to Language in the Digital World. Ljubljana: Ljubljana University Press, Faculty of Arts. 14–43.

Ruzsics, T. and T. Samardžić (2017). Neural sequence-to-sequence learning of internal word structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184–194.

Derungs, C. and T. Samardžić (2017). Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency. International Journal of Geographical Information Science 32(5), 856–873.

Bentz, C., D. Alikaniotis, T. Samardžić, and P. Buttery (2017). Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology. Journal of Quantitative Linguistics 24(2–3), 128–162.

Samardžić, T., M. Starović, Ž. Agić, and N. Ljubešić (2017). Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages. In Proceedings of the Sixth Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017). Valencia, Spain, 39–44.

Ljubešić, N., T. Samardžić, and C. Derungs (2016). TweetGeo – A tool for collecting, processing and analysing geo-encoded linguistic data. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016). Osaka, Japan.

Ljubešić, N., T. Erjavec, D. Fišer, T. Samardžić, M. Miličević, F. Klubička, and F. Petkovski (2016). Easily accessible language technologies for Slovene, Croatian and Serbian. In T. Erjavec and D. Fišer (Eds), Proceedings of the Conference on Language Technologies & Digital Humanities 2016 (JT-DH 2016). Ljubljana: Ljubljana University Press, Faculty of Arts. 120–124.

Miličević, M. and N. Ljubešić (2016). Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0 4(2). 156–188.

Batanović, V., B. Nikolić, and M. Milosavljević (2016). Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, 2688–2696.

Ljubešić, N. and T. Erjavec (2016). Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 1527–1531.

Ljubešić, N., T. Erjavec, and D. Fišer (2016). Corpus-based diacritic restoration for South Slavic languages. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 3612–3616.

Ljubešić, N., F. Klubička, Ž. Agić, and I.-P. Jazbec (2016). New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 4264–4270.

Samardžić, T. and M. Miličević (2016). A Framework for automatic acquisition of Croatian and Serbian verb aspect from corpora. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 4596–4601.

Fišer, D., T. Erjavec, N. Ljubešić, and M. Miličević (2015). Comparing the nonstandard language of Slovene, Croatian and Serbian tweets. In M. Smolej (Ed.), Simpozij Obdobja 34. Slovnica in slovar – aktualni jezikovni opis (1. del). Ljubljana: Filozofska fakulteta. 225–231.

Samardžić, T., N. Ljubešić, and M. Miličević (2015). Regional Linguistic Data Initiative (ReLDI). In Proceedings of the Fifth Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015). 10–11 September 2015, Hissar, Bulgaria.