Publications

Batanović, V., N. Ljubešić, and T. Samardžić (2018). SETimes.SR – A Reference Training Corpus of Serbian. In Proceedings of the Conference on Language Technologies & Digital Humanities 2018 (JT-DH 2018), Ljubljana, Slovenia, 11–17. pdf ppt

Ljubešić, N., Ž. Agić, F. Klubička, V. Batanović, and T. Erjavec (2018). hr500k – A Reference Training Corpus of Croatian. In Proceedings of the Conference on Language Technologies & Digital Humanities 2018 (JT-DH 2018), Ljubljana, Slovenia, 154–161. pdf ppt

Samardžić, T. and P. Merlo (2018). Probability of external causation: an empirical account of cross-linguistic variation in lexical causatives. Linguistics 56(5), 895–939. pdf (pre-print)

Lusetti, M., T. Ruzsics, A. Göhring, T. Samardžić, and E. Stark (2018). Encoder-decoder methods for text normalization. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), COLING 2018, Santa Fe, NM, USA, 18–28. pdf bib

Vuković, T. and T. Samardžić (2018). Prostorna raspodela frekvencije postpozitivnog člana u timočkom govoru. In Timok. Folkloristička i lingvistička terenska istraživanja 2015–2017, Knjaževac, Serbia: Narodna biblioteka Knjaževac, 181–201. pdf

Batanović, V., M. Cvetanović, and B. Nikolić (2018). Fine-grained Semantic Textual Similarity for Serbian. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, 1370–1378. pdf bib

Batanović, V. and B. Nikolić (2017). Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization and Word Embeddings. Telfor Journal 9(2). 104–109. pdf

Miličević Petrović, M., N. Ljubešić, and D. Fišer (2017). Nestandardno zapisivanje srpskog jezika na Tviteru: mnogo buke oko malo odstupanja? Anali Filološkog fakulteta 29(2). 111–136. pdf

Miličević, M., N. Ljubešić, and D. Fišer (2017). Birds of a feather don’t quite tweet together: An analysis of spelling variation in Slovene, Croatian and Serbian Twitterese. In D. Fišer and M. Beißwenger (Eds), Investigating Computer-Mediated Communication: Corpus-based Approaches to Language in the Digital World. Ljubljana: Ljubljana University Press, Faculty of Arts. 14–43. pdf

Ruzsics, T. and T. Samardžić (2017). Neural sequence-to-sequence learning of internal word structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184–194. pdf bib

Derungs, C. and T. Samardžić (2017). Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency. International Journal of Geographical Information Science 32(5), 856–873. pdf

Bentz, C., D. Alikaniotis, T. Samardžić, and P. Buttery (2017). Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology. Journal of Quantitative Linguistics 24(2–3), 128–162. pdf

Samardžić, T., M. Starović, Ž. Agić, and N. Ljubešić (2017). Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages. In Proceedings of the Sixth Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017). Valencia, Spain, 39–44. pdf bib

Ljubešić, N., T. Samardžić, and C. Derungs (2016). TweetGeo – A tool for collecting, processing and analysing geo-encoded linguistic data. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016). Osaka, Japan. pdf bib

Ljubešić, N., T. Erjavec, D. Fišer, T. Samardžić, M. Miličević, F. Klubička, and F. Petkovski (2016). Easily accessible language technologies for Slovene, Croatian and Serbian. In T. Erjavec and D. Fišer (Eds), Proceedings of the Conference on Language Technologies & Digital Humanities 2016 (JT-DH 2016). Ljubljana: Ljubljana University Press, Faculty of Arts. 120–124. pdf ppt

Miličević, M. and N. Ljubešić (2016). Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0 4(2). 156–188. pdf

Batanović, V., B. Nikolić, and M. Milosavljević (2016). Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, 2688–2696. pdf bib

Ljubešić, N. and T. Erjavec (2016). Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 1527–1531. pdf bib

Ljubešić, N., T. Erjavec, and D. Fišer (2016). Corpus-based diacritic restoration for South Slavic languages. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 3612–3616. pdf bib

Ljubešić, N., F. Klubička, Ž. Agić, and I.-P. Jazbec (2016). New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 4264–4270. pdf bib

Samardžić, T. and M. Miličević (2016). A Framework for automatic acquisition of Croatian and Serbian verb aspect from corpora. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 4596–4601. pdf bib

Fišer, D., T. Erjavec, N. Ljubešić, and M. Miličević (2015). Comparing the nonstandard language of Slovene, Croatian and Serbian tweets. In Smolej, M. (Ed.), Simpozij Obdobja 34. Slovnica in slovar – aktualni jezikovni opis (1. del). Ljubljana: Filozofska fakulteta. 225–231. pdf

Samardžić, T., N. Ljubešić, and M. Miličević (2015). Regional Linguistic Data Initiative (ReLDI). In Proceedings of the Fifth Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015). 10–11 September 2015, Hissar, Bulgaria. pdf bib