Batanović, V., M. Cvetanović, and B. Nikolić (2020). A versatile framework for resource-limited sentiment articulation, annotation and analysis of short texts. PLoS ONE 15(11): e0242050. link
Otvoreni resursi i tehnologije za obradu srpskog jezika. In Primena slobodnog softvera i otvorenog hardvera, PSSOH (Application of Free Software and Open Hardware), Belgrade, Serbia. pdf
Nigmatulina, I., T. Kew, and T. Samardžić (2020). ASR for non-standardised languages with dialectal variation: the case of Swiss German. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial2020), COLING 2020, Barcelona, Spain. pdf
Nestandardni jezik u nastavi: novi resursi za srpski i hrvatski kao strani. In V. Krajišnik et al. (Eds), Srpski kao strani jezik u teoriji i praksi IV. Belgrade: Faculty of Philology. 117–130. pdf
Language Accommodation on Twitter: An example of Serbia. Slavistična revija 67. 87–106.
Ljubešić, N., D. Fišer, and T. Erjavec (2019). The FRENK datasets of socially unacceptable discourse in Slovene and English. International Conference on Text, Speech, and Dialogue. 103–114. pdf
Ljubešić, N. and K. Dobrovoljc (2019). What Does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing. Florence, Italy. 29–34. pdf
Ruzsics, T., M. Lusetti, A. Göhring, T. Samardžić, and E. Stark (2019). Neural text normalization with adapted decoding and PoS features. Natural Language Engineering 25(5). 585–605. pdf
Scherrer, Y., T. Samardžić, and E. Glaser (2019). Digitising Swiss German: How to process and study a polycentric spoken language. Language Resources and Evaluation 53. 735–769.
Batanović, V., N. Ljubešić, and T. Samardžić (2018). SETimes.SR – A Reference Training Corpus of Serbian. In Proceedings of the Conference on Language Technologies & Digital Humanities 2018 (JT-DH 2018), Ljubljana, Slovenia, 11–17. pdf ppt
Ljubešić, N., Ž. Agić, F. Klubička, V. Batanović, and T. Erjavec (2018). hr500k — A Reference Training Corpus of Croatian. In Proceedings of the Conference on Language Technologies & Digital Humanities 2018 (JT-DH 2018), Ljubljana, Slovenia, 154–161. pdf ppt
Fišer, D., M. Miličević Petrović, and N. Ljubešić (2018). Zapisovalne prakse v spletni slovenščini. In D. Fišer (Ed.), Viri, orodja in metode za analizo spletne slovenščine. Ljubljana: Ljubljana University Press, Faculty of Arts. 124–139. pdf
Samardžić, T. and P. Merlo (2018). Probability of external causation: an empirical account of cross-linguistic variation in lexical causatives. Linguistics 56(5), 895–939. pdf (pre-print).
Lusetti, M., T. Ruzsics, A. Göhring, T. Samardžić, and E. Stark (2018). Encoder-decoder methods for text normalization. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), COLING 2018, Santa Fe, NM, USA, 18–28. pdf bib
Vuković, T. and T. Samardžić (2018). Prostorna raspodela frekvencije postpozitivnog člana u timočkom govoru. In Timok. Folkloristička i lingvistička terenska istraživanja 2015–2017, Knjaževac, Srbija: Narodna biblioteka Knjaževac, 181–201. pdf
Batanović, V., M. Cvetanović, and B. Nikolić (2018). Fine-grained Semantic Textual Similarity for Serbian. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, 1370–1378. pdf bib
Batanović, V. and B. Nikolić (2017). Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization and Word Embeddings. Telfor Journal 9(2). 104–109. pdf
Miličević Petrović, M., N. Ljubešić, and D. Fišer (2017). Nestandardno zapisivanje srpskog jezika na Tviteru: mnogo buke oko malo odstupanja? Anali Filološkog fakulteta 29(2). 111–136. pdf
Miličević, M., N. Ljubešić, and D. Fišer (2017). Birds of a feather don’t quite tweet together: An analysis of spelling variation in Slovene, Croatian and Serbian Twitterese. In D. Fišer and M. Beißwenger (Eds) Investigating Computer-Mediated Communication: Corpus-based Approaches to Language in the Digital World. Ljubljana: Ljubljana University Press, Faculty of Arts. 14–43. pdf
Ruzsics, T. and T. Samardžić (2017). Neural sequence-to-sequence learning of internal word structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada, 184–194. pdf bib
Derungs, C. and T. Samardžić (2017). Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency. International Journal of Geographical Information Science 32(5), 856–873. pdf
Bentz, C., D. Alikaniotis, T. Samardžić, and P. Buttery (2017). Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology. Journal of Quantitative Linguistics 24(2–3), 128–162. pdf
Samardžić, T., M. Starović, Ž. Agić, and N. Ljubešić (2017). Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages. In Proceedings of the Sixth Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017). Valencia, Spain, 39–44. pdf bib
Ljubešić, N., T. Samardžić, and C. Derungs (2016). TweetGeo — A tool for collecting, processing and analysing geo-encoded linguistic data. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016). Osaka, Japan. pdf bib
Ljubešić, N., T. Erjavec, D. Fišer, T. Samardžić, M. Miličević, F. Klubička, and F. Petkovski (2016). Easily accessible language technologies for Slovene, Croatian and Serbian. In T. Erjavec and D. Fišer (Eds), Proceedings of the Conference on Language Technologies & Digital Humanities 2016 (JT-DH 2016). Ljubljana: Ljubljana University Press, Faculty of Arts. 120–124. pdf ppt
Miličević, M. and N. Ljubešić (2016). Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0 4(2). 156–188. pdf
Batanović, V., B. Nikolić, and M. Milosavljević (2016). Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, 2688–2696. pdf bib
Ljubešić, N. and T. Erjavec (2016). Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 1527–1531. pdf bib
Ljubešić, N., T. Erjavec, and D. Fišer (2016). Corpus-based diacritic restoration for South Slavic languages. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 3612–3616. pdf bib
Ljubešić, N., F. Klubička, Ž. Agić, and I.-P. Jazbec (2016). New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 4264–4270. pdf bib
Samardžić, T. and M. Miličević (2016). A Framework for automatic acquisition of Croatian and Serbian verb aspect from corpora. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). 4596–4601. pdf bib
Fišer, D., T. Erjavec, N. Ljubešić, and M. Miličević (2015). Comparing the nonstandard language of Slovene, Croatian and Serbian tweets. In Smolej, M. (Ed.), Simpozij Obdobja 34. Slovnica in slovar — aktualni jezikovni opis (1. del). Ljubljana: Filozofska fakulteta. 225–231. pdf
Samardžić, T., N. Ljubešić, and M. Miličević (2015). Regional Linguistic Data Initiative (ReLDI). In Proceedings of the Fifth Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015). 10–11 September 2015, Hissar, Bulgaria. pdf bib