Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

248 full-text articles; 445 authors; 37,428 downloads; 39 institutions

All Articles in Computational Linguistics

The Perception Of Mandarin Tones In "Bubble" Noise By Native And L2 Listeners, Mengxuan Zhao 2019 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

Previous studies have revealed the complexity of Mandarin tones. For example, similarities in the pitch contours of tones 2 and 3 and tones 3 and 4 cause confusion for listeners. The realization of a tone's contour is highly dependent on its context, especially the preceding pitch. This is known as the coarticulation effect. Researchers have demonstrated the robustness of tone perception by both native and non-native listeners, even with incomplete acoustic information or in noisy environments. However, non-native listeners were observed to behave differently from native listeners in their use of contextual information. For example, the disagreement between the ...


Analyzing Prosody With Legendre Polynomial Coefficients, Rachel Rakov 2019 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

This investigation demonstrates the effectiveness of Legendre polynomial coefficients representing prosodic contours within the context of two different tasks: nativeness classification and sarcasm detection. By making use of accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer prosodic questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about prosodic qualities of sarcastic speech ...
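As a rough illustration of the representation this abstract refers to, the sketch below projects an evenly sampled pitch contour onto the first few Legendre polynomials. It is not taken from the dissertation: the function names, the midpoint-rule approximation, and the sample contour are all assumptions made for the example.

```python
def legendre_values(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


def legendre_coefficients(contour, degree=3):
    """Project an evenly sampled contour onto Legendre polynomials.

    Samples are treated as midpoints of equal-width bins on [-1, 1], so each
    coefficient is a midpoint-rule estimate of (2n + 1)/2 * integral(f * P_n).
    """
    count = len(contour)
    coefficients = [0.0] * (degree + 1)
    for i, value in enumerate(contour):
        x = -1.0 + (2 * i + 1) / count  # bin midpoint mapped onto [-1, 1]
        for n, p in enumerate(legendre_values(x, degree)):
            coefficients[n] += (2 * n + 1) * value * p / count
    return coefficients


# Hypothetical rising pitch contour in Hz (illustrative values only).
rising = [180.0 + 40.0 * i / 49 for i in range(50)]
mean_level, slope, curvature, _ = legendre_coefficients(rising)
```

In this kind of representation the zeroth coefficient approximates the mean level of the contour, the first its overall slope, and the second its curvature, which gives a compact fixed-length feature vector for a classifier.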


Corpus Of Usage Examples: What Is It Good For?, Timofey Arkhangelskiy 2019 Universität Hamburg, Alexander von Humboldt Foundation

Proceedings of the Workshop on Computational Methods for Endangered Languages

Lexicography and corpus studies of grammar have a long history of fruitful interaction. For the most part, however, this has been a one-way relationship. Lexicographers have extensively used corpora to identify previously undetected word senses or find natural usage examples; using lexicographic materials when conducting data-driven investigations of grammar, on the other hand, is hardly commonplace. In this paper, I present a Beserman Udmurt corpus made out of "artificial" dictionary examples. I argue that, although such a corpus cannot be used for certain kinds of corpus-based research, it is nevertheless a very useful tool for writing a reference grammar ...


Developing Without Developers: Choosing Labor-Saving Tools For Language Documentation Apps, Luke D. Gessler 2019 Georgetown University

Proceedings of the Workshop on Computational Methods for Endangered Languages

Application software has the potential to greatly reduce the amount of human labor needed in common language documentation tasks. But despite great advances in the maturity of tools available for apps, language documentation apps have not attained their full potential, and language documentation projects are forgoing apps in favor of less specialized tools like paper and spreadsheets. We argue that this is due to the scarcity of software development labor in language documentation, and that a careful choice of software development tools could make up for this labor shortage by increasing developer productivity. We demonstrate the benefits of strategic tool ...


Applying Support Vector Machines To POS Tagging Of The Ainu Language, Karol Nowakowski, Michal Ptaszynski, Fumito Masui, Yoshio Momouchi 2019 Kitami Institute of Technology

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


OCR Evaluation Tools For The 21st Century, Eddie A. Santos 2019 National Research Council Canada, University of Alberta

Proceedings of the Workshop on Computational Methods for Endangered Languages

We introduce ocreval, a port of the ISRI OCR Evaluation Tools, now with Unicode support. We describe how we upgraded the ISRI OCR Evaluation Tools to support modern text processing tasks. ocreval produces character-level and word-level accuracy reports for all characters representable in the UTF-8 character encoding scheme. In addition, we have implemented the Unicode default word boundary specification in order to support word-level accuracy reports for a broad range of writing systems. We argue that character-level and word-level accuracy reports produce confusion matrices that are useful for tasks beyond OCR evaluation, including tasks supporting the study and computational ...
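To make the notion of a character-level accuracy report concrete, here is a minimal sketch that scores OCR output against a reference text using edit distance over Unicode code points. It does not reproduce ocreval's implementation or command-line interface; the function names and the accuracy formula (reference length minus edit errors, divided by reference length) are assumptions chosen for illustration.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in code points."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from output
                current[j - 1] + 1,      # extra character in output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Fraction of reference characters not accounted for by edit errors."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, hypothesis)
    return (len(reference) - errors) / len(reference)


# Hypothetical example: the OCR output drops the acute accent in "Sámi".
score = character_accuracy("S\u00e1mi giella", "Sami giella")
```

Because Python strings are sequences of code points, the comparison handles non-ASCII text directly, although it does not group combining sequences into grapheme clusters or apply Unicode normalization.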


Building A Common Voice Corpus For Laiholh (Hakha Chin), Kelly Berkson, Samson Lotven, Peng Hlei Thang, Thomas Thawngza, Zai Sung, James C. Wamsley, Francis Tyers, Kenneth Van Bik, Sandra Kübler, Donald Williamson, Matthew Anderson 2019 Indiana University

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Bootstrapping A Neural Morphological Analyzer For St. Lawrence Island Yupik From A Finite-State Transducer, Lane Schwartz, Emily Chen, Benjamin Hunt, Sylvia LR Schreiner 2019 University of Illinois at Urbana-Champaign

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Future Directions In Technological Support For Language Documentation, Daan van Esch, Ben Foley, Nay San 2019 Google

Proceedings of the Workshop on Computational Methods for Endangered Languages

To reduce the annotation burden placed on linguistic fieldworkers, freeing up time for deeper linguistic analysis and descriptive work, the language documentation community has been working with machine learning researchers to investigate what assistive role technology can play, with promising early results. This paper describes a number of potential follow-up technical projects that we believe would be worthwhile and straightforward to do. We provide examples of the annotation tasks for computer scientists; descriptions of the technological challenges involved and the estimated level of complexity; and pointers to relevant literature. We hope providing a clear overview of what the needs are ...


Handling Cross-Cutting Properties In Automatic Inference Of Lexical Classes: A Case Study Of Chintang, Olga Zamaraeva, Kristen Howell, Emily M. Bender 2019 University of Washington

Proceedings of the Workshop on Computational Methods for Endangered Languages

As part of the ongoing AGGREGATION project, which is concerned with inferring grammars from interlinear glossed text (IGT), we explore the integration of morphological patterns extracted from IGT data with inferred syntactic properties to create implemented linguistic grammars. We present a case study of Chintang, in which we put emphasis on evaluating the accuracy of these predictions by using them to generate a grammar and parse running text. Our coverage over the corpus is low because the lexicon produced by our system only includes intransitive and transitive verbs and nouns, but it outperforms an expert-built, oracle grammar of similar ...


Towards A General-Purpose Linguistic Annotation Backend, Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang 2019 Carnegie Mellon University

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Bootstrapping A Neural Morphological Generator From Morphological Analyzer Output For Inuktitut, Jeffrey Micher 2019 US Army Research Laboratory

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Finding Sami Cognates With A Character-Based NMT Approach, Mika Hämäläinen, Jack Rueter 2019 University of Helsinki

Proceedings of the Workshop on Computational Methods for Endangered Languages

We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too limited a set of parallel data for an NMT model as such. We solve this problem, on the one hand, by training the model with North Sami cognates with other Uralic languages and, on the other, by generating more synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.


Seeing More Than Whitespace — Tokenisation And Disambiguation In A North Sámi Grammar Checker, Linda Wiechetek, Kevin B. Unhammer, Sjur N. Moshagen 2019 UiT The Arctic University of Norway

Proceedings of the Workshop on Computational Methods for Endangered Languages

Communities of lesser-resourced languages like North Sámi benefit from language tools such as spell checkers and grammar checkers to improve literacy. Accurate error feedback is dependent on well-tokenised input, but traditional tokenisation as shallow preprocessing is inadequate to solve the challenges of real-world language usage. We present an alternative where tokenisation remains ambiguous until we have linguistic context information available. This lets us accurately detect sentence boundaries, multiwords and compound errors. We describe a North Sámi grammar checker with such a tokenisation system, and show the results of its evaluation.


Improving Low-Resource Morphological Learning With Intermediate Forms From Finite State Transducers, Sarah Moeller, Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden 2019 University of Colorado

Proceedings of the Workshop on Computational Methods for Endangered Languages

Neural encoder-decoder models are usually applied to morphology learning as an end-to-end process without considering the underlying phonological representations that linguists posit as abstract forms before morphophonological rules are applied. Finite State Transducers for morphology, on the other hand, are developed to contain these underlying forms as an intermediate representation. This paper shows that training a bidirectional two-step encoder-decoder model of Arapaho verbs to learn two separate mappings, one between tags and abstract morphemes and one between morphemes and surface allomorphs, improves results when training data is limited to 10,000 to 30,000 examples of inflected word forms.


Using Computational Approaches To Integrate Endangered Language Legacy Data Into Documentation Corpora: Past Experiences And Challenges Ahead, Rogier Blokland, Niko Partanen, Michael Rießler, Joshua Wilbur 2019 Uppsala University

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


A Preliminary Plains Cree Speech Synthesizer, Atticus Harrigan, Timothy Mills, Antti Arppe 2019 University of Alberta

Proceedings of the Workshop on Computational Methods for Endangered Languages

This paper discusses the development and evaluation of a speech synthesizer for Plains Cree, an Algonquian language of North America. Synthesis was achieved using Simple4All, and evaluation was performed using a modified Cluster Identification task, a Semantically Unpredictable Sentence task, and a basic dichotomized judgment task. The resulting synthesis was not well received; however, observations regarding the process of speech synthesis evaluation in North American indigenous communities were made: chiefly, that tolerance for variation is often much lower in these communities than for majority languages. The evaluator did not recognize grammatically consistent but semantically nonsensical strings as licit language. As a result, monosyllabic clusters ...


A Biscriptual Morphological Transducer For Crimean Tatar, Francis M. Tyers, Jonathan N. Washington, Darya Kavitskaya, Memduh Gökırmak, Nick Howell, Remziye Berberova 2019 Indiana University

Proceedings of the Workshop on Computational Methods for Endangered Languages

This paper describes a weighted finite-state morphological transducer for Crimean Tatar able to analyse and generate in both Latin and Cyrillic orthographies. This transducer was developed by a team including a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise.

Dealing with two orthographic systems in the same transducer is challenging as they employ different strategies to deal with the spelling of loan words and encode the full range of the language's phonemes and their interaction. We develop the core transducer ...


An Online Platform For Community-Based Language Description And Documentation, Rebecca Everson, Wolf Honoré, Scott Grimm 2019 Independent

Proceedings of the Workshop on Computational Methods for Endangered Languages

We present two pieces of interlocking technology in development to facilitate community-based, collaborative language description and documentation: (i) a mobile app where speakers submit text, voice recordings and/or videos, and (ii) a community language portal that organizes submitted data and provides question/answer boards whereby community members can evaluate/supplement submissions.


A Software-Driven Workflow For The Reuse Of Language Documentation Data In Typological Studies, Stephan Druskat, Kilu von Prince 2019 Humboldt-Universität zu Berlin

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.

