Daniel Benedict Dacanay

Computational Linguist, MSc

Who am I?

My name is Daniel Benedict Dacanay (/ˈdækənaɪ/, rhymes with ‘smack a guy’).

 

I am a computational linguist specialising in language documentation and revitalisation. Currently, I work as the head linguist and Dictionary Keeper for the Tłı̨chǫ Government in Behchokǫ̀, Northwest Territories, creating, expanding, and maintaining various Tłı̨chǫ language resources, most notably the Tłı̨chǫ Online Dictionary. Prior to this, I was an MSc student in Linguistics at the University of Alberta (under the supervision of Dr. Antti Arppe, with whom I also authored my B.A. Honours thesis in 2022), as well as a researcher at the Alberta Language Technology Lab (ALTLab). 

 

Broadly speaking, my research interests are in developing computational tools and resources for endangered languages, particularly as they relate to lexicography, morphosyntax, and semantics. The bulk of my research has focused on Plains Cree (nêhiyawêwin), one of Canada’s most widely spoken Indigenous languages, and Dogrib (Tłı̨chǫ Yatıı̀), one of the eleven official languages of the Northwest Territories. However, I have also worked on creating online resources for Woods Cree (nīhithawīwin), Comox (ʔayʔaǰuθəm), and Sango, and have assisted in dictionary creation projects for Sarcee (Tsúùtʼínà), Northern Haida (X̱aat Kíl), and Arapaho (Hinóno’éitíit). You can find more information about specific resources which I have helped develop on my Projects page.

 

Providing comprehensive and accessible documentation for low-resource and endangered languages is a passion of mine at both a personal and professional level, and it is my profound honour to participate in the task of preserving, proliferating, and empowering the continued use of those unparalleled gemstones of human ingenuity and wisdom that are the many languages of our Earth.

Major Projects

Maskwacîs Speech Database

An audio database of spoken words and sentences recorded in Maskwacîs, Alberta.

My Contributions 

I was responsible for manually annotating and extracting the majority of the >150 000 individual audio clips in the database, as well as for standardising the transcription orthography, providing morphological analyses, and regularising the English translations of each entry. Since 2021, I have been conducting multiple weekly meetings with fluent Plains Cree elders to verify the quality of these recordings, to alter and add definitions when necessary, and to elicit novel vocabulary lacking from the initial recording set.

itwêwina

An intelligent online dictionary of Plains Cree

 

My Contributions 

My Honours thesis research on vector semantics ultimately led to the implementation of a semantically informed search engine for this dictionary, whereby semantically related words can be returned as search results for a query even if the query word is not present in the dictionary. In addition, I have expanded the contents of itwêwina with thousands of new entries from various existing Cree dictionaries, and have linked thousands more entries with their respective audio recordings from the Speech Database.

Tłı̨chǫ Online Dictionary

An intelligent online dictionary of Tłı̨chǫ Yatıı̀

My Contributions

I am the creator of the online dictionary site as it exists now, and am responsible for its management and expansion. I am also the primary contributor of entries to the dictionary, drawing both from existing written sources (many of which I have manually digitised) and from my own fieldwork. As the creator of the site, I am also responsible for the implementation of all of the dictionary’s novel features, as well as for much of the language documentation underlying those features; this has primarily entailed lengthy elicitation and recording sessions with L1 Tłı̨chǫ speakers.

Publications

Dacanay, D., Harrigan, A., & Arppe, A. (2021). Computational analysis versus human intuition: a critical comparison of vector semantics with manual semantic classification in the context of Plains Cree. In Silfverberg, M., & Desjardins, J. (eds.), Proceedings of the 4th Workshop on Computational Methods for Endangered Languages, 1, 33-43. doi:10.33011/computel.v1i.971

 

Dacanay, D., Harrigan, A., Wolvengrey, A., & Arppe, A. (2021). The more detail, the better? – Investigating the effects of semantic ontology specificity on vector semantic classification with a Plains Cree / nêhiyawêwin dictionary. In Mager, M., Oncevay, A., Rios, A., Meza Ruiz, I.V., Palmer, A., Neubig, G., & Kann, K. (eds.), Proceedings of the 1st Workshop on NLP for Indigenous Languages of the Americas, 1, 143-52. doi:10.18653/v1/2021.americasnlp-1.15

 

Dacanay, D., Poulin, J., & Arppe, A. (2022). kêtiski-kotahâskwâtam: The effectiveness of various hypernymic levels of WordNet synsets as vector semantic classification categories. In Macaulay, M. & Noodin, M. (eds.), Papers of the Fifty-Third Algonquian Conference (PAC53), 53.

 

Arppe, A., Poulin, J., Harrigan, A., Schmirler, K., Dacanay, D., & Makinaw, R. (2022). êkosi ê-nêhiyawi-pîkiskwêcik maskwacîsihk – Towards a Spoken Dictionary of Maskwacîs Cree. Presentation conducted at the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-5). 

 

Dacanay, D. (2022). Lexical Semantic Classification in Plains Cree (nêhiyawêwin): Manual and Computational Approaches (Honours thesis, University of Alberta).

 

Dacanay, D. & Schmirler, K. (2022). An Analogy-Based Alternative to “Counter-Intuitive” Grammatical Animacy in Plains Cree/nêhiyawêwin. Presentation conducted at the Fifty-Fourth Algonquian Conference (PAC54).

 

Dacanay, D. & Arppe, A. (2023a, forthcoming). Digitizing, translating, and standardizing Pr. Albert Lacombe’s Dictionnaire de la langue des Cris (1874). In Macaulay, M. & Noodin, M. (eds.), Papers of the Fifty-Fourth Algonquian Conference (PAC54), 54.

 

Poulin, J., Dacanay, D., & Arppe, A. (2023b, forthcoming). Speech Database (Speech-DB) – An on-line platform for recording, storing, validating, and searching spoken language data. In Proceedings of the 1st Workshop on NLP applications to Field Linguistics (Field Matters).

 

Arppe, A., Neitsch, A., Dacanay, D., Poulin, J., Hieber, D., & Harrigan, A. (2023c, forthcoming). Finding words that aren’t there: Using word embeddings to improve dictionary search for low-resource languages. In Proceedings of the 3rd Workshop on NLP for Indigenous Languages of the Americas.

 

Dacanay, D., & Arppe, A. (2024, forthcoming). misi-mîkiwâhp pêsêkinosa ohci – A corpus of miscellaneous Plains Cree texts. In Papers of the Fifty-Fifth Algonquian Conference (PAC55), 55.

 

Side Projects

In addition to my published work, I spend much of my free time on minor linguistics-related side projects (with varying degrees of practical utility). A particular interest of mine is translating works of contemporary fiction into endangered languages. If this interests you, you can read more by following the link below.

One of the most interesting things to me about highly morphologically complex languages, such as Plains Cree, is the sheer volume …

In early 2022, I had the great privilege of taking part in a field methods course taught by Dr. Jordan Lachler of …

Feel free to contact me with any inquiries via my email, dacanay@ualberta.ca

Contact Information