Invited speakers

Distinguished lecture for the 30th anniversary of the ICGI conference

Dana Angluin (Yale University, USA)

  • Title: Learning of Regular Languages by Recurrent Neural Networks? (Mainly Questions)
  • Abstract: Recurrent neural network architectures were introduced over 30 years ago. From the start, attention focused on their performance at learning regular languages using some variant of gradient descent. This talk reviews some of the history of that research, includes some empirical observations, and emphasizes questions to which we still seek answers.

Invited speakers

  • Cyril Allauzen (Google, New York)
    Cyril Allauzen’s main research interests are in finite-state methods and their applications to text, speech, and natural language processing and machine learning. Before joining Google, he worked as a researcher at AT&T Labs Research and at NYU’s Courant Institute of Mathematical Sciences.
    Cyril is an author of the OpenFst Library, the OpenKernel Library, and the GRM Library.

    • Talk: Weighted Finite Automata with Failure Transitions: Algorithms and Applications

      Abstract: Weighted finite automata (WFA) are used in many applications including speech recognition, speech synthesis, machine translation, computational biology, image processing, and optical character recognition. Such applications often have strict time and memory requirements, so efficient representations and algorithms are paramount. We examine one useful technique, the use of failure transitions, to represent automata compactly. A failure transition is taken only when no immediate match to the input is possible at a given state. Automata with failure transitions, initially introduced for string matching problems, have found wider use including compactly representing language, pronunciation, transliteration and semantic models.

      In this talk, we will address the extension of several weighted finite automata algorithms to automata with failure transitions (ϕ-WFAs). Efficient algorithms to intersect two ϕ-WFAs, to remove failure transitions, to trim, and to compute the shortest distance in a ϕ-WFA will be presented. (A toy sketch of the failure-transition mechanism follows this abstract.)

      We will demonstrate the application of some of these algorithms on two language modeling tasks: the distillation of arbitrary probabilistic models as weighted finite automata with failure transitions and the federated learning of n-gram language models. We will show the relevance of these methods to the privacy-preserving training of language models for virtual keyboard applications for mobile devices.

      This talk covers work in collaboration with Michael Riley, Ananda Theertha Suresh, Brian Roark, Vlad Schogol, Mingqing Chen, Rajiv Mathews, Adeline Wong, and Françoise Beaufays.
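
      A toy sketch of the failure-transition mechanism described above (this is not code from the talk or from OpenFst; the states, arcs, and weights are illustrative assumptions, loosely modeled on a backoff n-gram model with weights combined by addition, as in the tropical semiring):

        PHI = "<phi>"  # reserved label for failure (backoff) transitions

        # arcs[state][label] = (next_state, weight); PHI entries are failure arcs
        arcs = {
            "bigram:a": {"b": ("bigram:b", 0.5), PHI: ("unigram", 1.2)},
            "bigram:b": {PHI: ("unigram", 0.9)},
            "unigram":  {"a": ("bigram:a", 1.0), "b": ("bigram:b", 1.5)},
        }

        def step(state, symbol):
            """Consume one symbol: follow failure arcs (paying their weights)
            until an arc matching the symbol is found."""
            weight = 0.0
            while symbol not in arcs[state]:
                if PHI not in arcs[state]:
                    return None, float("inf")   # symbol cannot be matched at all
                state, w = arcs[state][PHI]
                weight += w                     # cost of backing off
            next_state, w = arcs[state][symbol]
            return next_state, weight + w

        def score(start, symbols):
            """Total weight of a symbol sequence (sum of -log probabilities)."""
            state, total = start, 0.0
            for sym in symbols:
                state, w = step(state, sym)
                if state is None:
                    return float("inf")
                total += w
            return total

        print(score("unigram", ["a", "b", "a"]))  # "b" -> "a" has no direct arc,
                                                  # so the PHI arc is taken first

      Only when no arc at the current state matches the next input symbol does the traversal take the ϕ arc, which is what lets a backoff language model be represented compactly as a ϕ-WFA.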


    • Talk: A journey into Generative AI and large language models: From NLP to Bioinformatics
      Abstract: In the last year, the generative AI field has seen a remarkable breakthrough, specifically generative AI models and their applications in the natural language processing domain. They have achieved new state-of-the-art results on all public datasets and super-human chatting capabilities. The backbone of this breakthrough is large language models, including OpenAI GPT and Google PaLM. The advantage of these large language models is that they can effectively capture the semantics, syntax, grammar, and meaning of characters, words, and sentences from large unlabelled datasets using self-supervised learning. They can later be used to represent sentences and documents better through embeddings, or as zero- or multi-shot learners for many NLP tasks. Fortunately, these models have started to be leveraged in other fields such as bioinformatics and biochemistry. This talk will give an overview of large language models and how they have been applied in the bioinformatics field to boost performance on many use cases. Furthermore, it will show how high-performance computing and optimized deep-learning software and libraries have made these models faster and more efficient during training and inference.
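
      As a minimal illustration of the "embedding" use described above (assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint, neither of which is specifically referenced in the talk), a frozen pretrained model turns a sentence into a fixed-size vector for downstream tasks; protein language models are used the same way on amino-acid sequences:

        import torch
        from transformers import AutoTokenizer, AutoModel

        # Load a pretrained encoder; no fine-tuning is performed here, since the
        # self-supervised pretraining already provides useful representations.
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModel.from_pretrained("bert-base-uncased")

        sentence = "Grammatical inference studies learning formal languages from data."
        inputs = tokenizer(sentence, return_tensors="pt")

        with torch.no_grad():
            outputs = model(**inputs)

        # Mean-pool the token representations into one sentence embedding that a
        # downstream classifier or retrieval system could consume.
        embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
        print(embedding.shape)  # torch.Size([768]) for this checkpoint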


  • William Merrill
    William Merrill’s main research interests are in the formal theory of language and computation, and its applications in NLP and linguistics. Much of Will’s research has focused on the expressive power of neural network architectures in terms of formal language theory, in addition to analyzing the abilities of language models to learn semantics.
    • Tutorial: Formal languages and neural models for learning on sequences
      Abstract: The empirical success of deep learning in NLP and related fields motivates understanding the model of grammar implicit within neural networks on a theoretical level. In this tutorial, I will give an overview of recent empirical and theoretical insights into the power of neural networks as formal language recognizers. We will cover the classical proof that infinite-precision RNNs are Turing-complete, formal analysis and experiments comparing the relative power of different finite-precision RNN architectures, and recent work characterizing transformers as language recognizers using circuits and logic. We may also cover applications of this work, including the extraction of discrete models from neural networks. Hopefully, the tutorial will synthesize different analysis frameworks and findings about neural networks into a coherent narrative, and provide a call to action for the ICGI community to engage with exciting open questions.
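
      As a toy illustration of the kind of expressiveness result surveyed in the tutorial (this construction is an assumption of mine, not an example taken from the tutorial), a tiny finite-precision ReLU recurrent cell with hand-set weights recognizes PARITY, the regular language of bit strings containing an even number of 1s:

        import numpy as np

        # Hand-chosen weights implementing XOR of the input bit with the state,
        # so the 1-dimensional state tracks the parity of the 1s seen so far.
        W_in  = np.array([1.0, 1.0])    # input bit -> 2 hidden pre-activations
        W_rec = np.array([1.0, 1.0])    # previous state -> hidden pre-activations
        b     = np.array([0.0, -1.0])
        W_out = np.array([1.0, -2.0])   # project hidden units back to the state

        def accepts_even_parity(bits):
            h = 0.0                                    # start: zero 1s seen, even
            for x in bits:
                z = np.maximum(W_in * x + W_rec * h + b, 0.0)  # ReLU hidden layer
                h = float(W_out @ z)                   # new state = XOR(h, x)
            return h == 0.0                            # accept iff parity is even

        assert accepts_even_parity([1, 0, 1, 1, 0, 1])      # four 1s: accept
        assert not accepts_even_parity([1, 1, 1])           # three 1s: reject

      The same state-tracking idea underlies formal comparisons of RNN variants as language recognizers, and reading off the two reachable state values recovers the two-state DFA for PARITY, in the spirit of the discrete-model extraction mentioned above.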
