Fall 2024
INF 385T Special Topics in Information Science: Natural Language Processing and Applications
DESCRIPTION
Natural Language Processing (NLP) is concerned with interactions between computers and humans through the medium of human language. It involves analyzing, understanding, and generating human language, making it possible for machines to interpret and respond to human speech and text. NLP contributes significantly to modern technology and serves as the backbone of applications such as Generative AI, Conversational AI, Question Answering, Human Language Translation, Summarization, Sentiment and Emotion Analysis, Search and Recommendation, and Information Extraction, in domains such as healthcare, finance, law, libraries, education, and beyond.

The proposed graduate-level course aims to cover fundamental concepts in Natural Language Processing / Computational Linguistics and how they are used to solve real-world problems. Each week's classes will be divided into two segments: (a) Theory and Methods, a concise description of an NLP concept, and (b) Practicum, a hands-on session applying the theory to a real-world task on publicly available multilingual text datasets. We will use Python for programming, along with popular text-processing libraries such as NLTK, spaCy, and Hugging Face's Transformers.

By the end of the course, the goals for the students are to:
1. Understand the process of gathering and pre-processing a large amount of multilingual textual data from various domains and sources. Characterize the processes to store, load, and pre-process multilingual data and apply language processing operations such as normalization, tokenization, lemmatization, chunking, and extraction of machine-readable (vector) representations.
2. Train machine learning algorithms for natural language understanding and generation and evaluate their performance.
3. Learn to extract information from unstructured text and represent it in the form of knowledge graphs.
4. Learn to use existing knowledge graphs, ontologies, and lexical knowledge networks for predictive analysis on text.
5. Learn about popular NLP applications and tasks and the process of building such applications.
6. Propose a novel product- or research-focused idea (this will be an iterative process), design and execute experiments, and present the findings and demos to a suitable audience (in this case, the class).
PREREQUISITES
Graduate standing.
RESTRICTIONS
Restricted to graduate students in the School of Information through registration periods 1 and 2. Outside students will be permitted to join our waitlists beginning with registration period 3.