Hyunjin-Dominique Cho

University of Waterloo | UWaterloo · Department of Statistics and Actuarial Science

Questions (5)
Question
Hello,
I believe "sentence processing" is a topic studied in psycholinguistics (I am not a linguist, so please bear with me).
In psycholinguistics, what are the general steps by which humans process a sentence?
For example, from what I gather from a Google search, human sentence processing seems to proceed in the following order:
1. Syntactic analysis of a sentence
2. Shallow semantic processing of the sentence
3. Deep (?) semantic processing of the sentence
....
Are there any papers that describe such a sequence of steps?
Thank you,
Question
Hello,
If I am training my neural network on training data consisting of 908 batches, would 20 epochs of training be sufficient as a rule of thumb?
If 20 epochs are not sufficient, does this mean that I will have to retrain my neural network before performing any analysis with it?
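To make the question concrete, here is a minimal sketch of how I could check for a plateau instead of relying on a fixed number of epochs. It assumes I record one validation loss per epoch and perform one weight update per batch; the function name `epochs_until_plateau`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and not taken from my actual run.

```python
def epochs_until_plateau(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Numbers from my setup: 908 batches per epoch, 20 epochs.
BATCHES_PER_EPOCH = 908
EPOCHS = 20
# Assumption: one weight update per batch.
total_updates = BATCHES_PER_EPOCH * EPOCHS

# Hypothetical validation losses, one per epoch.
example_losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.70, 0.69]
stop_epoch = epochs_until_plateau(example_losses, patience=3)
```

If the loss is still improving at epoch 20, the function returns `None`, which I would read as a sign that 20 epochs may not be enough.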
Thanks,
Question
Hello,
If I am going to submit my work to a CS conference and also make my Python code publicly available, is it absolutely necessary that I set a random seed in my Python code?
I ran an experiment, but I realized afterwards that I forgot to set the seed. Will my paper be rejected by the conference because the seed was not set?
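For reference, this is roughly what I mean by "setting the seed". It is only a sketch: the helper name `set_seed` is my own, and I am assuming the code may use NumPy and PyTorch, which are seeded only if they are installed. As far as I understand, seeding alone may not guarantee bit-for-bit identical results on a GPU.

```python
import random


def set_seed(seed):
    """Seed the random number generators available in this environment."""
    # Python's built-in generator.
    random.seed(seed)

    # NumPy's global generator, if NumPy is installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's CPU and CUDA generators, if PyTorch is installed.
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Call once at the top of the experiment script, before any random draws.
set_seed(42)
```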
Thank you,
Question
Hello,
I am interested in processing the ARC dataset (http://nlpprogress.com/english/question_answering.html) with the GPT2 double heads model. The dataset is tab-delimited and structured as follows:
```
Question <tab> Answer
Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? (A) worldwide disease (B) global mountain building (C) rise of mammals that preyed upon plants and animals (D) impact of an asteroid created dust that blocked the sunlight. <tab> D
```
I know that I am supposed to tokenize the dataset before passing it into the GPT2 double heads model. How should I tokenize this data? More specifically:
  1. Should I add a special token before each label that marks a multiple choice option, i.e. (A), (B), (C) and (D)?
  2. Should I add a special token before each string that holds the content of a multiple choice option?
  3. Am I supposed to add the tokens "<bos>" and "<eos>" at the beginning and end of each question statement?
  4. If I pass this data into a GPT2 Double Heads Model (the GPT2 model with two heads) to process multiple choice questions, what should I do with the column that holds the correct answer to the question?
So, for instance, to generate the input for the GPT2 double heads model, should I break up the original question statement into 4 sequences, one for each multiple choice option, and apply the tokenization to each of the 4 sequences as below?
```
<bos> Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? <spec_token1> (A) <spec_token2> worldwide disease <eos>
<bos> Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? <spec_token1> (B) <spec_token2> global mountain building <eos>
<bos> Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? <spec_token1> (C) <spec_token2> rise of mammals that preyed upon plants and animals <eos>
<bos> Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? <spec_token1> (D) <spec_token2> impact of an asteroid created dust that blocked the sunlight. <eos>
```
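To show what I have in mind, here is a minimal sketch that turns one tab-delimited line into the four sequences above, plus the index of the correct option. The names `build_sequences` and `OPTION_RE` are my own, the special tokens are the placeholders from my example, and I am assuming every line contains exactly one tab and options labelled (A) to (D). The sketch only builds strings; it does not call any tokenizer, and I assume the special tokens would still have to be registered with whichever tokenizer is used.

```python
import re

BOS, EOS = "<bos>", "<eos>"
OPT_TOKEN, TEXT_TOKEN = "<spec_token1>", "<spec_token2>"

# Matches an option label such as "(A)" and captures the letter.
OPTION_RE = re.compile(r"\(([A-D])\)")


def build_sequences(line):
    """Split 'question <tab> answer' into one sequence per option.

    Returns (sequences, label), where label is the 0-based index of the
    correct option within sequences.
    """
    question_part, answer = line.rstrip("\n").split("\t")
    # re.split with a capture group yields:
    # [stem, "A", text_a, "B", text_b, ...]
    pieces = OPTION_RE.split(question_part)
    stem = pieces[0].strip()
    letters = pieces[1::2]
    texts = [text.strip() for text in pieces[2::2]]

    sequences = [
        f"{BOS} {stem} {OPT_TOKEN} ({letter}) {TEXT_TOKEN} {text} {EOS}"
        for letter, text in zip(letters, texts)
    ]
    label = letters.index(answer.strip())
    return sequences, label


# Hypothetical short example line in the same format as the dataset.
example = "What color is a clear daytime sky? (A) green (B) blue (C) red (D) black \t B"
sequences, label = build_sequences(example)
```

My guess is that the answer column would end up as this label index for the multiple choice head rather than as part of the input text, but that is exactly the part I am unsure about.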
Thank you,
PS: I found this post, https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313, and it seems to address some of my questions, but it does not fully answer them.
Question
Hello,
When we look at English test prep books designed for ESL learners, we often see in their titles that a book is for "learners at elementary level", "learners at intermediate level", or "learners at advanced level".
When we compare the English of a child who is a native speaker with that of an adult native speaker, we see that the adult's English is more "advanced" than the child's. When we look at K-12 English Language Arts curricula, we can say that the curriculum progresses from "elementary English" to "advanced English" as the grade level goes up.
But to me these are all very vague ideas. How are the terms "elementary English" and "advanced English" defined? What exactly are the characteristics of each, and what are the differences between them?
I don't have much background in linguistics, so my question may sound a bit naive. Thank you for your understanding.