Featured research (11)

OpenAI's release of the Chat Generative Pre-trained Transformer (ChatGPT) has revolutionized the artificial-intelligence approach to human-model interaction. First contact with the chatbot reveals its ability to provide detailed and precise answers in many areas. Several publications evaluating ChatGPT test its effectiveness on well-known natural language processing (NLP) tasks, but the existing studies are mostly non-automated and conducted on a very limited scale. In this work, we examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even for humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection; the remaining tasks require more objective reasoning, like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated the GPT-4 model on five selected subsets of NLP tasks. We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Comparing their results with available state-of-the-art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For the GPT-4 model, the loss on semantic tasks was significantly lower than for ChatGPT. We showed that the more difficult the task (the lower the SOTA performance), the greater the ChatGPT loss; this applies especially to pragmatic NLP problems such as emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization and obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool's usefulness to society and how the learning and validation procedures for such systems should be established.
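As a rough illustration of what an automated zero-shot evaluation loop of this kind can look like, the following Python sketch uses the official OpenAI client to query a chat model and score its answers against gold labels. The task, labels, prompt wording, and accuracy metric are illustrative placeholders, not the setup used in the paper.

```python
# Minimal sketch of automated zero-shot prompting for one classification task.
# Assumes the official openai Python client (>=1.0); labels and prompt text
# below are hypothetical, not the paper's actual tasks or templates.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["negative", "neutral", "positive"]  # hypothetical sentiment labels

def classify_zero_shot(text: str, model: str = "gpt-4") -> str:
    """Ask the model for a single label and return it normalized."""
    prompt = (
        "Classify the sentiment of the following review as one of: "
        f"{', '.join(LABELS)}.\n\nReview: {text}\n\nAnswer with one word."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer if answer in LABELS else "unknown"

def accuracy(texts, gold, model: str = "gpt-4") -> float:
    """Compare model answers with gold labels over a whole test set."""
    preds = [classify_zero_shot(t, model) for t in texts]
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)
```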
One of the main research questions concerning multi-word expressions (MWEs) is which of them are transparent word combinations created ad hoc and which are multi-word lexical units (MWUs). In this paper, we use selected corpus-linguistic and machine-learning methods to determine which lexicalization criteria guide Polish and English lexicographers in deciding which MWEs (bigrams such as adjective+noun and noun+noun combinations) should be treated as lexical units and recorded in dictionaries as MWUs. We analyzed two samples, MWEs extracted from Polish and English monolingual dictionaries and MWEs created by the annotators, and tested two custom-designed criteria, intuition and paraphrase, also using statistical measures of collocational strength (PMI and Jaccard). We found that Polish lexicographers tend not to include compositional MWEs as lexical entries in their dictionaries and that the criteria of paraphrase and intuition are important to them: if MWEs are not clearly and unambiguously paraphrasable and compositional, they are recorded in dictionaries. In contrast to their Polish counterparts, English lexicographers tend also to record compositional and partly compositional MWEs.
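As an informal illustration of the two collocational-strength measures mentioned above, the following Python sketch computes PMI and the Jaccard coefficient for a bigram from raw corpus counts; the counts in the usage example are hypothetical.

```python
# Minimal sketch of two collocational-strength measures for a bigram (w1, w2),
# computed from raw corpus counts. The example counts are illustrative only.
import math

def pmi(n_bigram: int, n_w1: int, n_w2: int, n_total: int) -> float:
    """Pointwise mutual information: log2 of P(w1, w2) / (P(w1) * P(w2))."""
    return math.log2((n_bigram * n_total) / (n_w1 * n_w2))

def jaccard(n_bigram: int, n_w1: int, n_w2: int) -> float:
    """Jaccard coefficient: co-occurrences over the union of occurrences."""
    return n_bigram / (n_w1 + n_w2 - n_bigram)

# Hypothetical counts for an adjective+noun bigram in a 1M-token corpus.
print(pmi(n_bigram=120, n_w1=2_000, n_w2=3_500, n_total=1_000_000))  # ~4.10
print(jaccard(n_bigram=120, n_w1=2_000, n_w2=3_500))                 # ~0.022
```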
We propose and test multiple neuro-symbolic methods for sentiment analysis. They combine deep neural networks – transformers and recurrent neural networks – with external knowledge bases. We show that for simple models, adding information from knowledge bases significantly improves the quality of sentiment prediction in most cases. For medium-sized sets, we obtain significant improvements over state-of-the-art transformer-based models using our proposed methods: Tailored KEPLER and Token Extension. We show that the cases with the improvement belong to the hard-to-learn set.
Keywords: Neuro-symbolic sentiment analysis, plWordNet, Knowledge base, Transformers, KEPLER, HerBERT, BiLSTM, PolEmo 2.0
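As a generic illustration of the neuro-symbolic idea described above (not the paper's Tailored KEPLER or Token Extension methods), the following PyTorch sketch concatenates a transformer's [CLS] representation with a small vector of knowledge-base-derived features before classification; the model name and feature dimensionality are assumptions.

```python
# Minimal sketch of combining a pretrained transformer with features derived
# from an external knowledge base (e.g. sense-level sentiment counts looked up
# in a lexicon such as plWordNet). Illustrative only; model name and the
# 3-dimensional feature vector are assumptions, not the paper's architecture.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class KBEnhancedClassifier(nn.Module):
    def __init__(self, model_name: str, kb_feature_size: int, num_classes: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Linear(hidden + kb_feature_size, num_classes)

    def forward(self, input_ids, attention_mask, kb_features):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]           # [CLS] representation
        fused = torch.cat([cls_vec, kb_features], -1)   # neural + symbolic
        return self.head(fused)

# Hypothetical usage with a Polish transformer and a 3-dim lexicon feature
# vector (e.g. positive / negative / neutral sense counts for the input text).
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
model = KBEnhancedClassifier("allegro/herbert-base-cased",
                             kb_feature_size=3, num_classes=2)
batch = tokenizer(["Świetny film, polecam!"], return_tensors="pt")
kb = torch.tensor([[2.0, 0.0, 1.0]])                    # lexicon-derived features
logits = model(batch["input_ids"], batch["attention_mask"], kb)
```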
In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik's model. Additionally, we present an extended evaluation using deep neural multilingual models and language-agnostic regressors on the translation of the original collection into 11 languages.
Keywords: NLP, Text classification, Text regression, Deep learning, Emotions, Valence, Arousal, Multilingual, Language-agnostic

Lab head

Maciej Piasecki
Department
  • Department of Artificial Intelligence

Members (14)

Jan Kocoń
  • Wrocław University of Science and Technology
Michał Marcińczuk
  • Wrocław University of Science and Technology
Marek Maziarz
  • Wrocław University of Science and Technology
Ewa Rudnicka
  • Wrocław University of Science and Technology
Marcin Oleksy
  • Wrocław University of Science and Technology
Arkadiusz Janz
  • Wrocław University of Science and Technology
Wiktor Walentynowicz
  • Wrocław University of Science and Technology
Joanna Baran
  • Wrocław University of Science and Technology
Paweł Kędzia