Questions related to Natural Language Processing
I am working on an NLP classification project using BERT and want to create my own dataset from books, websites, etc. I need to see some real examples of how to create one. Any support/help is welcome.
Hello there, I am searching for datasets of software requirements and their use cases, in the hope of gathering use-case datasets for the requirements to train an ML model for research we are working on. Would anyone know of a source for such datasets?
GPT-3 (Generative Pre-trained Transformer 3) is the third iteration of OpenAI's popular language model. It was released in 2020 and is considered one of the most advanced large language models (LLMs). It was trained on massive amounts of text data from the Internet, making it capable of generating human-like text and performing various Natural Language Processing (NLP) tasks such as text completion, summarization, translation, and more. ChatGPT, by contrast, is a conversational AI based on OpenAI's GPT models and was released on November 30, 2022. ChatGPT has since been widely used in various industries, including the health and medical sciences.
Activation functions play a crucial role in the success of deep neural networks, particularly in natural language processing (NLP) tasks. In recent years, the Swish-Gated Linear Unit (SwiGLU) activation function has gained popularity among researchers due to its ability to effectively capture complex relationships between input features and output variables. In this blog post, we'll delve into the technical aspects of SwiGLU, discuss its advantages over traditional activation functions, and demonstrate its application in large language models.
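As background for the post above: in its common β = 1 form (i.e., using SiLU as the Swish), SwiGLU gates one linear projection of the input with the Swish of another. A minimal NumPy sketch, with bias terms omitted and the weight shapes invented purely for illustration:

```python
import numpy as np

def swish(x):
    # Swish with beta = 1 (also known as SiLU): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    # SwiGLU(x) = Swish(xW) * (xV): one projection is gated
    # element-wise by the Swish of the other projection.
    return swish(x @ W) * (x @ V)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))   # batch of 2, hidden size 8
W = rng.standard_normal((8, 16))  # two separate learned projections
V = rng.standard_normal((8, 16))
out = swiglu(x, W, V)
print(out.shape)  # (2, 16)
```

In transformer feed-forward blocks, this gated product typically replaces the single ReLU/GELU projection, at the cost of one extra weight matrix.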
If each NLP task has an accuracy of 90%, does the accuracy of each task drop after the tasks are integrated into a large language model? For example, if the accuracy of each standalone NLP task is 90%, does the accuracy of each task become 85% after integration into a large language model?
I am keen to know how NLP, a subfield of AI, can be used to improve customer service in the field of Supply Chain Management. Are there examples of its use in customer interaction, complaint management, or understanding customer sentiment? What models or techniques are commonly used in these applications?
Is the main benefit of a large language model its sheer capability, rather than its few-shot learning ability?
I'm looking for research assistant opportunities or any kind of involvement in research in the fields of Machine Learning, Deep Learning, or NLP. I am eager to contribute my efforts and dedication to research endeavors. Please let me know if you have any openings for this kind of work.
I am currently working as a sustainability data scientist, and I'm intending to conduct independent research at the intersection of climate change and machine learning. I am highly proficient in data analysis, visualization, time series forecasting, supervised machine learning and natural language processing. Furthermore, I have substantial knowledge in the domains of climate change, biodiversity and sustainability in general. Here are a few examples of my past work:
Forecasting Atmospheric CO2 Concentration: https://towardsdatascience.com/forecasting-atmospheric-co2-concentration-with-python-c4a99e4cf142
Visualizing Climate Change Data:
Statistical Hypothesis Testing with Python:
Simplifying Machine Learning with PyCaret book:
Currently, I want to apply topic modeling to a dataset of news articles about climate change. This will help us extract insights into how this subject is presented in media that shape the opinions of countless people globally. My original intention was to focus on Greek news websites, so I created a dataset for this purpose. Still, we can decide on a different scope for the project and analyze news articles from other countries. There are numerous free datasets available, and we can also consider using an API to create more. If you are interested in collaborating, I encourage you to leave a comment or message me. Thank you for taking the time to read this post!
I am looking for someone to collaborate on research in the area of computer vision or natural language processing. If you are interested, please get in touch with me.
Given the current state of the art in GAN (generative adversarial network) research, how long might it take for GANs to yield more efficient results in terms of NLP performance, and could the major advantages of NLP be further improved by quantum computing?
Advances in Natural Language Processing have shown that research questionnaires can be handled by ChatGPT-4.
Should results from ChatGPT-4 be treated as a primary source or a secondary source?
I created my own large dataset from different sites and labeled it for an NLP task. How can I publish it as a paper or article, and where?
This topic has generated a lot of discussion on the ethical implications of using language models like ChatGPT in academic settings. It drives us to consider potential biases, accuracy issues, and professionalism in academia while employing such technology. Furthermore, it encourages the investigation of alternative or complementary approaches that can improve academic success while resolving concerns about the incorporation of ChatGPT.
Considering the use of ChatGPT as a catalyst, and given the controversy surrounding its role, what are the potential benefits and drawbacks of introducing ChatGPT or similar language models into the academic production process? Does it assist the academic researcher in producing efficient and engaging academic output, or does it cause the researcher to lose the ability to communicate ideas clearly and concisely and to convey arguments in a logical and convincing manner?
BERT is described in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
RoBERTa is described in the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach".
Three years have now passed. Is there any pretrained language model that surpasses them on most tasks (with the same or similar resources)?
A speedup without any decrease in accuracy would also count as an improvement.
If there are journals that publish natural language processing work, could you please list them with their impact factors?
I'm writing a systematic review article on Natural Language Processing (NLP) and planning to submit the paper to a Q1 journal. Could you please recommend a list of free (no-fee) Q1 journals that give a fast decision?
I have a collection of sentences that are in an incorrect order, and the system should output the correct order of the sentences. What would be an appropriate approach to this problem? Is it a good approach to embed each sentence into a vector and classify each sentence's position using multiclass classification (assuming the length of the collection is fixed)?
Please let me know if there are other approaches.
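Besides classifying each sentence's position, one common alternative is a pairwise model: score whether sentence A should precede sentence B, then search for the ordering with the highest total pairwise score. A minimal sketch in Python, where a toy rule based on invented cue words stands in for a learned pairwise classifier over sentence embeddings:

```python
from itertools import permutations

def order_score(candidate, comes_before):
    # Sum the pairwise "s_i precedes s_j" scores over a candidate ordering.
    return sum(
        comes_before(candidate[i], candidate[j])
        for i in range(len(candidate))
        for j in range(i + 1, len(candidate))
    )

def best_order(sentences, comes_before):
    # Exhaustive search over orderings (fine for small collections;
    # larger ones need beam search or greedy insertion).
    return max(permutations(sentences), key=lambda p: order_score(p, comes_before))

# Toy stand-in for a learned pairwise classifier: explicit ordinal cues.
cues = {"First": 0, "Then": 1, "Finally": 2}
def toy_comes_before(a, b):
    return 1.0 if cues[a.split()[0]] < cues[b.split()[0]] else 0.0

shuffled = ["Finally we evaluate.", "First we collect data.", "Then we train."]
print(best_order(shuffled, toy_comes_before))
# ('First we collect data.', 'Then we train.', 'Finally we evaluate.')
```

The pairwise framing avoids fixing the collection length in advance, since the classifier only ever sees two sentences at a time.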
We collected [good]/[bad] feedback from the web page.
Then we removed the [bad] feedback data.
Then we used only the [good] feedback data to train the text-generation policy model.
The [good] feedback data is merged into the original dataset of the policy model.
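The filtering pipeline described in the steps above can be sketched in a few lines of Python; the record layout, field names, and example texts here are assumptions for illustration, not a real API:

```python
# Hypothetical feedback records: each holds a generated text and a user rating.
feedback = [
    {"text": "The sky is blue.", "rating": "good"},
    {"text": "Blue the is sky.", "rating": "bad"},
    {"text": "Water is wet.", "rating": "good"},
]

# Steps 1-2: keep only the [good] feedback, dropping the [bad] records.
good_only = [r["text"] for r in feedback if r["rating"] == "good"]

# Steps 3-4: merge the kept examples into the policy model's training set.
original_dataset = ["Grass is green."]
training_set = original_dataset + good_only
print(training_set)
# ['Grass is green.', 'The sky is blue.', 'Water is wet.']
```

This is the simplest "filter then fine-tune" variant; reinforcement-learning approaches would instead keep the [bad] records as negative signal.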
I'm pursuing Master's in Computer Science. My area of expertise is software development (web full stack).
I am new to the research domain and its processes. I'm trying to search for a research topic but am having a hard time narrowing it down to a research gap.
I'm exploring the following areas in NLP:
- Dialogue and Conversational agents
- Knowledge graphs
- Low resource & domain adaptation by transfer learning
- Multi-lingual NLP
Apart from NLP, I'm looking into:
1. Microservices architecture patterns
2. Component based software frameworks
All I'm finding online are survey papers. Am I using keywords that are too generic? Can someone advise on how to narrow down from a domain or research area to a topic?
Thanks in advance!
The development of AI models such as ChatGPT, and what will happen next in the field of epidemiology.
The following is the answer from ChatGPT:
As an AI language model, I can assist you in various ways in the field of epidemiology. Here are a few examples:
- Data analysis: I can help you analyze large sets of data to identify trends, patterns, and correlations that can help you understand the spread of diseases and the factors that influence it. I can also help you identify risk factors, track outbreaks, and monitor the effectiveness of public health interventions.
- Modeling and simulation: I can assist you in building models and simulations that can help you predict the spread of diseases and evaluate the impact of different interventions. This can help you identify the most effective strategies for controlling outbreaks and preventing the spread of diseases.
- Literature review: I can help you perform a comprehensive review of the existing literature on a particular disease or public health issue. This can help you identify knowledge gaps, understand the current state of research, and develop new research questions.
- Natural language processing: I can assist you in extracting meaningful insights from unstructured data sources such as social media, news articles, and medical records. This can help you monitor public sentiment, identify emerging trends, and track disease outbreaks in real time.
- Chatbot: I can be programmed to answer questions related to epidemiology and public health. This can help you provide accurate information to the public and address their concerns and questions in a timely manner.
Please let me know if there's anything specific I can help you with.
What would you use Natural Language Processing techniques for if you had access to an enormous digital library of data on "democracy" & closely related concepts?
This question comes from Agustin Goenaga's essay here: https://theloop.ecpr.eu/what-democracy-should-be-for-us/
I am currently working on a project, part of which is for presentation at JK30 this year in March hosted at SFU, and I have been extensively searching for a part of speech (POS) segmenter/tagger capable of handling Korean text.
The one I currently have access to, and could get to execute, is relatively outdated and requires many modifications to run on the data.
I do not have a strong background in Python and have zero background in Java and my operating system is Windows.
I wonder whether anyone can recommend the best way to go about segmenting Korean text data so that I can examine collocates with the aim of determining semantic prosody, and/or point me in the direction of a suitable program or piece of software.
Researchers may find it hard to follow the latest results in their area. There may be several journals relevant to their research, and it's impossible to read every paper.
Can we use natural language understanding methods to help us read papers and select those most likely to be useful, the way ChatGPT does? Does such an app exist?
Alternatively, why don't we build something like an academic TikTok? TikTok would be perfect for researchers because it clearly knows our interests, and we could see others' attitudes toward a certain paper via the comments.
"Reinforcement-Learning-On-NLP" means using the reward to update the model.
"Re-Label-That-Data" means using the reward to re-label the related data and then re-train.
Deep learning has made major advances across multiple domains such as image recognition, speech recognition, natural language processing, and many more.
A. Recurrent Neural Networks
B. Convolutional Neural Networks
Although the experiments did not show promising results, they still gave some insight into how color space and SPP impact the results of CNN-based MDS.
What is the main difference between LSTM and transformer architectures in natural language processing tasks, and which one is generally considered to be the best?
I have gone through a number of papers but have not found any working solution. I am looking for a free, open-source solution or approach; I do not want to buy any third-party solution or API.
Keywords: NLP, natural language processing, BERT, LSTM, spaCy
Knowledge graphs have made impressive progress and are an important resource in the artificial intelligence domain. I am researching knowledge graph embedding, which represents the entities and relations in a knowledge graph as vectors. Now, I want to introduce the knowledge graph as a resource for other natural language processing tasks. What interesting areas do you think I could try, such as text semantic matching, text classification, and so on?
I studied finance in my master's and have worked in financial institutions, where I worked on the automation of risk and compliance. I am currently planning to pursue a PhD connected to Artificial Intelligence. Based on reading some articles online, I have come up with a list of PhD topics.
Could you please help me find which one from this list is best? Any other new idea is also welcome. Thank you.
- Cost Benefit analysis of Implementing AI in GRC (Governance, Risk, and compliance) of Financial Institutions
- ROI of Implementing AI in GRC
- Application of AI in Automation, Data Validation, Cleansing
- Application of Natural Language Processing in GRC for Categorization and Mapping
- Approach to implement AI, whole Transformation vs Hybrid adoption
- Benefits and Challenges for Financial Institutions that Are Early Adopters of AI
- Role of AI in reducing behavioral biases in Risk Management
- AI-based Entrepreneurship and Innovation
- AI in Risk management of Hedge Funds
I'm looking for tools that can help me parse sentences into clauses, and then clauses into groups and phrases from a Systemic Functional Linguistics perspective. I have found lots of NLP tools online, but they seem to only parse sentences into parts of speech, not clauses or groups/phrases. Any suggestions would be greatly appreciated!
I have just started my doctorate in the NLP domain. As I can see, there are a lot of papers in this research area. What I realized while doing a literature review is that good publications, or indeed any publications, are complicated, and it may take me some years to have something that can be delivered. So what else can I do apart from publications that might carry positive weight career-wise as an academician: a biweekly newsletter, or a technical article that digs into a recent outstanding paper in my field?
I am working on a research proposal named "Invoice Automation with NLP", but I am totally confused about how to proceed. Most importantly, is this a good topic to research?
Your ideas, comments, or recommendations would be highly appreciated.
I am new to chatbot development and NLP. I wanted to know if it is possible to use extractive text summarization algorithms within the Rasa chatbot development framework.
Thank you in advance
Has any NLP-based deep learning model been able to beat OpenAI's GPT-3 when it comes to machine translation and text summarization?
Could you please give your ideas and share resources about how document verification may be achieved using semantic analysis? Is there any tool or technique? Suggestions including simple and easy techniques would be great. Thanks.
I would be more than happy to have your suggestions. I am trying to understand the current challenges in clinical NLP, but the articles I found do not mention the main challenges; I only found some general ones like de-identification, abbreviations, etc.
I am looking to learn NLP from the basics to advanced topics. What are good resources and university courses for learning NLP? As I search on YouTube, there is a lot of information available, but I am not able to differentiate between the options.
Please suggest some good NLP courses.
Could you recommend courses, papers, books or websites about wav audio preprocessing?
Thank you for your attention and valuable support.
Having looked at a lot of methods, I want to use perplexity to compare model-generated results with human results. I plan to train an LSTM model on human text and use different machine-generated texts as test sets, then compare the perplexity on those tests. Sources online say perplexity is for evaluating the quality of language models; I want to use it to compare the difference between human and machine text. I don't know if this will work.
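The comparison described above can be illustrated with a much smaller stand-in for the LSTM: an add-alpha-smoothed bigram model fitted on "human" text, whose perplexity is then computed on held-out human-like text versus scrambled "machine" text. The toy sentences are invented; a real experiment would use an LSTM and far more data:

```python
import math
from collections import Counter

def bigram_perplexity(train, test, alpha=1.0):
    # Perplexity = exp(mean negative log-likelihood) under an
    # add-alpha-smoothed bigram model fitted on `train`.
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train)
    vocab = len(set(train) | set(test))
    pairs = list(zip(test, test[1:]))
    nll = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
        nll -= math.log(p)
    return math.exp(nll / len(pairs))

human_train = "the cat sat on the mat".split()
human_like  = "the cat sat on the mat".split()
scrambled   = "mat the on sat cat the".split()  # same words, broken order
print(bigram_perplexity(human_train, human_like)
      < bigram_perplexity(human_train, scrambled))  # True
```

The scrambled text gets higher perplexity because its word order is unlikely under the human-trained model, which is exactly the signal the post wants to exploit.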
I would like to set up a high-end system for training NLP models on a huge corpus, to train models for TTS, STT, and translation. What is the best specification for setting up such an environment? Please recommend system specs.
Pre-training big models is now widely used and has brought many new ideas.
In the field of model architectures, how can one keep up with this trend?
In NLP, we have run experiments in text generation, such as generating abstracts, but what are its practical applications?
Hi, I have been working on some Natural Language Processing research, and my dataset has several duplicate records. I wonder whether I should delete those duplicate records to increase the performance of the algorithms on test data.
I'm not sure whether duplication has a positive or negative impact on the test or train data. I found some contradictory answers online, which confused me!
For reference, I'm using ML algorithms such as Decision Tree, KNN, Random Forest, Logistic Regression, and MNB, as well as DL algorithms such as CNN and RNN.
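One concrete reason duplicates often hurt is train/test leakage: if an exact duplicate lands in both splits, the model is effectively tested on its training data and test accuracy is inflated. A minimal sketch, with invented toy records, of deduplicating before splitting:

```python
# Assumed toy dataset of (text, label) records containing a duplicate.
records = [
    ("great product", "pos"),
    ("terrible service", "neg"),
    ("great product", "pos"),   # exact duplicate of the first record
    ("okay experience", "neu"),
]

# Deduplicate BEFORE the train/test split, so an identical record cannot
# appear in both train and test and silently inflate test scores.
seen, unique = set(), []
for rec in records:
    if rec not in seen:
        seen.add(rec)
        unique.append(rec)

print(len(records), len(unique))  # 4 3
```

Whether near-duplicates (same text, different label, or lightly edited text) should also be removed is a harder judgment call that depends on the task.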
I am trying to make generalizations about which layers to freeze during fine-tuning. I know that I should freeze feature-extraction layers, but some feature-extraction layers should not be frozen (for example, in the transformer architecture, the encoder and the multi-head attention part of the decoder, which are feature-extraction layers, should not be frozen). Which layers should I call "feature extraction layers" in this sense, and which kinds of them should I freeze?
Hello, I am interested in the task of converting word numerals to numbers, e.g.
- 'twenty two' -> 22
- 'hundred five fifteen eleven' -> 105 1511 etc.
And the problem I can't understand at all currently is for a number 1234567890 there are many ways we can write this number in words:
=> 12-34-56-78-90 is 'twelve thirty four fifty six seventy eight ninety'
=> 12-34-567-890 is 'twelve thirty four five hundred sixty seven eight hundred ninety'
=> 123-456-78-90 is '(one)hundred twenty three four hundred fifty six seventy eight ninety'
=> 12-345-678-90 is 'twelve three hundred forty five six hundred seventy eight ninety'
and so on. (Here I'm using dashes to indicate that 1234567890 is said in several parts.)
Hence, all of the above words should be converted into 1234567890.
I am reading following papers in the hopes of tackling this task:
But so far I still can't understand how would one go about solving this task.
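One way to frame the task above is as a small state machine: parse each spoken "chunk" with the usual units/tens/hundreds rules, start a new chunk whenever a word cannot legally extend the current one, and concatenate the chunks' digit strings. A hedged sketch, under the simplifying assumption that no "thousand"/"million" words occur and readings like "twelve hundred" are out of scope:

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

def words_to_number(text):
    # Parse chunks with a state machine, then concatenate their digits.
    # States: start, unit (saw 1-9), hundred, tens, done (chunk complete).
    chunks, cur, state = [], 0, "start"
    for w in text.split():
        if w == "hundred":
            if state == "unit":                 # e.g. "five hundred"
                cur, state = cur * 100, "hundred"
            else:                               # bare "hundred" = 100
                if state != "start":
                    chunks.append(cur)
                cur, state = 100, "hundred"
        elif w in TENS:
            if state == "hundred":              # e.g. "five hundred seventy"
                cur, state = cur + TENS[w], "tens"
            else:                               # tens word starts a new chunk
                if state != "start":
                    chunks.append(cur)
                cur, state = TENS[w], "tens"
        else:
            v = UNITS[w]
            if state == "hundred" and 1 <= v <= 19:   # "hundred five(teen)"
                cur, state = cur + v, "done"
            elif state == "tens" and 1 <= v <= 9:     # "twenty two"
                cur, state = cur + v, "done"
            else:                               # unit/teen starts a new chunk
                if state != "start":
                    chunks.append(cur)
                cur = v
                state = "unit" if 1 <= v <= 9 else "done"
    chunks.append(cur)
    return int("".join(str(c) for c in chunks))

print(words_to_number("twelve thirty four fifty six seventy eight ninety"))
# 1234567890
```

Because chunk boundaries are forced only where a continuation would be illegal, all the alternative readings of the same digit string converge on the same output.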
I know some basic approaches that can be used on languages with rich morphology.
3. Character n-grams
4. FastText embeddings
I would like to know if there are any more recent developments, and what researchers think about the robustness of each method in specific domains (Indic languages, etc.).
I have set of tags per document, and want to create a tree structure of the tags, for example:
- The_C_Programming_Language_(2nd Edition),
I need to generate a hierarchy as per the attached example image.
Are there free taxonomies/ontologies which can give parent words? For example:
get_parent_word( "Student", "Instructor") = 'People'
get_parent_word("The_C_Programming_Language_(2nd Edition)", "Head_First_Java") = "Book"
is_correct_parent(parent: "Student", child: "Student_profile") = True
I have a corpus of English as well as Technical documents and use Python as the main language. I am exploring WordNet and SUMO Ontology currently, if anyone has used them previously for a similar task or if you know something better I would really appreciate your guidance on this.
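As a sketch of the kind of lookup described above: the "parent word" of two terms is their lowest common ancestor in a taxonomy. The hand-built toy taxonomy below is an assumption standing in for a real resource such as WordNet or SUMO; with NLTK's WordNet interface, `Synset.lowest_common_hypernyms` plays the same role for real synsets:

```python
# Toy is-a taxonomy (child -> parent); invented for illustration only.
PARENT = {
    "Student": "Person", "Instructor": "Person", "Person": "Entity",
    "The_C_Programming_Language": "Book", "Head_First_Java": "Book",
    "Book": "Entity",
}

def ancestors(term):
    # Walk the parent links from a term up to the root.
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def get_parent_word(a, b):
    # Lowest common ancestor: the first ancestor of `a` (closest first)
    # that is also an ancestor of `b`, excluding the terms themselves.
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors and node not in (a, b):
            return node
    return None

print(get_parent_word("Student", "Instructor"))  # Person
```

With WordNet the main extra work is word-sense selection, since each surface word maps to several synsets and the common hypernym depends on which senses you pick.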
I have been investigating research topics on code-mixing for some downstream tasks in NLP. More exactly, it is a bit hard to find a code-mixed corpus for the cross-lingual sentence-retrieval task.
I'm looking for datasets containing coherent sets of tweets related to Covid-19 (for example, collected within a certain time period according to certain keywords or hashtags), labeled according to whether they contain fake or real news, or whether they contain pro-vax or anti-vax information. Ideally, the dataset would also contain a column with the textual content of each tweet, a column with the date, and columns with 1) the username/ID of the author, and 2) the usernames/IDs of the people who retweeted the tweet.
Do you know any dataset with these features?
Greetings, I am very enthusiastic about Natural Language Processing. I have some experience with Machine learning, Deep learning and Natural Language Processing. Is there anyone who is willing to work in collaboration?
Kindly ping me. Regards and thanks.
I am trying to implement a VQA model in e-commerce and would love to have a dataset that focuses on fashion (or any e-commerce type of goods). If there isn't one available, is synthetically generating Q&A pairs for a given image a good idea? If so, any idea how to approach such a problem?
I have a dataset that contains a free-text field for more than 3,000 records, all of which contain notes from the doctor. I need to extract specific information from all of them, for example, the doctor's final decision and the classification of the patient. What is the most appropriate way to analyze these texts? Should I use information retrieval or information extraction, or would a Q&A system be fine?
Is there any AI-related (mainly NLP, Computer Vision, Reinforcement Learning based) journal where I can submit short papers? It should be non-open access.
In which application of Machine Learning ( NLP, Computer Vision, etc ) would we find maximum value with Semi-Supervised Learning and Self-Training ?
I am trying to build a model that can produce speech for any given text.
I could not find any speech-cloning algorithm that can clone a voice from speech alone, so I turned to TTS (text-to-speech) models. I have the following doubts regarding data preparation:
The LJSpeech dataset contains many 3-10 second recordings, and around 20 hours of data are required. It will be very hard for me to produce that many 10-second recordings. What would be the impact of making many 5-minute recordings instead? One impact could be higher resource requirements (but how much?); are there others?
Also, is there some way to convert these 5-minute recordings to the LJSpeech format?
Consider a record of 100 values with different errors in the data, such as NULLs, duplicate values, or improper formats. Is it possible to cluster those values by error type and display the reason for each using NLP?
I developed an approach for extracting aspects from reviews for different domains, and now I have the aspects. I would like some suggestions on how to use these aspects in different applications or tasks, such as an aspect-based recommender system.
Note: Aspect usually refers to a concept that represents a topic of an item in a specific domain, such as price, taste, service, and cleanliness which are relevant aspects for the restaurant domain.