Arabic NLP - Science topic
Questions related to Arabic NLP
I created a lecture series on this; please suggest any improvements.
For word segmentation. Thank you very much!
I created my own large dataset from different sites and labeled it for an NLP task. How can I publish it as a paper or article, and where?
We are an active Arabic natural language processing (NLP) and AI research group working on deep learning, machine learning, and social network analysis for Arabic NLP.
We are looking for a research assistant (RA) who can work remotely on a number of NLP, deep learning, and machine learning projects. Where can we find such candidates?
Responsibilities:
Data cleaning, analysis and visualization using various approaches.
Ability to conduct literature reviews and summarize them coherently.
Ability to implement different ML/DL approaches on different datasets to address specific NLP problems.
Ability to fine-tune BERT/AraBERT and their variants for specific NLP tasks.
Ability to communicate experiments and results in clear English.
Required Minimum Qualifications:
Master's or PhD in computer science.
Experience in Python (including NumPy, SciPy, pandas, matplotlib).
Excellent working knowledge of deep learning and machine learning.
Experience with word embeddings, BERT, etc.
Ability to clearly communicate technical ideas in English.
Motivated, independent self-learner who can work in a diverse team.
Excellent verbal and written communication skills are required.
Because human thought is interconnected with language, what do you think about the integration of Natural Language Processing (NLP) with Deep Learning (DL)? I think it is the main route to building Artificial General Intelligence.
What approaches are used to integrate NLP with DL? What are the trends in this area?
I am looking for an Arabic dataset, especially one of chat conversations.
Thanks
I am wondering if there is a dataset or online database for clinical reports, electronic health records or discharge summaries written in Arabic.
By mathematical pattern, I mean a mathematical pattern in the textual structure of the resulting language.
Dear everybody!
As a hobby project, I am building a character-level seq2seq LSTM.
In my task, I give a text as input (max 40 characters) and the LSTM generates an output that rhymes with the input.
I created a very large database of rhyming lines.
At first I trained my model with the following parameters:
batch_size = 200
epochs = 250
latent_dim = 300
num_samples = 10000
With these parameters my model converged to 0.4 after 75 epochs, but I let it run for all 250 epochs and tested that model.
The result wasn't so bad, but I wanted more.
After that I tried very large batch sizes with more than 200k training samples (and almost every possible parameter combination), and every run ended in overfitting: the model produced the same sentence for every input. Note, however, that after the 250-epoch run I switched to checkpoint saving and tested only the best model, once it had stopped improving. Training usually stalls at an accuracy of about 0.29.
I know a character-level LSTM has its own limitations for this task, but is 10k training samples really the best it can do?
Is it possible that convergence doesn't matter in this case and the model only needs more epochs?
Is the database too big, with too many stopwords, so that I need to do word-frequency-based filtering on the training data?
I know that a word-level method could be more effective, but I'm afraid I have misunderstood something, and I don't want to waste more time waiting for training results while I don't know what I'm doing wrong.
What should I do?
Thank you all.
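The checkpoint-and-stop procedure described in this question can be sketched in a framework-independent way. The sketch below assumes the monitored number is a validation metric where higher is better (the post does not say whether 0.29 is a training or validation accuracy), and the function name, the `patience` parameter, and the score list are all hypothetical illustrations, not the poster's actual setup. If the model happens to be built with Keras, the `EarlyStopping` and `ModelCheckpoint` callbacks provide similar behaviour.

```python
def select_best_checkpoint(val_scores, patience=10):
    """Return (best_epoch, best_score, stop_epoch) for a higher-is-better metric.

    Training counts as stopped once `patience` epochs have passed
    without any improvement over the best score seen so far.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            # New best: this is the checkpoint that would be saved.
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop early.
            return best_epoch, best_score, epoch
    return best_epoch, best_score, len(val_scores)


# Made-up validation accuracies, one per epoch, purely for illustration.
scores = [0.10, 0.20, 0.29, 0.28, 0.27, 0.29, 0.26]
result = select_best_checkpoint(scores, patience=3)
print(result)
```

Selecting on a held-out validation score rather than the training score is what makes the "best" checkpoint meaningful; a model that repeats one sentence for every input can still show a falling training loss.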
Hi, I am trying to handle an imbalanced dataset with SMOTE in text classification, using TfidfTransformer and k-fold cross-validation. I want to solve this in Python. I have spent over two weeks on it and could not find a clear and easy way to do it.
Do you have any suggestions on where exactly to look?
After applying SMOTE, is it normal to get different accuracy results on the dataset?
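One common way to combine oversampling with k-fold cross-validation is to oversample only the training part of each fold and leave the test part untouched. The following is a minimal standard-library sketch of that idea, not a reference implementation: `smote`, `stratified_folds`, and `oversample_training_fold` are hypothetical helper names, and the small 2-D vectors merely stand in for TF-IDF rows. In practice the imbalanced-learn package offers a `SMOTE` class and a `Pipeline` that can apply it inside each fold, and the TF-IDF step would likewise be fitted on the training fold only.

```python
import random


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared distance to `base`.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def stratified_folds(labels, n_splits):
    """Yield (train_idx, test_idx) pairs that keep class ratios roughly equal."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % n_splits].append(i)
    for f in range(n_splits):
        test = sorted(folds[f])
        train = sorted(i for g in range(n_splits) if g != f for i in folds[g])
        yield train, test


def oversample_training_fold(X, y, train_idx, minority_label, rng):
    """Balance the training fold only; the test fold is never resampled."""
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
    n_majority = len(y_train) - len(minority)
    new = smote(minority, n_majority - len(minority), rng=rng)
    return X_train + new, y_train + [minority_label] * len(new)


rng = random.Random(42)
# Toy stand-in for TF-IDF rows: 12 majority (label 0) and 6 minority (label 1) vectors.
X = [[rng.random(), rng.random()] for _ in range(12)]
X += [[2 + rng.random(), 2 + rng.random()] for _ in range(6)]
y = [0] * 12 + [1] * 6

fold_sizes = []
for train_idx, test_idx in stratified_folds(y, n_splits=3):
    X_bal, y_bal = oversample_training_fold(X, y, train_idx, minority_label=1, rng=rng)
    # A classifier would be fitted on (X_bal, y_bal) and scored on the untouched test fold.
    fold_sizes.append((y_bal.count(0), y_bal.count(1), len(test_idx)))
print(fold_sizes)
```

Because the synthetic samples are random interpolations, scores can differ from run to run unless the random seed is fixed, which is one plausible reason for seeing different accuracy values after applying SMOTE.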
Is there any tool or algorithm that finds the pattern of a given Arabic word?
For example: Extract the pattern of Arabic word "ملعب", which is "مفعل".
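When the root of the word is already known, the mapping in this example can be sketched directly: each root consonant is replaced, in order, by the corresponding letter of the placeholder root ف ع ل. This is only a minimal sketch under strong assumptions (a three-letter root supplied by the caller, for instance from a morphological analyser, and an undiacritised word); `extract_pattern` is a hypothetical name, and the greedy matching can go wrong when an affix letter is identical to a root letter.

```python
TEMPLATE = "فعل"  # conventional placeholder root used to name Arabic patterns


def extract_pattern(word, root):
    """Replace the root consonants of `word`, in order, with the letters of TEMPLATE.

    Assumes a known three-letter root and an undiacritised word.
    """
    if len(root) != len(TEMPLATE):
        raise ValueError("this sketch only handles three-letter roots")
    pattern = []
    next_root = 0  # index of the next root consonant still to be matched
    for ch in word:
        if next_root < len(root) and ch == root[next_root]:
            pattern.append(TEMPLATE[next_root])
            next_root += 1
        else:
            # Not a root consonant: keep the affix or vowel letter as it is.
            pattern.append(ch)
    if next_root != len(root):
        raise ValueError("root consonants not found in order in the word")
    return "".join(pattern)


print(extract_pattern("ملعب", "لعب"))  # prints مفعل
```

The hard part in practice is finding the root in the first place, which this sketch deliberately leaves to an external analyser.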
What is the best word embedding evaluation method?
Apart from TALLIP, I do not know of any journals that specialize in Arabic NLP and Information Retrieval.
Can anyone suggest other journals?
Hello,
I'm working on text-to-speech (TTS) research for the Arabic language. One component of TTS that I have noticed is under-researched for Arabic is grapheme-to-phoneme (G2P) conversion, especially G2P using neural networks or AI.
In your opinion, why is Arabic G2P under-researched? Why are there few or no papers on using AI and neural networks for Arabic G2P? Is it not worth working on? Do you think it is a good topic to research?
Thank you
What are the best Arabic named entity tools available, and how are they used?
Thanks in advance
I am working in Arabic NLP and have been using Badaro's dataset, ArSenL.
I used it to enrich my own lexicon dataset, but manual inspection showed that it contains many misclassified words: words classified as positive with high confidence that clearly have negative polarity (for example, "murder" was classified as positive).
If anyone has used it for the same purpose, can you tell me whether I can rely on it blindly or whether I will need to inspect it manually?
Hi,
I know the GATE library has some support for non-English ontologies, such as Arabic. Is there another library or package for Arabic ontologies?
Arabic plugin
How do I create RDF with GATE? documentation
Thanks
I used the TextDirectoryToArff.java tool from the WEKA web site (link below),
but the result, shown in Figure 1, is not recognized by WEKA.
I need each word and its features on a single row, not spread over multiple rows as in Figure 1.
I tried writing code for this myself, without any useful result.
I need help arranging the file so that WEKA can read it.
I would appreciate it if anyone knows a tool or can guide me toward a solution.
Thanks
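One possible cause, offered only as an assumption since the figure is not available, is that line breaks inside the text spill one instance over several rows, whereas ARFF expects each instance on a single line after `@data`. The sketch below writes one instance per row with a quoted string attribute and a nominal class. The helper names, relation name, attribute names, and sample rows are all hypothetical.

```python
def arff_quote(value):
    """Quote a string for ARFF: collapse whitespace (including newlines) and escape quotes."""
    flat = " ".join(value.split())
    return "'" + flat.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_arff(relation, rows):
    """rows: list of (text, label) pairs; returns the ARFF file content as a string."""
    labels = sorted({label for _, label in rows})
    lines = [
        "@relation " + relation,
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for text, label in rows:
        # Exactly one line per instance, however the original text was wrapped.
        lines.append(arff_quote(text) + "," + label)
    return "\n".join(lines) + "\n"


# Made-up rows for illustration; the first text contains a line break on purpose.
rows = [("كتاب\nجديد", "noun"), ("ذهب الولد", "verb")]
arff = to_arff("arabic_words", rows)
print(arff)
```

For Arabic text the file would normally be saved as UTF-8, for example with `open(path, "w", encoding="utf-8")`, and WEKA has to read it with the same encoding.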
I want to do a very simple job: given a string containing pronouns, I want to resolve them.
For example, I want to turn the sentence "Mary has a little lamb. She is cute." into "Mary has a little lamb. Mary is cute.".
I use Java and Stanford Coreference, which is part of Stanford CoreNLP. I have managed to write some of the code, but I am unable to complete it and finish the job. Below is the code I have so far. Any help and advice will be appreciated.
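// Package names assume the older dcoref API (CoreNLP 3.x); newer releases
// expose CorefChain under edu.stanford.nlp.coref instead.
import java.util.Map;
import java.util.Properties;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefChain.CorefMention;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class ResolvePronouns {
    public static void main(String[] args) {
        String text = "Mary has a little lamb. She is cute.";
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        // Coreference chains are stored on the document, not on each sentence,
        // so they are read once instead of inside a loop over sentences.
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
        for (CorefChain chain : graph.values()) {
            CorefMention representative = chain.getRepresentativeMention();
            // Print each mention next to the representative mention of its chain;
            // replacing pronoun mentions with that text is the remaining step.
            for (CorefMention mention : chain.getMentionsInTextualOrder()) {
                System.out.println(mention.mentionSpan + " -> " + representative.mentionSpan);
            }
        }
    }
}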