
Computational Linguistics - Science topic

Computational linguistics is an interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective.
Questions related to Computational Linguistics
  • asked a question related to Computational Linguistics
Question
3 answers
I'm a student of Applied Linguistics, new to the field but well-versed in conducting research within social sciences and economics. I’m seeking guidance on applying Computational Linguistics to address global issues. Thank you!
Relevant answer
Answer
Hello, Arthur; is there anything in particular in computational linguistics that you are interested in?
  • asked a question related to Computational Linguistics
Question
2 answers
Hello!
I am currently trying to reproduce the results described in "Inferring psycholinguistic properties of words" (https://www.aclweb.org/anthology/N16-1050.pdf), by implementing the authors' bootstrapping algorithm. However, for some unknown reason I keep getting correlations with the actual ratings in the .4-.6 range, rather than the .8-.9 range. If you have a software implementation that reaches the performance levels from the paper, could you please share it with me?
Many thanks,
Armand
Relevant answer
Answer
This relates to how the psychological associations of words are stored, e.g.: fire = torment; paradise = reward; stone = hardness.
In this way the automatic statistical processing proceeds, so that in the end the computer, following a well-designed algorithm, succeeds in detecting the psychological orientations of the text being read.
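As a starting point for reimplementation, here is a minimal sketch of the general bootstrapping idea from the paper (self-train a regressor on word vectors and fold confident predictions back into the training set). The toy random vectors and the fixed-slice selection rule are placeholders, not the authors' method; real runs would load pretrained embeddings and a seed norm lexicon.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-ins: random "embeddings" and seed ratings.
vocab = ["word%d" % i for i in range(1000)]
emb = {w: rng.normal(size=50) for w in vocab}
labelled = {w: rng.uniform(1, 7) for w in vocab[:100]}   # seed ratings
unlabelled = [w for w in vocab if w not in labelled]

for _ in range(5):                                       # bootstrapping rounds
    X = np.array([emb[w] for w in labelled])
    y = np.array(list(labelled.values()))
    model = Ridge().fit(X, y)
    preds = model.predict(np.array([emb[w] for w in unlabelled]))
    # Placeholder confidence rule: fold a fixed slice of predictions back
    # into the training data each round; the paper's selection criterion differs.
    for w, p in list(zip(unlabelled, preds))[:50]:
        labelled[w] = p
    unlabelled = unlabelled[50:]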
  • asked a question related to Computational Linguistics
Question
4 answers
Hello Everyone,
Can anyone guide me to find Corpus/ Training data for readability difficulty of English texts?
Thanks in advance
Udaysimha Nerella
Relevant answer
Answer
Rafal Rzepka Hi there! This link is down, do you know how to access it now? Thank you so much!
  • asked a question related to Computational Linguistics
Question
1 answer
Digital literary communication is a reformulation of the text into multiple codes, involving a process that spans a plurality of disciplinary fields (SSD), each with specific roles and competences. I, Ritamaria Bucciarelli, have pursued these areas with the scientific support of leading experts in each field, which justified the choices: humanae litterae; quantum physics; mathematics; computational linguistics; implementations.
The reference model is quantum-musicological. The goal is to transfer data of the textual typology from HI (human intelligence) to AI: two codes and a multiplicity of linguistic and phonic mechanisms, plus graphs to be produced, automata, transformational analyses in linguistic environments, further automata, and finally implementations to be produced. I have succeeded in reproducing the emotional verse of the literary text of the Divine Comedy in a quantum computation, explained on the Fano plane and finally resolved in quantum foam. I invite you to respond to my appeal. For these competences, I think there are no referees qualified to judge. Thank you.
Relevant answer
Answer
In quantum physics, results are reached through computational linguistics and the information obtained from data analysis. This holds for every situation in quantum physics. The most appropriate method is to act on, and apply, data that have been correctly obtained through information and communication tools or within one's own field. The approach that is valid in all cases should be applied.
  • asked a question related to Computational Linguistics
Question
1 answer
I am currently working on a project, part of which is for presentation at JK30 this year in March hosted at SFU, and I have been extensively searching for a part of speech (POS) segmenter/tagger capable of handling Korean text.
The one I currently have access to and could get running is relatively outdated and requires many modifications to run on the data.
I do not have a strong background in Python and have zero background in Java and my operating system is Windows.
I wonder if anyone can recommend the best way to go about segmenting Korean text data so that I can examine collocates with the aim of determining semantic prosody, and/or point me in the direction of a suitable program or piece of software.
Relevant answer
Answer
Kerry Sluchinski You might try the following user-friendly POS taggers/segmenters for Korean language data:
1. KoNLPy: KoNLPy is a Python module for Korean natural language processing. It features a POS tagger as well as numerous tools for Korean language processing. KoNLPy is straightforward and well-documented.
2. KOMORAN: KOMORAN is an open-source Korean morphological analyzer and POS tagger. It is available as a command-line utility and as a Java library. For testing purposes, KOMORAN offers a user-friendly online interface.
3. Hannanum is a Korean morphological analyzer and POS tagger. It is a Java library that is built on a dictionary-based approach. Hannanum is simple to use and provides a user-friendly online interface for testing.
4. Korean Parser: Korean Parser is a dependency parser and part-of-speech tagger for Korean. It is written in Python and may be used as either a command-line utility or a Python library. Korean Parser is straightforward and well-documented.
5. Lingua-STS: Lingua-STS is a web-based tool for processing Korean language. It features a POS tagger as well as numerous tools for Korean language processing. Lingua-STS is simple to use and features an intuitive online interface.
These tools are all simple to use and can segment Korean text data and perform POS tagging.
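For example, a minimal KoNLPy session with the Okt analyzer (assuming Java and the konlpy package are installed; the sample sentence is arbitrary):

from konlpy.tag import Okt  # requires Java; pip install konlpy

okt = Okt()
sentence = "한국어 형태소 분석은 재미있다"

# Morpheme segmentation and POS tagging in one pass.
print(okt.morphs(sentence))  # e.g. ['한국어', '형태소', '분석', '은', '재미있다']
print(okt.pos(sentence))     # e.g. [('한국어', 'Noun'), ..., ('재미있다', 'Adjective')]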
  • asked a question related to Computational Linguistics
Question
13 answers
Hi everybody,
I would like to do part-of-speech tagging in an unsupervised manner. What are the potential solutions?
  • asked a question related to Computational Linguistics
Question
5 answers
There are high-status conferences such as NeurIPS, ICSE, and ACL. Sometimes they accept more than 1000 papers each year.
On the other hand, there are several Q1 journals (with high impact factors) in each category.
Based on your experience, what would be the pros and cons of each one for you as a researcher? How well they are received when you are applying for a position?
Relevant answer
Answer
Generally, academic institutions prefer Scopus/WoS-indexed journals over conferences, and even give more credit for them.
  • asked a question related to Computational Linguistics
Question
5 answers
It is common for a source domain to be conceptually linked to multiple target domains. According to the neural theory of metaphor, once a concept serving as a source domain is activated, signals spread through neural circuits/mappings. That is, multiple target domains should be activated simultaneously. Is this true? Or does context play a moderating role in this process? Any terms or articles to recommend, please?
Can I inhibit the processing of several other mappings by making one mapping more accessible? (accessibility theory).
Relevant answer
Answer
focuses in the domains
  • asked a question related to Computational Linguistics
Question
4 answers
What program is best for the computer-assisted phonetic comparison of dialects? We would like to compare several phonetically quite close dialects of a more or less well-documented language (with the respective protoforms available in case they're required for comparison). The aim of the comparison is to see how close the dialects are to each other and if maybe one stands out against the others, as well as to possibly get input for solving the questions of how the language and / or its speakers spread across the area where the dialects are currently spoken (within the possibilities, of course).
Relevant answer
Answer
Praat could be practically useful for doing various tasks of phonetic analysis by computer.
More info:
Good luck,
  • asked a question related to Computational Linguistics
Question
3 answers
Computational linguistics is fundamental to the future of all languages, since electronic processing now determines whether production continues or not, especially as language is no longer only a means of communication but has also become a means of production... Yet the Arab world pays little attention to it, despite attempts at research papers and articles, and despite discussion of it in various media.
Relevant answer
I thank you for your interest and your reply. I hope we can stay in touch about this field; I am not a specialist in it, but I have done some reading on it, and it is one of the themes I pursue in my research and articles on cultural studies and Arab consciousness...
All thanks and appreciation to you.
  • asked a question related to Computational Linguistics
Question
5 answers
Actually, I need semantic relations to agent nominals as well.
E.g., I need the verb 'grave' (Eng.: to dig), which has semantic relations to 'jord' (Eng.: dirt) and 'skovl' (Eng.: shovel), and of course a lot of other obvious relations.
I need the verbs in order to test how organizational resources (knowledge, money, stuff, which are all nominals) can be combined with verbs into tasks, e.g. "grav i jorden med skovlen" (Eng.: dig into the dirt with the shovel).
Relevant answer
Answer
I would suggest using Stanford CoreNLP to annotate your texts (corpus) with POS tags; the package provides models for several languages. Then extract the words with verb tags. Let me know if you have any questions.
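A minimal sketch using Stanza, the Stanford NLP group's Python library, which ships a Danish model (the example sentence is made up; the Java CoreNLP server would work similarly):

import stanza  # pip install stanza

stanza.download("da", verbose=False)   # fetch the Danish model once
nlp = stanza.Pipeline("da", processors="tokenize,pos", verbose=False)

doc = nlp("Han graver i jorden med skovlen.")

# Keep only the tokens tagged as verbs (UPOS 'VERB').
verbs = [w.text for s in doc.sentences for w in s.words if w.upos == "VERB"]
print(verbs)  # e.g. ['graver']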
  • asked a question related to Computational Linguistics
Question
4 answers
Actually, there are some popular tools I've been working with for a long time, but I'm interested in a specialized tool for this particular purpose.
Relevant answer
Answer
Definitely possible, but a little bit complex...
  • asked a question related to Computational Linguistics
Question
11 answers
I need a dataset of writings by people with schizophrenia for natural language processing. There is some work on social media content obtained through self-disclosure, but I want clinical data in English.
Any help will be appreciated.
Relevant answer
Answer
I need it, too.
  • asked a question related to Computational Linguistics
Question
3 answers
I was not able to find a comprehensive survey about this on the Internet. Going through some books I came across semantic nets and conceptual dependency theory but are there any others? Any web resources or survey papers would be most helpful.
Relevant answer
Answer
Pustejovsky's Generative Lexicon is also a model for lexical knowledge representation.
  • asked a question related to Computational Linguistics
Question
3 answers
What are the available benchmarks for evaluating semantic textual similarity approaches?
I am aware of the following:
- SemEval STS
- Microsoft Research Paraphrase Corpus
- Quora Question Pairs
Do you use others besides these in your research?
Relevant answer
Answer
Standard datasets include WordSim-353 (see Agirre et al. 2009), SimLex-999 (Hill et al. 2015) and Chiarello et al. (1990). If you're interested in paraphrases including longer phrases, I'd also look at the Paraphrase Database (PPDB, Ganitkevitch et al. 2013):
  • Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, and Aitor Soroa (2009), A study on similarity and relatedness using distributional and wordnet-based approaches. In: Proceedings of HLT-NAACL 2009. Boulder, CO, 19–27.
  • Christine Chiarello, Curt Burgess, Lorie Richards, and Alma Pollock (1990), Semantic and associative priming in the cerebral hemispheres: Some words do, some words don’t sometimes, some places. Brain and language 38(1):75–104.
  • Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch (2013), PPDB: The Paraphrase Database. In: Proceedings of NAACL-HLT 2013. Atlanta, GA, 758–764.
  • Felix Hill, Roi Reichart, and Anna Korhonen (2015), SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41(4).
  • asked a question related to Computational Linguistics
Question
13 answers
Apart from the PER metric, what performance metrics exist for comparing two different recognizers in speech recognition?
Relevant answer
Answer
word error rate (WER%)
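WER is the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal sketch:

def wer(reference, hypothesis):
    # Word error rate: (substitutions + deletions + insertions) / reference length.
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 0.333...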
  • asked a question related to Computational Linguistics
Question
16 answers
What are the free or open source Arabic morphological analyzers, which we can download from Internet?
Please provide the links.
Relevant answer
Answer
More recent POS taggers:
Farasa, ADAM, "Multi-dialect Arabic POS tagging: a CRF approach", and "Part-of-speech tagging for the Arabic Gulf dialect using Bi-LSTM".
  • asked a question related to Computational Linguistics
Question
4 answers
Hi,
I'm looking for accessible/online corpora and tools to help me calculate the phonetic and phonological complexity of words in German (e.g. Jakielski's Index of Phonetic Complexity (IPC) and the like), as well as any pointers to useful measures of phonological complexity that have been identified experimentally.
Thanks very much in advance!
Relevant answer
Answer
Dear Gina,
Here you have a comprehensive list of Speech analysis and transcription tools for several languages.
I hope this helps.
Kind regards,
Begoña
  • asked a question related to Computational Linguistics
Question
3 answers
I want to know about the best Arabic named entity recognition tools available and how to use them.
Thanks in advance
  • asked a question related to Computational Linguistics
Question
4 answers
Given a text, how do you extract its introduction, its development, and its conclusion? Which Computational Linguistics technique can serve to identify the beginning and end of the introduction, development, and the conclusion? Which article addresses these questions?
Relevant answer
Answer
Ian Maun, thank you very much! You helped me a lot!
  • asked a question related to Computational Linguistics
Question
3 answers
I am trying to extract data from SEC EDGAR filings; however, building parsers for each form is quite exhausting, and in addition not all the filings have the same format, even when they come from the same form, e.g. 10-K filings.
I would be grateful if someone could point me in the right direction.
Relevant answer
Answer
Hi Trinadh
You might want to think about textual analysis tools. You could start by running a simple word frequency count to get an idea of the frequency of words within the reports. You could use the Linux command line to get started: simple tools such as head, tail, wc, grep, tr, sort, sed, cut, uniq, and awk can get you a long way in a very short space of time. You could also write some simple Python scripts to carry out these tasks. Python runs on most operating systems, is simple to program, and is very fast in operation.
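The same first step in Python might look like this (a sketch; 10k.txt is a hypothetical filing saved as plain text):

import re
from collections import Counter

# Lower-case the filing, keep alphabetic tokens, count them:
# the Python equivalent of a tr | sort | uniq -c shell pipeline.
with open("10k.txt", encoding="utf-8") as f:
    tokens = re.findall(r"[a-z]+", f.read().lower())

for word, count in Counter(tokens).most_common(20):
    print("%6d %s" % (count, word))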
However, at the high end of the spectrum, you could use a tool like GATE (developed at the University of Sheffield) to perform full textual analysis of any kind of text. You can apply your own list of keywords to the text, or you can use the tool to develop a keyword list based on the contents of the files. It provides a POS tagger and many analysis tools that can help you understand the sentiment of a file's content. This is a serious tool and not recommended for amateurs, as there is a steep learning curve; you will also need a thorough understanding of Natural Language Processing to get the best out of it. I have added a link to the GATE software to let you see what it can do.
I hope that helps.
Regards
Bob
  • asked a question related to Computational Linguistics
Question
3 answers
Regards,
Relevant answer
Answer
Yes,
1. Are the phrases in passive or active voice, and what is the proportion of each?
2. What kinds of pronouns are used, in terms of level of politeness?
3. What types of phrases: simple, subjectless, complex; with or without imagery?
4. Are there any proverbial references?
5. Plain or idiomatic: what is the proportion of each?
6. Then establish these features as a defining cluster for sub-genres in songs.
thanks
  • asked a question related to Computational Linguistics
Question
3 answers
I have a dictionary whose values are matrices and whose keys are the most frequent words in the training file. For a test file, I have to check whether the words in each line are in the dictionary (the keys), get their values, add them together, and then divide by the number of words in the line that match the keys. The answer is one matrix. I tried "sum(val)", but it doesn't add the matrices together. How can I fix the code (the end part of it), which I've enclosed?
Relevant answer
Answer
You can add numpy.array objects using the '+' operator. You have to pay attention to the shape, but your arrays seem to have shape (1, 2).
So you can use something like the following code, which will return an np.array.
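A minimal sketch, assuming each word maps to a (1, 2) array (the toy dictionary below is made up):

import numpy as np

# Toy dictionary: each key maps to a (1, 2) matrix.
word_vectors = {
    "cat": np.array([[0.1, 0.4]]),
    "dog": np.array([[0.3, 0.2]]),
}

def average_line(line, vectors):
    # Collect the matrices of the words that appear in the dictionary.
    matches = [vectors[w] for w in line.split() if w in vectors]
    if not matches:
        return np.zeros((1, 2))
    # '+' on numpy arrays is elementwise, so sum() works on a list of arrays.
    return sum(matches) / len(matches)

print(average_line("the cat saw the dog", word_vectors))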
  • asked a question related to Computational Linguistics
Question
3 answers
I need English proofreading for my Arabic computational linguistics research: about 80 pages of English text with Arabic linguistic terms. This is a paid job, not for free.
Relevant answer
Answer
I am willing if you provide me with more info. 
  • asked a question related to Computational Linguistics
Question
7 answers
Thanks.
  • asked a question related to Computational Linguistics
Question
7 answers
I want a probabilistic way to model the affix distribution. Does anyone know an algorithm or technique for achieving this?
  • asked a question related to Computational Linguistics
Question
8 answers
Hi, I'm doing a multidimensional analysis following the work of Douglas Biber, on two corpora (one learner data, one professional texts).  I have the following dimensions following exploratory factor analysis, but am having trouble working out how to define and characterise these dimensions according to function (e.g. involved vs. informational discourse, context (in)dependent discourse, etc.). 
Here are the 5 dimensions.  In EACH CASE, the z-scores are HIGHER in the learner texts than the professional texts except where a * is seen after the linguistic feature.
1) 
VBD – Past tense verbs
PRT – Present tense verbs*
2) 
NN - Other nouns not nominalisations or gerunds*
NOMZ – Nominalisation
POMD – Possibility modals
VB – Verbs in base form
TO1 – Infinitives
3) 
JJ – Adjectives*
PRMD – Predictive Modals
4) 
PIN – Total prepositional phrases
DT – Determiners
VBN – Verbs with past participle
FPP1 – First person pronouns
5) 
SPP2 – Second person pronouns
QUPR – Quantifier pronouns
TPP3 – Third person pronouns
IN – Preposition or subordinate conjunction.
I hope that anyone who has done their own MDA might want to provide some pointers here.  Many thanks in advance!
Relevant answer
Answer
Interesting point regarding the CFA point above, quite a few linguistics papers that have done MDA via EFA have NOT then gone on to do CFA to confirm the model, e.g. Hardy & Romer (2013), http://uteroemer.weebly.com/uploads/5/5/7/7/5577406/hardy_and_roemer_2013_cor2e20132e0040.pdf.  
That said, I did a PCFA myself on the data, and got quite poor normed-fit and comparative fit indices, although RMSEA was appropriate.  It was interesting to attempt something like this, however, so thanks for the suggestion.
  • asked a question related to Computational Linguistics
Question
3 answers
Lemma lists represent a necessary tool in NLP. Despite lengthy investigation, I could not locate an Arabic lemma list that would be freely available, and the complexity of Arabic inflections means that the creation of one from scratch is no easy task and should only be undertaken once it is ascertained that none is already available.
Relevant answer
Answer
This is certainly not an exhaustive solution, but you can get a very substantial list of inflected forms and their lemmas from the Arabic Universal Dependency Treebank:
The Treebank contains about 242K lemmatized tokens, so I think those could be collapsed into a rudimentary lemma list covering quite a lot. The license is CC-BY-NC-SA, as shown here:
  • asked a question related to Computational Linguistics
Question
4 answers
I am looking for a corpus containing documents for extractive summarization. The sentences of the documents should be labelled as "positive" if that sentence is included in the summary, "negative" otherwise. The sentences will be fed as training data for the summarizer I am currently working with.
  • asked a question related to Computational Linguistics
Question
3 answers
I'm writing my Master's thesis about mobile learning, and I'm lost in the terminology.
We are developing a mobile application for practicing Spanish conjugation. The system is not really socially oriented, since it is more of a behavioral activity in which the user writes the answer and the device gives feedback. Do you think it can still be considered MALL (mobile-assisted language learning)?
Thank you in advance.
Teresa
Relevant answer
Answer
MALL is usually defined as language learning that is supported through mobile devices or where the learner is mobile. It is not defined by the type of language-learning activity that learners perform through the mobile device/application. It's worth keeping in mind that anything that can be called MALL could also be called CALL since mobile devices are a type of computing device.
Here are some references which should give you a sense of the types of things that fall into the MALL category.
Kukulska-Hulme, A., & Shield, L. (2008). An Overview of Mobile Assisted Language Learning: from Content Delivery to Supported Collaboration and Interaction. ReCALL, 20(3), 271–289. http://doi.org/10.1017/S0958344008000335
Cui, Y., & Bull, S. (2005). Context and learner modelling for the mobile foreign language learner. System, 33(2), 353–367. http://doi.org/10.1016/j.system.2004.12.008
Hockly, N. (2013). Designer Learning: The Teacher as Designer of Mobile-based Classroom Learning Experiences (pp. 1–12). Monterey, CA, USA: The International Research Foundation for English Language Education (TIRF). Retrieved from http://www.tirfonline.org/english-in-the-workforce/mobile-assisted-language-learning/designer-learning-the-teacher-as-designer-of-mobile-based-classroom-learning-experiences/
Godwin-Jones, R. (2011). Emerging Technologies Mobile Apps for Language Learning. Language Learning & Technology (LLT), 15(2), 2–11.
Chinnery, G. M. (2006). Going to the MALL: Mobile Assisted Language Learning. Language Learning & Technology, 10(1), 9–16.
Palalas, A. (2011). Mobile-Assisted Language Learning: Designing for Your Students. In S. Thouësny & L. Bradley (Eds.), Second language teaching and learning with technology: views of emergent researchers (pp. 71–94). Dublin: Research-publishing.net.
Burston, J. (2014). The Reality of MALL: Still on the Fringes. CALICO Journal, 31(1), 103–125. http://doi.org/10.11139/cj.31.1.
Demmans Epp, C. (2015). Mobile adaptive communication support for vocabulary acquisition. Journal of Learning Analytics, 1(3), 173–175.
Demmans Epp, C. (2016). Supporting English Language Learners with an Adaptive Mobile Application (Doctoral). University of Toronto, Toronto, ON, Canada. Retrieved from http://hdl.handle.net/1807/71720
  • asked a question related to Computational Linguistics
Question
3 answers
Dear researchers,
I am looking for a stylometry dataset using all or some lexical, syntactic, and structural features, in the form of a CSV, ARFF, or DB file.
I would really appreciate it if you could provide me with part of your dataset or suggest a link to one.
Relevant answer
Answer
Please consider the datasets I mention in my PhD and related documents.
  • asked a question related to Computational Linguistics
Question
4 answers
I intend to read about the criticism leveled at divergence-time estimation of languages based on both lexicostatistics (glottochronology) and methods of comparative linguistics such as maximum parsimony. Could you point me to some critical papers?
Relevant answer
Answer
Dear Colleague,
The literature is vast. If you wish to focus on criticism, articles by Don Ringe (with various coauthors) may be useful. On a different note, the book "Historical Linguistics and Lexicostatistics"  ed. by V.V. Shevoroshkin and Paul Sidwell (Melbourne, 1999) could provide you with food for thought (especially the chapters by Sergei Starostin, Harald Sverdrup).
I attach a paper which contains relevant details of the maths applied in the revised version of glottochronology (my co-author, V. Blazek is the mathematician :-)).
Hope this helps.
Best regards,
Irén H.
  • asked a question related to Computational Linguistics
Question
3 answers
I wish to find all ditransitive constructions in a corpus like BNC or COCA, e.g. "verb + noun + noun" and "verb + noun + preposition + noun", so that I can see which words can be used in a ditransitive construction.
Relevant answer
Answer
Hi Weichang,
If you're using a CQP based interface like CQPWeb for the BNC, you can use a part of speech regular expression to get a rough approximation of NPs. Using the CLAWS tags in the BNC, you could find two lexical NPs after a lexical non-gerund verb like this in CQP:
[pos="VV[^G]?"][pos="AT.*"]?([pos="R.*"]?[pos="J.*"])* [pos="N.*"][pos="(AT|J).*"] ([pos="R.*"]?[pos="J.*"])* [pos="N.*"]
The main trick is avoiding N+N compounds, which can look like two consecutive NPs. Then you still have to filter out adverbial NPs, such as "eat [lunch] [every day]", but you can sample a subset to establish how frequent those are in your data. Either way, you won't get a perfect result, but a pretty good one.
If you have COCA in a CQP interface you could do something similar, depending on whether you're using COCA's tagset or re-tag it using some other tagset (Penn or CLAWS, for example).
Hope this helps,
Amir
  • asked a question related to Computational Linguistics
Question
11 answers
I did my master's research in sentiment mining, working on multi-aspect sentiment analysis. I want to continue working in this area; can you help me choose a direction?
Relevant answer
Answer
Nowadays there is quite a big trend toward research on deep learning for NLP and sentiment mining; you can follow the research work of Stanford to get a better understanding and continue your work.
  • asked a question related to Computational Linguistics
Question
63 answers
Do the formal languages of logic share so many properties with natural languages that it would be nonsense to separate them in more advanced investigations or, on the contrary, are formal languages a sort of ‘crystalline form’ of natural languages so that any further logical investigation into their structure is useless? On the other hand, is it true that humans think in natural languages or rather in a kind of internal ‘language’ (code)? In either of these cases, is it possible to model the processing of natural language information using formal languages or is such modelling useless and we should instead wait until the plausible internal ‘language’ (code) is confirmed and its nature revealed?
The above questions concern therefore the following possibly triangular relationship: (1) formal (symbolic) language vs. natural language, (2) natural language vs. internal ‘language’ (code) and (3) internal ‘language’ (code) vs. formal (symbolic) language. There are different opinions regarding these questions. Let me quote three of them: (1) for some linguists, for whom “language is thought”, there should probably be no room for the hypothesis of two different languages such as the internal ‘language’ (code) and the natural language, (2) for some logicians, natural languages are, in fact, “as formal languages”, (3) for some neurologists, there should exist a “code” in the human brain but we do not yet know what its nature is.
Relevant answer
Answer
Dear André,
do you remember some of the 2/3 of your thoughts which you thought in internal language? How do you do that?
Can you rethink them and discuss them internally? How do you do that?
Are among those thoughts some which cannot come into mind like a picture (because they are in some way more abstract)? How do you remember them, rethink them, and discuss them internally?
  • asked a question related to Computational Linguistics
Question
3 answers
I need a tool/dictionary/algorithm that, when given an Arabic verb, returns the noun of that verb, and vice versa.
Relevant answer
Answer
Dear Hamzah,
To my knowledge, there is no such tool for this task in Arabic. I don't know whether such a tool exists for other languages either.
As you know, there is no standard form for the nouns. You could, as a first stage, extract the nouns automatically, and then, as a second stage, select the right nouns by implementing a rule-based system. Arabic grammar experts can help you efficiently at this stage.
  • asked a question related to Computational Linguistics
Question
4 answers
Hi all
Is there any database or program that links Arabic nouns to their derivatives (for example, linking "اسخراج" to "استخرج")?
I do not need a root extractor, i.e. linking اسخراج to خرج.
Relevant answer
Answer
Is there any database or program which links arabic nouns to their derivatives (for example linking "اسخراج" to"استخرج")?
I remember there was one back several years ago. The database requires that the radicals of the basic measure verb be entered in a slot. The outcome includes the (required) derivation(s). I had the link and lost it. To answer your question, yes, there is, well, was. If you conduct a search, you may find it.
  • asked a question related to Computational Linguistics
Question
4 answers
I have seen these two terms used interchangeably in the literature. I am wondering what the main distinguishing factors between these two systems are.
Relevant answer
Answer
In a nutshell, an LMS focuses on how a student learns, while a CMS focuses on how a course is delivered and implemented for learning.
  • asked a question related to Computational Linguistics
Question
6 answers
Please refer me to any website/paper/tutorial/link where text mining analysis is used and conclusions are drawn from it.
Relevant answer
Answer
You can use the WEKA software. See the link below:
  • asked a question related to Computational Linguistics
Question
6 answers
Are there any corpora of "easy to read" medical text freely available?
Relevant answer
Answer
This presentation about Text Mining might be relevant for you:
  • asked a question related to Computational Linguistics
Question
5 answers
I have been searching for a while for a freely available and reliable term extraction tool for Arabic (especially for single-word terms). Any suggestions will be highly appreciated.
Relevant answer
Answer
I've started working with AntConc only recently; it helps you view the files. However, it is not very good for extracting words or searching for collocations. Laurence Anthony (the designer) promised to develop a new version (4.0) that can help you process Arabic texts without messing things up. Meanwhile, I'm working with WordSmith, which unfortunately is not free. All the best!
  • asked a question related to Computational Linguistics
Question
7 answers
Automatic indexing: given a text document, extract terms that describe the main topics or concepts covered in it. This task is done before inverted-index construction in the development of information retrieval systems. Terms may be keywords, keyphrases, noun phrases, noun groups, entities, etc.
  • asked a question related to Computational Linguistics
Question
15 answers
I have asked this in the statistics area as well. I am interested in identifying the probabilistic and statistical distributions of Mandarin tones [either in general or in specific corpora].
I have developed some very general data, e.g. Tone 1 occurs around 18% of the time, Tones 2 and 3 slightly more often than Tone 1, Tone 4 occurs more than 40% of the time, and the neutral tone is relatively rare. But I'd like to obtain more detailed data, and also theories on how experts view tones in terms of probability [if this can even be done]. Would Bayesian probabilities not be appropriate?
Relevant answer
Answer
Stephen, that's really cool stuff! Thanks. 
  • asked a question related to Computational Linguistics
Question
4 answers
A sentence in a tonal language carries critical lexical information in its tones, whereas a nontonal language such as English does not. What might be a way of developing useful statistics that measure and show this difference? In other words, how much of the information content is in the tones?
  • E.g., let us say English is 100% nontonal and Mandarin can be shown to be 60% nontonal and 40% tonal [I do not really know what the statistics would be].
  • asked a question related to Computational Linguistics
Question
17 answers
Are there studies that identify the possible probabilistic and statistical distributions of Mandarin tones [either in general or in specific corpora]? I have developed some very general data, e.g. Tone 1 occurs around 18% of the time, Tones 2 and 3 slightly more often than Tone 1, Tone 4 occurs more than 40% of the time, and the neutral tone is relatively rare. But I'd like to obtain more detailed data, and also theories on how experts view tones in terms of probability [if this can even be done]. Would Bayesian probabilities not be appropriate?
Relevant answer
Answer
Usually sandhi is regarded as a generalized way to handle the joint between certain phonemes.  There may be many rules, but they all fall into the category of what happens when one word ends with one phoneme (in the case of tone, it seems to be an aspect of the ending phoneme) and the next begins with another.  So these examples just fall into cases handled by the system of sandhi. 
  • asked a question related to Computational Linguistics
Question
3 answers
I am conducting research to create indexes that will contain names and other keywords. My source texts are written in Greek polytonic characters. I think it would be very useful to find a way to make them editable and searchable. Furthermore, in order to summarize and classify the information mined, I believe software with a stylometry function is needed. For these reasons I am looking for: a) OCR software, b) stylometry software.
Any kind of help will be greatly appreciated! Thank you!
Relevant answer
Answer
Hi, are you aware of the following open-source system?
I don't have any personal experience with the program, but it seems it might be worth a try.
  • asked a question related to Computational Linguistics
Question
3 answers
Hello, my research is about sign language recognition. Many researchers choose the sign as the base unit of modeling, while others attempt to use a structure similar to phonemes to create models. What is the better approach for modeling the sign?
Relevant answer
Answer
Hand over hand and side by side so they can see the perspective!
  • asked a question related to Computational Linguistics
Question
3 answers
Until now, the traditional N-gram approach has been used for word prediction systems.
Relevant answer
Answer
You need a tagged corpus to train your N-grams and build a language model, and then use the language model as a discriminative resource for word prediction.
You do not necessarily need a treebank, in the sense that dependencies are not a must. What you need is part-of-speech, lemma, and morpho-syntactic information. With this information you can boost prediction precision. We implemented this method for Italian; please see the papers on my page.
For inflected languages, syntactic N-grams can provide a significant gain. In our experiments we reported up to 30% improvement in KS (keystroke saving).
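To make the basic idea concrete, here is a minimal word-level bigram model in Python (toy corpus; a real system would add the POS/lemma features and smoothing described above):

from collections import Counter, defaultdict

# Toy pre-tokenized corpus standing in for a real tagged one.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]

bigrams = defaultdict(Counter)
for sentence in corpus:
    for prev, nxt in zip(sentence, sentence[1:]):
        bigrams[prev][nxt] += 1

def predict(prev_word, k=3):
    # Rank candidate continuations by relative frequency.
    counts = bigrams[prev_word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]

print(predict("the"))  # e.g. [('cat', 0.67), ('dog', 0.33)]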
  • asked a question related to Computational Linguistics
Question
5 answers
I need a hint on how to choose features from a connected alphabet. If you are not familiar with this language, just think of English handwriting, where every letter in a word is connected.
Relevant answer
Answer
Here is a recent paper.
I hope it will help you.
Good luck
  • asked a question related to Computational Linguistics
Question
6 answers
Has it been calculated mathematically or logically?
Relevant answer
Answer
Greetings
I'm not sure what you mean by "innate". Do you mean "Do UG's claims actually match how people really process language?". To the best of my knowledge, there is no easy way to assess that: I'm not sure one could devise an experiment to validate the UG hypothesis as a whole. Maybe proving or disproving certain claims.
If you mean "Is Generative Grammar a consistent theory?", now that's another story. Even if one could prove it's consistent, it wouldn't necessarily mean it's true, i.e. it successfully accounts for language emergence, evolution, variation, acquisition, etc. in every aspect. If you look at other scientific fields, it seems to me that quantum mechanics AND relativity theory are both consistent models of some aspects of reality. Yet, they are fundamentally irreconcilable. Unless you start unifying them under an overarching model (string theory).
So it all boils down to personal belief, I think, hence the fierce arguments these questions generally spark among linguists.
  • asked a question related to Computational Linguistics
Question
4 answers
Which Neural Network techniques are used in computational linguistic applications?
Relevant answer
Answer
This will depend on the type of features you use for the computational linguistics application. You should know whether the features are dependent or independent, linearly separable or not, etc. After determining the nature of the linguistic features, you should look for the neural network that suits your training data.
  • asked a question related to Computational Linguistics
Question
6 answers
Thanks in advance for your replies.
Relevant answer
Answer
thanks for your answers 
  • asked a question related to Computational Linguistics
Question
6 answers
I am searching for a way to interpret logical representations of natural language, more or less as in latent semantics.
Relevant answer
Answer
Well, yes, I know. Quite a few people have tried to solve the problem, but to my knowledge nobody has yet come up with a clear-cut model of the interface between natural language semantics and one or more kinds of logic. And I'm not quite happy with the metaphorical use of expressions like 'extract' and 'knowledge' in this context.
  • asked a question related to Computational Linguistics
Question
3 answers
The corpus must contain documents (texts) with keywords hand-annotated by human experts.
Relevant answer
Answer
In the European project MANTRA we developed a parallel corpus based on EMEA and MedLine. It comprises a large silver-standard corpus (tagged by different systems using a majority vote) and a manually crafted 550-document gold corpus (EMEA only) used to calculate the precision and recall of the silver-standard corpus. The link for the downloads is https://sites.google.com/site/mantraeu/project-output and project information can be found at https://sites.google.com/site/mantraeu/
  • asked a question related to Computational Linguistics
Question
3 answers
Is there a need to build a new standard corpus for Arabic Information Retrieval? Is it possible in the current state of the art?
Relevant answer
Answer
Yes, there is always a need for such resources. Feasibility can depend on the environment, but languages written from right to left need such resources in their entirety.
  • asked a question related to Computational Linguistics
Question
3 answers
I am trying to find the best method for information extraction in distant languages ('distant' meaning less similar to each other). Korean, Japanese, and Myanmar have similar sentence structures. Finally, I want to present a brief summary of the extracted information in the Myanmar language. Thanks a lot for your attention.
Relevant answer
Answer
For IE, to my knowledge, CRF is the best method for extracting snippets of information. However, you need training data.
Could you describe your problem in more detail? What information do you want to extract?
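As a concrete starting point, a minimal CRF tagger with the sklearn-crfsuite package (toy sentences and features; real IE systems use much richer feature templates):

import sklearn_crfsuite  # pip install sklearn-crfsuite

# Each token is a dict of features; each sentence is a list of tokens.
def features(sent, i):
    return {"word": sent[i], "is_title": sent[i].istitle(),
            "prev": sent[i - 1] if i else "<S>"}

train_sents = [["John", "lives", "in", "Yangon"]]
train_labels = [["B-PER", "O", "O", "B-LOC"]]

X = [[features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)

test = ["Mary", "visits", "Mandalay"]
print(crf.predict([[features(test, i) for i in range(len(test))]]))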
  • asked a question related to Computational Linguistics
Question
4 answers
Extracting causal relationships from texts is far from trivial, but there are quite a few intriguing pieces in the recent literature that discuss how this could be done, e.g. http://www.hindawi.com/journals/tswj/2014/650147/. The 'technology readiness level' of this work seems significantly below that of things like entity, sentiment, or event extraction, but at least some progress seems to have been made.
Given the availability of so many large full-text academic databases, it would of course be fantastic to be able to 'extract' all of the causal hypotheses that have been formulated over the years in various disciplines. So does anybody know of any existing text-mining tools that can already do this, even if it's just for English?
Relevant answer
Answer
Our coding software (Profiler Plus) is commercial, but we have been working on a coding scheme to extract propositional data (such as causality) from text. It is still very much a work in progress. However, if you are interested in a collaborative project, I'm interested in working on concrete applications.
  • asked a question related to Computational Linguistics
Question
5 answers
Textmining tools are becoming ever more useful, but it remains difficult to find good tools for CJK languages. If anybody knows of good tools for - especially - Chinese, I'd be grateful for a link. 
Relevant answer
Answer
I haven't used text-mining tools for the Chinese language so far, but I found some information about two of them; maybe it will be useful:
The Stanford Word Segmenter supports Chinese:
There are some plugins to GATE for processing Chinese:
  • asked a question related to Computational Linguistics
Question
10 answers
For most of my projects I use R to manage my big data and firing statistical analyses on the results. My domain of research is linguistics (computational linguistics, corpus linguistics, variational linguistics) and in this case I'm concerned with big heaps of corpus data. However, R feels sluggishly slow - and my system isn't the bottleneck: when I take a look at the task manager, R only uses 70% of one CPU core (4 cores available) and 1Gb of RAM (8Gb available). It doesn't seem to use the resources available. I have taken a look at multicore packages for R, but they seem highly restricted to some functions and I don't think I'd benefit from them. Thus, I'm looking for another tool.
I am looking at Python. Together with Pandas it ought to be able to replace R. It should be able to crunch data (extract data from files, transform into data table, mutate, a lot of find-and-replace and so on) *and* to make statistical analyses based on the resulting final (crunched) data.
Are there any other tools that are worth checking out to make the life of a corpus linguist easier? Note that I am looking for an all-in-one package: data crunching and data analysis.
Relevant answer
Answer
Hi Bram,
I'm an experienced Python user but don't really know much about that specific scientific discipline. In any case, there is a Python library (the Natural Language Toolkit) that should be able to help you a lot:
Besides this, for numpy-based data (arrays), you may want to look at (besides pandas) scikit-learn for machine learning algorithms:
Matplotlib for plotting:
http://matplotlib.org/ (although there are plenty of others if you need more performance in plots)
I'm not sure what formats your raw data comes in, but there are a lot of libraries for dealing with CSV and Excel formats, among others...
I would advise you to install the 64-bit version of everything. Python has official 64-bit releases, but some libraries don't, so this link might help you in that regard:
I would also advise you to install a more complete package set instead of one library at a time. WinPython (for Windows) has an excellent installation manager (besides already coming with a lot of stuff):
Anaconda is another (64-bit) distribution, also good for Mac or Linux:
Hope it helps.
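To give a flavor of the NLTK + pandas combination for corpus crunching, a minimal sketch (corpus.txt is a hypothetical input file):

import nltk
import pandas as pd
from nltk import FreqDist, word_tokenize

# Tokenizer models for word_tokenize (newer NLTK may also need "punkt_tab").
nltk.download("punkt", quiet=True)

with open("corpus.txt", encoding="utf-8") as f:
    tokens = word_tokenize(f.read().lower())

# Load the frequency counts into a DataFrame for further analysis.
freq = FreqDist(tokens)
df = pd.DataFrame(freq.most_common(), columns=["token", "count"])
df["rel_freq"] = df["count"] / df["count"].sum()
print(df.head(10))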
  • asked a question related to Computational Linguistics
Question
3 answers
Hello, I am a student working on an NLP project. Given the top-ranked words under each topic number obtained from LDA (Latent Dirichlet Allocation), I am trying to assign topic names to each topic using Wikipedia as a knowledge base.
The Wikipedia category graph contains links between categories that are somehow related but do not form a hierarchical structure. From this graph I removed the non-hierarchical links to get a DAG (directed acyclic graph); as a consequence, a given category can have one or more parents. After this I applied a BFS-like algorithm to get a taxonomy, but this misses relevant hierarchical links.
Is there any factor I should consider to get a more accurate and meaningful taxonomy?
Thank you in advance 
Relevant answer
Answer
There is some work on taxonomy extraction from Wikipedia (identifying taxonomic links in the Wikipedia category graph).
You might want to check our paper
García, Renato Domínguez, et al. "Automatic taxonomy extraction in different languages using wikipedia and minimal language-specific information." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2012. 42-53.
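On the LDA side of that pipeline, the top-ranked words per topic, which are the input to such a labelling step, can be obtained with gensim like this (toy documents stand in for a real corpus):

from gensim import corpora, models

docs = [["cat", "dog", "pet"], ["stock", "market", "trade"], ["dog", "pet", "food"]]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

for topic_id in range(lda.num_topics):
    # show_topic returns (word, probability) pairs: the label candidates.
    print(topic_id, lda.show_topic(topic_id, topn=5))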
  • asked a question related to Computational Linguistics
Question
5 answers
I implemented a study using a pseudoword as a prime and real words as targets. When looking for relevant literature quite a while ago, I found nothing. Now I have the results and found a clear N400 component and quite a strong P600 effect. I would really like to be able to cite some similar work, but so far I just haven't found anything. References about word pairs and the N400 and P600 would also do.
Thanks
Relevant answer
Answer
Hi! This is not my field, but it seems to me that this ERP study uses pseudowords as primes and figures as targets and also finds an N400 component: Kovic, V., Plunkett, K. and Westermann, G. (2010). The shape of words in the brain. Cognition 114, 19–28.
  • asked a question related to Computational Linguistics
Question
4 answers
What is the best method for extracting information in distant languages (e.g. Myanmar, Japanese, Korean)?
How can the information in these languages be summarized?
Thanks...
Relevant answer
Answer
You are probably looking for similarities among languages that may be related. Then compile a list of common words for each language: mother, father, brother, sister, man, woman, child, house, rice, sun, to go, etc.
  • asked a question related to Computational Linguistics
Question
5 answers
I want to know whether there are systems that generate programs from natural language descriptions, and what this is called in scientific journals. Broadly speaking, I mean something like the following: we describe, e.g., a web app in natural words, like "I need a catalog web app, with users and admins; users register on the site and an admin approves them", etc. The system then uses components from structured frameworks (RoR or Django) and creates the program by itself. Is there any research in this field?
Relevant answer
Answer
You can look at Ellen Riloff's work:
There, find "NaturalJava: A Natural Language Interface for Programming in Java".
While it is dated, you can use it and Joachim's reference to get a head start and maybe find some other references.
  • asked a question related to Computational Linguistics
Question
4 answers
I'm looking more specifically for studies that used Word Sketch to analyze collocations in/across different academic disciplines.
Relevant answer
Answer
Maybe Ken Hyland's work on academic discourse, namely his work on certain verbs in research articles.
  • asked a question related to Computational Linguistics
Question
4 answers
What is the most suitable tool for sequence classification/mining with the following features: language modeling, HMM, CRF, and multi-attribute nodes?
Relevant answer
Answer
For the HMM approach, you can use HTK (the Hidden Markov Model Toolkit). HTK can do automatic segmentation and labeling.
  • asked a question related to Computational Linguistics
Question
3 answers
Are there any free Arabic morphologically tagged corpora?
Relevant answer
Answer
Hi Ibrahim,
I don't believe so. Please check out the following paper for the issues involved in developing such corpora: http://archimedes.fas.harvard.edu/mdh/arabic/NAACL.pdf.
  • asked a question related to Computational Linguistics
Question
3 answers
Can anyone help me by suggesting papers on hybrid approaches to NER (Named Entity Recognition), and can anyone suggest useful techniques to use?
  • asked a question related to Computational Linguistics
Question
16 answers
I am searching for a study that examined the number of annotators needed to create a reliable corpus for evaluating a text classification task.
Snow et al. [1] argue that on average 4 non-expert raters are required for annotation tasks, but the tasks described are not classification tasks (only the study on affectual data might be considered a classification task). I'm rather searching for a statement about topic-based classification.
Often, three annotators are used and a majority vote is taken, but without real evidence that this is a sufficient number...
Thank you very much in advance for your answers!
[1] Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08). Association for Computational Linguistics, Stroudsburg, PA, USA, 254-263. 
Relevant answer
Answer
Sebastian, you are talking about a different problem. The task of defining the tag set is vitally important and usually done badly. Multiple experts are valuable in creating those definitions, but they don't all have to do the annotating. In fact, in one of our studies we showed that the experts were more consistent among themselves but missed more of the peripheral examples expressed in non-standard language: higher precision, but lower recall. We use linguists for doing annotations, as they are much more aware of the nuances of language and more conscientious about the accuracy of the tagging. I would always advise against using domain experts to do the annotating, as they don't have the language analysis tools. Their role is to specify the task and the tag definitions and to arbitrate difficult cases; it is the linguists who refine the tag definitions after the experts have created them.
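On the measurement side, pairwise agreement between two annotators can be checked with Cohen's kappa via scikit-learn (for more than two annotators, Fleiss' kappa or Krippendorff's alpha are the usual choices); a minimal sketch with toy labels:

from sklearn.metrics import cohen_kappa_score

# Toy labels from two annotators over the same ten documents.
rater_a = ["pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "pos", "pos"]
rater_b = ["pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "pos"]

# Chance-corrected agreement; values above roughly 0.8 are usually
# taken to indicate a reliably annotated corpus.
print(cohen_kappa_score(rater_a, rater_b))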
  • asked a question related to Computational Linguistics
Question
6 answers
I'm reading about context-free grammars, and I understand how to eliminate left recursion, but I could not find out what the problem with left recursion actually is. Can anyone explain?
Thanks in advance
Relevant answer
Answer
Dear Ahmed
the problem with left recursion, from a computational linguistics point of view, is that it leads to infinite recursion, as mentioned in other posts. And, sadly, linguists do tend to write an awful lot of such rules, as the example below shows (a very naive DCG grammar for English relative clauses). If you 'consult' this grammar with SWI-Prolog, all will apparently run smoothly, because SWI-Prolog seems at first to deal with such recursive rules appropriately. If you submit the goal "s([the,man,that,he,knows,sleeps],[]).", you'll get "true" as an answer. But if you ask SWI-Prolog to search for more results (";"), then you'll get an "Out of local stack" error because of the left recursion.
The general strategy is "transform your left-recursive rules into right-recursive ones". It means you must tweak your grammar to eliminate such left-recursive rules and transform them into right-recursive ones, with the help of an intermediate non-terminal (cf. for eg. http://web.cs.wpi.edu/~kal/PLT/PLT4.1.2.html). 
From an algorithmic point of view, different approaches have been published, in order to deal with such left-recursive rules (as said earlier, this is how linguists spontaneously write formal grammars). If you're looking for algorithms, you can have a look at Bob Moore's paper http://research.microsoft.com/pubs/68869/naacl2k-proc-rev.pdf.
s --> np, vp.
np --> det, n.
np --> np, relc.%this is a left-recursive rule
relc --> pror, pro, vt.
vp --> vi.
vp --> vt, np.
det --> [the].
n --> [man].
pro --> [he].
pror --> [that].
vt --> [knows].
vi --> [sleeps].
  • asked a question related to Computational Linguistics
Question
3 answers
I have considered 3 datasets and 4 classifiers, and used the Weka Experimenter to run all the classifiers on the 3 datasets in one go.
When I analyze the results, considering, say, classifier (1) as the base classifier, the results that I see are:
Dataset          | (1) functions.Linea | (2) functions.SM  | (3) meta.Additiv  | (4) meta.Additiv
-----------------|---------------------|-------------------|-------------------|------------------
'err_all'  (100) | 65.53 (9.84)        | 66.14 (9.63)      | 65.53 (9.84) *    | 66.14 (9.63)
'err_less' (100) | 55.24 (12.54)       | 62.07 (18.12) v   | 55.24 (12.54) v   | 62.08 (18.11) v
'err_more' (100) | 73.17 (20.13)       | 76.47 (16.01)     | 73.17 (20.13) *   | 76.47 (16.02)
-----------------|---------------------|-------------------|-------------------|------------------
(v/ /*)          |                     | (1/2/0)           | (1/0/2)           | (1/2/0)
As far as I know:
v - indicates that the result is significantly more/better than base classifier
* -  indicates that the result is significantly less/worse than base classifier
Running multiple classifiers on a single dataset is easy to interpret, but now, with multiple datasets, I am not able to work out which is better or worse, as the values indicated do not seem to match the interpretation.
Can someone please help interpret the above result, as I wish to find which classifier performs best and on which dataset.
Also, what does the (100) next to each dataset indicate?
'err_all' (100), 'err_less' (100), 'err_more' (100)
Relevant answer
Answer
The methodology for comparing n machine learning (ML) methods over m data sets is described in (Demšar, 2006). Please see the link.
I show you an example in the attached document. As for the (100) after each dataset name: it is typically the number of result runs the averages are computed over (e.g. 10 repetitions of 10-fold cross-validation).
  • asked a question related to Computational Linguistics
Question
22 answers
I'm just curious about what other Natural Language Processing instructors might be using as introductory NLP packages, esp. to non-experienced programmers. Any thoughts? I'd much appreciate your advice and commentary on level of difficulty, effectiveness, etc.
VR
Relevant answer
Answer
Apart from NLTK and Pattern, the only other Python NLP library that I've come across is TextBlob ( http://textblob.readthedocs.org/en/dev/ ):
"TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more."
I've not used it extensively, so I can't comment on whether or not it would be a good alternative to pure NLTK (TextBlob is apparently built on top of both NLTK and Pattern). It does seem to attempt to simplify the code for certain aspects of text processing such as part-of-speech tagging, so it may suit your teaching needs as an introductory tool.
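A minimal TextBlob session (after pip install textblob and python -m textblob.download_corpora; the sample sentence is arbitrary):

from textblob import TextBlob

blob = TextBlob("TextBlob makes basic NLP tasks feel almost trivial.")

print(blob.tags)          # part-of-speech tags, e.g. [('TextBlob', 'NNP'), ...]
print(blob.noun_phrases)  # noun-phrase extraction
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)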
  • asked a question related to Computational Linguistics
Question
3 answers
I would like to know how big a corpus can be built using LingSync, and for what goals. I would also like to know to what extent such a corpus can be converted to a stand-alone online corpus.
Relevant answer
Answer
It is free; I just don't know how versatile it is or what the alternatives are.
  • asked a question related to Computational Linguistics
Question
6 answers
I am doing my final year project on "Classification of Tonal and Non-Tonal Languages" using neural networks. The system takes pitch contour and energy as parameters. Using only the pitch contour as a parameter yields an accuracy of 66%, whereas adding short-term energy increases it to above 80%.
Much of the standard literature also considers energy a characteristic feature of the language, but provides no explanation.
Relevant answer
Answer
I studied the relationship between the pitch and energy contours in my PhD and summarized some of the work in the following publication. Whilst it is not directly related to tonal languages, there is a comprehensive analysis and discussion of the relationship between the two measurements.
The Relationship between Prosody and Breathing in Spontaneous Discourse
Hird, Kathryn ; Kirsner, Kim
Brain and Language, 2002, Vol.80(3), pp.536-555 [Peer Reviewed Journal]
Digital Thesis available from UWA
Are cognitive processes involved in the control of declination in spontaneous speech?
Kathryn Marie Hird University of Western Australia. Dept. of Psychology. 1994
 
Available at Reid Library  Store  (Thesis 1994 HIR)
  • asked a question related to Computational Linguistics
Question
4 answers
N-grams are pretty suitable for NLP and other sequential data. I am wondering whether anybody is using n-grams for software engineering research. Please share your experience.
Relevant answer
Answer
The group of Martin Vechev is using language models (and other NLP techniques) in program analysis and synthesis, e.g.:
  • asked a question related to Computational Linguistics
Question
5 answers
What model or models should be used for speaker recognition under automatic speech recognition (ASR)?
Relevant answer
Answer
Hello,
Just try a combination of HMM and GMM.
  • asked a question related to Computational Linguistics
Question
3 answers
Other than finding open datasets and privacy issues, are there any other challenges that might be faced by sentiment analysis applications in smart city contexts?
Relevant answer
Answer
Hi. Sentiment analysis itself faces technical challenges in any context, not just smart cities. The general challenges are domain dependency, context dependency, sarcasm, spam, ambivalence, and implicit opinions (or sentiment). These are the challenges that have to be considered in the design and implementation of any sentiment analysis application.
  • asked a question related to Computational Linguistics
Question
6 answers
I am able to access the transcripts but I am unable to access the audio files even on free online corpora webpages. Could anyone tell me how to access both transcripts as well as audio files together?
Relevant answer
Answer
Sir, you can write to John M. Swales, who was instrumental in developing MICASE; he responds to queries. Generally we get access to transcripts only; the audio databases are not shared. There is also Dr. Claudia from Dresden, Germany, who collected a lot of samples from Indian users of English. Her contact is also useful.