Text Linguistics - Science topic
Questions related to Text Linguistics
Text linguistic contributions to the development of translation studies
I think that text summaries can be considered a separate text genre, because summaries have unique stylistic features: a change of narrator, properties such as length and brevity, the deletion of detailed information, conjunctions that signal logical connections, discourse markers, and so on. More importantly, the communicative purpose of the discourse, which determines the genre of a text, changes: the communicative purpose of a narrative text is not the same as the purpose of its summary. In some studies the summary is treated as a genre of academic text. I am dealing with genre here in the context of a text schema. Although van Dijk claims that a summary expresses the macroproposition of the text, the fact that the topic or content does not change, and that the summary sticks to the original text, does not make it a dependent text; rather, it uses distinctive linguistic markers of its own. If only the topic or content of a text mattered, there would be no art or literature.
Do you know any aphorisms, old sayings, parables, folk proverbs, etc. on science, wisdom and knowledge?
Please quote them.
Best wishes

Could you please tell me what the best available Arabic speech corpora for a TTS system are? Please include even non-free options.
I am trying to use Stanford TokensRegex; however, I am getting an error at line 11. It says that (). Please do your best to help me. Below is my code:
1 String file = "A store has many branches. A manager may manage at most 2 branches.";
2 Properties props = new Properties();
3 props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
4 StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
5 Annotation document = new Annotation(file);
6 pipeline.annotate(document);
7 List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
8 for (CoreMap sentence : sentences)
9 {
10 TokenSequencePattern pattern = TokenSequencePattern.compile("[]");
11 TokenSequenceMatcher matcher = pattern.getMatcher(sentence.get(CoreAnnotations.TokensAnnotation.class)); // fix: getMatcher expects a List<CoreLabel>, not a CoreMap
12 while (matcher.find()) {
13 JOptionPane.showMessageDialog(rootPane, "It has been found");
14 }
15 }
Hi,
I know that most existing probabilistic and statistical term-weighting schemes (TF-IDF and its variants) are based on a linked-independence assumption between index terms. Semantic information retrieval, on the other hand, seeks to exploit the linked dependence between index terms.
I am wondering: when is linked dependence between index terms vital? And when can we neglect it?
Note on the dependence assumption: if two index terms have the same occurrences in a document, this suggests that the terms are dependent and should receive the same term-weight values.
Thanks,
Osman
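For concreteness, the classical independence-based weighting can be sketched in a few lines of Python (a minimal TF-IDF illustration; real schemes add smoothing and length normalization):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF under the term-independence assumption: each term is
    weighted from its own counts, ignoring co-occurrence with other terms."""
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc.split()))
    weights = []
    for doc in docs:
        tf = Counter(doc.split())
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = ["index terms weighting", "semantic index retrieval", "term weighting schemes"]
w = tf_idf(docs)
print(w[0])
```

Capturing linked dependence would mean replacing the per-term factor with one conditioned on co-occurring terms, which is exactly what this scheme cannot express.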
Hello!
I'm a university student trying to interpret the indicator diagram of an internal combustion engine. As part of this, I have to find the average specific heat ratio of the gas inside the cylinder, and to do that I have to find the specific heat at constant pressure (Cp). My professor gave me three approximate expressions (but only their names): Danisi(?), Khül and JANAF. (I'm not sure about 'Danisi' because it's a Japanese name and I can't find it anywhere, so I just transcribed it as it is pronounced in my language. But the professor said not to use it, since he only used it as an example, so it doesn't really matter... in fact it does matter, but I don't have enough time for that now...)
I tried thermodynamics textbooks in the university library, I tried to Google it, and I asked my professor (he gave me a textbook, but it's all in Japanese, and I never learned Japanese...). I found out who Khül is and what JANAF (and the JANAF tables) are, but I can't find the approximate expressions for specific heat. I would be very glad if someone could give me a link to a page that explains them, or tell me where or how to find them.
In short, I want to find the JANAF approximate expression for specific heat and Khül's approximate expression for specific heat.
I'll attach an example that my professor offered.
Thank you for your time!
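For what it's worth, the JANAF thermochemical data are commonly fitted with the NASA polynomial form Cp/R = a1 + a2·T + a3·T^2 + a4·T^3 + a5·T^4, from which the specific heat ratio follows for an ideal gas. A minimal Python sketch (the coefficients below are a placeholder for a constant-Cp diatomic gas, not real JANAF fits; the species-specific coefficients must be looked up in the JANAF/NASA tables):

```python
R = 8.314462618  # universal gas constant, J/(mol*K)

def cp_molar(T, a):
    """Molar Cp from five polynomial coefficients (NASA/JANAF fit form)."""
    return R * (a[0] + a[1]*T + a[2]*T**2 + a[3]*T**3 + a[4]*T**4)

def gamma(T, a):
    """Specific heat ratio Cp/Cv for an ideal gas, using Cv = Cp - R."""
    cp = cp_molar(T, a)
    return cp / (cp - R)

# placeholder coefficients: constant Cp/R = 3.5 (rigid diatomic ideal gas)
a_demo = (3.5, 0.0, 0.0, 0.0, 0.0)
print(gamma(300.0, a_demo))  # about 1.4 for this constant-Cp demo
```

Averaging gamma over the cycle would then mean evaluating it at the in-cylinder temperatures read off the indicator diagram.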

I have some files in which each line contains a Persian sentence, a tab, and then an English word. The English words give the sentence class; some files have 2 classes, some 3 and some more. I extracted 1000 words from the file and made a term-document matrix whose columns are the classes and whose rows are the words. Now I want to apply SVD to this matrix, which returns U, sigma and V (Vt), and then do dimension reduction. 1) How can I do that? (I've enclosed the code (Python 3), but I'm not sure whether it's right; I copied it from the net.)
2) When I print the term-document matrix, it only shows the first and last lines (because it's too large). How can I print the whole matrix?
Then I have to find each word's vector according to U*sigma. 3) How should I build such a vector (actually a matrix whose rows are the rows of the U*sigma matrix)?
hint: this is a part of an LSA project.
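A minimal NumPy sketch of the SVD and truncation steps (the toy matrix below stands in for the real term-document matrix; k is the reduced dimension), plus the print-options fix for question 2:

```python
import sys
import numpy as np

# toy term-document matrix: rows = words, columns = classes
A = np.array([[1., 0., 2.],
              [0., 3., 1.],
              [2., 1., 0.],
              [1., 1., 1.]])

# thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# question 1: keep only the k largest singular values (LSA dimension reduction)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# question 3: each word's vector is the corresponding row of U_k * s_k
word_vectors = U_k * s_k      # shape (n_words, k)

# question 2: disable NumPy's summarized printing for large matrices
np.set_printoptions(threshold=sys.maxsize)
print(word_vectors)
```

Row i of `word_vectors` is the reduced representation of word i; equivalently it equals `A @ Vt_k.T`, i.e. the original row projected into the latent space.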
I have a file containing a Persian sentence, a tab, and then an English word in each line. I have to delete stop words and punctuation from the file. I wrote the code in Python 3, but because the punctuation sometimes attaches to a word and is then counted as part of the word rather than as punctuation, it can't be deleted. So I need to use a regular expression to delete the stop words. I tried to use one in the code below, but I couldn't get it to work. How can I change the code below so that it works correctly? (In fact, what should I write exactly?) Thanks.
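A minimal sketch of the regular-expression approach (the stop-word set below is a tiny illustrative placeholder, not a real Persian stop-word list): punctuation is stripped first, even when attached to a word, and stop words are filtered afterwards:

```python
import re

STOP_WORDS = {"و", "در", "به"}   # illustrative Persian stop words

def clean_sentence(sentence):
    """Remove punctuation (even when attached to words) and stop words."""
    # drop any character that is not a word character or whitespace;
    # in Python 3, \w already matches Persian letters, so words survive intact
    no_punct = re.sub(r"[^\w\s]", " ", sentence)
    return " ".join(w for w in no_punct.split() if w not in STOP_WORDS)

print(clean_sentence("سلام، دنیا!"))   # punctuation attached to a word is removed
```

Replacing punctuation with a space (rather than deleting it) avoids gluing two words together when a comma sat between them with no spaces.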
I have a list of Persian words and a file in which each line contains a sentence, a tab, and then an English word. For each line of the file, I want to check whether each word in the list exists in the sentence: the code should return "1" if it does and "0" if it does not. For example, if my list contains 20 words and my file has 50 lines, the code should return 50 rows with 20 columns of 1s and 0s, plus a column with the English word at the end (21 columns in all). The numbers should be separated by commas (as in the picture below). Finally, I want to write the rows to a new file. The code below returns just one column. How can I fix it? Thanks.
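A minimal sketch of the row-building logic (the word list and file lines below are illustrative placeholders): each input line yields one comma-separated row of 0/1 flags plus the class label:

```python
word_list = ["کتاب", "خانه", "آب"]   # illustrative Persian word list

def encode_line(line, words):
    """Turn 'sentence<TAB>label' into a '0,1,...,label' row."""
    sentence, label = line.rstrip("\n").split("\t")
    tokens = set(sentence.split())
    bits = ["1" if w in tokens else "0" for w in words]
    return ",".join(bits + [label])

lines = ["کتاب روی میز است\tbook", "آب سرد است\twater"]
rows = [encode_line(l, word_list) for l in lines]
print("\n".join(rows))
# to save the result, e.g.:
# with open("output.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(rows))
```

The key point is building the whole row inside the per-line loop; producing one column usually means the word loop and the line loop were nested the wrong way round.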

I have dealt with his life, work and texts (German, Latin) for several years. I would like to compare my results and opinions with anyone else's.
Keep in mind that he published his works not under his own name but under pseudonyms; if this is your field too, you surely know that. :)
I am looking for text chat data (from any kind of call center). If anyone knows of any, please provide a link or the data.
1/ If we consider a context defining a term as a set of sentences giving necessary information about the meaning of this term, would it be a contextual definition or a definitional context?
2/ Can we find other types of definitions in one definitional context?
This was a concept regarding health that was imported to Japan from China between the 7th and 10th centuries.
Yojo connected health with diet, mental control, exercise and sexual restraint.
I am translating a book related to this topic; however, I can't find the equivalent Japanese or Chinese word for it...
Can anyone explain the relationship between word order and communication?
I am trying to develop software that suggests suitable attributes for entity names depending on the entity type.
For example, entities such as doctor, nurse, employee, customer, patient, lecturer, donor, user, developer, designer, driver, passenger and technician will all have attributes such as name, sex, date of birth, email address, home address and telephone number, because all of them are people.
As a second example, words such as university, college, hospital, hotel and supermarket can share attributes such as name, address and telephone number, because all of them could be organizations.
Are there any Natural Language Processing tools or software that could help me achieve my goal? I need to identify the entity type as person or organization and then attach suitable attributes according to that type.
I have looked at Named Entity Recognition (NER) tools such as the Stanford Named Entity Recognizer, which can extract entities such as Person, Location, Organization, Money, Time, Date and Percent, but it was not really useful here.
I could do it by building my own gazetteer, but I would prefer not to take that option unless automatic approaches fail.
Any help, suggestions and ideas will be appreciated.
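If it helps, the final step (attaching attributes once the type is known) can be sketched as a type-to-attribute template lookup; the toy `ENTITY_TYPE` map below is an illustrative stand-in for a real classifier (e.g. a WordNet hypernym lookup or an NER-based decision), and the attribute sets are just the ones from the examples above:

```python
# attribute templates taken from the examples above
ATTRIBUTES = {
    "person": ["name", "sex", "date_of_birth", "email_address",
               "home_address", "telephone_number"],
    "organization": ["name", "address", "telephone_number"],
}

# illustrative stand-in for a real entity-type classifier
ENTITY_TYPE = {
    "doctor": "person", "nurse": "person", "customer": "person",
    "university": "organization", "hospital": "organization",
}

def attributes_for(entity):
    """Look up the entity's type, then return its attribute template."""
    etype = ENTITY_TYPE.get(entity.lower())
    return ATTRIBUTES.get(etype, [])

print(attributes_for("Doctor"))
```

Replacing `ENTITY_TYPE` with, say, a WordNet hypernym test (is `person.n.01` or `organization.n.01` an ancestor of the word's synset?) would make the classification automatic rather than gazetteer-based.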
Maybe a tool that would also let me annotate parallel texts?
Hi everyone! I'm a linguist having basic computer skills, so I have only some vague notions about Java, Python or other programming languages. I'm interested in annotating a small parallel corpus for discourse relations and connectives, so I need to be able to define several criteria in my analysis (arguments, connectives, explicitness/implicitness, etc.). I would welcome any suggestions... Thanks!
I want to analyze Urdu text linguistically, but I couldn't find any software to measure the frequency of different items for this purpose.
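In the absence of a dedicated tool, basic frequency counts over Urdu text need only a few lines of Python (this naive sketch tokenizes on whitespace and ignores clitics and morphology):

```python
from collections import Counter

def word_frequencies(text):
    """Whitespace-tokenize and count word frequencies (Unicode-safe)."""
    return Counter(text.split())

urdu_text = "یہ کتاب اچھی ہے اور وہ کتاب بھی اچھی ہے"
freq = word_frequencies(urdu_text)
print(freq.most_common(3))
```

For anything beyond raw token counts (lemmas, POS frequencies), a proper Urdu tokenizer and tagger would be needed on top of this.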
Many tools are used for part-of-speech (POS) tagging, such as the Stanford tagger, TreeTagger and GATE. Which is the most common tagger with the lowest error rate for British English?
I was wondering whether there is a ready-to-use tool for syntactic normalization of, e.g., noun phrases: "treatment of acne" --> "acne treatment", etc. Although a rule-based approach is possible, there must be a more robust solution.
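As a baseline before a parser-based solution, the rule itself is a one-line rewrite, sketched here with a regular expression (it handles only the simple single-noun "X of Y" case; multiword nouns and PP attachment need real syntax):

```python
import re

def normalize_of_phrase(np):
    """Rewrite 'X of Y' noun phrases as 'Y X' (simple two-noun case only)."""
    m = re.fullmatch(r"(\w+) of (\w+)", np.strip())
    return f"{m.group(2)} {m.group(1)}" if m else np

print(normalize_of_phrase("treatment of acne"))   # acne treatment
```

A robust version would run the same rewrite over dependency-parsed noun phrases instead of raw strings, so that "treatment of severe acne" and similar cases are handled correctly.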
That is, finding the existence and quantity of a set of adjectives in a given set of sentences, where the sentences do not contain the adjectives themselves?
I would like code to run the Stanford Named Entity Recognizer (NER). Suppose I have a text and would like Stanford NER to recognize the entities mentioned in it.
Examples with code, a tutorial or any other useful resource would be appreciated.
I am trying to use Stanford TokensRegex to design patterns. I am attempting to catch "A manager may manage at most 2 branches", which is mentioned once in the text, but I have failed to match it. Below is my code:
String file = "A store has many branches. Each branch must be managed by at most 1 manager. A manager may manage at most 2 branches. The branch sells many products. Product is sold by many branches. Branch employs many workers. The labour may process at most 10 sales. It can involve many products. Each Product includes product_code, product_name, size, unit_cost and shelf_no. A branch is uniquely identified by branch_number. Branch has name, address and phone_number. Sale includes sale_number, date, time and total_amount. Each labour has name, address and telephone. Worker is identified by id.";
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// create an Annotation with the given text and run all annotators on it
Annotation document = new Annotation(file);
pipeline.annotate(document);
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences)
{
    // each bare word in the pattern matches one token's text
    TokenSequencePattern pattern = TokenSequencePattern.compile("A manager may manage at most 2 branches");
    // fix: getMatcher expects a List<CoreLabel>, so pass this sentence's
    // tokens, not the whole list of sentences
    TokenSequenceMatcher matcher = pattern.getMatcher(sentence.get(CoreAnnotations.TokensAnnotation.class));
    while (matcher.find()) {
        JOptionPane.showMessageDialog(rootPane, "It has been found");
    }
}
Please suggest any books or articles that could help me learn to design patterns with Stanford TokensRegex within Stanford CoreNLP.
I would like to know which Natural Language Processing software recognizes parts of speech with the smallest percentage of errors. I have used Stanford CoreNLP, but it sometimes makes errors.
The CDA framework has been widely used by Chinese linguistic analysts recently. But the real difficulty is that, owing to China's very distinctive political and historical background and the dichotomy between Western and Eastern ideology, what CDA scholars generally agree on sometimes does not fit the situation in Chinese society.
So what is the best way to apply the CDA framework to Chinese issues while staying truthful to the indigenous environment, so that the analysis is really socio-historically significant?
I would like to extract the attributes of a table that are mentioned in plain text. What is the best approach: supervised, semi-supervised or unsupervised?
I have some sample case studies, but I do not have a big training set.
Below is an example of a case study:
"Consider the following relational database for Fester Zoo. Fester Zoo wants to maintain information about its animals, the enclosures in which they live, and its zookeepers and the services they perform for the animals. In addition, Fester Zoo has a program by which people can be sponsors of animals. Fester Zoo wants to track its sponsors, their dependents, and associated data. Each animal has a unique animal number and each enclosure has a unique enclosure number. An animal can live in only one enclosure. An enclosure can have several animals in it or it can be currently empty. A zookeeper has a unique employee number. Every animal has been cared for by at least one and generally many zookeepers; each zookeeper has cared for at least one and generally many animals. Each time a zookeeper performs a specific, significant service for an animal the service type, date, and time are recorded. A zookeeper may perform a particular service on a particular animal more than once on a given day.
A sponsor, who has a unique sponsor number and a unique National Insurance number, sponsors at least one and possibly several animals. An animal may have several sponsors or none. For each animal that a particular sponsor sponsors, the zoo wants to track the annual sponsorship contribution and renewal date. In addition, Fester Zoo wants to keep track of each sponsor’s dependents. A sponsor may have several dependents or none. A dependent is associated with exactly one sponsor."
Any books or online resources are appreciated.
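Given how small the training data is, a first unsupervised pass could be purely rule-based: lexico-syntactic patterns over the case-study text already pick out candidate attributes. A minimal sketch covering only the "has a unique X" phrasing from this example:

```python
import re

def extract_unique_attributes(text):
    """Find 'X has a unique Y' statements; return (entity, attribute) pairs."""
    pattern = re.compile(r"(\w+) has a unique (\w+(?: \w+)?)")
    return pattern.findall(text)

case_study = ("Each animal has a unique animal number and each enclosure "
              "has a unique enclosure number. A zookeeper has a unique employee number.")
print(extract_unique_attributes(case_study))
```

Such pattern output can then seed a semi-supervised loop (bootstrapping): high-precision matches become training examples for a classifier that generalizes beyond the hand-written phrasings.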
I need to extract all the words after the pattern "/[Ee]ach/ ([tag:NN]|[tag:NNS]) /has|have/ /\\w|[ ]|[,]/" up to the end of the sentence, but I am getting unexpected output:
For the second sentence I get "Each campus has a", where the right output is "Each campus has a different name, address, distance to the city center and the only bus running to the campus".
For the third sentence I get "Each faculty has a", where the right output is "Each faculty has a name, dean and building".
For the fourth sentence the pattern fails to match the right output, which is "each problem has solution, God walling".
I would appreciate help in solving this problem; I think my pattern has not been written correctly. Below is my code:
String file = "ABC University is a large institution with several campuses. Each campus has a different name, address, distance to the city center and the only bus running to the campus. Each faculty has a name, dean and building. this just for test each problem has soluation, God walling.";
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation(file);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences)
{
    // fix: /\\w|[ ]|[,]/ matches only ONE token after "has|have"; []* greedily
    // consumes every remaining token, which runs to the end of the sentence
    // because the matcher works on one sentence's tokens at a time
    TokenSequencePattern pattern = TokenSequencePattern.compile("/[Ee]ach/ ([tag:NN]|[tag:NNS]) /has|have/ []*");
    TokenSequenceMatcher matcher = pattern.getMatcher(sentence.get(CoreAnnotations.TokensAnnotation.class));
    while (matcher.find()) {
        JOptionPane.showMessageDialog(rootPane, matcher.group());
    }
}
Our large SMS corpus in French (88milSMS) is available. User conditions and downloads can be accessed here: http://88milsms.huma-num.fr/
Is there a website that lists all the corpora available to the NLP and text-mining communities?
Any electronic resources, including books, examples and tutorials, are appreciated.
I need to extract clauses or phrases from a sentence.
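If no parser is available, a crude first approximation is to split on punctuation and coordinating conjunctions (real clause segmentation needs a syntactic parser such as Stanford CoreNLP or spaCy):

```python
import re

def rough_clauses(sentence):
    """Split a sentence into rough clause/phrase chunks on punctuation
    and common coordinators; a parser is needed for real clause boundaries."""
    parts = re.split(r"[,;:]|\b(?:and|but|or|because|although)\b", sentence)
    return [p.strip() for p in parts if p.strip()]

print(rough_clauses("I stayed home because it rained, and she left early."))
```

For genuine clause and phrase extraction, the parser route is to take constituency subtrees labelled S/SBAR (clauses) or NP/VP/PP (phrases) from a parse of the sentence.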
There is a lot of literature on genre evolution/transformation and the internet in general, and I've found some linguistic and discourse/communication-oriented literature on online book and movie reviews, but so far very little work dedicated specifically to the more recent product or consumer reviews devoted to all kinds of objects, from cell phones to travel destinations, published e.g. on thematic websites and their connected forums (where users often post reviews or review fragments, in some cases mixed with other kinds of posts). Has anyone come across text-linguistic or discourse-analytical work on these genres?
If there is any, what is the underlying technology? I.e., is it formant-based, unit-selection-based, concatenative, etc.?