Science topic

Text Mining - Science topic

Text mining, sometimes alternately referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Questions related to Text Mining
  • asked a question related to Text Mining
Question
1 answer
I am experiencing an issue when trying to access the 'text mining' section in the Stitch database. Upon entering this section, which is supposed to display literature sources, no information is loaded.
Relevant answer
Answer
In Stitch, which is a platform often used for ETL (extract, transform, load) processes and data integration, there are a few key settings and permissions to check to ensure data loads correctly into a text mining environment. Here’s a checklist that might help:
1. Data Sync Frequency and Scheduling
  • Ensure that the data pipeline is set to run at the appropriate frequency for your needs. This is especially important if your text mining section depends on up-to-date data.
  • Verify that scheduled runs are completing successfully without interruptions.
2. Table and Field Selection
  • Confirm that all necessary tables and fields for text mining are selected in Stitch for syncing.
  • Some platforms have the option to sync only specific fields or tables to reduce load, so make sure no essential data for text mining is omitted.
3. Schema Mapping and Transformations
  • Check that the schema in Stitch aligns with what your text mining application expects. Any schema changes (e.g., column names, data types) may affect the data load.
  • If you’re using Stitch’s transformations, ensure they don’t alter or drop any fields that your text mining depends on.
4. Permissions and User Roles
  • Make sure that the Stitch database user has the necessary permissions to access, read, and write the relevant tables.
  • Review both Stitch’s permissions and the permissions of the target database to ensure there are no restrictions that might prevent data from loading.
5. Error Handling and Alerting
  • Enable any available error logging or alerts in Stitch, so you’re notified if there’s a problem with the data sync that could impact text mining.
  • This is useful for catching issues like data truncation, connection failures, or data type mismatches.
6. Data Quality Checks
  • Ensure that Stitch includes data validation checks to confirm the data is complete and accurate before it reaches your text mining stage. Missing or inconsistent data can affect mining results.
  • Set up alerts or checks for unexpected data changes (like sudden null values or data anomalies) that could impact text mining accuracy.
Let me know if you need help with a specific setting in Stitch or the target database!
  • asked a question related to Text Mining
Question
3 answers
Please share your insights regarding the latest trends and developments in text mining research and applications.
Relevant answer
Answer
1. Advanced Natural Language Processing (NLP) Techniques:
  • Contextual Language Models: Models like BERT and GPT-3 have revolutionized NLP, enabling deeper understanding of context, nuances, and intent.
  • Sentiment Analysis: More sophisticated techniques are being developed to analyze sentiment with greater accuracy, considering sarcasm, irony, and cultural nuances.
  • Topic Modeling: Advanced topic modeling algorithms can uncover intricate thematic structures within large text corpora.
2. Multimodal Text Mining:
  • Text and Image Analysis: Combining text and image data to extract richer insights, such as analyzing product reviews with accompanying images.
  • Text and Audio Analysis: Analyzing transcripts of spoken language along with the audio itself to capture nuances and emotions.
3. Ethical Considerations and Bias Mitigation:
  • Fairness and Bias: Researchers are focusing on developing techniques to mitigate biases in text mining algorithms, ensuring fair and equitable outcomes.
  • Privacy and Security: Addressing privacy concerns and implementing robust security measures to protect sensitive textual data.
4. Domain-Specific Text Mining:
  • Healthcare: Extracting information from clinical notes, medical literature, and social media to improve patient care and drug discovery.
  • Legal: Analyzing legal documents to identify patterns, extract key information, and support legal decision-making.
  • Finance: Analyzing financial news, reports, and social media to predict market trends and assess risk.
5. Text Mining for Social Media Analysis:
  • Sentiment Analysis: Monitoring brand reputation and customer sentiment on social media platforms.
  • Topic Modeling: Identifying emerging trends and popular topics on social media.
  • Community Detection: Analyzing social networks to identify influential users and communities.
6. Text Generation and Summarization:
  • AI-Generated Text: Creating human-quality text, such as news articles, product descriptions, and creative writing.
  • Text Summarization: Condensing long documents into concise summaries, aiding in information retrieval and analysis.
7. Text Mining for Knowledge Graph Construction:
  • Knowledge Graph: Building structured representations of knowledge from textual data, enhancing information retrieval and reasoning.
By staying abreast of these trends, researchers and practitioners can unlock the full potential of text mining, driving innovation and decision-making across various industries.
  • asked a question related to Text Mining
Question
5 answers
Hello everyone,
I want to find emerging patterns of blockchain applications in cybersecurity. I've collected and filtered my dataset, which now consists of 1183 research items indexed in WoS and Scopus. Which text mining algorithms can fulfill this purpose?
I found burst detection and LDA suitable, but as a tourism student I want to know about other possibilities and the suggestions of professionals.
Best wishes.
Relevant answer
Answer
One text mining algorithm that can fulfill the purpose of identifying emerging patterns of blockchain applications in cybersecurity from your dataset of 1183 research items indexed in WoS and Scopus is topic modeling using Latent Dirichlet Allocation (LDA). LDA is a probabilistic model that can discover hidden topics within a collection of documents by assigning probability distributions to words and topics. By applying LDA to your dataset, you can uncover the underlying themes and topics related to blockchain applications in cybersecurity. This algorithm can help identify patterns, common trends, and relationships among the research items, enabling you to gain insights into the emerging patterns in this domain.
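A minimal, hedged sketch of that LDA workflow in R, assuming the exported WoS/Scopus records sit in a CSV with an abstract column (the file and column names here are hypothetical), using the tidytext and topicmodels packages:

library(tidytext)
library(dplyr)
library(topicmodels)

records <- read.csv("wos_scopus_export.csv", stringsAsFactors = FALSE)  # hypothetical export

# Tokenize abstracts, drop standard English stopwords, and build a document-term matrix
dtm <- records %>%
  mutate(doc_id = row_number()) %>%
  unnest_tokens(word, abstract) %>%
  anti_join(stop_words, by = "word") %>%
  count(doc_id, word) %>%
  cast_dtm(doc_id, word, n)

lda_fit <- LDA(dtm, k = 10, control = list(seed = 1234))  # k = 10 only for illustration

terms(lda_fit, 10)  # top terms per topic, useful for labelling emerging themes

The number of topics k and the preprocessing would of course need tuning for the blockchain/cybersecurity corpus, and burst detection on topic proportions over publication years is one way to surface the "emerging" part.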
  • asked a question related to Text Mining
Question
1 answer
yes. for further details contact now
Relevant answer
Answer
Yes. I suggest doing a search of ResearchGate using the terms "r package topic model" and following up on the top articles on topic modeling in R. There are other packages you can find by browsing the CRAN archives of R packages, but these articles are a good place to start.
I also recommend the book Text Analysis with R for Students of Literature by Matthew Jockers and Rosamond Thalken.
  • asked a question related to Text Mining
Question
7 answers
Hey everyone.
I am working on my research dissertation. I want to use LDA (Latent Dirichlet Allocation) on my data. I found the Orange Data Mining program (available here: https://orangedatamining.com/).
Does anyone know how to correctly perform LDA in this program?
And for everyone who knows how to do LDA correctly: what results have to be reported for my LDA analysis? I guess I have to report it like this: ?
Any advice on LDA (how to do it in R, for example, or in the Orange program ...) will be very helpful. I am a beginner at this method, but I really want to use it because it is the best method for my research question.
Thank you
N. A.
Relevant answer
Answer
There is a Slovenian stopword list available in R. You may use stopwords::stopwords("sl", source = "nltk").
An alternative source is "stopwords-iso" under the same language code.
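A small, hedged sketch of how that would slot into a typical R pipeline (the CSV file and column name are hypothetical):

library(stopwords)
library(tm)

sl_stop <- stopwords::stopwords("sl", source = "nltk")   # or source = "stopwords-iso"

comments <- read.csv("comments.csv", stringsAsFactors = FALSE)
corpus <- VCorpus(VectorSource(comments$comment))

corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, sl_stop)            # drop Slovenian stopwords
corpus <- tm_map(corpus, removePunctuation)

dtm <- DocumentTermMatrix(corpus)                          # ready for LDA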
  • asked a question related to Text Mining
Question
6 answers
Hi everyone.
I am doing LDA (Latent Dirichlet Allocation) in R. I have two questions.
1. QUESTION
I am analysing comments from Slovenian social media. But I organised the data as follows and I would be happy if someone could help. In the Excel document, I have the authors of the comment written in one column, and I have the content of the comment written in the other column. So I have approx. 4,000 rows and each row has two columns - one for the author and one for the comment. I had all of these comments “separate” in my document, but I wanted to combine them. I obtained each group of comments from individual web portals (eg. Facebook posts, comments under articles, Reddit debates, ...). And I combined all these documents of comments into two columns. So now all comments are written in one column. That's my corpus (it is binary now - all the comments in one row). Can I use the LDA in the R program on this data set? Or do comment groups need to be separated into individual documents for the LDA method? I hope my question is clear, thank you so much.
2. QUESTION
How do you add Slovenian stopwords in R? Do you know it maybe? Because I got the message of error in R saying: Error in stopwords("slovenian"): no stopwords available for 'slovenian'.
I would be happy if someone could help.
Best regards, N. A.
Relevant answer
Answer
Michelangelo Misuraca thank you so much! It worked. I have one question: how do we compute coherence score for Latent Dirichlet Allocation?
Example of computing coherence score: Vaping discussion in the COVID-19 pandemic: An observational study using Twitter data ( )
  • page no. 5
I want to know how many topics it makes sense to define in the LDA model. As I understand it, this is done by calculating a coherence score for each candidate number of topics?
And how do we do it in R?
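One possible way in R, sketched here under the assumption that you already have a DocumentTermMatrix (called dtm below), is the ldatuning package; its metrics are related to, though not identical to, the coherence scores reported in Python/gensim studies such as the one cited above:

library(ldatuning)

result <- FindTopicsNumber(
  dtm,
  topics  = seq(2, 20, by = 2),
  metrics = c("CaoJuan2009", "Deveaud2014", "Griffiths2004"),
  method  = "Gibbs",
  control = list(seed = 1234)
)

FindTopicsNumber_plot(result)   # pick the number of topics where the metrics level off or peak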
Thank you so much for your help.
  • asked a question related to Text Mining
Question
2 answers
Hi, could you please guide me on how to conduct Latent Semantic Analysis through text mining for my business research: any website, book, or tutorial videos? Then I can apply this method to my research project. Thanks in advance. Kind regards, Bushra Aziz
Relevant answer
Answer
The Text Analytics Toolbox of MATLAB may be suitable for your task. In practice, it is friendlier to beginners than Python tools. The official website and its help centre provide step-by-step tutorial materials, and you can also find videos about it on YouTube.
  • asked a question related to Text Mining
Question
3 answers
I require some suggestions and need a health insurance dataset on which text mining is possible. Any recent papers addressing such a dataset would be helpful.
Relevant answer
Answer
Dear Anuradha,
Please check the following link:
  • asked a question related to Text Mining
Question
7 answers
I have a data set that contains a text field for more than 3,000 records, all of which contain notes from the doctor. I need to extract specific information from all of them, for example the doctor's final decision and the classification of the patient. What is the most appropriate way to analyze these texts? Should I use information retrieval or information extraction, or would a Q&A system be fine?
Relevant answer
Answer
Dear Matiam Essa,
This text mining task focuses on extracting entities, attributes, and their relationships from semi-structured or unstructured texts. Whatever information is extracted is then stored in a database for future access and retrieval. The main techniques are:
Information Extraction (IE)
Information Retrieval (IR)
Natural Language Processing
Clustering
Categorization
Visualization
With the increasing amount of text data, effective techniques need to be employed to examine the data and to extract relevant information from it. Various text mining techniques are used to efficiently uncover interesting information from multiple sources of textual data and are continually refined to improve the text mining process.
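As a minimal, hedged illustration of the information extraction route: if the notes use recognizable cue phrases, even simple regular expressions in R can pull out the fields of interest (the file name, column name, and patterns below are hypothetical and would need adapting to the actual wording of the notes):

library(stringr)

notes <- read.csv("doctor_notes.csv", stringsAsFactors = FALSE)   # hypothetical file

# Pull the text that follows cue phrases such as "final decision:" or "classification:"
notes$final_decision <- str_extract(notes$note_text,
                                    regex("(?<=final decision:).*", ignore_case = TRUE))
notes$classification <- str_extract(notes$note_text,
                                    regex("(?<=classification:).*", ignore_case = TRUE))

head(notes[, c("final_decision", "classification")])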
Good luck!
  • asked a question related to Text Mining
Question
6 answers
For unsupervised text clustering, the key thing is the initial embedding for the text.
If we want to use https://github.com/facebookresearch/deepcluster for text, the problem is how to get the initial embedding from a deep model.
BERT does not give good initial embeddings.
If we do not use a deep model, is there a better way to get embeddings than GloVe word vectors?
Thank you very much.
Relevant answer
Answer
Dear Tong Guo
In the following paper, a new embedding technique based on deep learning for text clustering has been proposed.
  • asked a question related to Text Mining
Question
7 answers
I would like to build an extractive text summarization dataset by crawling webpages; however, I can't manually annotate (summarize) it. Do you know of any way to summarize it?
  • asked a question related to Text Mining
Question
3 answers
Please suggest R packages and code for text mining (or any other programming approach) to search the PubMed database.
Relevant answer
Answer
Ajit Kumar Singh: Enter a free-text search into the PubReMiner tool, and it will search PubMed for results. The program analyzes these data and generates tables that rank the frequency of terms in the articles' titles and abstracts, as well as related MeSH categories.
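If you prefer to stay inside R, a hedged sketch with the rentrez package (the search term is only an example) would look like this:

library(rentrez)

res <- entrez_search(db = "pubmed",
                     term = "text mining[Title/Abstract] AND 2020:2024[PDAT]",
                     retmax = 200)

res$count                                              # number of matching records
summaries <- entrez_summary(db = "pubmed", id = res$ids)
titles <- extract_from_esummary(summaries, "title")    # article titles for further text mining
head(titles)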
  • asked a question related to Text Mining
Question
7 answers
I have some Key Informant Interview (KII) data. I want to apply Natural Language Processing (NLP) to identify patterns in the data. Can applying NLP to analyze KII data be mentioned as a data analytics tool in the report/paper? TIA
Relevant answer
Answer
Of course, it is interesting work. For example: (1) use NER (Named Entity Recognition) and RE (Relation Extraction) to construct a knowledge graph, then analyze the relations between the interviewees or the knowledge constitution of an interviewee; (2) use EE (Event Extraction) to identify the event correlation between the questions and answers; (3) use SA (Sentiment Analysis) to analyze the attitudes toward the interviewer or the company, etc.; (4) use topic models to analyze the topics of the interview and find out which topic the interviewers are most interested in; etc.
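As a small, hedged sketch of idea (3), lexicon-based sentiment analysis of the KII answers could be done in R with tidytext (the file and column names are hypothetical):

library(tidytext)
library(dplyr)
library(tidyr)

kii <- read.csv("kii_answers.csv", stringsAsFactors = FALSE)   # one answer per row

sentiment_by_answer <- kii %>%
  mutate(answer_id = row_number()) %>%
  unnest_tokens(word, answer_text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(answer_id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net_sentiment = positive - negative)

head(sentiment_by_answer)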
There are many, many interesting things you can do with NLP analysis. I wish you an interesting paper soon.
  • asked a question related to Text Mining
Question
3 answers
I have a research-related question about how I can easily read my results off a co-occurrence network from VOSviewer. Please provide any links to articles I can relate to.
Relevant answer
Answer
Dear Dhvani H Kuntawala,
Related to your query, I suggest you follow https://www.youtube.com/watch?v=sW893WYvQGM
  • asked a question related to Text Mining
Question
3 answers
Where can I find a reliable source of historical newspapers with lower subscription costs? I already tried the New York Times archive, but I can't download the newspapers, and I need them for a text mining project. Any suggestions will be welcome!
Relevant answer
Answer
I suggest:
  1. 19th Century US Newspapers.
  2. Accessible Archives.
  3. Ancestry.com.
  4. Chronicling America.
  5. Early American Newspapers, Series 1, 1690-1876.
  6. Footnote.com.
  7. GenealogyBank.
  8. Google News and News Archive.
Kind Regards
Qamar Ul Islam
  • asked a question related to Text Mining
Question
7 answers
I am looking for software for searching text in a set of files. Any recommendations?
It should be something similar to Multi Text Finder.
The aim is to teach students to find important information in documents.
Relevant answer
Answer
Sarayut Chaisuriya
Searching by one or several keywords.
  • asked a question related to Text Mining
Question
4 answers
I'm doing topic modeling with a collection of technical documents related to device repair. The reports are extracted from different software used by different repair shops. I need to do proper cleaning so the model focuses on the key words; specifically, I want to automatically remove useless words like:
* Additional findings
* External appearance
* Incoming condition, etc.
These "fill-in / template words" are found in almost every document, and there are even more of them. The documents are collected from different sources and consolidated in one database from which I do the extractions. I have already tried segregating by repair shop using TF-IDF, term frequency, and BM25, and segregating by software.
Relevant answer
Answer
I am building an app to help with some of these problems; for example, we added a cleaner (somewhat inaccurately called 'remove custom stopwords') where you can input the words you want removed. You can try the app here; it's still in beta, and I would love your feedback: https://sagetextipocapp.azurewebsites.net/
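If you want to handle it directly in R instead, a hedged sketch with tm is to maintain your own list of template phrases and strip them before building the document-term matrix (the phrase list below is only an example, and report_texts stands in for your character vector of reports):

library(tm)

template_phrases <- c("additional findings", "external appearance", "incoming condition")

reports <- VCorpus(VectorSource(report_texts))              # report_texts: character vector of reports
reports <- tm_map(reports, content_transformer(tolower))
reports <- tm_map(reports, removeWords, template_phrases)   # drop the template/boilerplate phrases
reports <- tm_map(reports, removeWords, stopwords("english"))  # plus standard stopwords
reports <- tm_map(reports, stripWhitespace)

dtm <- DocumentTermMatrix(reports)                          # input for the topic model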
  • asked a question related to Text Mining
Question
3 answers
Well, I'm seeking to find "stop words" for my text mining project in the R programming language. My project aims to find the most appropriate keywords for my thesis's systematic literature review, focusing on digital transformation. Can anyone help me with this? Note that I don't mean stop words in the sense of very high-frequency words that serve a grammatical purpose; rather, I want to create a list of over-indexed words in the digital transformation field.
Relevant answer
Answer
If I am not mistaken, you want to create a list of most used words relevant to the Digital Transformation topic, and so free from common stop-words.
I imagine there may be lists of key words already out there, or you could compile key words from recent main articles.
Now to your point, there are ways to do that easily through R. I coded a quick example here:
where I extract the main non-stop-word tokens (in NLP, automatically individualized words) from one highly cited digital transformation article.
Let me know if you manage to read and use the example. And if needed check this easy tutorial for tidytext (a simple and recent R package for NLP):
Good luck with your thesis!!
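For readers who cannot open the linked example, here is a rough, hedged sketch of that kind of tidytext workflow (not the author's exact code; the input file is hypothetical):

library(tidytext)
library(dplyr)

text_lines <- readLines("digital_transformation_article.txt")   # hypothetical plain-text article

top_terms <- tibble(line = text_lines) %>%
  unnest_tokens(word, line) %>%
  anti_join(stop_words, by = "word") %>%        # drop common grammatical stopwords
  filter(!grepl("^[0-9]+$", word)) %>%          # drop bare numbers
  count(word, sort = TRUE)

head(top_terms, 30)                             # candidate over-indexed keywords for the field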
  • asked a question related to Text Mining
Question
16 answers
My master's research is in information retrieval and text mining. I would be grateful if you could help me select a good topic for my PhD research proposal.
Relevant answer
Answer
There is growing interest in applying machine learning to the physical sciences, e.g., data-driven turbulence closure modeling, sparsity-based discovery of governing equations, and deep learning-based predictive modeling of chaos.
  • asked a question related to Text Mining
Question
9 answers
I am looking for free software for text mining and sentiment analysis for my research on customer review mining (it involves calculating the polarity of attributes, opinion-oriented information extraction, etc.).
Can somebody tell me whether this can be done through NVivo, and is it free?
Any other suggestions are also welcome.
Relevant answer
Answer
Agreed; for example, https://realpython.com/sentiment-analysis-python/ gives a nice overview of all the steps you have to explore. Good luck!
  • asked a question related to Text Mining
Question
6 answers
Hello,
I want to do some text mining of tweets. One of my questions is how to understand people's expressions of sympathy/empathy. I don't know whether there are any ways to do this quantitatively.
Specifically, are there any lexicon dictionaries? For example, for moral foundations theory there is a dictionary to do the detection, and for sentiment analysis there are also many lexicons and packages to achieve this.
Or are there any pre-trained models or classifiers that can achieve this?
Thanks in advance.
Relevant answer
Answer
By direct observation of an individual's behavior
  • asked a question related to Text Mining
Question
8 answers
We are currently working on a research project that aims at understanding the consumer behaviour for the cultural sector in Quebec-Canada during the COVID-19 crisis. For this reason, we are looking for tools for text mining and multi-language sentiment analysis (English and French) to analyze opinions on social media. We would prefer the cloud-based tools so that our students, who have limited resources and may not have the background in IT, can perform the analysis.
We would appreciate if you could help us to choose the right tool.
Thank you in advance,
Relevant answer
Answer
You can try Keatext Software.
It provides a free account for one user for universities and non-profit organizations. It also has multilingual support.
  • asked a question related to Text Mining
Question
5 answers
1. Can I use Orange for text mining in a qualitative research publication? For interview responses?
2. Is it an acceptable methodology?
3. Can you please refer me to any already published, reputable material that used Orange?
Relevant answer
Answer
Dear Quazy, 1) You can use the software that best fits your requirements; it could be Orange, R, IBM SPSS Statistics, SAS, Python libraries, etc. You could even use statistical approaches as well, such as correspondence analysis; 2) software is not a methodology... take a look at traditional approaches before making your choice (e.g., CRISP-DM, SEMMA, the KDD process, etc.); 3) https://doi.org/10.1007/978-3-540-30116-5_58
I hope it is useful for you!
Regards,
  • asked a question related to Text Mining
Question
7 answers
I am involved in a project that adds value by visualizing misclassification in the text mining domain. I am wondering whether anyone has experience in formally proving that the visualisations are in fact aiding the overall data science project outcome.
Relevant answer
Answer
Visualization involves many aspects. Scientific aspect is mostly manifested as interface that can augment, by strengthening, human-data interaction. In scientific field visualization has a wide sense, which is lesser dependent on perception. For instance, sonification and haptification are usually applicable with a similar success. In the art, visualization is mostly manifested and intended to augment human-to-human mental communication. However, this aspect is lesser developed and has not yet enough tools to uncover opportunities and prospects of the novel method or the concept. So far, it is highly dependent on perception and primitive tools available. Mental visualization, in a wide sense, is the next step forward to augment the way of human thinking in both human-data interaction or human-human communication or perception the world.
  • asked a question related to Text Mining
Question
1 answer
I want to make an adjacency matrix with citations.
I want to make an index of 130 words and search 130 papers against those 130 words. Manually this is a long process, so I want to automate the searching.
Can anyone suggest whether this can be done with text mining or in any other way?
Relevant answer
Answer
Yes, it is a classical NLP task.
  • asked a question related to Text Mining
Question
12 answers
I am working on the answers of stakeholders in the freight transport area and the development of crowd logistics solutions. I need to implement text mining. Do you know of any free software other than R for text mining?
Relevant answer
Answer
For text mining, WEKA can also be a good choice that requires less effort.
  • asked a question related to Text Mining
Question
11 answers
Hi Everyone,
I have a document search engine, and users have the ability to rate the search results for any query they make. For the first versions of the search engine, I am using the Universal Sentence Encoder to generate document embeddings; at search time, user queries are also embedded, and the documents with the closest embeddings are presented in the search results.
A user can rate a document from the search results on some scale, say 0 to 5 (0 being Not Relevant and 5 being Very Relevant).
Using this kind of feedback is there a way we can fine tune the search results?
One idea is using BERT with triplet loss, where we can use:
Anchor : User Search Query
Contradiction : Document which User found Not Relevant
Entailment: Document user found very relevant
Does anybody have experience doing this? Any other ideas, suggestions, or papers are welcome.
Relevant answer
Answer
If the possibility of users gaming the ratings is under control, you can use several proposed methods to incorporate relevance feedback, such as:
  • asked a question related to Text Mining
Question
2 answers
Can anyone make a simple example based on a small database? I need to compute by hand to understand it.
I have attached an example. Please explain it with more details for me.
Thanks a lot
  • asked a question related to Text Mining
Question
10 answers
I want an Arabic dataset, especially of chat text.
Thanks
Relevant answer
Answer
You can find it at: https://metatext.io/datasets
  • asked a question related to Text Mining
Question
5 answers
I'm looking for a dataset comprising reviews, ratings, reviewer nationality, the name and location of the hotel, and so on.
  • asked a question related to Text Mining
Question
4 answers
I am interested in a software tool for extracting data from social networks based on geographic characteristics. The purpose is analysis to obtain data on the mood of the population.
Relevant answer
Answer
Wow! Thanks a lot - I'll try this resource!
  • asked a question related to Text Mining
Question
14 answers
I am trying to do text mining on Chinese reviews. I have tried many software packages, like RapidMiner, Chinese Text Analytics, and Python. Most of them seem to require a certain level of programming knowledge. RapidMiner requires the Hanminer extension, but I don't know why it is still not working. I found LIWC, which seems to be able to analyze Chinese text, and I purchased the software. But now I have difficulty segmenting the text using the Stanford Segmenter, which again requires some programming work. Any recommendations on how I can do this? Or any recommendations on an easier way of analyzing Chinese reviews? Many thanks!
Relevant answer
Answer
Hi, I just bumped into a recent publication in one of the top tourism journals entitled: Will you miss me if I am leaving? Unexpected market withdrawal of Norwegian Joy and customer satisfaction.
They used an online tool developed by Baidu: http://ai.baidu.com/tech/nlp/sentiment_classify
Hope it helps
  • asked a question related to Text Mining
Question
3 answers
I am planning to use text mining as a method to collect data from social media. Do you know of any key literature that explains the method?
  • asked a question related to Text Mining
Question
14 answers
I want to know about current research trends in machine learning and natural language processing (NLP) for code-mixed text, in detail, as soon as possible. This is for a (theoretical) computer science research project. Thanks in advance.
Relevant answer
Answer
Language models, pre-trained models, transfer learning, and sentence embeddings are the top trends in NLP right now (they are related to each other). You can check out the latest NLP conferences, such as EMNLP, and see that many papers rely on BERT/RoBERTa/ALBERT/... models.
PS: I have published two papers on these methods.
  • asked a question related to Text Mining
Question
4 answers
I want to make sure whether the model is overfitting or not. My study focuses on unstructured tweets. I labelled the tweets with TextBlob and used LinearSVC to get the classification evaluation; the model accuracy is 98%. Now I suspect it is overfitting. Is such high accuracy normal, or what might be my mistake?
Thanks
Relevant answer
Answer
Yes, even I observe the same.
  • asked a question related to Text Mining
Question
3 answers
Hello
I have the following situation: I have a paper X about topic Y. For paper X I did a forward search with Web of Science (checking all new papers which cite paper X). Then I downloaded all the articles I identified via the forward search (approx. 1'000 papers). Now I would like to sort these papers according to the frequency of specific keywords used.
For example: I have found paper Z via the forward search (so paper Z cites paper X, which is about topic Y). Now I want to check whether paper Z is also concerned with topic Y or whether it just refers to it in passing. For that, I search for specific keywords which correspond to topic Y. According to the frequency of the specific keywords mentioned in paper Z, I want to classify it in the category "relevant" or "not relevant". Now, how can I determine the threshold for the keywords? That is, if paper Z only uses the specific keyword once, it is most probably not relevant to topic Y. But if it mentions the specific keyword 20 times, it is probably relevant to topic Y.
Is there a recognized methodology to determine or approximate a threshold for the keyword frequency which allows to distinguish if a paper is relevant to topic Y or not?
With this approach I hope to reduce the 1'000 papers to those which are about topic Y.
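A minimal, hedged sketch of the counting step in R, assuming the downloaded papers have been converted to plain-text files (the folder name and keyword list are hypothetical); the threshold itself would still have to be chosen, e.g. by inspecting the resulting distribution of length-normalised frequencies:

library(stringr)

keywords <- c("keyword1", "keyword2", "keyword3")          # terms that correspond to topic Y
files <- list.files("forward_search_papers", pattern = "\\.txt$", full.names = TRUE)

score_paper <- function(path) {
  txt <- tolower(paste(readLines(path, warn = FALSE), collapse = " "))
  hits <- sum(str_count(txt, fixed(keywords)))             # total keyword occurrences
  n_words <- str_count(txt, "\\S+")                        # rough document length
  c(hits = hits, per_1000_words = 1000 * hits / n_words)
}

scores <- t(sapply(files, score_paper))
head(scores[order(-scores[, "per_1000_words"]), ])         # inspect the distribution to pick a cutoff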
  • asked a question related to Text Mining
Question
5 answers
I want to give my master's students research topics related to text mining, especially regarding language processing. Can anyone guide me in this regard?
  • asked a question related to Text Mining
Question
8 answers
Hi Folks,
I need your help regarding the artificial intelligence context of information retrieval tools and big data & data mining in libraries. Can you share with me any dissertations/theses, research papers, conference papers, book chapters, research projects, or articles? I also welcome your comments, thoughts, and feedback in the context of university libraries, to support me in designing my PhD questionnaire.
-Yousuf
Relevant answer
Answer
Dear Colleagues and Friends from RG,
In my opinion, in the coming years, one of the key applications of artificial intelligence integrated with other Industry 4.0 technologies, including Big Data Analytics, will be improvement of information search on the Internet.
Conducted scientific research confirms the strong correlation between the development of Big Data technology, Data Science analytics, Data Analytics and the effectiveness of the use of knowledge resources. I believe that the development of Big Data technology and Data Science analytics, Data Analytics and other ICT information technologies, multi-criteria technology, advanced processing of large information sets, and Industry 4.0 technology increases the efficiency of using knowledge resources, including in the field of economics, finance and organization management. In recent years, ICT information technologies, Industry 4.0 etc. have been developing dynamically and are used in knowledge-based economies in particular. These technologies are used in scientific research and business applications in commercial enterprises and in financial and public institutions. Due to the growing importance of this issue in knowledge-based economies, an important issue is the analysis of the correlation between the development of Big Data technology and Data Science analytics, Data Analytics, Business Intelligence and the effectiveness of using knowledge resources to solve key problems of civilization development. The use of Big Data, Data Science, Data Analytics, Business Intelligence and other ICT information technologies as well as advanced data processing Industry 4.0 in the processing of knowledge resources should contribute to increasing the efficiency of knowledge resource processing in knowledge-based economies, including in the field of economics and finance.
In recent years, the scope of applications of Big Data technology and Data Science analytics, Data Analytics in economics, finance and management of organizations, including enterprises, financial and public institutions, has been increasing. Therefore, the importance of implementing analytical instruments for advanced processing of large data sets in enterprises, financial and public institutions, i.e. the construction of Big Data Analytics platforms to support organization management processes in various aspects of operations, including the improvement of customer relations, is also growing. In my opinion, scientific research confirms the strong correlation between the development of Big Data technology, Data Science analytics, Data Analytics and the effectiveness of the use of knowledge resources. I believe that the development of Big Data technology and Data Science analytics, Data Analytics and other ICT information technologies, multi-criteria technology, advanced processing of large information sets, and Industry 4.0 technology increases the efficiency of using knowledge resources, including in the field of economics, finance and organization management. In recent years, ICT information technologies, Industry 4.0 etc. have been developing dynamically and are used in knowledge-based economies in particular. These technologies are used in scientific research and business applications in commercial enterprises and in financial and public institutions. Due to the growing importance of this issue in knowledge-based economies, an important issue is the analysis of the correlation between the development of Big Data technology and Data Science, Data Analytics, Business Intelligence and the effectiveness of using knowledge resources to solve key problems of development of business entities. In recent years, the use of 5G technology to collect data from the Internet can significantly contribute to improving the analysis of sentiment of Internet users' opinions and the possibility of extending the use of research techniques carried out on Business Intelligence, Big Data Analytics, Data Science and other research techniques using ICT information technologies , internet and advanced data processing typical of the current fourth technological revolution referred to as Industry 4.0.
In recent years, organization management processes have been improved through the implementation of information technology and advanced data processing technologies Industry 4.0 into the IT analytical platforms Business Intelligence, Big Data Analytics, etc. The technologies of advanced analysis of big data sets Big Data Analytics and research processes carried out on Business Intelligence platforms are used also to improve business management processes. Data collection processes on the Internet can be supported by the use of 5G technology. In recent years, information technology management models in organizations have been enriched with advanced 4.0 industry data processing technologies, including cloud computing, Internet of Things, artificial intelligence, machine learning and more. The use of information systems in built models of information technology management in organizations, etc. is currently taking place in many areas of functioning of various types of business entities. The use of ICT information technologies and advanced data processing technologies i.e. typical for the current technological revolution Industry 4.0 already covers almost the entire functioning of business entities, from computerized sales support systems, logistics, accounting, reporting, risk management to marketing activities on the Internet and designing new products and innovative solutions in information systems. Online banking is starting to dominate, whose development is determined by technological progress in the field of ICT and Industry 4.0 information technologies. Computerization is also increasingly affecting public sector institutions servicing tax systems and settlements of business entities. Business Intelligence analytical platforms have also been developed for several years in the SME sector. Business Intelligence systems supporting analytical processes and organization management are produced by IT companies not only for large corporations. The analyst of large information sets in Big Data databases is also developing. Big Data Analytics and Data Science analytical systems are used by more and more types of business entities to analyze both the markets in which they operate and complex processes that are conducted or diagnosed and researched in these enterprises. Computerization also covers financial and economic risk management processes, etc. In all these areas of ICT technology application, building and improving IT technology management models in organizations is also an important issue. Therefore, specific information technology management models should be tailored to the specifics of the operations of a particular business entity, enterprise, company, corporation, public institution or financial institution.
On the other hand, the collection of large data sets about users of specific websites and portals in Big Data database systems generates new categories of information security risk. The database of a social media portal such as Facebook is already a powerful collection of information. Some research centers specializing in the use of large Big Data data sets downloaded from social media portals prepare, through sentiment analysis, reports that can be helpful in forecasting phenomena and processes in the future. Medicine is one of the areas where there are great opportunities in this matter. For example, insurance companies and commercial banks that grant loans may be interested in information posted by users on Facebook and possibly also on other social media sites. Apparently, some insurance companies and commercial banks, during the analysis of an application for insurance or credit, look at the information content of accounts and profiles of applicants, potential clients, and contractors posted on social media portals.
Another area of application of analytics carried out on large data sets collected in Big Data database systems is sentiment analysis in the field of surveying the opinions of Internet users regarding specific products and/or services and the companies producing them. Large amounts of information downloaded from comments, entries, and posts from social media portals are processed in Big Data database systems to determine, e.g., consumer awareness regarding the offer of products and services of specific companies. This type of information is of great importance for the purpose of planning advertising campaigns informing about the mission, idea, product offer, and usability features of a given company's offer. This type of data may be relevant to forecasting changing consumer preferences for specific companies' offers. Techniques for collecting analytical data on the Internet can be supported by the use of 5G technology.
I am also involved in research on knowledge management using the Big Data computerized database platforms. In my publications available on the Research Gate portal, I described the key determinants of the development of Big Data technology and the security of information obtained from the Internet, collected and processed in Big Data databases. I also described the development of analytics using Business Intelligence platforms that are used in enterprises. Business Intelligence based analytics, as well as Data Science and Big Data Analytics are increasingly being used to improve business management processes. The development of this analytics based on the implementation of ICT and Industry 4.0 information technologies into analytical processes has a great future ahead of it in the coming years. I invite you to cooperation.
One of the areas in which the possibilities of market analytical technology applications, including data downloaded from internet portals, are growing is the marketing of enterprises and institutions. In recent years, the development of marketing is determined by the development of Industry 4.0 technology and the development of open innovations on the Internet. Open innovations developed on the Internet concern, among others, free information and marketing services. The issue of the possibility of publishing specific content, texts, banners, comments etc. on the Internet and obtaining free information are key determinants of the development of information services on the Internet. On the other hand, the largest internet technology corporations earn income mainly from paid marketing services. Therefore, the Internet environment is a kind of mix of free and paid information and marketing services, which are simultaneously, simultaneously and simultaneously interrelatedly developed by various Internet companies. Currently, research is conducted into the analysis of the development of open innovations in the field of free information services, which are the main factor of business success of the largest online technology companies, which include such concerns as Google and social media portals such as Facebook, Instagram, YouTube, Tweeter, LinkedIn and others .
The development of internet information services will be determined by technological progress in the field of new ICT, communication technologies and advanced data processing techniques typical of the current technological revolution referred to as Industry 4.0. The development of information processing technology in the era of the current technological revolution called Industry 4.0 is determined by the use of new information techniques, for example in the field of e-commerce and e-marketing. These solutions are the basis for the business success of the largest online technology concerns that offer information search, data collection and processing services in the cloud (e.g. Google) and provide information services on platforms developed in social media portals (e.g. Facebook, Instagram, YouTube, Tweeter, LinkedIn, Pinterest, and more).
The current technological revolution referred to as Industry 4.0 is motivated by the development of the following factors: Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced technologies of Data Mining.
The information technologies mentioned above, combined with the improvement of ICT and communication technologies, along with the progressive process of increasing the computing power of computers, will become an important determinant of technological progress in various branches of industry in the coming years. Based on the development of these new technological solutions, the processes of innovatively organized analyses of large information collections gathered in Big Data database systems and cloud computing for the purposes of applications in such fields as machine learning, the Internet of Things, artificial intelligence, and Business Intelligence have been dynamically developing in recent years. To this can be added other areas of advanced technologies for analyzing large data sets, such as Medical Intelligence, Life Science, Green Energy, etc. Processing and multi-criteria analysis of large data sets in Big Data database systems is performed according to the V4 concept, i.e. Volume (meaning a large number of data), Value (large values of specific parameters of the information analyzed), Velocity (high speed of new information appearing) and Variety (high information diversity). The above-mentioned advanced technologies for processing and analyzing information are increasingly used for the needs of marketing activities of various business entities that advertise their offer on the Internet or analyze the needs in this regard reported by other entities, including companies, corporations, financial and public institutions. More and more commercially operating business entities and financial institutions conduct marketing activities on the Internet, including on social media portals. The possibilities of collecting market data on the Internet in subsequent years can be significantly expanded by using 5G technology.
The information and communication technologies listed above, combined with the improvement of ICT technologies and the implementation of Business Intelligence analytics into the processes of economic and financial, economic, macroeconomic and market analyzes may be instrumental instruments helpful in the efficient and effective management of economic, investment processes and enterprises, including analyzes carried out for the purposes of improving marketing activities in enterprises. More and more companies, banks and other entities need to carry out multi-criteria analyzes on large data sets downloaded from the Internet describing the markets in which they operate and contractors and clients with whom they cooperate. On the other hand, there are already specialized technology companies that offer this type of analytical services, prepare commissioned reports, which are the result of such multi-criteria analyzes of large data sets obtained from various websites and from entries and comments contained on social media portals. An important research technique that has been developing in recent years, the effects of which are used for the purposes of marketing activities of companies, is sentiment analysis carried out on large data sets collected from the Internet and stored in Big Data database systems.
In order to group the behavior of social media users into specific classes of behavior, these classes must first be defined. Sentiment analysis using large data sets collected from entries and comments from social media portals and transferred to Big Data database platforms can be helpful. Then, when observing the changes in certain types of behavior of users of social media portals, you can analyze the data collected in Big Data according to these observations. In addition, a useful tool can be an analysis of the behavior of users of social media portals based on current posts, entries and comments on specific social media pages, statistical analysis of comments on specific topics of posts. This type of research is carried out by online technology companies that run social media portals and use the results of these studies to develop their viral marketing services, because this field of marketing is a key determinant of revenue generated by these companies from advertising sales on social media portals. The basis of marketing activities conducted in this way are market research conducted by collecting market data from the Internet regarding the offer of individual companies, their competition, demand for specific products and services from Internet users as well as collecting, processing and analyzing this data in Big Data Analytics database and analytical systems. The process of collecting market data from specific websites can be improved by using 5G technology.
Industry 4.0 technologies are also used in the development of transaction systems and transaction security in the field of e-commerce and online banking. The key determinants of the globally developing e-commerce relate primarily to the implementation of ICT information technologies and advanced data processing technologies, i.e. industry 4.0 typical for the current technological revolution to computerized, automated transaction systems supporting online trading. In addition, the use of blockchain technology for transaction security systems and data transfer on the Internet. The use of ICT information technologies and advanced data processing technologies i.e. typical for the current technological revolution Industry 4.0 to online transaction systems supporting e-commerce already applies to almost all the functioning of online stores, from computerized sales support systems, logistics, accounting, reporting, risk management to Internet marketing activities and improving security systems for online transactions. Another important determinant of e-commerce development is the development of online mobile banking available on mobile devices and new solutions related to the Internet of Things technology. Online banking is starting to dominate, whose development is determined by technological progress in the field of ICT and Industry 4.0 information technologies. Computerization is also increasingly affecting public sector institutions servicing tax systems and settlements of business entities. In addition, Business Intelligence analytical platforms supporting the management processes of companies operating also in the e-commerce sector have been developed for several years. The analyst of large information sets in Big Data databases is also developing. Big Data Analytics and Data Science analytical systems are also used by businesses operating also in the field of e-commerce.
In recent years, new internet marketing instruments have also been developed, mainly used on social media portals, and are also used by companies operating in the e-commerce sector. Internet technology and fintech companies are also emerging that offer information services on the Internet to support marketing management, including the planning of advertising campaigns for products sold via the Internet. To this end, sentiment analyzes are used to survey Internet users' opinions regarding dominant awareness, recognition, brand image, mission and the offer of specific companies. Sentiment analysis is carried out on large data sets downloaded from various websites, including millions of social media sites collected in Big Data systems. The analytical data collected in this way are very helpful in the process of planning advertising campaigns carried out in new media, including social media portals. These campaigns advertise products and services sold via the Internet, available at online stores. In view of the above, the development of e-commerce is determined mainly by technological progress in the field of ICT information technologies and advanced data processing technologies Industry 4.0 and new technologies used in securing financial transactions carried out via the Internet, including e-commerce related transactions, e.g. technology blockchain. I have described the above issues of various aspects of the application of information systems and ICT, including Big Data, Business Intelligence in companies operating on the Internet in my scientific publications available on the Research Gate portal. I invite you to cooperation.
According to the above, in my opinion, the use of 5G technology to collect data from the Internet will significantly contribute to improving the analysis of sentiment of Internet users' opinions and the possibility of extending the use of research techniques carried out on Business Intelligence, Big Data Analytics, Data Science and other research techniques using information technologies ICT, internet and advanced data processing typical of the current fourth technological revolution referred to as Industry 4.0. At present, however, all the potential applications of 5G technology in economic and other applications are unknown. These applications will be wide in both business processes carried out by technological internet companies as well as by security institutions. Globally operating technology internet companies, thanks to the use of 5G technology in research processes, will improve their offer of information, internet and marketing services addressed to Internet users. On the other hand, national security institutions and IT systems risk management departments operating in companies can also obtain a tool enabling a significant improvement of instruments ensuring a high level of security of information transferred via the Internet and other cybersecurity issues. Therefore, research on cyber security and e-commerce will be expanded to include the impact of 5G technology on the development of many aspects of these areas of activity of business entities, institutions and citizens increasingly using the Internet in various areas of business.
In view of the above, in my opinion in the coming years one of the key applications of artificial intelligence integrated with other Industry 4.0 technologies, including Big Data Analytics, will be improvement of information search on the Internet.
Best wishes.
Dariusz Prokopowicz
  • asked a question related to Text Mining
Question
5 answers
I would like to find an efficient way to perform text mining methods and topic modeling on scientific publications. So far I have not been able to solve the problem of making the texts available for processing in RStudio. Is there an easy way to form a corpus comprising large numbers of text documents, e.g. .pdf files? Or is there even an R package of some sort that allows getting the texts directly from databases like Web of Science?
Any help, advice, tips, tricks and hints are highly appreciated. Thank you very much in advance!
Relevant answer
Answer
For a large number of PDF files, any library that extracts text from PDFs can be used inside a loop; e.g. see https://ropensci.org/technotes/2018/12/14/pdftools-20/
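A minimal, hedged sketch of that loop with pdftools and tm (the folder name is hypothetical):

library(pdftools)
library(tm)

pdf_files <- list.files("papers", pattern = "\\.pdf$", full.names = TRUE)

# Extract the text of each PDF and collapse its pages into one string per document
texts <- vapply(pdf_files,
                function(f) paste(pdf_text(f), collapse = " "),
                character(1))

corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

dtm <- DocumentTermMatrix(corpus)   # ready for topic modeling, e.g. topicmodels::LDA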
  • asked a question related to Text Mining
Question
5 answers
Queries for search engines (such as Google) contain some information about the intent and interests of the user, which can be used for user profiling, recommendation, etc. As far as I know, there are already lots of methods for dealing with relatively long texts, such as news, articles, and essays, and for extracting useful features from them. However, queries are usually very short and may relate to many different areas. I wonder whether there are advanced methods (not just simple word embeddings) already verified to be effective in extracting information from query texts? Thanks!
Relevant answer
Answer
Dear Prof. Eugene Veniaminovich Lutsenko:
Thank you for your detailed and professional comments!
BRs!
  • asked a question related to Text Mining
Question
3 answers
I'm currently confused by what you are discussing; you would need to be more specific and more concise when talking about text mining and opinion mining.
Relevant answer
Answer
That's good; maybe you should talk about machine learning and natural language processing.
  • asked a question related to Text Mining
Question
10 answers
I am preparing a paper on a bibliometric study about nursing informatics, and I am interested in similar studies, published or not published.
Relevant answer
Answer
Recently I wrote a paper about the development of bibliometrics use in nursing informatics, which can be found at the following link
  • asked a question related to Text Mining
Question
5 answers
What are contextual and non-contextual feature selection, and how do we use contextual feature selection? How is it used in text mining?
Relevant answer
Answer
Dear Kapil Sethi!
A contextual feature is one where the result with it should be better than the result without it.
So, if you exclude the contextual feature, the result will be worse.
This can be used for contextual feature detection.
For numerical features you can also use a correlation matrix.
A more precise definition can be found in the work Robust Classification with Context-Sensitive Features by Peter David Turney.
Research example for text mining:
I start text mining by extracting features (e.g. words or n-grams).
For every text there is a feature-inclusion vector.
Example: text = "I robot"
features:         i   you   robot   man
inclusion vector: 1   0     1       0
If I am interested in classifying robotic vs. non-robotic texts:
I will see that the feature "robot" has a larger normalized frequency of appearance than in non-robotic texts.
So, the feature "robot" is contextual.
I will also see that there are many features that do not influence the result (their frequency in robotic and non-robotic texts will be similar).
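A small, hedged sketch of that idea in R: build the feature counts per class and compare normalised frequencies (the toy data below is made up for illustration):

library(tidytext)
library(dplyr)

docs <- tibble(
  class = c("robotic", "robotic", "non-robotic", "non-robotic"),
  text  = c("I robot", "the robot walks", "I walk home", "you read a book")
)

feature_freq <- docs %>%
  unnest_tokens(word, text) %>%
  count(class, word) %>%
  group_by(class) %>%
  mutate(rel_freq = n / sum(n)) %>%   # normalised frequency within each class
  ungroup()

# A large gap in rel_freq between the classes (e.g. for "robot") marks a contextual feature
filter(feature_freq, word == "robot")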
  • asked a question related to Text Mining
Question
2 answers
I am trying to extract content from my college web page as text for a text mining project. Can someone give me an example workflow I can use?
Relevant answer
Answer
Hi,
kindly check this link:
https://www.knime.com/sites/default/files/inline-images/knime_web_knowledge_extraction.pdf
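If you would rather do it in R than KNIME, a hedged sketch with the rvest package looks like this (the URL and CSS selectors are placeholders):

library(rvest)

page <- read_html("https://www.example-college.edu/about")

page_text <- page %>%
  html_elements("p, h1, h2, h3, li") %>%   # keep the main textual elements
  html_text2()

writeLines(page_text, "college_page.txt")  # save the plain text for the text mining step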
  • asked a question related to Text Mining
Question
6 answers
I am thinking of creating a search engine to help people find a movie or similar movies based on snippets of the story. For instance, if a user types in "movie about dog waiting a long time for his owners to come back", the results should return "Hachiko", "Eight Below", "Lassie", etc. However, it would be better if we could use a data mining method to actually search based on the plot of the movie, not keywords. What is the best solution for this?
Relevant answer
Answer
@Raju_Balakrishnan3
Thank you so much.
  • asked a question related to Text Mining
Question
5 answers
Hi respected fellows, please help me to collect data in an ethically sound way for some text analysis. I intend to collect Google reviews for "Google Home" for text mining to extract factors. Please help me identify a method to collect customer reviews for analysis.
Regards
Relevant answer
Answer
@Sultana I was explaining how I would collect a corpus of product reviews. First, you can download movie reviews using Python and, second, use BeautifulSoup to process the product reviews. If you need more information on how product reviews can be statistically processed, you can consult my thesis.
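As a minimal illustration of the BeautifulSoup step (assuming review pages have already been saved as HTML in a way that respects the site's terms of service; the file name and CSS class are placeholders):

from bs4 import BeautifulSoup

# Placeholder file and CSS class: adjust to the pages you actually saved.
with open("reviews_page1.html", encoding="utf-8") as fh:
    soup = BeautifulSoup(fh, "html.parser")

reviews = [div.get_text(strip=True) for div in soup.select("div.review-text")]

print(f"{len(reviews)} reviews extracted")
for r in reviews[:3]:
    print("-", r[:80])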
  • asked a question related to Text Mining
Question
4 answers
Hi there! I'm looking for a tool that can calculate a similarity score for each pair of terms from two lists of names and give me the top 10 similarity scores it finds. For example:
List 1: (-)-epigallocatechin, (+)-catechin, (pyro)catechol sulfate, 3',4'-Dimethoxyphenylacetic acid
List 2: 3',3'-Dimethoxy-phenylacetic acid, catechin, (epi)gallocatechin, catechol sulfate
Expected results:
(-)-epigallocatechin vs (epi)gallocatechin - score = 0.9 (very similar)
(-)-epigallocatechin vs (+)-catechin - score = 0.5
(+)-catechin vs catechin - score = 0.9
etc.
Thanks a lot for your great help.
Relevant answer
Answer
In order to determine the similarity of arbitrary terms, I would transform them into sets of tokens (alternatively bags of words, BoW) and use the Jaccard similarity on the two sets (or a weighted derivative for BoW).
However, better suited would be to transform them into sets of their syllables, which would respect the domain language better. But I don't know whether that can be done easily for chemical compounds or molecules.
Hope that helps as a first attempt.
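A minimal sketch of the Jaccard idea, here using character trigrams instead of syllables (my substitution, since syllabifying chemical names is hard):

def ngrams(term, n=3):
    term = term.lower()
    return {term[i:i + n] for i in range(max(len(term) - n + 1, 1))}

def jaccard(a, b):
    sa, sb = ngrams(a), ngrams(b)
    return len(sa & sb) / len(sa | sb)

list1 = ["(-)-epigallocatechin", "(+)-catechin", "catechol sulfate"]
list2 = ["(epi)gallocatechin", "catechin", "catechol sulfate"]

# Score every pair and keep the best matches first.
pairs = sorted(((jaccard(a, b), a, b) for a in list1 for b in list2), reverse=True)
for score, a, b in pairs[:5]:
    print(f"{a} vs {b}: {score:.2f}")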
  • asked a question related to Text Mining
Question
1 answer
Do NLP tools and NVivo software do the same thing, or how do they differ from each other? The attached file used NLP for construction site accident analysis; can we do the same using NVivo?
Relevant answer
Answer
You can build any kind of text mining application using pre-written software and tools. However, it is important to understand that the underlying concepts in those tools are also NLP and text mining approaches. Tools are generally very generic - i.e., they can be applied to any domain, but the accuracy will not be equally good across all domains. Corpus specificity is one of the important problems with NLP applications. When this is the case, we can fine-tune NLP and text mining algorithms to the corpus at hand so that we get more accurate results, which is why most people choose NLP and text mining algorithms rather than generic tools when they want corpus-specific mining like what is shown in the attached PDF.
  • asked a question related to Text Mining
Question
4 answers
As a developer, how can I access the Microsoft Web N-gram Service during development?
  • asked a question related to Text Mining
Question
5 answers
I am working on several papers targeting organizational culture, corporate values and leadership. Traditionally, those topics have been researched using either questionnaires or interviews. The limited number of cases covered, as well as the often missing link to the companies (disclosure), motivated me to explore text mining and NLP as tools for cultural research. I wonder what others think about this, and whether some of you have experience with it.
Otherwise, if there is interest, I am of course happy to share my knowledge and some of my paper drafts and published work in this field.
/Björn
Relevant answer
Answer
sure
  • asked a question related to Text Mining
Question
4 answers
Most of the proposed algorithms concentrate on neighboring concepts (events), like "enter restaurant" --> "wait for waiter", but I have trouble finding papers on generating / retrieving longer scripts (I am not talking about narrative cloze task) which are evaluated for commonness.
Relevant answer
Answer
Arturo Geigel, thank you for the suggestion. I've been working on computational creativity for some time, but the problem is that poetry, for example (less so in the case of humor), allows too much freedom, whereas commonsense behavior patterns (even if unlimited in quantity) are stricter and shared by most people, which makes them hard to generate, especially because they are not expressed in one chunk of text; they must be "glued together" from pieces scattered across various texts. My guess is that before creating something original, you need to know what is common (and boring, in a sense).
  • asked a question related to Text Mining
Question
3 answers
What are the available benchmarks for evaluating semantic textual similarity approaches?
I am aware of the following:
- SemEval STS
- Microsoft Research Paraphrase Corpus
- Quora Question Pairs
Do you use others besides these in your research?
Relevant answer
Answer
Standard datasets include WordSim-353 (see Agirre et al. 2009), SimLex-999 (Hill et al. 2016) and Chiarello et al. (1990). If you're interested in paraphrases including longer phrases, I'd also look at the Penn Paraphrase Database (PPDB, Ganitkevitch et al. 2013):
  • Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, and Aitor Soroa (2009), A study on similarity and relatedness using distributional and wordnet-based approaches. In: Proceedings of HLT-NAACL 2009. Boulder, CO, 19–27.
  • Christine Chiarello, Curt Burgess, Lorie Richards, and Alma Pollock (1990), Semantic and associative priming in the cerebral hemispheres: Some words do, some words don’t sometimes, some places. Brain and language 38(1):75–104.
  • Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch (2013), PPDB: The Paraphrase Database. In: Proceedings of NAACL-HLT 2013. Atlanta, GA, 758–764.
  • Felix Hill, Roi Reichart, and Anna Korhonen (2016), Simlex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41(4).
  • asked a question related to Text Mining
Question
8 answers
Dear colleagues, I would like to generate a summary of all packages in R which can be used for big data research (data mining, web crawling, machine learning, text mining, social media analysis, neural networks, you name it).
It would be fantastic if we could create a comprehensive list of:
a) the names of the packages,
b) a short summary of what each package does, and
c) references to tutorials (beyond the standard CRAN description).
Best,
Holger
Relevant answer
Answer
Hi Holger,
My two cents:
- Still in webscraping territory, "RSelenium" (https://www.rdocumentation.org/packages/RSelenium/versions/1.7.1 and http://johndharrison.github.io/RSOCRUG/) for dealing with "dynamic" websites (e.g. you need to interact with the website for it to generate the data you want).
- As for machine learning, there are plenty - almost as many as there are algorithms. A nice general library, with several algorithms and simple syntax, is offered by the "caret" package (http://topepo.github.io/caret/index.html).
Hope it helps.
Best,
José
  • asked a question related to Text Mining
Question
5 answers
Dear all,
I would like to ask:
1. "What are the different approaches available to find character based, word based or line based similarities and differences among multiple text document?"
2. "Is there any open source library or source code available, which can help in identifying word, character or line based similarities and differences among multiple text documents?" The required library should not only provide me similar strings, but also provide me the exact location.
Please let me know about it, I would be thankful to you.
Relevant answer
Answer
Dear Muhammad Hammad,
Kindly search the Scopus database to find recent trends and approaches to the problem being discussed.
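Since the question also asks for exact locations of the matching text, here is a minimal sketch using Python's standard difflib (my suggestion, not part of the answer above):

from difflib import SequenceMatcher

doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox leaps over a very lazy dog"

matcher = SequenceMatcher(None, doc_a, doc_b)

# Each matching block gives the start offset in both documents and its length,
# i.e. the exact location of the shared substring.
for block in matcher.get_matching_blocks():
    if block.size > 3:
        print(f"doc_a[{block.a}:{block.a + block.size}] == "
              f"doc_b[{block.b}:{block.b + block.size}]: "
              f"{doc_a[block.a:block.a + block.size]!r}")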
  • asked a question related to Text Mining
Question
4 answers
I tend to reach out when I'm fairly clueless about something, this time no exception. Some background first though.
I research in Thailand and Australia, and Thailand is never easy. Business managers tend not to be helpful: why say yes and create risk, and possibly loss of face (Thailand is a heavily face-based culture), when you can ignore the request or say no? Sometimes, though, the situation doesn't pan out that way at all. Currently, I have a number of business managers happy to help, saying yes. But there's an issue - it's the low (green) season, and there's a real paucity of clients at cookery schools for me to interview.
I sat reading cookery school reviews. One in particular had 169 reviews on Google. I kept seeing the words fun, funny and laugh, over and over - suggesting people attend for fun. Equally, I saw very little comment about gaining cooking skills. So, had I found my answer as to why people attend touristic cooking classes? Do I have to interview people at all? Why not just text-mine the reviews?
In fact, over the last day or two things have picked up, interviews nearly finished. But I'm still fascinated by text-mining, if only to capture a school's reviews for comparison against questionnaire responses.
My two questions are:
1. Do readers find text-mining a viable approach to inferentially discovering consumer motivations in the way I've said?
2. How acceptable do readers feel text-mining to be in academia, as opposed to marketing? I'm not sure I've even seen text-mining used or referenced in an academic article. An exception is a Russian friend who is a big user, which might suggest that there are national differences on this?
Relevant answer
Answer
Dear Mark Azavedo,
Answers:
1. I think the answer will be yes. It is a viable approach.
2. Text mining is a popular tool for analysing unstructured texts. I have used it in my research and, take my word for it, it is really useful, though it needs manual intervention.
Thanks,
Sobhan
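To illustrate the quick frequency check described in the question (counting how often words like "fun", "funny" and "laugh" appear in a set of reviews, compared with skill-related words), here is a minimal Python sketch; the reviews and word lists are my own illustrative placeholders:

from collections import Counter
import re

# Placeholder reviews; in practice these would be the school's scraped reviews.
reviews = [
    "So much fun, we laughed the whole class!",
    "Great fun and a funny instructor.",
    "Learned a few dishes, but mostly it was a fun day out.",
]

tokens = [w for r in reviews for w in re.findall(r"[a-z']+", r.lower())]
counts = Counter(tokens)

fun_terms = {"fun", "funny", "laugh", "laughed"}
skill_terms = {"skills", "learned", "technique", "recipes"}

print("fun-related mentions:  ", sum(counts[t] for t in fun_terms))
print("skill-related mentions:", sum(counts[t] for t in skill_terms))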
  • asked a question related to Text Mining
Question
3 answers
How can we retrieve the location of tweets even when users have turned location off, and what features can we use to infer location? Is there any work on this so far?
Relevant answer
Answer
Hi Umair Arshad,
Please follow the papers below.
1. Srivastava, S. K., Gupta, R., & Singh, S. K. (2018). Simple Term Filtering for Location-Based Tweets Classification. In Speech and Language Processing for Human-Machine Communications (pp. 145-152). Springer, Singapore.
2. Liu, R., Cong, G., Zheng, B., Zheng, K., & Su, H. (2018, July). Location Prediction in Social Networks. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data (pp. 151-165). Springer, Cham.
3. Ozdikis, O., Ramampiaro, H., & Nørvåg, K. (2018, March). Spatial Statistics of Term Co-occurrences for Location Prediction of Tweets. In European Conference on Information Retrieval (pp. 494-506). Springer, Cham.
4. Stowe, K., Anderson, J., Palmer, M., Palen, L., & Anderson, K. (2018). Improving Classification of Twitter Behavior During Hurricane Events. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media(pp. 67-75).
5. Li, P., Lu, H., Kanhabua, N., Zhao, S., & Pan, G. (2018). Location Inference for Non-geotagged Tweets in User Timelines. IEEE Transactions on Knowledge and Data Engineering.
Thanks,
Sobhan
  • asked a question related to Text Mining
Question
25 answers
Dear all,
Do you know of any available dataset for text summarization that includes reference summaries?
Relevant answer
Answer
Dear Keramatfar,
Luis Adrián Cabrera-Diego is right. Please go through this.
  • asked a question related to Text Mining
Question
3 answers
I have annotated my dataset using POS tags, chunking, and word case. If I include a dependency parser in my annotated dataset, will it help to define more features for the classification of movie named entities? In short, will dependency relations improve the performance of the model on the Movie Reviews dataset? I need to identify movie names and person names in my corpus.
Relevant answer
Answer
Dear Mir,
You can check this.
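As one hedged illustration of what dependency-based features could look like (spaCy is my suggestion here, not something mentioned in the thread), each token's dependency label and head can be emitted alongside the POS tag and added to the feature set of a sequence classifier:

import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("I watched Hachiko with Richard Gere last night.")

# For each token, emit features that could feed a named-entity classifier:
# the word itself, its POS tag, its dependency label, and its head word.
for token in doc:
    print(f"{token.text:10s} pos={token.pos_:6s} dep={token.dep_:10s} head={token.head.text}")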
  • asked a question related to Text Mining
Question
6 answers
I want to classify news headline data. I am able to make a corpus, clean the data, and train a model using SVM (but only for a small data set). I am not splitting the data into train and test sets; instead I am using a separate set for testing (but also drawn from the headline data).
I am able to train the model, but while testing with the test data I get the error: "No. of variables in both are different".
Random forest gives the same error.
I have tried Naive Bayes, but accuracy is very low (approx. 10%).
Relevant answer
Answer
Dear Garg,
If the news headlines are related to positive or negative sentiment of any stakeholders, you can do text classification using sentiment analysis. RStudio is good for this task, but I suggest you use Python for better handling of the problem.
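The "number of variables are different" error usually means the document-term matrix for the test set was built separately from the training one, so their vocabularies differ (this is my reading, not something stated above). A minimal Python sketch that avoids this by fitting the vectorizer only on the training headlines and reusing it for the test set; the headlines and labels are placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder headlines and labels.
train_texts = ["stocks rally on earnings", "team wins championship final",
               "central bank raises rates", "star striker signs new deal"]
train_labels = ["business", "sports", "business", "sports"]
test_texts = ["markets slide after rate decision", "coach praises young squad"]

# The pipeline fits the vocabulary on the training data and reuses it for
# the test data, so train and test always share the same feature space.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

print(model.predict(test_texts))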
  • asked a question related to Text Mining
Question
4 answers
Are there examples of text mining applied to the Quran?
Relevant answer
Answer
With Python we can read Arabic text from a file and process it in code, but the installation of wordcloud, for example, can be complex.
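As a minimal sketch, Arabic text can be read and the most frequent words counted with nothing beyond the Python standard library; the file name is a placeholder and the character range is the basic Arabic letter block:

from collections import Counter
import re

# Placeholder file: a UTF-8 text file containing Arabic text.
with open("arabic_corpus.txt", encoding="utf-8") as fh:
    text = fh.read()

# Keep runs of Arabic letters as tokens.
tokens = re.findall(r"[\u0621-\u064A]+", text)

for word, count in Counter(tokens).most_common(10):
    print(word, count)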
  • asked a question related to Text Mining
Question
14 answers
I've read several times that on high-dimensional problems (image recognition, text mining, ...), deep learning gives significantly higher accuracy than "classical" methods (such as SVM, logistic regression, etc.). But what happens on problems of ordinary, medium dimension? Say the data set is on the order of 1,000 ... 10,000 objects and each object is characterized by 10 ... 20 parameters. Are there articles that compare accuracy indicators (recall, precision, ...) of deep learning and other methods on such benchmarks?
Thanks beforehand for your answer. Regards, Sergey.
Relevant answer
Answer
Madam Murthy is right
  • asked a question related to Text Mining
Question
6 answers
  1. I would like to know the best one(s) to use, whether free or proprietary. Thanks much!
Relevant answer
Answer
Yes. Mahoto is right.
  • asked a question related to Text Mining
Question
6 answers
I am interested in text mining. I use clustering techniques to cluster the words in a text. First, I want to select, say, the 200 most frequently used words. Then I have to build a distance matrix and a dendrogram from the selected words. Please suggest how I can do this in R.
Relevant answer
Answer
Please go through the link below.
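The question asks for R, where the analogous steps would use a term-document matrix (e.g. the tm package), dist() and hclust(); as a hedged sketch of the same workflow (top terms, distance matrix, dendrogram) here in Python with SciPy, on placeholder documents:

from collections import Counter
import re

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Placeholder documents; replace with your own corpus.
docs = ["the cat sat on the mat", "the dog sat on the log",
        "cats and dogs are pets", "logs and mats are objects"]

tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
top_words = [w for w, _ in Counter(w for d in tokenized for w in d).most_common(8)]

# Word-by-document count matrix for the selected words.
matrix = np.array([[d.count(w) for d in tokenized] for w in top_words])

# Hierarchical clustering of words by their document profiles.
links = linkage(matrix, method="average", metric="cosine")
dendrogram(links, labels=top_words)
plt.tight_layout()
plt.show()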
  • asked a question related to Text Mining
Question
14 answers
Are there any survey papers on word embedding in NLP which covers the whole history of word embedding from simple topics like one-hot encoding to complex topics like w2v model?
Relevant answer
Answer
Word Embedding for Understanding Natural Language: A Survey
Hope this paper can help you
  • asked a question related to Text Mining
Question
3 answers
Are there any R packages which can be used to mine text data in Malayalam?
Or is there any other FOSS package that can mine Malayalam text data?
Relevant answer
Answer
You may refer to the papers given below:
1. Nair, D. S., Jayan, J. P., & Sherly, E. (2014, September). SentiMa-sentiment extraction for Malayalam. In Advances in Computing, Communications and Informatics (ICACCI, 2014 International Conference on (pp. 1719-1723). IEEE.
2. Nair, D. S., Jayan, J. P., Rajeev, R. R., & Sherly, E. (2015, August). Sentiment Analysis of Malayalam film review using machine learning techniques. In Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on (pp. 2381-2384). IEEE.
  • asked a question related to Text Mining
Question
3 answers
I know that supervised methods are evaluated in terms of precision, recall and F1 measure. What evaluation criteria are used for unsupervised methods? Can an unsupervised method be evaluated in terms of precision, recall and F1?
Relevant answer
Answer
Hi Jibran,
To evaluate unsupervised learning schemes, you could rely on the log-likelihood metric when using a density-based approach. Another option is converting the unsupervised learning problem into a supervised one, where you can use its evaluation metrics such as ROC, F-measure, etc.
HTH.
Samer Sarsam, PhD.
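If ground-truth labels exist for at least an evaluation subset, clustering output can also be scored directly; a minimal scikit-learn sketch (my addition, using synthetic data in place of a real labelled set):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic data with known labels, standing in for a labelled evaluation set.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Silhouette needs no labels; adjusted Rand compares clusters to ground truth.
print("silhouette:", round(silhouette_score(X, labels), 3))
print("adjusted Rand:", round(adjusted_rand_score(y_true, labels), 3))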
  • asked a question related to Text Mining
Question
3 answers
Hi,
I'm looking for a free tool to recognize terminology concepts in technical domains such as computer science and engineering.
Is there any available dictionary, gold standard, or tool for this? Why is there not much research in this direction?
Thank you,
  • asked a question related to Text Mining
Question
4 answers
Nowadays there are plenty of core technologies for TC (text classification). Among all the machine learning approaches, which one would you suggest for training models for a new language and a vertical domain (such as sports, politics or the economy)?
Relevant answer
Answer
The state-of-the-art for most text classification applications relies on embedding your text in real-valued vectors:
The gensim package is popular for training word vectors on data: https://radimrehurek.com/gensim/models/word2vec.html
This method relies on having rich, diverse collections of words and contexts, which your data may not have on its own. Thus it's popular to initialize your embedding matrix using pre-trained word vectors like word2vec or fasttext; in some cases, these will work out of the box, in some you'll want to continue training the vectors on your dataset, in others it's better to just train on your data alone.
The great thing about embedding methods is they don't care about language; you can create an embedding for any language or really any sequential data that endows discrete data with a sort of 'meaning'.
Once you have richer features from your embedding matrix, you can use these as inputs to a classifier, which can be as simple as softmax regression, which assigns probabilities to discrete classes, or as complex as an RNN/LSTM, which ultimately can do the same but typically for sequential data.
The choices you make here depend more heavily on what specific problem you're trying to solve, but here are a few examples:
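The examples originally linked at this point are not preserved here; as a hedged stand-in, here is a minimal sketch of the averaged-word-vector plus simple classifier pipeline described above, using gensim and scikit-learn on placeholder data (vector_size is the gensim 4.x parameter name):

import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Placeholder corpus: tokenized sentences with topic labels.
sentences = [["the", "team", "won", "the", "match"],
             ["parliament", "passed", "the", "budget"],
             ["striker", "scored", "two", "goals"],
             ["minister", "announced", "new", "taxes"]]
labels = ["sports", "politics", "sports", "politics"]

# Train word vectors on the corpus itself (tiny here, purely illustrative).
w2v = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

def doc_vector(tokens):
    # Average the vectors of the words we have embeddings for.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)

X = np.vstack([doc_vector(s) for s in sentences])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print(clf.predict([doc_vector(["the", "goalkeeper", "saved", "the", "match"])]))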
  • asked a question related to Text Mining
Question
3 answers
I am trying to use NlProt, but every time I try to run it I get the same error message:
sh: 1: svm_classify5: not found
Could not open svm_out_1_17802.txt!
The svm is installed, all the paths are checked, but I still can't run the NlProt.
Relevant answer
Answer
svmclassify will be removed in a future release. See fitcsvm, ClassificationSVM, and CompactClassificationSVM instead.
Syntax
Group = svmclassify(SVMStruct,Sample)
Group = svmclassify(SVMStruct,Sample,'Showplot',true)
Description
Group = svmclassify(SVMStruct,Sample) classifies each row of the data in Sample, a matrix of data, using the information in a support vector machine classifier structure SVMStruct, created using the svmtrain function. Like the training data used to create SVMStruct, Sample is a matrix where each row corresponds to an observation or replicate, and each column corresponds to a feature or variable. Therefore, Sample must have the same number of columns as the training data. This is because the number of columns defines the number of features. Group indicates the group to which each row of Sample has been assigned.
Group = svmclassify(SVMStruct,Sample,'Showplot',true) plots the Sample data in the figure created using the Showplot property with the svmtrain function. This plot appears only when the data is two-dimensional.
  • asked a question related to Text Mining
Question
1 answer
I am working on a text segmentation project. I need to build a lexical chain from plain text, based on WordNet or some other corpus.
There are decision tree algorithms like C4.5 for implementing lexical chains, but not being very skilled in Python, it's tough for me to manipulate a decision tree. Is there any Python package or code available for finding lexical chains?
Relevant answer
Answer
See the following article
Text Summarization Using Lexical Chains
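As a hedged Python starting point (my sketch, unrelated to the article above): a very crude lexical chainer that groups nouns whose WordNet synsets overlap or are directly related by hypernymy/hyponymy:

from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

words = ["car", "automobile", "wheel", "driver", "banana"]

def related(w1, w2):
    # True if the words share a synset, or a synset of one is a direct
    # hypernym/hyponym of a synset of the other (a deliberately crude
    # notion of lexical cohesion).
    s1 = set(wn.synsets(w1, pos=wn.NOUN))
    s2 = set(wn.synsets(w2, pos=wn.NOUN))
    if s1 & s2:
        return True
    neighbours = {h for s in s1 for h in s.hypernyms() + s.hyponyms()}
    return bool(neighbours & s2)

chains = []
for word in words:
    for chain in chains:
        if any(related(word, member) for member in chain):
            chain.append(word)
            break
    else:
        chains.append([word])

print(chains)  # e.g. [['car', 'automobile', ...], ['banana']]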
  • asked a question related to Text Mining
Question
9 answers
Hi. I have a query regarding text classification. I have a list of words with the following attributes: word, weight, class. The class can be positive or negative, and the weight is between -1 and 1. How can I train a classifier like SVM using this word list to classify unseen documents? An example in any tool is welcome.
Relevant answer
Answer
Weka, RapidMiner, and the scikit-learn Python library are easy to use for classification.
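One hedged way to combine such a weighted word list with an SVM in scikit-learn is to pseudo-label unlabeled documents by the summed weight of the lexicon words they contain, then train the SVM on TF-IDF features from those pseudo-labels; the lexicon and documents below are placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Placeholder weighted lexicon: word -> weight (positive or negative class).
lexicon = {"great": 0.9, "love": 0.8, "awful": -0.9, "boring": -0.7}

docs = ["I love this film, it is great",
        "What an awful and boring story",
        "Great soundtrack, I love it",
        "Boring plot and awful acting"]

# Pseudo-label each document by the summed lexicon weight of its words.
def pseudo_label(doc):
    score = sum(lexicon.get(w, 0.0) for w in doc.lower().split())
    return "positive" if score >= 0 else "negative"

labels = [pseudo_label(d) for d in docs]

# Train an SVM on TF-IDF features using the pseudo-labels.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
clf = LinearSVC().fit(X, labels)

print(clf.predict(vectorizer.transform(["such a boring film"])))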
  • asked a question related to Text Mining
Question
2 answers
I wish to work in this area but am not finding enough resources. Please suggest some good journals or sites where I can study this.
Relevant answer
Answer
Natural Language Processing (NLP) holds great promise for making computer interaction easier for naïve users and non-programmers.
  • asked a question related to Text Mining
Question
3 answers
I have done twitter sentiment analysis using VADER l