Science topic

OCR - Science topic

Explore the latest questions and answers in OCR, and find OCR experts.
Questions related to OCR
  • asked a question related to OCR
Question
2 answers
Hi everyone,
I’m currently working on a project involving the integration of a locally deployed large language model (we are currently using 14b DeepSeek) with a local knowledge base. The knowledge base mainly consists of a terminology glossary (we are still discussing the best data structure for storing it) and literature documents (such as PDFs or structured text).
I’d like to ask the community:
  • What are some effective approaches or best practices for integrating LLMs with local knowledge sources?
  • Are there any recommended tools, frameworks, or design patterns that have worked well for you?
  • Any relevant academic literature or project references you would suggest?
I'm particularly interested in strategies for enabling the LLM to accurately retrieve and utilize information from the local knowledge base during inference.
I’ll be updating this post with any progress or insights we gain along the way, and I really appreciate any guidance or suggestions you may have!
Thanks in advance!
Relevant answer
Answer
Hi Kai Davidson,
Your project sounds interesting! Here are some effective approaches and best practices for integrating your locally deployed 14B DeepSeek LLM with a terminology glossary and literature documents:
1. Data Structuring
For your terminology glossary, a relational database (e.g., PostgreSQL) or a key-value store like Redis can work well for fast lookups. For literature documents, converting PDFs to structured text using tools like PyMuPDF or Apache Tika and storing them in a vector database (FAISS, Weaviate) for semantic search is a good strategy.
2. Integration Approaches
  • Embedding-based Retrieval: Convert both glossary terms and literature into dense vectors using models like Sentence-BERT. Store them in a vector database (e.g., FAISS, Pinecone) for efficient retrieval.
  • Retrieval-Augmented Generation (RAG): Use a retrieval system to pull relevant content from the knowledge base, which is then injected into the LLM’s prompt to generate accurate responses.
  • Fine-Tuning: Fine-tune your LLM on your local data (terminology and documents) to improve its understanding of specific terminology.
3. Recommended Tools
  • FAISS or Weaviate for vector search
  • Haystack for building retrieval-augmented pipelines
  • ElasticSearch for text-based document search
  • TensorFlow or PyTorch for fine-tuning
4. Design Patterns
  • Query Expansion: Expand queries with related terms from your glossary to improve retrieval.
  • Knowledge Injection: Inject retrieved knowledge directly into the LLM’s prompt to provide more accurate responses.
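The query-expansion pattern above can be sketched in a few lines of Python (the glossary entries and function name here are hypothetical placeholders, not from any real system):

```python
# Glossary-based query expansion: append related terms from the
# glossary to the user's query before retrieval. The glossary
# entries below are hypothetical placeholders.
GLOSSARY = {
    "ocr": ["optical character recognition", "text recognition"],
    "llm": ["large language model"],
}

def expand_query(query, glossary):
    extra = []
    for term, related in glossary.items():
        if term in query.lower():
            extra.extend(related)
    return query + " " + " ".join(extra) if extra else query
```

The expanded query is then passed to the retriever instead of the raw user query, improving recall for glossary terminology.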
5. Relevant References
  • "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Lewis et al. (2020) – Introduces the RAG architecture for combining retrieval and generation.
  • "Dense Passage Retrieval for Open-Domain Question Answering" by Karpukhin et al. (2020) – Introduces dense embedding-based retrieval for open-domain QA.
By leveraging embedding-based retrieval, fine-tuning, and retrieval-augmented generation, you can create an effective integration between your LLM and knowledge base. Tools like FAISS, Haystack, and Weaviate will help with both retrieval and storage.
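As a toy end-to-end sketch of the retrieve-then-inject flow: the code below uses a bag-of-words embedder and brute-force cosine search purely for illustration (a real deployment would substitute a sentence-embedding model and a vector store such as FAISS, and the documents are made up):

```python
import numpy as np

# Toy retrieval-augmented generation (RAG) pipeline: embed documents,
# retrieve the most similar one to the query, inject it into the prompt.
DOCS = [
    "OCR converts scanned images of text into machine-readable strings.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Oligomycin inhibits ATP synthase in the mitochondrial membrane.",
]

def embed(text, vocab):
    # Bag-of-words embedding, L2-normalised so dot product = cosine.
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

vocab = {w: i for i, w in enumerate(
    sorted({w for d in DOCS for w in d.lower().split()}))}
doc_matrix = np.stack([embed(d, vocab) for d in DOCS])

def retrieve(query, k=1):
    scores = doc_matrix @ embed(query, vocab)
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The LLM then answers from the injected context rather than from its parametric memory alone, which is what grounds it in the local knowledge base.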
Hope this helps, and feel free to reach out if you have more questions!
  • asked a question related to OCR
Question
2 answers
Since reactive oxygen species (ROS) can influence mitochondrial respiration, I am wondering whether elevated ROS levels contribute to an increase in OCR during the assay; and if so, at which phase of the Mitostress Test (Basal, Oligomycin, FCCP, or Rotenone/Antimycin A)? Are there specific treatments (e.g., with antioxidants) where this effect is diminished?
I would highly appreciate your insights!
Relevant answer
Answer
If the ROS originate during your Seahorse measurement, either from mitochondrial or extra-mitochondrial sources in the cells, there will definitely be an increase in O2 consumption. This is mainly because ROS (superoxide) are generated from molecular oxygen (O2). In my earlier days, we used to differentiate between CN-sensitive and CN-insensitive O2 consumption. You should try to block each step of the mitochondrial electron transport chain (Complex I to IV, using cyanide, oligomycin, FCCP, rotenone, and antimycin) and measure O2 consumption and, simultaneously, the formation of the specific ROS (superoxide and hydrogen peroxide). Simply put: use SOD-inhibitable epinephrine (adrenaline) oxidation to measure superoxide generation in cells treated with the different complex-specific inhibitors, and under identical conditions measure oxygen consumption by the cells with the Seahorse. Similarly, measure H2O2 formation under identical conditions. While measuring H2O2 production by cells, you may also include a GSH peroxidase inhibitor or a catalase inhibitor (azide).
Definitely, ROS production leads to an increase in oxygen consumption due to uncoupling of the respiratory electron transport chain.
Note:
Cyanide is a potent poison that inhibits cellular respiration by binding to cytochrome c oxidase (CCO) in the mitochondria. This inhibition leads to an accumulation of electrons in the electron transport chain, resulting in the production of reactive oxygen species (ROS).
Physiol Rev. 2014 Jul;94(3):909–950. doi: 10.1152/physrev.00026.2013
Plant Physiol. 2002 Apr;128(4):1271–1281. doi: 10.1104/pp.010999
There is ample work done in this area.
I just suggested a couple of papers. Please read.
Please read earlier papers published by Dr. Britton Chance and Dr. Hagihara.
I spent decades on this topic. This is my 2 cents worth suggestion.
Good area to investigate, especially in the area of Ferroptosis. Since I am a lipidologist, I am interested in the lipid peroxides and mitochondrial lipids.
Good question (for me).
Parinandi
The Ohio State University Wexner Medical Center
Columbus, OH
  • asked a question related to OCR
Question
3 answers
3D cell culture experiments and assays are often still carried out in the standard incubator at 5% CO2, which means around 19% oxygen. However, physiological values are different. In your opinion, how important is it to consider the physiological oxygen concentration when performing 3D cell culture assays?
We are currently working on a research project that focuses on these issues, including high-throughput and automation. I would appreciate a discussion and also participation in a short survey on this topic:
Relevant answer
Answer
I authored a white paper on this topic approximately a year ago, which addresses many of the questions you’ve raised in greater depth. While it aligns with the points already mentioned here, it also explores the subject more comprehensively. Below are the key thoughts:
Incubator Oxygen Levels: Standard incubator oxygen levels can typically range between 16-18% O2, which is hyperoxic for most cell types. As already noted by others, this can introduce significant variability in experimental outcomes, as oxygen is a critical regulator of numerous cellular mechanisms.
Physiological Relevance: For studies aiming to investigate natural physiological responses or translational applications, it is crucial to account for the impact physiological oxygen levels have on cellular responses.
Pericellular Oxygen: While ambient oxygen levels are important, the oxygen concentration at the pericellular level (i.e., directly surrounding the cells) is far more critical. Using oxygen-permeable plasticware and, where possible, measuring oxygen levels at the cellular interface are worthwhile investments to enhance experimental accuracy.
3D Cell Constructs: While this point is not mentioned directly in the white paper, I did want to try and answer your question here a bit more. For 3D cell cultures and constructs, both the inner and outer layers of the construct experience different oxygen levels. This gradient should be carefully considered when interpreting results. If possible to measure the pO2 of the inner and outer layers, it would be informative.
Performing Hypoxia/Physioxia Studies: Conducting experiments under hypoxic or physioxic conditions may seem straightforward, but it is essential to critically evaluate the oxygen levels chosen. Matching in vivo pericellular oxygen levels is vital for experimental validity. This aspect is frequently misunderstood, underscoring the importance of consulting existing research on the oxygen levels specific to the tissues being modeled. For further reading on this topic, I recommend exploring:
The full white paper I wrote is here:
  • asked a question related to OCR
Question
11 answers
Hello, I performed an FCCP titration in HEK cells and two other melanoma cell lines in order to do a Mito Stress Test with the Seahorse technology. While I can clearly see an increase in OCR with 0.5 µM FCCP for the two melanoma cell lines, I can't see any increase in HEK cells (nor with 0.125 or 0.25 µM FCCP), and I see a decrease with 1 and 2 µM FCCP... Has anyone encountered this situation with HEK cells, and what can be done about it?
Thank you!
Relevant answer
Answer
I use HEK293 and get a good FCCP response. A couple of points for a HEK293 respiration assay:
1) As Yuanyuan mentioned, HEK293 cells are loosely attached to the plate, so it's always a good idea to coat the plate and then seed the cells.
2) For a 24-well plate, I seed 70,000 cells. Optimize your cell number for a 96-well plate to get the maximum response.
3) For me, HEK293 cells give a good response at 1 µM FCCP.
  • asked a question related to OCR
Question
1 answer
I am doing some Seahorse assays with bone marrow-derived macrophages and I see no response to oligomycin. During an ATP rate assay, OCR is not decreasing and ECAR is not increasing. Any thoughts about this?
Relevant answer
Answer
Hello! Were you able to solve it? If not, please detail the protocol you are doing. There could be several reasons.
  • asked a question related to OCR
Question
4 answers
Hi everyone, I'm encountering a problem fixing the cells after the Glyco and Mito stress assays in the Seahorse XFe96 analyser. I start with a 100% confluent monolayer before the assay, and after the assay most of the cells have detached from the wells. This eventually affects the OCR and ECAR measurements. I'm working with an adherent cell line. Has any of you noticed this before?
Relevant answer
Answer
I used to face a similar problem while working with human monocytes, which are semi-adherent in nature. Eventually, I figured out that it was due to pipetting. You need to change the medium via slow pipetting and not touch the pipette tip to the well bottom. With practice, you will stop losing cells.
Thanks.
  • asked a question related to OCR
Question
3 answers
Hi all.
I am doing some tests with transfections and getting this weird basal OCR profile in Mito Stress Tests. Could someone tell me what is wrong here?
The cells look perfectly fine in wells both before and after the experiment.
Relevant answer
Answer
Rafal Czapiewski, if oligomycin is leaking, the maximal respiration must still rise higher than the basal respiration after the injection of FCCP.
  • asked a question related to OCR
Question
3 answers
Is there any application to convert Arabic text in an image into editable text for copy/paste?
Relevant answer
Answer
Sofyane Bouameur, the one I know of is i2OCR, advertised as "a free online Optical Character Recognition (OCR) [service] that extracts Arabic text from images and scanned documents so that it can be edited, formatted, indexed, searched, or translated." Hope this helps. More information at this link: https://www.i2ocr.com/free-online-arabic-ocr
  • asked a question related to OCR
Question
1 answer
I am working with fibroblasts and recently conducted a mito stress test using seahorse technology. Surprisingly, I observed unexpectedly low basal oxygen consumption rates (OCR). Furthermore, instead of the anticipated increase in OCR following the oligomycin pulse, I observed a decrease. I'm puzzled about the potential reasons behind this unexpected outcome. Could you provide any insights or suggestions to help me identify the problem?
Relevant answer
Answer
Your first question is about low basal OCR. The first few questions that come to mind for troubleshooting this are:
  • Do you have enough cells in each well? It looks like Agilent recommends 20×10^3 cells/well for human skin fibroblasts, but I don't know what your protocol says to do. If there aren't enough cells then you will be below the minimum detection limit for the oxygen probe. The output will then be a negative value because the machine is reading zero and then the value of the blanks is automatically subtracted from it.
  • Is it plated evenly? The cells should settle to the bottom of the plate in a relatively even layer. Or are they adherent?
  • Do you have enough blanks in your plate? How many depends on whether you are using a 24-well or 96-well plate. I'd say 4 for a 24-well plate and at least 8 for a 96-well plate.
  • Have you designated blank wells as "blanks" in the software? Because Wave software automatically subtracts blanks from other wells.
  • Have you entered the correct total well volume in the "Run Assay" tab? This is the starting well volume plus the injection volumes.
  • It looks like you have it set to "normalize off". Have you normalized results to cell number separately? You can have the software do it.
Your second question is about oligomycin. Oligomycin inhibits complex V in mitochondria (ATP synthase) causing a decrease in the flow of electrons through the electron transport chain. This decreases ATP-linked respiration and results in decreased OCR. This allows you to calculate ATP production and proton leak. So if you are seeing a decrease from basal after the injection of oligomycin then it is actually working correctly.
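The respiration arithmetic described above can be sketched numerically (the OCR values below are illustrative, not from any real assay):

```python
# Mito Stress Test arithmetic (illustrative OCR values in pmol O2/min;
# the numbers are made up, not from a real run).
basal = 120.0            # basal OCR
post_oligomycin = 45.0   # OCR after ATP synthase is inhibited
post_fccp = 200.0        # OCR after uncoupling (maximal respiration)
non_mito = 15.0          # OCR after rotenone/antimycin A

atp_linked = basal - post_oligomycin       # respiration driving ATP synthesis
proton_leak = post_oligomycin - non_mito   # oligomycin-insensitive respiration
maximal = post_fccp - non_mito             # maximal mitochondrial respiration
spare_capacity = maximal - (basal - non_mito)
```

A decrease after the oligomycin injection is thus expected: it is exactly the ATP-linked portion of respiration being removed.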
I recommend looking through these protocols to help with additional troubleshooting for your problem:
  • Mishra P, Zhang T, Guo M, Chan D. Mitochondrial Respiratory Measurements in Patient-derived Fibroblasts. Bio Protoc. 2019 Dec 5;9(23):e3446. doi: 10.21769/BioProtoc.3446.
  • Gu X, Ma Y, Liu Y, Wan Q. Measurement of mitochondrial respiration in adherent cells by Seahorse XF96 Cell Mito Stress Test. STAR Protoc. 2020 Dec 30;2(1):100245. doi: 10.1016/j.xpro.2020.100245.
  • Also search for Agilent's "Report Generator User Guide"
I hope that helps.
- Melissa
  • asked a question related to OCR
Question
3 answers
I recently ran an ATP rate assay using the Seahorse XFp mini plate (8 wells). I got OCR values way higher than I typically see in publications. Basal OCR for control and experimental samples ranged from 300-500 pmol/min. I am wondering what could go wrong to get such unexpected higher values! Thank you in advance for any help!
Best,
Samina.
Relevant answer
Answer
Best of luck with your future work.
  • asked a question related to OCR
Question
3 answers
Hi all,
I've recently upgraded from the Agilent Seahorse XF24 to the XFe24 and have started having issues with the OCR calibration. The rep suggested that storing the hydrated cartridge at 4°C, even within 72 hours, is potentially damaging the probes and interfering with the calibration, and advised keeping it in the 37°C incubator even over 24 hours.
Has anyone had issues like this? I'm nervous about what's going to grow in the wells over the weekend (and of course I don't want the cartridge to dry out!).
Thanks for any help!
Melanie
Relevant answer
Answer
You should be able to use the plate if hydrated a minimum of 4 hrs. We have stored Seahorse plates in calibrant solution for up to 96 hrs at 28C (not in sterile conditions) without a problem. I haven't tried this at 37C though. After reassembling the plate after adding calibrant solution, we put the cartridge back in its box and tape its foil lid back onto it. This protects it from light and helps prevent drying out over a longer period of time.
  • asked a question related to OCR
Question
6 answers
I want to compare recognition rates with other datasets, so I need isolated printed Arabic character datasets.
  • asked a question related to OCR
Question
3 answers
Hello, I performed an FCCP titration in HaCaT cells in order to do a Mito Stress Test with the Seahorse technology (made by Agilent). I have tried FCCP concentrations of 0.125, 0.25, 0.5, 1, and 2 µM but can't see an increase in OCR (it is even below the basal level). Has anyone encountered this situation with HaCaT cells, and what can be done about it?
Thank you!
Relevant answer
Hi @juliane,
Yes... finally, I optimized the experiment.
I would suggest starting with a lower concentration of FCCP (try 0.01 µM). Some cells are very sensitive to FCCP; at higher concentrations they become saturated.
Best of luck with the experiment.
  • asked a question related to OCR
Question
3 answers
Does anyone get errors with using the Recognition part in " Scene Text Detection and Recognition by CRAFT and a Four-Stage Network" ?
Relevant answer
Answer
Detecting and recognizing text in a given natural scene image using the EAST and Tesseract algorithms.
Regards,
Shafagat
  • asked a question related to OCR
Question
2 answers
During Seahorse Mito Stress assays, we see great variance between different cell lines in their oxygen consumption. One cell line is tricky because, at the relevant confluency, the basal OCR is quite high; it then gets very high after adding FCCP, and correspondingly the oxygen level in the wells drops dramatically, down to between 10 and 20 mmHg. How low can the O2 level be without causing hypoxia? I realize that if we get hypoxic conditions during the run, the OCR values will take a "false" value that is lower than it should be, because the cells do not have enough O2 to consume. I'm just unsure when this kicks in: is it around 20, 10, or 5 mmHg?
Relevant answer
Answer
This will happen with hearty cells/isolated mito when OCR exceeds around 500 pmol O2/min or so (in the XF96). You can usually notice this happening when you see your O2 level data starts to look like a backwards 'J'. This is nicely described with 'Fig 7' from the following protocol article about respirometry in frozen mitochondria samples.
  • asked a question related to OCR
Question
5 answers
I recently came to know about the commercial service https://mathpix.com/ which claims to convert mathematical formulas from (scanned) pdf or even handwritten text to LaTeX.
I have no experience with this. I am interested whether there is an open source solution which solves the same (or a similar) problem.
Relevant answer
Answer
@Knoll, I did it for my research paper, which involved mathematical formulas and equations for a panel-data econometric model. To open a PDF with MS Word: first create a blank Word file, then go to the File menu (or Office button) in the top-left corner of the screen, where you will see "Open" along with options such as "Save" and "Save As". Click "Open", select the specific PDF file, and it will be opened in MS Word.
  • asked a question related to OCR
Question
3 answers
I created a new OCR dataset used to detect non-standard license plates. But when I use the model proposed in "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation", the F-measure is 98% and the FPS is 14. Is there any other innovation point for this dataset? Improve inference speed? Create a new quadrilateral target detection method?
Relevant answer
Answer
Thank you for your suggestion, but the precision is 98.8% and the recall is 97%; they are both high. @Qamar Ul Islam
  • asked a question related to OCR
Question
4 answers
Hi everybody. I want to use an Agilent Seahorse XFe24 analyzer for my metabolic assays. My cells will be embedded in hydrogels so that I can design a specific environment for them. However, the probes (sensors) of the machine can only be lowered to 200 µm from the bottom of the plate. That height (200 µm) is very small for my hydrogels. When I checked the literature, I found that many articles use microbeads, but their diameters were also more than 200 µm. I am not sure how they were able to take accurate measurements from the analyzer even though the height of their gels was more than it should be. Does anyone have an idea, or has anyone had the same problem? Thanks for the advice.
Relevant answer
Answer
Shambhabi Chatterjee Thank you very much. I will definitely read this protocol. It looks promising. On the other hand, I had already contacted technical support but it was a dead end. I appreciate your help. Have a nice day.
  • asked a question related to OCR
Question
13 answers
Hello,
I would like to extract all the walls from complex .DWG files that come from the conversion of a .PDF by AutoDWG software. In these big multi-floor plans, there are multiple types of walls (exterior, office, corridor, etc.) that have their own thickness and interior composition (trusses, insulation, etc.), and it's not even always clear to the eye where one type of wall ends and another starts. To give you an idea of the problem, here are two examples of very small regions of these kinds of plans.
Note that I'm NOT interested in starting from a high resolution image of a scanned floor plan as the information is already vectorial in the DWG file. Also, I must do it by programming and as little user intervention as possible as this must be part of a bigger software solution.
I would be glad if anybody could propose software, libraries or papers describing methods to extract wall information from such a .DWG file. I guess OpenDWG seems to be part of the solution but if someone has more first hand experience with this problem, I would like to know about that!
Relevant answer
Answer
In AutoLISP or ObjectARX C#/C
- Disable all layers except envelopes, partitions, doors and space labels.
- With each space label: Zoom centre then zoom out to a common zoom factor: You'll be able to figure out a factor by zoom extents on the largest space's boundary.
- Use the space label's insertion point (on the assumption that label insertion points are within their space's boundary).
- Run boundary poly command "_bp" (I think)
- With each space boundary, get the selection set of intersecting envelopes.
- Finally, for each bpoly line segment, match the points from the segment to the adjacent envelope/partition/door.
  • asked a question related to OCR
Question
3 answers
Hi,
I want to develop an algorithm to recognise Arabic (Moroccan) plates, so I use the OpenALPR library with Tesseract. My question is: how do I train the ALPR OCR to recognise Arabic plates?
Relevant answer
Answer
you can use the dataset here to train your model https://msda.um6p.ma/msda_datasets
  • asked a question related to OCR
Question
12 answers
Hi everyone. For the last part of my project I have to see how my program responds to a magnified image. The magnification has to be part of the code. I've tried imresize(); the image is magnified but the dimensions change too, and that is a problem for my code. I want to know how I can scale or magnify my image while the dimensions stay the same. My images are grayscale, with a black background and a white Persian letter in the center (I've attached a sample).
I actually need to magnify the letter.
Relevant answer
Answer
To magnify an image without changing dimensions (zoom in), MATLAB provides a GUI-based image viewer that can be invoked using the IPT function "imtool". Once it is open, you can use the Zoom tool within the viewer for a magnified view. You can also use the Pixel Region tool to read the pixel values of a selected region.
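If the zoom has to happen in code rather than in a viewer, one common trick is to sample a smaller centre window of the image at the original resolution, i.e. resize then centre-crop back to the original size. A minimal nearest-neighbour sketch in Python/NumPy, assuming a 2-D grayscale array (in MATLAB the same idea is imresize followed by a centre crop):

```python
import numpy as np

# Magnify (zoom into) the centre of a grayscale image by `factor`
# while keeping the output dimensions identical to the input,
# using nearest-neighbour sampling.
def center_zoom(img, factor):
    h, w = img.shape
    # Source coordinates: an (h/factor x w/factor) window around the centre,
    # stretched back out to h x w output pixels.
    rows = (np.arange(h) - h / 2) / factor + h / 2
    cols = (np.arange(w) - w / 2) / factor + w / 2
    rows = np.clip(rows.round().astype(int), 0, h - 1)
    cols = np.clip(cols.round().astype(int), 0, w - 1)
    return img[np.ix_(rows, cols)]
```

A centred letter grows by `factor` while the array shape is unchanged, so downstream code that expects fixed dimensions keeps working.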
  • asked a question related to OCR
Question
3 answers
Hello, I came up with a scenario where the given image is of a bottle/can which has text all over it. A demo image is attached. As we can see, the text can flow from left to right, and any OCR system may miss the text at the left and right edges.
So is there any solution for this, such as preprocessing in a certain way so that we can read the text, or flattening this round object into a straight one?
Thanks.
Relevant answer
Answer
Taking your example image as representative of what you are going to process (cans), there are several steps to preprocess the text:
  1. The camera angle does not correspond to a flat Cartesian grid onto which you can superimpose the image. Your image needs correcting with a translation along the y axis and a rotation about the z axis to properly represent an acceptable picture for processing. The image appears to be cropped, since the bottle on the left gives the impression of being the focal point.
  2. You can use any edge detection algorithm to detect vertical lines and the Hough transform to detect ellipses
  3. If your job is only detecting labels in cans, then you have a finite amount of can sizes (your Monster energy drink is one of those sizes).
  4. Once you have detected where the vertical lines meet the ellipses you have the outline of the can. Now you have to scale it to the appropriate size so that you can treat the rest of the processing uniformly
  5. Do a cylindrical to Cartesian transformation to correct for the cylindrical distortions
  6. Do the text extraction
Note also that there may be lens distortion at the edges, so I would also concentrate on repositioning the camera for the bottles at the extremes.
I would need more samples to be able to provide further refinements to the process, but this would be my first approach to the problem.
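Step 5 (the cylindrical-to-Cartesian correction) can be sketched as an inverse mapping: for evenly spaced angles around the cylinder's surface, sample the image column where that angle projects. A minimal NumPy sketch, assuming the can silhouette has already been detected and is vertical (the function name and interface are hypothetical):

```python
import numpy as np

# Cylindrical-to-Cartesian unwrapping of a can label. Assumes the can
# is vertical and spans columns [x0, x1) of the image, so its radius
# is R = (x1 - x0) / 2. Nearest-neighbour sampling; a real pipeline
# would interpolate and also correct the vertical perspective first.
def unwrap_label(img, x0, x1, out_width):
    R = (x1 - x0) / 2.0
    cx = (x0 + x1) / 2.0
    # Evenly spaced surface angles over the visible half-cylinder.
    theta = np.linspace(-np.pi / 2, np.pi / 2, out_width)
    # A surface point at angle theta projects to column cx + R*sin(theta).
    src_cols = np.clip((cx + R * np.sin(theta)).round().astype(int),
                       x0, x1 - 1)
    return img[:, src_cols]
```

The unwrapped strip stretches the compressed text near the can's edges back to a roughly uniform horizontal scale before OCR.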
  • asked a question related to OCR
Question
1 answer
Initially, I intend to train an algorithm with a deep learning approach, based on the literature. However, rather than text in the wild, my goal is to recognize embossed text such as that in the attached figure.
Given that, does anyone know of a dataset that contains this type of text, not necessarily related to railways?
Relevant answer
Answer
This is an excellent question. I need time to think and research. I have recommended your question!
  • asked a question related to OCR
Question
5 answers
Just wondering if anyone has used Seahorse XF24 ECAR and OCR plates that have expired. We have some from Jan 2017 and I'm wondering if they will be ok.
Let me know!
Thanks.
Relevant answer
Answer
I think if you use a fresh pack it will be fine.
If you use leftovers of a kit that have been frozen for a while activity definitely starts dropping off.
  • asked a question related to OCR
Question
5 answers
I am working on a real-time text extraction problem where I have the option of either capturing an image of an object or taking a video of the object and then doing the text extraction.
I am trying to understand the advantages and disadvantages of both methods. For example, when taking a photo of the object, the problem could be image quality, while the advantage could be the time taken to process the image.
Similarly, with video the image quality may be better, but selecting the best frame could be a challenge; it also looks to be more computationally intensive.
Can anyone list the potential advantages and disadvantages of both approaches?
Relevant answer
Answer
Text present in pictures and video contains valuable information. Text extraction from an image has the stages of detecting the text in the given image, finding the text location, extraction, enhancement, and recognition of the text. But variations in text, such as variations in orientation, size, style, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction extremely difficult.
A number of techniques and methodologies have been proposed for this problem.
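The detection/localization stage mentioned above can be illustrated with a classic projection-profile sketch (pure NumPy, on a synthetic binarised image; a real photo or video frame would need thresholding first, and each found band would then be passed to the recognizer):

```python
import numpy as np

# Locate horizontal text lines in a binarised page image by projecting
# ink counts onto the vertical axis: runs of rows with ink are lines.
def find_text_lines(binary_img, min_ink=1):
    row_ink = binary_img.sum(axis=1)
    lines, start = [], None
    for i, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = i                       # entering a text band
        elif ink < min_ink and start is not None:
            lines.append((start, i))        # leaving a text band
            start = None
    if start is not None:
        lines.append((start, len(row_ink)))
    return lines  # list of (top_row, bottom_row) bands
```

The same projection idea applied per video frame (ink count stability across frames) is one cheap heuristic for picking the "best" frame before running full recognition.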
You can take a look in these articles:
  • asked a question related to OCR
Question
15 answers
I need to convert manually annotated IPA transcriptions into digital format. Does anybody know of an OCR system able to recognise IPA symbols (even just for English phonemes)?
Relevant answer
Answer
I recently heard from Michael Ashby, current president of the International Phonetic Association, that he wrote a paper on OCR of IPA characters. But this was for printed text, not handwritten transcriptions.
> Ashby, Michael. 2017. Recognition where it’s due: Some experiments in optical character recognition (OCR) for phonetic symbols. Journal of the English Phonetic Society of Japan 21. 63–79.
He told me I could freely distribute it so it's attached
  • asked a question related to OCR
Question
4 answers
Is it possible to adapt the multinomials of a character sequence out of Pascal's triangle (I mentioned this concept in my work) to be a dot-to-graphical-form analyser in a hashed PDF? Could the hex values then be compared to the arrangement by OCR?
Relevant answer
Answer
Dear Georg Neubauer, there is a link between Pascal's triangle and Binomial Coefficients (BC): https://www.mathwords.com/b/binomial_coefficients_pascal.htm You may approximate BC by using the Gaussian approximation or the Gamma function.
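For reference, each row of Pascal's triangle is the sequence of binomial coefficients C(n, k), which can be generated directly without factorials (a minimal sketch):

```python
# Row n of Pascal's triangle = [C(n, 0), C(n, 1), ..., C(n, n)],
# built with the multiplicative identity C(n, k+1) = C(n, k) * (n-k) / (k+1).
def pascal_row(n):
    row = [1]
    for k in range(n):
        row.append(row[k] * (n - k) // (k + 1))
    return row
```

For large n these exact integer values are what a Gaussian or Gamma-function formula would approximate.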
  • asked a question related to OCR
Question
8 answers
Is OCR suitable for signature verification or not? If it is, where can I find this algorithm so I can try to improve it?
I mean, models like ASM and AAM can be used for face recognition; which model is suitable for signature verification?
Thanks, all.
Relevant answer
Answer
Signatures are handwritten and very far from clean cursive writing. OCR will fail miserably.
  • asked a question related to OCR
Question
2 answers
Following titrations, I established 20,000 cells per well to be ideal, using 1 µM oligomycin.
I initially ran the experiments with 40,000 cells per well, but found this to be way over 100% confluent.
I therefore reduced the cell number to 20,000 per well.
However, for the FCCP titration I don't see a significant rise in OCR regardless of whether I use 1, 1.5, or 2 µM. I'm starting to think perhaps 20,000 cells/well is not enough.
Has anyone else had this problem with fibroblasts? Would it be wise to try a higher cell density even though the confluence is way over 100%, or perhaps worth titrating FCCP above 2 µM?
Relevant answer
Answer
Thank you Ying Wang . I'm gonna run the experiments again with 40k cells/well. Hopefully it works better this time around.
  • asked a question related to OCR
Question
4 answers
I am searching for scientific works, tutorials, open-source projects, etc. to learn about OCR engine development. I am strongly interested in applied software development in open collaboration, so I think this could be a great project to start. Does anyone know of some works about neural-based OCR?
Relevant answer
I recommend this web page: https://www.pyimagesearch.com/2018/09/17/opencv-ocr-and-text-recognition-with-tesseract/ This author has made a lot of tutorials on neural networks, and if you have the opportunity to buy his books, it is a good chance to start in the computer vision world.
  • asked a question related to OCR
Question
6 answers
Hi,
When studying the respiratory function of isolated mito. by measuring OCR using the Seahorse-
I'm using a buffer (MAS) that contains pyruvate+ malate.
After measuring the basal respiration- ADP, Oligomycin, FCCP and sodium azide are being added by sequential injections:
Basal respiration--> ADP injection--> Oligomycin injection--> FCCP injection--> sodium azide injection.
I'm getting the expected trends, but I was wondering what could be the reason for getting lower OCR after adding FCCP (in comparison to the OCR after adding ADP)?
I saw in many different papers (where similar injections were done) that sometimes the OCR is higher after adding FCCP, but sometimes the OCR is lower after adding FCCP (again- in comparison to the OCR after adding ADP).
Could you help me figure it out? I'm sure that I'm missing something :)
Many thanks for your help!
Relevant answer
Answer
Dear Chen Lesnik,
The Seahorse is one of the instruments with too many built-in sources of error.
Too long to explain. The sequential addition of inhibitors is one of the most misguided approaches to studying mitochondria. It gives no valuable information, except showing that the user does not know the basics of mitochondrial function. I am being rough because too many young researchers stick to this method. Use any real polarography setup, except Oroboros, which is just a waste of money.
You should also know that pyruvate, with or without malate, interacts nonenzymatically with hydrogen peroxide, and therefore the concentration of pyruvate changes very quickly. One of the disadvantages of the Seahorse is that the relatively large surface and small volume of the medium allow fast interaction with ozone in the air, which destroys pyruvate. In addition, you should check the amount of H2O2 in your medium. Finally, FCCP much more often inhibits respiration than stimulates it. Find and read my book, Practical Mitochondriology, for an explanation.
  • asked a question related to OCR
Question
2 answers
I am working on a problem where I need to read the label number on a box as it moves past the camera. Currently I am using a HIK VISION 4 MP 12 mm IP camera at 1/100 s exposure, but the results are not accurate. Is the current camera really sufficient for reading letters on a label? Any camera recommendations would be really helpful for my work.
Relevant answer
Answer
Your present IP camera is good enough, but to meet your requirement you will have to shift to IP cameras that have auto-tracking capability.
here are my suggestions to meet your present requirement.
thank you
  • asked a question related to OCR
Question
1 answer
Is handwriting-recognition software admissible in a court of law with respect to report filing by forensic document examiners?
What accuracy rate must such software reach to be admissible in a court of law?
Relevant answer
Answer
The answer to this question will vary based on the requirements by jurisdiction. But as a general comment (I am not a lawyer) I would say that there are several issues to contemplate:
  1. How does the software compare to a forensic expert's evaluation?
  2. Whether the software is a black box or a white box. You will need to testify, and a good opposing lawyer will grill you as an expert witness on the internals of the software during cross-examination.
  3. Whether the software can carry out an analysis given the constraints of the case (volume of documents on which the analysis is carried out, etc.)
  4. Whether it is a tool that provides consistent results (internal consistency)
  5. Whether the software can produce reproducible results that can be confirmed by other software/experts (external consistency)
  6. Whether it is a tool in common use in practice
Hope this helps
  • asked a question related to OCR
Question
3 answers
In a phase of my project, I need to extract and recognise text from computer-generated medical prescriptions. Although I have done this with Tesseract-OCR, I am looking for better and faster approaches.
  • asked a question related to OCR
Question
1 answer
I laminated the LCD/glass with acrylate OCR (optically clear resin) and observed some discoloration of the LCD after 95 °C aging, and also after a solar test; it looks red.
I have heard that the PVA-iodine complex can degrade under heat or UV exposure,
but I cannot find the mechanism of how this occurs and why it produces a red color.
Would you help me define this issue, please?
Thank you in advance!
Relevant answer
Answer
The orientation of the polarizer changes the color of the LCD. Try another orientation and you'll get a new pattern of colors.
Good Luck
  • asked a question related to OCR
Question
7 answers
Does anyone know a better algorithm for segmenting touching handwritten characters, or a link to code for one? Any suggestion on how to start would help me.
I appreciate any idea.
Thank you in advance .
Relevant answer
Answer
Please look at my survey on this topic. I hope it will help you.
  • asked a question related to OCR
Question
5 answers
Several OCR tools for printed text are available, and a few HCR (handwritten character recognition) tools/apps aim to convert handwritten text into digital form. But I have some documents that contain both: certain portions are printed and the rest are handwritten. Please suggest a system that provides good accuracy on such mixed text.
Relevant answer
Answer
As far as I know, the pipeline looks like this: if you have document images, first extract the text regions, then do some preprocessing such as noise removal, and extract discriminative features from the preprocessed images. Store each feature vector with its corresponding label (the label identifies the class it belongs to). Then train a classifier; many are available, e.g. KNN, SVM, LDA and so on. For classification you can use the Classification Learner app in MATLAB R2015b and later versions. The classifier gives you the recognition accuracy; the closer it is to 100%, the better the result.
with Best wishes,
SATISH
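The pipeline described above (features, labelled vectors, then a classifier) can be sketched outside MATLAB too. Here is a minimal nearest-neighbour version in Python/NumPy; the zone-density feature vectors and class labels are made up for illustration:

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(train_X - query, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]                   # indices of the k closest samples
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy feature vectors (e.g. pixel densities of 2x2 zones) with class labels.
X = np.array([[0.9, 0.1, 0.8, 0.2],   # class "A"
              [0.8, 0.2, 0.9, 0.1],   # class "A"
              [0.1, 0.9, 0.2, 0.8],   # class "B"
              [0.2, 0.8, 0.1, 0.9]])  # class "B"
y = np.array(["A", "A", "B", "B"])

print(knn_predict(X, y, np.array([0.85, 0.15, 0.85, 0.15]), k=3))  # "A"
```

Swapping in an SVM or LDA only changes the classifier step; the feature extraction and labelling stay the same.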
  • asked a question related to OCR
Question
2 answers
Machine: Agilent Seahorse XFp Analyzers
I want to measure the OCR and ECAR of live cells using Agilent's Seahorse flux analyzer (96-well) with cancer cell lines.
I would like to know exactly how I can normalize the results to the cell number per well after the run. I know the cell number before the experiment, but it can change during the run.
Could you suggest any up-to-date methods for this problem?
Thank you.
Relevant answer
Answer
Hi Elif,
I encountered the same problem. There are several approaches that you can take, such as protein normalization or nuclear DNA staining. In my experience the DNA staining works better.
Briefly, after the Seahorse run I fix the cells in methanol:acetone (4:1) for 10 min at 4 °C, wash twice with PBS, then stain with DAPI (2 µg/ml) for 10 min at room temperature, wash, and measure in a plate reader.
You can use also Hoechst stain instead of DAPI.
I hope this helps and good luck.
  • asked a question related to OCR
Question
2 answers
In order to test the effect of a molecule (A) on metabolic patterns, I analyzed changes in ECAR and OCR in neurons and astrocytes. In neurons, A augmented ECAR but had no effect on OCR. In astrocytes, however, A increased both ECAR and OCR. Can anyone give me some ideas about these results? I am very confused. Thanks in advance.
Relevant answer
Answer
Rafael Linden Thanks for your suggestion. I agree with your opinion: the differential responses of neurons and astrocytes to the same stimulus may be due to their different metabolic preferences. How those differences cause this phenomenon needs further attention, though. From the papers you suggested I read, I've got some ideas. Maybe sometime I will share the results and we can discuss this later. Thanks again.
  • asked a question related to OCR
Question
4 answers
Dear Members,
I am trying to work on a solution to recognize handwritten text from scanned forms such as banking forms, insurance forms, etc. I need your help finding resources to study this; any link to refer to would be a great help.
Relevant answer
Answer
Well said, Benard.
  • asked a question related to OCR
Question
2 answers
Dear researchers,
When I use Tesseract for character recognition, I get many wrong results for which the character width or height is only about 4 pixels.
Is there any way to restrict Tesseract's recognition to a range of character sizes?
thank you very much for your time.
Best Regards
Relevant answer
Answer
Dear Benjamin,
Thank you very much for your answer.
We have hundreds of thousands of aerial photos from which we hope to read some serial numbers.
I am working in R.
When Tesseract processes them, noise specks of about 3 pixels in width or height are wrongly reported as characters.
I was thinking of how to tell Tesseract not to select characters outside a given size range.
I have searched for a long time but could not find a solution.
I would appreciate any hint.
Best regards and thank you for your time
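One workable route, sketched here in Python with pytesseract (R's tesseract package exposes the same word-level table via ocr_data()), is to request word-level results with bounding boxes and discard detections whose box height is implausibly small. The size limits below are illustrative:

```python
def keep_plausible_words(data, min_h=8, max_h=60):
    """Drop Tesseract detections whose bounding-box height falls outside the
    expected character size range (tiny boxes are usually noise specks)."""
    kept = []
    for text, h, conf in zip(data["text"], data["height"], data["conf"]):
        if text.strip() and min_h <= h <= max_h and float(conf) > 0:
            kept.append(text)
    return kept

# Hypothetical usage against a real image (requires pytesseract + Tesseract):
# import pytesseract
# from PIL import Image
# data = pytesseract.image_to_data(Image.open("photo.tif"),
#                                  output_type=pytesseract.Output.DICT)
# print(keep_plausible_words(data))
```

The same filter also drops boxes with non-positive confidence, which pytesseract reports for non-word layout elements.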
  • asked a question related to OCR
Question
4 answers
Dear researchers,
Reading numbers from images is very useful in our work. The tesseract package seems to be a new and powerful OCR tool in R.
I would appreciate any examples of how to use this package in programs.
Best Wishes
Ali Madad
Relevant answer
Answer
Dear Dr. Mohamed AbdElAziz Khamis ,
Thank you very much for your guides.
Best Wishes
Ali Madad
  • asked a question related to OCR
Question
2 answers
Hello dear researchers,
I have hundreds of thousands of aerial images, each with some different numbers on them, which I need to recognize.
Are there any free OCR tools in Python (or other platforms) for this purpose?
Many thanks for your time
Ali Madad
Relevant answer
Answer
Dear Mohamed AbdElAziz Khamis,
thank you very much for your fine answer
best wishes
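For digit-only serial numbers, a common trick is to whitelist the digit characters in Tesseract's config (whitelists are best supported by the legacy engine; LSTM support varies by version) and strip any stragglers afterwards. The config string and cleanup below are a sketch, not the only option:

```python
import re

# Tesseract config restricting recognition to a single text line of digits.
DIGIT_CONFIG = "--psm 7 -c tessedit_char_whitelist=0123456789"

def digits_only(raw):
    """Safety net: keep only the digit runs from a raw OCR string."""
    return "".join(re.findall(r"\d+", raw))

# Hypothetical usage (requires pytesseract + Tesseract installed):
# import pytesseract
# from PIL import Image
# raw = pytesseract.image_to_string(Image.open("aerial_001.png"),
#                                   config=DIGIT_CONFIG)
# print(digits_only(raw))
print(digits_only("1O23\n 45"))   # "12345": the stray 'O' is dropped
```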
  • asked a question related to OCR
Question
1 answer
I have transduced HEK293T cell lines and am attempting to characterize some cellular and mitochondrial phenotypes after KD of my gene of interest. Seahorse is one of my main proposed assays, but I can't seem to get acceptable results in terms of variation and error-bar sizes: they are literally > ±100 units for both ECAR and OCR.
I have tried a range of about 4 cell densities from 1×10^4 to 2×10^5 and found a good one around 4×10^4 for 50-90% confluence 12 hours (O/N) before starting, and I have tried 3-6 replicate wells per sample. I coat the plates with poly-D-lysine beforehand, since HEK cells are so loosely adherent, and I reduced protocol mixing to a minimum of 1 so the plate is not shaken so much; that seemed to help a lot, but still not enough. I always check the cells before and after, and I normalize by cell number using CyQuant, which should fix a lot of the problem too. I also just purchased Cell-Tak, since so many scientists swear by it, but haven't had the chance to test it yet. The only other troubleshooting step I can think of is to try another cell line known to work well as a positive control.
I wonder: do others experience such insane variation too and just remove outliers to produce nice data, or is it just me? And if I did want to remove outliers, is that possible within the Wave software, or would I need to go through and manually analyze the raw data?
Thank you for any advice in advance,
-A desperate PhD student
Relevant answer
Answer
I contacted Agilent to help troubleshoot my variation and they were very helpful. They looked into my raw data file and could tell me that the machine at my core facility hasn't had a check-up in two years (it's already pretty old but can still generate good data) and that my medium is acidic, so perhaps the medium isn't being pH-checked or buffered properly. Also, I will do a complete titration of my cells to make sure I am using the optimal density.
  • asked a question related to OCR
Question
9 answers
I am working on techniques to obtain high-resolution reconstructed images of license plates. These images come from CCTV video footage.
  • asked a question related to OCR
Question
4 answers
Hello,
I need an effective tool for converting handwritten text to MS Word text that can be edited. Please state your recommendation based on your personal experience.
Relevant answer
Answer
Thank you Davit for your answer. Which tool(s) did you test?
  • asked a question related to OCR
Question
4 answers
I am doing recognition of characters on licence plates. I want to use MATLAB's built-in ocr function, but I cannot make it recognise all characters clearly even after some preprocessing. I want to use the OCR Trainer app, but there is no proper material on how to do so. Please suggest any other method for recognition.
Relevant answer
Answer
Many people have already built OCR for licence plates, and successfully.
But where the plate is degraded in quality, it is rarely successful,
especially if the letters connect because of dirt.
  • asked a question related to OCR
Question
5 answers
I am developing a licence-plate detector and recogniser. I developed the detector using a convolutional neural network, which has state-of-the-art accuracy on real images. The second part is the recogniser: I trained an OCR, but it does not reach state-of-the-art accuracy. Please suggest any method that can recognise numbers from real-world number plates.
Relevant answer
Answer
Deep neural networks such as CNNs, SAEs, etc.
  • asked a question related to OCR
Question
6 answers
Working on a project for work to streamline the data collection process.
  • asked a question related to OCR
Question
3 answers
I have a set of book images that need to be cropped (there are no patterns in the photos) and enhanced before applying OCR; the images are from old books. Does anyone have a suggestion or an example algorithm for automating this?
Relevant answer
Answer
Abedallatif Baba, yes, I'll make it available tomorrow.
Saif Aldeen Saad Alkadhim, it is not possible to use Photoshop, because the idea is to create an open-source tool for building virtual document libraries.
  • asked a question related to OCR
Question
1 answer
I would like to use such a procedure in my thesis to streamline data processing while avoiding data-entry errors.
Does anyone know a free or less expensive alternative to the FlexiCapture software (https://www.abbyy.com/flexicapture/) for character recognition (OMR-type) and data extraction from questionnaires?
Relevant answer
Are you familiar with CVISION Technologies, Inc?
  • asked a question related to OCR
Question
7 answers
Oxygen consumption rate (OCR), maximal oxygen consumption and mitochondrial reserve capacity are good indicators of mitochondrial function.
To measure these you need an oxygen electrode and inhibitors such as oligomycin, FCCP, etc.
But is there an alternative to the use of oxygen electrode?
Can you suggest any other good methods to assess mitochondrial function?
(Apart from measuring ATP or ROS formation...)
Relevant answer
Answer
Well, we (a laboratory studying the structure and function of mitochondria) mostly use an oxygen electrode and confocal microscopy.
So if you have a good confocal (or a very good fluorescence) microscope that can resolve mitochondria, you can work with different fluorescent probes: TMRE for membrane potential, DCF for ROS, the various MitoTracker dyes, and so on. If you can resolve individual mitochondria you can do quite a lot: measure their number, do 3D segmentation (if you have z-stacks) to get their volume, and follow their movement and shape (fragmentation). And you can follow the dynamics of these processes and their reaction to changes, be they chemical or physical (such as oxygen depletion). You can also fix the cells and use immunohistochemistry.
In addition, you can isolate individual mitochondria and use flow cytometry on functioning mitochondria, or check something via western blotting or similar methods.
Much depends on your microscope and your object.
Hope it helps; I can try to think of more if you specify the task a little.
  • asked a question related to OCR
Question
5 answers
Dear researchers,
I want to know about the emerging topics in the field of "Document analysis and recognition" and its related areas such as : pre-processing, document Layout analysis, OCR technologies, ... etc.
Thank you in advance.
Relevant answer
Answer
Hello,
One of the most emerging topics in the field of document analysis and recognition is word spotting. Word spotting is an alternative to OCR, because OCR does not always generate accurate results on ancient manuscript documents.
  • asked a question related to OCR
Question
4 answers
I am working on the development of OCR for the Odia language, but segmentation of old scanned documents remains unresolved. Please provide relevant documents or techniques.
Relevant answer
Answer
Hi Mamata Nayak,
Maybe this is interesting for you:
Arabic Character Recognition System Development, Iping Supriana (ResearchGate),
The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013).
  • asked a question related to OCR
Question
3 answers
I found databases for Arabic characters and numerals.
I also found several Arabic word databases (printed and handwritten).
However, I am looking for full documents: text with images.
Thank you
  • asked a question related to OCR
Question
1 answer
I'm running a Seahorse fatty-acid-oxidation assay in THLE-2 human hepatocytes. The basal OCR is very low (~30 pmol/min) and declines steadily over the course of the assay. I've attempted this assay several times, adjusting the starvation parameters to no avail; both the rep and I are fairly confident this is not caused by low cell number (they're in a confluent monolayer) or by gradual cell death. I'm also fairly sure that the respective concentrations of oligomycin and FCCP are fine, as I've titrated these successfully with no problem whatsoever. It's been suggested that I adjust the media supplements, but I don't want to detract from the point of the assay by encouraging my cells to use glucose/glutamine rather than fatty acids. Has anyone else encountered this problem, or have any suggestions with regard to troubleshooting?
Thanks! 
Relevant answer
Answer
Did you hydrate the green plate overnight at 37 °C? Did you use Cell-Tak? Did you also check the pH of all buffers? And what did the cell-density test look like?
  • asked a question related to OCR
Question
6 answers
I'm wondering what kind of integrity the cells have afterwards. I'm using cell tak to adhere bovine lymphocytes and then interrogating their OCR and ECAR with the seahorse instrument. 
Relevant answer
Answer
Thank you! In order to preserve surface markers I have started using Hank's Balanced Salt Solution with 0.5% EDTA, incubating for 15-20 min at 37 °C, and then pipetting to collect my cells. It seems to work relatively well. There still seems to be some loss of cell number, but it is a decent option.
  • asked a question related to OCR
Question
2 answers
How can Tesseract, along with OCRFeeder, be implemented in a web application? I need help with the integration.
Relevant answer
Answer
Hi Sruthi,
Please try this YouTube video; it may be helpful: https://www.youtube.com/watch?v=Mjg4yyuqr5E
  • asked a question related to OCR
Question
4 answers
I have hundreds of PDF files (A1, A2, A3, ...) whose content I need to compare, page by page, against a corresponding set of PDF files (B1, B2, B3, ...). Is there a way to import each page of both corresponding PDF files as an image and do some image processing to see whether the pages share the same or similar content? If possible, I would like to input the PDF file locations and have the code do the rest, and also compute a numerical similarity score: 100 means both pages are the same, 90-99 means similar, and anything less needs human intervention to make a judgment.
The PDF files contain scanned reports with lots of text (OCR'd), some tables and a few images, and are at least 50 pages long.
I am familiar in Matlab and Python.
I would greatly appreciate hints and suggestion that will help solve this problem.
Thank you
Relevant answer
Answer
Perhaps you have already thought of this, but imagine documents A1 and B1 that have identical text, except that B1 inserts half a blank page at the beginning; a page-by-page comparison will then show the two documents as different even though, by hypothesis, the texts are the same. So the idea of page comparison, as opposed to content comparison, seems flawed. Have you thought of adapting plagiarism-detection software?
You could upload and store all the A-series documents, then upload all the B series and see whether there is measurable overlap. I believe the software will accept PDF directly. Such software may be expensive or costly to license for this use.
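Following the content-comparison idea, a minimal sketch in Python: extract the text of each document (e.g. by OCR) and score the pair with difflib. Comparing whole-document text rather than matched pages sidesteps the shifted-page problem; the sample strings are invented:

```python
from difflib import SequenceMatcher

def similarity_score(text_a, text_b):
    """Return a 0-100 similarity score between the texts of two documents."""
    return round(100 * SequenceMatcher(None, text_a, text_b).ratio())

a = "Quarterly inspection report, section 4: no defects found."
b = "Quarterly inspection report, section 4: two defects found."
print(similarity_score(a, b))   # high, but below 100
```

For image-heavy pages a perceptual hash (e.g. the imagehash package on pages rendered with pdf2image) is the usual complement to this text score.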
  • asked a question related to OCR
Question
3 answers
I'm working on tesseract-ocr and want to know how line finding is done in Tesseract. Of course there are comments in the code, but I'm not getting it. Can anyone suggest documents or good algorithms for line finding?
Relevant answer
Answer
I have tried to segment reflected laser lines from a background modified by arc light. I am not sure whether your situation is similar, but I hope my paper can help:
Z.Z. Wang, Monitoring of GMAW Weld Pool From the Reflected Laser Lines for Real-Time Control, IEEE T IND INFORM, 10 (4), pp. 2073-2083, 2014
  • asked a question related to OCR
Question
15 answers
I'm working on OCR and want to improve the accuracy of the Tesseract open-source OCR engine. It works fine when the image is uniformly lit, but it fails when the image is non-uniformly lit. Is there a way to convert a non-uniformly lit image into a uniformly illuminated one? I have attached a photo here. Global thresholding makes it worse; adaptive thresholding works, but I need to change its parameters for different images.
Relevant answer
Answer
Dear Gautam,
My answer may seem simple, but for your application it works well.
Histogram equalization may solve the problem to some degree.
Homomorphic filtering is the better solution, because it models the image as the product of illumination and reflectance: f = i * r.
The simplest of all solutions could be this:
the image you have shown is a monochrome image, so you can convert it to a binary image with some threshold, e.g. 0.5.
After binarization the illumination is uniform throughout the whole image; if you are not satisfied with a background of 255, you can lower its intensity.
I hope your problem gets solved with these simple solutions.
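In the homomorphic spirit (image = illumination × reflectance), a cheap variant is flat-field correction: estimate the illumination with a heavy blur and divide it out, after which one global threshold tends to work. A NumPy/SciPy sketch on a synthetic gradient-lit page; the sigma is illustrative and should be much larger than the stroke width:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flatten_illumination(img, sigma=25):
    """Estimate the slowly varying background with a heavy Gaussian blur and
    divide it out, evening up the illumination across the page."""
    img = img.astype(float)
    background = gaussian_filter(img, sigma) + 1e-6   # avoid division by zero
    flat = img / background
    return (255 * flat / flat.max()).astype(np.uint8)

# Synthetic page: left-to-right illumination gradient with dark "text" strokes.
page = np.tile(np.linspace(80, 250, 200), (200, 1))
page[::20, :] *= 0.3                                  # dark horizontal strokes
flat = flatten_illumination(page)
```

After flattening, background rows span a much narrower intensity range than before, which is exactly what a fixed global threshold needs.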
  • asked a question related to OCR
Question
3 answers
Dear colleagues,
I am looking for a music OCR notation software, to edit handwritten music. The source of this music is scanned scores, written with pen. The platform I use is windows 7.
Any ideas would be helpful!
Thanks!
Yannis Kyriakoulis
Relevant answer
Answer
You can also try Audiveris, an open-source software.
Homepage link:   https://audiveris.kenai.com/
Regards,
Marco
  • asked a question related to OCR
Question
1 answer
I'm trying to look at ECAR and OCR in creatine-deficient cells using the Seahorse assay. I would prefer to run all my samples at the same time on a single 96-well plate; is it possible to freeze synaptosomes, as with cells, and store them for a later run?
Relevant answer
Answer
Hi Kenea,
Yes indeed, synaptosomes can be frozen for later use. We did not invent the method (look at the citations), but we have adapted it for an additional use: to introduce reagents into the terminals during the freezing step (see the paper below).
Good luck with your experiments.
Elise
Nath AR, Chen RHC and Stanley EF#.  Cryoloading: Introducing Large Molecules into Live Synaptosomes.  Frontiers in Cellular Neuroscience.  (2014) Jan 23;8:4. doi: 10.3389/fncel.2014.00004. eCollection 2014.
  • asked a question related to OCR
Question
3 answers
Is there a framework, for testing and research purposes, that contains the various levels of document recognition (preprocessing, segmentation, feature extraction, recognition and post-processing), allowing us to develop, test, compare and improve algorithms and contributions?
I found this http://gamera.informatik.hsnr.de/ , but it does not incorporate all the stages.
Thanks in advance
  • asked a question related to OCR
Question
5 answers
For implementing line and word segmentation of handwritten document images using matlab.
Relevant answer
Answer
Thank you Youssef Boulid for your valuable suggestions
  • asked a question related to OCR
Question
3 answers
I am doing research to create indexes that will contain names and other keywords. My source texts are written in Greek polytonic characters. I think it would be very useful to find a way to make them editable and searchable. Furthermore, in order to summarize and classify the information mined, I believe software with a stylometry function is needed. For these reasons I am looking for: a) OCR software, b) stylometry software.
Any kind of help will be greatly appreciated! Thank you!
Relevant answer
Answer
Hi, are you aware of the following open-source system?
I don't have personal experience with the program, but it seems it might be worth a try.
  • asked a question related to OCR
Question
5 answers
I need a hint on how to choose features for a connected alphabet. If you are not familiar with this script, just think of English handwriting where every letter in a word is connected.
Relevant answer
Answer
Here is a recent paper.
I hope, it will help you.
Good luck
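Whatever the paper proposes, a classic baseline feature for any script, connected or not, is zoning: split the binarized glyph (or a sliding window over the connected word) into a grid and take per-zone ink density. A minimal NumPy sketch; the grid size and test glyph are illustrative:

```python
import numpy as np

def zoning_features(glyph, grid=(4, 4)):
    """Split a binary glyph image into grid zones and return the ink density
    of each zone as a flat feature vector (a common OCR baseline)."""
    h, w = glyph.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            zone = glyph[i * h // gh:(i + 1) * h // gh,
                         j * w // gw:(j + 1) * w // gw]
            feats.append(zone.mean())   # fraction of ink pixels in this zone
    return np.array(feats)

glyph = np.zeros((32, 32))
glyph[:, 14:18] = 1          # a vertical stroke
print(zoning_features(glyph).round(2))
```

The integer-division slicing keeps the zones valid even when the image size is not a multiple of the grid, and the resulting vector feeds directly into any classifier.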
  • asked a question related to OCR
Question
3 answers
There are two types of Character recognition system:
1) Offline
2) Online
For online recognition, which devices are used to write the characters/words?
Relevant answer
Answer
If you are planning to implement one, try using deep-learning methods. See the following tutorial and webpage.
  • asked a question related to OCR
Question
3 answers
Which algorithms are most suitable for the post-processing step in an Arabic OCR system?
I need to improve the accuracy obtained, using some NLP (TAL) techniques; how can I start?
Thanks !!
Relevant answer
Answer
For the post-processing step you can use rule-based or probabilistic techniques, or a combination. As a simple technique you can start with decision trees. The most common approach is Hidden Markov Models. Neural networks are also popular, and not only in post-processing, e.g. recurrent neural networks with long short-term memory (LSTM).
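Before reaching for HMMs or LSTMs, a useful rule-free baseline is dictionary post-correction: snap each OCR word to its closest entry in a domain lexicon by string similarity. A Python sketch; the lexicon and cutoff are illustrative, and the same idea applies to Arabic given a suitable word list:

```python
from difflib import get_close_matches

LEXICON = ["recognition", "character", "optical"]   # hypothetical in-domain word list

def correct(word, lexicon, cutoff=0.6):
    """Snap an OCR word to its closest lexicon entry, if one is close enough;
    otherwise leave the word untouched."""
    match = get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return match[0] if match else word

print(correct("recogniti0n", LEXICON))   # "recognition"
```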
  • asked a question related to OCR
Question
4 answers
What are the possible and effective methods (quantitative/qualitative) to determine accuracy of Optical character recognition (OCR)?
Relevant answer
Answer
You can determine the accuracy by testing your recognition model. The easiest way is to divide your dataset into training and testing groups, then use the training data to train your system and evaluate it on the testing data. Each case that is recognized correctly increments a correct counter (R), and each case that is not recognized is counted as incorrect. The accuracy rate is then R / (number of testing cases).
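Case-level accuracy as described above is the coarsest measure; the character error rate (CER), computed from the edit distance between the recognized string and the ground truth, is the usual finer-grained one. A self-contained sketch:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(reference, hypothesis):
    """1 - CER: character-level accuracy against a ground-truth transcription."""
    return 1 - edit_distance(reference, hypothesis) / max(len(reference), 1)

print(char_accuracy("OPTICAL", "0PTICAL"))   # one substitution over 7 characters
```

Word error rate works the same way with word lists in place of character strings.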
  • asked a question related to OCR
Question
7 answers
Hi friends,
I want to classify scanned document images. Many methods exist, but they depend heavily on the text in the document. Please suggest the best algorithm for classifying a document without using its text.
I have added a few sample images. For example, I have 500 documents with different layouts; if I feed an image into the engine, it should tell me which type the document is (e.g. Form 16A, W-2 tax).
Relevant answer
Answer
One possible approach could be to detect the straight lines first using a Hough line detector, which identifies lines with their position and orientation (slope). As different layouts have different numbers of horizontal and vertical lines at different positions, the positions and orientations of the lines could be a pretty good feature.
Hope it helps.
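Where a full Hough transform is overkill, ruling lines in forms can often be counted from projection profiles alone, and the (horizontal, vertical) line counts then serve directly as layout features. A simplified NumPy stand-in for the Hough-based idea; the 0.7 coverage threshold is illustrative:

```python
import numpy as np

def rule_line_counts(binary, min_frac=0.7):
    """Count ruled lines from projection profiles: a row (column) whose ink
    covers more than `min_frac` of the page width (height) is a ruling line;
    consecutive flagged rows/columns are merged into one line."""
    h, w = binary.shape

    def count_runs(mask):
        edges = np.diff(np.r_[0, mask.astype(np.int8)])
        return int((edges == 1).sum())   # rising edges = distinct runs

    rows = (binary.sum(axis=1) / w) > min_frac
    cols = (binary.sum(axis=0) / h) > min_frac
    return count_runs(rows), count_runs(cols)

form = np.zeros((100, 100), dtype=np.int8)
form[20, :] = 1
form[60, :] = 1      # two horizontal rules
form[:, 50] = 1      # one vertical rule
print(rule_line_counts(form))   # (2, 1)
```

Unlike Hough, this only catches axis-aligned rules, which is usually enough for forms; skewed scans need deskewing first.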
  • asked a question related to OCR
Question
3 answers
I want to analyze character recognition using the Tesseract OCR engine for my scientific writing. I've read quite a lot of journal articles on the Tesseract OCR engine, where it is said that Tesseract can only be analyzed using fuzzy spaces and neural networks. I'm therefore a bit confused as to what exactly a fuzzy space is. Could anyone explain?
Relevant answer
Answer
A fuzzy space is a space for the representation of information. It is described by an n-dimensional vector whose components are in the range [0,1].
A fuzzy space can be constructed by a fuzzy partition with n fuzzy subsets.
For example, the FCM clustering algorithm generates a fuzzy space .
  • asked a question related to OCR
Question
1 answer
Dear Friends 
Are there any publicly available or licensed offline Tamil handwritten character databases?
If so, please provide links or names of the databases.
Many thanks
  • asked a question related to OCR
Question
3 answers
I have a requirement to scan large documents and extract the text from them. How should we scan the books, and what are the most efficient ways of doing this so as to get the best accuracy from an OCR program?
Relevant answer
Answer
You can use the following site:
Then create the following datasets
Data Set Information:
This database has been artificially generated by using a first-order theory which describes the structure of ten capital letters of the English alphabet, and a random-choice theorem prover which accounts for heterogeneity in the instances. The capital letters represented are the following: A, C, D, E, F, G, H, L, P, R. Each instance is structured and is described by a set of segments (lines) which resemble the way an automatic program would segment an image. Each instance is stored in a separate file whose format is the following:
CLASS OBJNUM TYPE XX1 YY1 XX2 YY2 SIZE DIAG
where CLASS is an integer number indicating the class as described below, OBJNUM is an integer identifier of a segment (starting from 0) in the instance and the remaining columns represent attribute values. For further details, contact the author.
Attribute Information:
TYPE: the first attribute describes the type of segment and is always set to the string "line". Its C language type is char.
XX1,YY1,XX2,YY2: these attributes contain the initial and final coordinates of a segment in a cartesian plane. Their C language type is int.
SIZE: this is the length of a segment computed by using the geometric distance between two points A(X1,Y1) and B(X2,Y2). Its C language type is float.
DIAG: this is the length of the diagonal of the smallest rectangle which includes the picture of the character. The value of this attribute is the same in each object. Its C language type is float.
Good Luck
  • asked a question related to OCR
Question
2 answers
I am working on degraded document image enhancement. Most approaches use complex techniques; if anyone knows a simple method for character segmentation, please tell me the paper name or a code link. Thank you in advance.
Relevant answer
Answer
Hi Sabari Nathan,
In this field you can use closing and opening operators alternately (morphological operators), with the structuring element best suited to your purpose. This method takes time and may need several passes, but it gives good results. To raise the success rate further, you can use machine learning and classifiers to improve the method.
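The open/close pair can be tried quickly with scipy.ndimage before investing in a learned method; the 3×3 structuring element and synthetic image below are just a starting point:

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def clean_binary(page, struct=np.ones((3, 3), dtype=bool)):
    """Opening removes isolated speckles; closing then bridges small gaps
    inside character strokes."""
    return binary_closing(binary_opening(page, structure=struct), structure=struct)

noisy = np.zeros((20, 20), dtype=bool)
noisy[5:15, 8:12] = True     # a thick stroke
noisy[2, 2] = True           # a lone noise pixel
cleaned = clean_binary(noisy)
```

On this example the lone pixel disappears while the stroke survives intact; on real degraded pages the structuring element size needs tuning to the stroke width.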
  • asked a question related to OCR
Question
3 answers
Is there anybody who is working on checking Plagiarism of Urdu text? Or on Urdu OCR?
Relevant answer
Answer
Dear friend
Greetings.
Till now, researchers have worked mostly on English-language plagiarism, as English is the major mode of communication across the entire globe; that is why research publications are mostly published in English. Besides this, there are other reasons too.
If you are willing to start working for Urdu plagiarism, then it will be a good work as it will help the Urdu language users.
I hope this helps you.
Best regards
Dr.Indrajit Mandal, Ph.D.
  • asked a question related to OCR
Question
4 answers
Hello, I'm searching for a good free OCR algorithm to recognize mathematical patterns. I need it to create an Android application that uses this algorithm to give users the possibility of turning a printed mathematical pattern into a string.
Relevant answer
Answer
The only commercial one I know that does math formulas is InftyReader; see the link in my previous post. Some commercial ones may possibly be trained to recognize math formulas, but I think this may not work very well. Also, most of this software runs on Windows.
One exception may be ABBYY Cloud OCR:
this is cloud-based, so it is suitable for building Android apps. However, I don't know whether it recognizes math formulas.
If you are interested in handwriting recognition, one commercial solution is MyScript.
  • asked a question related to OCR
Question
3 answers
Shall I use a symlet wavelet to create a mask for separating an image from a scanned text?
Relevant answer
Answer
Actually, symlet wavelets are a modified version of Daubechies wavelets with increased symmetry, and Daubechies wavelets are used for feature extraction in OCR systems.
  • asked a question related to OCR
Question
5 answers
I have been using a Seahorse Extracellular Flux assay to decipher the mechanism of action of a specific drug in glioblastoma primary cells. The basal rate of OCR and ECAR seems normal, however when media gets injected from Port A (instead of oligomycin) I get an increase in OCR and a sharp decrease in ECAR. This really puzzles me and does not allow me to ask further questions about my specific drug mechanism. Does anyone know how to fix this issue? Thank you!
Relevant answer
Answer
Hi Natalie,
Did you check the pH of the medium before adding it to port A, or have you ever checked it separately? Is it medium alone or medium with your specific drug? If it contains your drug, the drug may have a strong (alkaline) effect on pH. Do you keep your plate at 37 °C without 5% CO2 during the preincubation (48 min)? I think these factors could cause your problem.
Best,
Attila
  • asked a question related to OCR
Question
1 answer
Phonetic transcription.
Relevant answer
Answer
Have you checked CHILDES?
  • asked a question related to OCR
Question
5 answers
for shape analysis, topological and geometrical features.
Relevant answer
Answer
Yes, we used LBP features for OCR in the Birjand competition and took first place.
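For reference, a minimal pure-Python sketch of the basic 3x3 LBP operator (the histogram of codes is what would typically be fed to a classifier; the nested-list image format and skipping of border pixels are simplifications, not the competition code):

```python
def lbp_image(img):
    """Basic 8-neighbour local binary pattern of a grayscale image
    (nested lists); border pixels are skipped, so the result is
    2 pixels smaller in each dimension."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise ring
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << bit  # neighbour >= centre sets the bit
            out[y - 1][x - 1] = code
    return out

def lbp_histogram(img):
    """256-bin histogram of LBP codes -- the descriptor usually
    fed to the classifier."""
    hist = [0] * 256
    for row in lbp_image(img):
        for code in row:
            hist[code] += 1
    return hist

# On a flat patch every neighbour equals the centre, so all bits set
print(lbp_image([[5, 5, 5], [5, 5, 5], [5, 5, 5]]))  # [[255]]
```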
  • asked a question related to OCR
Question
4 answers
Many feature extraction methods are available for OCR. I need a technique with a high recognition rate.
Relevant answer
Answer
There are many types of techniques; which one is best depends on your application.
  • asked a question related to OCR
Question
3 answers
In OCR is there any research done for free-form handwritten recognition in gujarati or any other language?
Relevant answer
Answer
There has been a lot of research on printed Gujarati script, but little work in the domain of handwritten Gujarati script. There isn't any complete OCR system built for handwritten Gujarati. I have built a Gujarati OCR for printed and handwritten script, but there are still some problems I have to solve.
For Bangla, Hindi and Devanagari scripts, much more work has been done on both printed and handwritten text.
follow the link given below:
  • asked a question related to OCR
Question
5 answers
I am currently working on handwritten digit recognition. I need a confidence score for each recognized digit. Can any one help me solve this problem?
Relevant answer
Answer
If you use libSVM, it can output the confidence score for you. SHOGUN library also supports this functionality.
If you have to implement the algorithm by yourself, please read the articles listed below.
Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM TIST 2 (2011) 27 (Section 8)
Lin, H.-T., Lin, C.-J., Weng, R. C. 2007. A note on Platt's probabilistic outputs for support vector machines. Mach. Learn. 68, 267–276.
Sample codes:
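The Platt-scaling idea from the Lin et al. note can be sketched as follows. This toy version fits the sigmoid with plain gradient descent rather than the paper's more robust Newton method, and the decision values and labels are made up for illustration:

```python
import math

def platt_prob(f, A, B):
    """Platt's sigmoid: P(y=1 | f) = 1 / (1 + exp(A*f + B)),
    mapping an SVM decision value f to a confidence score."""
    return 1.0 / (1.0 + math.exp(A * f + B))

def fit_platt(decisions, labels, lr=0.01, iters=5000):
    """Fit A and B by gradient descent on the cross-entropy loss
    (a toy stand-in for the Newton method of Lin, Lin and Weng, 2007)."""
    A, B = -1.0, 0.0          # A < 0 so a larger f means higher confidence
    for _ in range(iters):
        gA = gB = 0.0
        for f, y in zip(decisions, labels):
            err = y - platt_prob(f, A, B)   # d(loss)/d(A*f + B)
            gA += err * f
            gB += err
        A -= lr * gA
        B -= lr * gB
    return A, B

# Hypothetical decision values from a digit-vs-rest SVM
A, B = fit_platt([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
print(platt_prob(2.0, A, B))   # high confidence, close to 1
```

In practice libSVM does this for you (train with `-b 1` and call the probability-output prediction routine), so the sketch is mainly useful if you must implement the algorithm yourself.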
  • asked a question related to OCR
Question
9 answers
I am trying to develop am image processing algorithm to automatically recognize the embossed digits on a credit card. I know that the font used is Farrington 7B (http://www.barcode-soft.com/farrington7b_font.aspx). So I have the numbers which I have to compare the digits to. The main problem lies in segmenting the individual digits. I have been able to crop out the approximate ROI (Region Of Interest) with some margin, from the image of the entire card, and a few of them have been attached in the file below. Please help me with the following :
1. Removal of background image
2. Segmenting the digits (This step is difficult because some of the cards have the same font color as the background color)
3. Identifying the digits (I am planning to use ANNs (Artificial Neural Networks), but I am not sure how it is going to pan out.)
Thanks in advance.
Relevant answer
Answer
An ANN seems overkill to me.
In the images, you have at least two different fonts (for those who cannot read docx files, rename it to zip, and you will find ten jpg and one png image).
Are the images calibrated in the sense that the height of a digit is always the same number of pixels? If yes, you are lucky, if not, you probably need to calibrate them or include magnification as an extra variable to optimize.
I would create a set of ten images for each font with a uniform background, calculate a Euclidean distance transform to the edges, and use these to do chamfer matching.
That method is fast and fairly tolerant to variable backgrounds.
Nice problem and good luck!
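A minimal sketch of the distance-transform plus chamfer-matching idea, assuming binary edge maps stored as nested lists (city-block distances instead of true Euclidean, for simplicity):

```python
def distance_transform(edges):
    """City-block distance to the nearest edge pixel via the classic
    two-pass chamfer algorithm (edges: 0/1 nested lists)."""
    h, w = len(edges), len(edges[0])
    INF = h + w
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    for y in range(h):                       # forward pass
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):           # backward pass
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d

def chamfer_score(dist, template, oy, ox):
    """Mean distance-transform value under the template's edge pixels
    when the template is placed at offset (oy, ox); lower is a better
    match. Slide this over the ROI and keep the minimum."""
    pts = [(y, x) for y, row in enumerate(template)
                  for x, v in enumerate(row) if v]
    return sum(dist[oy + y][ox + x] for y, x in pts) / len(pts)

edges = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
d = distance_transform(edges)
print(chamfer_score(d, [[1]], 1, 1))  # 0.0 -- perfect alignment
```

For the credit-card problem you would build one template set per font, as suggested above, and compare each candidate digit region against all templates.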
  • asked a question related to OCR
Question
13 answers
I am working on algorithm design for content based image retrieval of web-based arabic character image.
Relevant answer
Answer
You can also use Farsi character images. I recommend http://farsiocr.ir/
  • asked a question related to OCR
Question
2 answers
I'm involved in OCR, and would like to use a large dataset of printed characters (not handwritten). Is there such a dataset available? It would be nice to find one having different fonts and/or noisy images.
Relevant answer
Answer
Yes, I have prepared a large dataset, but it depends on which language you need. I have custom-built software that can create thousands of images per minute. Please contact me; my email address is (Dill.Nawaz@gmail.com).
  • asked a question related to OCR
Question
6 answers
I am working on text extraction of an image in Matlab. Can anybody tell me the basic path to follow regarding literature and implementation in steps. I will be thankful.
Relevant answer
Answer
Mainly it has two steps: text extraction and recognition. Text extraction involves image preprocessing (intensity normalization, etc.), text area localization (connected components, blob analysis) and removal of non-textual regions (by applying some non-text rules).
Recognition without a full OCR engine can be done using HMMs.
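The connected-component step mentioned above can be sketched as follows (pure Python BFS labeling on an already-binarized image stored as nested lists; a real pipeline would also filter the resulting boxes by size and aspect ratio to drop non-text blobs):

```python
from collections import deque

def connected_components(binary):
    """Label 4-connected foreground regions of a 0/1 image and
    return a list of bounding boxes (top, left, bottom, right) --
    candidate character/text blobs."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                q = deque([(y, x)])
                seen[y][x] = True
                t = b = y
                l = r = x
                while q:                      # BFS over the component
                    cy, cx = q.popleft()
                    t, b = min(t, cy), max(b, cy)
                    l, r = min(l, cx), max(r, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((t, l, b, r))
    return boxes

# Two blobs: a 1x2 run at the top-left, a lone pixel at bottom-right
print(connected_components([[1, 1, 0], [0, 0, 0], [0, 0, 1]]))
# [(0, 0, 0, 1), (2, 2, 2, 2)]
```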