OCR - Science topic
Explore the latest questions and answers in OCR, and find OCR experts.
Questions related to OCR
Hi everyone,
I’m currently working on a project involving the integration of a locally deployed large language model (we are currently using 14b DeepSeek) with a local knowledge base. The knowledge base mainly consists of a terminology glossary (we are still discussing the best data structure for storing it) and literature documents (such as PDFs or structured text).
I’d like to ask the community:
- What are some effective approaches or best practices for integrating LLMs with local knowledge sources?
- Are there any recommended tools, frameworks, or design patterns that have worked well for you?
- Any relevant academic literature or project references you would suggest?
I'm particularly interested in strategies for enabling the LLM to accurately retrieve and utilize information from the local knowledge base during inference.
I’ll be updating this post with any progress or insights we gain along the way, and I really appreciate any guidance or suggestions you may have!
Thanks in advance!
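For anyone looking for a concrete starting point, the usual pattern here is retrieval-augmented generation (RAG): embed the glossary entries and literature chunks, retrieve the passages most similar to the user's question, and prepend them to the prompt sent to the local model. Below is a minimal sketch in Python, assuming the sentence-transformers library for embeddings; the passage texts and the ask_local_llm() call (standing in for whatever client wraps the locally deployed DeepSeek model) are purely illustrative.

```python
# Minimal retrieval-augmented generation sketch (illustrative only).
# Assumes: sentence-transformers is installed; ask_local_llm() is a placeholder
# for whatever client wraps the locally deployed DeepSeek model.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any locally available embedding model works

# Knowledge base: glossary entries and literature chunks, pre-split into passages.
passages = [
    "Term A: definition of term A ...",
    "Excerpt from paper 1 ...",
]
passage_vecs = model.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k passages most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ q          # cosine similarity because vectors are normalized
    top = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in top]

def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return ask_local_llm(prompt)       # placeholder for the local DeepSeek endpoint
```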
Since reactive oxygen species (ROS) can influence mitochondrial respiration, I am wondering whether elevated ROS levels contribute to an increase in OCR during the assay; and if so, at which phase of the Mitostress Test (Basal, Oligomycin, FCCP, or Rotenone/Antimycin A)? Are there specific treatments (e.g., with antioxidants) where this effect is diminished?
I would highly appreciate your insights!
3D cell culture experiments and assays are often still carried out in the standard incubator at 5% CO2, which means around 19% oxygen. However, physiological values are different. In your opinion, how important is it to consider the physiological oxygen concentration when performing 3D cell culture assays?
We are currently working on a research project that focuses on these issues, including high-throughput and automation. I would appreciate a discussion and also participation in a short survey on this topic:
Hello, I performed a titration of FCCP in HEK cells and two other melanoma cell lines in order to do a mito stress test with the Seahorse technology. While I can clearly see an increase in OCR with 0.5 µM FCCP for the two melanoma cell lines, I can't see any increase in HEK cells (nor with 0.125 or 0.25 µM FCCP), and I see a decrease at 1 and 2 µM FCCP... Has anyone encountered this situation with HEK cells, and what can be done about it?
Thank you!
I am doing some Seahorse assays with bone marrow derived macrophages and I see no response to oligomycin. During an ATP rate assay, OCR is not decreasing and ECAR is not increasing. Any thoughts about this?
Hi everyone, I'm encountering a problem with fixing the cells after the Glyco and Mito stress assays in the Seahorse XFe96 analyser. I start with a 100% confluent monolayer before the assay, and after the assay most of the cells are detaching from the wells. This eventually affects the OCR and ECAR measurements. I'm working on an adherent cell line. Has any of you noticed this before?
Hi all.
I am doing some tests with transfections and getting this weird basal OCR profile in mito stress tests. Could someone tell me what is wrong here?
The cells look perfectly fine in wells both before and after the experiment.

Is there any application to convert Arabic text in an image into editable text for copy/paste?
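One hedged example of how this is commonly done programmatically: Tesseract with its Arabic language pack, driven from Python via pytesseract. The file name below is illustrative.

```python
# Sketch: extract Arabic text from an image so it can be copied/pasted.
# Assumes Tesseract is installed with the Arabic language data ("ara")
# and the pytesseract wrapper is available; the file name is illustrative.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("arabic_page.png"), lang="ara")
print(text)  # can now be copied or written to a file
```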
I am working with fibroblasts and recently conducted a mito stress test using seahorse technology. Surprisingly, I observed unexpectedly low basal oxygen consumption rates (OCR). Furthermore, instead of the anticipated increase in OCR following the oligomycin pulse, I observed a decrease. I'm puzzled about the potential reasons behind this unexpected outcome. Could you provide any insights or suggestions to help me identify the problem?
I recently ran an ATP rate assay using the Seahorse XFp mini plate (8 wells). I got OCR values way higher than I typically see in publications: basal OCR for control and experimental samples ranged from 300-500 pmol/min. I am wondering what could have gone wrong to produce such unexpectedly high values. Thank you in advance for any help!
Best,
Samina.
Hi all,
I've recently upgraded from the Agilent Seahorse XF24 to the XFe24 and have started having issues with the OCR calibration. The rep suggested that storing the hydrated cartridge at 4°C, even for less than 72 hours, is potentially damaging the probes and messing with the calibration, so I should keep it in the 37°C incubator even for more than 24 hours.
Has anyone had issues like this? I'm nervous about what's going to grow in the wells over the weekend (and of course I don't want the cartridge to dry out!).
Thanks for any help!
Melanie
I want to compare recognition rates with other datasets, so I need isolated printed Arabic character datasets.
Hello, I performed a titration of FCCP in HaCaT cells in order to do a mito stress test with the Seahorse technology (Agilent). I have tried 0.125, 0.25, 0.5, 1 and 2 µM FCCP, but I can't see an increase in OCR (it even drops below the basal level). Has anyone encountered this situation with HaCaT cells, and what can be done about it?
Thank you!
Does anyone get errors when using the recognition part of "Scene Text Detection and Recognition by CRAFT and a Four-Stage Network"?
During Seahorse mito stress assays, we see great variance between different cell lines in their oxygen consumption. One cell line is tricky: at relevant confluency, the basal OCR is quite high, and then it gets very high after adding FCCP; correspondingly, the oxygen level in the wells drops dramatically and has been down to between 10 and 20 mmHg. How low can the O2 level be without causing hypoxia? I realize that if we get hypoxic conditions during the run, the OCR values will take a "false" value that is lower than it should be, because the cells do not have enough O2 to consume. I'm just unsure when this kicks in - is it around 20, 10 or 5 mmHg?
I recently came to know about the commercial service https://mathpix.com/ which claims to convert mathematical formulas from (scanned) pdf or even handwritten text to LaTeX.
I have no experience with this. I am interested whether there is an open source solution which solves the same (or a similar) problem.
I created a new OCR dataset for detecting non-standard license plates. When I use the model proposed in "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation", the F-measure is 98% and the FPS is 14. Is there any other innovation point for this dataset? Improving inference speed? Creating a new quadrilateral target detection method?

Hi everybody. I want to use an Agilent Seahorse XFe24 analyzer for my metabolic assays. My cells will be embedded in hydrogels so I can design a specific environment for them. However, the probes (sensors) of the machine can only be lowered to 200 µm from the bottom of the plate. This height (200 µm) is very small for my hydrogels. When I checked the literature, I found that many articles use microbeads, but their diameter was also more than 200 µm. I am not sure how they were able to take accurate measurements from the analyzer even though the height of their gels was more than it should be. Does anyone have an idea, or has anyone had the same problem? Thanks for the advice.
Hello,
I would like to extract all the walls from complex .DWG files that come from converting a .PDF with the AutoDWG software. In these big multi-floor plans, there are multiple types of walls (exterior, office, corridor, etc.) that have their own thickness and interior composition (trusses, insulation, etc.), and it is not even always clear to the eye where one type of wall ends and another starts. To give you an idea of the problem, here are two examples of very small regions of these kinds of plans.
Note that I'm NOT interested in starting from a high resolution image of a scanned floor plan as the information is already vectorial in the DWG file. Also, I must do it by programming and as little user intervention as possible as this must be part of a bigger software solution.
I would be glad if anybody could propose software, libraries or papers describing methods to extract wall information from such a .DWG file. I guess OpenDWG seems to be part of the solution but if someone has more first hand experience with this problem, I would like to know about that!


Hi,
I want to develop an algorithm to recognise Arabic (Moroccan) plates, so I am using the OpenALPR library with Tesseract. My question is: how do I train the ALPR OCR to recognise Arabic plates?
Hi everyone. For the last part of my project I have to see how my program responds to a magnified image. The magnification has to be part of the code. I've tried imresize(); the image is magnified, but its dimensions change too, and that is a problem for my code. I want to know how I can scale or magnify my image while the dimensions stay the same. My images are grayscale with a black background and a white Persian letter in the center (I've attached a sample).
I actually need to magnify the letter itself.
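A common way to magnify content while keeping the output dimensions fixed is to resize by the zoom factor and then crop back to the original size around the center. The sketch below shows the idea in Python with OpenCV (the questioner is working in MATLAB, where the same resize-then-crop approach applies to imresize); the file name and zoom factor are illustrative.

```python
# Sketch: magnify the content of a grayscale image while keeping the output
# dimensions unchanged, by resizing and then center-cropping.
# Assumes factor >= 1; file name and factor are illustrative.
import cv2

def magnify_keep_size(img, factor: float):
    h, w = img.shape[:2]
    big = cv2.resize(img, None, fx=factor, fy=factor, interpolation=cv2.INTER_LINEAR)
    bh, bw = big.shape[:2]
    y0 = (bh - h) // 2
    x0 = (bw - w) // 2
    return big[y0:y0 + h, x0:x0 + w]   # same shape as the input

img = cv2.imread("persian_letter.png", cv2.IMREAD_GRAYSCALE)  # illustrative file name
zoomed = magnify_keep_size(img, 1.5)
```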

Hello, I have a scenario where the given image is of a bottle/can with text all over it. A demo image is attached. As we can see, the text wraps around from left to right, and any OCR system may miss the text at the left and right edges.
So is there any solution for this, such as preprocessing the image in a certain way so that we can read the text, or flattening this round surface into a straight one?
Thanks.
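One rough preprocessing idea is to approximate the bottle as a vertical cylinder viewed head-on and remap image columns from their projected position back to arc length before running OCR. The sketch below assumes the label crop is centered on the cylinder and that the apparent radius equals half the crop width; real setups usually need calibration of the center and radius, and the file name is illustrative.

```python
# Rough sketch: "unwrap" text on a cylindrical label by remapping image columns
# from projected position (R*sin(theta)) back to arc length (R*theta).
# Assumes the cylinder axis is vertical and centered in the crop, and that the
# visible label spans the apparent radius R in pixels; values are illustrative.
import numpy as np
import cv2

img = cv2.imread("bottle_label_crop.png")       # illustrative file name
h, w = img.shape[:2]
cx = w / 2.0                                    # cylinder center column
R = w / 2.0                                     # apparent radius in pixels

out_w = int(np.pi * R)                          # unwrapped width = half circumference
u = np.arange(out_w, dtype=np.float32)
theta = (u - out_w / 2.0) / R                   # arc length -> angle
map_x = np.tile(cx + R * np.sin(theta), (h, 1)).astype(np.float32)
map_y = np.tile(np.arange(h, dtype=np.float32).reshape(-1, 1), (1, out_w))

flat = cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
# 'flat' can then be passed to a standard OCR engine.
```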

Initially, I intend to train an algorithm with a deep learning approach, based on the literature. However, rather than text in the wild, my goal is to recognize embossed text such as in the attached figure.
Given that, does anyone know of a dataset that contains this type of text, not necessarily related to railways?

Just wondering if anyone has used Seahorse XF24 ECAR and OCR plates that have expired. We have some from Jan 2017 and I'm wondering if they will be ok.
Let me know!
Thanks.
I am working on a real-time text extraction problem, where I have the option of either capturing an image of an object or taking a video of it and then doing the text extraction.
I am trying to understand the advantages and disadvantages of both methods: for example, with a photo the problem could be image quality, while the advantage could be the shorter time taken to process a single image.
Similarly, with video the effective image quality may be better, but selecting the best frame could be a challenge, and it also looks computationally more intensive.
Can anyone list potential advantages and disadvantages of both approaches?
I need to convert manually annotated IPA transcriptions into digital format. Does anybody know of an OCR system able to recognise IPA symbols (even just for English phonemes)?
Is it possible to adapt the multinomials of a character sequence from Pascal's triangle - I mentioned this concept in my work - to act as a dot-to-graphical-form analyser in a hashed PDF? Could the hex values then be compared to the arrangement by OCR?
Is this suitable or not? If it is, where can I find this algorithm so I can try to improve it?
I mean, models like ASM and AAM can be used for face recognition; which model would be suitable for signature verification?
Thanks, all.
Following titrations, I established 20,000 cells per well to be ideal, using 1 µM oligomycin.
I initially ran the experiments with 40,000 cells per well, but found this to be way over 100% confluent.
I therefore reduced the cell number to 20,000 per well.
However, for the FCCP titration I don't see a significant rise in OCR regardless of whether I use 1, 1.5 or 2 µM. I'm starting to think that perhaps 20,000 cells/well is not enough.
Has anyone else had this problem with fibroblasts? Would it be wise to try a higher cell density even though the confluence would be way over 100%, or perhaps worth titrating FCCP above 2 µM?
I am searching for scientific works, tutorials, open-source projects, etc. to learn about OCR engine development. I am strongly interested in applied software development in open collaboration, so I think this could be a great project to start. Does anyone know of work on neural-network-based OCR?
Hi,
When studying the respiratory function of isolated mitochondria by measuring OCR using the Seahorse:
I'm using a buffer (MAS) that contains pyruvate+ malate.
After measuring the basal respiration- ADP, Oligomycin, FCCP and sodium azide are being added by sequential injections:
Basal respiration--> ADP injection--> Oligomycin injection--> FCCP injection--> sodium azide injection.
I'm getting the expected trends, but I was wondering what could be the reason for getting lower OCR after adding FCCP (in comparison to the OCR after adding ADP)?
I saw in many different papers (where similar injections were done) that sometimes the OCR is higher after adding FCCP, but sometimes the OCR is lower after adding FCCP (again- in comparison to the OCR after adding ADP).
Could you help me figure it out? I'm sure that I'm missing something. :)
Many thanks for your help!
I am working on a problem where I need to read the label number of a box as it moves across the camera's field of view. Currently, I am using an HIK VISION 4 MP 12 mm IP camera at 1/100 s exposure. But the results are not accurate, and I would like to know whether the current camera is really sufficient for reading letters on a label. Any camera recommendations would be really helpful for my work.
Is handwriting recognition software admissible in a court of law with respect to reports filed by forensic document examiners?
What accuracy rate must such software have to be admissible in a court of law?
In one phase of my project, I need to extract and recognise text from computer-generated medical prescriptions. Although I have done this with Tesseract OCR, I'm looking for better and faster approaches.
I laminated the LCD/glass with an acrylate OCR (optically clear resin) and observed some discoloration of the LCD after 95°C aging, and also after a solar test; it looks red.
I heard the PVA-iodine complex can be degraded under heat or UV exposure.
But I cannot find the mechanism by which this occurs or why it produces a red color...
Could you help me clarify this issue, please?
Thank you in advance!
Does anyone know a good algorithm for segmenting touching handwritten characters, or a link to code I could apply? Any suggestion on where to start would help.
I appreciate any ideas.
Thank you in advance.
Several OCR tools for printed text are available, and a few HCR tools/apps are available that aim to convert handwritten text into digital form. But I have some documents that contain both: certain portions are printed and the rest is handwritten. Please suggest a system that provides good accuracy on such mixed text.
Machine: Agilent Seahorse XFp Analyzers
I want to measure the OCR and ECAR of live cells using an Agilent Seahorse flux analyzer (with 96 wells) with cancer cell lines.
I would like to learn exactly how I can normalize the results to the cell number per well after the run. I know the cell number before the experiment, but it can change during the run.
Could you suggest any updated methods for this problem?
Thank you.
In order to test the effect of a molecule (A) on metabolic patterns, I analyzed changes in ECAR and OCR in neurons and astrocytes. In neurons, A augmented ECAR but had no effect on OCR. However, in astrocytes, A increased both ECAR and OCR. Can anyone give me some ideas about these results? I am very confused. Thanks in advance.
Dear Members,
I am trying to work on a solution to recognize handwritten text from scanned forms, such as banking forms, insurance forms, etc. I need your help finding some resources to study on this. Also, if there is any link to refer to, that would be a great help.
Dear researchers,
when I use Tesseract for character recognition, I get many wrong results where the character width or height is only about 4 pixels.
Is there any way to restrict Tesseract's recognition to a range of character sizes?
thank you very much for your time.
Best Regards
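I am not aware of a single Tesseract option that reliably enforces a character-size range, but one workaround is to run recognition and then discard results whose bounding boxes fall outside the expected size, using the per-item boxes that pytesseract.image_to_data reports. The threshold and file name below are illustrative.

```python
# Workaround sketch: run Tesseract, then discard results whose bounding boxes
# are too small (e.g. the ~4 px false "characters" mentioned above).
from PIL import Image
import pytesseract
from pytesseract import Output

MIN_SIZE = 8  # pixels; illustrative threshold

data = pytesseract.image_to_data(Image.open("page.png"), output_type=Output.DICT)
kept = [
    data["text"][i]
    for i in range(len(data["text"]))
    if data["text"][i].strip()
    and int(data["width"][i]) >= MIN_SIZE
    and int(data["height"][i]) >= MIN_SIZE
]
print(" ".join(kept))
```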
Dear researchers,
Reading numbers from an image is very useful in our work. The tesseract package seems to be a new and powerful OCR tool in R.
I would appreciate it if there were examples of how to use this package in programs.
Best Wishes
Ali Madad
Hello dear researchers,
I have hundreds of thousands of aerial images, each with different numbers on them, which I need to recognize.
Are there any free OCR tools in Python (or other platforms) for this purpose?
Many thanks for your time
Ali Madad
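A minimal free option is Tesseract via pytesseract with the character set restricted to digits (whitelist support depends on the Tesseract version and engine). The directory name and page-segmentation mode below are assumptions; aerial images usually also need cropping or thresholding around the numbers first.

```python
# Sketch: free digit recognition with Tesseract via pytesseract,
# restricting the character set to digits. Paths and --psm are illustrative.
import glob
from PIL import Image
import pytesseract

config = "--psm 7 -c tessedit_char_whitelist=0123456789"  # single text line, digits only

for path in glob.glob("aerial_tiles/*.png"):
    number = pytesseract.image_to_string(Image.open(path), config=config).strip()
    print(path, number)
```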
I have transduced HEK293T cell lines and am attempting to characterize some cellular and mitochondrial phenotypes after KD of my gene of interest. Seahorse is one of my main proposed assays, but I can't seem to get acceptable results in terms of my variation and error bar sizes. They are literally > 100 units +/- for both ECAR and OCR. I have tried a range of about 4 cell densities from 1x10^4 to 2x10^5 and found a good one around 4x10^4 for 50-90% confluence 12 hours (O/N) before starting. I have tried 3-6 replicates (wells) per sample. I coat the plates with poly-D-lysine beforehand since HEK are so loosely adherent, and I reduced protocol mixing to a minimum of 1 so the plate is not shaken so much. That seemed to help a lot, but still not enough. I always check the cells before and after, and I normalize by cell number using CyQuant, which should fix a lot of the problem too. Also, I just purchased Cell-Tak since so many scientists have sworn by it, but I haven't had the chance to test with this yet. The only other thing I can think of to troubleshoot this is to try another cell line known to work well as a positive control. I wonder, do others experience such insane variation too and just remove outliers to produce nice data, or is it just me? If I did want to remove outliers, is it possible within the Wave software, or would I need to go through and manually analyze the raw data?
Thank you for any advice in advance,
-A desperate PhD student
I am working on techniques to obtain high-resolution reconstructed images of license plates. The source of these images is CCTV video footage.
Hello,
I need an effective tool for converting handwritten text to MS Word text that can be edited. Please state your recommendation based on your personal experience.
I am doing character recognition on licence plates. I want to use MATLAB's built-in ocr function for recognition, but I am not able to make it recognise all characters clearly even though I did some preprocessing. I want to use the OCR Trainer, but there is no proper material on how to do so. Please suggest any other method for doing the recognition.
I am developing a licence plate detector and recogniser. I developed the detector using a convolutional neural network, which has state-of-the-art accuracy on real images. Now comes the second part, developing a recogniser: I trained an OCR, but it does not give state-of-the-art accuracy. Please suggest any method that can recognise numbers from real-world number plates.
Working on a project for work to streamline the data collection process.
I have a set of book images that need to be cropped (there are no consistent patterns in the photos) and enhanced before applying OCR; the images are from old books. Does anyone have a suggestion or an example algorithm for such automation?
I would like to use such a procedure in my thesis to streamline the process of data processing while avoiding data entry errors.
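As a hedged sketch of one possible automation step: find the page as the largest bright region, crop to it, and apply an adaptive threshold to even out stains and lighting before OCR. The thresholds, kernel sizes and file names below are illustrative, and old scans often need per-collection tuning.

```python
# Sketch of a simple automated crop-and-clean step before OCR, assuming the
# page is the largest bright region in the photo; parameters are illustrative.
import cv2

def crop_and_binarize(path: str):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)

    # Find the page as the largest contour in a rough binarization.
    _, rough = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(rough, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    page = gray[y:y + h, x:x + w]

    # Adaptive threshold to even out stains and uneven lighting before OCR.
    clean = cv2.adaptiveThreshold(page, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY, 31, 15)
    return clean

cleaned = crop_and_binarize("book_photo_001.jpg")  # illustrative file name
cv2.imwrite("book_photo_001_clean.png", cleaned)
```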
Does anyone know a free or less expensive alternative to the FlexiCapture software (https://www.abbyy.com/flexicapture/) for character recognition (OMR type) and data extraction from questionnaires?
Oxygen consumption rate (OCR), maximal oxygen consumption and mitochondrial reserve capacity are good indicators of mitochondrial function.
To measure this you need an oxygen electrode and the use of inhibitors such as oligomycin, FCCP, etc.
But is there an alternative to the use of oxygen electrode?
Can you suggest any other good methods to assess mitochondrial function?
(Apart from measuring ATP or ROS formation...)
Dear researchers,
I want to know about the emerging topics in the field of "document analysis and recognition" and its related areas, such as pre-processing, document layout analysis, OCR technologies, etc.
Thank you in advance.
I am working on the development of OCR for the Odia language. However, segmentation of old scanned documents remains an unresolved problem. Please suggest relevant documents or techniques.
I found databases for Arabic characters and numerals.
I also found several Arabic Words Databases (printed and handwritten)
However, I am looking for full documents, text with images.
Thank you
I'm running a Seahorse fatty acid oxidation assay in THLE-2 human hepatocytes. The basal OCR is very low (~30 pmol/min) and declines steadily over the course of the assay. I've attempted this assay several times, adjusting the starvation parameters to no avail; both the rep and I are pretty confident this is not caused by low cell number (they're in a confluent monolayer) or gradual cell death. I'm also pretty sure that the respective concentrations of oligomycin and FCCP are OK, as I've titrated these successfully with no problem whatsoever. It's been suggested that I adjust the media supplements, but I don't want to detract from the point of the assay by encouraging my cells to use glucose/glutamine rather than fatty acids. Has anyone else encountered this problem, or have any suggestions with regard to troubleshooting?
Thanks!
I'm wondering what kind of integrity the cells have afterwards. I'm using Cell-Tak to adhere bovine lymphocytes and then interrogating their OCR and ECAR with the Seahorse instrument.
How can Tesseract, along with OCRFeeder, be implemented in a web application? I need help with the integration.
I have hundreds of PDF files, i.e. (A1, A2, A3, ...).pdf, and I need to compare the content of each page with the corresponding page in another set of PDF files, i.e. (B1, B2, B3, ...).pdf. Is there a way to import each page of both corresponding PDF files as an image and do some sort of image processing to see whether both pages share the same or similar content? If possible, make it so that I can input the PDF file locations and the code does the rest. Also, compute a numerical representation of the image comparison, e.g. 100 means both pages are the same, 90-99 means similar, and anything less means human intervention is needed to make a judgment.
The PDF files contain scanned reports with lots of text (OCR), some tables and a few images, and are at least 50 pages long.
I am familiar with MATLAB and Python.
I would greatly appreciate hints and suggestions that will help solve this problem.
Thank you
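One way to approach this in Python is to render each pair of corresponding pages to images (e.g. with PyMuPDF) and score their similarity with SSIM, rescaled to the 0-100 range described above. The sketch below is illustrative: file names, the rendering zoom and the 90/99.5 cut-offs are assumptions, and scanned pages usually need deskewing/alignment before pixel-level comparison is meaningful.

```python
# Sketch: render matching pages of two PDFs and score similarity on a 0-100
# scale using SSIM. Assumes PyMuPDF (fitz), OpenCV and scikit-image are
# installed; file names and cut-offs are illustrative.
import fitz                      # PyMuPDF
import numpy as np
import cv2
from skimage.metrics import structural_similarity as ssim

def page_to_gray(page, zoom: float = 2.0) -> np.ndarray:
    pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
    img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)
    return cv2.cvtColor(img[:, :, :3], cv2.COLOR_RGB2GRAY)

def compare_pdfs(path_a: str, path_b: str):
    doc_a, doc_b = fitz.open(path_a), fitz.open(path_b)
    for i in range(min(len(doc_a), len(doc_b))):
        a = page_to_gray(doc_a[i])
        b = page_to_gray(doc_b[i])
        b = cv2.resize(b, (a.shape[1], a.shape[0]))       # align sizes for SSIM
        score = ssim(a, b) * 100                          # 100 = identical
        flag = "same" if score > 99.5 else "similar" if score >= 90 else "needs human check"
        print(f"page {i + 1}: {score:.1f} ({flag})")

compare_pdfs("A1.pdf", "B1.pdf")  # illustrative file names
```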
I'm working on tesseract-ocr. I want to know how line finding is done in Tesseract. Of course, comments are given in the code, but I'm not getting it. Can anyone suggest any documents or good algorithms for line finding?
I'm working on OCR. I want to improve the accuracy of the Tesseract open-source OCR engine. It works OK if the image has uniform lighting, but it fails when the image is non-uniformly lit. Is there any way to convert a non-uniformly lit image into a uniformly illuminated one? I have attached one photo here. When we threshold it with global thresholding, that makes it worse; adaptive thresholding works, but I need to change the parameters of the adaptive thresholding function for different images.
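A simple illumination-flattening step that often helps before a single global threshold is to divide the image by a heavily blurred estimate of its background; a sketch in Python/OpenCV follows, with the blur strength and file names as illustrative values.

```python
# Sketch: flatten non-uniform illumination by dividing the image by a blurred
# estimate of the background, then apply a single global (Otsu) threshold.
# Blur strength and file names are illustrative and depend on text size.
import cv2

gray = cv2.imread("uneven_light.jpg", cv2.IMREAD_GRAYSCALE)

# Estimate the slowly varying background with a large blur.
background = cv2.GaussianBlur(gray, (0, 0), sigmaX=25)

# Divide out the background so lighting becomes roughly uniform.
flat = cv2.divide(gray, background, scale=255)

# Now a single global threshold behaves consistently across the whole image.
_, binary = cv2.threshold(flat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("uneven_light_binarized.png", binary)
```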

Dear colleagues,
I am looking for music OCR (notation) software to edit handwritten music. The source of this music is scanned scores written with pen. The platform I use is Windows 7.
Any ideas would be helpful!
Thanks!
Yannis Kyriakoulis
I'm trying to look at ECAR and OCR in creatine-deficient cells using the Seahorse assay. I would prefer to run all my samples at the same time on a single 96-well plate. Is it possible to freeze synaptosomes, as with cells, and store them for a later run?
I need to know about PSO (particle swarm optimization) for text segmentation, so that PSO can segment the text in the picture and it can then be read by the OCR.
Is there a framework for testing and research purposes that contains the various stages of document recognition (preprocessing, segmentation, feature extraction, recognition and post-processing), allowing us to develop, test, compare and improve algorithms and contributions?
I found http://gamera.informatik.hsnr.de/ , but it does not incorporate all the stages.
Thanks in advance
I am looking for help implementing line and word segmentation of handwritten document images using MATLAB.
I am doing research to create indexes that will contain names and other keywords. My source texts are written in Greek polytonic characters. I think it would be very useful to find a way to make them editable and searchable. Furthermore, in order to summarize and classify the information mined, I believe that a software tool with stylometry functionality is needed. For the above reasons I am looking for: a) OCR software, b) stylometry software.
Any kind of help will be greatly appreciated! Thank you!
I need a hint on how to choose features for a connected script. If you are not familiar with this language, just think of English cursive handwriting, where every letter in a word is connected.
There are two types of Character recognition system:
1) Offline
2) Online
For online recognition, which device is used to write the characters/words?
Which are the most suitable algorithms for the post-processing step in an Arabic OCR system?
I need to improve the obtained accuracy using some TAL (natural language processing) technique... how can I start?
Thanks !!
What are possible and effective methods (quantitative/qualitative) to determine the accuracy of optical character recognition (OCR)?
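The most common quantitative measure is the character error rate (CER): the edit (Levenshtein) distance between the OCR output and a ground-truth transcript, divided by the transcript length; the word error rate (WER) is the same formula applied to word sequences. A small self-contained Python sketch with illustrative strings:

```python
# Sketch: character error rate (CER) = edit distance / length of ground truth.
def edit_distance(a, b) -> int:
    # Classic dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[-1]

def cer(ocr_text: str, truth: str) -> float:
    return edit_distance(ocr_text, truth) / max(len(truth), 1)

print(cer("he11o world", "hello world"))   # 2 errors / 11 chars ≈ 0.18
```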
Hi friends,
I want to classify scanned document images, and there are many methods, but they depend heavily on the text in the document. Please suggest a good algorithm that can classify the documents without using the text.
I have added a few sample images. For example, I have 500 documents with different layouts. If I feed an image into the engine, it should tell me which type this document is (e.g. Form 16A, W-2 tax).


I want to analyze character recognition using the Tesseract OCR engine for my scientific writing. I've read quite a lot of journal articles on Tesseract OCR engines, where it is said that Tesseract can only be analyzed using fuzzy space and neural networks. I'm therefore a bit confused as to what exactly fuzzy space is. Could anyone explain?
Dear Friends
Are there any publicly available or licensed offline Tamil handwritten character databases?
If so, please provide links or names of the database.
Many thanks
I have a requirement to scan large documents and extract the text from them. How should we scan the books, and what are the most efficient ways of doing this? How can I do this most efficiently and get the best accuracy from an OCR program?
I am working on degraded document image enhancement. Most existing approaches use complex techniques. If anyone knows a simple method for character segmentation, please tell me the paper name or a code link. Thank you in advance.
Is there anybody who is working on checking Plagiarism of Urdu text? Or on Urdu OCR?
Hello, I'm searching for a good free OCR algorithm to recognize mathematical patterns. I need it to create an Android application that uses this algorithm to let users obtain a string from a printed mathematical expression.
Shall I use a symlet wavelet to create a mask for separating an image from a scanned text?
I have been using a Seahorse Extracellular Flux assay to decipher the mechanism of action of a specific drug in glioblastoma primary cells. The basal rate of OCR and ECAR seems normal, however when media gets injected from Port A (instead of oligomycin) I get an increase in OCR and a sharp decrease in ECAR. This really puzzles me and does not allow me to ask further questions about my specific drug mechanism. Does anyone know how to fix this issue? Thank you!
I am interested in shape analysis, using topological and geometrical features.
Many feature extraction methods are available for OCR; I need a technique with a high recognition rate.
In OCR, has any research been done on free-form handwriting recognition in Gujarati or any other language?
I am currently working on handwritten digit recognition. I need a confidence score for each recognized digit. Can anyone help me solve this problem?
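One common approach, sketched below, is to use a probabilistic classifier and report the probability assigned to the predicted class as the confidence. The example uses scikit-learn's small built-in digits dataset and a logistic regression purely as stand-ins for your own features and model.

```python
# Sketch: per-digit confidence as the probability of the predicted class.
# The digits dataset and logistic regression are illustrative stand-ins.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

probs = clf.predict_proba(X_test[:5])          # class probabilities per sample
for p in probs:
    digit = p.argmax()
    confidence = p[digit]                      # e.g. 0.97 -> report as 97%
    print(f"recognized {digit} with confidence {confidence:.2f}")
```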
I am trying to develop an image processing algorithm to automatically recognize the embossed digits on a credit card. I know that the font used is Farrington 7B (http://www.barcode-soft.com/farrington7b_font.aspx), so I have the reference digits to compare against. The main problem lies in segmenting the individual digits. I have been able to crop out the approximate ROI (region of interest) with some margin from the image of the entire card, and a few of them have been attached in the file below. Please help me with the following:
1. Removal of background image
2. Segmenting the digits (This step is difficult because some of the cards have the same font color as the background color)
3. Identifying the digits (I am planning to use ANNs (Artificial Neural Networks), but I am not sure how it is going to pan out.)
Thanks in advance.
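Not an authoritative answer, but one sketch that sidesteps the background-color problem: since the embossed font is fixed-pitch, enhance the embossing edges with a morphological gradient (which responds to relief rather than color), locate the horizontal extent of the digit group, and cut it into equal-width cells for a classifier. The group size of four digits, the kernel size and the file name below are assumptions.

```python
# Rough sketch exploiting the fixed pitch of embossed card digits.
# Assumes a cropped grayscale ROI containing one group of 4 digits;
# kernel size, threshold and file name are illustrative.
import cv2
import numpy as np

roi = cv2.imread("card_digit_group.png", cv2.IMREAD_GRAYSCALE)

# Embossing shows up as edges, independent of background color/texture.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
edges = cv2.morphologyEx(roi, cv2.MORPH_GRADIENT, kernel)
_, mask = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Horizontal extent of the digit group from the column-wise edge energy.
col_energy = mask.sum(axis=0)
cols = np.where(col_energy > 0.1 * col_energy.max())[0]
x0, x1 = cols[0], cols[-1]

# Fixed pitch: split the group into equal-width digit cells.
n_digits = 4
width = (x1 - x0) / n_digits
digits = [mask[:, int(x0 + i * width):int(x0 + (i + 1) * width)] for i in range(n_digits)]
# Each cell in 'digits' can then be fed to a classifier (e.g. a small ANN).
```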
I am working on algorithm design for content-based image retrieval of web-based Arabic character images.
I'm involved in OCR, and would like to use a large dataset of printed characters (not handwritten). Is there such a dataset available? It would be nice to find one having different fonts and/or noisy images.
I am working on text extraction from an image in MATLAB. Can anybody tell me the basic path to follow regarding the literature and the implementation steps? I would be thankful.