Science topic
Image Recognition - Science topic
Explore the latest questions and answers in Image Recognition, and find Image Recognition experts.
Questions related to Image Recognition
2025 2nd International Conference on Advanced Image Processing Technology (AIPT 2025) will be held on May 23-25, 2025 in Guangzhou, China.
Conference Website: https://ais.cn/u/22a2iy
---Call for papers---
The topics of interest include, but are not limited to:
◕ Image Processing Fundamentals and Techniques
Image Enhancement
Image Recovery and Reconstruction
Image Compression Techniques
Edge Detection Methods
Image Generation and Synthesis
......
◕ Emerging Technologies and Trends
Artificial Intelligence and Image Processing
Combining Natural Language Processing with Image Processing
Combination of Computer Vision and Image Processing
Quantum Computing in Image Processing
Edge Computing and Image Processing
......
◕ Image Analysis and Applications
Image Recognition
Target Detection and Tracking
Image Segmentation
Multi-label Image Classification
Dynamic Scene Understanding
Visual SLAM (Simultaneous Localization and Mapping)
......
---Publication---
All papers will be reviewed by two or three expert reviewers from the conference committees. After a careful reviewing process, all accepted papers will be published in the Proceedings of SPIE - The International Society for Optical Engineering (ISSN: 0277-786X) and submitted to EI Compendex, Scopus and Inspec for indexing.
---Important Dates---
Submission Date: April 15, 2025
Registration Deadline: April 29, 2025
Final Paper Submission Date: May 18, 2025
Conference Dates: May 23-25, 2025
---Paper Submission---
Please send the full paper (Word + PDF) via the Submission System:
2024 4th International Conference on Image Processing and Intelligent Control (IPIC 2024) will be held from May 10 to 12, 2024 in Kuala Lumpur, Malaysia.
Conference Website: https://ais.cn/u/ZBn2Yr
---Call For Papers---
The topics of interest for submission include, but are not limited to:
◕ Image Processing
- Image Enhancement and Recovery
- Target Detection and Tracking
- Image Segmentation and Labeling
- Feature Extraction and Image Recognition
- Image Compression and Coding
......
◕ Intelligent Control
- Sensors in Intelligent Photovoltaic Systems
- Sensors and Laser Control Technology
- Optical Imaging and Image Processing in Intelligent Control
- Applications of Fiber-Optic Sensing Technology in Intelligent Photoelectric Systems
......
All accepted papers will be published in the conference proceedings and submitted to EI Compendex, Inspec and Scopus for indexing.
Important Dates:
Full Paper Submission Date: April 19, 2024
Registration Deadline: May 3, 2024
Final Paper Submission Date: May 3, 2024
Conference Dates: May 10-12, 2024
For more details, please visit:
Invitation code: AISCONF
*Using the invitation code during submission/registration gets you priority review and feedback.

Do you recognize this planet? Is it a photo, or a fanciful image of one of the recently discovered exoplanets entirely covered by liquid oceans?
This and other similar images will be used here to test our human ability to recognize visual patterns, without the help of AI (best, as a first guess) or with it (as a final resort). The conclusions may serve to enhance AI tools for recognizing imaged objects.
(To enhance such algorithms, we may later discuss the way we found the solution without them.)
The first attached photo was already solved a long time ago in one of the RG precursors of this thread; nevertheless, I am recalling it in a slightly harder form (without the description it contained at its bottom, which Aleš Kralj used to recognize the image).
The rule: whoever is first to recognize the planet (or other proposed image) earns the right to present her/his own example of a visual pattern to be solved.
Bjørn Petter Jelle once commented:
The planet thread is back with a lot of new questions and answers! :-) Unfortunately, all the old ones have gone somewhere into oblivion in the digital world of ResearchGate...?
And I answered: All is well that ends well. Isn't it?
Anyway, the actual format of RG questions is such that after a few dozen answers all the previous ones also vanish into "eternal" oblivion, like light in a Black Hole :-)
Sometimes, though, some information surfaces again, like the quanta of Hawking radiation. And this new question is a new attempt to return to our previous common experiments with the human brain's ability to recognize images, or to search for them, without and with recourse to AI.

For research purposes, I am looking for good-quality images of male and female faces which vary in masculinity and femininity. Preferably, the faces should be of Caucasian people around 45 years old.
Thank you in advance,
Judith
I am trying to read water meter values through OCR; however, my first step is to find the ROI. I found a dataset on Kaggle with labelled data for the ROI, but the labels are not rectangles; they are polygons, some with 5 points and some with 8, depending on the image. How do I convert this to YOLO format?
For example: file name | value | coordinates
id_53_value_595_825.jpg 595.825 {'type': 'polygon', 'data': [{'x': 0.30788, 'y': 0.30207}, {'x': 0.30676, 'y': 0.32731}, {'x': 0.53501, 'y': 0.33068}, {'x': 0.53445, 'y': 0.33699}, {'x': 0.56529, 'y': 0.33741}, {'x': 0.56697, 'y': 0.29786}, {'x': 0.53501, 'y': 0.29786}, {'x': 0.53445, 'y': 0.30417}]}
id_553_value_65_475.jpg 65.475 {'type': 'polygon', 'data': [{'x': 0.26133, 'y': 0.24071}, {'x': 0.31405, 'y': 0.23473}, {'x': 0.31741, 'y': 0.26688}, {'x': 0.30676, 'y': 0.26763}, {'x': 0.33985, 'y': 0.60851}, {'x': 0.29386, 'y': 0.61449}]}
id_407_value_21_86.jpg 21.86 {'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, {'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, {'x': 0.28185, 'y': 0.76613}]}
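A minimal conversion sketch in Python, assuming the normalized polygon coordinates shown above (YOLO wants "class x_center y_center width height", all normalized, so taking the polygon's bounding box suffices):
------------------------------------------------------------
import ast

def polygon_to_yolo(poly_str, class_id=0):
    # Parse the dict-like annotation string and take the polygon's bounding box
    poly = ast.literal_eval(poly_str)
    xs = [p['x'] for p in poly['data']]
    ys = [p['y'] for p in poly['data']]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Coordinates are already normalized to [0, 1], so no image size is needed
    return (f"{class_id} {(x_min + x_max) / 2:.6f} {(y_min + y_max) / 2:.6f} "
            f"{x_max - x_min:.6f} {y_max - y_min:.6f}")

print(polygon_to_yolo("{'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, "
                      "{'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, "
                      "{'x': 0.28185, 'y': 0.76613}]}"))
------------------------------------------------------------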
I recently came to know about the commercial service https://mathpix.com/ which claims to convert mathematical formulas from (scanned) pdf or even handwritten text to LaTeX.
I have no experience with this. I am interested whether there is an open source solution which solves the same (or a similar) problem.
What about manual skills?
There are a lot of electronic devices which help us adapt to the computer interface, but what about the human interface?
In many cases the computer interface is not suited to the human interface. For this reason, controllers such as the keyboard, mouse and other game controllers have had a long journey in our lives. They are part of our life as an extension of our own body.
Do you know of any other systems that let humans interact with electronic devices with more freedom? And what kinds of problems exist in these different systems or methods?
For example, I could mention voice recognition, image recognition, gesture recognition, and brain interpretation sensors, but surely there are many more.
What kinds of projects and research lines do you know of that currently focus on improving the human interface with real adaptation to the human body?
For example, what about manual skills? Why not use all the skills of our hands, as a magician does? Illusionists spend many hours training their hands as an essential part of their tricks. So why not give the hands another chance to change the way we interact with electronic devices?
The COVID period has been a really hard time for nurses and doctors, the first barrier in the fight against this horrific situation. But it has also been a real mental burden for the whole human species.
In some way, I was thinking about a solution to mitigate that. For example, avoiding touch is one of the rules for limiting the spread of the virus and contagion.
Therefore, clothing or accessories such as gloves can play a key role.
In this case I suggest smart gloves. And you?
Hello fellow researchers! I am currently doing my final year project, which involves image recognition with a supervised machine learning algorithm. Currently, I have datasets comprised of images (without labels, of course) obtained from Kaggle. The project in progress is to inspect the quality of green coffee beans and classify them as either healthy or defective. I am new to this field per se, and I find it hard to do the Python work and to train my model. Am I on the right course to train my model with just images? Isn't that like unsupervised learning, since my current datasets are labelled but without their input variables?
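On the supervised/unsupervised point: if the images can be sorted into one folder per class, the folder names become the labels and the task is ordinary supervised learning. A minimal sketch with TensorFlow/Keras, assuming a hypothetical layout green_coffee/healthy/ and green_coffee/defective/:
------------------------------------------------------------
import tensorflow as tf

# Folder names act as class labels, which makes this supervised learning
train_ds = tf.keras.utils.image_dataset_from_directory(
    "green_coffee",          # hypothetical root: healthy/ and defective/ inside
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(128, 128),
    batch_size=32,
)
print(train_ds.class_names)  # ['defective', 'healthy']
------------------------------------------------------------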
Dear all,
currently, I am working on content-wise image classification. Can you please point me to a suitable image recognition algorithm?
Thanks,
Dear all, I am trying to implement pose normalization for face images using piece-wise affine warping. I am using delaunayTriangulation to construct a face mesh based on 68 detected landmarks for two images: one with a frontal face and the other with a non-frontal face. The resulting meshes do not have the same number of triangles and also contain triangles that differ in direction and location.
Could anyone help please? Thanks.
------------------------------------------------------------
% Construct mesh for frontal face image
filename1 = '0409';
img1 = imread([filename1 '.bmp']);
figure, imshow(img1); hold on;
pts1 = load([filename1 '.mat']); % Load 68-landmarks
DT1 = delaunayTriangulation(pts1.pts);
triplot(DT1,'cyan');
% Construct mesh for non-frontal face image
filename2 = '0411';
img2 = imread([filename2 '.bmp']);
figure, imshow(img2); hold on;
pts2 = load([filename2 '.mat']); % Load 68-landmarks
% Note: triangulating each landmark set independently yields different
% connectivity. Reusing the frontal mesh's connectivity on the second
% point set gives both meshes the same triangles:
TR2 = triangulation(DT1.ConnectivityList, pts2.pts);
triplot(TR2,'cyan');
Hi,
as deep learning is a data-driven approach, the crucial thing is to have quality data. A lot of datasets exist for free, but they differ in the quality of their labels.
I am now working on an index which can tell a researcher the quality of the labels, so the researcher may decide whether such a dataset is useful or not. I have established a pipeline for producing such an index in a fully autonomous way. Note that I am focusing on object detection tasks only, i.e., labels given as bounding boxes.
The question is: does such an index already exist? I googled a lot and found nothing. It would be nice to compare our approach with existing ones.
Several attempts at image matching using INPHO and Agisoft software were not successful. Any helpful suggestions or related papers would be highly appreciated.
I have data for image recognition using neural networks. The images are in PGM format. How do I pre-process this data into a suitable matrix in C++?
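The question asks for C++; as a language-neutral sketch of the parsing logic (which ports directly to C++), here is a reader for binary (P5) PGM files, assuming the common layout with header fields on separate lines:
------------------------------------------------------------
import numpy as np

def read_pgm_p5(path):
    # Parse the P5 header: magic number, optional comments, width/height, maxval
    with open(path, 'rb') as f:
        assert f.readline().strip() == b'P5'
        line = f.readline()
        while line.startswith(b'#'):
            line = f.readline()
        width, height = map(int, line.split())
        maxval = int(f.readline())
        dtype = np.uint8 if maxval < 256 else np.dtype('>u2')
        img = np.frombuffer(f.read(), dtype=dtype).reshape(height, width)
    # Normalize to [0, 1] so the matrix can feed a neural network directly
    return img.astype(np.float32) / maxval
------------------------------------------------------------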
As a recognition method, the neural network is superior and powerful, especially in image recognition. But can neural networks, well-known as they are, solve causal reasoning or not? If not, why?
Deep neural networks (DNNs) have been widely used for closed-set recognition; in other words, they only recognize objects that have been seen in training. Can DNNs be used in open-set recognition, to identify database objects and reject novel unseen objects as unknown? If yes, how?
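Yes; the simplest baseline is to threshold the network's softmax confidence and reject low-confidence inputs as unknown (more principled variants, such as OpenMax, calibrate per-class activation distributions). A minimal sketch:
------------------------------------------------------------
import numpy as np

def predict_open_set(logits, threshold=0.9):
    # Softmax over the known classes
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    # Reject anything the network is not confident about as 'unknown' (-1)
    preds[probs.max(axis=1) < threshold] = -1
    return preds

print(predict_open_set(np.array([[9.0, 1.0, 0.5], [2.0, 1.9, 2.1]])))  # [0, -1]
------------------------------------------------------------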
How does one implement a multi-class SVM in MATLAB, especially when it comes to creating a training matrix from an image dataset, and then a testing matrix of images, group sets, etc.?
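In MATLAB, fitcecoc with templateSVM is the standard route (it reduces the multi-class problem to binary SVMs for you). As an illustration of the training/testing matrix layout only, here is a scikit-learn sketch with stand-in data:
------------------------------------------------------------
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Stand-ins: one row per image (flattened pixels or features), one label per row
X = np.random.rand(300, 64 * 64)
y = np.random.randint(0, 4, 300)       # 4 groups/classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel='rbf', decision_function_shape='ovo')  # one-vs-one multiclass
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
------------------------------------------------------------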
I've read several times that on problems of large dimension (image recognition, text mining, ...), deep learning gives significantly higher accuracy than "classical" methods (such as SVM, logistic regression, etc.). And what happens on problems of ordinary, medium dimension? Let's say the data set is on the order of 1,000-10,000 objects and each object is characterized by 10-20 parameters. Are there articles that compare the accuracy indicators (recall, precision, ...) of deep learning and other methods on some benchmarks?
Thanks beforehand for your answer. Regards, Sergey.
The field of image processing provides very effective and high-performance quantitative methods in science and engineering, in particular image recognition in the area of computer vision.
Door detection is one of the important issues in indoor navigation.
The Canny edge detector is used in door detection; a brief sketch follows.
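An OpenCV sketch of the usual pipeline: smooth, run Canny, then look for the long straight lines that door frames produce (all thresholds here are assumptions to tune):
------------------------------------------------------------
import cv2
import numpy as np

img = cv2.imread('corridor.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical filename
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)            # suppress noise first
edges = cv2.Canny(blurred, 50, 150)
# Door candidates: long, mostly vertical lines in the edge map
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)
------------------------------------------------------------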
Dear all,
I need some help regarding image recognition and/or augmented reality. I don't want to scan images with the camera in order to augment them with virtual content. Instead, I would like to load images locally on my mobile device and then augment them with virtual content.
Please help me find articles and sample code for this scenario. Thanks a lot.
The research content includes a proposed algorithm for image/object matching and two proposed algorithms for multiple object detection.
The algorithms for image/object matching and multiple object detection are not related.
My question is: how do I organize them to form a PhD thesis? How do I unify them into one big problem to present? What title would be appropriate?
I am currently working on a psychology research project which uses a dual-video task comprised of anxiety-provoking and positive videos to be shown side-by-side. I really want to try and match up the videos as much as possible by perceptual characteristics. For example, sizes of objects on screen, colours, textures, etc. Does anyone know of an algorithm, program, or app which could be used for this purpose?
Using the HOG transform I obtained a feature vector for each image. Now, how do I classify these images with a scikit-learn classification algorithm (KNN) using the obtained feature vectors?
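A minimal scikit-learn sketch, with stand-in arrays where your stacked HOG vectors and labels would go:
------------------------------------------------------------
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Stand-ins: one row per image's HOG vector, one class label per row
X = np.random.rand(150, 1764)
y = np.random.randint(0, 3, 150)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)                     # 'training' = storing the feature vectors
print("accuracy:", knn.score(X_te, y_te))
print("prediction:", knn.predict(X_te[:1]))
------------------------------------------------------------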
Hi
How do I calculate a confidence level in computer vision? I work on object detection and have detected relevant features for that purpose. I work on airplane door detection, so I have relevant features such as the door window, door handle, text boxes, door frame lines, and so on. Firstly, I detect the individual features; then, at the second level, I do some logical organisation of those features, eliminating wrongly detected ones. At the end I have some final checks, after which only features that belong to the object should remain. So my question is: with what confidence level can I declare that this is the object I want to detect? Any help?
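One simple heuristic is to treat each detected part as weighted evidence and declare the object present when the weighted sum of found parts passes a threshold (the weights below are illustrative assumptions, not values from the question):
------------------------------------------------------------
# Illustrative weights: how informative each door feature is assumed to be
WEIGHTS = {'window': 0.30, 'handle': 0.25, 'text_box': 0.15, 'frame_lines': 0.30}

def door_confidence(detected_parts):
    # Confidence = weighted fraction of the expected parts actually found
    return sum(w for part, w in WEIGHTS.items() if part in detected_parts)

print(door_confidence({'window', 'handle', 'frame_lines'}))  # 0.85
------------------------------------------------------------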
What does it mean to get an empty matrix when applying the HOG transform to an image?
I am working on a segmentation task and aim to use the HOG descriptor for the pixels of an image. Applying the transform, I get an empty matrix for some windows. What does this mean?
I'm looking for a dataset of images (>100 images) of cells under bright field microscopy. They can be from any species, but human or mice cells are preferable. I have found a couple of sources such as the cell image library but it does not seem to contain bright field images in the quantities I need.
Also note, I am looking for images of cell cultures in particular where only one type of cell is in the image. As such images of tissue are not suitable for my application.
The reason I am looking for these images is to test some image recognition and classification software.
Thanks in advance.
I want to recognise and track objects in real-time video processing. What is the best classifier I can use for object recognition?
Are there any public datasets for training and testing?
I am working on age assessment of paddy grain (naturally stored), and the work is based on husk color. Please let me know the age intervals (in months) at which we can recognize a change in the husk color of stored paddy grains.
I need to identify the type of fish caught from fish images. How can I locate anchor points/landmark points to extract features from the image?
Namely, I want to locate the eye position and the dorsal and pelvic fins, and I need to measure the fish mouth length and the dorsal and caudal fin lengths.
Right now I am trying the SIFT method to get keypoints.
Can someone suggest how I can get these specific keypoints?
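Note that SIFT finds generic interest points; it will not name "eye" or "dorsal fin" by itself. For named anatomical landmarks you would need a model trained on annotated examples. A quick OpenCV sketch of the SIFT part:
------------------------------------------------------------
import cv2

img = cv2.imread('fish.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical filename
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# SIFT returns unlabelled interest points; matching them against annotated
# reference images (or training a landmark regressor) is what assigns names
vis = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('fish_sift.jpg', vis)
------------------------------------------------------------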
I would like to know a good starting point to carry out my research in the above mentioned topic.
I want to do face matching using a weighted chi-square distance. I have now computed the chi-square distance: I divided the face image into 8x8 sub-images and assigned weights to particular regions.
Kindly suggest how I can proceed further with face matching.
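The weighted distance is just the per-region chi-square distances multiplied by the region weights and summed; matching then means taking the gallery identity with the smallest distance. A minimal sketch, assuming one histogram per sub-image:
------------------------------------------------------------
import numpy as np

def weighted_chi_square(h1, h2, weights, eps=1e-10):
    # h1, h2: (n_regions, n_bins) histograms, one row per sub-image
    # weights: (n_regions,) importance assigned to each face region
    per_region = (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum(axis=1)
    return float((weights * per_region).sum())

# Matching: the gallery face with the smallest weighted distance wins, e.g.
# best_id = min(gallery, key=lambda g: weighted_chi_square(probe, g, weights))
------------------------------------------------------------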
Hello,
I'm working primarily in Python 3.5, so I would prefer answers that work with that language if possible, please...
I would like to write a program that will automatically detect clouds in photographs, and also detect what sort of cloud(s) is/are present in the photograph.
This means, unfortunately, that there are no easily definable shapes, sizes, or colours.
Can anyone recommend a way of going about this?
I would assume it will require supervised learning (e.g. feeding the software images of cloud type X, then images of type Y, type Z, etc.); a sketch follows below.
Ultimately, I'd like to be able to feed the program photographs and have it output a list showing the cloud type, the pixel coordinates within the photo (if possible), and the filename.
Thank you in advance for your help.
David
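A minimal supervised sketch along those lines, assuming one folder of labelled photos per cloud type (note: current TensorFlow needs a newer Python than 3.5). Localizing clouds within the photo would additionally need a detection or segmentation model rather than whole-image classification:
------------------------------------------------------------
import tensorflow as tf

# Hypothetical layout: clouds/cumulus/, clouds/cirrus/, clouds/stratus/, ...
train_ds = tf.keras.utils.image_dataset_from_directory(
    "clouds", image_size=(128, 128), batch_size=32)
num_classes = len(train_ds.class_names)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
------------------------------------------------------------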
Dear colleagues,
I am looking for a music OCR notation software, to edit handwritten music. The source of this music is scanned scores, written with pen. The platform I use is windows 7.
Any ideas would be helpful!
Thanks!
Yannis Kyriakoulis
I'm facing a problem of letter extraction. The image is grayscale and the letters are in a row. The background of the image is not homogeneous; there can be some texture with non-white intensity. The letters are black. And of course, the letters are sometimes so similar to the background that they cannot be separated easily. The problem is similar to licence plate letter extraction, but in this case we can expect a "damaged plate with damaged letters". Unfortunately, the particular images are the subject of a secret project, so I cannot include an example here.
I have used several types of thresholds, including adaptive ones, histogram-based methods and a segmentation-based method, but none of them works well in general.
The goal of my task is to extract the letters, i.e., to detect the rectangular area of each of them, or to separate them from the background.
There are two criteria: success rate (I need an extraction success rate of almost 100%) and processing speed (as fast as possible, ideally a few ms).
Thank you for your advice, or for links to some useful papers.
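Since no global method generalizes here, a locally adaptive threshold followed by connected-component filtering is a common fast baseline (a few ms on small images); the size and aspect limits below are assumptions to tune:
------------------------------------------------------------
import cv2

img = cv2.imread('letters.png', cv2.IMREAD_GRAYSCALE)   # hypothetical filename
# A local (adaptive) threshold copes better with non-homogeneous backgrounds
bw = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv2.THRESH_BINARY_INV, 31, 15)
n, labels, stats, _ = cv2.connectedComponentsWithStats(bw, connectivity=8)
boxes = []
for i in range(1, n):                                    # label 0 is background
    x, y, w, h, area = stats[i]
    if area > 50 and 0.2 < w / h < 1.5:                  # crude letter-shape filter
        boxes.append((x, y, w, h))                       # rectangular letter areas
------------------------------------------------------------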
I am working on a project using multiple sensors in which I detect and recognise animals. But I am struggling with the thermal IR camera. At this point of the project I have to recognise animals using thermal imaging. Is there any way we can recognise individual species?
During my research I found that the eyes, ears and nose are high-heat-emitting areas. But my problem is: what if the animal is facing away, back first?
Thanks
Kind regards
In my project I want to find contour subsets of an image, each delimited by two concavities. The size of every foreground region delimited by a detected contour subset and by the straight line segment connecting the two extremes of the contour subset is computed. If the size is smaller than a given threshold, the peripheral part is regarded as noisy and is removed. As for the threshold, we use the same value adopted during removal of small 8-connected foreground components and filling of thin hollows and small 4-connected holes, i.e., we remove peripheral regions with fewer than 32 pixels.
Can anyone help me implement this?
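The companion cleanup mentioned above (removing small 8-connected components and filling small 4-connected holes with the same 32-pixel threshold) is straightforward with OpenCV; the concavity-delimited contour pruning itself would sit on top, e.g. via cv2.convexityDefects. A sketch of the cleanup part:
------------------------------------------------------------
import cv2

def clean_binary(bw, min_size=32):
    # Remove small 8-connected foreground components
    n, lab, stats, _ = cv2.connectedComponentsWithStats(bw, connectivity=8)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < min_size:
            bw[lab == i] = 0
    # Fill small 4-connected holes: small components of the inverted image
    inv = cv2.bitwise_not(bw)
    n, lab, stats, _ = cv2.connectedComponentsWithStats(inv, connectivity=4)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < min_size:
            bw[lab == i] = 255
    return bw
------------------------------------------------------------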
I] Part I (Orientation Assignment to a Keypoint)
In this process:
1. First I selected a 16 x 16 window around the keypoint and calculated the magnitude and orientation at each point of the 16 x 16 window.
2. Then I created a 36-bin histogram of orientations.
3. Then I assigned the mean value of the highest bin (i.e., if the 1st bin (0-10) is the highest, then '5' is assigned as the keypoint orientation). (Is this correct?)
4. Then I calculated a 16 x 16 Gaussian window with a sigma value equal to 1.5 times the scale.
5. Then I multiplied the 16 x 16 magnitude matrix by the Gaussian window.
(What is the use of this multiplication?)
Is it required to multiply this result (magnitude x Gaussian) with the orientation before assigning the orientation to the keypoint? (I found some histogram bins with the highest count but a low magnitude value.)
By my logic, we should assign to the keypoint the orientation of the bin whose magnitude-weighted value is highest.
6. Then I transformed (rotated) the coordinates of the keypoint, i.e., its x, y position, with respect to the assigned orientation using a 2D transformation. (Is this correct?)
7. Then I transformed the orientations of all sample points within the 16 x 16 window according to the keypoint orientation (e.g., if the keypoint orientation = 5 and a sample point orientation = 270, it becomes 275). (Is this correct?)
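On the use of the multiplication in steps 4-5: each pixel votes into its orientation bin with weight magnitude x Gaussian, so strong gradients near the keypoint dominate the histogram, and the peak is taken over these weighted votes rather than raw bin counts. A numpy sketch of the weighted histogram:
------------------------------------------------------------
import numpy as np

def orientation_histogram(mag, ori, sigma):
    # mag, ori: 16x16 gradient magnitudes and orientations (degrees)
    n = mag.shape[0]
    c = (n - 1) / 2.0
    yy, xx = np.mgrid[0:n, 0:n]
    g = np.exp(-((xx - c) ** 2 + (yy - c) ** 2) / (2 * sigma ** 2))
    weights = mag * g                       # magnitude x Gaussian window
    hist = np.zeros(36)
    bins = (ori.astype(int) % 360) // 10    # 36 bins of 10 degrees
    np.add.at(hist, bins.ravel(), weights.ravel())
    peak = int(hist.argmax())
    return hist, peak * 10 + 5              # centre of the dominant bin
------------------------------------------------------------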
From my study I realize that feature extraction methods for word-level recognition do not fit well for character-level recognition. Thus the question arises in my mind: what algorithm might work well for character-level recognition, given that the images of individual characters are very small (e.g. 30 px by 30 px)?
We run a small scoring shop for university exams. We use an optical recognition scanner and scan sheets through it to score instructor-designed exams.
We have been asked to begin scoring multiple-answer exams. Our current optical recognition software is good at scoring items with only one correct answer. However, we are now being asked to score tests where students should indicate all options which are true, with up to three correct options for one item.
Do any of you have a good system for tabulating the correct answer in this type of assessment? Thanks for your help.
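If the OMR software can export which bubbles were marked, the tabulation itself reduces to comparing the marked set with the key, either all-or-nothing or with partial credit. A small sketch:
------------------------------------------------------------
def score_item(marked, key, partial=False):
    # marked, key: sets of option letters, e.g. {'A', 'C'}
    if not partial:
        return 1.0 if marked == key else 0.0
    # Partial credit: correct marks minus wrong marks, floored at zero
    credit = len(marked & key) - len(marked - key)
    return max(credit, 0) / len(key)

print(score_item({'A', 'C'}, {'A', 'C', 'D'}, partial=True))  # ~0.67
print(score_item({'A', 'C', 'D'}, {'A', 'C', 'D'}))           # 1.0
------------------------------------------------------------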
I need to know why the chi-squared kernel outperforms other SVM kernels in image classification. Many features are extracted as bag-of-features histograms, and researchers have applied the chi-squared kernel in many papers.
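One common explanation: bag-of-features vectors are histograms, and the chi-square distance normalizes each bin's squared difference by the bin's magnitude, which matches the statistics of count data better than the plain Euclidean distance inside an RBF kernel. A usage sketch with scikit-learn's built-in chi2_kernel:
------------------------------------------------------------
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# Stand-in bag-of-features histograms (non-negative, L1-normalised rows)
X = np.random.rand(100, 300)
X /= X.sum(axis=1, keepdims=True)
y = np.random.randint(0, 5, 100)

K = chi2_kernel(X, gamma=0.5)              # exp(-gamma * chi-square distance)
clf = SVC(kernel='precomputed').fit(K, y)  # SVM on the precomputed Gram matrix
------------------------------------------------------------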
It's known that classification performance degrades as a function of the number of classes in the corpus. What I would like to know is whether there is a systematic study of this aspect in the literature.
Hello,
Usually, when speaking about the invariance of descriptors, one assumes that a descriptor is invariant if the distance between two features of different images under different transformations equals zero. But in reality this is not the case. So the question is: what distance can we consider as a reference to evaluate the invariance of a descriptor?
I want to use color information for traffic sign recognition. But color is affected by illumination; for example, red can look like black. I think color normalization may help me. What can I do?
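A quick, often-used normalization is rg-chromaticity, which divides out overall brightness so a dimly lit red keeps roughly the same chromaticity (the thresholds below are assumptions to tune):
------------------------------------------------------------
import cv2
import numpy as np

img = cv2.imread('sign.jpg').astype(np.float32)  # hypothetical file, BGR order
s = img.sum(axis=2, keepdims=True) + 1e-6
chroma = img / s                                 # per-pixel b+g+r sums to 1
b, g, r = chroma[..., 0], chroma[..., 1], chroma[..., 2]
red_mask = (r > 0.4) & (g < 0.3)                 # 'red' survives dim lighting
------------------------------------------------------------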
Hello all,
I would like to use a time-lapse camera to take half-hourly photos of a gas meter. Is anybody aware of number recognition software that could be used in conjunction with the camera so that the readings could be logged in digital format (as opposed to images)? Time-lapse cameras are available commercially (http://www.brinno.com/html/TLC100.html ), but in the absence of number recognition software, stored images need to be converted into digital values manually, making the process too labour-intensive.
I would appreciate any advice.
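One open-source route is the Tesseract engine driven from a small script over the downloaded frames; a sketch with pytesseract, assuming Tesseract is installed and the dial region has been cropped beforehand:
------------------------------------------------------------
import pytesseract
from PIL import Image

img = Image.open('meter_frame.jpg')     # hypothetical time-lapse frame (cropped)
digits = pytesseract.image_to_string(
    img, config='--psm 7 -c tessedit_char_whitelist=0123456789')
print(digits.strip())                   # logged reading as text, not an image
------------------------------------------------------------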
I am working on counting the number of human objects detected by my camera. So far I have succeeded in separating the objects from the background, but I cannot count the number of detected objects in a simple way. Is there any answer to help me solve this problem? Thanks.
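With the foreground already separated, connected-component labelling gives the count directly; a small morphological opening first keeps specks from inflating it. A sketch:
------------------------------------------------------------
import cv2

fg = cv2.imread('foreground_mask.png', cv2.IMREAD_GRAYSCALE)  # hypothetical mask
_, fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)
fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN,                     # drop small specks
                      cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
n_labels, _ = cv2.connectedComponents(fg, connectivity=8)
print("objects detected:", n_labels - 1)                      # label 0 = background
------------------------------------------------------------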
I am interested in ways to interpret green spaces, manually or computer-assisted, from video footage of walks through urban environments.
Problem: we need to recognise a plate or any other ID number on a vehicle. However, it's hard to get it shadow-free. What methods can be applied to remove shadows effectively? Am I right that there is no way to analyse a picture containing a high-contrast shadow?
for image super resolution
I am looking for state of the art methods which are being used for human pose, human upper body, and head detection in still images mainly.
I am working on SIFT and want to know the best matching technique in this context, and to get to know the vlfeat toolbox. It provides a function for feature matching, vl_ubcmatch. I just want to know how this function works, and I would like to see its MATLAB code. Thank you.
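vl_ubcmatch implements Lowe's matching rule: a descriptor pair is accepted only if the best distance times a threshold (default 1.5) is still smaller than the second-best distance. A numpy sketch of that logic:
------------------------------------------------------------
import numpy as np

def ratio_match(desc1, desc2, thresh=1.5):
    # Accept (i, j) only when the best match beats the runner-up by 'thresh'
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]        # best and second-best neighbours
        if dists[j] * thresh < dists[k]:
            matches.append((i, j))
    return matches
------------------------------------------------------------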
I am doing a project on night-time vehicle detection in which only the vehicle lights are detected, so please give me pointers or source code for this project.
I detect the lights separately, but street lights also show up in my output, so please help me.
The second image is my output image.
The first image is my expected output image.
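A simple way to suppress street lights is to exploit geometry: they sit high in the frame, while vehicle lights appear in the lower part of the road scene. A rough OpenCV sketch (the thresholds are assumptions to tune):
------------------------------------------------------------
import cv2

img = cv2.imread('night_road.jpg')      # hypothetical frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bright = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
n, lab, stats, cent = cv2.connectedComponentsWithStats(bright)
h = img.shape[0]
vehicle_lights = [i for i in range(1, n)
                  if cent[i][1] > 0.4 * h                     # reject blobs high in frame
                  and 20 < stats[i, cv2.CC_STAT_AREA] < 5000] # plausible blob size
------------------------------------------------------------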


Hello, I want to apply the Kadir (Kadir-Brady saliency) operator to my saliency maps, but I don't understand this method. Are there any MATLAB codes or libraries for the Kadir operator?
What is the approach to measuring the real size (length, height and width) of an object from an image when I know neither the focal length nor the object distance (I don't know the origin of the image, i.e., any technical details of the lens or the camera)?
I have a requirement to scan large documents and extract the text from them. How should we scan the book, and what are the most efficient ways of doing this? How can I do this most efficiently and get the best accuracy from an OCR program?
I have read the paper "Vehicle logo recognition in traffic images using HOG features and SVM", published in 2013. I have some questions which have confused me for several days.
First, in part B of section III, I don't know what an overlapping block is. What is the difference between an "overlapping block" and a "block"? According to Fig. 8, is the overlapping block the area which is overlapped by two neighbouring blocks? Is the overlapping block made up only of the left and right blocks, and not of the upper and bottom blocks?
Second, a feature vector is obtained by sampling the histograms from the contributing spatial cells. Are these the cells in the overlapping block?
Finally, I don't really understand what multiple binary classification problems are. How does one reduce a single multi-class problem into multiple binary classification problems?
The paper is attached. Thank you very much for your reply.
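On the last question: "multiple binary classification problems" means training one binary SVM per logo class (this logo vs. all others) and taking the most confident one at test time, which is exactly what one-vs-rest wrappers automate. A sketch with stand-in HOG data:
------------------------------------------------------------
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X = np.random.rand(200, 1764)        # stand-in HOG feature vectors
y = np.random.randint(0, 10, 200)    # 10 logo classes

# One binary SVM per class: 'this logo' vs 'all other logos'
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(clf.predict(X[:3]))
------------------------------------------------------------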
If possible, help me with the MATLAB code... Thanks in advance...
Does anyone have any experience on the development of a domestic robot's ability to locate itself in an indoor environment?
For your answer, take into account the possibility of having a camera on the robot (image recognition may be a way to go?).
I believe it may be necessary to take multiple inputs. For example, an image recognition algorithm together with a "dead-reckoning" method, such as estimating displacement as a function of the revolutions of the robot's wheels, could be used to estimate the robot's position.
All feedback would be greatly appreciated, as I am just starting with this investigation.
Thank you very much!
I am working on action recognition from multi-view skeleton images captured using Kinects. However, when I acquired skeleton images using three Kinects with three notebooks, the image frames across devices were not quite synchronized. Is there any way to sync all frames from the different devices? Or is there a publicly available database of synced skeleton coordinates that I can download to perform action recognition?
I would like to extract various image features for phone screenshot image recognition. I hope the feature extraction method runs fast, so perhaps the method should be implemented in Python and/or C++. Any source code links would be very helpful!
Thanks a lot!
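For speed, binary descriptors such as ORB (OpenCV, C++ core with Python bindings) are a common choice over SIFT, and a cheap global histogram can complement them for screenshot matching. A sketch:
------------------------------------------------------------
import cv2

img = cv2.imread('screenshot.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file
orb = cv2.ORB_create(nfeatures=500)     # binary descriptors, much faster than SIFT
kp, desc = orb.detectAndCompute(img, None)
hist = cv2.calcHist([img], [0], None, [64], [0, 256])     # cheap global feature
------------------------------------------------------------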
I plan to do some work on image recognition of maize patterns for our spray machine, but I didn't find any related research, so I don't know the difficulties that need to be overcome. So I need some advice here from those who have done related work before. Thanks!
The sprayer is a type of fogging machine whose effective spray distance is 10 m.
I know this is a fairly basic question but I'm having some difficulty finding anything out there - does the emotion of the perceiver influence how neutral faces are encoded / recognized?
Can anyone suggest a novel technique for Arabic character recognition system ?
I am working on pattern detection of dermatological images and I would like to know how to extract and match them.
Is there a way of labeling multiple objects within an image, or does each object have to be separated to get results on the basis of its features?
How would I specify that group 1 = apple, group 2 = orange, and group 3 = banana?
SDK -- Software Development Kit
OMR -- Optical Mark Recognition
1. Students' profiles in a structured format; 2. MPEG-7 CE-Shape-1
Also needed are: English fnt datasets and KTH-TIPS
There are attribute-annotated images in ImageNet. I expect to use them in my research, but for comparison I am also looking for other work done on this subset. Do you know of any work you could suggest I look at?
I need to extract the areas that contain any text from a given image, which is more like a scene than a simple document. As the next step, I also intend to recognise the text.
In research on moving object detection with a moving camera, the first step is ego-motion estimation, followed by stabilization of the video sequence, but some research takes other approaches. Do you know which approaches are practical and efficient?