Science topic
Chemometrics - Science topic
Chemometrics is the science of extracting information from chemical systems by data-driven means. It is a highly interfacial discipline, using methods frequently employed in core data-analytic disciplines such as multivariate statistics, applied mathematics, and computer science, in order to address problems in chemistry, biochemistry, medicine, biology and chemical engineering.
Questions related to Chemometrics
May I know if there is any website that offers free datasets for gravimetric methods (e.g. Karl Fischer titration) together with spectroscopic methods?
It would be great if anyone is willing to share their dataset.
Thanks
Hi!
I'm doing NIR research and I'm a newbie in chemometrics. To analyse the NIR data, I use PLS regression, as I'd like to quantify the components of an essential oil. My question is: how do I determine the optimal number of latent variables (LVs) in PLS regression? Should I do trial and error, one LV count at a time, and then choose the number with the lowest RMSECV?
Thank you
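A note for readers with the same question: the usual approach is exactly that — cross-validate PLS models with 1 to k latent variables and keep the number that minimizes RMSECV, or the smallest number whose RMSECV is close to the minimum. A minimal sketch, assuming scikit-learn in Python, with spectra in X and reference values in y:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def rmsecv_curve(X, y, max_lv=20, cv=10):
    """RMSECV for 1..max_lv latent variables; returns the curve and the minimizer."""
    rmsecv = []
    for lv in range(1, max_lv + 1):
        y_cv = cross_val_predict(PLSRegression(n_components=lv), X, y, cv=cv)
        rmsecv.append(np.sqrt(np.mean((np.ravel(y) - np.ravel(y_cv)) ** 2)))
    return rmsecv, int(np.argmin(rmsecv)) + 1

Plotting the RMSECV curve against the number of LVs and picking the first clear minimum (rather than the global one) guards against overfitting with too many components.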
I have 17 different steel samples, in which I identified a Ti peak at 441.492 nm.
But if you look closely at the photo, while some peaks are at 441.492 nm, others are at a slight offset, e.g. 441.490 nm, 441.488 nm, etc.
I am currently performing univariate analysis for the detection of the Ti concentration.
What could be the reason for this? Wavelength calibration was done prior to taking the measurements.
Can I choose such a peak for univariate analysis?
Also, how can I identify lines that are prone to self-absorption?
I am new to chemometrics and currently in the learning phase.
Thanks and regards,
Rahul P

Hello Everyone,
I was reading an article about HSI (hyperspectral imaging) and came across figures representing surface scores for each PC. What do these figures represent?

If yes, would you please provide examples from literature.
Hi PLS Experts,
I am an absolute beginner at using PLS, and I need your help. I have practised using PLS in R as well as in SPSS.
I am interested in using PLS as a predictive model. I have 2 dependent variables (DVs): one is continuous while the other is categorical with 3 levels. However, I am not using these in one model but rather in separate models, as the categorical variable is also an independent variable in model 1 (the model with the continuous DV).
I am confused by the term Y-variance explained (for the DV) and its effect on model performance.
Does a low percentage of Y-variance explained (across all components) mean poor prediction by the model?
I recently applied PLS to standardized data with 1 DV and 14 predictors using R (the mdatools package). The cumulative X-variance explained was 100%, but the Y-variance explained was only 29% across all 14 components (the optimal number of components is 3).
I am unable to explain the reason for such poor performance.
A summary of the model is attached in the figure. (Predictions are in the bottom-right part of the image.)
Thank you for your time :)
Best
Sarang
#PLS

Dear All,
I'm looking for suitable and simple chemometrics software, as an effective tool for exploring chemical data in analytical chemistry. What do you recommend for beginners?
Hi there!
I work with infrared spectroscopy (NIR & FTIR) in the field of food science/food chemistry. I'm looking for collaborators with experience in chemometrics (particularly PLS-R & PLS-DA, but other discriminant methods such as SVM or neural networks would also be great). In particular, I'm after people who would be interested in helping with data analysis & writing up some papers based on data I have collected.
If you are interested & have such experience, please contact me & I would love to discuss with you.
Joel
Imagine you have measured a series of curves, e.g. spectra of a dissolved compound at various concentrations, films with different thicknesses, etc. Before you can retrieve the data, someone meddles with it, i.e. multiplies it by an unknown factor (or, alternatively, assume that your empty-channel spectrum changes within the series), large enough that it matters, but small enough that the data still seem to make sense.
Do you know a method that not only indicates that the curves have been altered, but also allows you to retrieve the original/unflawed data?
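A common first line of defence against unknown multiplicative distortions is multiplicative scatter/signal correction (MSC): regress each curve on a reference curve (e.g. the series mean) and divide out the fitted slope. This is a sketch of the idea only; it recovers the curves up to the scale of the chosen reference, not the true absolute values:

import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit each curve as a + b*reference
    and return (curve - a) / b."""
    X = np.asarray(spectra, dtype=float)   # shape (n_curves, n_points)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        b, a = np.polyfit(ref, row, 1)     # slope b, intercept a
        corrected[i] = (row - a) / b
    return corrected

If only the multiplicative factor matters (no additive offset), the standard normal variate (SNV) or normalizing each curve to unit area achieves a similar end.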
I am planning to use PCA and OPLS-DA for my study in biochemometrics, but I am quite tight on budget. I am not sure how much the SIMCA software costs; although they have a trial version, I am worried about whether I'll be able to make full use of the free version with my data. Are there alternatives that are cheaper or free but will give quality data analyses for PCA and OPLS-DA?
I cannot find sources that give a thorough explanation of PCA and how to assign principal components 1 & 2, including their computation. Say, for example, my study will explore the polyphenol profiles of a certain plant from different geographical areas; I will test their antioxidant activity too and analyse them using a biochemometric approach. Which variables should be included in the principal components? I will also integrate data from this PCA to construct my OPLS-DA.
I wonder if gas sensor responses can be correlated with non-volatile components?
We are planning to analyze the activity of a group of phytochemicals based on chemometric analysis. We are having difficulty with the various data formats accepted by the software/servers, so we are looking for software or a server that accepts data formats such as m/z.
Does it make sense to develop compression methods for large matrices used in chemometrics for multivariate calibration?
The main argument of the opponents of this method is that the "increasing computational power and speed of computers for data processing and unlimited cloud data storage available" make compression unnecessary, since compression slightly reduces the accuracy of multivariate calibration. (Cited from personal communication.)
This preprint compares the most advanced automated commercial analysis approaches for vibrational spectroscopy including Bruker Lumos II in combination with Purency Microplastics Finder R2021a (FPA-FTIR), Agilent 8700 LDIR (QCL) in combination with Clarity, and WITec alpha300 R in combination with Particle Scout.
It's really worth reading.
I am taking infrared data and would like to calculate the LOD in detail.
When creating & optimizing mathematical models with multivariate sensor data (i.e. 'X' matrices) to predict properties of interest (i.e. the dependent variable or 'Y'), many strategies are iteratively employed to reach "suitably relevant" model performance, which include:
>> preprocessing (e.g. scaling, derivatives...)
>> variable selection (e.g. penalties, optimization, distance metrics) with respect to RMSE or objective criteria
>> calibrant sampling (e.g. confidence intervals, clustering, latent space projection, optimization..)
Typically & contextually, for calibrant sampling, a top-down approach is utilized, i.e., from a set of 'N' calibrants, subsets of calibrants may be added or removed depending on the "requirement" or model performance. The assumption here is that a large number of datapoints or calibrants is available to choose from (collected a priori).
Philosophically & technically, how does the bottom-up, pathfinding approach to calibrant sampling, or "searching for ideal calibrants" in a design space, manifest itself? This is particularly relevant in chemical & biological domains, where experimental sampling is constrained.
E.g., given a smaller set of calibrants, how does one robustly approach the addition of new calibrants in silico to the calibrant space to make more "suitable" models? (Simulated datapoints can then be collected experimentally and added to the calibrant space after modelling, for the next iteration of modelling.) One concrete illustration is sketched after the flow example below.
:: Flow example ::
N calibrants -> build & compare models -> model iteration 1 -> add new calibrants (N+1) -> build & compare models -> model iteration 2 -> and so on... -> acceptable performance ~ acceptable experimental datapoints collectable -> acceptable model performance
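One concrete, hedged illustration of such a bottom-up loop: score every candidate calibrant by its distance to the current calibration set in a latent (e.g. PCA) space and propose the most distant one for the next experiment, in the spirit of Kennard-Stone sampling. The candidate pool, the number of PCs, and the acceptance criterion below are assumptions for illustration only:

import numpy as np
from sklearn.decomposition import PCA

def propose_next_calibrant(X_cal, X_pool, n_pc=3):
    """Index of the pool sample farthest (in PCA score space fitted on the
    current calibrants) from its nearest existing calibrant."""
    pca = PCA(n_components=n_pc).fit(X_cal)
    T_cal, T_pool = pca.transform(X_cal), pca.transform(X_pool)
    # distance from each candidate to its nearest existing calibrant
    d = np.min(np.linalg.norm(T_pool[:, None, :] - T_cal[None, :, :], axis=2), axis=1)
    return int(np.argmax(d))

Each round, the proposed point is measured experimentally, appended to the calibration set, the model is rebuilt, and the loop stops when RMSECV (or another criterion) is acceptable.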
Hi Everyone,
I have acquired some plant hyperspectral images (roots, fruit, leaves) from various environmental conditions and now want to explore the data cubes to detect possible differences; I plan to study plant physiology and chemometrics in the future. The built-in software shipped with the camera (Specim IQ Studio) is not serving the purpose.
Any suggestions for easy-to-use software with a simple interface, or an analysis pipeline, for such exploration and for building classifier models? Preferably open-source, but commercial suggestions are also welcome.
Many thanks in anticipation.
I'm stuck on how to apply MCR to IR spectroscopy data and which constraints to use. There are four types of constraints: equality, unimodality, non-negativity, and closure. Please help me proceed.
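As a rule of thumb from the MCR literature: non-negativity suits both concentration profiles and absorbance-like spectra; closure applies when the concentrations must sum to a constant total; unimodality is mainly for elution or kinetic profiles; equality constraints fix profiles that are known in advance. A minimal sketch with the pymcr package, applying non-negativity on both modes (D and ST_guess below are random stand-ins for a real IR data matrix and initial spectral estimates):

import numpy as np
from pymcr.mcr import McrAR
from pymcr.constraints import ConstraintNonneg

D = np.random.rand(50, 200)        # stand-in: 50 spectra x 200 wavenumbers
ST_guess = np.random.rand(3, 200)  # stand-in: initial estimates of 3 pure spectra

mcrar = McrAR(c_constraints=[ConstraintNonneg()],
              st_constraints=[ConstraintNonneg()])
mcrar.fit(D, ST=ST_guess)
C = mcrar.C_opt_     # resolved concentration profiles
ST = mcrar.ST_opt_   # resolved pure component spectra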
There are many tools to build a PLSR model to predict a response variable Y based on a multivariate predictive variable X (reflectance spectra, for example). My question is the following: once we have built a PLSR model, is it possible to simulate X for a specific Y? Is it possible to do this in R?
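One hedged sketch of the simplest route (shown in Python; the same two steps translate directly to R): the inverse problem is underdetermined, since many different X can give the same predicted Y, so a common compromise is to fit the reverse mapping on the calibration data and return the average spectrum expected for a target Y:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_cal = rng.random((40, 100))   # stand-in for calibration spectra
y_cal = rng.random(40)          # stand-in for the response

inverse_model = LinearRegression().fit(y_cal.reshape(-1, 1), X_cal)

def simulate_spectrum(y_target):
    """Mean spectrum expected for a given Y value."""
    return inverse_model.predict(np.array([[y_target]]))[0]

In R, the equivalent idea is lm() of the spectral matrix on Y followed by predict() at the target Y.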
I am validating a method for quantification of adulterants in olive oil by infrared spectroscopy using multiple linear regression calibration, and for the validation I intend to use the limits of quantification and detection.
I have been working on creating a multivariate predictive model (most likely PLS) using FTIR spectra to predict the amount of a component in complex mixtures. In order to calibrate the model I have considered two pathways: creating synthetic mixtures of the component and confounders, or using an established technique to give me the true value of the component in real mixtures. Both pathways have big limitations. I am now considering spiking real mixtures with several known (by weight) amounts of the component. How can I implement this in an X-Y chemometric model?
I have three datasets for a few samples: quantitative and qualitative liquid chromatography, GC-MS, and DNA barcoding.
What would be the best way to discriminate and visualize the data?
In all textbooks related to chemometrics, PLS is described as a regression technique that requires dependent and independent variables for generating a regression line. However, in some chemometric software packages (especially The Unscrambler), PLS shows a score plot (similar to PCA). It seems they are using PLS-DA. I am unable to understand what they are doing, as no regression line is generated.
The following figure corresponds to a study on liquor samples from three different geographical regions. We applied PLS (it seems it is PLS-DA, not PLSR) in The Unscrambler. It has classified the samples. However, I am unable to understand how to interpret it and whether it is accurate or not.

I want to join a respectable research group that is concerned with Chemometrics
Having reviewed the literature on the use of chemometric approaches in quality assessment of medicines (including herbal medicines), I realised that several approaches are adopted. For example, for preprocessing of the data before further analysis, the literature reports methods like normalisation, peak centering, warping, and smoothing, among others.
Bearing in mind that the way you preprocess the data may affect the final outcome of the multivariate analysis, I want to find out whether any protocol exists to guide the adoption of these tools. For instance, when analysing chromatographic data from HPLC, you may have to correct the baseline, then warp and normalise, or something similar. Also, when dealing with FTIR data, you may have to first correct the baseline, normalise, and smooth (how do you determine the number of smoothing points?), among others. Are there specific preprocessing tools for specific datasets (that is, from different instruments like FTIR, HPLC, LC-MS, etc.), and are there specific procedures for their use, so that irrespective of who conducts the analysis, the outcome is always reproducible?
Thank you.
What are the formulas and peak-fitting function types needed, e.g. Gaussian, Lorentzian...?
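Assuming the question concerns fitting spectral peaks: the two workhorse line shapes are the Gaussian, A*exp(-(x - x0)^2 / (2*sigma^2)), and the Lorentzian, A*gamma^2 / ((x - x0)^2 + gamma^2); the Voigt profile is their convolution. A minimal least-squares sketch with SciPy:

import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, A, x0, sigma):
    return A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

def lorentzian(x, A, x0, gamma):
    return A * gamma ** 2 / ((x - x0) ** 2 + gamma ** 2)

# toy data: one noisy Gaussian peak
x = np.linspace(0, 10, 200)
y = gaussian(x, 2.0, 5.0, 0.4) + 0.02 * np.random.randn(x.size)

popt, _ = curve_fit(gaussian, x, y, p0=[1.0, 4.8, 0.5])   # initial guesses matter
A_fit, x0_fit, sigma_fit = popt
fwhm = 2.355 * sigma_fit   # Gaussian FWHM = 2*sqrt(2*ln 2)*sigma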
Hello,
I am performing Support Vector Machine regression using The Unscrambler software, but when I try to predict using the SVR predict option, I only get the predicted values (not all the plots I got with PLSR in The Unscrambler).
Please suggest how I can obtain the R-squared and root mean square error of prediction for these results.
There is a need to ensure the extraction yield of the soluble coffee process, so we need a method to analyze the carbohydrate degradation products produced after hydrolysis of polysaccharides due to the high temperatures used during extraction. It is important to measure the extraction efficiency in-process to fine-tune the process. Which rapid secondary method could be used to measure carbohydrate degradation products in the liquid phase, and how can it be calibrated against primary methods?
Dear to whom it may concern,
I would like to ask people who are interested in univariate analysis in metabolomics. I am currently processing my metabolomics data using univariate analysis, namely p-values and FDR-adjusted p-values.
However, as far as I know, the calculation of a p-value for each feature depends on two factors: (a) the distribution of the feature and (b) the variance of the feature between the case and control groups. To be more specific, the first step is to apply a statistical tool (I do not know which tool can help me check this) to test whether the examined feature is normally distributed in both groups or in only one of them, and there are two scenarios, as follows:
1. If the feature is normally distributed in both groups, we proceed to use the F-test as a parametric test to check whether the variance of the feature is equal in the two groups. If it is equal, we can do a t-test assuming equal variances; otherwise, a t-test with unequal variances must be used.
2. If not, a non-parametric test should be applied to obtain a p-value for the feature. In this case, could you please show me which tests are considered non-parametric?
I am unsure whether what I describe above is right, because I am a beginner in metabolomics. If this procedure is right, it means each feature will be processed through this step-by-step procedure to obtain a p-value, because the features differ in distribution and variance between the two groups (case and control).
I hope you can spend a little time correcting my idea and giving me some suggestions in this promising field.
Thank you so much.
Pham Quynh Khoa.
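For readers with the same question, the workflow described above is essentially standard: Shapiro-Wilk is a common normality check, Levene's (or the F) test compares variances, the Mann-Whitney U test is the usual non-parametric counterpart of the two-sample t-test (Wilcoxon signed-rank for paired data), and Benjamini-Hochberg provides the FDR adjustment. A per-feature sketch in Python (case and control below are random stand-ins for real intensity tables):

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

def feature_pvalue(a, b, alpha=0.05):
    """Two-group test for one feature, with the test chosen from the data."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        equal_var = stats.levene(a, b).pvalue > alpha
        return stats.ttest_ind(a, b, equal_var=equal_var).pvalue
    return stats.mannwhitneyu(a, b, alternative='two-sided').pvalue

rng = np.random.default_rng(1)
case, control = rng.random((20, 50)), rng.random((25, 50))   # samples x features
pvals = [feature_pvalue(case[:, j], control[:, j]) for j in range(case.shape[1])]
reject, p_fdr, _, _ = multipletests(pvals, method='fdr_bh')   # FDR adjustment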
I am working on a mixture of three different APIs in my product. Their peaks merge with each other. I would like to use chemometrics to solve this problem. Please advise.
In our study, PCA was applied to ATR-FTIR data. In the loading plots, PC1 and PC2 show positive and negative values in certain wavenumber regions. Please suggest how to interpret these positive and negative loadings and what they signify.
How can I select the best descriptors to build a QSAR model?
That is, descriptors for use in the development of molecules and drugs.
What is the best computer program for the calculation of stability constants of metal complexes from UV-Vis spectrophotometric titration data?
I would like to know if we can use a handheld Raman spectrometer and a chemometric technique for analyzing the content of an active ingredient in the presence of other excipients.
The excipients are coconut oil, beeswax and some flavoring agents.
The spectrometer is based on Spatially offset Raman spectroscopy (SORS) technology.
Can I use it to analyze the active ingredient in the finished product quantitatively?
Also, if I can then which chemometric method is best suited for the analysis?
I have data on the total dissolved solids of apples as the reference (y-variable).
I also have near-infrared spectral data as predictors (x-variables).
I have the StatSoft Statistica software for the analysis.
Dear fellows
I'm a biologist and I've been assigned the task of aligning the chromatograms of some plant samples I processed in the past.
I've been doing some research, and all of the methods used for chromatogram alignment either involve dynamic programming or are implemented in software for which I don't possess a license.
So I wonder if any of you can provide me with some guidance on how to complete this task (taking into account that I'm not familiar with chemometrics).
Thank you a lot for your time and help,
Vanesa Díaz
How do you prepare graphical abstracts, schematic figures, and similar extras for your papers? Is there a piece of software with a good vector graphic library of shapes?
Besides Corel, Inkscape...
Using this website ( http://physics.nist.gov/PhysRefData/ASD/lines_form.html ), I am trying to find the element corresponding to a wavelength.
For example,
I want to find the element at the wavelength 521.3891 nm, but no element is listed at that exact wavelength on the website. I then tried to find the nearest value that could match the chemical composition table. I discovered that 521.3841 nm corresponds to an Fe I line, but the problem is that Fe is present in a very negligible amount (i.e. ppm) in the composition table. Cu is present in abundance, and 521.2780 nm corresponds to a Cu I line.
I am totally confused about what to choose: 521.3841 nm as Fe I or 521.2780 nm as Cu I.
Please let me know the fundamental principle behind assigning an element to a wavelength in spectroscopy, and point me to any available references.
I hope my question is clear; if not, please let me know.
Waiting for a healthy discussion.
Thank you very much for your time.
* I collected soil samples during the Hyperion pass over the study area, and chemical analysis was done.
* Lab spectral signatures were not taken.
* How can I correlate the chemical analysis results with the Hyperion data after preprocessing?
Please, I want to know which program to use, and a step-by-step summary, to get a percentage difference, or a correlation, between two spectra.
PS: I know that is a lot to explain, but I have already searched a lot of sites and found nothing.
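A hedged sketch of two simple comparisons for a pair of spectra that share a wavelength axis, in Python (if the spectra are on different grids, interpolate one onto the other with np.interp first):

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
s1 = rng.random(500)               # stand-in for spectrum 1
s2 = s1 + 0.05 * rng.random(500)   # stand-in for spectrum 2

r, p = pearsonr(s1, s2)                                  # r = 1.0 means identical shape
pct_diff = 100 * np.mean(np.abs(s1 - s2) / np.abs(s1))   # mean percent difference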
The External Parameter Orthogonalisation (EPO) algorithm is used to remove the effect of soil moisture from NIR spectra for the calibration of SOC (soil organic carbon) content. The algorithm is used for preprocessing of soil spectra taken with a spectroradiometer, after which PLS (partial least squares) regression is applied.
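For readers looking for an implementation: EPO estimates the subspace spanned by the external effect from a difference matrix D (e.g. spectra of the same samples measured moist, minus their dry counterparts) and then projects all spectra orthogonally to that subspace before PLS. A minimal sketch; the number of EPO components c is a tuning choice:

import numpy as np

def epo(X, D, c=2):
    """External Parameter Orthogonalisation.
    X: (n, p) spectra to correct; D: (m, p) difference spectra carrying the
    external (moisture) effect; c: number of components to remove."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt[:c].T                        # (p, c) basis of the moisture subspace
    P = np.eye(X.shape[1]) - V @ V.T    # projector orthogonal to that subspace
    return X @ P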
When imaging protein solutions (1 ug/ml, 10 ug/ml, 100 ug/ml, 1000 ug/ml; diluted with PBS buffer, pH 7.4) on a gold surface, what are the optimal pretreatment(s) to separate out the effect of the buffer (in my case PBS), which interferes with the protein bands of interest?
10 ul of each concentration was dropped on a clean gold slide and allowed to dry for 24 h under nitrogen purge, followed by FTIR reflectance imaging.
Since the 1000 ug/ml samples are highly concentrated, their signal appears very clear (Amide I, Amide II, Amide III, Amide A, Amide B); however, at concentrations at or below 100 ug/ml, the protein signature is dominated by PBS buffer bands.
What kind of univariate or multivariate methods would you apply to
(a) identify protein pixels (removing interference from the buffer and any slide background), and
(b) quantify protein pixels (e.g. build a PLSR model on 0, 1, 10, 100, and 1000 ug/ml) and predict the concentration level of an unknown dried protein sample?
Does anyone know about research-related or industrial applications of CNNs for HSI data in the food science field? Unlike SVM, KNN, and other "shallow" machine learning algorithms, CNNs can take advantage of the spatial information in HSI data.
Most published papers deal with the extraction of deep spatial features for architecture classification and other domains. But in the literature, I don't find applications of this deep learning technique to food/agricultural products!
Thank you!
Is principal component analysis alone good enough to carry out chemometric analysis on large datasets?
I would be very grateful to anyone who can send me propolis samples of about 2 g along with any available information about the origin and collection date of the material.
I will analyze the sample using GC/MS to form a database for chemometric investigations connecting propolis content with its place of origin.
I am interested in raw propolis and not its ethanol (nor any other) extract.
For those interested, I would be happy to send the results of my analysis.
Are you aware of industries that already make use of FTIR and chemometrics as their SOP for microbial testing? If not microbial testing, do you know of other industries that use FTIR and chemometrics as their SOP?
I need free software (open source or otherwise freely available) that is relatively simple (doesn't require coding) for doing PCA on a medium-sized dataset (n = 19).
Sample size: 19
Variables: 5 correlated variables.
I was using The Unscrambler software, but it is not free! Now I am trying SPSS. Is there any better software than this? What about Origin?
This is especially for spectrometric analysis.
Dear all,
I've transformed my data to avoid problems of non-normality and heteroscedasticity. Then I performed the statistical analysis and the post-hoc tests. Now, when reporting my data, which should I report:
the original data or the transformed data?
Thanks.
Hello everyone,
Recently I have read some articles using the framework of model population analysis (MPA). I understand that MPA works by generating a number of submodels and then using statistical methods to analyse the information of interest across the submodels. I am concerned about the following issues:
1. In terms of parameter optimization, what is the difference between MPA and traditional intelligent optimization methods such as genetic algorithms and simulated annealing?
2. In terms of Bayesian statistics, is MPA a method yielding the likelihood or the posterior of the model population?
I have been using FSCV for assessing fast dopamine fluctuations in the rat brain in vivo. It would be good to analyse the same recordings, or new ones, for dopamine changes over minutes and hours. But changes in baseline currents and other factors influence the picture. I know the generally accepted way is principal component regression, but it is too complex for me to carry out in practice, and I do not need absolute concentrations of dopamine, just changes. Are there any other, perhaps simpler, techniques? Or is there free software for principal component regression?
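On the free-software part of the question: principal component regression is just PCA followed by ordinary least squares on the scores, so any free environment (Python, R) can do it in a few lines. A sketch with scikit-learn, where the number of components is the main tuning choice (the arrays below are random stand-ins for real voltammograms and a relative dopamine signal):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.random((60, 300))   # stand-in: 60 background-subtracted voltammograms
y = rng.random(60)          # stand-in: relative dopamine signal

pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
y_hat = pcr.predict(X)      # predicted relative changes, no absolute calibration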
I am looking to do a Common Components and Specific Weights Analysis, which can be done in MATLAB with the SAISIR toolbox, for example, but I wonder if there is any R function for this type of analysis.
Thanks in advance!
In a chemometric discrimination model using the Mahalanobis distance, the method does not work if the number of samples is less than the number of variables. In a book by R. G. Brereton, he mentions that this is because the variance-covariance matrix C would have no inverse.
Could anyone tell me why the variance-covariance matrix has no inverse if the number of samples is less than the number of variables?
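A short sketch of the reason: the sample variance-covariance matrix is C = X'X / (n - 1), where X is the mean-centered n x p data matrix. Centering removes one degree of freedom, so rank(X) <= n - 1, and therefore rank(C) <= n - 1 < p whenever n <= p. A p x p matrix of deficient rank has determinant zero and hence no inverse. This is why, in practice, one first compresses the data (e.g. to PCA scores) or uses a regularized or pseudo-inverse before computing Mahalanobis distances.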
I'm approaching the development of hyperspectral data analysis, but instead of using multivariate and chemometric methods I would like to use more direct methods for distinguishing NIR spectra.
Starting from a reference spectrum, I would like to implement some analytical methods to distinguish this reference spectrum from other acquired spectra. Currently I'm using the Pearson correlation coefficient, the standard deviation of the difference between the reference and the acquired spectrum, or the concept of distance. Anyway, I'm looking for other methods, and to verify the speed of the calculation. Thanks in advance.
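One more candidate worth benchmarking is the spectral angle (as in the spectral angle mapper, SAM), popular in hyperspectral work because it ignores multiplicative intensity changes and is cheap to compute. A sketch:

import numpy as np

def spectral_angle(ref, s):
    """Angle in radians between two spectra; 0 = identical shape."""
    cos = np.dot(ref, s) / (np.linalg.norm(ref) * np.linalg.norm(s))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))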
I am interested in analyzing the chemistry of oil production waters, with the purpose of refining calculations of mineral scales and their possible correlations with radioelements.
I am looking for a Raman spectrum database from which carotenoids such as lycopene, beta-carotene, and lutein can be downloaded as .spc files, so that they can be used in other software such as MATLAB.
Hello,
Recently I have gained expertise in chemometric analysis. Chemometrics is a set of techniques in which statistical tools, especially multivariate analysis, are applied to chemical or biochemical data to interpret a large volume of data in reduced dimensions. During the dimension reduction, no significant amount of information is lost. Two chemometric methods are popular: PCA and AHC. These two tools help you classify genotypes, treatment effects, geographical origin, and the degree of adulteration in samples based on biochemical data.
Dear authors, if you are interested in working on chemometrics, I will help you.
For further queries please contact 09369641602
Tanmay Kr Koley
Hi scientists!
I am trying to choose the best design for an experimental arrangement of four factors and three levels.
I don't know which design is more correct. I chose Box-Behnken, but I am considering a central composite design.
I want to be sure of my choice!
Can you help me?
Dear RG Members,
I am trying to analyse multivariate infrared spectroscopy data using a PERMANOVA approach, and I'd like to know which distance matrix (e.g. Mahalanobis, Euclidean, or other) is better for this kind of dataset. Papers that support this are welcome.
Thanks,
ASANTOS
What is the most suitable software tool for data processing and chemometrics applied to NIR/IR, Raman, X-Ray hyperspectral imaging? Please could you share your experience.
Hi,
The reviewer asked this question:
"What's the proportion of the peak area of identifiable compounds, based on the NIST database, to the total detected products using GC-MS?"
What does he mean by this, and how should I answer him?
For context, the manuscript states: "The fraction peaks from the GC-MS spectrum were identified via the National Institute of Standards and Technology (NIST) library. The identification of the major products was based on a probability match equal to or higher than 95%." The reviewer's full comment: "What's the proportion for the peak area of identifiable compounds based on NIST database to the total detected products used GC-MS? This also should be pointed out in the manuscript."
I possess a GPC 220 PL with triple detection: RI, viscometry, and LS (15° and 90°).
Thanks for your help.
What type of chemometric method (or methods) do the researchers involved in the project intend to use to model the spectral data obtained? The project is very interesting, and from my point of view the research goal will be reachable in a few years.
Regards
I know these diagrams are very common in water research - a tool is required to plot fluoride, chloride, and sulfate contents in the same graph.
I have a question on using the ChemoSpec package in R. When doing infrared data analysis, which kind of normalization method should be used when I want to do peak normalization (normalizing the spectra to a peak that is not the most intense)?
See the attached file, page 8, for the explanation of normalization; I am still a little confused.
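In case a language-agnostic picture of the operation helps: peak normalization divides each spectrum by its intensity at (or integrated around) the chosen reference band, so that band becomes equal across all spectra. A minimal sketch of the arithmetic (the window width here is an arbitrary choice):

import numpy as np

def peak_normalize(spectra, wn, peak_wn, window=5.0):
    """Divide each spectrum by its mean intensity within +/- window of the
    chosen (not necessarily strongest) reference peak at peak_wn."""
    wn = np.asarray(wn)
    mask = np.abs(wn - peak_wn) <= window
    X = np.asarray(spectra, dtype=float)
    return X / X[:, mask].mean(axis=1, keepdims=True)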
I am looking for a large dataset of NIR spectra (more than 500-1000 samples) including sample parameters and the results of the calibrations (errors in determining these parameters). Maybe such data have been published and the dataset is in the supplementary materials?
In the paper "A statistical approach to determine fluxapyroxad and its three metabolites in soils, sediment and sludge based on a combination of chemometric tools and a modified quick, easy, cheap, effective, rugged and safe method", the authors found that the intensity of the compound was affected by the solution in which it was prepared. So what is the mechanism? In addition, the mobile phase composition (excluding the ratio of mobile phases A, B, ...) can affect the retention behavior and intensity. How?

For instance, I prepared 20 mixtures of two different dyes and measured their absorbance at 10 selected wavelengths to use for an ANN, but I don't have enough experience in using MATLAB for this task. I have problems with: 1. preparing the data matrix (the 10 selected wavelengths) for the input and the concentrations for the output; 2. extracting the model algorithm so I can use it as a calibration curve between my absorbance and concentration values and calculate the remaining dye concentration after treatment of the water with different coagulants.
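While the question is about MATLAB, the data layout is the same in any environment: an input matrix of 20 mixtures x 10 wavelengths and an output matrix of 20 x 2 concentrations. A hedged sketch of the same workflow in Python with scikit-learn, usable as a template (the arrays are random stand-ins for the real measurements):

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
A = rng.random((20, 10))   # stand-in: absorbances at the 10 wavelengths
C = rng.random((20, 2))    # stand-in: concentrations of the two dyes

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                                   random_state=0)).fit(A, C)
C_pred = model.predict(A)  # predicted concentrations from (new) absorbances

With only 20 mixtures, a simple PLS or multiple linear regression may generalize better than an ANN; whichever is used should be cross-validated.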
Greetings,
Can anyone please suggest a source for the IR spectra of solvated sulfur mustard, specifically in water? I need the spectra for comparison and identification purposes.
Thank you in advance,
Ema Sh
Does anyone know how to open/run DPT files (NIR spectroscopic data) in The Unscrambler software?
I am submitting jobs to the Polyrate 8.0 software for my reactants and products to calculate the rate coefficients for a particular reaction.
But in the output I am getting some non-zero imaginary frequencies. In principle, for reactants and products there should not be any imaginary frequency (NImag=0). I have made sure that my inputs are correct, including the Z-matrices for the reactants and products.
I am unable to understand why this is happening. Can somebody please help me with this?
I am also attaching the input files for your reference.
The files "r1.dat" and "r1.71" are the input files for Polyrate, and the file "esp.fu82" is the Gaussian output file from the Polyrate run.
The mass spectrum of 1-amino-2-naphthol shows three major peaks: at m/z 159 (the molecular weight of the compound, 100% intensity), m/z 130 (70% intensity), and m/z 103 (15% intensity). What is the fragmentation mechanism of the compound? Which fragments are formed at m/z 130 and 103?
Kindly elucidate the fragmentation mechanism so that I can correlate it with other naphthalene derivatives.
Thank you
I found that almost all the applied cases related to pharmaceutical analysis through UV spectroscopy obey the Beer-Lambert law, which is a linear problem. Is there any application in the pharmaceutical analysis field where non-linear models should be used rather than linear models on UV spectral data? Thanks.
As there are many different techniques available, it is difficult to understand which technique is a good one for a certain kind of data.
I tried writing the script below, but my first version (with the similarity call inside the reading loop and undefined indices i, j) raised an error telling me to use integers, not tuples, in place of i and j. Here is a corrected version:

from rdkit import Chem
from rdkit.Chem.AtomPairs import Pairs
from rdkit.Chem import MACCSkeys
from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols

# read all molecules once, skipping any that fail to parse
ms = [m for m in Chem.SDMolSupplier('cdk2.sdf') if m is not None]

# compute one topological fingerprint per molecule (MACCS and atom-pair
# fingerprints can be built the same way from the ms list)
fps = [FingerprintMols.FingerprintMol(m) for m in ms]

# pairwise Tanimoto similarity for every pair i < j
for i in range(len(fps)):
    for j in range(i + 1, len(fps)):
        print(i, j, DataStructs.FingerprintSimilarity(fps[i], fps[j]))

I was wondering if it is possible to put all the molecules in a loop so that I don't have to give separate entries for different molecules. Also, I want to calculate all the fingerprint types (topological, MACCS, atom pairs) in the same pass, if possible.
Kindly help me with this.
Would it be the number of components, how to handle scattering, fixing outliers or something completely different?
We are working on a device to measure fat in milk by NIR spectroscopy, but no light (from a 12 W tungsten lamp) could pass through even 0.5 mm of milk. Is there any way to reduce the turbidity of the milk by dissolving the proteins?
I am aware of the trypsin method for hydrolyzing proteins, but is there any other, inorganic chemical (which is more robust, insensitive to low temperature, and fast)? Is the EDTA method superior to trypsin in terms of dissolving proteins?
I am currently working on non-invasive blood glucose measurement using photoacoustic spectroscopy in the near-IR region (905 nm). What optical power is advisable for the laser source? Is there a limit on the use of lasers on skin?
I'm Katrul. Currently I am doing SIMCA for classification analysis in chemometrics using MATLAB. There are 2 classes in my data: 15 samples in class 1 and 150 samples in class 2. When I did SIMCA, I obtained Q and T2 for each class. To present the result of the classification analysis, it is suggested to use a Q vs T2 plot or a Coomans plot. For the Q vs T2 plot I should have two plots, but for class 1 I only have 15 values of Q and T2. Meanwhile, in the journal article I refer to, the whole dataset is included in each class's plot. What is your opinion on this matter?
For the Coomans plot, I have to calculate the sample-to-model distances. Does anyone have an alternative way to produce a Coomans plot for SIMCA, such as a MATLAB script?
Is it better to have one compared to the other?
In multivariate significance testing there are the Wilks, Pillai, Hotelling, and Roy statistics. What is the statistical meaning of each one? Which should I use for my data?
I wish I could get a reference for all the calculations pertaining to PLS, PCR, CLS, and other methods.
Dear All,
I'm looking for suitable and simple chemometrics software, as an effective tool for exploring chemical data in environmental analytical chemistry, for a study investigating soil pollution (identification of pollutant sources, ...).
I would like to overlay a variety of theoretical IR spectra with a single experimental IR spectrum and determine the best-matching theoretical spectrum. Is there a way to do this quantitatively in Excel, SigmaPlot, or any other plotting program?
Attached is an example spectrum where the black line represents the experimental spectrum. The red and blue areas/lines represent two different theoretical spectra. Both provide reasonably good matching, so I would like to quantitatively determine which matches the best.
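A hedged sketch of one quantitative route: interpolate each theoretical spectrum onto the experimental wavenumber grid, normalize, and rank the candidates by a similarity score (cosine similarity below; Pearson r or RMSD work the same way). This requires a scripting tool rather than Excel, but only a few lines:

import numpy as np

def match_score(wn_exp, I_exp, wn_theo, I_theo):
    """Cosine similarity between an experimental spectrum and a theoretical
    one interpolated onto the experimental grid (wn_theo must be ascending)."""
    I_t = np.interp(wn_exp, wn_theo, I_theo)
    a = I_exp / np.linalg.norm(I_exp)
    b = I_t / np.linalg.norm(I_t)
    return float(np.dot(a, b))   # 1.0 = identical shape

The candidate with the highest score is the best match; reporting the full ranking makes the comparison transparent.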

Turn-on fluorescence sensors, as you may know, are the type of sensors that do not show any fluorescence emission peak at a specific wavelength until they bind the target analyte, which causes a peak to appear at the mentioned wavelength.
We are all familiar with the 3S/m formula, but the question is: when you do not have any emission peak for the blank, what formula can be used to determine the LOD?
(I do not want to use the graph to find the LOQ and then convert it into the LOD.)
I would appreciate it if you could share any papers or references about this topic.
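A note that may help: when no blank signal exists, a widely used alternative (e.g. in ICH-style validation) replaces the blank standard deviation in 3S/m with a standard deviation taken from the calibration itself, LOD = 3.3*s/m, where m is the slope of the intensity-versus-concentration line and s is either the residual standard deviation of the regression (s_y/x) or the standard deviation of the intercept. This keeps the spirit of the 3S/m formula without requiring a measurable blank peak.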

Hello,
We want to use upconversion nanoparticles that convert from the visible or NIR range into the UV range for a series of experiments. If possible, we would like to avoid synthesizing and characterizing the particles as our lab is ill-equipped to do so. I've looked around for a commercial source for purchasing UV upconversion nanoparticles. I found three companies - Mesolight, American Elements, and Nanograde - that seemed to have what we need, but all either can't provide the particles or can only offer very small quantities.
I was hoping someone here has a source for these particles that they can point me to.
Thanks!
In validation studies we are obliged to work with a specific number of calibration levels.
I need to know the binding energy, vertical ionization potential, vertical electron affinity, and HOMO-LUMO gap of 7-atom lithium clusters of decahedral shape, both theoretical and experimental data.
Can anyone please help?
I am working on a dimer system, and I use the molecular dynamics package AMBER to perform REMD simulations in my work. Can anyone help me with the question of how to confine the dimer within an imaginary sphere, to prevent the molecules from flying apart from each other?
Preprocessing methods in NIR spectroscopy.
I have been analysing Raman spectroscopy data as a predictor of meat quality, and in my latest data I have been getting R²CV values which are significantly and consistently higher than the R²Cal values. For example, I've gotten R²Cal: 0.00102829 and R²CV: 0.288515. I have been using MATLAB with the PLS toolbox, leave-one-out cross-validation, and a maximum of 20 latent variables.
Any ideas as to why this is happening would be appreciated.
A membrane consists of excipients and solvents blended together.
I would like to get more literature on chemometrics.
I want to extract the signal of a given protein from FTIR spectral data of a mixture of proteins. As the IR signatures of different chemical molecules are different, the signature of each protein should also be different.