Questions related to Algorithm Development
Fragmentation trees are a critical step towards the elucidation of compounds from mass spectra, enabling high-confidence de novo identification of the molecular formulas of unknown compounds (doi:10.1186/s13321-016-0116-8). Unfortunately, these algorithms suffer from long computation times, making the analysis of large datasets intractable. Recently, however, Fertin et al. (doi:10.1016/j.tcs.2020.11.021) highlighted additional properties of fragmentation graphs which could reduce computation times. Since their work is purely theoretical and lacks an implementation, I'm looking to partner with someone to investigate and implement faster fragmentation tree algorithms. It could end up being a nice paper. Anyone interested?
I have a new idea (combining a well-known SDP formulation with a randomized procedure) for an approximation algorithm for the vertex cover problem (VCP) with a performance ratio of $2 - \epsilon$.
You can see the abstract of the idea in the attached file and the latest version of the paper at https://vixra.org/abs/2107.0045
I would be grateful if anyone could give me informative suggestions.
A tunable clock source will consist of a PLL circuit like the Si5319, configured by a microcontroller. The input frequency is fixed, e.g. 100 MHz. The user selects an output frequency with a resolution of, say, 1 Hz. The output frequency will always be lower than the input frequency.
The problem: the two registers of the PLL circuit which determine the ratio "output frequency/input frequency" are only 23 bits wide, i.e. the upper limit of both numerator and denominator is 8,388,607. As a consequence, when the user sets the frequency to x, the rational number x/10^8 has to be reduced or approximated.
If the greatest common divisor (GCD) of x and 10^8 is >= 12, then the solution is obvious. If not, the task is to find the element of the Farey sequence F_8388607 that is closest to x/10^8. This can be done by descending from the root along the left half of the Stern-Brocot tree. However, this tree, with all elements beyond F_8388607 pruned away, is far from balanced, resulting in a maximum number of descending steps in excess of 4 million; no problem on a desktop computer, but a bit slow on an ordinary microcontroller.
F_8388607 has about 21·10^12 elements, so a balanced binary tree with these elements as leaves would have a depth of about 45. But since such a tree cannot be stored in the memory of a microcontroller, the numerator and denominator of the sought Farey element would have to be calculated somehow during the descent. This is straightforward in the Stern-Brocot tree, but I don't know of a solution in any other tree.
Do you know of a fast algorithm for this problem, maybe working along entirely different lines?
Many thanks in advance for any suggestions!
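One way to avoid the multi-million-step descent: the best rational approximation with a bounded denominator can be read off the continued-fraction expansion of x/10^8, which takes only O(log) iterations rather than millions of tree steps. As a sketch of the idea, Python's `Fraction.limit_denominator` implements exactly this semiconvergent search (the same loop is easy to port to C on a microcontroller); the 23-bit limit below is taken from your description:

```python
from fractions import Fraction
from math import gcd

MAX = 8_388_607  # 23-bit register limit

def pll_ratio(x_hz: int, f_in: int = 10**8) -> Fraction:
    """Best rational approximation of x_hz/f_in whose numerator and
    denominator both fit in 23 bits. Since the ratio is < 1, the
    numerator never exceeds the denominator, so one bound suffices."""
    g = gcd(x_hz, f_in)
    if f_in // g <= MAX:               # exact representation already fits
        return Fraction(x_hz, f_in)
    # Continued-fraction (semiconvergent) search, O(log f_in) steps:
    return Fraction(x_hz, f_in).limit_denominator(MAX)

# Example: an awkward target frequency of 31,415,927 Hz
r = pll_ratio(31_415_927)
print(r, float(r))
```

The `gcd` shortcut mirrors your GCD >= 12 observation; `limit_denominator` handles the remaining cases in logarithmically many steps.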
I am a research assistant in hydrology and fluid dynamics, with a background in Civil Engineering. Currently, our research team has been working on evaluating data assimilation techniques for rainfall-runoff predictions.
I would like to know if you have suggestions of studies or books on data assimilation, especially focused on applications in hydrology and hydraulics. Are there any similar studies you would recommend? Does anyone have a worked numerical example of rainfall-runoff prediction by means of data assimilation techniques?
Thank you in advance.
I have a file of 1489 spike protein sequences. I want to extract subsequences of 6 amino acids from it, each with its respective header. I don't know any sort of programming, so can anyone help me with this?
A big thank you in advance.
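In case it helps to get started, here is a hedged sketch that parses a standard FASTA file and prints every overlapping window of 6 amino acids together with its header. It assumes plain FASTA input; the example sequences below are made up, and you would read your own file instead:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def six_mers(seq):
    """All overlapping windows of 6 amino acids."""
    return [seq[i:i + 6] for i in range(len(seq) - 5)]

# Example with an in-memory FASTA snippet (replace with open("file.fasta")):
fasta = """>spike_1
MFVFLVLLPLVSS
>spike_2
MKTIIALSYI""".splitlines()

for header, seq in read_fasta(fasta):
    for window in six_mers(seq):
        print(f"{header}\t{window}")
```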
AI is the intelligence exhibited by computers and software, representing the third era of automation. AI is currently used in robotics, machine learning, analytics, decision support, and virtual personal assistants. AI is expected to transform many industries all over the world. The success of companies is increasingly defined by how they manage the integration between the human workforce and automation. Businesses need to benefit from new AI technologies and systems to enhance productivity and augment human intelligence. However, AI is not seen by all as a tool that complements rather than replaces manpower. Many people fear that smart robots will substitute for their human counterparts in the labor force. What is your opinion on the role of AI in job enhancement and the boosting of revenue?
I have in mind that logic is mainly about thinking, abstract thinking, particularly reasoning. Reasoning is a process structured in steps, where one conclusion is usually based on a previous one and can at the same time be the base, the foundation, of further conclusions. Despite the mostly intuitive character of the algorithm as a concept (even without taking Turing and Markov theories/machines into account), an algorithm has a step-by-step structure, and the steps are connected, one might even say logically connected (when the algorithm is correct). The difference is, of course, the formal character of a logical proof.
I have a graph as in the attached figure. I have to extract connected nodes from the graph based on edge weights: if an edge weight is less than a certain threshold, we consider that there is no connectivity between those nodes. I have attached the expected subgraphs. Are there any efficient algorithms available to extract these kinds of node groups? In the attached subgraphs, nodes are grouped if the edge weights between them are above 1.
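For reference, the standard approach is exactly what you describe: drop every edge below the threshold, then compute connected components, which takes linear time with BFS/DFS or union-find. A minimal sketch, assuming the graph is given as a list of weighted edges (the example edges below are hypothetical):

```python
from collections import defaultdict, deque

def components_above_threshold(edges, threshold):
    """edges: iterable of (u, v, weight). Returns the connected components
    of the graph that keeps only edges with weight >= threshold."""
    adj = defaultdict(set)
    nodes = set()
    for u, v, w in edges:
        nodes.update((u, v))
        if w >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        queue, comp = deque([start]), []
        seen.add(start)
        while queue:
            n = queue.popleft()
            comp.append(n)
            for m in adj[n] - seen:
                seen.add(m)
                queue.append(m)
        groups.append(sorted(comp))
    return groups

# Hypothetical weighted graph; edge 2-3 falls below the threshold of 1:
edges = [(1, 2, 3.0), (2, 3, 0.5), (3, 4, 2.0)]
print(components_above_threshold(edges, 1.0))  # groups [1, 2] and [3, 4]
```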
What kind of software, or what kind of method, could be used to manage a huge number of papers, so that you can quickly find any paper you have read?
If I have the coordinates of some nodes and I have to group the nodes together whenever a path exists between them, is there an algorithm available?
The main graph structure looks as in the attached figure.
By looking at the main graph:
If I have the coordinates of nodes 4, 5, 6, 7 and 3 in the graph generated using the above code, I have to group 4, 5 and 6 together, and 3 and 7 as another group. Is there an algorithm available for this kind of grouping?
Thanks in advance,
Recently, I have seen that in many papers reviewers ask the authors to provide the computational complexity of the proposed algorithms. I was wondering what the formal way to do that would be, especially in short papers where pages are limited. Please share your expertise on presenting the computational complexity of algorithms in short papers.
Thanks in advance.
I would like to compare my algorithm (an improved LPA) with the Louvain, Infomap and CNM (fast greedy) algorithms (available in the MATLAB community detection toolbox), run on the LFR benchmark.
I ran into a problem: I cannot use the outputs of the algorithms for the NMI criterion.
I would be grateful to anyone who could guide me on the matter!
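In case the NMI step itself is the blocker: NMI only needs two label vectors (one community label per node), so the outputs of Louvain/Infomap/CNM just have to be converted to that form first. A minimal stdlib sketch of NMI with the common square-root normalization, which you can port to MATLAB if needed:

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized mutual information between two community assignments,
    given as equal-length label lists (one label per node)."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(nij / n * log(n * nij / (ca[a] * cb[b]))
             for (a, b), nij in joint.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 1.0

ground_truth = [0, 0, 0, 1, 1, 1]
detected     = ["a", "a", "a", "b", "b", "b"]
print(nmi(ground_truth, detected))  # 1.0: identical partitions up to renaming
```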
Are there any new innovations in scheduling other than First-Come, First-Served (FCFS), Shortest-Job-First (SJF) and Round Robin (RR), or hybrid developments of those?
I am interested in the global convergence of that algorithm and its application to different areas.
Any help is highly appreciated.
From a Simulink subsystem, after generating C code using Embedded Coder, I want to find the number of lines of code that will be executed along the worst path (the longest path the code can follow). Are there any algorithms or method references available that can be used for this purpose?
For example:
1. if (a > b)
followed by three lines in the if-branch and one line in the else-branch.
If the condition is true, the possible path of maximum length is 1 --> 2 --> 3 --> 4, so the maximum number of executed lines is 4.
If the else branch executes instead, only one line runs under it, so the total number of executed lines is 3.
So the algorithm should return the maximum number, 4.
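For loop-free generated code, this is the longest-path problem on the control-flow graph, which is a DAG and therefore solvable in linear time with a topological-order dynamic program. A hedged sketch, where the basic blocks, their line counts and the CFG edges are assumed to have been extracted from the generated C code beforehand:

```python
from graphlib import TopologicalSorter   # Python 3.9+

def worst_case_lines(line_count, edges, entry, exit_):
    """Longest path (by summed line counts) through a loop-free CFG.
    line_count: {block: lines in block}; edges: list of (src, dst)."""
    succ = {b: [] for b in line_count}
    pred = {b: [] for b in line_count}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order = TopologicalSorter({b: pred[b] for b in line_count}).static_order()
    best = {b: float("-inf") for b in line_count}
    best[entry] = line_count[entry]
    for b in order:
        for s in succ[b]:
            best[s] = max(best[s], best[b] + line_count[s])
    return best[exit_]

# The if/else example from the question: a 1-line condition, a 3-line
# then-branch versus a 1-line else-branch, and an empty join block:
cfg = {"if": 1, "then": 3, "else": 1, "join": 0}
edges = [("if", "then"), ("if", "else"), ("then", "join"), ("else", "join")]
print(worst_case_lines(cfg, edges, "if", "join"))  # 4
```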
There is an idea to design a new algorithm for the purpose of improving the results of software operations in the fields of communications, computing, biomedicine, machine learning, renewable energy, signal and image processing, and others.
So what are the most important ways to test the performance of smart optimization algorithms in general?
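A common methodology is to run the algorithm on standard benchmark functions (sphere, Rastrigin, Rosenbrock, or the CEC/BBOB suites), perform many independent runs, and report the best, mean and standard deviation of the results. A minimal sketch of that protocol, using plain random search as a stand-in for the algorithm under test:

```python
import math
import random

def sphere(x):
    return sum(v * v for v in x)

def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def random_search(f, dim, bound, evals, rng):
    """Trivial baseline optimizer; replace with the algorithm under test."""
    return min(f([rng.uniform(-bound, bound) for _ in range(dim)])
               for _ in range(evals))

def benchmark(f, dim=5, bound=5.12, evals=2000, runs=10, seed=0):
    """Multiple independent runs; report (best, mean, std) as is customary."""
    rng = random.Random(seed)
    results = [random_search(f, dim, bound, evals, rng) for _ in range(runs)]
    mean = sum(results) / runs
    std = (sum((r - mean) ** 2 for r in results) / runs) ** 0.5
    return min(results), mean, std

print("sphere   :", benchmark(sphere))
print("rastrigin:", benchmark(rastrigin))
```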
It's going to be a huge shift for marketers. Tracking identity is tricky at the best of times, with online/offline and multiple channels of engagement; but when the current methods of targeting, measurement and attribution get disrupted, it's going to be extremely difficult to get identity right, to deliver exceptional customer experiences while also getting compliance right.
We have put together our framework, and initial results show promising measurement techniques, including advanced neo-classical fusion models (borrowed from the financial industry and from biochemical stochastic and deterministic frameworks), with applied Bayesian and state-space models to run the optimisations. Initial results are looking very good, and we are happy to share our wider thinking through this work with everyone.
Link to our framework:
Please suggest how you would handle this environmental change, and suggest methods to measure the digital landscape going forward.
#datascience #analytics #machinelearning #artificialintelligence #reinforcementlearning #cookieless #measurementsolutions #digital #digitaltransfromation #algorithms #econometrics #MMM #AI #mediastrategy #marketinganalytics #retargeting #audiencetargeting #cmo
For example, I have a South Carolina map comprising 5,833 grid points, as shown below in the picture. How do I interpolate to get data for unsampled points that are not among the 5,833 points but lie within the South Carolina region (red in the picture)? Which interpolation technique is best for such a region?
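Common options include nearest-neighbour, bilinear/spline, inverse-distance weighting (IDW) and kriging; for geospatial fields, kriging (or IDW as a simple baseline) is the usual choice. As a hedged illustration of the idea, here is a stdlib IDW sketch on made-up sample points:

```python
def idw(sample_points, values, qx, qy, power=2.0):
    """Inverse-distance-weighted interpolation at (qx, qy) from scattered
    (x, y) samples. Returns the exact sample value if the query point
    coincides with a sample point."""
    num = den = 0.0
    for (x, y), v in zip(sample_points, values):
        d2 = (x - qx) ** 2 + (y - qy) ** 2
        if d2 == 0.0:
            return v
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den

# Four hypothetical grid points with known values:
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [10.0, 20.0, 30.0, 40.0]
print(idw(pts, vals, 0.5, 0.5))  # 25.0 by symmetry
```

For real work at this scale, library routines (e.g. gridded-data interpolators or kriging packages) would replace this hand-rolled loop.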
I have located the MECP using the algorithm developed by Harvey. Since the MECP is not a stationary point in the full 3N-6 dimensions of either of the PESs, a standard frequency calculation would be erroneous. Can anyone suggest how to perform a frequency calculation at the MECP?
I have a large time series with two variables (in reality more dependent variables, but for now focusing on only two), in which variable2 is dependent on variable1. The algorithm needs to identify whenever variable2 deviates from its ideal path given variable1. I would like to know of any efficient machine learning method suited to this application (one which can be adapted in future for multiple variables) in MATLAB.
Simple plotting and visual inspection are laborious because of the large dataset.
It's no longer a surprise to realize that a wide gap exists between advances in academic research and practicality in industry. This discussion is about exploring this gap for a particular domain: time-series forecasting. The topic has seen a great many research advances in recent years, since researchers have identified the promise offered by deep learning (DL) architectures for this domain. Thus, as evident in recent research gatherings, researchers are racing to perfect DL architectures for taking over time-series forecasting problems. Nevertheless, the average industry practitioner remains reliant on traditional statistical methods, for understandable reasons. Probably the biggest reason of all is the ease of interpretation (i.e. interpretability) offered by traditional methods, but many other reasons are valid as well, such as ease of training, deployment, robustness, etc. The question is: if we were to reinvent a machine learning solution solely for industrial applicability, considering current and future industry needs, then what attributes should this solution possess? Interpretability, manipulability, robustness, self-maintainability, inferability, online-updatability, something else?
Suppose we have the following single nucleotide polymorphic variant:
Reference DNA sequence
S0 = G T G A C T G A G C C T
Variant DNA sequences
S1 = A T G A C T G A G C C T
S2 = G A G A C T G A G C C T
S3 = G T A A C T G A G C C T
S4 = G T G T C T G A G C C T
S5 = G T G A A T G A G C C T
S6 = G T G A C A G A G C C T
S7 = G T G A C T A A G C C T
S8 = G T G A C T G T G C C T
S9 = G T G A C T G A A C C T
S10 = G T G A C T G A G A C T
S11 = G T G A C T G A G C A T
S12 = G T G A C T G A G C C A
S13 = A T G A C T G A G C C T
S14 = T T G A C T G A G C C T
S15 = C T G A C T G A G C C T
S16 = G T G A C C G A G C C T
S17 = G T G A C A G A G C C T
S18 = G T G A C G G A G C C T
S19 = G T G A C C G A G C T T
S20 = G T G A C T G A G C A T
S21 = G T G A C T G A G C G T
Is there any computational algorithm suitable for this problem?
I noted that, following the algorithms Needle, Matcher, Stretcher and Water, the result of a pairwise comparison between S0 and any of S1 to S21 is either 91.70% similar (with global alignment) or 100% identical (with local alignment).
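Since each variant differs from the reference at a single position, a per-position (Hamming) comparison distinguishes the variants even where alignment percent identity saturates. A small sketch using S0 and S1 from the question:

```python
def mismatches(a, b):
    """Positions (0-based) where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

s0 = "GTGACTGAGCCT"
s1 = "ATGACTGAGCCT"   # S1 from the question: G -> A at position 0
diff = mismatches(s0, s1)
identity = 1 - len(diff) / len(s0)
print(diff, f"{identity:.2%}")  # [0] 91.67%
```

Grouping the variants by their mismatch position would also recover which variants hit the same site (e.g. S1, S13, S14 and S15 all differ at position 0).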
A unit matrix will lead to a digraph with all variables on the same level, thus defeating the purpose of finding a structural relationship between them.
We developed a subpixel image registration algorithm for finding sub-pixel displacement, and I want to test it against existing methods. I have compared it with the subpixel image registration algorithm by Guizar et al. and also with the algorithm developed by Foroosh et al. Does anyone know of any other accurate algorithms for subpixel image registration (preferably with open-source code)?
The frequency response is simulated from a structure, and for damage detection there are many vibration-based algorithms. However, most of them are written in MATLAB. Is there any explicit reference for algorithm development in Python for frequency response function (FRF) analysis?
I would like to test the performance of a modified algorithm developed to solve a real-world problem that has these characteristics: (1) discrete, (2) multi-objective, (3) black-box, (4) large-scale.
How can we do this? And if there are no such test problems, is it sufficient to show its performance on the real-world problem only (where the true Pareto front is unknown)?
We have some research work related to algorithm design and analysis. Most computer science journals focus on current trends such as machine learning, AI, robotics, blockchain technology, etc. Please suggest some journals that publish articles related to core algorithmic research.
Hi, I have a little previous experience with genetic algorithms.
Currently I am trying to use a GA for scheduling: I have some events and some rooms, and the rooms must be scheduled for these events; each event has different time requirements, and there are constraints on the availability of rooms.
But I want to know whether there are alternatives to the GA, since the GA is a somewhat random and slow process. Are there other techniques that could replace it?
Thanks in advance.
We know that reflectance is influenced both by the solar angle and by the satellite viewing angle, namely BRDF effects (Bidirectional Reflectance Distribution Function). But few researchers have conducted BRDF corrections for water quality remote sensing during model development and intercomparison between two sensors. Should this issue be considered? How much can accuracy be improved for water quality estimation if BRDF is corrected?
I have read some scientific papers, and most of them use data dependence tests to analyse their code for parallel optimization purposes. One of these dependence tests is Banerjee's test. Are there other tests that can give better results for data dependence? And is it hard to test control-dependent code, and if we can, what are some of the techniques we can use?
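Besides Banerjee's test, classic data-dependence tests include the GCD test and the exact (but more expensive) Omega test. The GCD test is simple enough to sketch: the dependence equation a*i - b*j = c has an integer solution only if gcd(a, b) divides c, so when it does not, the two accesses are provably independent:

```python
from math import gcd

def gcd_test(a, b, c):
    """GCD dependence test for the Diophantine equation a*i - b*j = c,
    arising from accesses A[a*i + k1] (write) and A[b*j + k2] (read)
    with c = k2 - k1. Returns False only when no integer solution exists,
    i.e. the iterations are provably independent; True means a dependence
    MAY exist (the test is conservative)."""
    return c % gcd(a, b) == 0

# for (i...) { A[2*i] = ...; ... = A[2*i + 1]; }  ->  2*i - 2*j = 1
print(gcd_test(2, 2, 1))   # False: gcd(2,2)=2 does not divide 1, no dependence
print(gcd_test(4, 6, 2))   # True : dependence possible, stronger test needed
```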
Everyone, at some point, has probably thought about the problem artificial intelligence (AI) may represent in the future, when the rate of unemployment rises at an alarming speed thanks to robots that do a much better job than any human ever did. But the question is: how, when, and where will the impact of artificial intelligence hit hardest?
I am searching for an implementation of an algorithm that constructs three edge-independent trees from a 3-edge-connected graph. Any response will be appreciated. Thanks in advance.
Given a set of m (> 0) trucks and a set of k (>= 0) parcels. Each parcel has a fixed payment for the trucks (it may be the same for all, or it may differ). The problem is to pick up the maximum number of parcels such that the profit of each truck is maximized. There may be 0 to k parcels in the service region of a particular truck; likewise, a parcel can be located in the service region of 0 to m trucks. There are certain constraints, as follows.
1. Each truck can pick up exactly one parcel.
2. A parcel can be loaded to a truck if and only if it is located within the service region of the truck.
The possible cases are as follows
Case 1. m > k
Case 2. m = k
Case 3. m < k
As far as I know, to prove a given problem H NP-hard, we need to give a polynomial-time reduction from a known NP-hard problem L to H. Therefore, I am in search of a similar NP-hard problem.
Kindly suggest an NP-hard problem which is similar to the stated problem. Thank you in advance.
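One hedged observation before hunting for a reduction: with constraint 1 (each truck picks up exactly one parcel), the profit-maximization part of the stated problem resembles maximum-weight bipartite matching (the assignment problem), which is solvable in polynomial time via the Hungarian algorithm or min-cost flow; it may be worth checking whether your full problem really is harder than that. A brute-force sketch for experimenting with small instances (the payments and regions below are made up):

```python
from itertools import permutations

def best_assignment(pay, regions):
    """Exhaustive maximum-total-payment matching for small instances.
    pay[p]     : payment offered for parcel p
    regions[t] : set of parcel indices inside truck t's service region
    Returns (best total payment, tuple giving each truck's parcel or None)."""
    options = list(range(len(pay))) + [None] * len(regions)  # None = idle truck
    best, best_pick = 0, (None,) * len(regions)
    for perm in set(permutations(options, len(regions))):
        if all(p is None or p in regions[t] for t, p in enumerate(perm)):
            total = sum(pay[p] for p in perm if p is not None)
            if total > best:
                best, best_pick = total, perm
    return best, best_pick

pay = [5, 3, 8]                # hypothetical payments
regions = [{0, 1}, {1, 2}]     # truck 0 reaches parcels 0 and 1; truck 1: 1 and 2
print(best_assignment(pay, regions))  # (13, (0, 2))
```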
I want to find existing algorithms that have been developed to find similarities in non-English documents using Word Mover's Distance.
I would prefer a Python-based answer.
I am doing some research on fake news detection, and for this I would like to develop a measure that gives the difficulty level of an article. Specifically, I would like to analyze webpage content.
Could someone give me some good ideas to start with? I am looking for a machine learning kind of algorithm to develop the measure.
Note: I have gone through indexes like ARI, SMOG, etc., but they are based only on the number of words/sentences in the article, and mostly they do not concentrate on the number of complex words.
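One index that does weight complex words is the Gunning Fog index, which combines average sentence length with the share of words of three or more syllables. A stdlib sketch with a deliberately naive vowel-group syllable counter (a real system would use a syllable dictionary such as CMUdict):

```python
import re

def syllables(word):
    """Crude vowel-group syllable count; good enough for a demo only."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Fog = 0.4 * (words per sentence + 100 * complex-word fraction)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

easy = "The cat sat. The dog ran. It was fun."
hard = ("Considerable epistemological complications undermine "
        "journalistic verification methodologies.")
print(gunning_fog(easy), gunning_fog(hard))
```

The score could then serve as one feature alongside others in a learned difficulty model.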
I have an algorithm for the Hamiltonian cycle problem, and I have worked on it for many years. Now I really think it is done, and my paper is complete and clear; it can be understood. But my English is limited, and I need your help.
Please read it carefully and consider: can it be understood? How should the wording be revised?
First, please read my remarks in the paper.
Thank you very much!
I have written a DMD code, and I need to validate it. I need simple test problems, not involving any simulations, that I can reproduce with my code.
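A classic validation case that needs no simulation: snapshots of a known linear system x_{k+1} = A x_k, for which exact DMD must recover the eigenvalues of A. A hedged sketch you can compare your own code against:

```python
import numpy as np

def exact_dmd_eigs(X, r=None):
    """Eigenvalues of the best-fit linear operator mapping each snapshot
    to the next one (exact DMD). X: state-by-time snapshot matrix."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    if r is not None:
        U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    Atilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
    return np.linalg.eigvals(Atilde)

# Known system with eigenvalues 0.9 +/- 0.2i (a decaying rotation):
A = np.array([[0.9, -0.2],
              [0.2,  0.9]])
snaps = [np.array([1.0, 0.5])]
for _ in range(30):
    snaps.append(A @ snaps[-1])
X = np.column_stack(snaps)

eigs = np.sort_complex(exact_dmd_eigs(X))
print(eigs)  # ~ [0.9-0.2j, 0.9+0.2j]
```

Your DMD code, fed the same snapshot matrix X, should recover the same eigenvalues to machine precision.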
I want to use optimization techniques, through simulation tools with a control algorithm, for developing a customised FDM 3D printer.
I have encountered a difficulty and am eagerly seeking your instructions and advice.
When resources are sufficient, i.e. the resource constraints in the master problem can easily be satisfied, the algorithm converges to the linear relaxation upper bound, see fig. 3.
When resources are scarcer, the algorithm never converges, see figs. 1 and 2.
Can anybody tell me whether this is "tailing-off" or degeneracy?
In the SIM reconstruction algorithm it is necessary to separate the measurement results, but the separation requires knowing the exact translation phase of the illumination stripes. I wrote a phase calibration procedure following the method in the literature, but the calibration results are not accurate. I would like to know whether it is really necessary to calibrate the phase. If it is, what needs attention in the calibration algorithm, and how can the measurement error superimposed on the results be removed?
I tried to screen for the receptors of a ligand. I got a list of 5K proteins, and many of them are not in the plasma membrane, e.g. they are in the mitochondrial or ER membrane. I am just wondering if anyone knows a way to sort out the plasma membrane proteins.
We assume that all the channels are Rayleigh fading channels, including the residual-interference channel. So the SINR at the relay should be as in the attached expression,
where h_sr and h_rr are the channel gains of the S->R and R->R links, respectively, and N0 is the variance of the AWGN; the residual interference is independent of the AWGN.
The confusion is here: in some papers I have read that the average SINR can be calculated as in the first attached expression,
but in my understanding, the average SINR should be as in the second attached expression;
the results will be different. I am a little confused about that.
How can we make the mutation part of an algorithm adaptive? Is there any code from which I can understand the difference?
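One textbook illustration of adaptive mutation is the (1+1) evolution strategy with the 1/5 success rule: the mutation step size is increased when mutations succeed often and decreased otherwise, instead of staying fixed as in a plain mutation operator. A minimal sketch on the sphere function (the parameters below are conventional choices, not tuned):

```python
import random

def one_plus_one_es(f, dim=5, steps=600, sigma=1.0, seed=1):
    """(1+1) evolution strategy with the classic 1/5 success rule:
    the mutation step size sigma adapts to the observed success rate."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = f(x)
    successes = 0
    for t in range(1, steps + 1):
        child = [v + rng.gauss(0, sigma) for v in x]
        fc = f(child)
        if fc < fx:                     # keep the child only if it improves
            x, fx = child, fc
            successes += 1
        if t % 20 == 0:                 # adapt sigma every 20 mutations
            sigma *= 1.5 if successes / 20 > 0.2 else 0.6
            successes = 0
    return fx

sphere = lambda v: sum(c * c for c in v)
print(one_plus_one_es(sphere))  # far below the initial objective value
```

Running the same loop with a fixed sigma makes the difference visible: the adaptive version keeps making progress once a fixed step size would have stalled.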
I am looking for a method to compare the dataset represented in blue with the one in red, and I need to extract one single value from this comparison. The idea is to compare datasets generated with different combinations of parameters to an "optimal" dataset and get a "score" for each one, so I can see which combination of parameters is closest to the optimal model. I came up with a few options, like the Fréchet distance, the Hausdorff distance and the MSE, but I don't know which one would work best for me.
Does anybody have a suggestion, or another method that could work?
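For what it's worth, both the Hausdorff distance and the MSE are short to implement, so you can simply try them on your data: Hausdorff ignores the pairing between points, while MSE assumes the two curves are sampled at the same positions. A stdlib sketch on made-up points:

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two 2-D point sets."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

def mse(a, b):
    """Pointwise mean squared error; requires the two curves to be
    sampled at the same positions (matched by index here)."""
    return sum(math.dist(x, y) ** 2 for x, y in zip(a, b)) / len(a)

optimal   = [(0, 0.0), (1, 1.0), (2, 4.0)]   # hypothetical reference curve
candidate = [(0, 0.1), (1, 0.9), (2, 4.2)]   # one parameter combination
print(hausdorff(optimal, candidate), mse(optimal, candidate))
```

Whichever score you pick, the ranking of parameter combinations is what matters, so it is worth checking that the candidate scores actually order the datasets the way your eye does on a few examples.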
I'm developing a recommender system for movies using a matrix containing data on films found in DBpedia. However, my matrix contains a lot of missing information (e.g. some films do not contain the year attribute, others the writer, etc.). So my question is whether it is possible to use an imputation algorithm, taking into account the features of DBpedia (which is mainly built in a collaborative scenario and therefore prone to errors) and the fact that my information is mainly textual (only the year attribute is numeric). My attributes are Name, Country, Directors, Producers, Starring, Writers, Year, and Subject. The missing data is denoted with a question mark (?). I've also attached an example of my matrix with only 100 films (the full matrix contains around 101,000 films).
At present, software tools are used for placement & routing in chip design. Is any new algorithm development still going on for routing?
I need to invert a symmetric banded Hessian matrix H, which has 7 diagonals.
H^(-1) can be considered banded, and thus I do not need to compute the complete inverse matrix, but rather an approximation of it. (It could be assumed to have 11 or 13 diagonals, for example.)
I am looking for a method which does not rely on parallelization.
Is there any possibility to build such an algorithm in R, in linear time?
Thank you for your help.
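One hedged remark: if H^(-1) is only needed to apply to vectors (or to a few unit vectors, to obtain selected columns or bands of the inverse), a banded factor-and-solve does exactly that in O(n) per right-hand side without ever forming the inverse. The idea ports directly to R's band solvers; sketched here with SciPy:

```python
import numpy as np
from scipy.linalg import solve_banded

n, k = 8, 3                       # 7 diagonals -> half-bandwidth k = 3
# Small SPD 7-diagonal test matrix H (diagonally dominant by construction):
H = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - k), min(n, i + k + 1)):
        H[i, j] = 8.0 if i == j else 1.0 / (1 + abs(i - j))

# Pack H into LAPACK banded storage: ab[k + i - j, j] = H[i, j]
ab = np.zeros((2 * k + 1, n))
for i in range(n):
    for j in range(max(0, i - k), min(n, i + k + 1)):
        ab[k + i - j, j] = H[i, j]

def apply_Hinv(b):
    """Apply H^{-1} to a right-hand side in O(n), no explicit inverse."""
    return solve_banded((k, k), ab, b)

# If entries of H^{-1} are needed, solve for unit vectors column by column:
col3 = apply_Hinv(np.eye(n)[:, 3])             # 4th column of H^{-1}
print(np.allclose(H @ col3, np.eye(n)[:, 3]))  # True
```

Solving for only the unit vectors near the diagonal, and keeping only the central 11 or 13 diagonals of the result, gives the banded approximation you describe.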
I am now working on an RLS algorithm which is used to estimate the eigenfrequency of a vibration. Most of the time, if the noise level is low, the filter works well. But if there are impulsive disturbances at the filter input, they cause a large estimation offset, and it takes a long time for the filter to recover.
Does someone know which methods can help to suppress such erroneous disturbances? In the attached picture there is a sample.
Does anyone have an implementation of one of the four algorithms Pascal, Close, MaxMiner or Apriori?
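In case a compact reference implementation is useful, here is a hedged, textbook-style Apriori sketch for frequent itemsets (support counting and candidate join only, no rule generation; fine for small datasets, not optimized):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent itemsets (as frozensets) with absolute support
    >= min_support, via the classic levelwise Apriori scheme."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    def support(c):
        return sum(1 for t in transactions if c <= t)
    frequent = {}
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        counted = {c: support(c) for c in level}
        survivors = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(survivors)
        # Candidate generation: join surviving k-itemsets into (k+1)-itemsets
        keys = list(survivors)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == k + 1})
        k += 1
    return frequent

tx = [{"bread", "milk"}, {"bread", "butter"},
      {"bread", "milk", "butter"}, {"milk"}]
for itemset, s in sorted(apriori(tx, 2).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```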
I have an image transmitted to the input (pin type: cVideoPin) of my ADTF plugin. I have also created a buffer that can hold the input image. I need to make the image in this buffer available to OpenCV computations, preferably through a Mat container.
I need suggestions on how to make the image in the ADTF buffer compatible with an OpenCV Mat.
Thanks in advance.
Does anyone know how to modify the order within a sample so as to reach given Kendall's tau value(s)? Let me clarify what I am looking for. Consider that we have N realizations of K random variables. Each realization of the group of K variables is independent from the other ones. Inside a realization, the K variables might be independent or not; we do not really care, and they can even follow different distributions. The question is the following: from any sample (size N x K) that we call M, can I exchange the places of M[i_1,1] with M[i_2,1], M[i_3,2] with M[i_4,2], and so on, possibly coming back to the first column in an iterative algorithm, to finally obtain as a result a new (rearranged) sample M' where tau[1,2] = first wanted value, tau[1,3] = second wanted value, and so on? In other words, can I rearrange the data to reach a desired tau matrix? If yes, is there an algorithm you would suggest?
I used the Cholesky method to do the same thing to reach a desired Pearson correlation matrix, but I have to admit I'm facing a wall on this issue right now.
Any help is welcome!
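A simple constructive option, sketched below for one pair of columns: propose random swaps within one column and keep a swap only if it moves Kendall's tau closer to the target. Since single transpositions change tau in small steps, this hill climb can home in on a target value; for a full K x K target matrix you would cycle over the columns in the same way, as you describe (though the targets then constrain each other, so exact joint feasibility is not guaranteed):

```python
import random

def kendall_tau(x, y):
    """O(n^2) Kendall tau-a of two equal-length sequences."""
    n, net = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            net += (p > 0) - (p < 0)
    return 2 * net / (n * (n - 1))

def rearrange_to_tau(x, y, target, iters=3000, seed=0):
    """Propose random swaps inside y; keep a swap only if it moves
    tau(x, y) closer to target. The error never increases."""
    rng = random.Random(seed)
    y = list(y)
    err = abs(kendall_tau(x, y) - target)
    n = len(y)
    for _ in range(iters):
        i, j = rng.randrange(n), rng.randrange(n)
        y[i], y[j] = y[j], y[i]
        e = abs(kendall_tau(x, y) - target)
        if e <= err:
            err = e
        else:
            y[i], y[j] = y[j], y[i]    # undo the swap
    return y, err

x = list(range(30))
y = list(range(30))
random.Random(42).shuffle(y)
y2, err = rearrange_to_tau(x, y, target=0.6)
print(round(kendall_tau(x, y2), 3))
```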
In order to test any event detection algorithm, it is common practice to compute a confusion matrix, in order to get performance parameters (e.g. sensitivity and specificity).
It is well known that the confusion matrix must be built by (identifying and) counting the numbers of true/false positive/negative detections, according to a pre-annotated dataset (a gold standard).
Now, here is my question: how can I identify a true/false positive?
Let me show you a little example.
My algorithm detects the QRS complex's peak, in order to get the R-R intervals of an ECG recording. My pre-annotated dataset identifies the QRS complex not at the QRS peak, but some samples before it.
It is obvious that, when testing my algorithm, I cannot define a true positive as "my algorithm identifies the same point as the gold standard", since there is always a time shift between them. On the other hand, because of the time shift, all of my detections would be counted as false positives (no time coincidence with the gold standard).
An easy solution would be to consider a "coincidence window" around each point in the gold standard but, even if that fixes the problem with true/false positives, the problem with true/false negatives remains unsolved.
Does anybody have an idea? (Any reference would be great.)
Thanks a lot.
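The usual practice in beat-detection evaluation is exactly the coincidence window you mention: match each annotation to at most one detection within a fixed tolerance, count matched pairs as TPs, unmatched detections as FPs, and unmatched annotations as FNs. True negatives are generally considered undefined for event detection (there is no countable set of "non-events"), which is why positive predictive value is commonly reported instead of specificity. A sketch (the sample indices and tolerance below are made up):

```python
def match_events(annotations, detections, tol):
    """Match detections to gold-standard annotations within +/- tol samples.
    Each annotation and each detection is used at most once.
    Returns (TP, FP, FN); TN is undefined for event detection."""
    detections = sorted(detections)
    used = [False] * len(detections)
    tp = 0
    for a in sorted(annotations):
        best, best_d = None, tol + 1
        for i, d in enumerate(detections):
            if not used[i] and abs(d - a) <= tol and abs(d - a) < best_d:
                best, best_d = i, abs(d - a)
        if best is not None:
            used[best] = True
            tp += 1
    fp = used.count(False)
    fn = len(annotations) - tp
    return tp, fp, fn

ann = [100, 400, 700, 1000]        # gold-standard QRS sample indices
det = [112, 408, 1011, 1500]       # detector output, shifted by ~10 samples
print(match_events(ann, det, tol=30))  # (3, 1, 1)
```

With the window wide enough to absorb the systematic annotation offset, the missed annotation at sample 700 correctly becomes an FN rather than polluting the FP count.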
I am working with 3 others to solve a real-life problem for a remote Australian company. The problem: electricians or mechanical fitters travel daily to repair or maintain assets. However, they are not aware that there could be another nearby task they could also do that meets their skill set. So we developed an app for this. Now I am working on developing a formula for optimizing the daily trip.
I thought a Hamiltonian circuit would be best, because they have to return to base at the end of their shift. Another constraint is that they work 9 hours, including travel time to and from base.
Also, some of the tasks can take 1, 2, 3, 6 or more hours to complete, and the travel time can be 2 hours or more.
I thought they could go out and do the jobs that take the largest number of hours first and, on their way back, pick up shorter ones.
Could anyone suggest anything I could do better or use?
I have the latitudes and longitudes; could anyone suggest how I could simulate them, especially how I could represent 6 hours on a node?
The formula is attached as a picture; feel free to suggest anything.
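As a starting point before reaching for a full routing solver (OR-Tools and similar handle this class of problem well), here is a hedged greedy sketch: haversine travel times from latitude/longitude, a per-node service time (your "6 hours on a node" becomes a node weight), and a 9-hour shift check that always keeps the return leg to base feasible. The travel speed and the coordinates below are made-up assumptions:

```python
import math

SPEED_KMH = 80.0
SHIFT_H = 9.0

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def travel_h(p, q):
    return haversine_km(p, q) / SPEED_KMH

def greedy_route(base, jobs):
    """jobs: {name: ((lat, lon), service_hours)}. Repeatedly visit the
    nearest job that still allows returning to base within the shift."""
    route, pos, used = ["base"], base, 0.0
    remaining = dict(jobs)
    while True:
        feasible = [
            (travel_h(pos, loc), name)
            for name, (loc, svc) in remaining.items()
            if used + travel_h(pos, loc) + svc + travel_h(loc, base) <= SHIFT_H
        ]
        if not feasible:
            break
        t, name = min(feasible)
        loc, svc = remaining.pop(name)
        used += t + svc
        route.append(name)
        pos = loc
    used += travel_h(pos, base)
    route.append("base")
    return route, used

base = (-23.70, 133.88)                       # hypothetical base location
jobs = {"pump": ((-23.80, 133.90), 2.0),      # 2 h of work at the pump site
        "genset": ((-24.10, 134.20), 6.0),    # a 6-hour job = weight on the node
        "tower": ((-23.75, 133.70), 1.0)}
route, hours = greedy_route(base, jobs)
print(route, round(hours, 2))
```

Greedy nearest-neighbour is only a heuristic; swapping in your longest-job-first idea is a one-line change to the `min(feasible)` selection, so both policies can be simulated and compared on the same job sets.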
I have results for the approximation ratios returned by different algorithms after running them on benchmarks. Now I want to visualize these results as well as I can, so that outliers and other statistically important information are visible. In addition, the mechanism must be something I can include in my article later on, if I want to publish.
I need code in any language (preferably MATLAB, Java, C++ or C#) for rule-based classification using a metaheuristic algorithm. Any help will be highly appreciated.
I am working on searchable encryption, and I want to develop an algorithm for it that combines symmetric encryption with attribute-based encryption (which uses public-key encryption). How can I do that?