Analysis of Tata-1mg data for Covid-19 2nd wave prediction in India
Rajat Jain 1, Utkarsh Gupta 2, Sethuraman TV ∗3, Rohan Sukumaran ∗4, Christin Glorioso MD 5
1Data Scientist, Tata-1mg 2Head of Data Science and AI, Tata-1mg 3Data Science Researcher,
Pathcheck Foundation 4Research Manager, Pathcheck Foundation 5Head of Research, Data
Informatics Center for Epidemiology, Pathcheck Foundation
Objective: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing Covid-19 pandemic,
which is having devastating effects around the globe. Identifying early indicators of case surges is pivotal for effective
pandemic preparedness and response. This paper looks at Tata-1mg users’ medicine and symptom search data in India and
studies its potential to provide early warning of upcoming waves in the current pandemic.
Methods and Materials: Tata-1mg is an online healthcare brand in India with 50 million monthly active users that allows
users to search for and order medicines. With the help of clinical practitioners, we segment the search terms used by
customers based on their association with illness and severity, and then assess their correlation with reported Covid-19
case numbers for Indian cities.
Results: We found that the search terms relating to flu/antiviral medication had the highest leading correlation among all
seven search tags tested. We also performed a granular, city-level cross-correlation analysis for 75 Indian cities. We show that
the search terms had a lead of up to 19 days on average (ranging from 0 to 40 days) with significant Pearson correlations
(r = 0.7, p < 0.01) with reported Covid-19 case numbers for most cities.
Conclusion: We can use information from search data to formulate better healthcare policies to control the coronavirus
pandemic outbreak in the future as well as stock adequate resources. We highlight the ability of search trend data of online
pharmaceutical e-commerce platforms to serve as an early warning indicator for future waves.
Keywords: Infodemiology, Covid-19, Correlation Analysis, Epidemiology
The novel SARS-CoV-2 Coronavirus was first identified in Wuhan City of China on December 31st, 2019. Since its first
occurrence, the virus has spread throughout the world, impacting a majority of the population. Even after a year, the pandemic
is still raging. After an apparent drop in many countries, more infections and deaths are being reported due to the impact of
subsequent waves. India is one of many countries that have been impacted by the pandemic, particularly during the second
wave. As of July 1st 2021, the total number of cases was more than 30 million, while the number of deaths was over 400,000.
With delayed (or partially ineffective) quarantining and the emergence of newer, more transmissible variants, the 2nd wave had a
devastating impact on the health infrastructure as well as the livelihood of people in India (and globally). This resulted in lives
lost, economic instability, and a general sense of disharmony. To track the spread of Covid-19 cases, India relies on the
Integrated Disease Surveillance Programme (IDSP), a government programme whose aim is to strengthen and maintain a
decentralized, IT-enabled, laboratory-based disease surveillance system for epidemic-prone diseases. It monitors disease trends
and detects and responds to outbreaks in the early rising phase through trained Rapid Response Teams. However, the system
captures data only when people access healthcare services. Because such direct data sources may be delayed or may under-report
the true number of cases, it is difficult to estimate the actual spatial and temporal evolution of Covid-19 cases from them.
Over the past several years, numerous scientiﬁc studies have shown that user interactions with web applications generate
latent health-related signals, reflective of individual as well as community-level trends. Infodemiology (information
epidemiology), introduced by Gunther Eysenbach, proposes using online data sources to inform public health and policy;
such approaches have been suggested to support monitoring and forecasting during past outbreaks and epidemics,
such as Ebola, Zika, influenza, and measles, as well as for mental health. During the Covid-19 pandemic, there
has been an abundance of online activity on various platforms. Popular infodemiology tools like Google Trends, Twitter, and
Facebook can provide the information necessary to analyze the pandemic in different parts of the world. The major beneﬁt
of search/browsing (trends) data is that it has the potential to offer community level insights that current monitoring systems
are unable to obtain due to limited testing capacity and confinement measures that limit people from interacting with
healthcare services. Hence, search trend data can act as a supplemental surveillance tool for national and city level monitoring
of the pandemic. More importantly, unlike invasive testing or surveys that require people to ﬁll in information explicitly, these
search trends data are passively generated by people. Apart from proprietary company-speciﬁc insights, these signals provide
a more important secondary function, the ability to predict the trajectory of the pandemic. There has been a myriad of research
around this topic: Amaryllis Mavragani et al. compared Google Trends data with case and death data for US states, and
Samira Yousefinaghani et al. used Google Trends along with Twitter data to establish correlations with case data for the US.
Google searches can be used for exploration, knowledge gathering, and transaction-related purposes. In contrast to search
trends collected by general-purpose search engines, such as Google Trends, where users can casually search for information
on a topic, searches on a medicine-related search engine are closely linked with the intent of purchasing medicines. This
is particularly important for the data to serve as a strong signal for early pandemic prediction. Furthermore, the data retrieved
from Google Trends is normalized over the selected period; thus, the exact query volumes are unknown to third parties, which
limits data processing and analytical capabilities. Also, the actual algorithm by which Google Trends detects query data is
undisclosed, making it difficult, if not impossible, to identify the causes of any phenomena. Further, the mainstream media
heavily influences Google search trends, while a good indicator of an epidemic spike has to be driven by personal need.
Hence, this paper explores alternative signals, namely search data from Tata-1mg (an online healthcare platform), for
predicting future case outbreaks.
Tata-1mg is one of India's largest online healthcare platforms, where users can search for medicines and place orders for
prescription and non-prescription medication. During the second wave, India went through a supply crunch for even
common drugs, such as paracetamol, in local pharmacies. This led online healthcare brands like Tata-1mg to experience
unprecedented growth of over 80% in their consumer base from January 2021 to June 2021. Consumers have increasingly
turned to online healthcare firms to check availability and order medicines, even critical ones, unavailable in their vicinity. The
company has eight major warehouses located across India, which serve critical as well as general direct-to-home medicine
delivery across the country. Appropriately stocking warehouses is a core initiative at Tata-1mg. It allows the company to avoid
going out of stock and losing customer demand while keeping inventory costs, such as holding fees, transportation
costs, and storage costs, reasonable. Building better prediction techniques and detecting early warning signs through changes
in user search patterns are helpful strategies for stocking an appropriate amount of life-saving medication. Users can also book
diagnostic lab tests and e-consults with physicians, which give information on the Covid-19 positivity rate in a region and on
disease dynamics based on the history of patient illness. The healthcare platform is still a growing venture, with almost 50
million unique active users per month across India, giving a significant user base from which to draw conclusions. Further,
Tata-1mg serves orders in more than 1,000 cities in India, including Tier 1, 2, and 3 cities. This paper uses this information
to analyze the lead/lag relationship between multiple search term categories (based on different stages of the disease) and
the ofﬁcial Covid-19 cases. We show that the search trend data can be used as an early indicator for the pandemic as well
as for alerting the administration about the rising demand for medicines. This would act as a supplementary data source to
appropriately stock up warehouses across the country to aid a proper supply of drugs in areas with increasing caseloads and
meet the supply-demand gap.
Users can search for drugs, such as paracetamol for fever, using the term "paracetamol" and will get all available medicine
suggestions matching the term searched. Each search term is stored and anonymized for aggregated analysis across the
Android, iOS, and desktop applications, along with the city from which the search originated. The aggregated count of each
search term can be used as a signal similar to Google Trends and enables in-depth analysis. We only use and present
aggregated results on search term count and location to maintain user anonymity, and no user-level analysis is performed in
this paper. For this paper, we use data from 1st January 2021 to the end of May 2021. To further comply with data privacy
rules, we rescale the data to lie between 0 and 1.
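The rescaling step can be illustrated with a minimal sketch, assuming plain min-max scaling of each aggregated daily count series (the exact formula is not stated here, so this choice is an assumption). The function name `min_max_scale` and the sample counts are hypothetical.

```python
import numpy as np

def min_max_scale(counts):
    """Rescale a 1-D series of aggregated search counts to the range [0, 1].

    Assumption: plain min-max scaling; the text only says that the data is
    scaled to lie between 0 and 1.
    """
    counts = np.asarray(counts, dtype=float)
    lo, hi = counts.min(), counts.max()
    if hi == lo:
        # A constant series carries no variation; map it to zeros.
        return np.zeros_like(counts)
    return (counts - lo) / (hi - lo)

# Hypothetical daily search counts for one search category in one city.
daily_counts = [120, 150, 300, 450, 390]
scaled = min_max_scale(daily_counts)
```

Because the Pearson correlation is unaffected by this kind of linear rescaling, the scaling hides absolute volumes without changing the correlation results.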
Further, the search terms are divided into seven categories based on the symptom/medication stages and severity cues
from medication, with the help of a panel of four clinical experts at Tata-1mg. The panel consisted of three general practitioners
and one internal medicine specialist who have treated Covid-19 patients on a daily basis since the start of the pandemic. The
search terms were selected after unanimous consensus of the panel and in alignment with the guidelines provided by
the Covid-19 National Task Force / Joint Monitoring Group spearheaded by apex medical bodies in the country, including the
All India Institute of Medical Sciences (AIIMS) and the Indian Council of Medical Research (ICMR).
During the initial stages of the second wave, the treatment protocol for managing Covid-19 patients was still evolving,
and continuous changes based on advancing evidence were being made in the treatment strategy. Here we deﬁne severity
based on the disease progression and medication used for the treatment at various stages. A guideline from the apex medical
institutes, AIIMS and ICMR, was ﬁrst published on 22nd April 2021 and revised on 19th May 2021, where the medications
and other supportive treatment guidelines were laid out. As mentioned above, a certified pool of medical practitioners
was consulted to classify the search terms based on practical clinical expertise and the trend of the second wave in India.
In the mild form of Covid-19, physicians generally advised fever/cough medications and immunity boosters
such as zinc and other multivitamins. These drugs, used for symptomatic relief or to boost the patient's immunity, are
primarily over-the-counter drugs and do not require a prescription. To treat moderate cases, Fabiflu and other antiviral agents
and steroids are used, and for cases with symptoms of severe respiratory distress and/or damaged lungs, targeted
respiratory medications, steroids like Methylprednisolone, and blood thinners are recommended. The medicines used to
manage moderate and severe cases of Covid-19 require a valid prescription from a registered medical practitioner. For each
search term, we derive all possible medicine types based on the primary use case linked to the search term and
aggregate the searches for each search type at the district as well as the national level. We use the knowledge provided by clinical
practitioners during the second Covid-19 wave to build a list of search terms that we compare against the confirmed case data.
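The aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(date, district, search_term, count)`, the function name `aggregate_searches`, and the sample records are hypothetical, and the mapping covers only a few of the terms listed in Table 2.

```python
from collections import defaultdict

# Small subset of the term-to-category mapping (lower-cased for matching).
TERM_TO_CATEGORY = {
    "paracetamol": "Fever Related",
    "dolo": "Fever Related",
    "fabiflu": "Flu/Antiviral",
    "remdesivir": "Flu/Antiviral",
    "medrol": "Steroid",
}

def aggregate_searches(records):
    """Sum search counts per (date, district, category) and per (date, category).

    `records` is an iterable of (date, district, search_term, count) tuples;
    this layout is a hypothetical stand-in for the platform's search logs.
    """
    district_level = defaultdict(int)
    national_level = defaultdict(int)
    for date, district, term, count in records:
        category = TERM_TO_CATEGORY.get(term.strip().lower())
        if category is None:
            continue  # term is not part of any tracked category
        district_level[(date, district, category)] += count
        national_level[(date, category)] += count
    return dict(district_level), dict(national_level)

# Hypothetical pre-aggregated, anonymized records.
records = [
    ("2021-04-01", "Pune", "Paracetamol", 40),
    ("2021-04-01", "Pune", "Dolo", 25),
    ("2021-04-01", "Mumbai", "Fabiflu", 10),
    ("2021-04-01", "Mumbai", "Shampoo", 99),
]
district_totals, national_totals = aggregate_searches(records)
```

Only counts per category and location leave this step, which is in keeping with the aggregated, non-user-level analysis described earlier.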
The case data is obtained from the covid19india website through their API. The API offers data at granular geographic
units, which helps localize the search patterns at a district level. Further, the API provides daily cases, 7-day average
cases, deaths, vaccination data, and recovery statistics.
We fetch the search volumes, including the minimum and maximum search levels, for the seven search tags. We compute the
Pearson correlation coefficient (r) and the p-value, and plot cross-correlations at multiple time lags to assess the power
of the search tags to act as early indicators of the pandemic waves, at a national as well as city level. We also compute
the early indication factor in terms of medication severity. The analysis was done in Python 3.8, and NumPy was
used to calculate the correlations and significance. A correlation of r > 0.7 is considered significant, and a p-value < 0.05
is considered statistically significant. The p-value for the Pearson correlation coefficient was below 0.01 in
all mentioned results.
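A lagged cross-correlation of this kind can be sketched with NumPy as below. This is a minimal sketch rather than the exact analysis code: the function names `lagged_pearson` and `max_significant_lead` are hypothetical, the two series are synthetic, and the p-value computation is omitted.

```python
import numpy as np

def lagged_pearson(search, cases, max_lead=40):
    """Pearson r between search[t] and cases[t + lead] for lead = 0..max_lead."""
    search = np.asarray(search, dtype=float)
    cases = np.asarray(cases, dtype=float)
    n = len(search)
    correlations = []
    for lead in range(max_lead + 1):
        x = search[: n - lead]  # searches on day t
        y = cases[lead:]        # cases on day t + lead
        correlations.append(np.corrcoef(x, y)[0, 1])
    return np.array(correlations)

def max_significant_lead(correlations, threshold=0.7):
    """Largest lead (in days) whose correlation exceeds the threshold, or None."""
    leads = np.flatnonzero(correlations > threshold)
    return int(leads.max()) if leads.size else None

# Synthetic example: cases follow the same curve as searches, 10 days later.
days = np.arange(150)
search = np.exp(-((days - 60) / 15.0) ** 2)
cases = np.exp(-((days - 70) / 15.0) ** 2)
r = lagged_pearson(search, cases, max_lead=40)
best_lead = int(np.argmax(r))            # lead with the highest correlation
longest_lead = max_significant_lead(r)   # furthest lead with r > 0.7
```

The two summaries answer different questions: `best_lead` is where the correlation peaks, while `longest_lead` is the furthest lead at which the correlation still clears the r > 0.7 threshold used in this paper.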
Search terms are classified in Table 1 in terms of their severity. We have four mild, one moderate, and two severe search
terms, based on the strength of the medicines and the stage of Covid-19 disease progression at which they are required.
According to governmental guidelines, steroids and nasal/respiratory medications can be classified as moderate as well as severe
Term                  Severity         Prescription required  min_searches  max_searches  avg_searches  std
Early Symptoms        Mild             No                     8823          58469         19296         11725
Vitamin and Minerals  Mild             No                     10091         109056        30604         25375
Antibiotic            Mild             No                     1912          9296          3784          1769
Fever Related         Mild             No                     7069          52948         16697         10884
Flu/Antiviral         Moderate         Yes                    1066          9727          3732          1602
Nasal/Respiratory     Moderate/Severe  Yes                    1923          14571         4633          3124
Steroid               Moderate/Severe  Yes                    2979          62639         9516          10008
Table 1. Exploratory data analysis on segregated search types based on physicians' recommendations for pan-India (1st Jan -
28th May '21)
Term                  Search Terms
Early Symptoms        Fever, Cough, Body Ache, Body Pain, Sore throat, taste, smell
Vitamin and Minerals  A to Z, Multivitamins, Vitamins, Calcirol, Uprise-D3, Limcee, Celin, Zincovit, Zinconia, Zinc, Vitamin c
Antibiotic            Azithral, Antibiotic, Augmentin, Ceftum, Claribid, Zadocef
Fever Related         Paracetamol, Dolo, Crocin, Calpol, Saridon, Lanol
Flu/Antiviral         Fabiflu, Fluguard, Antiflu, Fluvir, Remdesivir, Antiviral
Nasal/Respiratory     Levolin, Respiratory distress, Budecort, Duolin, Karvol Plus, Seroflo, Bude-
Steroid               MethylPrednisolone, Prednisolone, Blood Thinners, Medrol, Dexamethasone, Omnacort, Wysolone, Ecosprin, Clexaane, Decmax
Table 2. Search terms used for each segregated category related to the Covid-19 epidemic
depending on the case. However, here we have classified them as severe, based on the maximum severity for
which the drug can be used. Table 2 lists the search queries used for each classification category in detail.
Results were observed on the normalized 7-day average search data across India for the seven search tags listed in Table 1. We
further group the search tags into two categories: mild/moderate, which includes the medication search categories "Early
Symptoms", "Fever Related", "Vitamin and Minerals", "Antibiotic", and "Flu/Antiviral"; and severe, which includes "Nasal/
Respiratory" and "Steroid" medication.
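The smoothing and grouping step can be sketched as follows. This is an illustration only: the helper names `seven_day_average` and `smooth_group` and the sample counts are hypothetical, and the window alignment (each output value averages seven consecutive days) is an assumption, since the text does not say whether the average is trailing or centered.

```python
import numpy as np

# Grouping of the seven search tags into the two severity groups.
SEVERITY_GROUPS = {
    "mild_moderate": ["Early Symptoms", "Fever Related", "Vitamin and Minerals",
                      "Antibiotic", "Flu/Antiviral"],
    "severe": ["Nasal/Respiratory", "Steroid"],
}

def seven_day_average(daily_counts, window=7):
    """Moving average over `window` consecutive days.

    Element i of the result is the mean of days i .. i + window - 1, so the
    output is window - 1 elements shorter than the input.
    """
    daily_counts = np.asarray(daily_counts, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(daily_counts, kernel, mode="valid")

def smooth_group(series_by_tag, group):
    """Return the 7-day averaged series for every available tag in a group."""
    return {tag: seven_day_average(series_by_tag[tag])
            for tag in SEVERITY_GROUPS[group] if tag in series_by_tag}

# Hypothetical daily counts for two tags over eight days.
series_by_tag = {
    "Steroid": [7, 14, 21, 28, 35, 42, 49, 56],
    "Fever Related": [10, 10, 10, 10, 10, 10, 10, 10],
}
severe = smooth_group(series_by_tag, "severe")
```

Averaging over a full week removes day-of-week effects in both searches and reported cases before the two are compared.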
Figures 1a and 1b show the normalized pan-India search term variations alongside the confirmed Covid-19 case data. Mild
and moderate searches (Figure 1a) are leading signals compared to the case data, with flu/antiviral search terms showing the
maximum lead. Most medicines in this category do not require a prescription for purchase, and they are mainly used for
treating symptoms of Covid-19. The peak of severe search terms (Figure 1b) coincides with reported cases, showing no lead.
Figure 2 shows the cross-correlations for each search term with the reported Covid-19 case data. The horizontal axis
shows increasing lead times, from left to right, and we mark a horizontal line at the minimum correlation threshold of r = 0.7.
We find the highest lead, 20 days, in Fabiflu/antiviral searches, still with a significant correlation (r = 0.7). Antibiotic, early
symptom, and fever-related search terms display similar leads of 19, 18, and 18 days, respectively.
As observed earlier, severe search terms, including respiratory and nasal distress medication along with steroids, are unable
to show a significant lead due to the late onset of severe symptoms. Also, all the displayed Pearson correlations have a p-value
below 0.01.
Figure 1. (a) Mild and Moderate search terms against the conﬁrmed daily Covid-19 cases for PAN-India (b) Severe search
terms against the conﬁrmed daily Covid-19 cases for PAN-India
City Level Analysis
We select 75 cities based on search volume at Tata-1mg from January 1st to May 28th and replicate the above analysis to identify
the best individual search terms and leads for these cities. The 75 cities have been filtered based on a significant volume
of Covid-19 related searches: all search terms mentioned in Table 1 are considered, and
the cities with a combined average search volume greater than 500 for the period of January to May 2021 are selected for
this analysis. We plot figures similar to Figure 1 for the top four Indian cities by search volume, namely New Delhi,
Mumbai, Bangalore, and Kolkata. We plot the pattern for the mild/moderate and severe searches in Figure 3 and Figure 4, respectively.
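The city filter can be sketched as a simple threshold on the combined average search volume. The function name `select_cities` and the volumes in the example are hypothetical; only the threshold of 500 comes from the text.

```python
def select_cities(avg_volume_by_city, min_avg_volume=500):
    """Keep cities whose combined average search volume exceeds the threshold.

    `avg_volume_by_city` maps a city name to its combined average search
    volume over all Table 1 search tags. The result is ordered from the
    highest to the lowest volume.
    """
    eligible = {city: volume for city, volume in avg_volume_by_city.items()
                if volume > min_avg_volume}
    return sorted(eligible, key=eligible.get, reverse=True)

# Hypothetical combined average volumes for four cities.
volumes = {"City A": 1200, "City B": 480, "City C": 650, "City D": 500}
selected = select_cities(volumes)
```

The comparison is strict, so a city sitting exactly at the threshold (City D here) is excluded, matching the "greater than 500" wording.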
Table 3 (in appendix) represents the maximum lead at which a signiﬁcant correlation was observed for each city. Further,
Figure 2. Cross Correlation plots for selected search terms for PAN-India analysis with conﬁrmed Covid-19 cases. Search
terms described in the ﬁgure are as follows: (a) Antibiotic (b) Flu/Antiviral (c) Fever Related (d) Early symptoms (e) Nasal/
Respiratory distress (f) Steroidal
the table also lists the specific search term that can possibly serve as an early indicator of the next peak in each city.
Cross-correlation heat maps against confirmed cases are also plotted (in appendix) for each of the 75 cities and every
search term. Out of 75 cities, almost 50 show a significant lead of more than 2 weeks for one of the five mild/moderate search
terms. As observed at the pan-India level, the selected cities, which capture different zonal information along
with variability in searches, also show a lead in mild/moderate search terms against the observed cases. Consistent with the
Figure 3. Mild/ Moderate search terms against the conﬁrmed daily Covid-19 case data for the following major cities in
India: (a) New Delhi (b) Mumbai (c) Kolkata (d) Bangalore
national level variation, severe searches present little to no lead in terms of early predictability.
Figure 4. Severe search terms against the conﬁrmed daily Covid-19 case data for the following major cities in India: (a) New
Delhi (b) Mumbai (c) Kolkata (d) Bangalore
As observed in Figures 5a and 5b, cities in the west and south display average correlations of r >
0.7 as well as leads of more than 20 days. The north and the east coast, compared to the south and west, show a lower lead and
Figure 5. (a) Average cross-correlation across multiple cities in India up to 40 days (b) Maximum lead in days up to which a
significant (r > 0.7) correlation is observed across multiple cities in India
We observe a significant difference in the behavior of mild/moderate medication and severe medication. Medicines
providing symptomatic relief, mainly mild, non-prescription medicines, along with the antiviral category, including
medicines such as Remdesivir and Fabiflu, preceded the peak by 15-20 days. Prescription medicines or more targeted
medications that require more technical expertise from a physician coincided with the peak. Antivirals, while requiring a
prescription, are in the mild/moderate category in terms of treatment and might have been prescribed in abundance to patients
all across India. Since a final objective of this study can be the appropriate stocking of warehouses, the supply chain
perspective matters: it takes almost 5-7 days for medicine to be received at the warehouse from the supplier (known as "lead
time" in the supply chain field) for almost all the medicine types used in the search term analysis, across all warehouses.
Thus a lead in search data greater than about twice the supply chain lead time gives the ability to adequately stock
warehouses with critical life-saving medication beforehand.
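The comparison between the search lead and the supplier lead time can be made concrete with a small worked example. The helper `days_of_slack` is hypothetical; the city leads are taken from Table 3, and the default of 7 days is the upper end of the 5-7 day supplier lead time quoted above.

```python
def days_of_slack(search_lead_days, supply_lead_days=7):
    """Days left to place an order after the search signal appears.

    A positive value means stock ordered when the signal appears can reach
    the warehouse before the corresponding rise in cases.
    """
    return search_lead_days - supply_lead_days

# Maximum significant leads (in days) for three cities, from Table 3.
city_leads = {"Nashik": 40, "Mumbai": 19, "Kollam": 5}
slack = {city: days_of_slack(lead) for city, lead in city_leads.items()}
actionable = [city for city, s in slack.items() if s > 0]
```

Under these assumptions the signal for Kollam arrives too late to restock ahead of the rise in cases, while Nashik and Mumbai leave more than a week of margin.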
Mild and moderate searches seem to lead the severe search terms by an average of almost 12-18 days. This interval
approximately matches the combined duration of symptom onset, which is typically within four or five days after exposure, and
the time from the development of initial symptoms to hospitalization, whose median ranges between 3
and 10.4 days (with the longest delay in the age group 20–60 years). The search pattern hence gives an approximate indication of
the dynamics of disease progression. An immediate spike in the mild/moderate search terms can also signal the need
to prepare in advance for severe-stage supplies like steroids, blood thinners, and medical oxygen, and to notify authorities of a
possible spike in the requirement for hospital beds and medical staff. We can also see a similar trend for major Indian
cities, as shown in Figure 3. This gives us insight into the nature of the Covid-19 disease and acts as a confirmatory data point
for understanding the disease dynamics in multiple regions.
While analyzing the 75 Indian cities, a longer lead is observed in India's western and southern regions compared to its
northern and eastern parts. This can be attributed to the fact that the wave first hit parts of Maharashtra and Kerala (west and
south regions) during the early onset of the second wave in March. Migration patterns and increased travel from these
early affected parts to the later affected regions may have disturbed the disease pattern. Also,
lockdowns were implemented swiftly in other regions after the early markers in the western and southern parts. These factors,
along with increased media frenzy, might have affected the predictability of the disease and might be a reason for the lower
leads and the differences in behavior between regions.
Medical authorities across the world do not recommend the antibiotic medicines selected as a search category in this analysis
for treating Covid-19, as their overuse contributes significantly to the development of antibiotic resistance. However,
excessive use of antibiotics and a spike in sales of hydroxychloroquine (HCQ) along with antibiotics were seen in India in
other studies as well as in our own data. Since the treatment and diagnostic methods were still evolving during the second
wave, and due to the prevalent use of this medication in this part of the world, antibiotics have been included in the analysis.
A possible limitation of the study is that the penetration of e-commerce platforms in India is still low in remote rural areas
and denser in urbanized sections of the country. Almost 65% of the country's population resides in rural areas, and hence the
findings of the study generalize only to limited geographies. Nevertheless, because of India's large population, the urban
population is still very significant. The search volume for certain cities is limited, and the results are based on
observations limited to the Tata-1mg platform. The case numbers considered for analysis may be underestimates due to possible
reporting errors and the asymptomatic nature of many Covid-19 infections. Despite these limitations, search data can be a
guiding indicator for predicting possible outbreaks. Similar strategies and approaches have been demonstrated to help in being
better prepared in a myriad of previous works. The approach can also be applied by other large online medical and healthcare
platforms to their own data to build an even more robust system and ensure better preparedness for another Covid-19 wave,
when and if it arrives.
FUTURE AREAS OF WORK
An early warning system can be built using similar search trends data for predicting the next wave in a region. Building
infrastructure to alert administrative authorities as well as planning the implementation of a lockdown to curb the spread of
the disease can be a possible use of the study.
Improving the existing supply chain from the Tata-1mg perspective, allocating life-saving medications, and stocking
oxygen before the next wave hits can be another use case of this study. Prediction models trained on past searches
and running Covid-19 case counts in multiple cities, with searches used as a leading feature in a time series model, can
automatically detect spikes in these search terms. This can lead to accurate demand prediction and the stocking of the
appropriate warehouses in times of shortage of critical medication.
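As a rough illustration of using searches as a leading feature, the sketch below fits a one-variable linear model of cases on lagged searches. It is not a model proposed or evaluated in this paper: the function names, the linear form, the fixed lag, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_lagged_linear(search, cases, lag):
    """Least-squares fit of cases[t] ~ a * search[t - lag] + b, for lag >= 1."""
    search = np.asarray(search, dtype=float)
    cases = np.asarray(cases, dtype=float)
    x = search[: len(search) - lag]  # searches `lag` days before each case count
    y = cases[lag:]
    design = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return a, b

def forecast(search, a, b, lag):
    """Forecast cases for the `lag` days following the last observed search day."""
    recent = np.asarray(search, dtype=float)[-lag:]
    return a * recent + b

# Synthetic data: cases are an exact linear function of searches 5 days earlier.
search = np.arange(30, dtype=float)
cases = np.zeros(30)
cases[5:] = 3 * search[:25] + 2
a, b = fit_lagged_linear(search, cases, lag=5)
predicted = forecast(search, a, b, lag=5)
```

In practice the lag would be chosen per city from the cross-correlation analysis, and a real forecasting model would need to be validated on held-out waves.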
Apart from medical searches and doorstep delivery, 1mg also conducts multiple lab tests, including RT-PCR and Antigen
tests. There is also an abundance of prescription data, both handwritten as well as digitized, since Tata-1mg receives more
than 10 million prescriptive as well as non-prescriptive orders annually. A prescription is a doctor’s order which stipulates the
administration of drugs in the speciﬁed amount, duration, and frequency, and contains details of the patient such as name, age,
and gender, and also the details of the doctor who writes the prescription. This data can be anonymized, aggregated, and
analyzed for different demographics. Leveraging this information can provide insights on the spread of Covid-19, the effect
of new variants, symptom progression, trajectory, and the spatio-temporal impact of the virus on various age groups and genders.
This study can aid in ﬁghting the pandemic to the best of our ability and prevent further loss of life.
 Mavragani A. Tracking COVID-19 in Europe: Infodemiology Approach. JMIR Public Health Surveill. 2020
Apr;6(2):e18941. Available from: http://publichealth.jmir.org/2020/2/e18941/.
 Johns Hopkins University. NEW COVID-19 CASES WORLDWIDE;. https://coronavirus.jhu.edu/data/new-cases.
 Ministry of Health and Family Welfare, Govt of India. Integrated Disease Surveillance Programme;. https://idsp.nic.in/.
 Venkatesh U, Gandhi P. Prediction of COVID-19 Outbreaks Using Google Trends in India: A Retrospective Analysis.
Healthcare Informatics Research. 2020;26:175 – 184.
 Lampos V, Moura S, Yom-Tov E, Edelstein M, Majumder M, McKendry RA, et al. Tracking COVID-19 using online
search. CoRR. 2020;abs/2003.08086. Available from: https://arxiv.org/abs/2003.08086.
 Mavragani A, Gkillas K. COVID-19 predictability in the United States using Google Trends time series. Scientiﬁc
reports. 2020 Nov;10(1):20693–20693. 33244028[pmid]. Available from: https://pubmed.ncbi.nlm.nih.gov/33244028.
 Mavragani A. Infodemiology and Infoveillance: Scoping Review. J Med Internet Res. 2020 Apr;22(4):e16206. Available
 Bernardo T, Rajić A, Young I, Robiadek KM, Pham M, Funk J. Scoping Review on Search Queries and Social Media
for Disease Surveillance: A Chronology of Innovation. Journal of Medical Internet Research. 2013;15.
 Eysenbach G. SARS and Population Health Technology. J Med Internet Res. 2003 Jun;5(2):e14. Available from:
 van Lent LG, Sungur H, Kunneman FA, van de Velde B, Das E. Too Far to Care? Measuring Public Attention and Fear
for Ebola Using Twitter. J Med Internet Res. 2017 Jun;19(6):e193. Available from: http://www.jmir.org/2017/6/e193/.
 Farhadloo M, Winneg K, pui Sally Chan M, Jamieson KH, Albarracín D. Associations of Topics of Discussion on Twitter
With Survey Measures of Attitudes, Knowledge, and Behaviors Related to Zika: Probabilistic Study in the United States.
JMIR Public Health and Surveillance. 2018;4.
 Mavragani A, Ochoa G. The Internet and the Anti-Vaccine Movement: Tracking the 2017 EU Measles Outbreak. Big
Data and Cognitive Computing. 2018;2(1). Available from: https://www.mdpi.com/2504-2289/2/1/2.
 Du J, Tang L, Xiang Y, Zhi D, Xu J, Song HY, et al. Public Perception Analysis of Tweets During the 2015 Measles
Outbreak: Comparative Study Using Convolutional Neural Network Models. J Med Internet Res. 2018 Jul;20(7):e236.
Available from: https://doi.org/10.2196/jmir.9413.
 McClellan C, Ali MM, Mutter R, Kroutil L, Landwehr J. Using social media to monitor mental health discussions
evidence from Twitter. Journal of the American Medical Informatics Association. 2016 10;24(3):496–502. Available
 Roser, M Ritchie, H Ortiz-Ospina, E Hasell, J. Statistics and Research-Coronavirus Pandemic (COVID-19);. https:
 Youseﬁnaghani S, Dara R, Mubareka S, Sharif S. Prediction of COVID-19 Waves Using Social Media and Google
Search: A Case Study of the US and Canada. Frontiers in Public Health. 2021;9:359. Available from: https://www.
 Lampos V, Majumder MS, Yom-Tov E, Edelstein M, Moura S, Hamada Y, et al. Tracking COVID-19 using online
search. npj Digital Medicine. 2021 Feb;4(1):17. Available from: https://doi.org/10.1038/s41746-021-00384-w.
 Tata-1mg. Tata-1mg;. https://1mg.com.
 TimesOfIndia. TOI;. https://timesofindia.indiatimes.com/city/delhi/panic-buying-and-lack-of-supply-causing-
 MOHFW. MOHFW;. https://www.mohfw.gov.in/pdf/COVID19ClinicalManagementProtocolAlgorithmAdults19thMay2021.
 Python org. python3.8;. https://www.python.org/downloads/release/python-380/.
 numpy org. numpy;. https://numpy.org.
 BusinessStandard. Antiviralsales;. https://www.business-standard.com/article/companies/fabiflu-numero-uno-in-
 HarvardHealthedu. Harvardhealth;. https://www.health.harvard.edu/diseases-and-conditions/if-youve-been-exposed-
 Faes C, Abrams S, Van Beckhoven D, Meyfroidt G, Vlieghe E, Hens N, et al. Time between Symptom On-
set, Hospitalisation and Recovery or Death: Statistical Analysis of Belgian COVID-19 Patients. International jour-
nal of environmental research and public health. 2020 Oct;17(20):7560. PMC7589278[pmcid]. Available from:
 Sulis G, Batomen B, Kotwani A, Pai M, Gandra S. Sales of antibiotics and hydroxychloroquine in India during the
COVID-19 epidemic: An interrupted time series analysis. PLOS Medicine. 2021 07;18(7):1–18. Available from:
 Gupta M, Soeny K. Algorithms for rapid digitalization of prescriptions. Visual Informatics. 2021. Available from:
City               Search type  Maximum searches per day  Maximum significant lead (days)
Nashik             antibiotic   180                       40
Jalgaon            fever        446                       40
Vadodara           symptom      276                       40
Buldhana           fever        234                       39
Pune               antibiotic   424                       37
Ahmednagar         fever        452                       37
Amravati           fever        220                       36
Indore             antibiotic   262                       33
Beed               vit          216                       33
Satara             fever        144                       31
Solapur            symptom      476                       31
Surat              symptom      468                       31
Ludhiana           fever        168                       28
Bhopal             antibiotic   256                       27
Kolkata            flu          110                       26
Ahmedabad          fever        904                       25
Nagpur             antibiotic   438                       24
Yavatmal           vit          182                       24
Latur              vit          244                       23
Wardha             fever        84                        23
Thane              antibiotic   202                       22
Visakhapatnam      fever        742                       21
Nanded             vit          388                       21
South 24 Parganas  fever        196                       21
Howrah             fever        466                       20
Mumbai             fever        2134                      19
Jaipur             flu          54                        19
Guntur             fever        298                       19
Bangalore          flu          178                       18
Patna              antibiotic   502                       18
Sangli             vit          226                       18
Chandrapur         fever        152                       18
New Delhi          flu          1220                      17
North 24 Parganas  fever        316                       17
Bhubaneshwar       fever        492                       17
Nellore            fever        218                       17
Gurugram           antibiotic   744                       16
Dehradun           fever        492                       16
Kolhapur           flu          24                        16
Durg               vit          284                       16
Faridabad          antibiotic   204                       16
Prakasam           steroid      118                       16
Chennai            fever        938                       15
East Godavari      steroid      210                       15
West Godavari      steroid      28                        15
Ranchi             antibiotic   160                       15
Chittoor           vit          194                       14
Mysore             vit          362                       14
Jodhpur            fever        238                       14
Varanasi           antibiotic   188                       14
Anantapur          nasal        158                       13
Palghar            vit          94                        13
Lucknow            antibiotic   680                       12
Kannur             vit          128                       12
Raipur             fever        606                       12
Srikakulam         fever        118                       12
Kurnool            nasal        216                       12
Kanpur             antibiotic   210                       12
Ernakulam          fever        144                       11
Coimbatore         vit          586                       11
Kozhikode          vit          216                       10
Hassan             vit          90                        10
Allahabad          fever        882                       10
Thrissur           vit          218                       9
Palakkad           vit          186                       9
Erode              vit          148                       9
Madurai            vit          280                       9
Kottayam           vit          124                       8
Tiruchirappalli    vit          210                       6
Kollam             vit          110                       5
Pathanamthitta     symptom      56                        3
Tiruppur           vit          88                        3
Malappuram         antibiotic   54                        0
Alappuzha          antibiotic   26                        0
Kasaragod          antibiotic   36                        0
Table 3. Indian cities with the maximum lead (in days) at which a significant (r > 0.7) leading correlation with confirmed
case data is observed for the mentioned search type
Figure 6. Pearson correlation heat map between confirmed cases and search volume of Antibiotic medicines for 75 Indian cities
Figure 7. Pearson correlation heat map between confirmed cases and search volume of Fever medicines for 75 Indian cities
Figure 8. Pearson correlation heat map between confirmed cases and search volume of Vitamin/Mineral supplements for 75 Indian cities
Figure 9. Pearson correlation heat map between confirmed cases and search volume of Early symptom medication for 75 Indian cities
Figure 10. Pearson correlation heat map between confirmed cases and search volume of Flu/Antiviral medicines for 75 Indian cities
Figure 11. Pearson correlation heat map between confirmed cases and search volume of Nasal/Respiratory distress
medicines for 75 Indian cities
Figure 12. Pearson correlation heat map between confirmed cases and search volume of Steroidal medicines for 75 Indian cities