India’s AI future depends on data—specifically, high-quality, localized datasets that reflect the country’s linguistic, cultural, and socio-economic diversity. With Prime Minister Modi highlighting the need for bias-free data at the AI Action Summit in Paris, and initiatives like Data Daan gaining momentum, India is taking bold steps to democratize AI development. In this edition, we explore how government policies, industry collaboration, and global recognition are driving India’s push for AI sovereignty.
The Future of AI in India: Building Localized Data Sets for a Digital Revolution
As India accelerates its AI journey, one critical aspect stands at the heart of its success: data. In a rapidly digitizing world, AI systems are only as effective as the data that powers them. Recognizing this, Prime Minister Narendra Modi, in his address at the AI Action Summit in Paris (2025), emphasized the importance of building "quality data sets free from biases" and ensuring that AI models are trained on locally relevant data to be effective and useful.
India’s AI aspirations cannot rely solely on datasets generated in Western or Chinese contexts. The nation’s linguistic diversity, spanning 22 constitutionally scheduled languages and hundreds of dialects, demands AI models that understand and cater to local populations. Similarly, India’s unique economic, social, and environmental conditions necessitate AI solutions trained on Indian data. This approach is fundamental for developing robust AI models that address challenges in sectors such as agriculture, healthcare, education, and governance.
Bridging the Data Gap with Policy and Innovation
Minister Jitendra Singh, in his National Science Day address, reiterated that India's approach to AI must be deeply rooted in localized data, ensuring that innovation does not exclude the very people it is meant to serve. He highlighted the importance of public-private partnerships in creating open-access data repositories that allow researchers and startups to build foundational models tailored to India’s needs.
This sentiment was echoed by Amitabh Kant, India’s G20 Sherpa, in his recent interview with StratNews Global. Kant underscored that India’s strength lies in its demographic advantage and digital infrastructure, but AI development must be democratized. He argued that opening government data for Indian startups, academia, and research institutions is crucial for fostering an AI ecosystem that is both competitive and locally relevant.
In the most recent Mann ki Baat episode, Prime Minister Modi also stressed the transformative role of AI in key sectors such as healthcare, agriculture, and education, stating that AI is “writing the code for humanity” in this century. He highlighted India’s ability to harness AI at the grassroots level, citing the work of Thodasam Kailash, a schoolteacher in Telangana’s Adilabad, who is using AI tools to preserve tribal languages. By composing songs in languages like Kolami, Kailash is ensuring that this linguistic heritage is documented and remains accessible to future generations. Such applications show how localized AI models, trained on India-specific data, can drive social progress and cultural preservation.
Additionally, initiatives like Bhashini, India’s AI-powered language translation platform, continue to break linguistic barriers by making digital content accessible in multiple Indian languages. This aligns with India’s broader vision of inclusive AI adoption, ensuring technology reaches and benefits citizens across all regions.
The Role of Indian Startups and Initiatives
Shashi Shekhar Vempati, co-founder of AI4India, has been vocal about the need for AI models that go beyond Generative AI and Large Language Models (LLMs). Speaking at the National Science Day Lecture at Vigyan Bhawan, he highlighted how Indian startups are already pioneering AI applications tailored to local needs—whether it be AI-driven analytics for agriculture, computer vision for manufacturing, or multimodal AI for regional languages.
At the Indian Society of Advertisers (ISA) CEOs Conference, Vempati outlined how AI is shaping the future of business and why India must chart its own course by investing in domain-specific models trained on Indian datasets. He stressed that while global tech giants dominate LLMs, India's AI advantage will come from real-world applications that solve local challenges.
A key challenge facing Indian AI development is the lack of readily available, high-quality datasets for training and validation. Many AI models today are trained on predominantly English-language data sourced from the internet, making them less effective in Indian contexts where linguistic, cultural, and demographic diversity plays a crucial role. Indian startups and researchers have called for more structured efforts to curate and share local datasets so that AI solutions can address the specific needs of Indian users.
Data Daan: A National Movement for Data Sharing
One initiative that is driving this vision forward is Data Daan, an effort spearheaded by AI4India to encourage businesses, institutions, and individuals to contribute anonymized datasets for AI training. Inspired by the need to build India-specific AI models, the Data Daan movement aims to make diverse and representative datasets available to researchers and developers while maintaining stringent privacy safeguards. The initiative focuses on datasets across multiple domains, including agriculture, healthcare, education, governance, and linguistics, ensuring that India’s AI ecosystem has the necessary resources to build truly effective solutions.
With AI increasingly influencing decision-making processes across industries, Data Daan seeks to create a culture of responsible data sharing. By making high-quality, domain-specific data publicly accessible, this initiative supports the growth of AI applications that cater to India’s unique challenges, from improving crop yield predictions for farmers to enhancing natural language processing capabilities for regional languages.
Global Recognition and the Path Forward
As India positions itself as a global leader in AI, the success of these efforts will hinge on widespread collaboration between the government, private sector, academia, and the startup ecosystem. The push for localized datasets has already begun to gain international attention. A recent episode of the Top 5 at 5 podcast by BFM Media featured Khalil Nooh, CEO of Mesolitica, who highlighted India’s Data Daan campaign as a significant effort in AI sovereignty and localized dataset development.
About the episode, Nooh noted on X, “Apart from talking about DeepSeek, took the opportunity to highlight AI4India’s effort with ‘DataDaan,’ which is something I’ve been promoting locally under our AI Sovereignty agenda — a central data repository for Malaysia LLM training.” This acknowledgment underscores how India’s approach to open-access data repositories is resonating beyond its borders, inspiring similar initiatives in countries looking to build representative AI models suited to their local needs.
By fostering data-driven collaborations across national and international boundaries, India is paving the way for an AI ecosystem that is inclusive, ethical, and sustainable. To explore these insights and understand how Data Daan is shaping the AI landscape globally, listen to the full podcast here: Podcast Link.
#DataDaan - Donate for a Digital India
Aligned with the IndiaAI Mission and MeitY’s efforts to ensure the availability of AI-usable data, the #DataDaan campaign by AI4India is now live on DataDaan.org. This initiative invites individuals and organizations to contribute valuable datasets, enriching India’s AI ecosystem and driving innovation across sectors. The platform provides a streamlined process for data contribution, ensuring responsible and impactful AI development. Visit DataDaan.org to explore the initiative and be part of this transformative effort.
NOTE: The views expressed by the authors are their own. AI4India as a forum does not endorse any comments on specific brands, products, platforms or companies.
Visit AI4India.org to join our forum and be a part of the AI revolution in India.
Follow us on X and LinkedIn for updates and analysis of AI-related news.