Harvard's Efforts on Ethical AI, Google's Project Astra, Large Concept Models (LCMs), and More
The AI4India Weekly #31
This week, we delve into India’s transformative journey in the AI landscape, highlighting pioneering efforts like the #DataDaan initiative, which champion inclusivity and grassroots innovation. From breakthroughs in generative AI and longevity research with AgeXtend to Meta AI’s concept-centric LCMs and Google’s multimodal assistant, Project Astra, global advancements are shaping the future of AI. Meanwhile, India’s work on fostering ethical frameworks and leveraging high-performance computing platforms underscores its commitment to bridging innovation with diversity, sustainability, and societal impact.
Harvard's Public-Domain Dataset: A New Chapter in Ethical AI Development
Harvard University’s Institutional Data Initiative, funded by Microsoft and OpenAI, has announced a groundbreaking dataset of nearly one million public-domain books to advance ethical AI development. This high-quality collection, sourced from the Google Books project, includes diverse works spanning centuries, languages, and genres. The initiative aims to democratize access to AI training materials, leveling the playing field for researchers and smaller AI companies who lack the resources of tech giants. The dataset is positioned as a foundational resource for creating language models while avoiding the contentious use of copyrighted material—a practice under intense legal scrutiny.
This project aligns with a broader movement toward public-domain AI training datasets, including the French startup Pleias’ Common Corpus, which has been widely adopted and is presented as compliant with the EU AI Act. By supporting open, ethical AI development, these initiatives challenge the argument that scraping copyrighted content is essential for building advanced AI systems. Harvard’s effort, complemented by collaborations with institutions like the Boston Public Library, underscores a shift toward transparent and inclusive AI practices that prioritize public benefit and intellectual property ethics.
AgeXtend: Screens 1.1 billion Molecules, Achieves Anti-Ageing Breakthrough
Researchers at IIIT-Delhi have developed AgeXtend, an AI-driven platform revolutionizing the search for molecules with geroprotective, or anti-aging, properties. Published in Nature Aging, the study screened over 1.1 billion compounds in two years, identifying a fraction with potential longevity-promoting effects. AgeXtend predicts and validates molecules using biological models like yeast, C. elegans, and human cells, while offering insights into their mechanisms of action—a key feature that sets it apart from conventional tools.
This "discovery engine" has already confirmed the benefits of known compounds like metformin and taurine, emphasizing its accuracy and potential. The largest study of its kind, AgeXtend scanned compounds from diverse sources, including commercial drugs, ayurveda, and FDA-approved molecules. Open-sourced for researchers, the platform’s data and Python package are accessible to foster collaboration, with pharma companies being invited to explore viable candidates further. This innovation promises to accelerate aging research and guide the development of treatments for healthier, longer lives.
Harnessing AI for India's Future: Insights from Alok Agrawal on Policy Musings
In the latest episode of Policy Musings, Alok Agrawal, Co-Founder of AI4India.org, explores the transformative potential of artificial intelligence (AI) in shaping India’s future. He discusses the rapid rise of generative AI tools like ChatGPT, which evoke both fascination and fear, while examining the ethical challenges, data privacy concerns, and employment shifts brought about by AI. Agrawal highlights India’s Digital Personal Data Protection Act and its impact on AI governance, calling for clearer regulatory frameworks to ensure responsible innovation. He also introduces the "Data Daan" campaign, a pioneering initiative to democratize access to free data for AI research, aimed at making AI more inclusive and beneficial for society.
The conversation emphasizes the government’s pivotal role in fostering an AI ecosystem that balances innovation with public safeguards. Agrawal delves into AI's potential to address accessibility challenges, create new employment avenues, and drive economic growth. He also discusses the geopolitical stakes of AI dominance and the importance of ethical practices in shaping India's global leadership in AI. This episode is essential listening for anyone seeking a deeper understanding of AI's evolving role in India’s development.
Meta AI Introduces Concept-Centric Models for Next-Gen Language Understanding
Meta AI has unveiled Large Concept Models (LCMs), a transformative approach to language processing that moves beyond traditional token-based methods. By operating in a high-dimensional embedding space called SONAR, LCMs process semantic concepts like sentences and ideas, enabling language- and modality-agnostic modelling across 200+ languages and multiple data types, including text and speech.
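To make the shift from tokens to concepts concrete, here is a minimal sketch of the idea. It is not Meta's code: the sentence splitter and the `toy_embed` function are hypothetical stand-ins, and `toy_embed` in particular is a crude bag-of-words hash, not the SONAR encoder. The point is only that a concept-level model would see one fixed-size vector per sentence rather than one unit per word or token.

```python
import re

def split_into_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def toy_embed(sentence, dim=8):
    """Hypothetical stand-in for a sentence encoder (NOT SONAR).

    Maps a sentence to a fixed-size vector by bucketing its words.
    """
    vec = [0.0] * dim
    for word in sentence.lower().split():
        bucket = sum(ord(c) for c in word) % dim  # deterministic bucket per word
        vec[bucket] += 1.0
    return vec

text = (
    "LCMs work on sentences. "
    "Each sentence becomes one vector. "
    "The model then predicts the next vector."
)

sentences = split_into_sentences(text)
concepts = [toy_embed(s) for s in sentences]

# A word-level view has one unit per word; a concept-level view has one per sentence.
print("word-level units:   ", len(text.split()))
print("concept-level units:", len(concepts))
```

In a real system the vectors would come from a trained multilingual encoder, so the same sequence of concepts could in principle be produced from text or speech in different languages.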
With hierarchical architecture and diffusion-based generation, LCMs improve coherence, scalability, and efficiency, addressing challenges of long-context understanding and multilingual applications. Experimental results show LCMs outperforming baseline models in multilingual summarization and novel tasks like summary expansion, highlighting their strong zero-shot generalization and adaptability. As Meta AI continues refining this innovative framework, LCMs promise a scalable and versatile future for AI-driven communication.
Google’s Project Astra: The Future of Multimodal AI Assistance
Google DeepMind has unveiled Project Astra, an ambitious AI-powered "universal assistant" leveraging the capabilities of Gemini 2.0, its latest multimodal large language model. Astra integrates text, speech, image, and video inputs, combining these with Google’s tools like Search, Maps, and Lens to assist users in real-world tasks. Demonstrations showcased Astra reading recipes, identifying compatible wine pairings, and answering art-related queries, offering a glimpse of its potential as a seamless AI assistant. Gemini 2.0, the backbone of Astra, boasts advancements in processing efficiency, zero-shot generalization across languages and modalities, and a hierarchical architecture for long-range context reasoning.
While Astra’s ability to recall past interactions and interpret multimodal inputs marks a significant leap, challenges remain. The technology, still glitchy, requires refinement to become consumer-ready, and Google has yet to announce a release date. Researchers highlight the need for transparency in how Astra functions, particularly regarding privacy and user data. With its ability to merge modalities and enable intuitive interactions, Astra signals a transformative step in AI-driven assistance, potentially becoming a defining application in the generative AI landscape.
GenCast: Google’s Advanced AI Model Redefines Weather Forecasting
Google’s GenCast, a state-of-the-art AI ensemble model, is revolutionizing weather forecasting by offering unprecedented accuracy and efficiency in predicting weather scenarios, including extreme events, up to 15 days in advance. Unlike traditional deterministic models, GenCast generates probabilistic ensemble forecasts, providing a range of possible weather outcomes with associated probabilities. Trained on four decades of high-resolution data, GenCast consistently outperforms the leading operational forecasting system, ECMWF’s ENS, in accuracy and reliability, particularly for extreme conditions like heatwaves, high winds, and cyclones.
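For readers curious how an ensemble turns into a probability, the sketch below shows the basic arithmetic under simple assumptions. It does not use GenCast itself: the ensemble here is synthetic (random draws from a normal distribution), and the function names, the 50-member size, and the 40 °C threshold are all illustrative choices. The probability of an event is estimated as the fraction of ensemble members in which that event occurs.

```python
import random

def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members whose value exceeds the threshold."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one member")
    hits = sum(1 for value in ensemble if value > threshold)
    return hits / len(ensemble)

def synthetic_ensemble(n_members, mean, spread, seed=0):
    """Illustrative stand-in for model output: random draws, not real forecasts."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n_members)]

# Hypothetical day-10 maximum temperatures (deg C) from 50 ensemble members.
members = synthetic_ensemble(n_members=50, mean=38.0, spread=2.0)

# Estimated chance that the temperature exceeds 40 deg C on that day.
p_heatwave = exceedance_probability(members, threshold=40.0)
print(f"P(temperature > 40 C) = {p_heatwave:.2f}")
```

A deterministic model would return a single temperature for that day; the ensemble view lets a planner act on the estimated chance of crossing a threshold instead.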
Leveraging diffusion-based AI technology, GenCast processes data with remarkable efficiency, producing each 15-day forecast in its ensemble in just 8 minutes on a single TPU, with ensemble members generated in parallel. It excels in predicting tropical cyclone tracks and wind power outputs, enhancing disaster preparedness and renewable energy planning. As part of Google’s broader suite of AI-driven weather models, GenCast bridges AI and traditional meteorology by combining the strengths of both. Released as an open model with code and weights, GenCast invites collaboration with researchers, meteorologists, and industries to advance climate understanding and societal resilience.
In the News
Meta Introduces Meta Video Seal: Meta Video Seal applies imperceptible watermarks to AI-generated videos, aiming to combat the rise of deepfakes. Released as open source, it integrates into existing software and provides a more robust solution for detecting AI-generated content.
Mawari DePIN Transforms Immersive Experiences: The platform uses spatial streaming technology to make the 3D internet and immersive AI-driven experiences more accessible, scalable, and cost-effective. It addresses challenges like high bandwidth demands and infrastructure limitations. The platform has successfully demonstrated real-world applications, including AI-powered digital humans and live augmented reality events.
5 Ways GenAI Can Transform Clinical Trials: Generative AI (genAI) has the potential to revolutionize clinical trials by improving efficiency, reducing costs, and enhancing trial data representation. It can transform key areas such as trial design, patient recruitment, data analysis, and regulatory submission. However, challenges like data sharing, regulatory complexities, and trust issues must be overcome to unlock its full potential.
#DataDaan - Donate for a Digital India
Aligned with the IndiaAI mission and MeitY’s efforts to ensure the availability of AI-usable data across government, the #DataDaan campaign by AI4India.org invites individuals and organizations to contribute valuable data for AI development in India. This initiative aims to gather diverse datasets to enrich the national data repository, supporting innovation across various sectors. We encourage everyone to read the detailed white paper on the campaign (https://bit.ly/DataDaan) and fill out the form (http://bit.ly/3XfxD3H) to contribute to this transformative effort.
We’ve also created a podcast on #DataDaan using Google’s NotebookLM, which transforms documents into engaging audio conversations.
This concerted push to make AI work for India highlights how data accessibility can unlock the potential for large-scale socio-economic transformation, driving innovation, and ensuring India’s competitive edge in the AI-driven global economy.
NOTE: The views expressed by the authors are their own. AI4India as a forum does not endorse any comments on specific brands, products, platforms or companies.
Join our AI4India.org forum to be a part of the AI revolution in India by visiting our site now.
Follow us on X and LinkedIn to receive interesting updates and analysis of AI-related news.