Enhancing Data Ecosystem Connectivity: MISSION KI Develops Innovative Dataset Search Engine
Artificial intelligence (AI) relies on high-quality data for effective training and accurate predictions. Despite the vast amounts of data generated daily, only a fraction is available in a curated and usable form. A significant challenge is the lack of a search function that can look for datasets in a targeted way across different data spaces and portals. In addition, datasets often lack descriptions of their quality, known as dataset profiles.
MISSION KI addresses these challenges by developing the so-called Landing Page, an innovative search engine that enables comprehensive searches across datasets from public and private data portals and spaces while simultaneously analyzing data quality. A first demonstrator of the Landing Page is being presented today at the “Data Markets 2024” conference hosted by the Federal Ministry for Digital and Transport (BMDV) in Berlin and is accessible via the following link: Daseen - Large Dataset Search Engine
Moreover, MISSION KI, in collaboration with the start-ups beebucket, nexyo, eXXcellent solutions, and deltaDAO, is developing a decentralized service for the automatic cataloging of data quality. This "Extended Dataset Profile Service" (EDPS) is based on an open-source software solution and is centrally accessible. The new service allows data providers to automatically catalog and curate their data from various sources, so that it can be discovered and assessed through standardized metadata. Data quality is measured transparently using industry-relevant standards. Once data has been cataloged and equipped with dataset profiles, users can find it across data spaces and portals, either manually or programmatically, using the standardized metadata and without requiring direct access to the data itself. This enables users to efficiently select the most suitable data.
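The idea of searching by profile rather than by data can be illustrated with a minimal Python sketch. It is not the EDPS schema or the Landing Page API: the `DatasetProfile` fields, the `completeness` metric, the `search_profiles` function, and the catalog entries are all hypothetical and chosen only for illustration. The point it shows is that a search can run entirely on metadata, without touching the underlying data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical dataset profile: holds metadata only, never the data itself."""
    dataset_id: str
    title: str
    portal: str               # data space or portal where the data lives
    keywords: frozenset[str]
    completeness: float       # illustrative quality metric in [0, 1]


def search_profiles(profiles, keyword, min_completeness=0.0):
    """Return profiles tagged with `keyword`, highest completeness first."""
    wanted = keyword.lower()
    hits = [
        p for p in profiles
        if wanted in p.keywords and p.completeness >= min_completeness
    ]
    return sorted(hits, key=lambda p: p.completeness, reverse=True)


# Made-up catalog entries spanning several (hypothetical) portals and spaces.
CATALOG = [
    DatasetProfile("a-1", "Urban traffic counts", "portal-a",
                   frozenset({"traffic", "mobility"}), 0.85),
    DatasetProfile("b-2", "Highway sensor readings", "space-b",
                   frozenset({"traffic", "sensors"}), 0.95),
    DatasetProfile("c-3", "Regional weather", "portal-c",
                   frozenset({"weather"}), 0.99),
    DatasetProfile("d-4", "Bike traffic survey", "space-b",
                   frozenset({"traffic"}), 0.60),
]

if __name__ == "__main__":
    for profile in search_profiles(CATALOG, "traffic", min_completeness=0.8):
        print(profile.dataset_id, profile.portal, profile.completeness)
```

In this sketch a user looking for traffic data with a completeness of at least 0.8 receives only the two matching profiles, ranked by quality, and can then approach the respective portal or data space for access.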
Manfred Rauhmeier, Chairman of the acatech Foundation and Secretary of the acatech Coordination Committee:
“The new Landing Page, combined with dataset profiles, is a significant step toward improving access to diverse data spaces and leveraging high-quality data for AI model training. It builds the necessary trust for secure data sharing between organizations, paving the way for a genuine data marketplace. This enhanced data foundation opens up new innovative business models for German and European companies.”
Florian Mauer-Endler, Managing Partner at beebucket:
“Trustworthy AI requires high-quality, well-suited data. With the development of these two services, we are creating an efficient foundation for a federated and legally secure data ecosystem that seamlessly integrates with and complements the existing landscape of data portals and spaces. At the same time, we simplify the processes of curation, cataloging, and data sharing for providers, as well as the search and procurement of tailored datasets for users.”
Both data-providing and data-using companies, along with operators of data spaces, portals, and other data ecosystems, will benefit from these software solutions. Data providers can improve the discoverability of their datasets through dataset profiles, attracting more customers. Data users gain the ability to efficiently search across data spaces for datasets suited to training AI models. Lastly, operators of data spaces and portals benefit from increased visibility and reach, as new customer groups are directed to their platforms.