Dataset Search Engine
Making distributed data visible and usable
The Dataset Search Engine is an open-source platform that allows you to make distributed datasets discoverable, assessable, and interoperable – all without data migration. The modular system is suitable for different data spaces and industries and can be tailored to your specific requirements.
The Dataset Search Engine helps you find high-quality datasets – securely, transparently, and interoperably.
The Dataset Search Engine enables sovereign data exchange.
The Dataset Search Engine provides the technological foundation to make data visible, comparable, and interoperable, regardless of industry, infrastructure, or system landscape. The modular software consists of two core components that work together:
Extended Data Set Profile Service (EDPS)
Automatically generates standardized dataset profiles with comprehensive quality metrics – directly at the source, without the original data ever leaving its environment. The process follows the compute-to-data principle.
Decentralized Search Engine
Enables precise searches across different data spaces and portals based on the EDPS profiles. The data always remains under the control of the respective providers.
Functional Overview
Profile Generation:
Data providers use the EDPS to create standardized dataset profiles – without sharing the data itself.
Publication:
The profiles are made publicly accessible via the federated infrastructure.
Search & Evaluation:
Users can browse across different data spaces and compare profiles based on uniform quality criteria.
Access Request:
Once a suitable profile is selected, a formal access request is submitted to the respective data space.
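The four steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the platform's actual API: the `Profile` fields, the `Catalog` class, and `build_access_request` are all hypothetical names chosen for the example, and a real deployment would publish to and query the federated infrastructure instead of a local list.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical, heavily reduced dataset profile (no raw data inside)."""
    dataset_id: str
    data_space: str
    title: str
    completeness: float  # share of non-missing values, 0.0 to 1.0


@dataclass
class Catalog:
    """Hypothetical stand-in for the index of published profiles."""
    profiles: list = field(default_factory=list)

    def publish(self, profile):
        # Publication: only the profile is shared, never the dataset itself.
        self.profiles.append(profile)

    def search(self, keyword, min_completeness=0.0):
        # Search & evaluation: filter by title keyword and a uniform
        # quality criterion, best completeness first.
        keyword = keyword.lower()
        hits = [
            p for p in self.profiles
            if keyword in p.title.lower() and p.completeness >= min_completeness
        ]
        return sorted(hits, key=lambda p: p.completeness, reverse=True)


def build_access_request(profile, requester):
    # Access request: addressed to the data space that holds the data.
    return {
        "data_space": profile.data_space,
        "dataset_id": profile.dataset_id,
        "requester": requester,
    }


catalog = Catalog()
catalog.publish(Profile("ds-1", "space-a", "Traffic counts 2024", 0.97))
catalog.publish(Profile("ds-2", "space-b", "Traffic incidents", 0.62))
best = catalog.search("traffic", min_completeness=0.9)[0]
request = build_access_request(best, requester="analyst@example.org")
```

The point of the sketch is the separation of concerns: the catalog only ever sees profiles, and the access request is routed back to the provider's data space.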
Example of a Structured Dataset Profile
The structured quality indicators allow potential data users to accurately assess the suitability of datasets – automatically and without accessing the original data.
Structural
• Attributes & Data Types (validation of data type consistency)
• Attribute Consistency (completeness of values across rows and columns)
Analytical
• Significant Variance (examination of differing distributions)
• Numerical Analysis
• Text/String Analysis
Temporal
• Time Attributes (temporal coverage from/to)
• Temporal Frequency (regularity of data points and gap analysis)
Access & Data Protection
• Open or Closed Access
• Personal Data (with links to data processing agreements)
Processing State
• Original Data (unaltered raw data)
• Processed Data (cleaned/transformed data with change logs)
• Refined Data (optimized AI training datasets)
• AI/ML Result Data (algorithm-generated data with references to training datasets)
Special
• Geolocation (detection of geographic information)
• EDP Quick View (access to summary data science information)
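To make a few of these indicators concrete, the following sketch computes data types per attribute, overall completeness, temporal coverage, and gaps for a small table. It is an illustration under stated assumptions, not the EDPS implementation: the function name `profile_table`, the row layout (a list of dicts), and the output keys are hypothetical, and a gap is simply defined here as any interval longer than the most common interval.

```python
from collections import Counter
from datetime import date


def profile_table(rows, time_column):
    """Return a small quality profile for a list of dict rows (hypothetical layout)."""
    columns = sorted({key for row in rows for key in row})
    total_cells = len(rows) * len(columns)
    filled_cells = sum(
        1 for row in rows for col in columns if row.get(col) is not None
    )

    # Structural: which Python types occur per attribute (missing values ignored).
    data_types = {
        col: sorted({type(row[col]).__name__ for row in rows
                     if row.get(col) is not None})
        for col in columns
    }

    # Temporal: coverage from/to, plus intervals longer than the usual step.
    stamps = sorted(row[time_column] for row in rows
                    if row.get(time_column) is not None)
    steps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    usual_step = Counter(steps).most_common(1)[0][0] if steps else None
    gaps = [
        (earlier, later)
        for earlier, later in zip(stamps, stamps[1:])
        if later - earlier > usual_step
    ]

    return {
        "data_types": data_types,
        "completeness": round(filled_cells / total_cells, 3) if total_cells else None,
        "temporal": {
            "from": stamps[0] if stamps else None,
            "to": stamps[-1] if stamps else None,
            "usual_step": usual_step,
            "gaps": gaps,
        },
    }


example_rows = [
    {"day": date(2024, 1, 1), "count": 10, "station": "A"},
    {"day": date(2024, 1, 2), "count": None, "station": "A"},
    {"day": date(2024, 1, 3), "count": 12, "station": "A"},
    {"day": date(2024, 1, 6), "count": 9, "station": None},
]
example_profile = profile_table(example_rows, time_column="day")
```

The resulting dictionary contains only aggregate descriptions of the table, which is what allows such a profile to be shared while the rows themselves stay with the provider.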
Metadata – Automated and Domain-Specific
The EDPS automatically generates metadata and can be precisely tailored to your specific domain. Domain experts can integrate their own analysis methods to make relevant quality indicators visible.
General Data Types
• Structured Data
• Documents (PDF, Word, Excel)
• Graph Data
• Videos & Images
• Audio Data
Domain-Specific Formats
• Medical Imaging Data
• GeoJSON
• Scientific Formats
• IoT Sensor Data
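One way to picture how domain experts might integrate their own analysis methods is a small registry of analyzer functions that all run over the same values. This is a sketch of the general plug-in pattern only; the registry `ANALYZERS`, the `analyzer` decorator, and both example analyzers (including the plausibility range used) are hypothetical and do not reflect the EDPS extension interface.

```python
from statistics import mean

# Hypothetical registry: analyzer name -> function taking a list of values.
ANALYZERS = {}


def analyzer(name):
    """Decorator that registers an analysis function under the given name."""
    def register(func):
        ANALYZERS[name] = func
        return func
    return register


def _numbers(values):
    # Keep ints and floats only; skip None, strings, and booleans.
    return [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]


@analyzer("numeric_summary")
def numeric_summary(values):
    numbers = _numbers(values)
    if not numbers:
        return {}
    return {"min": min(numbers), "max": max(numbers), "mean": mean(numbers)}


@analyzer("out_of_range_share")
def out_of_range_share(values, low=30.0, high=45.0):
    """Illustrative domain check: share of values outside an assumed plausible range."""
    numbers = _numbers(values)
    if not numbers:
        return {}
    outside = sum(1 for v in numbers if not low <= v <= high)
    return {"share": outside / len(numbers)}


def run_analyzers(values):
    """Run every registered analyzer and collect the results by name."""
    return {name: func(values) for name, func in ANALYZERS.items()}


indicators = run_analyzers([36.5, 37.0, 120.0, None])
```

A domain team would add its own function with the decorator, and its result would appear alongside the generic indicators without changes to the rest of the pipeline.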
The Dataset Search Engine in Action
DASEEN is the first implementation of the Dataset Search Engine – a federated search engine for datasets from various data spaces. It demonstrates how distributed data sources can be made searchable in a secure, transparent, and comparable way – without any data migration.
Data portals such as GovData and Mobilithek, as well as providers like BASt, Autobahn GmbH, Toll Collect, and the City of Konstanz, are already integrated. The datasets range from mobility and geodata to infrastructure surveys and administrative information.
Thanks to the automated profiles generated by the EDPS, data users can assess the quality of datasets before accessing them – across platforms and without ever touching the original data.
DASEEN continues to grow: The Mobility Data Space and the Pontus-X ecosystem will be integrated soon, and discussions with additional partners are already underway.
“The Dataset Search Engine shows us for the first time what really lies within other data spaces – without exposing sensitive data or requiring complex processes. It gives us insight where there were only black boxes before – and does so in a way that builds trust and enables collaboration.”
Manfred Rauhmeier
Chairman of the acatech Foundation
Who is the Dataset Search Engine for?
The Dataset Search Engine is highly versatile – supporting data-driven sectors such as Industry 4.0, mobility, energy, healthcare, the food industry, smart cities, the maritime economy, finance, culture, and construction, as well as AI-driven business models.
Across Europe, numerous data spaces and portals are currently being developed to enable secure and interoperable data exchange. The Dataset Search Engine can be seamlessly integrated into these structures – including modern platforms such as Mobilithek and other open data portals.
Wherever data exists but remains difficult to access, the Dataset Search Engine creates transparency, trust, and interoperability – making data usable across sector boundaries.
Do you have questions?
Get in touch!
We’ll show you how to make your data visible, assessable, and interoperable with the Dataset Search Engine.