Metadata Structuring

...

Metadata serves two distinct purposes in the tokenized datasets framework:

Public Metadata

Basic descriptive information such as schema definitions, creation timestamps, and general categorization is stored on-chain in Patricia Tries in plaintext to enable discovery and validation. This information is hashed using SHA-512 to ensure integrity, but the hash is stored alongside the plaintext data.

Privacy-Sensitive Metadata

Statistical summaries, detailed provenance information, and other potentially sensitive metadata can be encrypted and stored off-chain through off-chain workers, with only hash references maintained on-chain. For this sensitive metadata, zero-knowledge proofs can selectively verify properties without revealing the underlying information.

This dual approach balances the need for discoverability with privacy protection, acknowledging that hashing alone does not provide confidentiality for on-chain information.

The marketplace implements a standardized metadata schema that balances comprehensiveness with efficiency. The schema includes:

Dataset identification

Basic information like title, description, version, and creation timestamp

Technical specifications

Format, size, encoding, compression method, and schema definition

Quality indicators

Completeness, consistency metrics, update frequency, and last verification date

Domain-specific attributes

Field-relevant indicators like resolution for images, sampling rate for audio, or collection methodology for surveys

Usage terms

License type, attribution requirements, and permitted use categories

Dataset identification

Basic information like title, description, version, and creation timestamp

Technical specifications

Format, size, encoding, compression method, and schema definition

Quality indicators

Completeness, consistency metrics, update frequency, and last verification date

Domain-specific attributes

Field-relevant indicators like resolution for images, sampling rate for audio, or collection methodology for surveys

Usage terms

License type, attribution requirements, and permitted use categories

Functions of Metadata in the Tokenization Process

This structured approach enables efficient discovery and evaluation of datasets while ensuring that critical information is consistently available across the marketplace. The metadata serves several crucial functions in the tokenization process:

It enables efficient dataset discovery without requiring access to the full data

It provides the basis for quality assessment and validation before purchase

It facilitates provenance tracking and attribution for regulatory compliance

It documents the technical requirements for utilizing the dataset effectively

It enables efficient dataset discovery without requiring access to the full data

It provides the basis for quality assessment and validation before purchase

It facilitates provenance tracking and attribution for regulatory compliance

It documents the technical requirements for utilizing the dataset effectively

Consider a 500 MB dataset of weather records: its metadata might include the source, schema (e.g., columns for temperature, humidity, wind speed), statistical summaries (e.g., average temperature of 15°C), and a timestamp.
The metadata is serialized, hashed with SHA-512 to produce a fixed-length digest, and linked to the CID. This allows consumers to verify the dataset's authenticity and relevance before purchase.

The Proof Behind Champions

Those who compete at the edge of human precision now back the technology that defines digital truth.

Explore The Dolphins

Continue Exploring ZKP Partners

Our only telegram handle: @ZKPofficial

Public Metadata

Privacy-Sensitive Metadata

Dataset identification

Technical specifications

Quality indicators

Domain-specific attributes

Usage terms

Dataset identification

Technical specifications

Quality indicators

Domain-specific attributes

Usage terms

Functions of Metadata in the Tokenization Process

The Proof Behind Champions