Data Design
Building ETL processes for blockchains involves processing huge amounts of unstructured data. Accordingly, designing a database is a non-trivial task. In the initial design of the data model, we pursued the following goals:
Abstraction
- Providing bronze, silver, and gold metrics through abstracting the content of data tables.
Index
- To improve query speed, it's important to optimize the index based on user behavior and usage patterns.
OnChain + OffChain
- To gain a deeper understanding of on-chain data, we provide static data with the suffix info, including some data collected from off-chain/community sources.
Minimize Joins
- With the most frequently used combination of slug+chain, you can quickly find the data you want in multiple tables.
With these goals in mind, we used the following approaches to organise the data:
The quick answer is that we have a special approach to data organisation, as well as an architecture that uses the most relevant data science tools and approaches
Lego
Footprint defines a DeFi Lego hierarchy with a business logic abstraction layer, based on traditional financial instrument hierarchy and smart contract characteristics. It tags on-chain transactions with this Lego hierarchy and transforms unstructured data split out from smart contracts into structured and meaningful data. We designed the database with different business domains and different components for each domain, which can be combined like Lego - through the combination we can quickly parse on-chain data and bring it into a structure and standard data tables. Take a look at the following examples:
Layering
Also we have an important additional step - tiering of tables. We divide tables from different domains into bronze, silver and gold. See more in Layers and Domains.
Semantic categorisation
There are different ways how the event could be called and infinite ways how it could be named. We do not find it a good experience for analysts to go read a smart contract code to understand at what exact point the event was emitted so that it is clear what business knowledge could be extracted. No matter what the code for minting a NFT on a specific chain looks like, one needs to find a way to put all of this data into one category, called Minting
.
It is this organization of data that allows you to achieve the following:
Footprint is much faster
Whereas other popular solutions need to recompute high level indicators every time from raw event data, in Footprint with the data layers abstraction layers (silver and gold), the amount of work is significantly reduced to query key information. Ultimately, user requests are completed in milliseconds, where similar requests can take minutes on other platforms.
With the sophisticated architecture in the background come the caching mechanisms that greatly reduce redundant work and computing resource from O(n) to O(1). Dataset previously queried will be saved in cache and will be used during the next query execution. That way, users get their requests fast and our network donβt experience overloads so other users are not affected.
Several interfaces
The slow execution of requests completely closes access to the DBaaS model through various APIs. With the speed of query execution and the amount of data provided, we are leaders in providing and keeping data up-to-date to our partners. Check out API for more.
It cleans and integrates on-chain data so users of any experience level can quickly start researching tokens, projects and protocols. With over a thousand dashboard templates plus a drag-and-drop interface, anyone can build their own customised charts in minutes. Uncover blockchain data and discover the value trend behind the project.
Updated over 1 year ago