Ramchains

From SuperNET Wiki

Ramchains are a customized, distilled database of a given blockchain that allows information to be queried much faster. Developed by jl777, a ramchain runs mostly in RAM, and as such is orders of magnitude faster for blockchain-related processing. It is also a half to a third of the regular size of a given blockchain. The process by which Ramchains are made only applies to first-generation crypto-currencies, or Bitcoin clones, and BitcoinDark must also be installed for Ramchains to work in their current state.

Ramchains are not a replacement for full nodes, as they are a read-only technology[1].


What are Ramchains?

Ramchains are a new SuperNET feature created by jl777, initially developed to remove performance shortcomings in the original Multigateway code. They work essentially as a distillation of a coin's blockchain, and allow for ultra-fast, random access queries to the essential blockchain information, while keeping resource usage at a minimum.

The use of entire blockchains as a database, especially in cases like Bitcoin or other bitcoind forks with a large, bloated ledger, brings important limitations to blockchain-based services. Blockchains may require a large amount of storage space, and the coin daemons can be slow when responding to requests, particularly in historical processing and pattern search. This becomes even more expensive once the associated heavy memory and CPU loads are considered.

Ramchains work to solve these problems by providing lightweight files that can be accessed directly by SuperNET without sending a request to the coin daemon. The ramchain files are memory mapped, which reduces RAM usage and dramatically increases query speed.

“From the limit of about 500 RPC calls per second to bitcoind, ramchains can probably do 100,000 - 500,000 internal requests per second.” - jl777

The first use case for ramchains is to boost performance and speed in Multigateway operations in the upcoming SuperNET v1b client. But besides optimizing core SuperNET features, ramchains can be extremely valuable for any service that depends on blockchain information.


Ramchain creation process

Ramchains are created by SuperNET processing the coin blockchain and generating a specific folder structure under a ramchains folder, at the level where the SuperNET executable is located. The one-time creation process can be followed in the process terminal output, and it will take a number of hours depending on the size of the coin blockchain and the performance of the system. It goes as follows:


1) Inside of the ramchains folder, the following subfolders are created:

- A COIN folder for every coin being processed. Inside of it,
  - A bitstream folder, containing
    - 64 subfolders, from 00000_00fff to 3f000_3ffff, with
      - 64 sub-subfolders inside of each.

2) For every block in the coin blockchain, an n.V file is created containing raw data, n being the block height. V files include all the essential information for Multigateway to validate multisignature transactions. They include lossless encoding for data that can be verified and represented with a small number, like multiple checksums. These V files are created under the numbered subfolders of ramchains/COIN/bitstream, which will be populated with 64 .V files each. If the number of blocks in the blockchain exceeds 262,144 (64 x 64 x 64), blocks above that number will be saved starting back in the first subfolder, also in groups of 64.

3) Once the latest confirmed block .V file has been created, an additional set of n.B files is created from the V files in the same folders. These B files substitute the raw information in the V files for the corresponding strings.

4) When V and B files have been made for every block, B files are combined into .B64 files (containing 64 .B files) and these are combined further into .B4096 files (containing 64 .B64 files).

5) All the .B4096 files can then be combined into a single COIN.blocks file.

6) Finally, three string tables are created in the ramchains/COIN folder.

a) COIN.addr, an addresses table 
b) COIN.txid, a transaction ID table 
c) COIN.script, a table containing transaction scripts (the transaction instructions) 

Every entry in these tables has been mapped to a raw index. This allows the system to use light, 32-bit numbers to denote any of the original high entropy fields.
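The index mapping described above can be illustrated with a minimal sketch. This is not the SuperNET implementation: the StringTable class, its method names and the placeholder address strings are all hypothetical, and the real tables live in files on disk rather than in Python objects. The sketch only shows the general idea of replacing each long string with a small integer that fits in 32 bits.

```python
from typing import Dict, List

MAX_INDEX = 2**32 - 1  # largest value that fits in an unsigned 32-bit integer


class StringTable:
    """Hypothetical table mapping each distinct string to a small index and back."""

    def __init__(self) -> None:
        self._index_of: Dict[str, int] = {}
        self._strings: List[str] = []

    def intern(self, value: str) -> int:
        """Return the index for value, assigning the next free one if it is new."""
        index = self._index_of.get(value)
        if index is None:
            index = len(self._strings)
            if index > MAX_INDEX:
                raise OverflowError("table no longer fits in 32-bit indices")
            self._index_of[value] = index
            self._strings.append(value)
        return index

    def lookup(self, index: int) -> str:
        """Recover the original string from its index."""
        return self._strings[index]


addresses = StringTable()
a = addresses.intern("RExampleAddressOne")   # made-up placeholder, not a real address
b = addresses.intern("RExampleAddressTwo")
again = addresses.intern("RExampleAddressOne")  # same string, same index as before
```

A record that refers to an address then only needs to store the integer, and the full string is looked up in the table when it is actually needed.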

Real Time mode

Once ramchain files have been fully generated, SuperNET will switch from historical reprocessing to real time mode. As new coin blocks are confirmed, new V files are generated; after every .V file is made, the corresponding .B file is made. When there are 64 new .B files ready, a .B64 file is produced, and similarly a .B4096 file once there are 64 new .B64 files. On every SuperNET restart, all the B/B64/B4096 files created in the previous session will be processed and added to the COIN.blocks file, and the string tables will be updated accordingly. The absence of string table files is the signal for SuperNET to regenerate the ramchain of a particular coin from scratch.
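The grouping in real time mode can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the RollUp class is hypothetical, payloads are kept in lists instead of files, and plain byte concatenation stands in for whatever combined format the real .B64 and .B4096 files use.

```python
from typing import List

GROUP = 64  # files per group, as described in the text


class RollUp:
    """Hypothetical collector that groups per-block B payloads 64 at a time."""

    def __init__(self) -> None:
        self.pending_b: List[bytes] = []     # B payloads not yet in a B64
        self.pending_b64: List[bytes] = []   # B64 payloads not yet in a B4096
        self.b4096: List[bytes] = []         # completed B4096 payloads

    def add_block(self, b_payload: bytes) -> None:
        self.pending_b.append(b_payload)
        if len(self.pending_b) == GROUP:
            # 64 B payloads become one B64 (concatenation is an assumption)
            self.pending_b64.append(b"".join(self.pending_b))
            self.pending_b.clear()
        if len(self.pending_b64) == GROUP:
            # 64 B64 payloads become one B4096
            self.b4096.append(b"".join(self.pending_b64))
            self.pending_b64.clear()


rollup = RollUp()
for height in range(4096 + 64 + 5):
    # a 4-byte stand-in for the real B file contents
    rollup.add_block(height.to_bytes(4, "little"))
```

After 4,165 blocks the sketch holds one completed B4096 group, one B64 group waiting for 63 more, and 5 loose B payloads.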

After all the ramchain files are generated the first time, there is data redundancy: all V files but the ones that have been generated in real time mode could be deleted. Also, the B files are contained in the B64 files, which are contained in the B4096 files, which are contained in the blocks file. Code updates will soon include an automated cleanup of unneeded files, to decrease the storage space used by ramchains while keeping a reasonable amount of redundancy. The files necessary to use a ramchain can take a fraction of the original blockchain size, even before data compression is applied.

Memory Mapping and Ramchains Access

A key aspect of the ramchains design is the use of memory-mapped files.

“A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource. (...) Once present, this correlation between the file and the memory space permits applications to treat the mapped portion as if it were primary memory.” (1)

In order to achieve the fastest possible query speed when accessing ramchains, the main ramchain files (the blocks files and the string tables) are handled in two ways. Besides being accessed in read/write mode, the string tables can be loaded into RAM and become a memory-mapped structure.

“if there is plenty of memory available, 100% strings will be in memory - no HDD access at all. But, if something needs memory, they get swapped out to HDD.” - jl777

So the string data can stay on the hard drive without using any memory if needed, while being accessed as if it were RAM.
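As a general illustration of the technique (not of the ramchain file formats, which are not described here), the following sketch memory-maps a file of fixed-size records using Python's standard mmap module and reads individual records by offset. The file name, the record layout and the read_record helper are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<I")  # one little-endian unsigned 32-bit value per record

# Write a small file of fixed-size records to stand in for a ramchain file.
path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as f:
    for value in range(1000):
        f.write(RECORD.pack(value * 2))


def read_record(mapped: mmap.mmap, index: int) -> int:
    """Random access: decode record `index` straight from the mapping."""
    (value,) = RECORD.unpack_from(mapped, index * RECORD.size)
    return value


with open(path, "rb") as f:
    # Length 0 maps the whole file; the OS pages data in and out on demand.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        first = read_record(mapped, 0)
        middle = read_record(mapped, 500)
        last = read_record(mapped, 999)
```

The program never reads the whole file explicitly. The operating system loads only the pages that are touched and can evict them again under memory pressure, which matches the behaviour jl777 describes in the quote.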

Although these string files are much smaller than the original blockchain, coins with a very large blockchain might still result in large ramchains files. To limit the impact in RAM, ramchains includes a customized virtual memory system.

Besides the ramchain blocks/tables and all the source V/B files, the ramchains folder also adds, under ramchains/COIN/bitstream, one or more space.# files.

Ramchains build the current in-memory data structure into these memory-mapped files, so the actual memory footprint of a coin will be in the 100-200 MB range regardless of blockchain size. These space.# files are only necessary while SuperNET is running.

During runtime, the order in which ramchain files are accessed to process requests follows a simple logic:

- Primarily, the blocks information is used. This corresponds to the single internal array composed of COIN.blocks, as updated during startup, and the successive B4096, B64 and B files generated since. In a majority of analyses and requests, the light data contained in the blocks information will suffice.
- When this is not enough (for instance, when the exact transaction ID string or transaction contents are requested by the user), the string tables are used: COIN.addr, COIN.txid and COIN.script.
- Only in the uncommon cases where other pieces of information are required (for instance, original checksum strings) is the raw data accessed: in the V files when still available, and otherwise in the coin blockchain, locally or remotely.
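The three-tier access order above can be sketched as follows. Everything in this example is hypothetical: the record layout, the table contents and the function names are invented to show the idea that most queries are answered from compact index data, some need the string tables, and only a few fall through to the raw data.

```python
from typing import Dict, List, Tuple

# Tier 1: compact records holding only small indices (hypothetical layout).
# Each tuple is (txid_index, addr_index).
blocks: List[Tuple[int, int]] = [(0, 0), (1, 0), (2, 1)]

# Tier 2: string tables that turn indices back into full strings.
txid_table = ["txid-aaa", "txid-bbb", "txid-ccc"]
addr_table = ["addr-one", "addr-two"]

# Tier 3: raw data, consulted only for fields the first two tiers do not hold.
raw_reads: List[int] = []  # records every raw access, to show how rare it is


def fetch_raw(position: int) -> Dict[str, str]:
    """Stand-in for reading a V file or asking the coin daemon."""
    raw_reads.append(position)
    return {"checksum": f"raw-checksum-{position}"}


def count_for_address(addr_index: int) -> int:
    """Answered from tier 1 alone: no strings are touched."""
    return sum(1 for _, a in blocks if a == addr_index)


def txids_for_address(addr_index: int) -> List[str]:
    """Needs tier 2 to turn txid indices into the exact strings."""
    return [txid_table[t] for t, a in blocks if a == addr_index]


def checksum_at(position: int) -> str:
    """Falls through to tier 3 because no table stores this field."""
    return fetch_raw(position)["checksum"]
```

In this sketch, count_for_address and txids_for_address never touch the raw tier, so raw_reads stays empty until checksum_at is called.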

After the first run where a ramchain is generated, initialization speed and processing time are dramatically improved.

“for BTCD a few seconds to load the files, a few seconds to rescan entire blockchain and create RAM resident data structures so any query can be made totally from RAM without any searching, direct lookup. (...) A recalculation of the rich list takes ~40 milliseconds on my Mac mini” - jl777

The speed and flexibility of ramchains allow searches through regular expressions (for instance, searching an entire blockchain for all transactions with addresses that contain the pattern “777”), with results delivered in a few seconds at most. Thanks to the SuperNET RPC, users will be able to request pattern searches from any SuperNET node hosting ramchains. This type of pattern matching was hardly possible in blockchain explorers due to lack of speed, and can notably expand future use cases for Ramchains beyond Multigateway/SuperNET operations.
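A pattern search of this kind amounts to a linear scan of an in-memory string table. The sketch below uses Python's standard re module; the search_addresses function and the sample strings are made up for the example and do not reflect the SuperNET RPC interface.

```python
import re
from typing import Iterable, List, Tuple


def search_addresses(table: Iterable[str], pattern: str) -> List[Tuple[int, str]]:
    """Return (index, address) for every table entry the pattern matches."""
    compiled = re.compile(pattern)
    return [(i, s) for i, s in enumerate(table) if compiled.search(s)]


# Made-up placeholder strings, not real addresses.
addr_table = ["RabcExample1", "Rxyz777Example", "R777startExample", "RnoMatchExample"]
hits = search_addresses(addr_table, r"777")
```

Because each hit carries the table index, the matching addresses can be joined back to the compact block records without comparing long strings again.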

References

  1. Ramchains Documentation v1, https://slack-files.com/T02LAJBUW-F03EBKCM1-e78b539b05