benjie/kjsonl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kjsonl

An easy-to-parse file format for large amounts of key-value storage in JSON format.

  • KJSONL (.kjsonl) - key, JSON, linefeed (sorted)
  • KJSONLU (.kjsonlu) - key, JSON, linefeed (unsorted)

Example:

"population:one": "VR Game"
favourite_book: {"title":"Good Omens","authors":["Terry Pratchett","Neil Gaiman"]}
meaning_of_life: 42

Installation

npm install kjsonl

Library

import { KJSONLGetter } from "kjsonl";

// Create a getter for your chosen KJSONL file:
const getter = new KJSONLGetter(`path/to/file.kjsonl`);

// Your code here, which will probably read one or more keys from the
// KJSONL file:
const value = await getter.get("my_key");

// Finally, release the getter:
await getter.release();

CLI

The kjsonl module ships with a command-line utility, kjsonl, which has the following capabilities:

Usage:

  kjsonl get path/to/file.kjsonl key

    Get the value for the given key within the KJSONL file.

  kjsonl keys path/to/file.kjsonl

    Output the keys from the given KJSONL file.

  kjsonl json path/to/file.kjsonl

    Output the given KJSONL file as JSON.

  kjsonl merge -t target.kjsonl source1.kjsonl [source2.kjsonl...]

    Merge the contents of the given source files with the contents of the target file. If the target file doesn't exist, create it.

  kjsonl delete -t path/to/file.kjsonl key1 [key2...]

    Delete the given keys from the given KJSONL file.

Flags:

--help
-h

    Output available CLI flags

--version
-v

    Output the version

Warning

Currently the CLI assumes that files are KJSONL (sorted) files, not KJSONLU (unsorted) files; this may affect some operations. For example, merge may not produce the output you expect.

Caution

If the CLI encounters git conflict markers, it will attempt to resolve the conflict by removing these markers and accepting both the incoming and current changes. This approach, however, may not accurately reflect key deletions when a conflict occurs.
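To make the deletion caveat concrete, here is a minimal TypeScript sketch of the "accept both sides" idea. It is NOT the kjsonl CLI's implementation: `stripConflictMarkers` is a hypothetical name, and the sketch only drops Git's marker lines (including the base section of diff3-style conflicts); it does not sort or de-duplicate keys. Because a valid KJSONL line always contains a colon after its key, and a bare key cannot contain a space, none of the marker patterns can match a valid entry.

```typescript
// Hypothetical helper illustrating "accept both incoming and current changes".
// Not the kjsonl CLI's code; it neither sorts nor de-duplicates keys.
function stripConflictMarkers(text: string): string {
  const kept: string[] = [];
  // True while inside the base section of a diff3-style conflict
  // (between "|||||||" and "======="), which belongs to neither side.
  let inBase = false;
  for (const line of text.split("\n")) {
    if (/^<{7}( |\r?$)/.test(line) || /^>{7}( |\r?$)/.test(line)) {
      inBase = false; // start or end of a conflict hunk
      continue;
    }
    if (/^\|{7}( |\r?$)/.test(line)) {
      inBase = true; // start of the common-ancestor section
      continue;
    }
    if (/^={7}\r?$/.test(line)) {
      inBase = false; // separator between current and incoming changes
      continue;
    }
    if (!inBase) kept.push(line);
  }
  return kept.join("\n");
}
```

Since every line from both sides survives, a key that one branch deleted but that still appears in the other branch's half of the hunk is kept, which is the deletion problem described above. Two versions of the same key can also both survive, so a real tool still has to reconcile duplicates.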

KJSONL spec

WORK IN PROGRESS

A KJSONL or KJSONLU file follows these rules:

  1. File is encoded in UTF8
  2. Lines are delimited by \n or \r\n
  3. Lines beginning with # are ignored
  4. Empty lines are ignored
  5. Every non-ignored line must define a key-value pair as follows:
    1. First the encoded key
    2. Next a colon
    3. Next, optionally, a single space character
    4. Finally, the JSON-encoded value with all optional whitespace omitted
  6. For .kjsonl files, all non-ignored lines must be sorted by the encoded form of their key, using the sort order defined below
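Rules 2 to 5 are enough to read a file line by line. The following TypeScript sketch shows one way to do it; `parseLine`, `parseKJSONL` and `Entry` are hypothetical names, not the kjsonl package's API. It assumes the text has already been decoded from UTF8, it does not check rule 6, and because it hands the value to `JSON.parse` it is more lenient than the spec about optional whitespace inside values. It relies on `:` being a special character: an unencoded key cannot contain a colon, so the first colon ends it.

```typescript
// Hypothetical sketch of a KJSONL/KJSONLU line reader; not the kjsonl API.
interface Entry {
  key: string;
  value: unknown;
}

function parseLine(rawLine: string): Entry | null {
  // Rule 2: accept \r\n line endings by dropping a trailing \r.
  const line = rawLine.endsWith("\r") ? rawLine.slice(0, -1) : rawLine;
  // Rules 3 and 4: comment lines and empty lines are ignored.
  if (line === "" || line.startsWith("#")) return null;

  let key: string;
  let keyEnd: number; // index where the colon after the key should be
  if (line.startsWith('"')) {
    // JSON-encoded key: scan to the closing quote, skipping escapes like \".
    let i = 1;
    while (i < line.length && line[i] !== '"') {
      i += line[i] === "\\" ? 2 : 1;
    }
    if (i >= line.length) throw new SyntaxError(`Unterminated key: ${line}`);
    key = JSON.parse(line.slice(0, i + 1));
    keyEnd = i + 1;
  } else {
    // Bare key: it cannot contain ':' or be empty, so the first ':' ends it.
    keyEnd = line.indexOf(":");
    if (keyEnd <= 0) throw new SyntaxError(`Missing key or colon: ${line}`);
    key = line.slice(0, keyEnd);
  }
  if (line[keyEnd] !== ":") throw new SyntaxError(`Expected ':' after key: ${line}`);

  // Rule 5.3: at most one optional space after the colon.
  let valueStart = keyEnd + 1;
  if (line[valueStart] === " ") valueStart++;
  return { key, value: JSON.parse(line.slice(valueStart)) };
}

function parseKJSONL(text: string): Entry[] {
  const entries: Entry[] = [];
  for (const line of text.split("\n")) {
    const entry = parseLine(line);
    if (entry !== null) entries.push(entry);
  }
  return entries;
}
```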

Encoding a key:

  1. If key contains a "special character" or is empty, return JSON.stringify(key)
  2. Otherwise return key

Special characters are: any character that requires escaping in JSON, any character with a Unicode code point greater than 127, any whitespace character, and the : and # characters. (TBC.)
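The key-encoding steps can be sketched in TypeScript as follows. Since the special-character set is still marked TBC, this is an approximation of the rules as written, not the library's code; `isSpecialCharacter` and `encodeKey` are hypothetical names.

```typescript
// Hypothetical sketch of key encoding; the special-character set is still "TBC".
function isSpecialCharacter(ch: string): boolean {
  const codePoint = ch.codePointAt(0) ?? 0;
  return (
    codePoint < 0x20 || // control characters must be escaped in JSON
    codePoint > 127 || // any non-ASCII character
    ch === '"' || // must be escaped in JSON
    ch === "\\" || // must be escaped in JSON
    /\s/.test(ch) || // whitespace
    ch === ":" ||
    ch === "#"
  );
}

function encodeKey(key: string): string {
  // Array.from walks code points, so astral characters are tested whole.
  if (key === "" || Array.from(key).some(isSpecialCharacter)) {
    return JSON.stringify(key);
  }
  return key;
}
```

For example, `encodeKey("café")` yields `"café"` with the é left unescaped, because `JSON.stringify` does not escape non-ASCII characters. Many JSON encoders in other languages escape such characters as `\u00e9` by default, which would change the encoded key and therefore its sort position.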

NOTE: when serializing to KJSONL in other languages, it's essential to match the behavior of JavaScript's JSON.stringify() function.

JSON-encoded keys must omit all optional whitespace, so a JSON-encoded key always starts and ends with a double quote (") character.

JSON-encoded values must not contain carriage return (CR) or linefeed (LF) characters; all other optional whitespace should be omitted.
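In JavaScript, `JSON.stringify` called without its `space` argument already satisfies this rule: it emits no optional whitespace, and CR or LF characters inside strings are escaped as `\r` and `\n`. A minimal sketch, where `encodeValue` is a hypothetical helper rather than part of the library:

```typescript
// Hypothetical helper: encode a value for the right-hand side of a KJSONL line.
function encodeValue(value: unknown): string {
  // No `space` argument, so no optional whitespace is emitted.
  const json: string | undefined = JSON.stringify(value);
  if (json === undefined) {
    // undefined, functions and symbols have no JSON representation.
    throw new TypeError("Value cannot be represented as JSON");
  }
  // Defensive check of the spec rule; JSON.stringify escapes CR/LF in strings.
  if (/[\r\n]/.test(json)) throw new Error("Encoded value contains CR or LF");
  return json;
}
```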

Sorted keys: to keep git diffs stable, and to make dictionary searches across extremely large files possible, KJSONL files require that entries are sorted. The order of two keys is defined as follows:

  1. Let {bytesA} be the list of bytes in the UTF8 encoding of the first key's encoded form
  2. Let {bytesB} be the list of bytes in the UTF8 encoding of the second key's encoded form
  3. Let {lenA} be the length of {bytesA}
  4. Let {lenB} be the length of {bytesB}
  5. Let {l} be the minimum of {lenA} and {lenB}
  6. For each {i} from {0} to {l-1}:
    1. Let {a} be the numeric value of the byte at index {i} in {bytesA}
    2. Let {b} be the numeric value of the byte at index {i} in {bytesB}
    3. If {a < b}, return {-1}
    4. If {a > b}, return {1}
  7. If {lenA < lenB} return {-1}
  8. If {lenA > lenB} return {1}
  9. Note: at this point {bytesA} and {bytesB} are identical
  10. Return {0}
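The comparison above translates directly into a TypeScript comparator. This is a sketch, not the library's code; `compareKeys` is a hypothetical name, and both arguments are expected to be keys that have already been encoded.

```typescript
// Hypothetical comparator implementing the byte-wise key ordering above.
const encoder = new TextEncoder();

function compareKeys(encodedA: string, encodedB: string): -1 | 0 | 1 {
  const bytesA = encoder.encode(encodedA); // UTF8 bytes of the first key
  const bytesB = encoder.encode(encodedB); // UTF8 bytes of the second key
  const l = Math.min(bytesA.length, bytesB.length);
  for (let i = 0; i < l; i++) {
    const a = bytesA[i];
    const b = bytesB[i];
    if (a < b) return -1;
    if (a > b) return 1;
  }
  // One key is a prefix of the other (or they are identical).
  if (bytesA.length < bytesB.length) return -1;
  if (bytesA.length > bytesB.length) return 1;
  return 0;
}
```

Sorting the keys from the example at the top with this comparator gives `"population:one"`, `favourite_book`, `meaning_of_life`, the order they appear in. JavaScript's built-in string `<` is not a safe substitute, because it compares UTF-16 code units: `'"\uFFFD"' < '"\uD83D\uDE00"'` is false, yet `compareKeys` returns -1 for that pair since the UTF8 lead byte 0xEF sorts before 0xF0.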

There must be no UTF8 BOM (0xEF 0xBB 0xBF) present in any KJSONL files; all KJSONL files are UTF8 encoded so the BOM is unnecessary.
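When reading raw bytes in JavaScript, the BOM check should happen before decoding, because `TextDecoder` silently strips a leading BOM by default. A minimal sketch, with `decodeKJSONL` as a hypothetical name; passing `fatal: true` also makes invalid UTF8 throw instead of being replaced with U+FFFD.

```typescript
// Hypothetical helper: decode KJSONL bytes, rejecting a BOM and invalid UTF8.
function decodeKJSONL(bytes: Uint8Array): string {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    throw new Error("KJSONL files must not start with a UTF8 BOM");
  }
  // fatal: true throws a TypeError on malformed UTF8 input.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```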
