@discoveryjs/scan-git is a Node.js library for reading and analyzing Git repositories directly from the filesystem. It provides a rich set of APIs for accessing Git objects, references, commits, trees, and more without the need for Git command-line tools or external dependencies.
Whether you're building tools for repository analysis, visualization, or automation, @discoveryjs/scan-git provides a robust and efficient API to meet your Git interaction needs.
Key Features:
- Direct Repository Access: Interact with Git repositories by reading data directly from the .git directory.
- Comprehensive Git Object Support: Work with both loose and packed objects, including support for large pack files over 2GB.
- Advanced Git Features: Handle complex repository structures with support for cruft packs and on-disk reverse indexes.
- Efficient Data Retrieval: Fetch commit histories, branches, tags, and files quickly, even for large repositories.
- Flexible APIs: Compute diffs between commits, read specific Git objects, and parse commits, trees, and annotated tags.
npm install @discoveryjs/scan-git
- Git reader
createGitReader(gitdir, options?)
reader.dispose()
- Reference methods
- Trees (file lists) methods
- Commit methods
- Misc methods
- Utils
import { createGitReader } from '@discoveryjs/scan-git';
const reader = await createGitReader('path/to/.git');
const commits = await reader.log({ ref: 'my-branch', depth: 10 });
console.log(commits);
await reader.dispose();
Creates an instance of the Git reader, which provides access to most of the library's functionality:
- gitdir: string
  The path to the Git repository. This can either be a directory containing a .git folder or a direct path to a .git folder (even if it has a non-standard name).
- options (optional):
  - maxConcurrency: number (default: 50)
    Limits the number of concurrent file system operations
  - cruftPacks: 'include' | 'exclude' | 'only' | boolean (default: 'include')
    Defines how cruft packs are processed:
    - 'include' or true – Process all packs
    - 'exclude' or false – Exclude cruft packs from processing
    - 'only' – Process only cruft packs
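The boolean forms of cruftPacks are shorthand for two of the string modes. A minimal sketch of that mapping follows; normalizeCruftPacks() is a hypothetical helper, not part of the library's API, and throwing on unknown values is an assumption rather than documented behavior.

```javascript
// Hypothetical helper (not part of @discoveryjs/scan-git): maps every accepted
// cruftPacks value to one of the three string modes listed above.
function normalizeCruftPacks(value = 'include') {
    if (value === true) return 'include';
    if (value === false) return 'exclude';
    if (value === 'include' || value === 'exclude' || value === 'only') return value;
    // Assumption: unknown values are rejected; the library may behave differently
    throw new TypeError(`Unsupported cruftPacks value: ${String(value)}`);
}

console.log(normalizeCruftPacks());       // 'include'
console.log(normalizeCruftPacks(false));  // 'exclude'
console.log(normalizeCruftPacks('only')); // 'only'
```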
import { createGitReader } from '@discoveryjs/scan-git';
const reader = await createGitReader('path/to/.git');
Cleans up resources used by the reader instance, such as file handles or caches. This method should be called when the reader instance is no longer needed to ensure proper resource management and avoid memory leaks.
const reader = await createGitReader('path/to/.git');
// do something with reader
// Dispose of the repository instance when done
await reader.dispose();
Note: After calling dispose(), attempting to use the reader instance (e.g., calling methods like log() or readCommit()) will likely result in errors or undefined behavior.
Note: Always ensure dispose() is called in applications or scripts that manage multiple repositories or long-running processes to prevent resource exhaustion.
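One way to make sure dispose() always runs is to wrap the work in try/finally. The sketch below uses a hypothetical withReader() helper that is not part of the library; it only assumes the reader exposes an async dispose() method, as described above.

```javascript
// Hypothetical helper: runs `task` with a reader and always disposes of it,
// even when `task` throws. `openReader` is any function that returns a reader,
// for example: () => createGitReader('path/to/.git')
async function withReader(openReader, task) {
    const reader = await openReader();
    try {
        return await task(reader);
    } finally {
        await reader.dispose();
    }
}

// Possible usage (assuming createGitReader has been imported):
// const commits = await withReader(
//     () => createGitReader('path/to/.git'),
//     (reader) => reader.log({ depth: 10 })
// );
```

When processing many repositories in a loop, a wrapper like this keeps only one reader's resources open at a time.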
Common parameters:
- ref: string – a reference to an object in the repository
- withOid: boolean – a flag to include the resolved oid for a reference
Returns the default branch name of a repository:
const defaultBranch = await reader.defaultBranch();
// 'main'
The algorithm to identify a default branch name:
- if there is only one branch, that must be the default
- otherwise, look for specific branch names in this order:
  - upstream/HEAD
  - origin/HEAD
  - main
  - master
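The selection order above can be sketched as follows. This is an illustration, not the library's implementation: pickDefaultBranch() and the lookup callback are hypothetical, and how a remote HEAD maps back to a branch name is an assumption.

```javascript
// Candidate names, in the order given by the algorithm above
const DEFAULT_BRANCH_CANDIDATES = ['upstream/HEAD', 'origin/HEAD', 'main', 'master'];

// `branches` is the list of local branch names.
// `lookup(candidate)` returns the branch name the candidate designates,
// or null when the candidate does not exist (hypothetical callback).
function pickDefaultBranch(branches, lookup) {
    if (branches.length === 1) {
        return branches[0]; // a single branch must be the default
    }
    for (const candidate of DEFAULT_BRANCH_CANDIDATES) {
        const name = lookup(candidate);
        if (name) {
            return name;
        }
    }
    return null; // assumption: nothing matched
}

const known = new Map([['origin/HEAD', 'main'], ['main', 'main'], ['master', 'master']]);
const lookup = (ref) => known.get(ref) ?? null;

console.log(pickDefaultBranch(['dev'], lookup));            // 'dev'
console.log(pickDefaultBranch(['main', 'master'], lookup)); // 'main'
```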
Returns the current branch name along with its commit oid.
If the repository is in a detached HEAD state, name will be null.
const currentBranch = await reader.currentBranch();
// { name: 'main', oid: '8bb6e23769902199e39ab70f2441841712cbdd62' }
const detachedHead = await reader.currentBranch();
// { name: null, oid: '8bb6e23769902199e39ab70f2441841712cbdd62' }
Checks if a ref exists.
const isValidRef = reader.isRefExists('main');
// true
Expands a ref into its full form, e.g. 'main' -> 'refs/heads/main'.
Returns null if ref doesn't exist. For the symbolic ref names ('HEAD', 'FETCH_HEAD', 'CHERRY_PICK_HEAD', 'MERGE_HEAD' and 'ORIG_HEAD'), returns the name unchanged.
const fullPath = reader.expandRef('heads/main');
// 'refs/heads/main'
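One plausible way to implement such an expansion is to try a fixed list of prefixes, in the order that Git's own ref lookup rules use. The sketch below is an assumption about the approach, not the library's code; expandRefSketch() and the knownRefs set are hypothetical.

```javascript
const SYMBOLIC_REFS = new Set(['HEAD', 'FETCH_HEAD', 'CHERRY_PICK_HEAD', 'MERGE_HEAD', 'ORIG_HEAD']);

// Candidate full paths, in the order Git itself tries them
const CANDIDATES = [
    (ref) => ref,
    (ref) => `refs/${ref}`,
    (ref) => `refs/tags/${ref}`,
    (ref) => `refs/heads/${ref}`,
    (ref) => `refs/remotes/${ref}`,
    (ref) => `refs/remotes/${ref}/HEAD`
];

// `knownRefs` is a Set of full ref paths that exist in the repository
function expandRefSketch(ref, knownRefs) {
    if (SYMBOLIC_REFS.has(ref)) {
        return ref; // symbolic names are returned unchanged
    }
    for (const toPath of CANDIDATES) {
        const path = toPath(ref);
        if (knownRefs.has(path)) {
            return path;
        }
    }
    return null; // ref doesn't exist
}

const knownRefs = new Set(['refs/heads/main', 'refs/tags/v1.0.0', 'refs/remotes/origin/main']);
console.log(expandRefSketch('main', knownRefs));        // 'refs/heads/main'
console.log(expandRefSketch('heads/main', knownRefs));  // 'refs/heads/main'
console.log(expandRefSketch('origin/main', knownRefs)); // 'refs/remotes/origin/main'
console.log(expandRefSketch('missing', knownRefs));     // null
```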
Resolves ref into an oid if it exists, otherwise throws an exception.
If ref is already an oid, returns it as is. If ref is not a full path, expands it first.
const oid = await reader.resolveRef('main');
// '8bb6e23769902199e39ab70f2441841712cbdd62'
Returns an info object for the provided ref.
const info = await reader.describeRef('HEAD');
// {
// path: 'HEAD',
// name: 'HEAD',
// symbolic: true,
// ref: 'refs/heads/test',
// oid: '2dbee47a8d4f8d39e1168fad951b703ee05614d6'
// }
const info = await reader.describeRef('main');
// {
// path: 'refs/heads/main',
// name: 'main',
// symbolic: false,
// scope: 'refs/heads',
// namespace: 'refs',
// category: 'heads',
// remote: null,
// ref: null,
// oid: '7b84f676f2fbea2a3c6d83924fa63059c7bdfbe2'
// }
const info = await reader.describeRef('origin/HEAD');
// {
// path: 'refs/remotes/origin/HEAD',
// name: 'HEAD',
// symbolic: false,
// scope: 'refs/remotes',
// namespace: 'refs',
// category: 'remotes',
// remote: 'origin',
// ref: 'refs/remotes/origin/main',
// oid: '7b84f676f2fbea2a3c6d83924fa63059c7bdfbe2'
// }
Checks if a value is a valid oid.
reader.isOid('7b84f676f2fbea2a3c6d83924fa63059c7bdfbe2'); // true
reader.isOid('main'); // false
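For intuition, a check of this kind can be written as a single regular expression. The sketch below assumes an oid is a SHA-1 object name written as 40 lowercase hexadecimal digits; the exact rules the library applies (for example, whether uppercase digits are accepted) are not stated here, and isOidSketch() is a hypothetical name.

```javascript
// Assumption: an oid is exactly 40 lowercase hex digits (SHA-1)
const isOidSketch = (value) => typeof value === 'string' && /^[0-9a-f]{40}$/.test(value);

console.log(isOidSketch('7b84f676f2fbea2a3c6d83924fa63059c7bdfbe2')); // true
console.log(isOidSketch('main'));                                     // false
```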
Get a list of remotes.
const remotes = reader.listRemotes();
// [
// 'origin'
// ]
Get a list of branches for a remote.
const originBranches = await reader.listRemoteBranches('origin');
// [
// 'HEAD',
// 'main'
// ]
const originBranches = await reader.listRemoteBranches('origin', true);
// [
// { name: 'HEAD', oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e' },
// { name: 'main', oid: '56ea7a808e35df13e76fee92725a65a373a9835c' }
// ]
Get a list of local branches.
const localBranches = await reader.listBranches();
// [
// 'HEAD',
// 'main'
// ]
const localBranches = await reader.listBranches(true);
// [
// { name: 'HEAD', oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e' },
// { name: 'main', oid: '56ea7a808e35df13e76fee92725a65a373a9835c' }
// ]
Get a list of tags.
const tags = await reader.listTags();
// [
// 'v1.0.0',
// 'some-feature'
// ]
const tags = await reader.listTags(true);
// [
// { name: 'v1.0.0', oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e' },
// { name: 'some-feature', oid: '56ea7a808e35df13e76fee92725a65a373a9835c' }
// ]
Resolves a Git reference (e.g., branch name, tag, commit, or SHA-1 hash) to the object ID (OID) of the corresponding tree.
- ref: string – The reference, SHA-1 hash, or object ID to resolve.
Behavior:
- If the reference points to an annotated tag, the method resolves the tag to its underlying object
- If the reference resolves to a commit, the method retrieves the tree associated with the commit
- If the reference resolves directly to a tree, the tree OID is returned
- Throws an error if the resolved object is not a tree, commit, or tag
const treeOid = await reader.treeOidFromRef('HEAD');
// 'a1b2c3d4e5f6...'
// Error handling
try {
    const invalidTreeOid = await reader.treeOidFromRef('nonexistent-ref');
} catch (error) {
    console.error(error.message); // "Object 'nonexistent-ref' must be a 'tree' but ..."
}
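The behavior listed above amounts to "peeling" an object until a tree is reached. The following sketch models it with an in-memory store; peelToTree() and the objects map are hypothetical and only illustrate the control flow, not how the library reads objects.

```javascript
// Illustrative only: `objects` is a hypothetical in-memory store keyed by oid
function peelToTree(oid, objects) {
    let current = oid;
    for (;;) {
        const entry = objects.get(current);
        if (!entry) {
            throw new Error(`Object '${current}' not found`);
        }
        switch (entry.type) {
            case 'tag':
                current = entry.object; // follow an annotated tag to its target
                break;
            case 'commit':
                return entry.tree;      // a commit points at its root tree
            case 'tree':
                return current;         // already a tree
            default:
                throw new Error(`Object '${current}' must be a 'tree' but got '${entry.type}'`);
        }
    }
}

const objects = new Map([
    ['tag1', { type: 'tag', object: 'commit1' }],
    ['commit1', { type: 'commit', tree: 'tree1' }],
    ['tree1', { type: 'tree' }],
    ['blob1', { type: 'blob' }]
]);

console.log(peelToTree('tag1', objects));    // 'tree1'
console.log(peelToTree('commit1', objects)); // 'tree1'
console.log(peelToTree('tree1', objects));   // 'tree1'
```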
List all files in the repository at the specified commit reference.
- ref: string (default: 'HEAD') – commit reference
- filesWithHash: boolean (default: false) – when true, return each file's blob hash along with its path
const headFiles = reader.listFiles(); // the same as reader.listFiles('HEAD')
// [ 'file.ext', 'path/to/file.ext', ... ]
const headFilesWithHashes = reader.listFiles('HEAD', true);
// [ { path: 'file.ext', hash: 'f2e492a3049...' }, ... ]
Retrieve a tree entry (file or directory) by its path at the specified commit reference.
- path: string - the path to the file or directory
- ref: string (default: 'HEAD') - commit reference
const entry = await reader.getPathEntry('path/to/file.txt');
// { isTree: false, path: 'path/to/file.txt', hash: 'a1b2c3d4e5f6...' }
Retrieve a list of tree entries (files or directories) by their paths at the specified commit reference.
- paths: string[] - an array of paths to files or directories
- ref: string (default: 'HEAD') - commit reference
const entries = await reader.getPathsEntries([
    'path/to/file1.txt',
    'path/to/dir1',
    'path/to/file2.txt'
]);
// [
// { isTree: false, path: 'path/to/file1.txt', hash: 'a1b2c3d4e5f6...' },
// { isTree: true, path: 'path/to/dir1', hash: 'b1c2d3e4f5g6...' },
// { isTree: false, path: 'path/to/file2.txt', hash: 'c1d2e3f4g5h6...' }
// ]
Compute the file delta (changes) between two commit references, including added, modified, and removed files.
- nextRef: string (default: 'HEAD') - commit reference for the "next" state
- prevRef: string (optional) - commit reference for the "previous" state
const fileDelta = await reader.deltaFiles('HEAD', 'branch-name');
// {
// add: [ { path: 'path/to/new/file.txt', hash: 'a1b2c3d4e5f6...' }, ... ],
// modify: [ { path: 'path/to/modified/file.txt', hash: 'f1e2d3c4b5a6...', prevHash: 'a1b2c3d4e5f6...' }, ... ],
// remove: [ { path: 'path/to/removed/file.txt', hash: 'a1b2c3d4e5f6...' }, ... ]
// }
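To make the shape of the result concrete, here is a sketch that derives the same add/modify/remove structure from two flat file listings of { path, hash } entries, as returned by listFiles(ref, true). The helper computeFileDelta() is hypothetical and only illustrates the output; it makes no claim about how the library computes the delta.

```javascript
// Illustrative sketch: `prevFiles` and `nextFiles` are arrays of { path, hash }
function computeFileDelta(prevFiles, nextFiles) {
    const prevByPath = new Map(prevFiles.map((file) => [file.path, file.hash]));
    const delta = { add: [], modify: [], remove: [] };

    for (const { path, hash } of nextFiles) {
        if (!prevByPath.has(path)) {
            delta.add.push({ path, hash });
            continue;
        }
        const prevHash = prevByPath.get(path);
        if (prevHash !== hash) {
            delta.modify.push({ path, hash, prevHash });
        }
        prevByPath.delete(path); // seen in both states
    }

    // Whatever is left existed only in the previous state
    for (const [path, hash] of prevByPath) {
        delta.remove.push({ path, hash });
    }

    return delta;
}

const prev = [{ path: 'a.txt', hash: 'h1' }, { path: 'b.txt', hash: 'h2' }, { path: 'c.txt', hash: 'h3' }];
const next = [{ path: 'a.txt', hash: 'h1' }, { path: 'b.txt', hash: 'h9' }, { path: 'd.txt', hash: 'h4' }];
console.log(computeFileDelta(prev, next));
```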
Resolves a Git reference (e.g., branch name, tag, or SHA-1 hash) to the object ID (OID) of the corresponding commit.
- ref: string – The reference, SHA-1 hash, or object ID to resolve.
Behavior:
- If the reference points to an annotated tag, the method resolves the tag to its underlying commit.
- Throws an error if the reference does not resolve to a valid commit.
const commitOid = await reader.commitOidFromRef('HEAD');
// '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e'
// Error handling
try {
    const invalidCommitOid = await reader.commitOidFromRef('nonexistent-ref');
} catch (error) {
    console.error(error.message); // "Object 'nonexistent-ref' must be a 'commit' but ..."
}
Reads and resolves a commit object identified by a reference (e.g., branch name, tag, or SHA-1 hash).
- ref: string – The reference, SHA-1 hash, or object ID of the commit.
const commit = await reader.readCommit('HEAD');
// {
// oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e',
// tree: '20596d5c9e037844ae2b707a4a1cb45c72e70e7f',
// parent: ['8bb6e23769902199e39ab70f2441841712cbdd62'],
// author: { name: 'John Doe', email: '[email protected]', timestamp: 1680390225, timezone: '+0200' },
// committer: { name: 'Jane Doe', email: '[email protected]', timestamp: 1680392225, timezone: '+0200' },
// message: 'Initial commit',
// gpgsig: '-----BEGIN PGP SIGNATURE-----...'
// }
Returns a list of commits in topological order, starting from the specified reference.
- options: An object with the following properties:
  - ref: string (default: 'HEAD') – The reference, SHA-1 hash, or object ID to start from.
  - depth: number (default: 50) – Limits the number of commits to retrieve. Pass Infinity to retrieve all reachable commits.
const commits = await reader.log({ ref: 'my-branch', depth: 10 });
// [
// { oid: 'a1b2c3d4...', tree: '...', parent: [...], author: {...}, committer: {...}, message: '...' },
// { oid: 'b2c3d4e5...', tree: '...', parent: [...], author: {...}, committer: {...}, message: '...' },
// ...
// ]
To retrieve all commits reachable from a ref, set the depth option to Infinity.
const allCommits = await reader.log({ ref: 'my-branch', depth: Infinity });
console.log(allCommits.length); // All reachable commits
Reads and returns the header of a Git object by its hash.
- hash: Buffer – The SHA-1 hash of the object
const hash = Buffer.from('8bb6e23769902199e39ab70f2441841712cbdd62', 'hex');
const header = await reader.readObjectHeaderByHash(hash);
// { type: 'commit', length: 123 }
Reads and returns the complete content of a Git object by its hash.
- hash: Buffer – The SHA-1 hash of the object
- cache: boolean (optional) – Whether to use reader's caching (default: true)
const hash = Buffer.from('8bb6e23769902199e39ab70f2441841712cbdd62', 'hex');
const object = await reader.readObjectByHash(hash);
// { type: 'blob', object: <Buffer ...> }
Reads and returns the header of a Git object by its OID (Object ID).
- oid: string – The Object ID of the Git object
const header = await reader.readObjectHeaderByOid('8bb6e23769902199e39ab70f2441841712cbdd62');
// { type: 'tree', length: 45 }
Reads and returns the complete content of a Git object by its OID.
- oid: string – The Object ID of the Git object.
- cache: boolean (optional) – Whether to use reader's caching (default: true).
const object = await reader.readObjectByOid('8bb6e23769902199e39ab70f2441841712cbdd62');
// { type: 'tree', object: <Buffer ...> }
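The ByHash and ByOid variants identify the same object in two encodings: a 20-byte Buffer and its 40-character hexadecimal string. Converting between them needs only Node's Buffer API, as this small sketch shows:

```javascript
const oid = '8bb6e23769902199e39ab70f2441841712cbdd62';

// Hex string -> binary hash, as expected by the *ByHash methods
const hash = Buffer.from(oid, 'hex');
console.log(hash.length); // 20

// Binary hash -> hex string, as expected by the *ByOid methods
console.log(hash.toString('hex') === oid); // true
```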
Retrieves repository statistics, including refs, objects, and files.
const stats = await reader.stat();
/*
{
    size: 163937,
    refs: {
        remotes: [
            { remote: "origin", branches: ["HEAD", "main", ...] },
            ...
        ],
        branches: ["main", "foo", "bar", ...],
        tags: ["tag1", "tag2", ...]
    },
    objects: {
        count: 322,
        size: 145569,
        unpackedSize: 446973,
        unpackedRestoredSize: 755430,
        types: [
            { type: "tree", count: 23, size: 7537, unpackedSize: 8929, unpackedRestoredSize: 0 },
            ...
        ],
        loose: {
            objects: { count: 19, size: 15407, unpackedSize: 40312, unpackedRestoredSize: 0, types: [...] },
            files: [
                {
                    path: "objects/20/596d5c9e037844ae2b707a4a1cb45c72e70e7f",
                    size: 536,
                    object: { oid: "20596d5c9e037844ae2b707a4a1cb45c72e70e7f", type: "tree", length: 606 }
                },
                ...
            ]
        },
        packed: {
            objects: { ... },
            files: [
                {
                    path: "objects/pack/pack-43bc2b9ae5b7a56ab22e849c6c1dfaa00ba72ab1.pack",
                    size: 130194,
                    objects: { ... },
                    index: {
                        path: "objects/pack/pack-43bc2b9ae5b7a56ab22e849c6c1dfaa00ba72ab1.idx",
                        size: 9556,
                        namesBytes: 6060,
                        offsetsBytes: 1212,
                        largeOffsetsBytes: 0
                    },
                    reverseIndex: {
                        path: "objects/pack/pack-43bc2b9ae5b7a56ab22e849c6c1dfaa00ba72ab1.rev",
                        size: 1264
                    }
                },
                ...
            ]
        }
    },
    files: [
        { path: 'config', size: 123 },
        { path: 'objects/pack/pack-a1b2c3d4.pack', size: 456789 },
        { path: 'refs/heads/main', size: 45 }
    ]
}
*/
Checks whether the specified directory is a valid Git directory. Returns true if the directory contains the necessary files and subdirectories to be a valid Git directory (e.g. objects, refs, HEAD, and config), false otherwise.
- dir: string – The path to the directory to check.
import { isGitDir } from '@discoveryjs/scan-git';
const isValidGitDir = await isGitDir('/path/to/repo/.git');
console.log(isValidGitDir); // true or false
Resolves the path to the Git directory for the specified input directory.
- dir: string – The path to the directory to resolve.
Behavior:
- If the input directory contains a .git subdirectory, the method resolves to its path
- If no .git subdirectory is found, it resolves to the input directory itself, assuming it's already the .git directory
- Throws an error if the input path doesn't exist or isn't a directory
import { resolveGitDir } from '@discoveryjs/scan-git';
try {
    const gitDir = await resolveGitDir('/path/to/repo');
    console.log(gitDir); // '/path/to/repo/.git' or '/path/to/repo'
} catch (error) {
    console.error(error.message);
}
Parses a string representation of a Git contributor into a structured object.
- input: string – A contributor string in the format Name <email> timestamp timezone
import { parseContributor } from '@discoveryjs/scan-git';
const contributor = parseContributor('John Doe <[email protected]> 1680390225 +0200');
// {
// name: 'John Doe',
// email: '[email protected]',
// timestamp: 1680390225,
// timezone: '+0200'
// }
Parses a Git timezone offset string into a numeric offset in minutes.
- offset: string – A timezone string in the format +hhmm or -hhmm.
import { parseTimezone } from '@discoveryjs/scan-git';
const timezoneOffset = parseTimezone('+0200');
console.log(timezoneOffset); // 120
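The conversion itself is simple arithmetic: hours times 60 plus minutes, with the sign applied. A hedged re-implementation for illustration follows; parseTimezoneSketch() is a hypothetical name, and throwing on malformed input is an assumption, not documented library behavior.

```javascript
// Illustrative sketch of converting '+hhmm' / '-hhmm' into minutes
function parseTimezoneSketch(offset) {
    const match = /^([+-])(\d{2})(\d{2})$/.exec(offset);
    if (!match) {
        // Assumption: malformed input is rejected
        throw new Error(`Invalid timezone offset: ${offset}`);
    }
    const [, sign, hours, minutes] = match;
    const total = Number(hours) * 60 + Number(minutes);
    return sign === '-' ? -total : total;
}

console.log(parseTimezoneSketch('+0200')); // 120
console.log(parseTimezoneSketch('-0530')); // -330
```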
Parses a buffer representing an annotated Git tag into a structured object.
- object: Buffer – The tag object buffer.
import { parseAnnotatedTag } from '@discoveryjs/scan-git';
const tagObject = await reader.readObjectByOid('7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e');
const tag = parseAnnotatedTag(tagObject.object);
// {
// tag: 'v1.0.0',
// type: 'tag',
// object: 'a1b2c3d4e5f6g7h8i9j0',
// tagger: { name: 'John Doe', email: '[email protected]', timestamp: 1680390225, timezone: '+0200' },
// message: 'Initial release',
// gpgsig: '-----BEGIN PGP SIGNATURE-----...'
// }
Parses a buffer representing a Git commit into a structured object.
- object: Buffer – The commit object buffer.
import { parseCommit } from '@discoveryjs/scan-git';
const commitObject = await reader.readObjectByOid('7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e');
const commit = parseCommit(commitObject.object);
// {
// tree: 'a1b2c3d4e5f6g7h8i9j0',
// parent: ['b2c3d4e5f6g7h8i9j0k1'],
// author: { name: 'John Doe', email: '[email protected]', timestamp: 1680390225, timezone: '+0200' },
// committer: { name: 'John Doe', email: '[email protected]', timestamp: 1680390225, timezone: '+0200' },
// message: 'Fix a critical bug',
// gpgsig: '-----BEGIN PGP SIGNATURE-----...'
// }
Parses a buffer representing a Git tree object into a structured array of entries.
- buffer: Buffer – The tree object buffer.
import { parseTree } from '@discoveryjs/scan-git';
const treeObject = await reader.readObjectByOid('7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e');
const tree = parseTree(treeObject.object);
// [
// { isTree: true, path: 'src', hash: <Buffer ...> },
// { isTree: false, path: 'README.md', hash: <Buffer ...> }
// ]
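For context, a Git tree object is a sequence of entries, each encoded as an ASCII mode, a space, the entry name, a NUL byte, and the raw object hash. The sketch below walks that layout and produces entries in the shape shown above. It assumes SHA-1 repositories (20-byte hashes); parseTreeSketch() is a hypothetical name and not the library's implementation.

```javascript
// Illustrative parser for the tree layout: "<mode> <name>\0<20-byte hash>" repeated
function parseTreeSketch(buffer) {
    const entries = [];
    let offset = 0;

    while (offset < buffer.length) {
        const spaceIdx = buffer.indexOf(0x20, offset);
        if (spaceIdx === -1) throw new Error('Malformed tree object: no mode separator');
        const nulIdx = buffer.indexOf(0x00, spaceIdx + 1);
        if (nulIdx === -1) throw new Error('Malformed tree object: no name terminator');

        const mode = buffer.toString('latin1', offset, spaceIdx);
        const path = buffer.toString('utf8', spaceIdx + 1, nulIdx);
        const hash = buffer.subarray(nulIdx + 1, nulIdx + 21); // assumption: SHA-1, 20 bytes

        // Git stores directories with mode 40000 (no leading zero)
        entries.push({ isTree: mode === '40000', path, hash });
        offset = nulIdx + 21;
    }

    return entries;
}

// Build a tiny tree buffer by hand to exercise the parser
const sampleHash = Buffer.alloc(20, 0xab);
const sampleTree = Buffer.concat([
    Buffer.from('40000 src\0'), sampleHash,
    Buffer.from('100644 README.md\0'), sampleHash
]);

console.log(parseTreeSketch(sampleTree).map((entry) => [entry.isTree, entry.path]));
// [ [ true, 'src' ], [ false, 'README.md' ] ]
```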
scan-git | isomorphic-git | Feature |
---|---|---|
✅ | ✅ | loose refs |
✅ | ✅ | packed refs |
🚫 | ✅ | index file: speeds up fetching a file list for HEAD |
✅ | ✅ | loose objects |
✅ | ✅ | packed objects (*.pack + *.idx files) |
✅ | 🚫 | 2GB+ packs support: version 2 pack-*.idx files support packs larger than 4 GiB by adding an optional table of 8-byte offset entries for large offsets |
✅ | 🚫 | On-disk reverse indexes (*.rev files): a reverse index speeds up operations such as seeking an object by offset or scanning objects in pack order |
🚫 | 🚫 | multi-pack-index (MIDX): stores a list of objects and their offsets into multiple packfiles; can provide O(log N) lookup time for any number of packfiles |
🚫 | 🚫 | multi-pack-index reverse indexes (RIDX): similar to the pack-based reverse index |
✅ | 🚫 | Cruft packs: a cruft pack eliminates the need for storing unreachable objects in a loose state by including the per-object mtimes in a separate file alongside a single pack containing all loose objects |
🚫 | 🚫 | Pack and multi-pack bitmaps: bitmaps store reachability information about the set of objects in a packfile or a multi-pack index |
🚫 (TBD) | 🚫 | commit-graph: a binary file format that creates a structured representation of Git’s commit history and optimizes some operations |
MIT