This repository has been archived by the owner on Jan 8, 2025. It is now read-only.

Added file backend alternative #18

Merged (10 commits) on Jul 30, 2016
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
.nyc_output
coverage
fs
node_modules
tmp
.nyc_output
27 changes: 19 additions & 8 deletions README.md
@@ -10,7 +10,7 @@ Overview

This project allows you to have your own npm registry. This server works with the necessary `npm` commands just like the npmjs.org registry. You can use it to not worry about npm going down or to store your private packages. It performs much faster than npmjs.org and can even be matched with a CDN like Cloudfront to be fast globally.

Rather than trying to copy all the data in npm, this acts more like a proxy. While npm is up, it will cache package data in S3. If npm goes down, it will deliver whatever is available in the cache. This means it won't be a fully comprehensive backup of npm, but you will be able to access anything you accessed before. This makes it easy to set up since you don't need to mirror the entire registry. Any packages previously accessed will be available. Storing the data in S3 makes npm-register easy to maintain since any time a server acts up, you can simply blow it away and provision a new one with the same S3 credentials.
Rather than trying to copy all the data in npm, this acts more like a proxy. While npm is up, it will cache package data locally or in S3. If npm goes down, it will deliver whatever is available in the cache. This means it won't be a fully comprehensive backup of npm, but you will be able to access anything you accessed before. This makes it easy to set up since you don't need to mirror the entire registry. Any packages previously accessed will be available.

The inspiration for this project comes from [sinopia](https://github.com/rlidwka/sinopia). This came out of a need for better cache, CDN, and general performance as well as stability of being able to run multiple instances without depending on a local filesystem.

@@ -19,11 +19,7 @@ This is also a [12 Factor](http://12factor.net/) app to make it easy to host on
Setup
-----

The bulk of the data is stored in S3. You will need to set the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_S3_BUCKET` environment variables.

If `REDIS_URL` is set (optional) redis will be used to cache package data.

The easiest way to set this up is with the Heroku button:
The easiest way to set this up is with the Heroku button (you must use S3 with Heroku):

[![Deploy to Heroku](https://www.herokucdn.com/deploy/button.png)](https://heroku.com/deploy)

@@ -42,14 +38,29 @@ $ npm update --registry http://urltomyregistry

See below for how to enable authorization and `npm publish`.

S3 Storage
----------

Use S3 for storage by setting `NPM_REGISTER_STORAGE=s3`. Then set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_S3_BUCKET` to the proper values.

Local Filesystem Storage
------------------------

Using the local filesystem is the default. You can explicitly set the storage with `NPM_REGISTER_STORAGE=fs`. Select the location for the files to be stored with `NPM_REGISTER_FS_DIRECTORY=/var/npm-register`.

Redis
-----

Redis can optionally be used to cache the etags and package data. Set `REDIS_URL` to activate it.

How it works
------------

Essentially the goal of the project is to quickly deliver current npm data even when npm is offline. In npm there are 2 main types of requests: package metadata and tarballs.

Package metadata mostly contains what versions of a package are available. These cannot be cached for very long since the package can be updated. By default, it is cached for 60 seconds. You can modify this with `CACHE_PACKAGE_TTL`. Etags are also supported and cached to further speed up access.

The tarballs are the actual code and never change once they are uploaded (though they can be removed via unpublishing). These are downloaded one time from npmjs.org per package and version, stored in S3 and in the local tmp folder for future requests. These have a very long max-age header.
The tarballs are the actual code and never change once they are uploaded (though they can be removed via unpublishing). These are downloaded one time from npmjs.org per package and version, stored locally or in S3 for future requests. These have a very long max-age header.

In the event npmjs.org is offline, npm-register will use the most recent package metadata that was requested from npmjs.org until it comes back online.
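
The caching behavior described in this hunk (a short TTL for package metadata, with a fallback to the last copy seen when npmjs.org is unreachable) can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: `createMetadataCache`, `fetchUpstream`, and the injectable `now` clock are hypothetical names introduced here.

```javascript
// Hypothetical sketch: none of these names come from npm-register itself.
// fetchUpstream(name) is assumed to return a promise for the package metadata.
function createMetadataCache (fetchUpstream, ttlSeconds, now) {
  now = now || (() => Date.now())
  const entries = new Map() // name -> {pkg, fetchedAt}

  return async function get (name) {
    const entry = entries.get(name)
    // Fresh entry: serve it without contacting the upstream registry.
    if (entry && now() - entry.fetchedAt < ttlSeconds * 1000) return entry.pkg
    try {
      const pkg = await fetchUpstream(name)
      entries.set(name, {pkg, fetchedAt: now()})
      return pkg
    } catch (err) {
      // Upstream unreachable: fall back to the last copy we saw, if any.
      if (entry) return entry.pkg
      throw err
    }
  }
}
```

With a 60 second TTL (the `CACHE_PACKAGE_TTL` default mentioned above), a second request within a minute is served from memory, and a request made while the upstream is down returns the stale entry instead of failing.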

@@ -65,7 +76,7 @@ npm commands supported
Authentication
--------------

npm-register uses an htpasswd file in S3 for authentication and stores tokens in S3. To set this up, first create an htpasswd file. Then upload it to `/htpasswd` in your S3 bucket. Use aws-cli.
npm-register uses an htpasswd file for authentication and stores tokens in S3. To set this up, first create an htpasswd file, then upload it to `/htpasswd` in your S3 bucket or your local file system:

```
$ aws s3 cp s3://S3BUCKET/htpasswd ./htpasswd
1 change: 1 addition & 0 deletions app.json
@@ -5,6 +5,7 @@
"repository": "https://github.com/dickeyxxx/npm-register",
"keywords": ["node", "npm"],
"env": {
"NPM_REGISTER_STORAGE": "s3",
"AWS_ACCESS_KEY_ID": "",
"AWS_SECRET_ACCESS_KEY": "",
"AWS_S3_BUCKET": "",
22 changes: 5 additions & 17 deletions config.js
@@ -1,34 +1,22 @@
'use strict'

let path = require('path')
let url = require('url')
let env = process.env

function getenv (key) {
let val = env[key]
if (!val) {
throw new Error(key + ' must be set')
}
return val
}

let config = {
tmp: path.normalize(path.join(__dirname, '/../tmp')),
port: env.PORT || 3000,
production: !!['production', 'staging'].find((e) => e === env.NODE_ENV),
production: !!['production', 'staging'].find(e => e === env.NODE_ENV),
timeout: env.TIMEOUT || 10000,
uplink: url.parse(env.UPLINK || 'https://registry.npmjs.org'),
redis: env.REDIS_URL,
cloudfrontHost: env.CLOUDFRONT_HOST,
s3: {
key: getenv('AWS_ACCESS_KEY_ID'),
secret: getenv('AWS_SECRET_ACCESS_KEY'),
bucket: getenv('AWS_S3_BUCKET')
},
cache: {
packageTTL: parseInt(env.CACHE_PACKAGE_TTL) || 60,
tarballTTL: parseInt(env.CACHE_TARBALL_TTL) || (6 * 60 * 60)
}
},
storage: env.NPM_REGISTER_STORAGE || 'fs',
fs: {directory: env.NPM_REGISTER_FS_DIRECTORY},
s3: {bucket: env.AWS_S3_BUCKET}
}

module.exports = config
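
Note on the hunk above: with `getenv` removed, `config.js` as shown no longer throws when an AWS variable is missing. If fail-fast behavior is still wanted for the S3 backend, it could be restored per backend along these lines. This is only a sketch under that assumption; `resolveStorage` is a hypothetical helper, not part of this PR, and the storage backends themselves may already perform their own checks.

```javascript
'use strict'

// Hypothetical helper: checks that the selected backend has the settings it needs.
// It takes the environment as an argument so it can be exercised without process.env.
function resolveStorage (env) {
  const storage = env.NPM_REGISTER_STORAGE || 'fs'
  if (storage === 's3') {
    for (const key of ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_S3_BUCKET']) {
      if (!env[key]) throw new Error(key + ' must be set when NPM_REGISTER_STORAGE=s3')
    }
    return {storage, s3: {bucket: env.AWS_S3_BUCKET}}
  }
  if (storage === 'fs') {
    // The diff shows no default directory, so the value is passed through as-is.
    return {storage, fs: {directory: env.NPM_REGISTER_FS_DIRECTORY}}
  }
  throw new Error('unknown NPM_REGISTER_STORAGE: ' + storage)
}
```

Calling `resolveStorage(process.env)` at startup would reject an S3 deployment with an incomplete configuration while leaving the filesystem default untouched.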
140 changes: 58 additions & 82 deletions lib/npm.js
@@ -1,101 +1,77 @@
'use strict'

let Promise = require('bluebird')
let https = require('https')
let got = require('got')
let co = require('co')
let url = require('url')
let config = require('../config')
let redis = require('./redis')

module.exports = function (metric) {
function rediserr () {
console.error(`cannot connect to redis. state: ${redis.status}`)
}

function cacheKey (name) {
return `/packages/${name}`
}
let cacheKey = name => `/packages/${name}`

function * isEtagFresh (name, etag) {
try {
if (!etag) return
if (redis.status !== 'ready') return rediserr()
return etag === (yield redis.get(`${cacheKey(name)}/etag`))
} catch (err) {
console.error(err.stack)
}
}
function isEtagFresh (name, etag) {
return redis.get(`${cacheKey(name)}/etag`)
.then(cache => etag === cache)
.catch(err => console.error(err.stack))
}

function updateEtag (name, etag) {
co(function * () {
if (redis.status !== 'ready') return rediserr()
yield redis.setex(`${cacheKey(name)}/etag`, config.cache.packageTTL, etag)
})
}
function updateEtag (name, etag) {
redis.setex(`${cacheKey(name)}/etag`, config.cache.packageTTL, etag)
.catch(err => console.error(err.stack))
}

function * fetchFromCache (name) {
try {
if (redis.status !== 'ready') return rediserr()
let pkg = yield redis.get(cacheKey(name))
if (pkg) {
console.error(`${name} found in cache`)
return JSON.parse(pkg)
}
} catch (err) {
console.error(err.stack)
function fetchFromCache (name) {
return redis.get(cacheKey(name))
.then(pkg => {
if (pkg) {
console.log(`${name} found in cache`)
return JSON.parse(pkg)
}
}
})
.catch(err => console.error(err.stack))
}

function updateCache (pkg) {
co(function * () {
if (redis.status !== 'ready') rediserr()
yield redis.setex(cacheKey(pkg.name), config.cache.packageTTL, JSON.stringify(pkg))
})
}
function updateCache (pkg) {
if (!redis) return
redis.setex(cacheKey(pkg.name), config.cache.packageTTL, JSON.stringify(pkg))
.catch(err => console.error(err.stack))
}

function * get (name, etag) {
try {
if (yield isEtagFresh(name, etag)) return 304
let pkg = yield fetchFromCache(name)
if (pkg) return pkg
let end = metric.profile('npm.fetch', {'package': name})
let opts = {timeout: config.timeout, headers: {}}
if (etag) opts.headers['if-none-match'] = etag
let res = yield got(url.resolve(config.uplink.href, '/' + name.replace(/\//, '%2F')), opts)
pkg = JSON.parse(res.body)
pkg.etag = res.headers.etag
updateCache(pkg)
end()
return pkg
} catch (err) {
switch (err.statusCode) {
case 304:
updateEtag(name, etag)
return 304
case 404:
return 404
default:
console.error(`error downloading ${name}: ${err}`)
return 404
}
function * get (name, etag) {
try {
if (etag && redis && (yield isEtagFresh(name, etag))) return 304
let pkg = redis ? yield fetchFromCache(name) : null
if (pkg) return pkg
let opts = {timeout: config.timeout, headers: {}}
if (etag) opts.headers['if-none-match'] = etag
let res = yield got(url.resolve(config.uplink.href, '/' + name.replace(/\//, '%2F')), opts)
pkg = JSON.parse(res.body)
pkg.etag = res.headers.etag
updateCache(pkg)
return pkg
} catch (err) {
switch (err.statusCode) {
case 304:
if (redis) updateEtag(name, etag)
return 304
case 404:
return 404
default:
console.error(`error downloading ${name}: ${err.stack}`)
return 404
}
}
}

function getTarball (name, filename) {
return new Promise(function (resolve, reject) {
https.get(`${config.uplink.href}${name}/-/${filename}`, function (res) {
if (res.statusCode === 404) {
resolve()
return
}
resolve(res)
}).on('error', reject)
})
}
function getTarball (name, filename) {
return new Promise(function (resolve, reject) {
https.get(`${config.uplink.href}${name}/-/${filename}`, function (res) {
if (res.statusCode === 404) return resolve()
resolve(res)
}).on('error', reject)
})
}

return {
get: get,
getTarball
}
module.exports = {
get: get,
getTarball
}
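
Note on the hunk above: the new code guards on `redis` in several places (`if (!redis) return`, `redis ? ... : null`), which suggests `./redis` may export nothing when `REDIS_URL` is unset. One possible alternative is to hide that check behind a small wrapper that becomes a no-op without a client. The sketch below assumes a promise-returning client with `get` and `setex`, as used in the diff; `createPackageCache` is a hypothetical name, not something this PR defines.

```javascript
// Hypothetical wrapper; `client` stands in for whatever ./redis exports.
function createPackageCache (client, ttlSeconds) {
  const cacheKey = name => `/packages/${name}`
  if (!client) {
    // No redis configured: every lookup misses and every write is ignored.
    return {
      get: () => Promise.resolve(null),
      set: () => Promise.resolve()
    }
  }
  return {
    // Resolves to the parsed package, or null on a miss or a redis error.
    get: name => client.get(cacheKey(name))
      .then(raw => raw ? JSON.parse(raw) : null)
      .catch(err => { console.error(err.stack); return null }),
    // Stores the package as JSON with the given TTL; errors are logged, not thrown.
    set: pkg => client.setex(cacheKey(pkg.name), ttlSeconds, JSON.stringify(pkg))
      .catch(err => console.error(err.stack))
  }
}
```

Callers such as `get` could then use `yield cache.get(name)` unconditionally instead of repeating the `redis` check at each call site.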