[MRG] Rename library to sourmash #374

Merged
merged 9 commits into from
Mar 10, 2018
1 change: 1 addition & 0 deletions .gitignore
@@ -18,3 +18,4 @@ _minhash.so
*.so
.coverage
sourmash_lib/_minhash.cpp
sourmash/_minhash.cpp
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,6 +1,7 @@
include LICENSE Makefile Dockerfile LICENSE Makefile README.md requirements.txt
include index.ipynb
recursive-include sourmash_lib *
recursive-include sourmash *
recursive-include third-party *.cc *.h
global-exclude *.orig
global-exclude *.pyc
2 changes: 1 addition & 1 deletion README.md
@@ -74,7 +74,7 @@ Development happens on github at

`sourmash` is the main command-line entry point; run it for help.

`sourmash_lib/` contains the library code.
`sourmash/` contains the library code.

Tests require py.test and can be run with `make test`.

10 changes: 5 additions & 5 deletions doc/api-example.md
@@ -16,14 +16,14 @@ Define two sequences:
Create two minhashes using 3-mers, and add the sequences:

```
>>> import sourmash_lib
>>> E1 = sourmash_lib.MinHash(n=20, ksize=3)
>>> import sourmash
>>> E1 = sourmash.MinHash(n=20, ksize=3)
>>> E1.add_sequence(seq1)

```

```
>>> E2 = sourmash_lib.MinHash(n=20, ksize=3)
>>> E2 = sourmash.MinHash(n=20, ksize=3)
>>> E2.add_sequence(seq2)

```
@@ -81,7 +81,7 @@ raising an exception.
>>> import screed
>>> minhashes = []
>>> for g in genomes:
... E = sourmash_lib.MinHash(n=500, ksize=31)
... E = sourmash.MinHash(n=500, ksize=31)
... for record in screed.open(g):
... E.add_sequence(record.sequence[:50000], True)
... minhashes.append(E)
@@ -110,7 +110,7 @@ making the minhashes, which can be saved and loaded easily.
## Saving and loading signature files

```
>>> from sourmash_lib import signature
>>> from sourmash import signature
>>> sig1 = signature.SourmashSignature(minhashes[0], name=genomes[0][:20])
>>> with open('/tmp/genome1.sig', 'wt') as fp:
... signature.save_signatures([sig1], fp)
10 changes: 5 additions & 5 deletions doc/api.rst
@@ -12,24 +12,24 @@ underlying C++ code, but for now this is undocumented.)
``MinHash``: basic MinHash sketch functionality
===============================================

.. automodule:: sourmash_lib
.. automodule:: sourmash
:members:

``SourmashSignature``: save and load MinHash sketches in JSON
=============================================================

.. automodule:: sourmash_lib.signature
.. automodule:: sourmash.signature
:members:

``SBT``: save and load Sequence Bloom Trees in JSON
=============================================================

.. automodule:: sourmash_lib.sbt
.. automodule:: sourmash.sbt
:members: GraphFactory, Node, NodePos, SBT, Leaf
:undoc-members:

``sourmash_lib.fig``: make plots and figures
``sourmash.fig``: make plots and figures
============================================

.. automodule:: sourmash_lib.fig
.. automodule:: sourmash.fig
:members:
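
The documentation hunks above switch every `import sourmash_lib` to `import sourmash`. For downstream scripts that have to run against installations from both before and after this rename, one possible approach is to try the new name first and fall back to the old one. The sketch below is an illustration only and is not part of this PR; `import_first_available` is a hypothetical helper name.

```python
import importlib


def import_first_available(names):
    """Return the first importable module from `names` (hypothetical helper)."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate name
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(names)
    )


# Intended use in a downstream script (requires one of the two to be installed):
#     sourmash = import_first_available(["sourmash", "sourmash_lib"])
```

Listing the new name first means the fallback only triggers on older installations.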
10 changes: 5 additions & 5 deletions doc/release.md
@@ -48,7 +48,7 @@ cd sourmash
pip uninstall -y sourmash; pip uninstall -y sourmash; make install
mkdir ../not-sourmash # if there is a subdir named 'sourmash' py.test will execute tests
# there instead of the installed sourmash module's tests
pushd ../not-sourmash; py.test --pyargs sourmash_lib; popd
pushd ../not-sourmash; py.test --pyargs sourmash; popd


# Secondly we test via pip
@@ -63,7 +63,7 @@ cd sourmash
cp dist/sourmash*tar.gz ../../../testenv3/
pip uninstall -y sourmash; pip uninstall -y sourmash; make install
cd ../.. # no subdir named sourmash here, safe for testing installed sourmash module
py.test --pyargs sourmash_lib
py.test --pyargs sourmash

# Is the distribution in testenv2 complete enough to build another
# functional distribution?
@@ -80,7 +80,7 @@ cd sourmash
make test
pip uninstall -y sourmash; pip uninstall -y sourmash; make install
mkdir ../not-sourmash
pushd ../not-sourmash ; py.test --pyargs sourmash_lib ; popd
pushd ../not-sourmash ; py.test --pyargs sourmash ; popd
```
4\. Publish the new release on the testing PyPI server. You will need
to change your PyPI credentials as documented here:
@@ -100,7 +100,7 @@ cd sourmash
# install as much as possible from non-test server!
pip install screed pytest numpy matplotlib scipy
pip install -i https://testpypi.python.org/pypi --pre --no-clean sourmash
py.test --pyargs sourmash_lib
py.test --pyargs sourmash
```
5\. Do any final testing:

@@ -144,5 +144,5 @@ so:
```

apt-cache update && apt-get -y install python-dev libfreetype6-dev && \
pip install sourmash[test] && py.test --pyargs sourmash_lib
pip install sourmash[test] && py.test --pyargs sourmash
```
6 changes: 3 additions & 3 deletions pytest.ini
@@ -1,4 +1,4 @@
[pytest]
addopts = --doctest-glob='doc/*.md'
python_files = sourmash_lib/*.py tests/*.py
norecursedirs = utils build buildenv .tox .asv
addopts = --doctest-glob='doc/*.md' --ignore=setup.py
python_files = sourmash/*.py tests/*.py
norecursedirs = utils build buildenv .tox .asv .eggs
20 changes: 10 additions & 10 deletions setup.py
@@ -1,12 +1,12 @@
from __future__ import print_function
import sys
from setuptools import setup
from setuptools import setup, find_packages
from setuptools import Extension
import os

# retrieve VERSION from sourmash_lib/VERSION.
# retrieve VERSION from sourmash/VERSION.
thisdir = os.path.dirname(__file__)
version_file = open(os.path.join(thisdir, 'sourmash_lib', 'VERSION'))
version_file = open(os.path.join(thisdir, 'sourmash', 'VERSION'))
VERSION = version_file.read().strip()

EXTRA_COMPILE_ARGS = ['-std=c++11', '-pedantic']
@@ -51,16 +51,16 @@
"author": "C. Titus Brown",
"author_email": "[email protected]",
"license": "BSD 3-clause",
"packages": ["sourmash_lib"],
"packages": find_packages(),
Contributor

Presumably this recursively finds all packages and subpackages?

Member Author

Yes, it is recommended by the Python packaging guide: https://packaging.python.org/tutorials/distributing-packages/#packages
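
To illustrate the exchange above: with default arguments, `find_packages()` returns every directory that contains an `__init__.py`, including nested ones such as `sourmash.lca`. The sketch below is a standard-library approximation of that discovery rule, not setuptools' actual implementation. `find_packages_sketch` and `make_package` are hypothetical names, and the directory layout is built in a temporary folder purely for illustration.

```python
import os
import tempfile


def find_packages_sketch(where):
    """Approximate setuptools.find_packages(where) with default arguments.

    Assumption: a directory is a package if it holds an __init__.py, and
    only package directories are searched for subpackages.
    """
    found = []
    for root, dirs, _files in os.walk(where):
        kept = []
        for d in sorted(dirs):
            full = os.path.join(root, d)
            if "." in d or not os.path.isfile(os.path.join(full, "__init__.py")):
                continue  # not a package: skip it and everything below it
            found.append(os.path.relpath(full, where).replace(os.path.sep, "."))
            kept.append(d)
        dirs[:] = kept  # prune the walk in place
    return sorted(found)


def make_package(root, rel):
    """Create directory `rel` under `root` with an empty __init__.py."""
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(path, exist_ok=True)
    open(os.path.join(path, "__init__.py"), "w").close()


with tempfile.TemporaryDirectory() as tmp:
    make_package(tmp, "sourmash")
    make_package(tmp, "sourmash/lca")
    os.makedirs(os.path.join(tmp, "third-party"))  # no __init__.py, so ignored
    packages = find_packages_sketch(tmp)
    print(packages)
```

A hard-coded list such as `["sourmash_lib"]` would leave out a nested package like `lca`, whereas the recursive search picks it up.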

"entry_points": {'console_scripts': [
'sourmash = sourmash_lib.__main__:main'
'sourmash = sourmash.__main__:main'
]
},
"ext_modules": [Extension("sourmash_lib._minhash",
sources=["sourmash_lib/_minhash.pyx",
"ext_modules": [Extension("sourmash._minhash",
sources=["sourmash/_minhash.pyx",
"third-party/smhasher/MurmurHash3.cc"],
depends=["sourmash_lib/kmer_min_hash.hh"],
include_dirs=["./sourmash_lib",
depends=["sourmash/kmer_min_hash.hh"],
include_dirs=["./sourmash",
"./third-party/smhasher/"],
language="c++",
extra_compile_args=EXTRA_COMPILE_ARGS,
@@ -74,7 +74,7 @@
},
"include_package_data": True,
"package_data": {
"sourmash_lib": ['*.pxd']
"sourmash": ['*.pxd']
},
"classifiers": CLASSIFIERS
}
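
The `console_scripts` entry in this file uses the `name = module:attribute` form, so after this change the `sourmash` command points at `main` in `sourmash.__main__` rather than `sourmash_lib.__main__`. The following sketch shows how such a string breaks down. `parse_entry_point` and `resolve` are hypothetical helpers written for illustration, not the parser that setuptools itself uses.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name = package.module:attr' into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def resolve(module, attr):
    """Import `module` and return the object named `attr` inside it."""
    return getattr(importlib.import_module(module), attr)


old = parse_entry_point("sourmash = sourmash_lib.__main__:main")
new = parse_entry_point("sourmash = sourmash.__main__:main")
print(old)
print(new)

# Resolution is demonstrated on a standard-library target, because sourmash
# itself may not be installed where this sketch runs.
print(callable(resolve("json.tool", "main")))
```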
5 changes: 0 additions & 5 deletions sourmash

This file was deleted.

File renamed without changes.
26 changes: 26 additions & 0 deletions sourmash/__init__.py
@@ -0,0 +1,26 @@
#! /usr/bin/env python
"""
An implementation of a MinHash bottom sketch, applied to k-mers in DNA.
"""
from __future__ import print_function
import re
import math
import os

from ._minhash import (MinHash, get_minhash_default_seed, get_minhash_max_hash)
DEFAULT_SEED = get_minhash_default_seed()
MAX_HASH = get_minhash_max_hash()

from .signature import (load_signatures, load_one_signature, SourmashSignature,
save_signatures)
from .sbtmh import load_sbt_index, search_sbt_index, create_sbt_index
from . import lca
from . import sbt
from . import sbtmh
from . import sbt_storage
from . import signature

# retrieve VERSION from sourmash/VERSION.
thisdir = os.path.dirname(__file__)
version_file = open(os.path.join(thisdir, 'VERSION'))
VERSION = version_file.read().strip()
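
Both `setup.py` and `sourmash/__init__.py` read the version string by opening `VERSION` without closing the handle explicitly. That works, but an equivalent form with a context manager closes the file deterministically. A minimal sketch, demonstrated on a temporary file rather than the real `sourmash/VERSION`; `read_version` is a hypothetical name and the version string is a placeholder.

```python
import os
import tempfile


def read_version(directory, filename="VERSION"):
    """Return the stripped contents of directory/filename, closing the file."""
    with open(os.path.join(directory, filename)) as version_file:
        return version_file.read().strip()


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "VERSION"), "w") as fp:
        fp.write("0.0.0\n")  # placeholder value, not a real release number
    version = read_version(tmp)
    print(version)
```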
File renamed without changes.
File renamed without changes.
File renamed without changes.
53 changes: 24 additions & 29 deletions sourmash_lib/commands.py → sourmash/commands.py
@@ -8,10 +8,11 @@
import random

import screed
import sourmash_lib
from . import DEFAULT_SEED, MinHash, load_sbt_index, create_sbt_index
from . import signature as sig
from . import sourmash_args
from .logging import notify, error, print_results, set_quiet
from .sbtmh import SearchMinHashesFindBest, SigLeaf

from .sourmash_args import DEFAULT_LOAD_K
DEFAULT_COMPUTE_K = '21,31,51'
@@ -90,7 +91,7 @@ def compute(args):
help='choose number of hashes as 1 in FRACTION of input k-mers')
parser.add_argument('--seed', type=int,
help='seed used by MurmurHash (default: 42)',
default=sourmash_lib.DEFAULT_SEED)
default=DEFAULT_SEED)
parser.add_argument('--randomize', action='store_true',
help='shuffle the list of input filenames randomly')
parser.add_argument('--license', default='CC0', type=str,
@@ -173,18 +174,18 @@ def make_minhashes():
Elist = []
for k in ksizes:
if args.protein:
E = sourmash_lib.MinHash(ksize=k, n=args.num_hashes,
is_protein=True,
track_abundance=args.track_abundance,
scaled=args.scaled,
seed=seed)
E = MinHash(ksize=k, n=args.num_hashes,
is_protein=True,
track_abundance=args.track_abundance,
scaled=args.scaled,
seed=seed)
Elist.append(E)
if args.dna:
E = sourmash_lib.MinHash(ksize=k, n=args.num_hashes,
is_protein=False,
track_abundance=args.track_abundance,
scaled=args.scaled,
seed=seed)
E = MinHash(ksize=k, n=args.num_hashes,
is_protein=False,
track_abundance=args.track_abundance,
scaled=args.scaled,
seed=seed)
Elist.append(E)
return Elist

@@ -551,7 +552,7 @@ def import_csv(args):
hashes = hashes.strip()
hashes = list(map(int, hashes.split(' ' )))

e = sourmash_lib.MinHash(len(hashes), ksize)
e = MinHash(len(hashes), ksize)
e.add_many(hashes)
s = sig.SourmashSignature(e, filename=name)
siglist.append(s)
@@ -595,10 +596,10 @@ def sbt_combine(args):
inp_files = list(args.sbts)
notify('combining {} SBTs', len(inp_files))

tree = sourmash_lib.load_sbt_index(inp_files.pop(0))
tree = load_sbt_index(inp_files.pop(0))

for f in inp_files:
new_tree = sourmash_lib.load_sbt_index(f)
new_tree = load_sbt_index(f)
# TODO: check if parameters are the same for both trees!
tree.combine(new_tree)

@@ -610,8 +611,6 @@ def index(args):
"""
Build an Sequence Bloom Tree index of the given signatures.
"""
import sourmash_lib.sbtmh

parser = argparse.ArgumentParser()
parser.add_argument('sbt_name', help='name to save SBT into')
parser.add_argument('signatures', nargs='+',
@@ -639,10 +638,9 @@ def index(args):
moltype = sourmash_args.calculate_moltype(args)

if args.append:
tree = sourmash_lib.load_sbt_index(args.sbt_name)
tree = load_sbt_index(args.sbt_name)
else:
tree = sourmash_lib.create_sbt_index(args.bf_size,
n_children=args.n_children)
tree = create_sbt_index(args.bf_size, n_children=args.n_children)

if args.traverse_directory:
inp_files = list(sourmash_args.traverse_find_sigs(args.signatures))
@@ -670,7 +668,7 @@ def index(args):
nums.add(ss.minhash.num)
scaleds.add(ss.minhash.scaled)

leaf = sourmash_lib.sbtmh.SigLeaf(ss.md5sum(), ss)
leaf = SigLeaf(ss.md5sum(), ss)
tree.add_node(leaf)
n += 1

@@ -833,7 +831,7 @@ def categorize(args):
for row in r:
already_names.add(row[0])

tree = sourmash_lib.load_sbt_index(args.sbt_name)
tree = load_sbt_index(args.sbt_name)

if args.traverse_directory:
inp_files = set(sourmash_args.traverse_find_sigs(args.queries))
@@ -852,7 +850,7 @@ def categorize(args):
query_ksize, query_moltype)

results = []
search_fn = sourmash_lib.sbtmh.SearchMinHashesFindBest().search
search_fn = SearchMinHashesFindBest().search

for leaf in tree.find(search_fn, query, args.threshold):
if leaf.data.md5sum() != query.md5sum(): # ignore self.
@@ -1006,16 +1004,14 @@ def gather(args):
outname = args.output_unassigned.name
notify('saving unassigned hashes to "{}"', outname)

e = sourmash_lib.MinHash(ksize=query.minhash.ksize, n=0,
max_hash=new_max_hash)
e = MinHash(ksize=query.minhash.ksize, n=0, max_hash=new_max_hash)
e.add_many(next_query.minhash.get_mins())
sig.save_signatures([ sig.SourmashSignature(e) ],
args.output_unassigned)


def watch(args):
"Build a signature from raw FASTA/FASTQ coming in on stdin, search."
from sourmash_lib.sbtmh import SearchMinHashesFindBest

parser = argparse.ArgumentParser()
parser.add_argument('sbt_name', help='name of SBT to search')
@@ -1053,7 +1049,7 @@ def watch(args):
moltype = 'protein'
is_protein = True

tree = sourmash_lib.load_sbt_index(args.sbt_name)
tree = load_sbt_index(args.sbt_name)

# check ksize from the SBT we are loading
ksize = args.ksize
@@ -1062,8 +1058,7 @@ def watch(args):
tree_mh = leaf.data.minhash
ksize = tree_mh.ksize

E = sourmash_lib.MinHash(ksize=ksize, n=args.num_hashes,
is_protein=is_protein)
E = MinHash(ksize=ksize, n=args.num_hashes, is_protein=is_protein)
streamsig = sig.SourmashSignature(E, filename='stdin', name=args.name)

notify('Computing signature for k={}, {} from stdin', ksize, moltype)
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions sourmash_lib/lca/__main__.py → sourmash/lca/__main__.py
@@ -1,13 +1,13 @@
"""
Command-line entry point for 'python -m sourmash_lib.lca'
Command-line entry point for 'python -m sourmash.lca'
"""

import sys
import argparse

from . import classify, index, summarize_main, rankinfo_main, gather_main
from .command_compare_csv import compare_csv
from sourmash_lib.logging import set_quiet, error
from ..logging import set_quiet, error

usage='''
sourmash lca <command> [<args>] - work with taxonomic information.