[MRG] Rename library to sourmash (#374)
* Change library name to sourmash instead of sourmash_lib

  * sourmash_lib is still available for older scripts
  * Remove the local sourmash script
  * Fix tests that expected the local sourmash script

* Fix test discovery

* Fix tox dependencies

* add cpp file to gitignore

* Fix sourmash_lib import in lca

* Tests passing

* avoid using import sourmash inside the module code, use . imports instead

* Avoid changing current tests

* Remove redundant import
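
The switch to `.` imports noted in the commit message means module code no longer hard-codes the package's own name, which is presumably what lets the same files keep working after the directory is renamed. A minimal sketch of that property, using two throwaway packages (`demo_old_name` and `demo_new_name` are made up for illustration and are not part of sourmash):

```python
import importlib
import os
import sys
import tempfile
import textwrap


def make_package(root, name):
    """Create a tiny package whose commands.py uses a relative import."""
    pkg = os.path.join(root, name)
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as fp:
        fp.write("DEFAULT_SEED = 42\n")
    with open(os.path.join(pkg, "commands.py"), "w") as fp:
        # The module never mentions the package name, only "."
        fp.write(textwrap.dedent("""\
            from . import DEFAULT_SEED

            def seed():
                return DEFAULT_SEED
        """))


# Write identical source files under two different package names.
root = tempfile.mkdtemp()
make_package(root, "demo_old_name")
make_package(root, "demo_new_name")
sys.path.insert(0, root)
importlib.invalidate_caches()

old = importlib.import_module("demo_old_name.commands")
new = importlib.import_module("demo_new_name.commands")
print(old.seed(), new.seed())  # prints "42 42"
```

Had `commands.py` used an absolute `import demo_old_name`, the copy under `demo_new_name` would have reached back into the old package instead of its own.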
luizirber authored and ctb committed Mar 10, 2018
1 parent 7067827 commit 12ef723
Showing 38 changed files with 143 additions and 141 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -18,3 +18,4 @@ _minhash.so
 *.so
 .coverage
 sourmash_lib/_minhash.cpp
+sourmash/_minhash.cpp
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,6 +1,7 @@
 include LICENSE Makefile Dockerfile LICENSE Makefile README.md requirements.txt
 include index.ipynb
 recursive-include sourmash_lib *
+recursive-include sourmash *
 recursive-include third-party *.cc *.h
 global-exclude *.orig
 global-exclude *.pyc
2 changes: 1 addition & 1 deletion README.md
@@ -74,7 +74,7 @@ Development happens on github at

 `sourmash` is the main command-line entry point; run it for help.

-`sourmash_lib/` contains the library code.
+`sourmash/` contains the library code.

 Tests require py.test and can be run with `make test`.

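
The commit message notes that `sourmash_lib` remains importable for older scripts. A script that has to run against installations from either side of this rename could try the new name first and fall back to the old one. The helper below is a minimal sketch of that pattern; `import_first` is a hypothetical name, not part of sourmash, and it is demonstrated with standard-library modules so that it runs without sourmash installed.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper for illustration; not part of sourmash.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(names)
    )


# Demonstrated with standard-library names so the snippet runs anywhere.
# A real script might instead call import_first("sourmash", "sourmash_lib").
mod = import_first("module_that_does_not_exist_12345", "json")
print(mod.__name__)  # prints "json"
```
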
10 changes: 5 additions & 5 deletions doc/api-example.md
@@ -16,14 +16,14 @@ Define two sequences:
 Create two minhashes using 3-mers, and add the sequences:

 ```
->>> import sourmash_lib
->>> E1 = sourmash_lib.MinHash(n=20, ksize=3)
+>>> import sourmash
+>>> E1 = sourmash.MinHash(n=20, ksize=3)
 >>> E1.add_sequence(seq1)
 ```

 ```
->>> E2 = sourmash_lib.MinHash(n=20, ksize=3)
+>>> E2 = sourmash.MinHash(n=20, ksize=3)
 >>> E2.add_sequence(seq2)
 ```
@@ -81,7 +81,7 @@ raising an exception.
 >>> import screed
 >>> minhashes = []
 >>> for g in genomes:
-... E = sourmash_lib.MinHash(n=500, ksize=31)
+... E = sourmash.MinHash(n=500, ksize=31)
 ... for record in screed.open(g):
 ... E.add_sequence(record.sequence[:50000], True)
 ... minhashes.append(E)
@@ -110,7 +110,7 @@ making the minhashes, which can be saved and loaded easily.
 ## Saving and loading signature files

 ```
->>> from sourmash_lib import signature
+>>> from sourmash import signature
 >>> sig1 = signature.SourmashSignature(minhashes[0], name=genomes[0][:20])
 >>> with open('/tmp/genome1.sig', 'wt') as fp:
 ... signature.save_signatures([sig1], fp)
10 changes: 5 additions & 5 deletions doc/api.rst
@@ -12,24 +12,24 @@ underlying C++ code, but for now this is undocumented.)
 ``MinHash``: basic MinHash sketch functionality
 ===============================================

-.. automodule:: sourmash_lib
+.. automodule:: sourmash
 :members:

 ``SourmashSignature``: save and load MinHash sketches in JSON
 =============================================================

-.. automodule:: sourmash_lib.signature
+.. automodule:: sourmash.signature
 :members:

 ``SBT``: save and load Sequence Bloom Trees in JSON
 =============================================================

-.. automodule:: sourmash_lib.sbt
+.. automodule:: sourmash.sbt
 :members: GraphFactory, Node, NodePos, SBT, Leaf
 :undoc-members:

-``sourmash_lib.fig``: make plots and figures
+``sourmash.fig``: make plots and figures
 ============================================

-.. automodule:: sourmash_lib.fig
+.. automodule:: sourmash.fig
 :members:
10 changes: 5 additions & 5 deletions doc/release.md
@@ -48,7 +48,7 @@ cd sourmash
 pip uninstall -y sourmash; pip uninstall -y sourmash; make install
 mkdir ../not-sourmash # if there is a subdir named 'sourmash' py.test will execute tests
 # there instead of the installed sourmash module's tests
-pushd ../not-sourmash; py.test --pyargs sourmash_lib; popd
+pushd ../not-sourmash; py.test --pyargs sourmash; popd
 # Secondly we test via pip
@@ -63,7 +63,7 @@ cd sourmash
 cp dist/sourmash*tar.gz ../../../testenv3/
 pip uninstall -y sourmash; pip uninstall -y sourmash; make install
 cd ../.. # no subdir named sourmash here, safe for testing installed sourmash module
-py.test --pyargs sourmash_lib
+py.test --pyargs sourmash
 # Is the distribution in testenv2 complete enough to build another
 # functional distribution?
@@ -80,7 +80,7 @@ cd sourmash
 make test
 pip uninstall -y sourmash; pip uninstall -y sourmash; make install
 mkdir ../not-sourmash
-pushd ../not-sourmash ; py.test --pyargs sourmash_lib ; popd
+pushd ../not-sourmash ; py.test --pyargs sourmash ; popd
 ```
 4\. Publish the new release on the testing PyPI server. You will need
 to change your PyPI credentials as documented here:
@@ -100,7 +100,7 @@ cd sourmash
 # install as much as possible from non-test server!
 pip install screed pytest numpy matplotlib scipy
 pip install -i https://testpypi.python.org/pypi --pre --no-clean sourmash
-py.test --pyargs sourmash_lib
+py.test --pyargs sourmash
 ```
 5\. Do any final testing:

@@ -144,5 +144,5 @@ so:
 ```
 apt-cache update && apt-get -y install python-dev libfreetype6-dev && \
-pip install sourmash[test] && py.test --pyargs sourmash_lib
+pip install sourmash[test] && py.test --pyargs sourmash
 ```
6 changes: 3 additions & 3 deletions pytest.ini
@@ -1,4 +1,4 @@
 [pytest]
-addopts = --doctest-glob='doc/*.md'
-python_files = sourmash_lib/*.py tests/*.py
-norecursedirs = utils build buildenv .tox .asv
+addopts = --doctest-glob='doc/*.md' --ignore=setup.py
+python_files = sourmash/*.py tests/*.py
+norecursedirs = utils build buildenv .tox .asv .eggs
20 changes: 10 additions & 10 deletions setup.py
@@ -1,12 +1,12 @@
 from __future__ import print_function
 import sys
-from setuptools import setup
+from setuptools import setup, find_packages
 from setuptools import Extension
 import os

-# retrieve VERSION from sourmash_lib/VERSION.
+# retrieve VERSION from sourmash/VERSION.
 thisdir = os.path.dirname(__file__)
-version_file = open(os.path.join(thisdir, 'sourmash_lib', 'VERSION'))
+version_file = open(os.path.join(thisdir, 'sourmash', 'VERSION'))
 VERSION = version_file.read().strip()

 EXTRA_COMPILE_ARGS = ['-std=c++11', '-pedantic']
@@ -51,16 +51,16 @@
 "author": "C. Titus Brown",
 "author_email": "[email protected]",
 "license": "BSD 3-clause",
-"packages": ["sourmash_lib"],
+"packages": find_packages(),
 "entry_points": {'console_scripts': [
-'sourmash = sourmash_lib.__main__:main'
+'sourmash = sourmash.__main__:main'
 ]
 },
-"ext_modules": [Extension("sourmash_lib._minhash",
-sources=["sourmash_lib/_minhash.pyx",
+"ext_modules": [Extension("sourmash._minhash",
+sources=["sourmash/_minhash.pyx",
 "third-party/smhasher/MurmurHash3.cc"],
-depends=["sourmash_lib/kmer_min_hash.hh"],
-include_dirs=["./sourmash_lib",
+depends=["sourmash/kmer_min_hash.hh"],
+include_dirs=["./sourmash",
 "./third-party/smhasher/"],
 language="c++",
 extra_compile_args=EXTRA_COMPILE_ARGS,
@@ -74,7 +74,7 @@
 },
 "include_package_data": True,
 "package_data": {
-"sourmash_lib": ['*.pxd']
+"sourmash": ['*.pxd']
 },
 "classifiers": CLASSIFIERS
 }
5 changes: 0 additions & 5 deletions sourmash

This file was deleted.

File renamed without changes.
26 changes: 26 additions & 0 deletions sourmash/__init__.py
@@ -0,0 +1,26 @@
+#! /usr/bin/env python
+"""
+An implementation of a MinHash bottom sketch, applied to k-mers in DNA.
+"""
+from __future__ import print_function
+import re
+import math
+import os
+
+from ._minhash import (MinHash, get_minhash_default_seed, get_minhash_max_hash)
+DEFAULT_SEED = get_minhash_default_seed()
+MAX_HASH = get_minhash_max_hash()
+
+from .signature import (load_signatures, load_one_signature, SourmashSignature,
+save_signatures)
+from .sbtmh import load_sbt_index, search_sbt_index, create_sbt_index
+from . import lca
+from . import sbt
+from . import sbtmh
+from . import sbt_storage
+from . import signature
+
+# retrieve VERSION from sourmash/VERSION.
+thisdir = os.path.dirname(__file__)
+version_file = open(os.path.join(thisdir, 'VERSION'))
+VERSION = version_file.read().strip()
File renamed without changes.
File renamed without changes.
File renamed without changes.
53 changes: 24 additions & 29 deletions sourmash_lib/commands.py → sourmash/commands.py
@@ -8,10 +8,11 @@
 import random

 import screed
-import sourmash_lib
+from . import DEFAULT_SEED, MinHash, load_sbt_index, create_sbt_index
 from . import signature as sig
 from . import sourmash_args
 from .logging import notify, error, print_results, set_quiet
+from .sbtmh import SearchMinHashesFindBest, SigLeaf

 from .sourmash_args import DEFAULT_LOAD_K
 DEFAULT_COMPUTE_K = '21,31,51'
@@ -90,7 +91,7 @@ def compute(args):
 help='choose number of hashes as 1 in FRACTION of input k-mers')
 parser.add_argument('--seed', type=int,
 help='seed used by MurmurHash (default: 42)',
-default=sourmash_lib.DEFAULT_SEED)
+default=DEFAULT_SEED)
 parser.add_argument('--randomize', action='store_true',
 help='shuffle the list of input filenames randomly')
 parser.add_argument('--license', default='CC0', type=str,
@@ -173,18 +174,18 @@ def make_minhashes():
 Elist = []
 for k in ksizes:
 if args.protein:
-E = sourmash_lib.MinHash(ksize=k, n=args.num_hashes,
-is_protein=True,
-track_abundance=args.track_abundance,
-scaled=args.scaled,
-seed=seed)
+E = MinHash(ksize=k, n=args.num_hashes,
+is_protein=True,
+track_abundance=args.track_abundance,
+scaled=args.scaled,
+seed=seed)
 Elist.append(E)
 if args.dna:
-E = sourmash_lib.MinHash(ksize=k, n=args.num_hashes,
-is_protein=False,
-track_abundance=args.track_abundance,
-scaled=args.scaled,
-seed=seed)
+E = MinHash(ksize=k, n=args.num_hashes,
+is_protein=False,
+track_abundance=args.track_abundance,
+scaled=args.scaled,
+seed=seed)
 Elist.append(E)
 return Elist

@@ -551,7 +552,7 @@ def import_csv(args):
 hashes = hashes.strip()
 hashes = list(map(int, hashes.split(' ' )))

-e = sourmash_lib.MinHash(len(hashes), ksize)
+e = MinHash(len(hashes), ksize)
 e.add_many(hashes)
 s = sig.SourmashSignature(e, filename=name)
 siglist.append(s)
@@ -595,10 +596,10 @@ def sbt_combine(args):
 inp_files = list(args.sbts)
 notify('combining {} SBTs', len(inp_files))

-tree = sourmash_lib.load_sbt_index(inp_files.pop(0))
+tree = load_sbt_index(inp_files.pop(0))

 for f in inp_files:
-new_tree = sourmash_lib.load_sbt_index(f)
+new_tree = load_sbt_index(f)
 # TODO: check if parameters are the same for both trees!
 tree.combine(new_tree)

@@ -610,8 +611,6 @@ def index(args):
 """
 Build an Sequence Bloom Tree index of the given signatures.
 """
-import sourmash_lib.sbtmh
-
 parser = argparse.ArgumentParser()
 parser.add_argument('sbt_name', help='name to save SBT into')
 parser.add_argument('signatures', nargs='+',
@@ -639,10 +638,9 @@ def index(args):
 moltype = sourmash_args.calculate_moltype(args)

 if args.append:
-tree = sourmash_lib.load_sbt_index(args.sbt_name)
+tree = load_sbt_index(args.sbt_name)
 else:
-tree = sourmash_lib.create_sbt_index(args.bf_size,
-n_children=args.n_children)
+tree = create_sbt_index(args.bf_size, n_children=args.n_children)

 if args.traverse_directory:
 inp_files = list(sourmash_args.traverse_find_sigs(args.signatures))
@@ -670,7 +668,7 @@ def index(args):
 nums.add(ss.minhash.num)
 scaleds.add(ss.minhash.scaled)

-leaf = sourmash_lib.sbtmh.SigLeaf(ss.md5sum(), ss)
+leaf = SigLeaf(ss.md5sum(), ss)
 tree.add_node(leaf)
 n += 1

@@ -833,7 +831,7 @@ def categorize(args):
 for row in r:
 already_names.add(row[0])

-tree = sourmash_lib.load_sbt_index(args.sbt_name)
+tree = load_sbt_index(args.sbt_name)

 if args.traverse_directory:
 inp_files = set(sourmash_args.traverse_find_sigs(args.queries))
@@ -852,7 +850,7 @@ def categorize(args):
 query_ksize, query_moltype)

 results = []
-search_fn = sourmash_lib.sbtmh.SearchMinHashesFindBest().search
+search_fn = SearchMinHashesFindBest().search

 for leaf in tree.find(search_fn, query, args.threshold):
 if leaf.data.md5sum() != query.md5sum(): # ignore self.
@@ -1006,16 +1004,14 @@ def gather(args):
 outname = args.output_unassigned.name
 notify('saving unassigned hashes to "{}"', outname)

-e = sourmash_lib.MinHash(ksize=query.minhash.ksize, n=0,
-max_hash=new_max_hash)
+e = MinHash(ksize=query.minhash.ksize, n=0, max_hash=new_max_hash)
 e.add_many(next_query.minhash.get_mins())
 sig.save_signatures([ sig.SourmashSignature(e) ],
 args.output_unassigned)


 def watch(args):
 "Build a signature from raw FASTA/FASTQ coming in on stdin, search."
-from sourmash_lib.sbtmh import SearchMinHashesFindBest

 parser = argparse.ArgumentParser()
 parser.add_argument('sbt_name', help='name of SBT to search')
@@ -1053,7 +1049,7 @@ def watch(args):
 moltype = 'protein'
 is_protein = True

-tree = sourmash_lib.load_sbt_index(args.sbt_name)
+tree = load_sbt_index(args.sbt_name)

 # check ksize from the SBT we are loading
 ksize = args.ksize
@@ -1062,8 +1058,7 @@ def watch(args):
 tree_mh = leaf.data.minhash
 ksize = tree_mh.ksize

-E = sourmash_lib.MinHash(ksize=ksize, n=args.num_hashes,
-is_protein=is_protein)
+E = MinHash(ksize=ksize, n=args.num_hashes, is_protein=is_protein)
 streamsig = sig.SourmashSignature(E, filename='stdin', name=args.name)

 notify('Computing signature for k={}, {} from stdin', ksize, moltype)
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions sourmash_lib/lca/__main__.py → sourmash/lca/__main__.py
@@ -1,13 +1,13 @@
 """
-Command-line entry point for 'python -m sourmash_lib.lca'
+Command-line entry point for 'python -m sourmash.lca'
 """

 import sys
 import argparse

 from . import classify, index, summarize_main, rankinfo_main, gather_main
 from .command_compare_csv import compare_csv
-from sourmash_lib.logging import set_quiet, error
+from ..logging import set_quiet, error

 usage='''
 sourmash lca <command> [<args>] - work with taxonomic information.