Commit

migrate to exporthelpers; use a proper PIP package
karlicoss committed Dec 4, 2020
1 parent dfa802e commit e93ec39
Showing 14 changed files with 332 additions and 59 deletions.
24 changes: 24 additions & 0 deletions .ci/run
@@ -0,0 +1,24 @@
#!/bin/bash -eu

cd "$(dirname "$0")"
cd ..

if ! command -v sudo; then
    # CI or Docker sometimes don't have it, so useful to have a dummy
    function sudo {
        "$@"
    }
fi

# ${CI:-} avoids an 'unbound variable' error under `bash -u` when CI is not set (e.g. local runs)
if ! [ -z "${CI:-}" ]; then
    # install OS specific stuff here
    if [[ "$OSTYPE" == "darwin"* ]]; then
        # macos
        :
    else
        :
    fi
fi

pip3 install --user tox
tox
48 changes: 48 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,48 @@
# see https://github.com/karlicoss/pymplate for up-to-date reference

name: CI
on:
  push:
    branches: '*'
    tags: 'v[0-9]+.*' # only trigger on 'release' tags for PyPi
    # Ideally I would put this in the pypi job... but github syntax doesn't allow for regexes there :shrug:
    # P.S. fuck made up yaml DSLs.
    # TODO cron?
  workflow_dispatch: # needed to trigger workflows manually

env:
  # useful for scripts & sometimes tests to know
  CI: true

jobs:
  build:
    strategy:
      matrix:
        platform: [ubuntu-latest]
        python-version: [3.6, 3.7, 3.8]

    runs-on: ${{ matrix.platform }}

    steps:
    # ugh https://github.com/actions/toolkit/blob/main/docs/commands.md#path-manipulation
    - run: echo "$HOME/.local/bin" >> $GITHUB_PATH

    - uses: actions/setup-python@v1
      with:
        python-version: ${{ matrix.python-version }}

    - uses: actions/checkout@v2
      with:
        submodules: recursive

    # uncomment for SSH debugging
    # - uses: mxschmitt/action-tmate@v2

    - run: .ci/run

    - uses: actions/upload-artifact@v2
      with:
        name: .coverage.mypy
        path: .coverage.mypy/
      # restrict to a single python version, otherwise uploading fails
      if: ${{ matrix.python-version == '3.8' }}
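The workflow exports `CI: true` so that scripts and tests can tell they are running under CI (the `.ci/run` script checks the same variable). A minimal sketch of how a Python test helper might read it; the function name `running_on_ci` is hypothetical and not part of stexport.

```python
import os


def running_on_ci() -> bool:
    """Return True when the CI environment variable looks truthy.

    Hypothetical helper: GitHub Actions sets CI=true on its runners,
    and this workflow's `env:` block sets it explicitly as well.
    """
    return os.environ.get('CI', '').strip().lower() in {'true', '1', 'yes'}
```

A test could, for instance, skip slow or network-dependent checks when this returns `True`.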
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "src/stexport/exporthelpers"]
path = src/stexport/exporthelpers
url = https://github.com/karlicoss/exporthelpers.git
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Dmitrii Gerasimov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
78 changes: 78 additions & 0 deletions README.org
@@ -0,0 +1,78 @@
#+begin_src python :dir src :results drawer :exports results
import stexport.export as E; return E.make_parser().prog
#+end_src

#+RESULTS:
:results:
Export your personal Stackexchange data
:end:


* Setting up
1. The easiest way is =pip3 install --user git+https://github.com/karlicoss/stexport=.

   Alternatively, use =git clone --recursive=, or =git pull && git submodule update --init=. After that, you can use =pip3 install --editable .=.
2. See [[https://meta.stackexchange.com/questions/261829/where-i-can-get-my-access-token-key-for-the-api][this]] for info on getting the application =key= and =access_token=.
# TODO hmm, do we need user access token at all? not sure
# key is probably needed to have more queries
# TODO I have some notes on getting the token in my private secrets.py file

* Exporting

#+begin_src python :dir src :results drawer :exports results
import stexport.export as E; return E.make_parser().epilog
#+end_src

#+RESULTS:
:results:

Usage:

*Recommended*: create =secrets.py= keeping your api parameters, e.g.:


: key = "KEY"
: access_token = "ACCESS_TOKEN"
: user_id = "USER_ID"


After that, use:

: python3 -m stexport.export --secrets /path/to/secrets.py

That way you type less and have control over where you keep your plaintext secrets.

*Alternatively*, you can pass parameters directly, e.g.

: python3 -m stexport.export --key <key> --access_token <access_token> --user_id <user_id>

However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.



I *highly* recommend checking exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!

:end:
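The recommended =secrets.py= is an ordinary Python file defining module-level names. One way such a file can be read is with the standard library's `runpy.run_path`; this is a sketch under that assumption, not necessarily how `stexport.export` handles `--secrets` internally, and the helper name `load_secrets` is hypothetical.

```python
import runpy
from pathlib import Path
from typing import Dict


def load_secrets(path: Path) -> Dict[str, str]:
    """Execute a secrets.py file and return its key/access_token/user_id values.

    Hypothetical helper; assumes the file defines exactly the names shown above.
    """
    namespace = runpy.run_path(str(path))  # dict of the file's module globals
    wanted = ('key', 'access_token', 'user_id')
    missing = [name for name in wanted if name not in namespace]
    if missing:
        raise KeyError(f'{path} is missing: {", ".join(missing)}')
    return {name: namespace[name] for name in wanted}
```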


* Using data

#+begin_src python :dir src :results drawer :exports results
import stexport.exporthelpers.dal_helper as D; return D.make_parser().epilog
#+end_src

#+RESULTS:
:results:

You can use =stexport.dal= (stands for "Data Access/Abstraction Layer") to access your exported data, even offline.
I elaborate on the motivation behind it [[https://beepb00p.xyz/exports.html#dal][here]].

- the main use case is importing it as a Python module, which gives you *programmatic access* to your data.

  You can find some inspiration in the [[https://beepb00p.xyz/mypkg.html][=my.=]] package, which I use as an API to all my personal data.

- to test it against your export, simply run: ~python3 -m stexport.dal --source /path/to/export~

- you can also try it interactively: ~python3 -m stexport.dal --source /path/to/export --interactive~

:end:
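For orientation, the `DAL` class introduced in this commit (in `src/stexport/dal.py`) reads one JSON export whose top-level keys are site names and whose per-site objects hold a `'users/{ids}/questions'` list. The following is a self-contained stand-in mirroring that shape so it runs without the package installed; the class name `MiniDAL`, the `Json` alias definition and any sample data are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, NamedTuple, Sequence

Json = Dict[str, Any]  # assumed shape of the Json alias imported from exporthelpers


class Question(NamedTuple):
    j: Json

    @property
    def creation_date(self) -> datetime:
        # Stack Exchange API dates are Unix timestamps in UTC
        return datetime.fromtimestamp(self.j['creation_date'], tz=timezone.utc)

    @property
    def link(self) -> str:
        return self.j['link']


class SiteDAL(NamedTuple):
    j: Json

    @property
    def questions(self) -> List[Question]:
        # oldest first, as in the commit's SiteDAL.questions
        qs = map(Question, self.j['users/{ids}/questions'])
        return sorted(qs, key=lambda q: q.creation_date)


class MiniDAL:
    """Hypothetical stand-in for stexport.dal.DAL, for illustration only."""

    def __init__(self, sources: Sequence[Path]) -> None:
        # the commit uses max(sorted(sources)), which selects the same path
        self.src = max(sources)
        self.data: Json = json.loads(self.src.read_text())

    def sites(self) -> List[str]:
        return sorted(self.data.keys())

    def site_dal(self, site: str) -> SiteDAL:
        return SiteDAL(self.data[site])
```

With stexport installed, the equivalent calls would be `DAL([Path('/path/to/export.json')])`, then `dal.sites()` and `dal.site_dal(site).questions`, matching the methods in the `dal.py` diff.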
10 changes: 10 additions & 0 deletions mypy.ini
@@ -0,0 +1,10 @@
[mypy]
pretty = True
show_error_context = True
show_error_codes = True
check_untyped_defs = True
namespace_packages = True

# an example of suppressing
# [mypy-my.config.repos.pdfannots.pdfannots]
# ignore_errors = True
8 changes: 8 additions & 0 deletions pytest.ini
@@ -0,0 +1,8 @@
[pytest]
# discover files that don't follow test_ naming. Useful to keep tests along with the source code
python_files = *.py
addopts =
  --verbose

  # otherwise it won't discover doctests
  --doctest-modules
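With `python_files = *.py` and `--doctest-modules`, pytest collects tests from every module, including examples embedded in docstrings. A hypothetical helper whose docstring example would then run as a test; it converts the Unix timestamps the Stack Exchange API returns (all UTC, per the comment in `dal.py`) into timezone-aware datetimes.

```python
from datetime import datetime, timezone


def utc_from_timestamp(ts: int) -> datetime:
    """Convert a Stack Exchange API timestamp (seconds since the epoch, UTC).

    Hypothetical helper; under --doctest-modules pytest runs this example:

    >>> utc_from_timestamp(0)
    datetime.datetime(1970, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```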
60 changes: 60 additions & 0 deletions setup.py
@@ -0,0 +1,60 @@
# see https://github.com/karlicoss/pymplate for up-to-date reference


from setuptools import setup, find_namespace_packages # type: ignore


def main():
    # works with both ordinary and namespace packages
    pkgs = find_namespace_packages('src')
    pkg = min(pkgs) # lexicographically smallest is the correct one usually?
    setup(
        name=pkg,
        use_scm_version={
            'version_scheme': 'python-simplified-semver',
            'local_scheme': 'dirty-tag',
        },
        setup_requires=['setuptools_scm'],

        zip_safe=False,

        packages=pkgs,
        package_dir={'': 'src'},
        # necessary so that package works with mypy
        package_data={pkg: ['py.typed']},

        ## ^^^ this should be mostly automatic and not requiring any changes

        install_requires=[
            'stackapi',
            'backoff',
            # vvv example of git repo dependency
            # 'repo @ git+https://github.com/karlicoss/repo.git',

            # vvv example of local file dependency. yes, DUMMY is necessary for some reason
            # 'repo @ git+file://DUMMY/path/to/repo',
        ],
        extras_require={
            'testing': ['pytest'],
            'linting': ['pytest', 'mypy', 'lxml'], # lxml for mypy coverage report
        },


        # this needs to be set if you're planning to upload to pypi
        # url='',
        # author='',
        # author_email='',
        # description='',

        # Rest of the stuff -- classifiers, license, etc, I don't think it matters for pypi
        # it's just unnecessary duplication
    )


if __name__ == '__main__':
    main()

# TODO
# from setuptools_scm import get_version
# https://github.com/pypa/setuptools_scm#default-versioning-scheme
# get_version(version_scheme='python-simplified-semver', local_scheme='no-local-version')
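`setup.py` names the distribution after `min(pkgs)`, and its comment asks whether the lexicographically smallest package is usually the right one. A string always sorts before any longer string it is a prefix of, so `min` returns the top-level package whenever `src/` contains a single top-level package (here `stexport`, with `stexport.exporthelpers` underneath); with several top-level packages it simply picks the alphabetically first. A small sketch with hard-coded package lists standing in for the output of `find_namespace_packages('src')` (whose order does not matter to `min`); `single_top_level` is a hypothetical stricter check, not something this `setup.py` does.

```python
from typing import Iterable


def distribution_name(pkgs: Iterable[str]) -> str:
    # what setup.py does: take the lexicographically smallest package name
    return min(pkgs)


def single_top_level(pkgs: Iterable[str]) -> str:
    # hypothetical stricter variant: every package must share one top-level name
    tops = {p.split('.', 1)[0] for p in pkgs}
    if len(tops) != 1:
        raise ValueError(f'expected one top-level package, got {sorted(tops)}')
    return tops.pop()
```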
7 changes: 7 additions & 0 deletions src/stexport/__init__.py
@@ -0,0 +1,7 @@
# NOTE: without __init__.py/__init__.pyi, mypy behaves weird.
# see https://github.com/python/mypy/issues/8584 and the related discussions
# sometimes it's kinda valuable to have a namespace package and not have __init__.py though.

# TLDR: you're better off having a dummy .pyi, or alternatively you can use 'mypy -p src' (but that's a bit dirty?)

# todo not sure how it behaves when installed?
67 changes: 25 additions & 42 deletions model.py → src/stexport/dal.py
@@ -1,22 +1,21 @@
 #!/usr/bin/env python3
 from functools import lru_cache
 from pathlib import Path
-from typing import NamedTuple, Sequence, Any
+from typing import NamedTuple, Sequence, Any, Iterable
 from glob import glob
-from datetime import datetime
+from datetime import datetime, timezone
 import json
 import logging

-from kython import setup_logzero
+from .exporthelpers.logging_helper import LazyLogger
+from .exporthelpers.dal_helper import Json

-import pytz

-def get_logger():
-    return logging.getLogger('stexport')
+logger = LazyLogger('stexport')


 class Question(NamedTuple):
-    j: Any
+    j: Json

     # TODO wonder if could use something like attrs to reduce boilerplate
     # TODO: e.g. generate based on namedtuple schema?
@@ -35,65 +34,49 @@ def tags(self) -> Sequence[str]:
     @property
     def creation_date(self) -> datetime:
         # all utc https://api.stackexchange.com/docs/dates
-        return datetime.fromtimestamp(self.j['creation_date'], tz=pytz.utc)
+        return datetime.fromtimestamp(self.j['creation_date'], tz=timezone.utc)

     @property
     def link(self) -> str:
         return self.j['link']


-class SiteModel:
-    def __init__(self, j):
-        self.j = j
+class SiteDAL(NamedTuple):
+    j: Json

     @property
-    def questions(self):
+    def questions(self) -> Iterable[Question]:
         return list(sorted(map(Question, self.j['users/{ids}/questions']), key=lambda q: q.creation_date))


-class Model:
-    def __init__(self, sources: Sequence[Path]):
-        # TODO allow passing multiple later to construct the whole model from chunks
-        [src] = sources
-        self.src = src
+class DAL:
+    def __init__(self, sources: Sequence[Path]) -> None:
+        # TODO later, reconstruct from chunks?
+        self.src = max(sorted(sources))
         self.data = json.loads(self.src.read_text())

-    def sites(self):
+    def sites(self) -> Sequence[str]:
         return list(sorted(self.data.keys()))

-    def site_model(self, site: str):
-        return SiteModel(self.data[site])
+    def site_dal(self, site: str) -> SiteDAL:
+        return SiteDAL(self.data[site])


-def main():
-    logger = get_logger()
-    setup_logzero(logger, level=logging.DEBUG)
-    import argparse
-    p = argparse.ArgumentParser()
-    p.add_argument('--source', type=str, required=True)
-    p.add_argument('--no-glob', action='store_true')
-    args = p.parse_args()
-
-    if '*' in args.source and not args.no_glob:
-        sources = glob(args.source)
-    else:
-        sources = [args.source]
-
-    src = Path(max(sources))
-
-    logger.debug('using %s', src)
-    model = Model([src])
-
-    for site in model.sites():
-        sm = model.site_model(site)
-        qs = sm.questions
+def demo(dal: DAL) -> None:
+    for site in dal.sites():
+        sm = dal.site_dal(site)
+        qs = list(sm.questions)
         if len(qs) == 0:
             continue
         print(f"At {site}:")
         for q in qs:
             print(q)


+def main() -> None:
+    from .exporthelpers import dal_helper
+    dal_helper.main(DAL=DAL, demo=demo)
+
+
 if __name__ == '__main__':
     main()
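One behavioural change in this hunk is easy to miss: `Model.__init__` used `[src] = sources`, which raises `ValueError` unless exactly one path is passed, whereas `DAL.__init__` takes `max(sorted(sources))` and so quietly uses the last path in sort order. A small standalone sketch contrasting the two strategies; the function names `pick_single` and `pick_latest` are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def pick_single(sources: Sequence[Path]) -> Path:
    # old behaviour (Model.__init__): exactly one source is required
    [src] = sources
    return src


def pick_latest(sources: Sequence[Path]) -> Path:
    # new behaviour (DAL.__init__): the greatest path in sort order wins;
    # max() alone returns the same path, so sorted() does not change the result
    return max(sorted(sources))
```

"Latest" here only means chronologically latest if export filenames sort by time, for example if they embed an ISO-style timestamp (an assumption about how exports are named); otherwise it is just the lexicographically greatest path.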