11 Nov 02:46

JSv4

d587b25

v2.4.0 - Txt-Based Format Annotator + Style Overhaul

Latest

This is a significant upgrade over v2.3.1. We added a number of features:

We now support ingesting, rendering and annotating txt-based formats like plaintext, markdown, etc.
Our document ingestion pipeline has a parser for txt-based formats.
The task decorator for custom tasks will automatically switch between span-based and token-based annotations depending on the underlying format. At the moment this is just PDF vs. non-PDF, but it could become a richer taxonomy.
Substantial styling improvements.
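
As a rough illustration of the format-dependent switch described above, here is a minimal sketch. It is not the project's actual decorator: `doc_task`, `annotation_mode_for`, and the mode names are hypothetical, and it assumes PDFs map to token-based annotations and everything else to span-based, which the release note does not spell out.

```python
from functools import wraps
from typing import Callable, Dict, List, Tuple

# Hypothetical mode names; the real project may use different identifiers.
SPAN = "span"
TOKEN = "token"


def annotation_mode_for(mime_type: str) -> str:
    """Assumed mapping: PDFs get token-based annotations, all else span-based."""
    return TOKEN if mime_type == "application/pdf" else SPAN


def doc_task(func: Callable[[str], List[Tuple[str, int, int]]]):
    """Wrap a task that returns (label, start, end) tuples and tag each
    result with the annotation mode chosen from the document format."""

    @wraps(func)
    def wrapper(mime_type: str, text: str) -> List[Dict[str, object]]:
        mode = annotation_mode_for(mime_type)
        return [
            {"label": label, "mode": mode, "start": start, "end": end}
            for label, start, end in func(text)
        ]

    return wrapper


@doc_task
def flag_shall(text: str) -> List[Tuple[str, int, int]]:
    """Toy task: label the first occurrence of the word 'shall'."""
    idx = text.find("shall")
    return [("OBLIGATION", idx, idx + len("shall"))] if idx >= 0 else []
```

The task author writes one function; the wrapper decides how its output is tagged, so adding a richer format taxonomy later would only touch `annotation_mode_for`.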

What's Changed

Bump pytest from 8.2.2 to 8.3.3 by @dependabot in #227
Bump pytz from 2022.7 to 2024.2 by @dependabot in #226
Bump psycopg2 from 2.9.5 to 2.9.9 by @dependabot in #229
Bump traefik from 3.1.4 to 3.1.5 in /compose/production/traefik by @dependabot in #232
Bump actions/checkout from 4.1.7 to 4.2.0 by @dependabot in #231
Bump cryptography from 43.0.0 to 43.0.1 by @dependabot in #228
Bump traefik from 3.1.5 to 3.1.6 in /compose/production/traefik by @dependabot in #238
Bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #236
Add Txt Annotator by @JSv4 in #233

Full Changelog: v2.3.1...v2.4.0

Contributors

JSv4 and dependabot

20 Sep 22:03

JSv4

v2.3.1

71630c4

v2.3.1 - Improved Admin & Annotation Loading for Analyses

Two primary improvements in this release:

The admin views have been built out with more filters, raw_id renders (to cut down on M2M and FK pulls), and custom actions - including a custom dropdown action on selected Corpus(es) to make them public.
We were previously loading ALL annotations for an analysis in each document view. First off, that's really inefficient for large corpuses. Second, it meant that the annotator got cluttered with random annotations that weren't actually in the loaded document. We added a filter on the fullAnnotationList prop of AnalysesType that restricts results to a given document_id, and updated the frontend to request analysis annotations only for opened_document.
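
The idea behind the document-level filter can be sketched in plain Python. This is only an illustration under assumptions: the `Annotation` class and `annotations_for_document` function are hypothetical stand-ins, not the project's GraphQL resolver or Django queryset code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record."""
    id: int
    document_id: int
    label: str


def annotations_for_document(
    annotations: List[Annotation], document_id: int
) -> List[Annotation]:
    """Keep only the annotations that belong to the opened document."""
    return [a for a in annotations if a.document_id == document_id]


# An analysis covering two documents; only document 10 is open.
analysis_annotations = [
    Annotation(id=1, document_id=10, label="Party"),
    Annotation(id=2, document_id=11, label="Date"),
    Annotation(id=3, document_id=10, label="Term"),
]
visible = annotations_for_document(analysis_annotations, document_id=10)
```

In a real deployment the filtering would presumably happen in the database query rather than in memory, so that unrelated annotations are never loaded at all.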

What's Changed

Bump traefik from 3.1.2 to 3.1.3 in /compose/production/traefik by @dependabot in #217
Bump pillow from 9.4.0 to 10.4.0 by @dependabot in #186
Bump djangorestframework from 3.14.0 to 3.15.2 by @dependabot in #214
Bump gunicorn from 20.1.0 to 23.0.0 by @dependabot in #194
Improve Admin Views by @JSv4 in #219
Bump traefik from 3.1.3 to 3.1.4 in /compose/production/traefik by @dependabot in #225
Bump mypy from 1.11.1 to 1.11.2 by @dependabot in #223
Bump drf-extra-fields from 3.4.1 to 3.7.0 by @dependabot in #221

Full Changelog: v.2.3.0...v2.3.1

Contributors

JSv4 and dependabot

17 Sep 01:48

JSv4

v.2.3.0

ac33c05

v2.3.0 - Add User Feedback

It is now possible to collect feedback from users on public corpuses where can_comment is set to true. Added some nice GUI enhancements to the labels to support more action buttons - including a cool parabolic spiral button cloud that sprouts from an action zone.

What's Changed

Add User Feedback by @JSv4 in #216

Full Changelog: v2.2.0...v.2.3.0

Contributors

JSv4

12 Sep 06:02

JSv4

v2.2.0

1046ae5

v2.2.0 - Document UI Overhaul

This release brings an enormous number of frontend improvements and tweaks, primarily focused on unifying the document annotation and viewer components into a single component that has a single, clean workflow for viewing different extracts and analyses for a given document.

What's Changed

Finalize 2.1 by @JSv4 in #200
Bump crispy-bootstrap5 from 0.7 to 2024.2 by @dependabot in #196
Bump redis from 4.5.1 to 5.0.8 by @dependabot in #201
Bump pytest-django from 4.5.2 to 4.9.0 by @dependabot in #204
Bump django-debug-toolbar from 3.7.0 to 4.4.6 by @dependabot in #203
Enhancement: Sane, Smooth UX for Document-Based Workflows by @JSv4 in #206

Full Changelog: v2.1.0...v2.2.0

Contributors

JSv4 and dependabot

27 Aug 03:34

JSv4

v2.1.0

808050f

v2.1.0 - Corpus Actions

TLDR

This release brings the addition of CorpusActions, GitHub Action-style automatic analyzers or data extractors that run when a document is uploaded. See more here.
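
To give a feel for the "run on upload" pattern, here is a minimal sketch. All names (`register_action`, `on_document_uploaded`, the in-memory registry) are hypothetical and chosen for illustration; the actual CorpusActions implementation is not described in these notes and likely uses database models and background tasks instead.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-memory registry: corpus id -> callables to run on upload.
_ACTIONS: Dict[int, List[Callable[[str], str]]] = defaultdict(list)


def register_action(corpus_id: int, action: Callable[[str], str]) -> None:
    """Attach an analyzer or extractor to a corpus."""
    _ACTIONS[corpus_id].append(action)


def on_document_uploaded(corpus_id: int, document_name: str) -> List[str]:
    """Run every action registered for the corpus, in registration order."""
    return [action(document_name) for action in _ACTIONS.get(corpus_id, [])]


register_action(1, lambda name: f"analyzed {name}")
register_action(1, lambda name: f"extracted fields from {name}")
```

Calling `on_document_uploaded(1, "lease.pdf")` runs both registered actions, while a corpus with no actions simply returns an empty list.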

What's Changed

Upgrade Django App Dependencies to work with Django LTS by @JSv4 in #172
Add Document Analysis Row by @JSv4 in #175
Bump django from 4.2.14 to 4.2.15 by @dependabot in #180
Bump flake8-isort from 6.0.0 to 6.1.1 by @dependabot in #181
Bump pytest-cov from 4.0.0 to 5.0.0 by @dependabot in #182
Bump cryptography from 38.0.1 to 43.0.0 by @dependabot in #184
Bump traefik from 3.1.0 to 3.1.2 in /compose/production/traefik by @dependabot in #179
Bump django-crispy-forms from 1.14.0 to 2.3 by @dependabot in #166
Add Corpus Actions by @JSv4 in #183
Bump pylint-django from 2.5.3 to 2.5.5 by @dependabot in #129
Bump flower from 1.0.0 to 2.0.1 by @dependabot in #125
Bump django-coverage-plugin from 2.0.3 to 3.1.0 by @dependabot in #190
Bump werkzeug from 2.2.2 to 3.0.3 by @dependabot in #188
Bump celery from 5.2.7 to 5.4.0 by @dependabot in #187
Bump python-slugify from 6.1.2 to 8.0.4 by @dependabot in #192
Bump ipdb from 0.13.9 to 0.13.13 by @dependabot in #189
Bump mypy from 0.991 to 1.11.1 by @dependabot in #191
Bump marvin from 2.3.4 to 2.3.7 by @dependabot in #195
Improved doc analyzer task decorator to do more I/O handling by @JSv4 in #185
Bump factory-boy from 3.2.1 to 3.3.1 by @dependabot in #197
Added Sample Doc Action Task and Cleanup Task Execution by @JSv4 in #198
Bump coverage from 6.5.0 to 7.6.1 by @dependabot in #199

Full Changelog: v2.0.0...v2.1.0

Contributors

JSv4 and dependabot

30 Jul 02:45

JSv4

v2.0.0.post1

1a584dd

v2.0.0.post1 - Post 2.0.0 Fixes

Upgrade Dependencies

The upgrade from Django 3.2.* to 4.2.* introduced a syntax change in the management command that caused two Django app dependencies to break. In the process of upgrading these, some other dependency issues cropped up.

This release:

Upgrades django app dependencies for full Django 4.2.* compatibility
Upgrades opencv and related dependencies
Introduces additional test cases to improve test coverage.

What's Changed

Upgrade Django App Dependencies to work with Django LTS by @JSv4 in #172

Full Changelog: v2.0.0...v2.0.0.post1

Contributors

JSv4

27 Jul 06:01

JSv4

v2.0.0

d26b78c

v2.0.0 - Stable Data Extract Release

This release includes:

A table-based data extract interface and related models
Improved test coverage
Upgrade to Django 4.2.* LTS

What's Changed

Add Data Extraction by @JSv4 in #117
Bump pytest from 6.2.5 to 8.2.2 by @dependabot in #126
v2 Bugfixes by @JSv4 in #128
Bump actions/upload-artifact from 3 to 4 by @dependabot in #123
Bump actions/setup-node from 3 to 4 by @dependabot in #121
Bump actions/checkout from 3.3.0 to 4.1.7 by @dependabot in #120
Better Docs and Modular Extract Tasks by @JSv4 in #130
Bump actions/setup-python from 4 to 5 by @dependabot in #122
Improve Docs and Diagrams by @JSv4 in #131
Add Testing Docs by @JSv4 in #132
Update Production Compose by @JSv4 in #136
Fix Injection of Configurations into Frontend from Env Variables by @JSv4 in #137
Fix GUI Bugs by @JSv4 in #138
Create Funding.yaml by @JSv4 in #142
Update README.md by @JSv4 in #143
File inspection and Mimetype Limits on Document Upload Mutation. by @JSv4 in #144
Bump traefik from 2.9.6 to 3.0.4 in /compose/production/traefik by @dependabot in #133
Use Default Icon for Labelset Where None Provided by @JSv4 in #146
Updated Terms of Service and Opening Modal by @JSv4 in #147
Install Embeddings Model @ /models in Production Container + Fix Extract Where Search Text is None by @JSv4 in #156
Improve Document Selection Workflows by @JSv4 in #157
Bump traefik from 3.0.4 to 3.1.0 in /compose/production/traefik by @dependabot in #160
Frontend Cleanup by @JSv4 in #163
Fix CorpusCards by @JSv4 in #164
Fix Corpus Query Source Action by @JSv4 in #165
Dynamically Apply OCR, Improve PDF Utilities and Tests by @JSv4 in #167
Improve DB Performance with Additional Indexes by @JSv4 in #168
Long Poll Documents When Document is Processing by @JSv4 in #169
Upgrade Django LTS by @JSv4 in #170

Full Changelog: v1.3.0...v2.0.0

Contributors

JSv4 and dependabot

22 Jul 07:33

JSv4

v2.0.0.b3

9092561

Improved OCR and PDF Parsing

Some PDF-handling-related improvements:

Merged some nlm-ingestor changes from the upstream repo to fix an issue with missing style tags in certain PDFs
Improved test coverage for PDF utils
Turn on OCR dynamically, only for PDFs that appear to need it. This avoids running OCR on every PDF while still producing text for PDFs that have none.
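
One way such a "does this PDF need OCR?" check could work is sketched below. This is an assumption, not the project's actual heuristic: the function name, the input shape (already-extracted text per page), and the 20-character threshold are all made up for illustration.

```python
from typing import List


def needs_ocr(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF needs OCR from the text already extracted per page.

    Assumed heuristic: if the average number of non-whitespace-trimmed
    characters per page falls below a threshold, the PDF is probably scanned.
    """
    if not page_texts:
        return False  # nothing to OCR in an empty document
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


scanned = ["", "  ", ""]
digital = ["This page has plenty of extractable text already."]
```

Here `needs_ocr(scanned)` is true and `needs_ocr(digital)` is false, so only the scanned file would be sent through the slower OCR path.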

Also some minor GUI bug-fixes

23 Jun 16:38

JSv4

v2.0.0.b2

726f9dd

v2.0.0 b2 - Improved Documentation and Modular Data Extract

Features:

The data extract tasks are now dynamically loaded and can be applied on a column-by-column basis. So, you can write very specific extract logic for a given column / data field. Newly-registered tasks are displayed automatically on the frontend and can be selected by the user when building a fieldset for a datagrid.
Added a search to the Extracts view and fixed various loading and performance issues.
Removed the LanguageModel model as it's almost completely subsumed by the ability to create custom extract pipelines. Moreover, it wasn't really doing anything before.
Expanded our docs and tutorials to explain how data extract works and walk someone through writing a custom data extract task.
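
The column-by-column task selection can be pictured with a small registry sketch. The names here (`extract_task`, `EXTRACT_TASKS`, `run_extract`) are hypothetical and the real tasks are presumably background jobs rather than plain functions; see the project docs for the actual way to write a custom extract task.

```python
import re
from typing import Any, Callable, Dict, Optional

# Hypothetical registry of extract tasks, keyed by a task name.
EXTRACT_TASKS: Dict[str, Callable[[str], Any]] = {}


def extract_task(name: str):
    """Register a function so it can be selected for a column by name."""

    def decorator(func: Callable[[str], Any]) -> Callable[[str], Any]:
        EXTRACT_TASKS[name] = func
        return func

    return decorator


@extract_task("first_year")
def first_year(text: str) -> Optional[str]:
    """Return the first four-digit year starting with 19 or 20, if any."""
    match = re.search(r"\b(19|20)\d{2}\b", text)
    return match.group(0) if match else None


@extract_task("word_count")
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def run_extract(columns: Dict[str, str], text: str) -> Dict[str, Any]:
    """Apply each column's chosen task to the document text."""
    return {col: EXTRACT_TASKS[task](text) for col, task in columns.items()}


row = run_extract(
    {"Effective Year": "first_year", "Length": "word_count"},
    "Signed in 2021 by both parties",
)
```

Because tasks register themselves, listing `EXTRACT_TASKS` is enough to show newly added tasks to a user building a fieldset.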

What's Changed

Bump pytest from 6.2.5 to 8.2.2 by @dependabot in #126
v2 Bugfixes by @JSv4 in #128
Bump actions/upload-artifact from 3 to 4 by @dependabot in #123
Bump actions/setup-node from 3 to 4 by @dependabot in #121
Bump actions/checkout from 3.3.0 to 4.1.7 by @dependabot in #120
Better Docs and Modular Extract Tasks by @JSv4 in #130
Bump actions/setup-python from 4 to 5 by @dependabot in #122

Full Changelog: v2.0.0b1...v2.0.0.b2

Contributors

JSv4 and dependabot

19 Jun 15:45

JSv4

v2.0.0b1

f55cdcf

v2.0.0 b1 - Add Data Extract and Corpus Querying

2.0.0 Beta 1

Added Grid-based Data Extraction and Corpus Querying

This update extends the analytical capabilities of the application, allowing for automated and background extraction of structured data from documents, improving efficiency and scalability.

We've added several models on the backend:

Extract: Represents a headless, background annotation task linked to a Corpus and Fieldset.
Fieldset: Defines a reusable set of fields for Extracts, linked to Columns.
Column: Represents a discrete data structure to extract from a document, with various properties like query, match_text, output_type, and more.
Datacell: Represents extracted data for each column and document, storing data as JSON.
LanguageModel: Represents a language model to be used in the extraction process.
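
The relationships between these models can be sketched with plain dataclasses. This is a simplified illustration, not the project's Django models: only the fields named above are included, `LanguageModel` is left out, and every other detail (types, integer ids, the `name` fields) is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """One data field to extract from a document."""
    name: str
    query: str
    output_type: str
    match_text: Optional[str] = None


@dataclass
class Fieldset:
    """A reusable set of columns."""
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Extract:
    """A background extraction job tied to a corpus and a fieldset."""
    corpus_id: int
    fieldset: Fieldset


@dataclass
class Datacell:
    """The extracted value for one column of one document, stored as JSON."""
    extract: Extract
    column: Column
    document_id: int
    data: str  # JSON-encoded payload


effective_date = Column(
    name="Effective Date", query="When does it start?", output_type="str"
)
fieldset = Fieldset(name="Lease basics", columns=[effective_date])
extract = Extract(corpus_id=1, fieldset=fieldset)
cell = Datacell(
    extract=extract,
    column=effective_date,
    document_id=42,
    data=json.dumps({"value": "2021-01-01"}),
)
```

An extract grid is then the set of Datacells for every (document, column) pair, with documents as rows and the fieldset's columns as columns.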

Improved Test Suite

LlamaIndex calls are now tested with vcr.py, so we have realistic tests and mocks for the corpus query and corpus extract tasks
Added a lot of GraphQL query and endpoint tests

New GUI Elements

There is now an extract tab and a number of GUI elements to make it easy to construct an extract grid made up of documents, corpora and re-usable columns.
Within the Corpus view, there is a query tab you can use to ask questions of the corpus.

What's Changed

Add Data Extraction by @JSv4 in #117

Full Changelog: v1.3.0...v2.0.0b1

Contributors

JSv4

Releases: JSv4/OpenContracts