Releases: JSv4/OpenContracts

v2.4.0 - Txt-Based Format Annotator + Style Overhaul

11 Nov 02:46
d587b25

This is a pretty significant upgrade vs 2.3.1. We added a number of features:

  1. We now support ingesting, rendering and annotating txt-based formats like plaintext, markdown, etc.
  2. Our document ingestion pipeline has a parser for txt-based formats.
  3. The task decorator for custom tasks will automatically switch between span-based and token-based annotations depending on the underlying format. At the moment this is just pdf vs non-pdf, but it could grow into a richer taxonomy.
  4. Substantial styling improvements.
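As a rough illustration of the format-based switch in item 3, here is a minimal sketch. It assumes PDFs get token-based annotations and every other format gets span-based ones; `AnnotationKind` and `annotation_kind_for` are hypothetical names for this example, not the project's actual decorator API.

```python
from enum import Enum


class AnnotationKind(Enum):
    TOKEN = "token"  # assumed: annotations anchored to PDF layout tokens
    SPAN = "span"    # assumed: annotations anchored to character offsets


def annotation_kind_for(file_type: str) -> AnnotationKind:
    """Pick an annotation kind from a MIME type (hypothetical helper)."""
    # Mirrors the pdf vs non-pdf split described in the release notes.
    if file_type == "application/pdf":
        return AnnotationKind.TOKEN
    return AnnotationKind.SPAN


for mime in ("application/pdf", "text/plain", "text/markdown"):
    print(mime, annotation_kind_for(mime).value)
```

A richer taxonomy could replace the single `if` with a lookup table keyed by MIME type.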

What's Changed

Full Changelog: v2.3.1...v2.4.0

v2.3.1 - Improved Admin & Annotation Loading for Analyses

20 Sep 22:03
71630c4

Two primary improvements in this release:

  1. The admin views have been built out with more filters, raw_id renders (to cut down on M2M and FK pulls), and custom actions - including a custom dropdown action on selected Corpus(es) to make them public.
  2. We were previously loading ALL annotations for an analysis in each document view. First, that's really inefficient for large corpuses. Second, it cluttered the annotator with annotations that weren't actually in the loaded document. We added a filter on the fullAnnotationList prop of AnalysesType to filter by document_id, and updated the frontend to request analysis annotations only for the opened_document.
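The effect of the document_id filter can be sketched in plain Python. This is only an illustration: the `Annotation` dataclass and `annotations_for_document` helper are hypothetical stand-ins, and the real filter presumably runs against the database rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Annotation:
    id: int
    document_id: int
    label: str


def annotations_for_document(
    annotations: Iterable[Annotation], document_id: Optional[int] = None
) -> List[Annotation]:
    """Return annotations for one document, or all of them if no id is given."""
    if document_id is None:
        return list(annotations)
    return [a for a in annotations if a.document_id == document_id]


analysis_annotations = [
    Annotation(1, document_id=10, label="Party"),
    Annotation(2, document_id=11, label="Date"),
    Annotation(3, document_id=10, label="Term"),
]

# Only the annotations that belong to the opened document are returned.
visible = annotations_for_document(analysis_annotations, document_id=10)
```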

What's Changed

Full Changelog: v.2.3.0...v2.3.1

v2.3.0 - Add User Feedback

17 Sep 01:48
ac33c05

It is now possible to collect feedback from users on public corpuses where can_comment is set to true. Added some nice GUI enhancements to the labels to support more action buttons - including a cool parabolic spiral button cloud that sprouts from an action zone.

What's Changed

Full Changelog: v2.2.0...v.2.3.0

v2.2.0 - Document UI Overhaul

12 Sep 06:02
1046ae5

This release brings an enormous number of frontend improvements and tweaks, primarily focused on unifying the document annotation and viewer components into a single component that has a single, clean workflow for viewing different extracts and analyses for a given document.

What's Changed

Full Changelog: v2.1.0...v2.2.0

v2.1.0 - Corpus Actions

27 Aug 03:34
808050f

TLDR

This release brings the addition of CorpusActions, GitHub Action-style automatic analyzers or data extractors that run when a document is uploaded. See more here.
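To give a feel for the idea, here is a minimal sketch of actions that fire when a document is added to a corpus. Everything in it (the `Corpus` class, `on_add_actions`, `add_document`) is a hypothetical simplification for illustration and does not reflect the actual CorpusAction models or task queue.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Corpus:
    name: str
    # Hypothetical: callables to run against each newly added document.
    on_add_actions: List[Callable[[str], str]] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)

    def add_document(self, doc: str) -> List[str]:
        """Store the document, then run every registered action on it."""
        self.documents.append(doc)
        return [action(doc) for action in self.on_add_actions]


def count_words(doc: str) -> str:
    return f"words={len(doc.split())}"


def detect_shouting(doc: str) -> str:
    return f"all_caps={doc.isupper()}"


corpus = Corpus("leases", on_add_actions=[count_words, detect_shouting])
results = corpus.add_document("THIS LEASE IS BINDING")
```

In a real deployment the actions would most likely run asynchronously in background workers rather than inline as shown here.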

What's Changed

Full Changelog: v2.0.0...v2.1.0

v2.0.0.post1 - Post 2.0.0 Fixes

30 Jul 02:45
1a584dd

Upgrade Dependencies

The upgrade from Django 3.2.* to 4.2.* introduced a syntax change in the management command that caused two Django app dependencies to break. In the process of upgrading these, some other dependency issues cropped up.

This release:

  1. Upgrades django app dependencies for full Django 4.2.* compatibility
  2. Upgrades opencv and related dependencies
  3. Introduces additional test cases to improve test coverage.

What's Changed

  • Upgrade Django App Dependencies to work with Django LTS by @JSv4 in #172

Full Changelog: v2.0.0...v2.0.0.post1

v2.0.0 - Stable Data Extract Release

27 Jul 06:01
d26b78c

This release includes:

  1. A table-based data extract interface and related models
  2. Improved test coverage
  3. Upgrade to Django 4.2.* LTS

What's Changed

  • Add Data Extraction by @JSv4 in #117
  • Bump pytest from 6.2.5 to 8.2.2 by @dependabot in #126
  • v2 Bugfixes by @JSv4 in #128
  • Bump actions/upload-artifact from 3 to 4 by @dependabot in #123
  • Bump actions/setup-node from 3 to 4 by @dependabot in #121
  • Bump actions/checkout from 3.3.0 to 4.1.7 by @dependabot in #120
  • Better Docs and Modular Extract Tasks by @JSv4 in #130
  • Bump actions/setup-python from 4 to 5 by @dependabot in #122
  • Improve Docs and Diagrams by @JSv4 in #131
  • Add Testing Docs by @JSv4 in #132
  • Update Production Compose by @JSv4 in #136
  • Fix Injection of Configurations into Frontend from Env Variables by @JSv4 in #137
  • Fix GUI Bugs by @JSv4 in #138
  • Create Funding.yaml by @JSv4 in #142
  • Update README.md by @JSv4 in #143
  • File inspection and Mimetype Limits on Document Upload Mutation. by @JSv4 in #144
  • Bump traefik from 2.9.6 to 3.0.4 in /compose/production/traefik by @dependabot in #133
  • Use Default Icon for Labelset Where None Provided by @JSv4 in #146
  • Updated Terms of Service and Opening Modal by @JSv4 in #147
  • Install Embeddings Model @ /models in Production Container + Fix Extract Where Search Text is None by @JSv4 in #156
  • Improve Document Selection Workflows by @JSv4 in #157
  • Bump traefik from 3.0.4 to 3.1.0 in /compose/production/traefik by @dependabot in #160
  • Frontend Cleanup by @JSv4 in #163
  • Fix CorpusCards by @JSv4 in #164
  • Fix Corpus Query Source Action by @JSv4 in #165
  • Dynamically Apply OCR, Improve PDF Utilities and Tests by @JSv4 in #167
  • Improve DB Performance with Additional Indexes by @JSv4 in #168
  • Long Poll Documents When Document is Processing by @JSv4 in #169
  • Upgrade Django LTS by @JSv4 in #170

Full Changelog: v1.3.0...v2.0.0

Improved OCR and PDF Parsing

22 Jul 07:33
9092561

Some PDF-handling-related improvements:

  1. Merged some nlm-ingestor changes from the upstream repo to fix an issue with missing style tags in certain PDFs
  2. Improved test coverage for PDF utils
  3. OCR is now turned on dynamically, only for PDFs that appear to need it. This avoids wasting processing power running OCR on every PDF while still preventing text-less output for PDFs that require OCR.

Also some minor GUI bug-fixes
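The release notes do not say how a PDF is judged to need OCR. One plausible heuristic, sketched below purely as an assumption, is to look at the text already extracted per page and request OCR when most pages are nearly empty. The `needs_ocr` function and both thresholds are hypothetical.

```python
from typing import Sequence


def needs_ocr(
    page_texts: Sequence[str],
    min_chars_per_page: int = 20,
    max_sparse_ratio: float = 0.5,
) -> bool:
    """Guess whether a PDF needs OCR from its already-extracted page text."""
    if not page_texts:
        # No extractable pages at all: treat as needing OCR.
        return True
    sparse_pages = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    return sparse_pages / len(page_texts) > max_sparse_ratio


scanned = ["", "  ", ""]
born_digital = ["This agreement is made between the parties." * 3, "Section 2. " * 10]
print(needs_ocr(scanned), needs_ocr(born_digital))
```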

v2.0.0 b2 - Improved Documentation and Modular Data Extract

23 Jun 16:38
726f9dd

Features:

  • The data extract tasks are now dynamically loaded and can be applied on a column-by-column basis. So, you can write very specific extract logic for a given column / data field. Newly-registered tasks are displayed automatically on the frontend and can be selected by the user when building a fieldset for a datagrid.
  • Added a search to the Extracts view and fixed various load and performance issues.
  • Removed the LanguageModel model as it's almost completely subsumed by the ability to create custom extract pipelines. Moreover, it wasn't really doing anything before.
  • Expanded our docs and tutorials to explain how data extract works and walk someone through writing a custom data extract task.
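The dynamically loaded, per-column tasks described in the first bullet can be pictured as a registry keyed by task name. The sketch below is an assumption about the general pattern, not the project's real mechanism; `EXTRACT_TASKS`, `register_extract_task`, and `run_column` are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a task name to the function that implements it.
EXTRACT_TASKS: Dict[str, Callable[[str], str]] = {}


def register_extract_task(name: str):
    """Decorator that makes a function selectable as a column's extract task."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        EXTRACT_TASKS[name] = func
        return func
    return decorator


@register_extract_task("first_line")
def first_line(text: str) -> str:
    return text.splitlines()[0] if text else ""


@register_extract_task("word_count")
def word_count(text: str) -> str:
    return str(len(text.split()))


def run_column(task_name: str, text: str) -> str:
    """Run the task chosen for one column against one document's text."""
    return EXTRACT_TASKS[task_name](text)


# A frontend could list sorted(EXTRACT_TASKS) as the choices for a column.
row = {name: run_column(name, "Master Lease\nTerm: 5 years") for name in sorted(EXTRACT_TASKS)}
```

Because each column names its own task, a newly registered function becomes available without changes to the code that builds the grid.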

What's Changed

Full Changelog: v2.0.0b1...v2.0.0.b2

v2.0.0 b1 - Add Data Extract and Corpus Querying

19 Jun 15:45
f55cdcf

2.0.0 Beta 1

Added Grid-based Data Extraction and Corpus Querying

This update extends the analytical capabilities of the application, allowing for automated and background extraction of structured data from documents, improving efficiency and scalability.

We've added several models on the backend:

  • Extract: Represents a headless, background annotation task linked to a Corpus and Fieldset.
  • Fieldset: Defines a reusable set of fields for Extracts, linked to Columns.
  • Column: Represents a discrete data structure to extract from a document, with various properties like query, match_text, output_type, and more.
  • Datacell: Represents extracted data for each column and document, storing data as JSON.
  • LanguageModel: Represents a language model to be used in the extraction process.
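The relationships between these models can be sketched with plain dataclasses. This is a simplified illustration, not the actual Django models: only query, match_text, and output_type come from the description above, while the `name`, `corpus_id`, and `document_id` fields are assumptions added for the example, and LanguageModel is omitted.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str  # assumed field
    query: str
    output_type: str
    match_text: Optional[str] = None


@dataclass
class Fieldset:
    name: str  # assumed field
    columns: List[Column] = field(default_factory=list)


@dataclass
class Extract:
    name: str  # assumed field
    corpus_id: int  # stands in for the link to a Corpus
    fieldset: Fieldset


@dataclass
class Datacell:
    extract: Extract
    column: Column
    document_id: int  # stands in for the link to a document
    data: str  # extracted value stored as a JSON string


rent = Column("rent", query="What is the monthly rent?", output_type="int")
fieldset = Fieldset("lease terms", columns=[rent])
extract = Extract("Q3 leases", corpus_id=1, fieldset=fieldset)
cell = Datacell(extract, rent, document_id=42, data=json.dumps({"value": 1500}))
```

One Datacell exists per (column, document) pair, so an extract over N documents with M columns fills an N by M grid.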

Improved Test Suite

  • LlamaIndex is being tested with vcr.py so we actually have realistic tests and mocks for corpus query and corpus extract tasks
  • Added a lot of graphql query and endpoint tests

New GUI Elements

  • There is now an extract tab and a number of GUI elements to make it easy to construct an extract grid made up of documents, corpora and re-usable columns.
  • Within the Corpus view, there is a query tab you can use to ask questions of the corpus

What's Changed

Full Changelog: v1.3.0...v2.0.0b1