v0.13.0
[0.13.0] - 2021-03-15
API-breaking changes: The `Reader` class has been completely rewritten. A couple of methods have been removed, and others have been renamed. For the methods that remain (renamed or not), the output data structures and the allowed arguments have changed. The details follow.
Added
- New classmethods of `Reader` for reader instantiation: `from_zip` and `from_dir`.
- New classes to better structure CHAT data: `Utterance`, `Token`, and `Gra`.
- New `Reader` methods: `append_left`, `extend`, `extend_left`, `pop`, `pop_left`, and `tokens` (which gives `Token` objects, essentially the "tagged words" from before).
- In the header dictionary, each participant's info has the new key `"dob"` for date of birth (if the info is available in the CHAT header). The corresponding value is a `datetime.date` object. (The same info was previously exposed as the `Reader` method `date_of_birth`, now removed.)
- The test suite now covers code snippets in both the docstrings and the `.rst` doc files.
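The new list-like methods mirror the double-ended operations of Python's `collections.deque`. The sketch below uses a hypothetical `ToyReader`, not pylangacq's implementation: plain file-name strings stand in for parsed CHAT files, and each method takes single strings or lists of strings rather than whatever argument types the real `Reader` methods accept. Keeping the incoming order in `extend_left` is a choice made for this sketch only.

```python
from collections import deque
from typing import Iterable, List


class ToyReader:
    """Hypothetical list-like container; file-name strings stand in for CHAT data."""

    def __init__(self, files: Iterable[str] = ()) -> None:
        self._files = deque(files)

    def append(self, f: str) -> None:
        self._files.append(f)  # add to the right end

    def append_left(self, f: str) -> None:
        self._files.appendleft(f)  # add to the left end

    def extend(self, fs: Iterable[str]) -> None:
        self._files.extend(fs)  # add several items to the right end

    def extend_left(self, fs: Iterable[str]) -> None:
        # deque.extendleft() would reverse the incoming items;
        # reversing them first keeps their given order at the left end.
        self._files.extendleft(reversed(list(fs)))

    def pop(self) -> str:
        return self._files.pop()  # remove and return the rightmost item

    def pop_left(self) -> str:
        return self._files.popleft()  # remove and return the leftmost item

    def file_paths(self) -> List[str]:
        return list(self._files)


reader = ToyReader(["eve02.cha"])
reader.append("eve03.cha")
reader.append_left("eve01.cha")
reader.extend_left(["pre1.cha", "pre2.cha"])
print(reader.file_paths())
# ['pre1.cha', 'pre2.cha', 'eve01.cha', 'eve02.cha', 'eve03.cha']
```

Because the data files have a natural order, every operation here acts on one end or the other; there is no set-style `add`, `update`, or `remove`.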
Changed
- CHAT parsing in `Reader` instantiation has been completely rewritten. The previous private class `_SingleReader` has been removed. This private class duplicated much of the `Reader` code, which made the code hard to change.
- The `Reader` rewrite has also greatly sped up the reading and parsing of CHAT data.
- The `by_files` argument, which many `Reader` methods have, now gives you a simple list of results, one per data file, instead of the previous dict that mapped each file path to that file's result.
- The `participant` argument, which many `Reader` methods have for specifying which participants' data to include in the output, has been renamed `participants` to avoid confusion. Its behavior is unchanged: it accepts either a single string (e.g., `"CHI"`) or a collection of strings (e.g., `{"CHI", "MOT"}`).
- The following `Reader` methods have been renamed as indicated, some for stylistic or Pythonic reasons, others for the reasons given:
  - `age` -> `ages`
  - `number_of_utterances` -> `n_utterances`
  - `number_of_files` -> `n_files`
  - `filenames` -> `file_paths`
  - `MLU` -> `mlu`
  - `MLUm` -> `mlum`
  - `MLUw` -> `mluw`
  - `TTR` -> `ttr`
  - `IPSyn` -> `ipsyn`
  - `word_frequency` -> `word_frequencies`
  - `from_chat_str` -> `from_strs`
  - `from_chat_files` -> `from_files`
  - `add` -> `append`. Since the data files in a `Reader` have a natural ordering (by time of recording sessions, and therefore commonly by file paths as well), a reader is list-like, not an unordered set of data files as the name `add` would suggest.
  - `participant_codes` -> `participants`. Before this version, the methods `participant_codes` (for CHI, MOT, etc.) and `participants` (for, say, Eve, Mother, Investigator, etc.) co-existed, but in practice we mostly care only about CHI, MOT, etc. So the method `participants` for Eve etc. has been removed, and `participant_codes` has been renamed `participants`.
- Each participant's info in a header dictionary has these keys renamed:
  - `participant_name` -> `name`
  - `participant_role` -> `role`
  - `SES` -> `ses` (socioeconomic status)
- The class `DependencyGraph` has been made private (i.e., it is now `_DependencyGraph`, with a leading underscore). Its functionality hasn't really changed (it's used in the computation of IPSyn). It may be made more visible again in the future if more functionality related to grammatical relations is developed in the package.
- Switched to sphinx-rtd-theme as the documentation theme.
- Switched to CircleCI orbs; updated the versions of the dev requirements.
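To make the `participants` and `by_files` changes concrete, here is a toy sketch over hypothetical in-memory data. `FILES`, `normalize_participants`, and `toy_words` are made up for illustration and are not pylangacq code. The sketch shows a bare string being treated as one participant code, the flat versus per-file output shapes, and how the old path-to-result dict can be rebuilt by zipping file paths with per-file results. With a real reader, the analogous pairing would presumably be `zip(reader.file_paths(), reader.words(by_files=True))`, assuming both list the files in the same order.

```python
from typing import Collection, Dict, List, Optional, Set, Tuple, Union

# Hypothetical toy data: for each file, (participant code, word) pairs in order.
FILES: Dict[str, List[Tuple[str, str]]] = {
    "eve01.cha": [("CHI", "more"), ("MOT", "cookie"), ("CHI", "cookie")],
    "eve02.cha": [("MOT", "what"), ("CHI", "that")],
}


def normalize_participants(
    participants: Union[str, Collection[str], None]
) -> Optional[Set[str]]:
    """Accept "CHI" or a collection like {"CHI", "MOT"}; None means everyone."""
    if participants is None:
        return None
    if isinstance(participants, str):
        return {participants}  # a bare string is one code, not a set of letters
    return set(participants)


def toy_words(
    files: Dict[str, List[Tuple[str, str]]],
    participants: Union[str, Collection[str], None] = None,
    by_files: bool = False,
) -> Union[List[str], List[List[str]]]:
    """Hypothetical stand-in for a words(participants=..., by_files=...) call."""
    wanted = normalize_participants(participants)
    per_file = [
        [word for who, word in pairs if wanted is None or who in wanted]
        for pairs in files.values()
    ]
    if by_files:
        return per_file  # a plain list with one result per file, in file order
    return [word for words in per_file for word in words]  # flattened


print(toy_words(FILES, participants="CHI"))
# ['more', 'cookie', 'that']
print(toy_words(FILES, participants={"CHI", "MOT"}, by_files=True))
# [['more', 'cookie', 'cookie'], ['what', 'that']]

# Rebuilding the pre-0.13.0 dict shape by pairing each result with its file path.
old_style = dict(zip(FILES, toy_words(FILES, by_files=True)))
```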
Deprecated
- The following `Reader` methods have been deprecated:
  - `tagged_sents` (use `tokens` with `by_utterances=True` instead)
  - `tagged_words` (use `tokens` with `by_utterances=False` instead)
  - `sents` (use `words` with `by_utterances=True` instead)
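One common way to keep deprecated names working while steering callers to their replacements is a forwarding alias that emits a `DeprecationWarning`. The sketch below applies that pattern to the three mappings listed above; `ToyReader`, `_deprecated_alias`, and the `(word, pos)` token tuples are hypothetical stand-ins, not how pylangacq itself implements these methods.

```python
import warnings
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical (word, part-of-speech) pair


def _deprecated_alias(old_name: str, new_name: str, **fixed_kwargs):
    """Build a method that warns, then forwards to its replacement."""
    args = ", ".join(f"{k}={v!r}" for k, v in fixed_kwargs.items())

    def method(self):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_name}({args}) instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return getattr(self, new_name)(**fixed_kwargs)

    method.__name__ = old_name
    return method


class ToyReader:
    """Hypothetical reader holding utterances as lists of (word, pos) pairs."""

    def __init__(self, utterances: List[List[Token]]) -> None:
        self._utterances = utterances

    def tokens(self, by_utterances: bool = False):
        if by_utterances:
            return [list(utt) for utt in self._utterances]
        return [tok for utt in self._utterances for tok in utt]

    def words(self, by_utterances: bool = False):
        if by_utterances:
            return [[word for word, _ in utt] for utt in self._utterances]
        return [word for utt in self._utterances for word, _ in utt]

    # Deprecated names forward to their replacements, per the list above.
    tagged_sents = _deprecated_alias("tagged_sents", "tokens", by_utterances=True)
    tagged_words = _deprecated_alias("tagged_words", "tokens", by_utterances=False)
    sents = _deprecated_alias("sents", "words", by_utterances=True)


reader = ToyReader([[("more", "qn"), ("cookie", "n")], [("that", "pro")]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    old_sents = reader.sents()
    old_tagged = reader.tagged_sents()
```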
Removed
- The following methods of the `Reader` class have been removed:
  - `abspath`. Use `file_paths` instead.
  - `index_to_tiers`. All the unparsed tiers are now available from `utterances`.
  - `participant_codes`. It has been renamed `participants`, replacing the old `participants` method, which has been removed; see "Changed" above.
  - `part_of_speech_tags`
  - `update` and `remove`. A reader is a list-like collection of CHAT data files, not a set (which `update` and `remove` would suggest).
  - `search` and `concordance`. To search, use one of the `words`, `tokens`, and `utterances` methods to walk through a reader's CHAT data and keep track of elements of interest.
  - `date_of_birth`. The info is now available under `headers`, in each participant's `"dob"` key.
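As a sketch of the suggested replacement for `search` and `concordance`, the hypothetical helper below (not part of pylangacq) walks word lists utterance by utterance and records each match with a few words of context on either side. It assumes input shaped like the output of `words(by_utterances=True)`, that is, one list of words per utterance; toy data stands in for a real reader so the sketch runs on its own.

```python
from typing import Iterable, List, Sequence, Tuple

Hit = Tuple[List[str], str, List[str]]  # (left context, match, right context)


def concordance(
    utterances: Iterable[Sequence[str]], target: str, window: int = 2
) -> List[Hit]:
    """Collect every occurrence of `target` with up to `window` words on each side."""
    hits: List[Hit] = []
    for words in utterances:
        for i, word in enumerate(words):
            if word == target:
                left = list(words[max(0, i - window):i])
                right = list(words[i + 1:i + 1 + window])
                hits.append((left, word, right))
    return hits


# Toy input shaped like a words(by_utterances=True) result.
utterances = [["I", "want", "more", "cookie", "please"], ["cookie"], ["no"]]
for left, word, right in concordance(utterances, "cookie"):
    print(" ".join(left), f"[{word}]", " ".join(right))
```

Walking the data directly like this also makes it easy to match on something other than the word form, e.g. a part-of-speech tag from `tokens`.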
Fixed
- Handled `[/-]` in cleaning utterances.
- `[x <number>]` is now treated as a repetition of the previous word/item, not of the entire utterance.
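The `[x <number>]` fix is about scope: the marker repeats only the item right before it. The hypothetical `expand_repetitions` below illustrates that reading on a raw utterance string, handling either a single preceding word or an angle-bracketed `<...>` group. It is a sketch of the convention only, not pylangacq's cleaning code, and it ignores `[/-]` and every other CHAT marker.

```python
import re

# "[x N]" after either one word or an angle-bracketed group "<...>".
_REPEAT = re.compile(r"(<[^<>]*>|\S+)\s+\[x\s+(\d+)\]")


def expand_repetitions(utterance: str) -> str:
    """Hypothetical helper: expand "[x N]" to N copies of the preceding item only."""

    def repeat(match: re.Match) -> str:
        item, times = match.group(1), int(match.group(2))
        if item.startswith("<") and item.endswith(">"):
            item = item[1:-1]  # a <...> group is repeated as a whole
        return " ".join([item] * times)

    return _REPEAT.sub(repeat, utterance)


print(expand_repetitions("want cookie [x 3] ."))
# want cookie cookie cookie .
print(expand_repetitions("I <want more> [x 2] ."))
# I want more want more .
```

Note that the rest of each utterance ("want", "I") is left untouched, which is the point of the fix.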