[WIP] C API Introduction
This page contains an initial introduction to the VW C API, including the rationale behind its creation as well as the high-level design concepts and principles that will need to be followed to maintain a consistent interface.
This page does NOT contain any real function signatures. Any object or function found here should be taken as a pseudocode example used to illustrate specific concepts or principles. Many of the specific designs and patterns the API uses will necessarily be guided by the limitations imposed by the C language.
Currently, VW does not have an official API surface. Put another way, every header file in VW is effectively part of the API.
Every language binding in VW binds to and exposes different objects and functions. This results in inconsistent workflows and capabilities across our supported languages. Additionally, the Python bindings (which currently use our C++ interface) have two problems, both of which a rich C interface will solve. The first is that the boost-python binaries need to be installed to compile or run the VW library. The second is a binary incompatibility between the macOS C++ libraries and Anaconda's Python binaries (see issue #2100).
A well-defined API surface will also allow internal code changes to be made without the risk of changing or removing functionality consumers of the library depend on. Finally, a carefully designed C API can potentially allow us to maintain backward ABI compatibility, which would open the possibility of using dynamically loaded libraries for faster client-side deployments. This final point should be considered a stretch goal though, as maintaining a proper ABI requires immense care.
- VW is a library first. The command line tool will be functionality added on top of it
- The only entry point into the core VW library will be the C interface
- At minimum, the following modules will need to be migrated to the new interface:
  - All language bindings (including the creation of a new C++ interface, which will bind to the C interface)
  - The command line tool
  - Any external libraries that use VW
  - Any end-to-end tests that currently use any part of the C++ interface
- The following will NOT need to be migrated:
  - Unit tests
  - Possibly some functional tests
- Existing functionality should be preserved as much as possible. Legacy language bindings should be recreated on top of the new API if at all possible
  - There may be some existing functionality that is either impossible to replicate or may not make sense anymore. These cases should be discussed on a case-by-case basis
- The library should own all memory in the following cases:
  - The memory represents an internal data structure (eg: example)
  - A pointer or reference to the memory is saved anywhere, in any form, within the library
- In the case of data that is used strictly as const input parameters (eg: string constants), the caller should own the memory and the library should perform a copy if necessary.
- The C API should be designed for a power user, allowing for maximal functionality and flexibility. Simplified interfaces will be built on top of it.
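The ownership rules above can be sketched in a few lines of C. This is an illustration only: `VWExample`, `vw_create_example`, `vw_example_set_tag`, `vw_example_get_tag`, and `vw_destroy_example` are hypothetical names invented for this sketch, not real VW signatures, and the struct would be opaque in a real header. The point is that the caller keeps ownership of the `const` input string while the library stores its own deep copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a real header would expose only the typedef. */
typedef struct VWExample {
  char* tag; /* library-owned copy of the caller's string */
} VWExample;

/* Allocates a library-owned object; returns 0 on success, nonzero on failure. */
int vw_create_example(VWExample** example_out) {
  VWExample* example = calloc(1, sizeof *example);
  if (example == NULL) return 1;
  *example_out = example;
  return 0;
}

/* The caller keeps ownership of `tag`; the library saves a deep copy. */
int vw_example_set_tag(VWExample* example, const char* tag) {
  size_t len = strlen(tag) + 1;
  char* copy = malloc(len);
  if (copy == NULL) return 1;
  memcpy(copy, tag, len);
  free(example->tag); /* release any previously stored copy */
  example->tag = copy;
  return 0;
}

/* Hands out a pointer to library-owned memory; the caller must not free it. */
int vw_example_get_tag(const VWExample* example, const char** tag_out) {
  *tag_out = example->tag;
  return 0;
}

/* Frees everything the library allocated for this object. */
int vw_destroy_example(VWExample* example) {
  if (example == NULL) return 1;
  free(example->tag);
  free(example);
  return 0;
}

/* Usage: the caller's buffer may change after the call without affecting the library. */
void ownership_demo(void) {
  char buffer[] = "first";
  VWExample* example = NULL;
  const char* stored = NULL;
  assert(vw_create_example(&example) == 0);
  assert(vw_example_set_tag(example, buffer) == 0);
  buffer[0] = 'X'; /* mutate caller-owned memory */
  assert(vw_example_get_tag(example, &stored) == 0);
  assert(strcmp(stored, "first") == 0);
  assert(vw_destroy_example(example) == 0);
}
```

Copying on save costs an allocation, but it means the library never holds a pointer whose lifetime the caller controls.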
- Output parameters come at the end of the parameter list
  - Parameters that are both input and outputs need not follow this rule. Place them wherever makes the most sense.
- Every API function will return a status code. Function outputs will be returned via out-param
  - Language-specific bindings should hide this detail and return errors in a language-idiomatic way
- The library owns all memory associated with opaque types
- The library owns all memory associated with a pointer or reference that is saved within the library
  - Deep copies should be made of all const pointer types that need to be saved
- All objects created via a `create` call must be destroyed via a `destroy` call
  - The library will never take ownership of memory created in this way. The objects should be copied if necessary
- The implementation for any API functions in header `<blah>.h` must be in `<blah>.cc`
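As a rough illustration of the status-code and out-param conventions, the sketch below returns a status from every function and places outputs last in the parameter list. All names here (`VWStatus`, `VWWorkspace`, `vw_create_workspace`, `vw_workspace_read_weight`, `vw_destroy_workspace`) and the weight-array layout are assumptions made for this example, in keeping with this page's pseudocode-only policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes; the real set is not defined on this page. */
typedef enum VWStatus {
  VW_SUCCESS = 0,
  VW_INVALID_ARGUMENT = 1,
  VW_OUT_OF_MEMORY = 2
} VWStatus;

/* Hypothetical workspace; it would be an opaque type in a real header. */
typedef struct VWWorkspace {
  size_t num_weights;
  float* weights;
} VWWorkspace;

/* Input first, output last; the result comes back through the out-param. */
VWStatus vw_create_workspace(size_t num_weights, VWWorkspace** workspace_out) {
  VWWorkspace* workspace;
  if (workspace_out == NULL || num_weights == 0) return VW_INVALID_ARGUMENT;
  workspace = malloc(sizeof *workspace);
  if (workspace == NULL) return VW_OUT_OF_MEMORY;
  workspace->weights = calloc(num_weights, sizeof *workspace->weights);
  if (workspace->weights == NULL) {
    free(workspace);
    return VW_OUT_OF_MEMORY;
  }
  workspace->num_weights = num_weights;
  *workspace_out = workspace;
  return VW_SUCCESS;
}

/* Copies one weight into caller-owned storage; fails on a bad index. */
VWStatus vw_workspace_read_weight(const VWWorkspace* workspace, size_t index,
                                  float* weight_out) {
  if (workspace == NULL || weight_out == NULL) return VW_INVALID_ARGUMENT;
  if (index >= workspace->num_weights) return VW_INVALID_ARGUMENT;
  *weight_out = workspace->weights[index];
  return VW_SUCCESS;
}

/* Every create call is paired with a destroy call. */
VWStatus vw_destroy_workspace(VWWorkspace* workspace) {
  if (workspace == NULL) return VW_INVALID_ARGUMENT;
  free(workspace->weights);
  free(workspace);
  return VW_SUCCESS;
}

/* Usage: check the status first, then trust the out-param. */
void status_demo(void) {
  VWWorkspace* workspace = NULL;
  float weight = -1.0f;
  assert(vw_create_workspace(4, &workspace) == VW_SUCCESS);
  assert(vw_workspace_read_weight(workspace, 2, &weight) == VW_SUCCESS);
  assert(vw_workspace_read_weight(workspace, 9, &weight) == VW_INVALID_ARGUMENT);
  assert(vw_destroy_workspace(workspace) == VW_SUCCESS);
}
```

A language binding built on top of this would typically translate a non-success status into an exception or an error value, so the out-param pattern stays hidden from its users.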
Naming conventions should generally follow the GTK coding style
- Object names should be pascal cased and prefixed with VW -- eg: `VWPascalCase`
- Function and variable names should be snake cased -- eg: `vw_snake_case`
- All function names must be prefixed in the following order:
  - `vw` -- eg: `vw_workspace`
  - `create`/`destroy` if applicable -- eg: `vw_create_workspace`
  - The component the function is operating on -- eg: `vw_create_workspace` or `vw_example_setup`
  - `get` if applicable -- eg: `vw_example_get_feature_space`
- Functions that allocate and return a pointer to in-library memory must be prefixed with `create`
- Functions that free a pointer containing in-library memory must be prefixed with `destroy`
- Functions that return a pointer to an internal data structure must be prefixed with `get`
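To make the prefix order concrete, here is a small sketch that applies the naming rules to a hypothetical example object. The type layouts and the `feature_count` field are invented for illustration; only the name shapes (`VWPascalCase` objects, `vw_snake_case` functions, and the `create`/`destroy`/`get` prefixes) come from the rules above.

```c
#include <assert.h>
#include <stdlib.h>

/* Object names: pascal case with a VW prefix. */
typedef struct VWFeatureSpace {
  size_t feature_count; /* hypothetical field, for illustration only */
} VWFeatureSpace;

typedef struct VWExample {
  VWFeatureSpace feature_space; /* internal data owned by the example */
} VWExample;

/* vw + create + component: allocates and returns in-library memory. */
int vw_create_example(VWExample** example_out) {
  VWExample* example = calloc(1, sizeof *example);
  if (example == NULL) return 1;
  *example_out = example;
  return 0;
}

/* vw + component + get: returns a pointer to an internal data structure. */
int vw_example_get_feature_space(VWExample* example,
                                 VWFeatureSpace** feature_space_out) {
  *feature_space_out = &example->feature_space;
  return 0;
}

/* vw + destroy + component: frees what the matching create call allocated. */
int vw_destroy_example(VWExample* example) {
  free(example);
  return 0;
}

/* Usage: the pointer from the get call aliases memory inside the example. */
void naming_demo(void) {
  VWExample* example = NULL;
  VWFeatureSpace* feature_space = NULL;
  assert(vw_create_example(&example) == 0);
  assert(vw_example_get_feature_space(example, &feature_space) == 0);
  assert(feature_space == &example->feature_space);
  assert(vw_destroy_example(example) == 0);
}
```

Because the `get` prefix signals a pointer into library-owned memory, a caller reading the name alone can tell that the result must not be freed and becomes invalid once the owning object is destroyed.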
The language features allowed under the standard C specifications are very limited, and may be surprising to the typical C++ developer. Listed below are some of the C++ features that are not available in C.
- Object-oriented functionality
  - Private member variables
  - Member functions
    - Function pointers are allowed
  - Inheritance, polymorphism, or any form of encapsulation
- Function overloading
  - Function names cannot be the same regardless of the type or number of arguments
- References
  - Pointers must be used instead
- Default parameter values
- Namespaces
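The sketch below shows common C substitutes for several of the features listed above. Every name in it (`VWLossFn`, `vw_clamp_float`, `vw_clamp_int`, `vw_compute_loss`) is hypothetical and chosen only to illustrate the pattern: a function pointer in place of a member function, distinct names in place of overloads, a pointer in place of a reference, a `NULL` argument in place of a default parameter, and the `vw_` prefix in place of a namespace.

```c
#include <assert.h>
#include <stddef.h>

/* Function pointer type standing in for a member or virtual function. */
typedef float (*VWLossFn)(float prediction, float label);

static float squared_loss(float prediction, float label) {
  float diff = prediction - label;
  return diff * diff;
}

static float absolute_loss(float prediction, float label) {
  float diff = prediction - label;
  return diff < 0.0f ? -diff : diff;
}

/* No overloading: each argument type needs its own function name. */
float vw_clamp_float(float value, float lo, float hi) {
  return value < lo ? lo : (value > hi ? hi : value);
}

int vw_clamp_int(int value, int lo, int hi) {
  return value < lo ? lo : (value > hi ? hi : value);
}

/* No references: the result is written through a pointer.
 * No default parameters: passing NULL for loss_fn selects squared loss. */
int vw_compute_loss(VWLossFn loss_fn, float prediction, float label,
                    float* loss_out) {
  if (loss_out == NULL) return 1;
  if (loss_fn == NULL) loss_fn = squared_loss;
  *loss_out = loss_fn(prediction, label);
  return 0;
}

/* Usage: the same entry point dispatches to whichever function is passed in. */
void limitations_demo(void) {
  float loss = 0.0f;
  assert(vw_compute_loss(NULL, 3.0f, 1.0f, &loss) == 0);
  assert(loss == 4.0f); /* default: (3 - 1)^2 */
  assert(vw_compute_loss(absolute_loss, 1.0f, 3.0f, &loss) == 0);
  assert(loss == 2.0f); /* |1 - 3| */
  assert(vw_clamp_int(12, 0, 10) == 10);
  assert(vw_clamp_float(-1.0f, 0.0f, 1.0f) == 0.0f);
}
```

Private member variables have no direct equivalent either. The usual substitute is an opaque struct that is only forward-declared in the public header and defined in the implementation file.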