[WIP] C API Proposal
Note: The object, function, and parameter names used in this document are not final. Each name simply describes the item's intended purpose; final names will need to be properly namespaced and C-ified.
enum HashType
{
VW_DEFAULT_HASH,
VW_STRING_HASH,
VW_BYTE_HASH
};
enum ErrorCode { /* TBD */ };
struct vw_feature // Don’t expose the internal VW feature struct.
{
float value;
size_t weight_index;
};
struct primitive_feature_space // For manual construction and manipulation of an example's features
{
unsigned char name;
vw_feature* fs;
size_t len;
};
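As a rough illustration of how a caller might build a feature space by hand with the structs above, here is a minimal sketch in C. The helper names `feature_space_init` and `feature_space_free` are hypothetical and not part of the proposal, and the sketch assumes the feature array is owned by the struct that points to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the proposed structs above; names are placeholders, not final. */
typedef struct vw_feature {
  float value;
  size_t weight_index;
} vw_feature;

typedef struct primitive_feature_space {
  unsigned char name; /* single-character namespace identifier */
  vw_feature* fs;     /* array of len features, assumed owned by this struct */
  size_t len;
} primitive_feature_space;

/* Hypothetical helper: allocate a zeroed feature array with room for len features.
   Returns 0 on success, -1 on allocation failure. */
int feature_space_init(primitive_feature_space* space, unsigned char name, size_t len)
{
  assert(space != NULL);
  space->name = name;
  space->fs = NULL;
  space->len = 0;
  if (len == 0) return 0;
  space->fs = calloc(len, sizeof(vw_feature));
  if (space->fs == NULL) return -1;
  space->len = len;
  return 0;
}

/* Hypothetical helper: release the feature array and reset the struct. */
void feature_space_free(primitive_feature_space* space)
{
  assert(space != NULL);
  free(space->fs);
  space->fs = NULL;
  space->len = 0;
}
```

A caller would fill `fs[i].value` and `fs[i].weight_index` after `feature_space_init` succeeds, then call `feature_space_free` once the example no longer needs the features.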
The following types would allow for custom reductions to be plugged into the VW reduction stack. This functionality currently does not exist in any form
vw* initialize(std::string s, io_buf* model = nullptr, bool skipModelLoad = false, trace_message_t trace_listener = nullptr, void* trace_context = nullptr);
vw* initialize(int argc, char* argv[], io_buf* model = nullptr, bool skipModelLoad = false, trace_message_t trace_listener = nullptr, void* trace_context = nullptr);
vw* seed_vw_model(vw* vw_model, std::string extra_args, trace_message_t trace_listener = nullptr, void* trace_context = nullptr);
// Allows the input command line string to have spaces escaped by '\'
vw* initialize_escaped(std::string const& s, io_buf* model = nullptr, bool skipModelLoad = false, trace_message_t trace_listener = nullptr, void* trace_context = nullptr);
void cmd_string_replace_value(std::stringstream*& ss, std::string flag_to_replace, std::string new_value);
VW_DEPRECATED("By value version is deprecated, pass std::string by const ref instead using `to_argv`")
char** get_argv_from_string(std::string s, int& argc);
// The argv array from both of these functions must be freed.
char** to_argv(std::string const& s, int& argc);
char** to_argv_escaped(std::string const& s, int& argc);
void free_args(int argc, char* argv[]);
const char* are_features_compatible(vw& vw1, vw& vw2);
/*
Call finish() after you are done with the vw instance. This cleans up memory usage.
*/
void finish(vw& all, bool delete_all = true);
void sync_stats(vw& all);
enum ReductionType { /* TBD */ };
enum ReductionDataType { /* TBD */ };
// Maybe pass around copies of this to avoid the question of memory ownership. It's lightweight and
// won't be used in the hot path, so the perf impact is minimal. We will need to be careful about
// versioning the struct correctly if we need to extend it.
struct reduction
{
predict_fn predict;
learn_fn learn;
ReductionDataType input_data_type;
ReductionDataType output_data_type;
ReductionType type = CUSTOM;
... // needs to partially mimic the learner struct
};
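To make the pass-by-copy and versioning remarks concrete, here is a minimal sketch in C of a reduction descriptor built from function pointers. Everything beyond the field names shown above is an assumption: the callback signatures, the enum values, the `state` pointer, and the `struct_size` field (one common way to version a struct, not something the proposal specifies) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder enum values; the proposal leaves both enums TBD. */
typedef enum ReductionType { REDUCTION_BUILTIN, REDUCTION_CUSTOM } ReductionType;
typedef enum ReductionDataType { DATA_SCALAR } ReductionDataType;

/* Assumed callback shapes: opaque user state plus one scalar input. */
typedef float (*predict_fn)(void* state, float input);
typedef void (*learn_fn)(void* state, float input, float label);

typedef struct reduction {
  size_t struct_size; /* caller sets sizeof(reduction) so a library could detect older layouts */
  predict_fn predict;
  learn_fn learn;
  ReductionDataType input_data_type;
  ReductionDataType output_data_type;
  ReductionType type;
  void* state; /* user-owned; copying the struct copies only the pointer */
} reduction;

/* Toy custom reduction: adds a learned bias to the input. */
float bias_predict(void* state, float input)
{
  float* bias = state;
  return *bias + input;
}

/* Moves the bias halfway toward the value that would have matched the label. */
void bias_learn(void* state, float input, float label)
{
  float* bias = state;
  *bias += 0.5f * (label - (*bias + input));
}

/* Returns the descriptor by value, so no ownership of the struct itself changes hands. */
reduction make_bias_reduction(float* bias)
{
  reduction r;
  assert(bias != NULL);
  r.struct_size = sizeof(reduction);
  r.predict = bias_predict;
  r.learn = bias_learn;
  r.input_data_type = DATA_SCALAR;
  r.output_data_type = DATA_SCALAR;
  r.type = REDUCTION_CUSTOM;
  r.state = bias;
  return r;
}
```

Because the descriptor holds only pointers and enums, copies are cheap and all copies act on the same user-owned state.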
Note: Function signatures TBD. The following type names are used as function pointers in these documents
trace_message_t // A handler for trace logs. Null will result in no trace log handling
example_factory_t // A factory to create examples from. In practice, this should probably just be a memory pool of some sort
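Since the signatures are still TBD, the following C sketch only illustrates the intended behaviour of a trace handler, including the rule that a null handler disables trace handling. The callback signature, the `emit_trace` helper, and the `trace_buffer` context type are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed signature; the proposal leaves it TBD. */
typedef void (*trace_message_t)(void* trace_context, const char* message);

/* Hypothetical library-side helper: forwards a message, or drops it when the handler is null. */
void emit_trace(trace_message_t handler, void* trace_context, const char* message)
{
  if (handler == NULL) return;
  handler(trace_context, message);
}

/* Example context: a fixed-size buffer that accumulates trace text. */
typedef struct trace_buffer {
  char text[256];
  size_t used;
} trace_buffer;

/* Example handler: appends the message to the buffer, truncating if it would overflow. */
void buffer_trace_handler(void* trace_context, const char* message)
{
  trace_buffer* buf = trace_context;
  size_t room;
  size_t n;
  assert(buf != NULL);
  room = sizeof(buf->text) - 1 - buf->used; /* keep one byte for the terminator */
  n = strlen(message);
  if (n > room) n = room;
  memcpy(buf->text + buf->used, message, n);
  buf->used += n;
  buf->text[buf->used] = '\0';
}
```

Passing the context as a `void*` alongside the handler lets bindings route trace output to their own logging facilities without global state.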
typedef void* vw;
typedef void* example;
typedef void* options;
The following typedefs would allow for custom reductions to be plugged into the VW reduction stack. This functionality currently does not exist in any form
typedef void* ReductionStack; // What's the interaction between ReductionStack and vw?
vw – The internal representation of the VW object. Possibly in a wrapper to allow easy import and export of data across the API
example – The internal representation of the example object. Probably won't need a wrapper
options – Some well-defined options object. Possibly a flatbuf or protobuf. TBD
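The opaque-handle idea behind these typedefs can be sketched in C as follows. This is only an illustration under stated assumptions: the function names `vw_create`, `vw_examples_seen`, and `vw_destroy`, the `ErrorCode` values, and the `vw_impl` layout are all hypothetical, since the proposal leaves them undefined.

```c
#include <assert.h>
#include <stdlib.h>

typedef void* vw; /* opaque handle, as in the typedef above */

/* Placeholder codes; the proposal leaves ErrorCode TBD. */
typedef enum ErrorCode { VW_SUCCESS = 0, VW_INVALID_ARGUMENT, VW_OUT_OF_MEMORY } ErrorCode;

/* Private to the library in this sketch: callers never see this layout. */
struct vw_impl {
  unsigned long examples_seen;
};

/* Creates an instance and hands back an opaque handle through out_handle. */
ErrorCode vw_create(vw* out_handle)
{
  struct vw_impl* impl;
  if (out_handle == NULL) return VW_INVALID_ARGUMENT;
  impl = calloc(1, sizeof(*impl));
  if (impl == NULL) return VW_OUT_OF_MEMORY;
  *out_handle = impl;
  return VW_SUCCESS;
}

/* Reads internal state through the handle; results travel via out parameters. */
ErrorCode vw_examples_seen(vw handle, unsigned long* out_count)
{
  struct vw_impl* impl = handle;
  if (impl == NULL || out_count == NULL) return VW_INVALID_ARGUMENT;
  *out_count = impl->examples_seen;
  return VW_SUCCESS;
}

/* Releases the instance; the handle must not be used afterwards. */
ErrorCode vw_destroy(vw handle)
{
  free(handle);
  return VW_SUCCESS;
}
```

Returning an error code from every call and passing results through out parameters keeps the internal layout free to change without breaking language bindings.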
The following types would allow for custom reductions to be plugged into the VW reduction stack. This functionality currently does not exist in any form
ReductionStack – Should we treat the stack as an array or a linked list? Internally it’s a linked list, but an array is easier to manipulate for a user
struct ReductionStack {
Stack<learner> _reduction_stack;
size_t _size?;
vw* _vw;
};
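One way to reconcile the two options is to keep the linked list internally and give users an array view on request. The C sketch below shows that idea; the `learner` node layout and the `reduction_stack_to_array` helper are placeholders invented for this example, not VW's actual types.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder node type standing in for VW's internal learner chain. */
typedef struct learner {
  const char* name;
  struct learner* next; /* next reduction down the stack, NULL at the base */
} learner;

/* Hypothetical helper: copies up to capacity node pointers into out, top of stack first.
   Returns the total number of nodes in the list, which may exceed capacity.
   Calling it with out == NULL just counts the nodes. */
size_t reduction_stack_to_array(learner* top, learner** out, size_t capacity)
{
  size_t count = 0;
  learner* node;
  for (node = top; node != NULL; node = node->next) {
    if (out != NULL && count < capacity) out[count] = node;
    ++count;
  }
  return count;
}
```

A caller can invoke the helper once with a null output to learn the size, allocate an array of that length, and call it again to fill the view.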
The proposed API functions are divided into separate documents based on their primary purpose. Some functions can fall into multiple categories, and these are split relatively arbitrarily based on my own judgement. Each document is split into three sections:
- The current C++ API surface that most language bindings use (found in vw.h).
- The interfaces that will be deprecated in the proposal.
- The proposed functionality.
- Remove `import_example`? It is unclear what it does.
- Remove `export_example`? It is unclear what it does.
- `parse_label` should return a label.
- Rename `new_unused_example` -> `allocate_example`?
- Does `num_weights` make sense? Isn't it sparse?
- `feature_space` stuff should all be merged into one place.
- I think this is the right time to remove prediction and label from the example at an API level. It fixes and cleans up A LOT.