Coding Guidelines
Jack Gerrits edited this page May 4, 2021 · 13 revisions
This page is primarily targeted at maintainers and contributors.
These guidelines aim to form a general consensus about how code is written and to help progress towards the general goal of the VW project: to create a very fast, efficient, and capable learning algorithm.
- All variables should be initialized. If explicitly allocating memory (malloc), ensure it has been zeroed.
- Memory allocation is avoided by reusing allocated memory where possible.
- Floats are preferred over doubles. Doubles are only used for accumulators.
- Templates are used to eliminate duplicate code and in some places to remove branches from inner loops.
- Pass by reference is the default, except for objects of pointer size. Use a const reference whenever possible.
- All learning reductions are confined to a single file with a single entry point.
- Learning reductions transform an example from one problem type to another.
- A problem type is defined by (label, prediction, features)
- Don't manually manage memory. Don't use `new`/`delete`, `malloc`/`free`. Use RAII and smart pointers whenever possible.
  - `std::unique_ptr` is used to represent unique ownership
  - `std::shared_ptr` is used to represent shared ownership
- I/O functions should come in pairs, e.g. `read`/`write`
- Function pointer interfaces are explicit.
- All examples are handled by the same stack of reductions.
- Examples are passed by function call.
- It’s not working until:
  - There are no warnings
  - Valgrind says it’s clean
  - It is running in CI
- Prefer to use fixed size types where possible. Example: `uint32_t`
- Untagged unions are not allowed
- C style casts are not allowed. Use `static_cast`/`const_cast`/`reinterpret_cast`
  - Checkable and fixable with `clang-tidy` with check `google-readability-casting`
- No direct access to `std::cout`, `std::cerr`. Use the VW logging interface.
- Use future compat whenever possible. This covers features that are available in newer versions of C++ but that we must support conditionally because VW targets C++11.
  - Use `constexpr` whenever possible. Use future compat if it requires C++14 or above.
  - Use `nodiscard` whenever possible
- Code is not fast unless a benchmark proves it
- Reductions should not keep a reference to the all object. They should keep a reference to only what they need.
- Rule of 0/3/5
- Use scoped enums over C enums
- Avoid global state
- Undefined behavior is not permitted. Not even if it is faster. Correctness is more important than speed.
- If an error state can be handled locally, that should always be what we do.
- All memory allocation should go through `memory.h` with errors handled by exception (the default) or crash (conditional compile). In the future, we may add the ability to specify a memory allocator.
- No new exceptions in the VW slim codepath when that is incorporated.
- We avoid new exceptions in the example handling path and have a goal of removing any others that exist.
- Where error states are unavoidable (i.e. situations like arguments-don’t-make-sense) we’ll accept off-critical-path exceptions. A proposal for refactoring around return codes is welcome, but that will need to be driven by Rajan and is subject to priorities. At the library interface level, I believe this can be handled by having an explicit catch in the library interface for the setup calls.
- There is a `.clang-format` file, which outlines the general style
- Class member variables should be prefixed with `_`. Example: `uint32_t _number_of_actions;`
- All types should be nested in the `VW` namespace
  - This is a work in progress
  - Ultimately this namespace will become `vw`. `VW` -> `vw`
- All blocks should be surrounded with braces. Single statement blocks are permitted to be on the same line. This can be easily fixed with tooling like so:
  - `clang-tidy -p build -fix -format-style=file -config="{Checks: 'readability-braces-around-statements'}" vowpalwabbit/<your_file>`
    - Passing `config` like this overrides the default checks specified in the .clang-tidy file
    - `-format-style=file` means it will also format the fixed code according to the .clang-format file
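
Several of the guidelines above are easier to see together in code. The following is only a minimal sketch, not code from the VW repository: the namespace `guideline_sketch` and the names `score_kind`, `sum_values_impl`, `sum_values`, `action_scores`, and `make_action_scores` are all hypothetical. It assumes a compiler with C++14 available, because `std::make_unique` is not part of C++11; a C++11 build would need an equivalent helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

namespace guideline_sketch  // hypothetical namespace, not part of VW
{
// Scoped enum with a fixed size underlying type, instead of a C enum.
enum class score_kind : uint32_t
{
  raw,
  squared
};

// The bool template parameter is known at compile time, so each
// instantiation's inner loop carries no runtime branch on it.
template <bool squared>
float sum_values_impl(const std::vector<float>& values)  // const reference, no copy
{
  double sum = 0.0;  // values are floats; only the accumulator is a double
  for (float v : values) { sum += squared ? static_cast<double>(v) * v : static_cast<double>(v); }
  return static_cast<float>(sum);  // named cast rather than a C style cast
}

// The runtime decision is made once, outside the loop.
inline float sum_values(score_kind kind, const std::vector<float>& values)
{
  if (kind == score_kind::squared) { return sum_values_impl<true>(values); }
  return sum_values_impl<false>(values);
}

// Rule of 0: the vector member owns its memory, so no destructor,
// copy, or move operations need to be written by hand.
class action_scores
{
public:
  explicit action_scores(uint32_t number_of_actions) : _scores(number_of_actions, 0.f) {}

  void set(uint32_t action, float score)
  {
    if (action < _scores.size()) { _scores[action] = score; }
  }

  float total(score_kind kind) const { return sum_values(kind, _scores); }

  uint32_t number_of_actions() const { return static_cast<uint32_t>(_scores.size()); }

private:
  std::vector<float> _scores;  // member prefixed with '_', zero-initialized above
};

// Unique ownership is visible in the return type; no new/delete at the call site.
// Assumption: std::make_unique (C++14) is available.
inline std::unique_ptr<action_scores> make_action_scores(uint32_t number_of_actions)
{
  return std::make_unique<action_scores>(number_of_actions);
}
}  // namespace guideline_sketch
```

A caller would hold the result of `make_action_scores(3)` in a `std::unique_ptr`, fill it with `set`, and read `total(score_kind::raw)` or `total(score_kind::squared)`. Whether the templated loop is actually faster than a plain branch is exactly the kind of claim that, per the guidelines, needs a benchmark.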
This is a list of improvements that we want to make to the code. Any help implementing them is of course welcome.
- Change the io_buf structure to run in its own thread. Currently, reading bits into program space operates synchronously with parsing, which implies that delays in the return of read() delay parsing. This should speed up all input forms (daemon, stdin, file).
- Change the text parser to work in a read-once fashion. Currently, input strings are read multiple times.
- There should be a way to use the algorithms as a library.
- Alternate learning algorithms. We have basic matrix factorization, which needs to be developed further. We also want to push into more complex nonconvex algorithms.
- Learning reductions. Previously, we've used VW as a library to implement learning reductions against, but adding a layer of abstraction in the system allowing reductions to directly operate should be doable, and desirable. Especially in a cluster parallel environment, directly supporting learning reductions appears superior to a library implementation.