
Welcome to the CudAD wiki!

Data generation

To train neural networks for chess engines, one requires a huge amount of data, and this data needs to be generated in the first place. Koivisto uses self-play games that start with 10 random moves. Both sides search with 5000 nodes and share the same hash table. A quick overview of the settings used in Koivisto is the following (a sketch of the adjudication logic follows the list):

- 10 random ply
- start recording fens after 16 ply
- both sides get access to TB
- adjudicate if the score is > 1500 for more than 4 plies
- adjudicate if the score is < 8 for more than 10 plies
- both sides SHARE the same hash table
- each evaluation uses 5000 nodes per search
- remove low depth pruning within the search move loop
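
The adjudication rules can be sketched as a small helper. The counter logic below is purely illustrative and not taken from CudAD or Koivisto; it assumes the thresholds apply to the absolute score and simply mirrors the settings listed above.

#include <cstdlib>

// illustrative adjudication sketch (hypothetical helper, not CudAD/Koivisto code)
struct Adjudicator {
    int winPlies  = 0;    // consecutive plies with |score| > 1500
    int drawPlies = 0;    // consecutive plies with |score| < 8

    // returns true if the game should be adjudicated after this ply
    bool update(int score) {
        winPlies  = std::abs(score) > 1500 ? winPlies  + 1 : 0;
        drawPlies = std::abs(score) <    8 ? drawPlies + 1 : 0;
        return winPlies > 4 || drawPlies > 10;
    }
};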

The required data format is:

FEN [wdl] score

where score is the score returned by the search. Ideally it should be from white's point of view. wdl is the outcome of the game, where 1.0 indicates that white has won, 0.5 indicates a draw and 0.0 means that black has won. An example of those files may look like this (a small parsing sketch follows the examples):

...
R7/7p/2rR2k1/2P5/8/p6P/r5P1/6K1 b - - 11 40 [0.5] 130
8/2N1r1k1/1p4p1/p1N2p1p/4pP2/1PRnP2P/P5P1/5K2 b - - 0 32 [0.5] 9
r2qr1k1/1p1n1pB1/p5pp/2p2b2/2P5/1P2nN1P/P2QBPP1/3RR1K1 w - - 1 16 [0.5] 32
4rknr/1p3pp1/p2q4/8/P5Q1/1P1N2Pp/3P1P1P/2R1KB2 w - - 4 16 [0.0] -162
4q2k/2p5/1p6/p6p/P4Q1P/2P2P1K/2n2B2/8 w - - 0 40 [1.0] 344
...
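
A minimal sketch of how such a line could be parsed is shown below. The struct and the function parse_line are purely illustrative and not part of CudAD; the actual text reader is used via read<TEXT> as described in the next section.

#include <string>

struct ParsedEntry {
    std::string fen;     // position in FEN notation
    float       wdl;     // 1.0 = white win, 0.5 = draw, 0.0 = black win
    int         score;   // search score, ideally from white's point of view
};

// illustrative parser for a single "FEN [wdl] score" line
ParsedEntry parse_line(const std::string& line) {
    ParsedEntry entry {};
    size_t open  = line.find('[');
    size_t close = line.find(']');
    entry.fen   = line.substr(0, open - 1);                           // text before " ["
    entry.wdl   = std::stof(line.substr(open + 1, close - open - 1));
    entry.score = std::stoi(line.substr(close + 1));
    return entry;
}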

Data preparation for training

In order to use the text files inside the CudAD code, one needs to convert them to binary packs. Luckily, CudAD features functions for this conversion. To read any text file, one can use:

DataSet ds = read<TEXT>(path_to_your_file);

Writing the dataset to binary format is as simple as:

write(ds, path_to_your_output_file);

read and write both accept one more optional argument which specifies the amount of positions to read or write. Binary datasets which have been written using the write function can be read using read<BINARY>(path_to_your_file). The first step should be to adjust the main.cpp and run:

init();

DataSet ds = read<TEXT>(path_to_your_file);
write(ds, path_to_your_output_file);

close();

for all the text files you provided. The second step of data preparation is shuffling, which shuffles the produced binpacks. This can be done using:

init();

std::vector<std::string> input_files{};
// fill the input_files with paths to your binary files

mix_and_shuffle_2(input_files, "outfile$.bin", 32);

close();

Here, the second argument is a template output file name. The code will search for the dollar sign $ and replace it with a number ranging from 1 to the third argument provided. In the case above, 32 files named outfile1.bin, outfile2.bin, ..., outfile32.bin would be created. These files are fully shuffled and can be directly used for training.
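
To sanity-check one of the shuffled files, it can be read back using read<BINARY>. The snippet below also uses the optional count argument mentioned earlier to limit how many positions are loaded; the value 100000 is only an example.

init();

// read back the first 100000 positions of one shuffled file
DataSet check = read<BINARY>("outfile1.bin", 100000);

close();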

Setting up the first tuning run

CudAD is built around specifying a config for your engine / tuning run. The configs, which describe the network architecture, the optimiser to use, the learning schedule etc., can be found within /src/arch/. Ideally you should write your own config. A very simple config could look like this:

class YourConfig{

public:
static constexpr int   Inputs        = 12 * 64;
static constexpr int   L2            = 128;
static constexpr int   Outputs       = 1;
static constexpr float SigmoidScalar = 2.5 / 400;

static Optimiser*      get_optimiser() {
    Adam* optim  = new Adam();
    optim->lr    = 1e-1;
    optim->beta1 = 0.95;
    optim->beta2 = 0.999;
//    optim->schedule.step = 5;
//    optim->schedule.gamma = 0.3;

    return optim;
}

static Loss* get_loss_function() {
    MPE* loss_f = new MPE(2.5, false);

    return loss_f;
}

static std::vector<LayerInterface*> get_layers() {
    auto* l1 = new DenseLayer<Inputs, L2, ReLU>();
    auto* l2  = new DenseLayer<L2, Outputs, Sigmoid>();
    dynamic_cast<Sigmoid*>(l2->getActivationFunction())->scalar = SigmoidScalar;

    return std::vector<LayerInterface*> {l1, l2};
}

static void assign_inputs_batch(DataSet&       positions,
                                SparseInput&   in1,
                                SparseInput&   in2,
                                SArray<float>& output,
                                SArray<bool>&  output_mask) {

    ASSERT(positions.positions.size() == in1.n);
    ASSERT(positions.positions.size() == in2.n);

    in1.clear();
    in2.clear();
    output_mask.clear();

#pragma omp parallel for schedule(static) num_threads(8)
    for (int i = 0; i < positions.positions.size(); i++)
        assign_input(positions.positions[i], in1, in2, output, output_mask, i);
}


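// maps a (square, piece) pair to one of the 12 * 64 = 768 input features:
// feature = square + piece type * 64 + (piece colour == black) * 384
// kingSquare is unused in this simple input set (a constant 0 is passed in)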
static int index(Square psq, Piece p, Square kingSquare, Color view) {
    //if(view != WHITE) {
    //    psq = mirrorVertically(psq);
    //}

    return psq +
           (getPieceType (p)         ) * 64 +
           (getPieceColor(p) != WHITE) * 64 * 6;
}

static void assign_input(Position&      p,
                         SparseInput&   in1,
                         SparseInput&   in2,
                         SArray<float>& output,
                         SArray<bool>&  output_mask,
                         int            id) {


    BB     bb {p.m_occupancy};
    int    idx = 0;

    while (bb) {
        Square sq                    = bitscanForward(bb);
        Piece  pc                    = p.m_pieces.getPiece(idx);

        auto view = p.m_meta.getActivePlayer();
        auto inp_idx = index(sq, pc, 0, view);

        in1.set(id, inp_idx);

        bb = lsbReset(bb);
        idx++;
    }

    float p_value = p.m_result.score;
    float w_value = p.m_result.wdl;

    // flip if black is to move -> relative network style
    //if (p.m_meta.getActivePlayer() == BLACK) {
    //   p_value = -p_value;
    //    w_value = -w_value;
    //}

    float p_target = 1 / (1 + expf(-p_value * SigmoidScalar));
    float w_target = (w_value + 1) / 2.0f;

    output(id)      = (p_target + w_target) / 2;
    output_mask(id) = true;
}
};

Required fields are Inputs and Outputs. In this case the config features a simple architecture with 768 inputs, 128 hidden neurons and one output neuron. The architecture can be adjusted inside get_layers(): multiple layers can be added, weight clamping can be enabled and many different layer types can be used.
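
As a sketch, a deeper network could look like the following. The intermediate sizes 512 and 32 are arbitrary placeholders, only meant to illustrate that get_layers() may return more than two layers:

static std::vector<LayerInterface*> get_layers() {
    auto* l1 = new DenseLayer<Inputs, 512, ReLU>();
    auto* l2 = new DenseLayer<512, 32, ReLU>();
    auto* l3 = new DenseLayer<32, Outputs, Sigmoid>();
    dynamic_cast<Sigmoid*>(l3->getActivationFunction())->scalar = SigmoidScalar;

    return std::vector<LayerInterface*> {l1, l2, l3};
}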

Besides the architecture, the way positions are mapped to inputs is relevant. An example loop is presented which uses the index function to compute the input index. Since the network only features one input, all sparse features are written to the first given sparse input, in1. In the example above, the target which the network trains on is the average of p_target, which is the score from the dataset converted to a win-draw-loss ratio, and the win-draw-loss score from the dataset:

output(id)      = (p_target + w_target) / 2;
output_mask(id) = true;
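
As a worked example, take the last sample line from the data section above, a white win with a score of 344 (numbers rounded):

// sample line "... [1.0] 344": white win, search score 344
// p_target = 1 / (1 + exp(-344 * 2.5 / 400)) = 1 / (1 + exp(-2.15)) ≈ 0.896
// w_target = 1.0 (white win)
// target   = (0.896 + 1.0) / 2 ≈ 0.948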

Starting the training

The config can be plugged into the trainer class:

const string output    = "../resources/runs/experiment_1/";

vector<string> files {};
// add binary files here

Trainer<YourConfig> trainer {};
trainer.fit(files, files, output);

The output string is used to save the resulting loss history and the trained networks. Additionally, one needs to create a vector of paths to the shuffled binary datasets. Lastly, this can be plugged into the trainer and training can begin.
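
Putting the pieces together, a complete entry point could look roughly like the sketch below. The file names are placeholders for the shuffled binaries produced earlier, and wrapping the trainer in init() / close() is assumed here by analogy with the data-preparation snippets:

init();

const string output = "../resources/runs/experiment_1/";

vector<string> files {};
files.push_back("outfile1.bin");    // shuffled binary files from the shuffling step
files.push_back("outfile2.bin");
// ...

Trainer<YourConfig> trainer {};
trainer.fit(files, files, output);

close();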
