Make bulk loader work with arrays instead of strings #1101

MatMoore · 2017-12-18T16:26:41Z

Moved from https://trello.com/c/8zhBPuQT/12-make-bulk-loader-work-with-arrays-instead-of-strings.

What

Every night a job runs to rebuild the search index with new popularity data.
https://github.com/alphagov/search-analytics/blob/master/nightly-run.sh

The bulk load script accepts text from standard input, representing Elasticsearch documents. It then calls the indexing code that is shared with regular indexing, even though the argument type is different (a raw string rather than an array of hashes).

This makes the code difficult to work on, because any value passed through it can be either a string or an array of hashes. The complexity affects all of the indexing code, for example:

    def bulk_payload(document_hashes_or_payload)
      if document_hashes_or_payload.is_a?(Array)
        index_items_from_document_hashes(document_hashes_or_payload)
      else
        index_items_from_raw_string(document_hashes_or_payload)
      end
    end

Why

There are two separate code paths that do essentially the same thing. Any change to this code has to be made carefully in both paths, in the same way, and both paths have to be tested.
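
One possible direction, sketched here under assumptions rather than taken from the codebase: convert the raw text into an array of document hashes at the edge of the bulk load script, so the shared indexing code only ever sees arrays. The helper name `document_hashes_from_raw_string` is hypothetical. The sketch also assumes the input is newline-delimited JSON in which each document is an `index` action line followed by a source line, and that the `_type` and `_id` from the action line belong in the document hash.

```ruby
require "json"

# Hypothetical helper (not part of the existing code): turns bulk-format text
# into an array of document hashes.
# Assumption: every document is two lines, an {"index": {...}} action line
# followed by the document source.
def document_hashes_from_raw_string(payload)
  lines = payload.each_line.map(&:strip).reject(&:empty?)

  lines.each_slice(2).map do |action_line, source_line|
    raise ArgumentError, "action line without a document" if source_line.nil?

    metadata = JSON.parse(action_line).fetch("index")
    source = JSON.parse(source_line)

    # Assumption: the action metadata is carried on the hash itself.
    # Keys that are absent from the action line are dropped by compact.
    source.merge("_type" => metadata["_type"], "_id" => metadata["_id"]).compact
  end
end
```

With a conversion like this at the script boundary, `bulk_payload` could accept only an array and the `is_a?(Array)` branch would no longer be needed.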

MatMoore added the indexing label Dec 18, 2017
