Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running a SQL step after bulk insert into JDBC #411

Open
cyrillay opened this issue Feb 17, 2021 · 7 comments
Open

Running a SQL step after bulk insert into JDBC #411

cyrillay opened this issue Feb 17, 2021 · 7 comments

Comments

@cyrillay
Copy link

Hello,

When using metorikku with a JDBC output, is it possible to add custom SQL after a bulk insert ?

We'd like to run some SQL on the target JDBC to add some indexes on the created tables :

CREATE INDEX … ON …;

First, we thought of adding the « truncate : true » option so that the tables are not dropped but wiped instead, so the indexes would not be deleted at each ingestion.

# job_config.yaml
output:
  jdbc:
    connectionUrl: ****
    options:
      truncate: true

However, this limits the schema evolution : if the schema in the input changes, the job will fail.

I was wondering if it could be possible to keep the default truncate: false, but add some SQL statement (creating indexes) that would be ran after the output is written.
This would allow schema evolution while recreating the indexes at each ingestion.

Do you know if there is a way to achieve this with Metorikku ?

I hope the explanation is precise enough. Thank you in advance for your answer !

@dthauvin
Copy link

Woow could be very useful for us .

@lyogev
Copy link
Contributor

lyogev commented Feb 20, 2021

Hi! so you could hack this some way.
One of the abilities to write via JDBC is with the JDBC Query output, so you could, for example, after writing to your JDBC table perform another write like so (using some dummy table with a single row):

steps:
  - dataFrameName: table1
    sql:
      SELECT * FROM source
  - dataFrameName: single_row_table
    sql:
      SELECT 'no_value' AS col
output:
  - dataFrameName: table1
    outputType: JDBC
...
  - dataFrameName: single_row_table
    outputType: JDBCQuery
    outputOptions:
      query:
        CREATE INDEX … ON …

@cyrillay
Copy link
Author

Hi @lyogev ,

That seemed like a great idea. However, after trying, I get an error :

java.sql.SQLException: Parameter index out of range (1 > number of parameters, which is 0).
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:965)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:898)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:887)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:861)
	at com.mysql.jdbc.PreparedStatement.checkBounds(PreparedStatement.java:3385)
	at com.mysql.jdbc.PreparedStatement.setInternal(PreparedStatement.java:3370)
	at com.mysql.jdbc.PreparedStatement.setString(PreparedStatement.java:4095)
	at com.yotpo.metorikku.output.writers.jdbc.JDBCQueryWriter.addValueToStatement(JDBCQueryWriter.scala:57)
	at com.yotpo.metorikku.output.writers.jdbc.JDBCQueryWriter.$anonfun$write$4(JDBCQueryWriter.scala:35)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at com.yotpo.metorikku.output.writers.jdbc.JDBCQueryWriter.$anonfun$write$3(JDBCQueryWriter.scala:34)
	at com.yotpo.metorikku.output.writers.jdbc.JDBCQueryWriter.$anonfun$write$3$adapted(JDBCQueryWriter.scala:33)

From what I understand by looking at the Metorikku code, It seems like it is using a preparedStatement and looking for a placeholder for each field of the destination table, which it can't find as the CREATE INDEX .. ON .. doesn't contain any.

I am now going to try with a dummy SELECT query as a first part of the output, will let you know if I manage to get something :

steps:
  - dataFrameName: table1
    sql:
      SELECT * FROM source
  - dataFrameName: single_row_table
    sql:
      SELECT 'no_value' AS col
output:
  - dataFrameName: table1
    outputType: JDBC
...
  - dataFrameName: single_row_table
    outputType: JDBCQuery
    outputOptions:
      query:
        SELECT 1 from dual where 1 = $1;
        CREATE INDEX … ON …

@cyrillay
Copy link
Author

The SELECT doesn't work :

java.sql.BatchUpdateException: Can not issue executeUpdate() or executeLargeUpdate() for SELECTs
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:403)
	at com.mysql.jdbc.Util.getInstance(Util.java:386)
	at com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1154)
	at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1835)
	at com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1319)
	at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:954)
	at com.yotpo.metorikku.output.writers.jdbc.JDBCQueryWriter.$anonfun$write$2(JDBCQueryWriter.scala:39)

So I tried :

  outputOptions:
    query:
      INSERT INTO dummy_table(dummy_field) VALUES (?);
      CREATE INDEX index_test ON test_table(test_field(30));

But I get a syntax error, I understand that conn.prepareStatement in com.yotpo.metorikku.output.writers.jdbc.JDBCQueryWriter can only accept one SQL statement, see here.

So unfortunately I don't think it is doable this way... Maybe if the yaml could support a syntax with

outputOptions:
    queries:

?

What do you think ? Do you have any idea from those tests @lyogev ?

@lyogev
Copy link
Contributor

lyogev commented Feb 23, 2021

I've opened this PR for this #414

@lyogev
Copy link
Contributor

lyogev commented Feb 24, 2021

@cyrillay would you mind testing #414 and see if it fits what you need?

@cyrillay
Copy link
Author

@lyogev

Worked like a charm ! Thanks a lot 👍 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants