Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.2.3 -> 2.3 upgrade db error with mysql metadb #23605

Closed
1 of 2 tasks
kolfild26 opened this issue May 10, 2022 · 9 comments
Closed
1 of 2 tasks

2.2.3 -> 2.3 upgrade db error with mysql metadb #23605

kolfild26 opened this issue May 10, 2022 · 9 comments
Labels
area:core kind:bug This is a clearly a bug

Comments

@kolfild26
Copy link
Contributor

Apache Airflow version

2.3.0 (latest released)

What happened

Hi,
Tried to upgrade from v.2.2.3 to v.2.3
airflow upgrade db:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
    self.dialect.do_execute(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
    cursor.execute(statement, parameters)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 269, in execute
    result = self._cnx.cmd_query(stmt, raw=self._raw,
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 528, in cmd_query
    raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'rendered_task_instance_fields.dag_id' in 'on clause'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/data/anaconda3/envs/airflow2.3/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/__main__.py", line 38, in main
    args.func(args)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 51, in command
    return func(*args, **kwargs)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/utils/cli.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/cli/commands/db_command.py", line 82, in upgradedb
    db.upgradedb(to_revision=to_revision, from_revision=from_revision, show_sql_only=args.show_sql_only)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
    return func(*args, session=session, **kwargs)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/utils/db.py", line 1400, in upgradedb
    for err in _check_migration_errors(session=session):
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/utils/db.py", line 1285, in _check_migration_errors
    yield from check_fn(session=session)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/airflow/utils/db.py", line 1243, in check_bad_references
    invalid_row_count = invalid_rows_query.count()
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3062, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2803, in scalar
    ret = self.one()
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2780, in one
    return self._iter().one()
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2818, in _iter
    result = self.session.execute(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1670, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1520, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 313, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1389, in _execute_clauseelement
    ret = self._execute_context(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1748, in _execute_context
    self._handle_dbapi_exception(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1929, in _handle_dbapi_exception
    util.raise_(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
    self.dialect.do_execute(
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
    cursor.execute(statement, parameters)
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 269, in execute
    result = self._cnx.cmd_query(stmt, raw=self._raw,
  File "/media/data/anaconda3/envs/airflow2.3/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 528, in cmd_query
    raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'rendered_task_instance_fields.dag_id' in 'on clause'
[SQL: SELECT count(*) AS count_1
FROM (SELECT rendered_task_instance_fields.dag_id AS dag_id, rendered_task_instance_fields.task_id AS task_id, rendered_task_instance_fields.execution_date AS execution_date, rendered_task_instance_fields.rendered_fields AS rendered_fields, rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml
FROM rendered_task_instance_fields
WHERE NOT (EXISTS (SELECT 1
FROM task_instance INNER JOIN dag_run ON rendered_task_instance_fields.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id AND rendered_task_instance_fields.execution_date = dag_run.execution_date
WHERE rendered_task_instance_fields.dag_id = task_instance.dag_id AND rendered_task_instance_fields.task_id = task_instance.task_id))) AS anon_1]

The above mysql statement, which causes error, seems to be generated in the count(self) function in /sqlalchemy/orm/query.py.
I ran that statement through mysql workbench and it's easy to see that it's wrong

SELECT 1 FROM task_instance INNER JOIN dag_run ON rendered_task_instance_fields.dag_id = task_instance.dag_id
AND dag_run.run_id = task_instance.run_id
AND rendered_task_instance_fields.execution_date = dag_run.execution_date
WHERE rendered_task_instance_fields.task_id = task_instance.task_id
AND rendered_task_instance_fields.dag_id = task_instance.dag_id;

There is no the rendered_task_instance_fields table in the FROM clause.
I probably know how to fix the statement but the mysql code is autogenerated (am I right?) but not hardcoded.

What you think should happen instead

No response

How to reproduce

  1. upgrade airflow installation from pip apache airflow
  2. run airflow db upgrade --show-sql-only to check the changes. Everything's fine.
  3. run airflow db upgrade

Operating System

Oracle Linux 7.8

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

MySQL DB 5.7.10 which is supported per the docs.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@kolfild26 kolfild26 added area:core kind:bug This is a clearly a bug labels May 10, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented May 10, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@kolfild26
Copy link
Contributor Author

So, currently unable to move to v.2.3.
p.s. I didn't try to initiate the db from scratch, because I need to keep the past data and settings.

@IMax153
Copy link

IMax153 commented May 10, 2022

I can confirm that we are experiencing the same issue outlined above.

@tchakib
Copy link

tchakib commented May 10, 2022

I had the same problem
here is the query generated by the db.py script > https://github.com/apache/airflow/blob/v2-3-stable/airflow/utils/db.py#L1115

SELECT count(*) AS count_1 FROM (SELECT rendered_task_instance_fields.dag_id AS dag_id, rendered_task_instance_fields.task_id AS task_id, rendered_task_instance_fields.execution_date AS execution_date, rendered_task_instance_fields.rendered_fields AS rendered_fields, rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml FROM rendered_task_instance_fields WHERE NOT (EXISTS (SELECT 1 FROM task_instance INNER JOIN dag_run ON rendered_task_instance_fields.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id AND rendered_task_instance_fields.execution_date = dag_run.execution_date WHERE rendered_task_instance_fields.dag_id = task_instance.dag_id AND rendered_task_instance_fields.task_id = task_instance.task_id))) AS anon_1;

it seems to me that the query should be generated as below

SELECT count(*) AS count_1 FROM (SELECT rendered_task_instance_fields.dag_id AS dag_id, rendered_task_instance_fields.task_id AS task_id, rendered_task_instance_fields.execution_date AS execution_date, rendered_task_instance_fields.rendered_fields AS rendered_fields, rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml FROM rendered_task_instance_fields WHERE NOT (EXISTS (SELECT 1 FROM task_instance INNER JOIN dag_run ON dag_run.run_id = task_instance.run_id INNER JOIN rendered_task_instance_fields ON rendered_task_instance_fields.dag_id = task_instance.dag_id AND rendered_task_instance_fields.execution_date = dag_run.execution_date WHERE rendered_task_instance_fields.dag_id = task_instance.dag_id AND rendered_task_instance_fields.task_id = task_instance.task_id))) AS anon_1;

I wonder if the problem concerns version 5.7 ?

@uranusjr
Copy link
Member

Probably we can work around this by explicitly joining RenderedTaskInstanceFields? cc @dstandish

@dstandish
Copy link
Contributor

The query was changed to left join (instead of a not exists) in #23528

So that should fix this. cc @ashb

@kolfild26
Copy link
Contributor Author

so, it'll be released in v.2.3.1, right?
now it's included in v2-3-test, but not in v2-3-stable.

@potiuk
Copy link
Member

potiuk commented May 14, 2022

so, it'll be released in v.2.3.1, right? now it's included in v2-3-test, but not in v2-3-stable.

Correctl v2-3-test is there for cherry-picking changes from main and we merge it to v2-3-stable when we release.

@eladkal
Copy link
Contributor

eladkal commented May 20, 2022

Closing as should be fixed by #23528

@eladkal eladkal closed this as completed May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core kind:bug This is a clearly a bug
Projects
None yet
Development

No branches or pull requests

7 participants