Status of testing Providers that were prepared on February 05, 2022 #21348

Closed
6 of 54 tasks
potiuk opened this issue Feb 5, 2022 · 16 comments
Labels
kind:meta High-level information important to the community testing status Status of testing releases

Comments

@potiuk
Member

potiuk commented Feb 5, 2022

Body

I have a kind request for all the contributors to the latest provider packages release.
Could you please help us to test the RC versions of the providers?

Let us know in the comment, whether the issue is addressed.

Those are providers that require testing as there were some substantial changes introduced:

Provider amazon: 3.0.0rc1

Provider apache.druid: 2.3.0rc1

Provider apache.hive: 2.2.0rc1

Provider apache.spark: 2.1.0rc1

Provider apache.sqoop: 2.1.0rc1

Provider cncf.kubernetes: 3.0.2rc1

Provider docker: 2.4.1rc1

Provider exasol: 2.1.0rc1

Provider google: 6.4.0rc1

Provider http: 2.0.3rc1

Provider imap: 2.2.0rc1

Provider jdbc: 2.1.0rc1

Provider microsoft.azure: 3.6.0rc1

Provider microsoft.mssql: 2.1.0rc1

Provider microsoft.psrp: 1.1.0rc1

Provider mysql: 2.2.0rc1

Provider oracle: 2.2.0rc1

Provider postgres: 3.0.0rc1

Provider qubole: 2.1.0rc1

Provider slack: 4.2.0rc1

Provider snowflake: 2.5.0rc1

Provider sqlite: 2.1.0rc1

Provider ssh: 2.4.0rc1

Provider tableau: 2.1.4rc1

Provider vertica: 2.1.0rc1

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@potiuk potiuk added the kind:meta High-level information important to the community label Feb 5, 2022
@raphaelauv
Contributor

raphaelauv commented Feb 5, 2022

#21175: @ferruzzi

The Docker operator is buggy: it fails even with do_xcom_push=False.

DockerOperator(
    task_id='docker_op_tester',
    dag=dag,
    api_version='auto',
    docker_url="unix://var/run/docker.sock",
    command='/bin/echo tata',
    image='centos:latest',
    network_mode='bridge',
    mount_tmp_dir=False,
    do_xcom_push=True,
)

It fails with:

[2022-02-05, 22:38:18 UTC] {taskinstance.py:1259} INFO - Executing <Task(DockerOperator): docker_op_tester> on 2022-02-03 07:00:00+00:00
[2022-02-05, 22:38:18 UTC] {standard_task_runner.py:52} INFO - Started process 325 to run task
[2022-02-05, 22:38:18 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'docker_dag', 'docker_op_tester', 'scheduled__2022-02-03T07:00:00+00:00', '--job-id', '34', '--raw', '--subdir', 'DAGS_FOLDER/docker_dag.py', '--cfg-path', '/tmp/tmpqd592sdh', '--error-file', '/tmp/tmp3e4mxwk4']
[2022-02-05, 22:38:18 UTC] {standard_task_runner.py:77} INFO - Job 34: Subtask docker_op_tester
[2022-02-05, 22:38:18 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: docker_dag.docker_op_tester scheduled__2022-02-03T07:00:00+00:00 [running]> on host 1b2f3575c860
[2022-02-05, 22:38:18 UTC] {taskinstance.py:1424} INFO - Exporting the following env vars:
[email protected]
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=docker_dag
AIRFLOW_CTX_TASK_ID=docker_op_tester
AIRFLOW_CTX_EXECUTION_DATE=2022-02-03T07:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-02-03T07:00:00+00:00
[2022-02-05, 22:38:18 UTC] {docker.py:227} INFO - Starting docker container from image centos:latest
[2022-02-05, 22:38:19 UTC] {docker.py:289} INFO - tata
[2022-02-05, 22:38:19 UTC] {xcom.py:333} ERROR - Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your airflow config.
[2022-02-05, 22:38:19 UTC] {taskinstance.py:1700} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    self.xcom_push(key=XCOM_RETURN_KEY, value=result)
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2135, in xcom_push
    XCom.set(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 100, in set
    value = XCom.serialize_value(value)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 331, in serialize_value
    return json.dumps(value).encode('UTF-8')
  File "/usr/local/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type CancellableStream is not JSON serializable
[2022-02-05, 22:38:19 UTC] {taskinstance.py:1267} INFO - Marking task as FAILED. dag_id=docker_dag, task_id=docker_op_tester, execution_date=20220203T070000, start_date=20220205T223818, end_date=20220205T223819
[2022-02-05, 22:38:19 UTC] {standard_task_runner.py:89} ERROR - Failed to execute job 34 for task docker_op_tester
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
    args.func(args, dag=self.dag)
  File "/usr/local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 298, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/usr/local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
    _run_raw_task(args, ti)
  File "/usr/local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
    ti._run_raw_task(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    self.xcom_push(key=XCOM_RETURN_KEY, value=result)
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2135, in xcom_push
    XCom.set(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 100, in set
    value = XCom.serialize_value(value)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 331, in serialize_value
    return json.dumps(value).encode('UTF-8')
  File "/usr/local/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type CancellableStream is not JSON serializable
[2022-02-05, 22:38:19 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
[2022-02-05, 22:38:19 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
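
The traceback above comes down to `json.dumps` being handed an object it cannot encode. Below is a minimal sketch of that failure mode. It assumes a stand-in `CancellableStream` class (hypothetical here; the real one comes from the Docker client library) and hypothetical `serialize_value` / `try_serialize` helpers that only mirror the `json.dumps(value).encode('UTF-8')` call shown in the traceback. It is not the provider's actual code or fix.

```python
import json


class CancellableStream:
    """Stand-in for the stream object named in the traceback (assumption)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __iter__(self):
        return iter(self._chunks)


def serialize_value(value):
    # Mirrors the call shown in the traceback: json.dumps(value).encode('UTF-8')
    return json.dumps(value).encode("UTF-8")


def try_serialize(value):
    """Return (bytes, None) on success or (None, error message) on failure."""
    try:
        return serialize_value(value), None
    except TypeError as exc:
        return None, str(exc)


stream = CancellableStream([b"tata\n"])

# Pushing the raw stream object fails the same way as in the log above.
_, error = try_serialize(stream)
print(error)

# Decoding the chunks into plain strings first yields a JSON-serializable value.
lines = [chunk.decode("utf-8").strip() for chunk in stream]
payload, error = try_serialize(lines)
print(payload, error)
```

The point of the sketch is only that a stream handle is not a value: whatever is returned for XCom has to be reduced to plain strings, numbers, lists or dicts before it reaches the JSON encoder.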

@kazanzhy
Contributor

kazanzhy commented Feb 5, 2022

Changes in the Qubole provider were in documentation (#20058) and typing (#21074).
There are no code changes, so everything seems OK.

@Ritika-Singhal
Contributor

Ritika-Singhal commented Feb 6, 2022

#19787 requires one additional fix here:
https://github.com/apache/airflow/blob/providers-amazon/3.0.0rc1/airflow/providers/amazon/aws/operators/glue.py
The default value of num_of_dpus needs to be changed to None (it is currently set to 6).

Without this fix, the operator is inconsistent with the logic changed in airflow/providers/amazon/aws/hooks/glue.py. The user then gets the following error unless they pass num_of_dpus=None when calling the AWSGlueJobOperator:

File "/Users/ritika/opt/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1511, in _execute_task
    result = execute_callable(context=context)
File "/Users/ritika/opt/anaconda3/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 123, in execute
    create_job_kwargs=self.create_job_kwargs,
File "/Users/ritika/opt/anaconda3/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 74, in __init__
    raise ValueError("Cannot specify num_of_dpus with custom WorkerType")
ValueError: Cannot specify num_of_dpus with custom WorkerType
[2022-02-05, 16:57:41 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
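
To illustrate the inconsistency described above, here is a minimal sketch. It assumes the hook rejects any combination of a custom worker type with a non-None num_of_dpus, as the error message suggests. `validate_glue_capacity`, `make_operator` and the `"G.1X"` example value are hypothetical names for illustration, not the provider's real functions.

```python
from typing import Callable, Optional


def validate_glue_capacity(worker_type: Optional[str], num_of_dpus: Optional[int]) -> None:
    """Hypothetical stand-in for the hook-side check that raised the error above."""
    if worker_type is not None and num_of_dpus is not None:
        raise ValueError("Cannot specify num_of_dpus with custom WorkerType")


def make_operator(default_num_of_dpus: Optional[int]) -> Callable[..., str]:
    """Build a toy operator whose num_of_dpus default is forwarded to the check."""

    def operator(worker_type: Optional[str] = None,
                 num_of_dpus: Optional[int] = default_num_of_dpus) -> str:
        validate_glue_capacity(worker_type, num_of_dpus)
        return "ok"

    return operator


current_operator = make_operator(6)      # default as described for 3.0.0rc1
proposed_operator = make_operator(None)  # default proposed in the comment above

# With the default of 6, a custom worker type always trips the check...
try:
    current_operator(worker_type="G.1X")
except ValueError as exc:
    print("current default:", exc)

# ...unless the caller passes num_of_dpus=None explicitly.
print("explicit None:", current_operator(worker_type="G.1X", num_of_dpus=None))

# With a default of None, the same call works without the workaround.
print("proposed default:", proposed_operator(worker_type="G.1X"))
```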

@mariotaddeucci
Contributor

mariotaddeucci commented Feb 6, 2022

#20807 all good.
#20506 was tested on #20615 (amazon 2.6.0)

@Ritika-Singhal
Contributor

Ritika-Singhal commented Feb 6, 2022

I have created another pull request, #21353, with the additional fix for #19787.
All other functionality in #19787 works as expected.

@potiuk
Member Author

potiuk commented Feb 6, 2022

Ok. Still worth testing Amazon changes on this version but I will remove Amazon + Docker (if confirmed) and prepare RC2 right after we release all others and get fixes merged. Thanks @Ritika-Singhal @raphaelauv !

@potiuk
Member Author

potiuk commented Feb 6, 2022

I also removed some of the "no-need-to-test" changes (and the 2.6.0 amazon changes, which were there because of a bug in the generation of the issue; sorry for that).

@rsg17
Contributor

rsg17 commented Feb 6, 2022

Hi @potiuk - It is my first time doing this testing: Are there any specific steps we follow?

My PR adds a hook for Google Calendar.

@potiuk
Member Author

potiuk commented Feb 7, 2022

Hi @potiuk - It is my first time doing this testing: Are there any specific steps we follow?

Good question. Depends on your "local" environment. But what I would do, is to use:

  1. Start Airflow 2.2.3 in breeze:
./breeze start-airflow --use-airflow-version 2.2.3 --backend postgres --db-reset --load-default-connections

This will start our dockerized development environment with Airflow 2.2.3 installed and open 4 terminals: triggerer, scheduler, webserver and "bash console".

  2. Update the package to the RC:

In the console:

pip install apache-airflow-providers-google==6.4.0rc1

Then restart the scheduler/webserver (Ctrl+C, followed by "up cursor" to go back and run the previous command).

  3. Prepare a test DAG using the Calendar API and run it

For that you likely need to configure the "google_default_connection" to include your credentials. You can put your DAGs in the "files/dags" folder of your Airflow sources (the "files" folder is mounted inside the Docker container) and they should be scanned and visible in the webserver.
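
After the pip install step, it can help to confirm that the RC is really what the environment picked up before running a test DAG. Here is a minimal sketch using only the standard library. The helper names `installed_version` and `is_expected_version` are made up for this example, and the package name and version are the ones from the steps above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_expected_version(distribution: str, expected: str) -> bool:
    """True only when the distribution is installed at exactly the expected version."""
    return installed_version(distribution) == expected


if __name__ == "__main__":
    name = "apache-airflow-providers-google"
    print(name, installed_version(name), is_expected_version(name, "6.4.0rc1"))
```

Run it in the breeze "bash console" after restarting the scheduler and webserver. If the check prints False, the RC was probably installed into a different environment than the one Airflow runs in.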

@malthe
Contributor

malthe commented Feb 7, 2022

@potiuk – Oracle and PSRP providers all good.

@josh-fell
Contributor

@potiuk Part of #21237 was adding 3 new lexers for the SQL rendering in the UI. For the providers that use the new lexers, the rendering won't be as readable as it could be because the configured lexer doesn't exist yet (i.e. the rendering becomes a Python type rendering). Do we want to wait on releasing #21237 at all until those lexers are available or not push the SQL template_fields_renderers change for those particular providers yet?

For example, there is a new postgresql lexer, which was added to the RedshiftSqlOperator (among others). Without the new lexer in core Airflow, the rendering of the SQL statement looks like this:
image

Instead of:
image

@potiuk
Member Author

potiuk commented Feb 7, 2022

Good point Josh. I think this one might be a bit tricky. I do not think it is a "blocker": the rendering is not nice but it does not "break" anything.

However I thought it might be a nice way to incentivize people to migrate to 2.2.4 if we also cherry-pick the lexer to 2.2.4. Then we could tell them "migrate to 2.2.4 to get it nicer". This is a new feature, of course, so technically we should add it in 2.3.0. So I am a bit torn here.

Quite a number of those providers are only being released because of this renderer change, so we can easily skip them in this release and release them when we release 2.3.0.

@jedcunningham @josh-fell wdyt?

@potiuk
Member Author

potiuk commented Feb 7, 2022

The second option is indeed to drop all the "21237" providers and release them in RC2 with some "conditional" code that will check if the lexer is there. After thinking a bit, I think that would be a much "cleaner" solution and I am leaning towards this option.

smth like:

template_fields_renderer = {//}.update({'sql': 'sql'} if 'sql' in get_attr_renderer() else {})
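
One way to make the idea above concrete is sketched below. Note that `dict.update` returns None, so the one-liner cannot be assigned directly; this version builds a copy instead. `build_template_fields_renderers` is a hypothetical helper, and the `available` argument stands in for whatever `get_attr_renderer()` reports in the running Airflow version (an assumption for illustration, not the code that was eventually merged).

```python
from typing import Dict, Iterable


def build_template_fields_renderers(
    base: Dict[str, str],
    conditional: Dict[str, str],
    available: Iterable[str],
) -> Dict[str, str]:
    """Return base plus only those conditional entries whose renderer is available."""
    available_set = set(available)
    renderers = dict(base)  # copy, so the caller's dict is left untouched
    renderers.update(
        {field: name for field, name in conditional.items() if name in available_set}
    )
    return renderers


# Older core without the new lexer: the "sql" field keeps the default rendering.
old_core = build_template_fields_renderers(
    base={"parameters": "json"},
    conditional={"sql": "postgresql"},
    available=["sql", "json"],
)
print(old_core)

# Newer core that ships the lexer: the "sql" field gets the nicer renderer.
new_core = build_template_fields_renderers(
    base={"parameters": "json"},
    conditional={"sql": "postgresql"},
    available=["sql", "json", "postgresql"],
)
print(new_core)
```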

@jedcunningham
Member

I'd rather not include it for 2.2.4 since, as you said, it's a new feature. Doing it conditionally, until the next lower bound bump, I think makes sense.

@potiuk
Member Author

potiuk commented Feb 7, 2022

Yeah. It's easy enough to fix and I agree it's better.

I think, due to the number of providers affected, it makes more sense to cancel the whole release and re-release all providers as RC2 with that fix, so that we do not mess with two releases. There are no super-urgent changes in this wave.

@potiuk
Member Author

potiuk commented Feb 7, 2022

As discussed above ^^ I will cancel the whole vote/release and will make an RC2 as soon as we release the conditionally working SQL renderer. Thanks @josh-fell for spotting this and raising it!


9 participants