Connection.copy_records_to_table() fails on NUMERIC type #157

sergeyspatar · 2017-06-17T05:37:53Z

asyncpg version: 0.11.0
PostgreSQL version: 9.6.3
Python version: 3.5.3
Platform: Linux x86_64 (Debian)
Do you use pgbouncer?: no
Did you install asyncpg with pip?: no
If you built asyncpg locally, which version of Cython did you use?: 0.25.2
Can the issue be reproduced under both asyncio and
uvloop?: yes

It seems that NUMERIC values are not packed correctly for PGCOPY binary stream. I tried passing int, float and Decimal values and got the same error insufficient data left in message. Here is my example:

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

from decimal import Decimal

async def run():
    conn = await asyncpg.connect(user='postgres')
    await conn.execute('create temporary table temp_table (c1 numeric)')
    #await conn.copy_records_to_table('temp_table', records=[(123,)])
    #await conn.copy_records_to_table('temp_table', records=[(0.123,)])
    await conn.copy_records_to_table('temp_table', records=[(Decimal('0.123'),)])

asyncio.get_event_loop().run_until_complete(run())

Here is the traceback:

# python3 test_pgcopy_numeric.py
Traceback (most recent call last):
  File "test_pgcopy_numeric.py", line 16, in <module>
    asyncio.get_event_loop().run_until_complete(run())
  File "/usr/lib/python3.5/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "test_pgcopy_numeric.py", line 14, in run
    await conn.copy_records_to_table('temp_table', records=[(Decimal('0.123'),)])
  File "/usr/lib/python3/dist-packages/asyncpg/connection.py", line 619, in copy_records_to_table
    copy_stmt, records, intro_ps._state, timeout)
  File "/usr/lib/python3/dist-packages/asyncpg/connection.py", line 741, in _copy_in_records
    copy_stmt, None, None, records, intro_stmt, timeout)
  File "asyncpg/protocol/protocol.pyx", line 445, in copy_in (asyncpg/protocol/protocol.c:66303)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
asyncpg.exceptions.ProtocolViolationError: insufficient data left in message

The text was updated successfully, but these errors were encountered:

bennoleslie · 2017-06-23T09:19:23Z

FWIW, I experience the same problem (Postgres 9.5.6 Python 3.6).

bennoleslie · 2017-06-23T11:15:54Z

I've done some digging here. I believe the root cause is the fact that the numeric codec is implemented as text protocol only (not binary). COPY IN requires all the data be encoded as binary. So the resolution of this would be changing/adding a binary codec for numeric (but then it is likely playing whack-a-mole, since it will be the next type that is only implemented as text causing a problem).

It would be nice to get a better error message. Ideally, there would be a way to pass format=PG_FORMAT_BINARY down to get_data_codec.

As far as I can tell that would somehow mean _ensure_rows_decoder needing to pass it in during the loop over all the columns. Potentially, a binary_only parameter could be added to _ensure_rows_decoder, which would then pass format=PG_FORMAT_BINARY to get_data_codec. That would provide a much better and more obvious error than what happens currently.

bennoleslie · 2017-06-23T11:23:44Z

Searching through to find which types (that are likely to be used) would cause a problem, and we have:

numeric
money
MAC address
tsearch (not sure this would be common on copy in?)

Money looks relatively easy to implement (see: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c) it seems like just an int64 encoding.

MAC address also looks relatively straight forward (see: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/mac.c#L161).

Numeric is a little more involved (see: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/numeric.c#L858)

Hopefully this is a starting point for someone.

bennoleslie · 2017-06-23T11:37:33Z

A potential problem with implementing this is how to handle display scale dscale. The structure is in postgres is:

typedef struct NumericVar
{
	int			ndigits;		/* # of digits in digits[] - can be 0! */
	int			weight;			/* weight of first digit */
	int			sign;			/* NUMERIC_POS, NUMERIC_NEG, or NUMERIC_NAN */
	int			dscale;			/* display scale */
	NumericDigit *buf;			/* start of palloc'd space for digits[] */
	NumericDigit *digits;		/* base-NBASE digits */
} NumericVar;

I think that ndigits, weight and sign can all be round-tripped via the Decimal type, but I'm not sure how dscale would be handled inside Python.

bennoleslie · 2017-06-23T11:53:46Z

I think a starting point for numeric_encode could be something like:

NUMERIC_POS = 0x0000
NUMERIC_NEG = 0x4000
NUMERIC_NAN = 0xC000

cdef numeric_encode(ConnectionSettings settings, WriteBuffer buf, obj):
    raw_sign, digits, exponent = obj.as_tuple()
    ndigits = len(digits)
    sign = NUMERIC_POS if py_sign == 0 else NUMERIC_NEG
    if exponent == 'nan':
         sign |= NUMERIC_NAN
	 weight = 0 ??
    else:
         weight = len(digits) - exponent - 1) # Check this
    dscale = ???
    buf.write_int16(ndigits)
    buf.write_int16(weight)
    buf.write_int16(sign)
    buf.write_int16(dscale)
    for digit in digits:
        buf.write_int8(digit)

Main thing missing is working out what dscale value is appropriate.

…odec. COPY .. FROM .. (FORMAT binary) currently requires that all data is encoded in binary format. Hence, we need to check if we have appropriate codecs for all columns before initiating CopyIn. Issue: #157.

elprans · 2017-06-26T14:04:43Z

I added a check to raise a better error message when a binary format encode is unavailable.

elprans · 2017-06-26T14:07:57Z

As for the numeric encoding, dscale is actually display precision (number of digits after the dot). Basically, the negated exponent when the exponent is negative, otherwise zero.

1st1 · 2017-06-30T19:58:34Z

Can we close this issue now?

sergeyspatar · 2017-06-30T21:13:47Z

I just retested the issue using current sources from master branch. Now it raises this error:

RuntimeError: no binary format encoder for type numeric (OID 1700)

So, it is still impossible to use NUMERIC columns in Connection.copy_records_to_table(). Thus, I think the issue should stay open.

elprans · 2017-06-30T21:22:19Z

There are two outstanding issues here: lack of binary codec for numeric and the missing fallback to text-based COPY for cases where there are missing binary encoders for the target columns.

Fixes: #157

alex-eri · 2021-01-13T20:55:53Z

macaddr not implemented.

elprans added a commit that referenced this issue Jul 6, 2017

Implement binary format codec for numeric type

13ef9a9

Fixes: #157

elprans mentioned this issue Jul 6, 2017

Implement binary format codec for numeric type #168

Merged

elprans added a commit that referenced this issue Jul 6, 2017

Implement binary format codec for numeric type

83f69af

Fixes: #157

elprans added a commit that referenced this issue Jul 6, 2017

Implement binary format codec for numeric type

6b6d554

Fixes: #157

elprans added a commit that referenced this issue Jul 6, 2017

Implement binary format codec for numeric type

aa36548

Fixes: #157

elprans added a commit that referenced this issue Jul 6, 2017

Implement binary format codec for numeric type

0f84835

Fixes: #157

elprans closed this as completed in #168 Jul 6, 2017

elprans added a commit that referenced this issue Jul 6, 2017

Implement binary format codec for numeric type

a5413eb

Fixes: #157

SamuilDichev mentioned this issue May 25, 2021

set_type_codec() doesn't work with copy_records_to_table() #762

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Connection.copy_records_to_table() fails on NUMERIC type #157

Connection.copy_records_to_table() fails on NUMERIC type #157

sergeyspatar commented Jun 17, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

elprans commented Jun 26, 2017

elprans commented Jun 26, 2017

1st1 commented Jun 30, 2017

sergeyspatar commented Jun 30, 2017

elprans commented Jun 30, 2017

alex-eri commented Jan 13, 2021

Connection.copy_records_to_table() fails on NUMERIC type #157

Connection.copy_records_to_table() fails on NUMERIC type #157

Comments

sergeyspatar commented Jun 17, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

bennoleslie commented Jun 23, 2017

elprans commented Jun 26, 2017

elprans commented Jun 26, 2017

1st1 commented Jun 30, 2017

sergeyspatar commented Jun 30, 2017

elprans commented Jun 30, 2017

alex-eri commented Jan 13, 2021