Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection.copy_records_to_table() fails on NUMERIC type #157

Closed
sergeyspatar opened this issue Jun 17, 2017 · 11 comments
Closed

Connection.copy_records_to_table() fails on NUMERIC type #157

sergeyspatar opened this issue Jun 17, 2017 · 11 comments

Comments

@sergeyspatar
Copy link

  • asyncpg version: 0.11.0
  • PostgreSQL version: 9.6.3
  • Python version: 3.5.3
  • Platform: Linux x86_64 (Debian)
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: no
  • If you built asyncpg locally, which version of Cython did you use?: 0.25.2
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : yes

It seems that NUMERIC values are not packed correctly for PGCOPY binary stream. I tried passing int, float and Decimal values and got the same error insufficient data left in message. Here is my example:

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

from decimal import Decimal

async def run():
    conn = await asyncpg.connect(user='postgres')
    await conn.execute('create temporary table temp_table (c1 numeric)')
    #await conn.copy_records_to_table('temp_table', records=[(123,)])
    #await conn.copy_records_to_table('temp_table', records=[(0.123,)])
    await conn.copy_records_to_table('temp_table', records=[(Decimal('0.123'),)])

asyncio.get_event_loop().run_until_complete(run())

Here is the traceback:

# python3 test_pgcopy_numeric.py
Traceback (most recent call last):
  File "test_pgcopy_numeric.py", line 16, in <module>
    asyncio.get_event_loop().run_until_complete(run())
  File "/usr/lib/python3.5/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "test_pgcopy_numeric.py", line 14, in run
    await conn.copy_records_to_table('temp_table', records=[(Decimal('0.123'),)])
  File "/usr/lib/python3/dist-packages/asyncpg/connection.py", line 619, in copy_records_to_table
    copy_stmt, records, intro_ps._state, timeout)
  File "/usr/lib/python3/dist-packages/asyncpg/connection.py", line 741, in _copy_in_records
    copy_stmt, None, None, records, intro_stmt, timeout)
  File "asyncpg/protocol/protocol.pyx", line 445, in copy_in (asyncpg/protocol/protocol.c:66303)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
asyncpg.exceptions.ProtocolViolationError: insufficient data left in message
@bennoleslie
Copy link

FWIW, I experience the same problem (Postgres 9.5.6 Python 3.6).

@bennoleslie
Copy link

I've done some digging here. I believe the root cause is the fact that the numeric codec is implemented as text protocol only (not binary). COPY IN requires all the data be encoded as binary. So the resolution of this would be changing/adding a binary codec for numeric (but then it is likely playing whack-a-mole, since it will be the next type that is only implemented as text causing a problem).

It would be nice to get a better error message. Ideally, there would be a way to pass format=PG_FORMAT_BINARY down to get_data_codec.

As far as I can tell that would somehow mean _ensure_rows_decoder needing to pass it in during the loop over all the columns. Potentially, a binary_only parameter could be added to _ensure_rows_decoder, which would then pass format=PG_FORMAT_BINARY to get_data_codec. That would provide a much better and more obvious error than what happens currently.

@bennoleslie
Copy link

Searching through to find which types (that are likely to be used) would cause a problem, and we have:

  • numeric
  • money
  • MAC address
  • tsearch (not sure this would be common on copy in?)

Money looks relatively easy to implement (see: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c) it seems like just an int64 encoding.

MAC address also looks relatively straight forward (see: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/mac.c#L161).

Numeric is a little more involved (see: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/numeric.c#L858)

Hopefully this is a starting point for someone.

@bennoleslie
Copy link

A potential problem with implementing this is how to handle display scale dscale. The structure is in postgres is:

typedef struct NumericVar
{
	int			ndigits;		/* # of digits in digits[] - can be 0! */
	int			weight;			/* weight of first digit */
	int			sign;			/* NUMERIC_POS, NUMERIC_NEG, or NUMERIC_NAN */
	int			dscale;			/* display scale */
	NumericDigit *buf;			/* start of palloc'd space for digits[] */
	NumericDigit *digits;		/* base-NBASE digits */
} NumericVar;

I think that ndigits, weight and sign can all be round-tripped via the Decimal type, but I'm not sure how dscale would be handled inside Python.

@bennoleslie
Copy link

I think a starting point for numeric_encode could be something like:

NUMERIC_POS = 0x0000
NUMERIC_NEG = 0x4000
NUMERIC_NAN = 0xC000

cdef numeric_encode(ConnectionSettings settings, WriteBuffer buf, obj):
    raw_sign, digits, exponent = obj.as_tuple()
    ndigits = len(digits)
    sign = NUMERIC_POS if py_sign == 0 else NUMERIC_NEG
    if exponent == 'nan':
         sign |= NUMERIC_NAN
	 weight = 0 ??
    else:
         weight = len(digits) - exponent - 1) # Check this
    dscale = ???
    buf.write_int16(ndigits)
    buf.write_int16(weight)
    buf.write_int16(sign)
    buf.write_int16(dscale)
    for digit in digits:
        buf.write_int8(digit)

Main thing missing is working out what dscale value is appropriate.

elprans added a commit that referenced this issue Jun 26, 2017
…odec.

COPY .. FROM .. (FORMAT binary) currently requires that all data is
encoded in binary format.  Hence, we need to check if we have
appropriate codecs for all columns before initiating CopyIn.

Issue: #157.
@elprans
Copy link
Member

elprans commented Jun 26, 2017

I added a check to raise a better error message when a binary format encode is unavailable.

@elprans
Copy link
Member

elprans commented Jun 26, 2017

As for the numeric encoding, dscale is actually display precision (number of digits after the dot). Basically, the negated exponent when the exponent is negative, otherwise zero.

@1st1
Copy link
Member

1st1 commented Jun 30, 2017

Can we close this issue now?

@sergeyspatar
Copy link
Author

I just retested the issue using current sources from master branch. Now it raises this error:

RuntimeError: no binary format encoder for type numeric (OID 1700)

So, it is still impossible to use NUMERIC columns in Connection.copy_records_to_table(). Thus, I think the issue should stay open.

@elprans
Copy link
Member

elprans commented Jun 30, 2017

There are two outstanding issues here: lack of binary codec for numeric and the missing fallback to text-based COPY for cases where there are missing binary encoders for the target columns.

@alex-eri
Copy link

macaddr not implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants