BUG: setting dtype='timestamp[ns][pyarrow]' removes precision from strings

This issue was created on 2022-11-18.

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import datetime as dt

import pandas as pd

data = [dt.datetime(2020, 1, 1, 1, 1, 1, 1), dt.datetime(1999, 1, 1)]

ser = pd.Series(data, dtype='timestamp[ns][pyarrow]')
print(ser)
print()
ser = pd.Series(data, dtype='datetime64[ns]')
print(ser)

Issue Description

The above outputs:

0    2020-01-01 01:01:01.000001
1           1999-01-01 00:00:00
dtype: timestamp[ns][pyarrow]

0   2020-01-01 01:01:01.000001
1   1999-01-01 00:00:00.000000
dtype: datetime64[ns]

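Note how, in the pyarrow case, the two rows are formatted differently: the second row drops the fractional seconds, whereas datetime64[ns] pads every row to the same precision.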

This causes issues when parsing with to_datetime and specifying format=:

In [1]: data = [dt.datetime(2020, 1, 1, 1, 1, 1, 1), dt.datetime(1999, 1, 1)]

In [2]: ser = pd.Series(data, dtype='timestamp[ns][pyarrow]')
   ...: string_data = ser.to_csv(index=False, header=None).splitlines()
   ...: pd.to_datetime(string_data, format='%Y-%m-%d %H:%M:%S.%f')
   ...: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [2], line 3
      1 ser = pd.Series(data, dtype='timestamp[ns][pyarrow]')
      2 string_data = ser.to_csv(index=False, header=None).splitlines()
----> 3 pd.to_datetime(string_data, format='%Y-%m-%d %H:%M:%S.%f')

File ~/pandas-dev/pandas/core/tools/datetimes.py:1110, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
   1108         result = _convert_and_box_cache(argc, cache_array)
   1109     else:
-> 1110         result = convert_listlike(argc, format)
   1111 else:
   1112     result = convert_listlike(np.array([arg]), format)[0]

File ~/pandas-dev/pandas/core/tools/datetimes.py:436, in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    433         return res
    435 utc = tz == "utc"
--> 436 result, tz_parsed = objects_to_datetime64ns(
    437     arg,
    438     dayfirst=dayfirst,
    439     yearfirst=yearfirst,
    440     utc=utc,
    441     errors=errors,
    442     require_iso8601=require_iso8601,
    443     allow_object=True,
    444     format=format,
    445     exact=exact,
    446 )
    448 if tz_parsed is not None:
    449     # We can take a shortcut since the datetime64 numpy array
    450     # is in UTC
    451     dta = DatetimeArray(result, dtype=tz_to_dtype(tz_parsed))

File ~/pandas-dev/pandas/core/arrays/datetimes.py:2144, in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, format, exact)
   2142 order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
   2143 try:
-> 2144     result, tz_parsed = tslib.array_to_datetime(
   2145         data.ravel("K"),
   2146         errors=errors,
   2147         utc=utc,
   2148         dayfirst=dayfirst,
   2149         yearfirst=yearfirst,
   2150         require_iso8601=require_iso8601,
   2151         format=format,
   2152         exact=exact,
   2153     )
   2154     result = result.reshape(data.shape, order=order)
   2155 except OverflowError as err:
   2156     # Exception is raised when a part of date is greater than 32 bit signed int

File ~/pandas-dev/pandas/_libs/tslib.pyx:442, in pandas._libs.tslib.array_to_datetime()
    440 @cython.wraparound(False)
    441 @cython.boundscheck(False)
--> 442 cpdef array_to_datetime(
    443     ndarray[object] values,
    444     str errors='raise',

File ~/pandas-dev/pandas/_libs/tslib.pyx:622, in pandas._libs.tslib.array_to_datetime()
    620     continue
    621 elif is_raise:
--> 622     raise ValueError(
    623         f"time data \"{val}\" at position {i} doesn't "
    624         f"match format \"{format}\""

ValueError: time data "1999-01-01 00:00:00" at position 1 doesn't match format "%Y-%m-%d %H:%M:%S.%f"
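
The same mismatch can be reproduced without pandas, since a format containing `.%f` requires a fractional part to be present. The following is only a minimal standard-library sketch: `strict_parse` and `parse_mixed` are hypothetical helper names, not pandas APIs, and `datetime.fromisoformat` is used here as one tolerant way to read both string forms.

```python
import datetime as dt

FMT = "%Y-%m-%d %H:%M:%S.%f"

# The two strings produced by the pyarrow-backed Series in the report.
strings = ["2020-01-01 01:01:01.000001", "1999-01-01 00:00:00"]


def strict_parse(value):
    # Requires the ".%f" part, so strings without fractional seconds fail.
    return dt.datetime.strptime(value, FMT)


def parse_mixed(value):
    # Hypothetical helper: fromisoformat accepts the fractional part
    # whether or not it is present.
    return dt.datetime.fromisoformat(value)


try:
    strict_parse(strings[1])
    strict_failed = False
except ValueError:
    strict_failed = True

parsed = [parse_mixed(s) for s in strings]
```

Here the strict parse of the second string raises `ValueError`, while the tolerant parse recovers both datetimes.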

Expected Behavior

If it printed

0    2020-01-01 01:01:01.000001
1    1999-01-01 00:00:00.000000
dtype: timestamp[ns][pyarrow]

, just like it does for datetime64[ns], then the issue would be solved.

Installed Versions

INSTALLED VERSIONS

commit : 8020bf1
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+697.g8020bf1b25.dirty
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.56.4
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.6.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 0.8.3
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.3
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.9
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.43
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.11.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None

MarcoGorelli wrote this answer on 2022-11-18

This only became apparent once the long-standing issue of to_datetime not respecting exact for ISO8601 formats was fixed (#49333).

MarcoGorelli wrote this answer on 2022-11-18

Much simpler reproducer of the underlying issue:

In [1]: str(Timestamp(2020, 1, 1, 1, 1, 1, 1))
Out[1]: '2020-01-01 01:01:01.000001'

In [2]: str(Timestamp(2020, 1, 1, 1))
Out[2]: '2020-01-01 01:00:00'

Same thing happens in the standard library:

In [5]: str(dt.datetime(2000, 1, 1, 1, 1, 1))
Out[5]: '2000-01-01 01:01:01'

In [6]: str(dt.datetime(2000, 1, 1, 1, 1, 1, 123))
Out[6]: '2000-01-01 01:01:01.000123'
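
Both results follow the `datetime.isoformat` rule of omitting the fractional part when the microsecond field is zero. As a small illustration, using only the standard library, either an explicit `strftime` format or `isoformat(timespec="microseconds")` gives a fixed-width rendering:

```python
import datetime as dt

with_us = dt.datetime(2000, 1, 1, 1, 1, 1, 123)
without_us = dt.datetime(2000, 1, 1, 1, 1, 1)
values = (with_us, without_us)

# Default str(): the fractional part appears only when microseconds != 0.
default_strings = [str(v) for v in values]

# Explicit format: "%f" always emits six digits.
fixed_strftime = [v.strftime("%Y-%m-%d %H:%M:%S.%f") for v in values]

# isoformat with timespec="microseconds" also pads to six digits.
fixed_isoformat = [v.isoformat(sep=" ", timespec="microseconds") for v in values]
```

With either fixed-width variant, every string matches `%Y-%m-%d %H:%M:%S.%f`.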

Could it be that

raise TypeError(
"ExtensionArray formatting should use ExtensionArrayFormatter"
)

should always print with %Y-%m-%d %H:%M:%S.%f?
Maybe this only needs doing when saving to csv?
Will look into this more

MarcoGorelli wrote this answer on 2022-11-18

Looks like date_format in to_csv doesn't work with the arrow type either:

In [1]: data = [dt.datetime(2020, 1, 1, 1, 1, 1, 1), dt.datetime(1999, 1, 1)]

In [2]: ser1 = pd.Series(data, dtype='timestamp[ns][pyarrow]')
   ...: ser2 = pd.Series(data, dtype='datetime64[ns]')
   ...: 

In [3]: ser1.to_csv(date_format='%d-%m-%Y')
Out[3]: ',0\n0,2020-01-01 01:01:01.000001\n1,1999-01-01 00:00:00\n'

In [4]: ser2.to_csv(date_format='%d-%m-%Y')
Out[4]: ',0\n0,01-01-2020\n1,01-01-1999\n'
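
For reference, Out[4] is what one gets by applying `strftime` with the given `date_format` to each value before writing. The sketch below reproduces that string with the standard library only; `to_csv_with_format` is a hypothetical helper for illustration and says nothing about how pandas implements `to_csv` internally.

```python
import csv
import datetime as dt
import io

data = [dt.datetime(2020, 1, 1, 1, 1, 1, 1), dt.datetime(1999, 1, 1)]


def to_csv_with_format(values, date_format):
    # Hypothetical helper: format every value explicitly, then write
    # an index column and a single data column named "0".
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["", "0"])
    for index, value in enumerate(values):
        writer.writerow([index, value.strftime(date_format)])
    return buffer.getvalue()


csv_text = to_csv_with_format(data, "%d-%m-%Y")
```

The expectation in this comment is that the arrow-backed Series would produce the same text as `csv_text` when `date_format` is passed.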

MarcoGorelli wrote this answer on 2022-11-18

I think this might be the issue

if isinstance(arr, np.ndarray):

here arr.dtype.kind is 'M', not sure why it's excluded.

Would

-    if isinstance(arr, np.ndarray):
+    if isinstance(arr, np.ndarray) or is_extension_array_dtype(arr.dtype):

work? Will try

mroeschke wrote this answer on 2022-11-18

ensure_wrapped_if_datetimelike is called in a lot of places, so I'm not sure if it will always be valid to convert an ArrowExtensionArray (with a datelike type) to a DatetimeArray/TimedeltaArray

cc @jbrockmendel

jbrockmendel wrote this answer on 2022-11-18

haven't looked closely at the rest of the thread, but ensure_wrapped_if_datetimelike should only wrap np.ndarrays

MarcoGorelli wrote this answer on 2022-11-18

ok thanks, I'll see where else this could be handled
