BUG: Wrong index constructor if new DF/Series is created from DF/Series

This issue was created on 2022-09-21.

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from pandas import DataFrame as pDF, Series as pSe

# INIT
# ────────────────────────────────────────────────────────────
# DATA
x = pDF({'A':[1,2,3,4],'B':[5,6,7,8]}, index=[1,0,1,0])

# NEW VAR TO BE ADDED IN SEVERAL WAYS
N = [1,2,3,4]


# (1) Works as expected → New variable from a List
# ────────────────────────────────────────────────────────────
x['N'] = N

# (1) RESULT → OK
#    A  B  N
# 1  1  5  1
# 0  2  6  2
# 1  3  7  3
# 0  4  8  4


# (2) Works as expected → New variable from pSe/pDF
# ────────────────────────────────────────────────────────────
x['N'] = pDF(N, index=x.index)
# OR 
x['N'] = pSe(N, index=x.index)
# Produces the same correct result

# (2) RESULT → OK
#    A  B  N
# 1  1  5  1
# 0  2  6  2
# 1  3  7  3
# 0  4  8  4

# (3) DOESN'T work as expected → New variable from ► INNER pSe/pDF ◄
# ────────────────────────────────────────────────────────────
x['N'] = pDF(pSe(N), index=x.index)
# OR 
x['N'] = pSe(pSe(N), index=x.index)
# Produces a WRONG result because of an index mismatch

# (3) RESULT → WRONG → INDEX MISMATCH
#    A  B  N
# 1  1  5  2
# 0  2  6  1
# 1  3  7  2
# 0  4  8  1

# (4) WRONG behavior can be overcome by using set_index as follows:
x['N'] = pDF(pSe(N)).set_index(x.index)
# Which produces correct result

Issue Description

Working with variables and creating new ones is a crucial part of pandas.
I would therefore expect the construction of a new variable to be consistent regardless of whether index= or the set_index() method is used.

The example above shows that the issue lies in the constructor when an inner DataFrame or Series is passed as data and the target DataFrame has a non-unique index. In that situation, the index= parameter of the constructor doesn't work as expected and is inconsistent with the other approaches.

Expected Behavior

Consistent behavior of the constructor for index= and set_index(), regardless of the input object passed to DataFrame/Series.
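Until the behavior or its documentation changes, the positional result expected here can be obtained by removing the inner object's index before it reaches the constructor. The following is a minimal sketch, not something proposed in this report: the names inner, by_position and relabeled are illustrative, and it assumes that Series.to_numpy() and Series.set_axis() are acceptable substitutes for the set_index() workaround shown in step (4).

```python
import pandas as pd

x = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=[1, 0, 1, 0])
inner = pd.Series([1, 2, 3, 4])  # default index: 0, 1, 2, 3

# Option 1: pass plain values, so the constructor has no labels to align on
# and simply attaches index= position by position.
by_position = pd.Series(inner.to_numpy(), index=x.index)

# Option 2: relabel the existing Series; the values keep their order.
relabeled = inner.set_axis(x.index)

x["N"] = by_position
print(x["N"].tolist())     # [1, 2, 3, 4]
print(relabeled.tolist())  # [1, 2, 3, 4]
```

Both options give the same column as approaches (1) and (2) in the reproducible example.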

Installed Versions

INSTALLED VERSIONS

commit : e8093ba
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.4.3
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
setuptools : 58.0.4
pip : 21.2.4
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
brotli :
fastparquet : None
fsspec : 2021.10.1
gcsfs : None
markupsafe : 1.1.1
matplotlib : 3.5.2
numba : 0.54.1
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

rhshadrach wrote this answer on 2022-09-24

Thanks for the report! I believe what you're highlighting is intended behavior. When data is a DataFrame or Series and the index argument is specified, pandas will align the data with the provided index. I thought this was well documented in the Series / DataFrame constructor, but I'm not seeing that now. We may have another issue about documenting this.
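As a rough illustration of the alignment described in this reply, the sketch below uses the data from the report and compares the constructor call with an explicit reindex. The variable names (inner, target, aligned, reindexed) are illustrative only, and the comparison is an observation about the result, not a statement about how pandas implements it.

```python
import pandas as pd

inner = pd.Series([1, 2, 3, 4])   # default index: 0, 1, 2, 3
target = pd.Index([1, 0, 1, 0])   # non-unique index from the report

# With a Series as data, index= selects values by label ...
aligned = pd.Series(inner, index=target)
# ... and gives the same result as reindexing the inner Series.
reindexed = inner.reindex(target)

print(aligned.tolist())           # [2, 1, 2, 1]
print(aligned.equals(reindexed))  # True
```

Label 1 picks the value 2 and label 0 picks the value 1, which reproduces the column shown in result (3).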

MichalRIcar wrote this answer on 2022-09-24

Thank you for the reply.
From my point of view, the issue is that the behavior is dangerously inconsistent.
When a new variable is added to a DataFrame from another DF/Series whose index differs from that of the target DF:
► x['N'] = pSe(pSe(N), index=x.index)
► the constructor behaves like pd.concat and silently ignores the specified index=x.index ◄, as the example above shows.

To wrap it up:
x['N'] = pSe(pSe(N), index=x.index) == pd.concat([x, pSe(N)], 1)
► silently ignoring key factor index=x.index

however

other approaches like
x['N'] = pDF(pSe(N)).set_index(x.index)
x['N'] = pSe(N, index=x.index)
x['N'] = list(N)
etc. produce the correct, expected results and do not behave like pd.concat.

The problem from my side is that this silent behavior of ignoring the specified index and acting like pd.concat occurs in only one of the four approaches above.

Thus, to conclude, shouldn't the resulting table be the same regardless of whether pDF(..., index=x.index) or .set_index(x.index) is used?

rhshadrach wrote this answer on 2022-09-25

x['N'] = pSe(pSe(N), index=x.index)

This combines two different operations:

  1. the creation of a Series by providing a Series for data and specifying index; and
  2. the assignment of a Series to a DataFrame

We can only determine the behavior of each one individually, as this logically determines what happens when you perform them sequentially. Can you specify which of these two you think behaves improperly, and why? Include both if you believe both are incorrect, but address them individually.
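The two operations can be run separately to see what each one contributes. This is a minimal sketch using the data from the report; the names step1 and the extra column "M" are illustrative and not part of the original example.

```python
import pandas as pd

x = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=[1, 0, 1, 0])
N = [1, 2, 3, 4]

# Operation 1: build a Series from a Series while specifying index=.
# The inner Series has index 0..3, so values are looked up by label.
step1 = pd.Series(pd.Series(N), index=x.index)
print(step1.tolist())    # [2, 1, 2, 1]

# Operation 2: assign a Series to a DataFrame column.
# step1 already carries x.index, so the values are stored unchanged.
x["N"] = step1
print(x["N"].tolist())   # [2, 1, 2, 1]

# Assignment aligns by label as well: a Series with the default index
# 0..3 ends up with the same values without any constructor index=.
x["M"] = pd.Series(N)
print(x["M"].tolist())   # [2, 1, 2, 1]
```

In this run the reordering already happens in operation 1; operation 2 only stores what it is given, because the indexes match.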

MichalRIcar wrote this answer on 2022-09-26

Hello,

thank you, a good idea to reduce it to the elements.

The main point is that the pandas constructor of a new DataFrame or Series, given another pandas object (DF/Series) as input, works like pd.concat, as the result shows:

image

The problem is that the second line is nearly the same, with one slight difference: instead of using index=x.index inside pDF(), we specify the index outside via set_index(x.index).

Thus, the DataFrame constructor with an inner index specification works like pd.concat, not like the creation of a new DataFrame. If that is intentional, then I believe it should be highlighted in the documentation, as the behavior is not consistent with the other approaches or with pandas behavior in general.

rhshadrach wrote this answer on 2022-09-26

Thanks for elaborating here. Constructing a DataFrame from a Series while specifying an index will align the Series to the provided index:

import pandas as pd

ser = pd.Series([3, 4], index=[2, 1])
print(ser)
# 2    3
# 1    4
# dtype: int64

df = pd.DataFrame(ser, index=[1, 2, 3])
print(df)
#      0
# 1  4.0
# 2  3.0
# 3  NaN

Using .set_index on the other hand replaces the existing index without modifying the values. Is it these two behaviors which you view as being incompatible?
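To make the contrast concrete, here is a small sketch that applies both operations to the same Series. The replacement labels 10 and 20 are arbitrary values chosen for illustration, and the Series is built with index [2, 1] so that label 2 holds 3 and label 1 holds 4.

```python
import pandas as pd

ser = pd.Series([3, 4], index=[2, 1])

# Constructor with index=: values follow their labels, and a label that is
# missing from ser (3) is filled with NaN.
aligned = pd.DataFrame(ser, index=[1, 2, 3])
print(aligned[0].tolist())    # [4.0, 3.0, nan]

# set_index: the labels are swapped out and the values keep their order.
relabeled = pd.DataFrame(ser).set_index(pd.Index([10, 20]))
print(relabeled[0].tolist())  # [3, 4]
```

The first call is a lookup by label, the second is a positional relabeling, so the two can only agree when the old and new labels happen to line up.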

In any case, the previous issue I referenced above was #42818. This was closed by adding the line

If a dict contains Series which have an index defined, it is aligned by its index.

which is good, but I think there could still be improvement here. Namely, mentioning alignment with Series or DataFrame inputs (and not just those in a dictionary).
