BUG: transform("max") replaces all entries with NaN when applied after groupby with as_index=False

This issue was created on 2022-11-22.

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]}, index=[7, 2, 5, 3, 9])

df

Out[4]:
  key  value
7   a  False
2   b  False
5   c   True
3   b  False
9   a   True

df.groupby("key", as_index=False).transform("max")

Out[5]:
   key value
7  NaN   NaN
2  NaN   NaN
5  NaN   NaN
3  NaN   NaN
9  NaN   NaN

df.groupby("key", as_index=True).transform("max")

Out[6]:
   value
7   True
2  False
5   True
3  False
9   True

Issue Description

When transform("max") is applied after groupby with as_index=False, every entry of the returned DataFrame is NaN. This happens not only with "max" but also with "min", and with "sum" or "mean" when the "value" column holds integers.

With as_index=True instead, the transform works as expected; however, the "key" column does not appear as the index but is simply dropped from the result.
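For comparison, here is a minimal sketch of what these reducers are expected to broadcast. It uses hypothetical integer data (the values are not from the report) and the default as_index=True path, which the report says behaves correctly; the column is selected before transforming so each result is a Series.

```python
import pandas as pd

# Hypothetical integer data with the same keys and index as the report
df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [1, 2, 3, 4, 7]},
    index=[7, 2, 5, 3, 9],
)

# Each reducer is computed per group and written back to every row of that group
results = {
    name: df.groupby("key")["value"].transform(name).tolist()
    for name in ["min", "max", "sum", "mean"]
}
for name, values in results.items():
    print(name, values)
```

Group "a" holds 1 and 7, group "b" holds 2 and 4, and group "c" holds only 3, so every row receives its own group's value.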

Expected Behavior

  1. I would expect that, with as_index=False, only the entries in the "value" column change, i.e. become

True
False
True
False
True

  2. I would expect that, with as_index=True, the "key" column appears as the index.
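The frame the reporter expects in point 1 can be sketched without relying on as_index at all, assuming the goal is simply to keep "key" as a column. The idea is to transform only the "value" column and assign it back.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]},
    index=[7, 2, 5, 3, 9],
)

# Keep "key" as a column and replace "value" with its per-group maximum;
# grouping and transforming a single column sidesteps the as_index=False path.
expected = df.assign(value=df.groupby("key")["value"].transform("max"))
print(expected)
```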

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.104-linuxkit
Version : #1 SMP Thu Mar 17 17:08:06 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.4
dateutil : 2.8.2
setuptools : 58.1.0
pip : 21.2.4
Cython : None
pytest : 7.2.0
hypothesis : 5.23.7
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0.2
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.0
snappy : None
sqlalchemy : 1.3.24
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : 0.14.0
tzdata : None

phofl wrote this answer on 2022-11-22

This worked in 1.4.4, but I don't think that this should work. max is a reducer and should be used with agg and not transform.

cc @rhshadrach Can you weigh in?
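To illustrate the agg/transform distinction being drawn here, the following sketch applies both to the report's data using only the default as_index. agg returns one value per group, while transform returns one value per original row, aligned to the original index.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]},
    index=[7, 2, 5, 3, 9],
)

# agg reduces: one row per group, indexed by the group keys
aggregated = df.groupby("key")["value"].agg("max")

# transform broadcasts: one row per original row, aligned to df.index
transformed = df.groupby("key")["value"].transform("max")

print(aggregated)
print(transformed)
```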

asishm wrote this answer on 2022-11-22

related : #37093

rhshadrach wrote this answer on 2022-11-22

Thanks for the report! Haven't run a git bisect but it will undoubtedly point to one of my PRs. Should be able to get a fix in for 1.5.3.

@phofl

I don't think that this should work. max is a reducer and should be used with agg and not transform.

When given a reducer, transform will compute the result and then broadcast it back to every row of its group. E.g.

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})
result = df.groupby("a", as_index=True).transform("max")
print(result)

#    b
# 0  1
# 1  1
# 2  4
# 3  4
# 4  4
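The reduce-then-broadcast behaviour described above can be spelled out by hand. This is a conceptual sketch of the semantics only, not how pandas implements transform internally.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})

# Step 1: reduce each group to a single value (indexed by the group key)
per_group_max = df.groupby("a")["b"].max()

# Step 2: broadcast that value back to every row of its group
broadcast = df["a"].map(per_group_max)

# transform("max") produces the same values, aligned to df.index
via_transform = df.groupby("a")["b"].transform("max")

print(broadcast.tolist())
print(via_transform.tolist())
```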

rhshadrach wrote this answer on 2022-11-22

@Verena1695

2. I would expect that when using as_index=True the "key" column appears as an index.

as_index only has an effect (or at least, should 😆) when doing an aggregation, not a transform; see the documentation here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
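As a sketch of where as_index does matter, the following uses the report's data with an aggregation. as_index controls whether the group keys become the index or stay as an ordinary column.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]},
    index=[7, 2, 5, 3, 9],
)

# as_index=True (default): group keys become the index of the aggregated result
agg_indexed = df.groupby("key", as_index=True).max()

# as_index=False: group keys are returned as a regular column instead
agg_flat = df.groupby("key", as_index=False).max()

print(agg_indexed)
print(agg_flat)
```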

The result you want can be achieved with:

import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]}, index=[7, 2, 5, 3, 9])
transformed = df.groupby("key").transform("max")
result = pd.concat([df["key"], transformed], axis=1)