ENH: Do not require sorting the entire DataFrame when the `by` option is used in `merge_asof`

This issue was created on 2022-11-21.

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

As a pandas user, I'd like the following behavior when using merge_asof.
Right now it requires the entire DataFrame to be sorted, but that looks unnecessary when the by option is used.
Let me explain with an example.
Suppose we have two DataFrames:

import pandas as pd

main_df = pd.DataFrame({'id': ['1', '1', '1', '2', '2'], 'tracking': [1, 4, 7, 1, 5]})

  id  tracking
0  1         1
1  1         4
2  1         7
3  2         1
4  2         5

measurements = pd.DataFrame({'id': ['1', '1'], 'position': [2, 5], 'value': [100, 150]})

  id  position  value
0  1         2    100
1  1         5    150

We want to use merge_asof to join them:

pd.merge_asof(
    left=main_df, 
    right=measurements, 
    by='id', 
    left_on='tracking', 
    right_on='position', 
    direction='nearest')

Since the left DataFrame is not sorted by tracking, we get an error:
ValueError: left keys must be sorted

So we need to sort the left DataFrame first:

pd.merge_asof(
    left=main_df.sort_values('tracking'),
    right=measurements,
    by='id',
    left_on='tracking',
    right_on='position',
    direction='nearest'
)

But the row order of the result is less obvious than the original order, where the rows for each id were sorted independently:

  id  tracking  position  value
0  1         1       2.0  100.0
1  2         1       NaN    NaN
2  1         4       5.0  150.0
3  2         5       NaN    NaN
4  1         7       5.0  150.0

Feature Description

It would be nice if pandas required sorting only within each segment defined by the by argument of merge_asof.
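
For comparison, the ordering asked for here can be approximated today with a workaround. The following is only a minimal sketch, assuming the original row order is what should be preserved: it keeps the original index in a helper column, sorts only for the merge, and then restores the original order. The names `left`, `merged`, and `result` are illustrative and not part of pandas.

```python
import pandas as pd

main_df = pd.DataFrame({'id': ['1', '1', '1', '2', '2'], 'tracking': [1, 4, 7, 1, 5]})
measurements = pd.DataFrame({'id': ['1', '1'], 'position': [2, 5], 'value': [100, 150]})

# Keep the original row labels in a helper column named 'index',
# then sort by the asof key only for the duration of the merge.
left = main_df.reset_index().sort_values('tracking', kind='stable')

merged = pd.merge_asof(
    left=left,
    right=measurements.sort_values('position'),
    by='id',
    left_on='tracking',
    right_on='position',
    direction='nearest',
)

# Restore the original row order and drop the helper column.
result = (
    merged.sort_values('index', kind='stable')
    .set_index('index')
    .rename_axis(None)
)
print(result)
```

This still sorts the whole left DataFrame internally, so it does not remove the cost of the sort; it only gives back the row order of main_df.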

Alternative Solutions

Haven't seen any alternatives.

Additional Context

No response

samukweku wrote this answer on 2022-11-22

@filippzorin if you look at the implementation of merge_asof, the iteration is done over the non-equi join key, and when there is a match it then checks whether the equi join (the by key) matches, and then breaks. That is a crude interpretation of what happens. So the search doesn't happen on the equi join first, but on the non-equi join, which is why the whole frame has to be sorted on that key.
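
To illustrate the point in the comment above, here is a simplified pure-Python sketch of an asof scan. It is not pandas' actual code: the function name `asof_backward` is hypothetical, it handles only the backward direction (not `nearest`), and it works on plain tuples. What it shows is that the scan advances along the non-equi key in a single pass and consults the by value only afterwards, so it relies on both inputs being sorted on that key as a whole.

```python
def asof_backward(left, right):
    """Illustrative backward asof match.

    left:  list of (by, key) tuples, sorted by key
    right: list of (by, key, value) tuples, sorted by key
    Returns, for each left row, the value of the last right row with the
    same `by` and key <= the left key, or None if there is none.
    """
    out = []
    last_seen = {}  # by value -> latest right value scanned so far
    j = 0           # position in right; only ever moves forward
    for by, key in left:
        # Advance along the non-equi key first ...
        while j < len(right) and right[j][1] <= key:
            last_seen[right[j][0]] = right[j][2]
            j += 1
        # ... and only then look at the equi (by) key.
        out.append(last_seen.get(by))
    return out


# Rows of main_df sorted by tracking, and measurements sorted by position.
left_rows = [('1', 1), ('2', 1), ('1', 4), ('2', 5), ('1', 7)]
right_rows = [('1', 2, 100), ('1', 5, 150)]

matches = asof_backward(left_rows, right_rows)
print(matches)  # [None, None, 100, None, 150]
```

Because the pointer `j` never moves backwards, a left input that is sorted only within each id would break the assumption this kind of scan depends on.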

More Details About Repo
Owner Name pandas-dev
Repo Name pandas
Full Name pandas-dev/pandas
Language Python
Created Date 2010-08-24
Updated Date 2022-12-07
Star Count 36164
Watcher Count 1118
Fork Count 15472
Issue Count 3683