ENH: read_excel: Add a callable to access the Cell before getting the value

This issue was created on 2022-11-23.

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

In pandas.read_excel it is not possible to access the cell object before its value is converted and placed in the DataFrame. In some cases, such as #47269, #46895, and #49770, access to the cell is necessary, in particular to its number_format, in order to adjust the values.

Feature Description

Add a new parameter to read_excel, something like value_of_cell, that receives a cell and returns a value. In my case I changed _convert_cell in _openpyxl to:

def _convert_cell(
    self, cell, convert_float: bool, value_of_cell: Union[None, Callable]
) -> Scalar:
    from openpyxl.cell.cell import (
        TYPE_ERROR,
        TYPE_NUMERIC,
    )

    if cell.value is None:
        return ""  # compat with xlrd
    elif cell.data_type == TYPE_ERROR:
        return np.nan
    elif value_of_cell is not None:  # this is new
        return value_of_cell(cell)  # this is new
    elif cell.data_type == TYPE_NUMERIC:
        # GH5394, GH46988
        if convert_float:
            val = int(cell.value)
            if val == cell.value:
                return val
        else:
            return float(cell.value)
    return cell.value

So in my code I call:

def conv(cell):
    # pad with leading zeros when the number format looks like "00000"
    if cell.number_format.startswith("0"):
        return "0" * (len(cell.number_format) - len(str(cell.value))) + str(cell.value)
    else:
        return cell.value

df = pd.read_excel(file, header=None, na_filter=False, dtype=str, value_of_cell=conv)

Of course, it is necessary to edit every call between read_excel and _convert_cell (I think about 10 functions), but I will create a pull request if it passes the tests. I hope this is useful for the community.
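
To make the behaviour of the callback concrete, here is a minimal sketch that exercises conv outside pandas. FakeCell is a hypothetical stand-in that exposes only the two attributes the callback reads (value and number_format); it is not part of openpyxl or pandas, and value_of_cell remains a proposed parameter, not an existing read_excel argument.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeCell:
    """Hypothetical stand-in for an openpyxl cell (illustration only)."""

    value: Any
    number_format: str = "General"


def conv(cell):
    """Same logic as the callback proposed in the issue."""
    if cell.number_format.startswith("0"):
        # Pad on the left up to the width of the format string.
        pad = len(cell.number_format) - len(str(cell.value))
        return "0" * pad + str(cell.value)
    return cell.value


print(conv(FakeCell(42, "00000")))   # 00042
print(conv(FakeCell(42)))            # 42 (returned unchanged, still an int)
print(conv(FakeCell(123456, "000")))  # 123456 (already wider than the format)
```

Note that the check only looks at the first character of the format, so a format such as "0.00" would also trigger the padding; a real implementation would probably want a stricter test.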

Alternative Solutions

Another option is to use openpyxl directly, but this loses the extra features of read_excel and needs more code:

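    from openpyxl import load_workbook
    import pandas as pd

    document = load_workbook(file)
    sheet = document.active
    for row in sheet:
        for cell in row:
            # overwrite each cell's value with the converted one
            cell.value = value_of_cell(cell)
    df = pd.DataFrame(sheet.values)
    df = df.fillna('')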
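
A variation on this workaround, sketched under assumptions: instead of overwriting cell.value in the loaded workbook, the converted values can be collected into plain lists. The helper name convert_rows is hypothetical, and the cells below are stand-ins so the sketch runs without a workbook on disk.

```python
from types import SimpleNamespace


def convert_rows(rows, value_of_cell):
    """Apply value_of_cell to every cell and return plain lists of values.

    `rows` is any iterable of rows of cell-like objects; nothing is
    modified in place. None results become "" (roughly like fillna('')).
    """
    table = []
    for row in rows:
        converted = [value_of_cell(cell) for cell in row]
        table.append(["" if v is None else v for v in converted])
    return table


# Stand-in cells (hypothetical); real code would iterate a worksheet.
rows = [
    [SimpleNamespace(value=1), SimpleNamespace(value=None)],
    [SimpleNamespace(value="a"), SimpleNamespace(value=2.5)],
]
table = convert_rows(rows, lambda cell: cell.value)
print(table)  # [[1, ''], ['a', 2.5]]
```

With openpyxl, rows could be the worksheet obtained from load_workbook(file).active, and the resulting list of lists can be passed to pd.DataFrame. The other read_excel features (header handling, dtype, and so on) are still lost, as noted above.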

Additional Context

No response

Polandia94 wrote this answer on 2022-11-23

I think this could be a provisional solution to #20828, if value_of_cell is:

def value_of_cell(cell):
    if len(str(cell.value)) > 15:
        return str(cell.value)
    else:
        return cell.value
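
As a rough illustration of why returning text can help for long values, the sketch below runs the callback on stand-in cells (SimpleNamespace objects, not real openpyxl cells) and shows that a long integer does not survive a round trip through float. How this relates to the exact behaviour reported in #20828 is an assumption on my part.

```python
from types import SimpleNamespace


def value_of_cell(cell):
    # Values longer than 15 characters are returned as text so that no
    # later numeric conversion can alter their digits.
    if len(str(cell.value)) > 15:
        return str(cell.value)
    return cell.value


short_cell = SimpleNamespace(value=123456)
long_cell = SimpleNamespace(value=1234567890123456789)

print(repr(value_of_cell(short_cell)))  # 123456
print(repr(value_of_cell(long_cell)))   # '1234567890123456789'

# A float cannot represent this integer exactly, so digits would change.
print(int(float(long_cell.value)) == long_cell.value)  # False
```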

Repository: pandas-dev/pandas (Python)