QST: handle lines with fewer separators than main lines in read_csv

This issue has been created since 2022-09-23.

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/73820090/make-pandas-read-csv-to-not-add-lines-with-less-columns-delimiters-than-the-main

Question about pandas

The on_bad_lines='warn' option puzzles me. The pandas team added functionality to directly handle lines with more separators than the main lines, so it would not seem strange to me if some other pandas option handled lines with fewer separators than the main lines in the same way.

Using pandas.read_csv with the on_bad_lines='warn' option, the case where there are
too many column delimiters works well: bad lines are not loaded, and stderr reports
the bad line numbers:

    import pandas as pd
    from io import StringIO
    data = StringIO("""
    nom,f,nb
    bat,F,52
    cat,M,66,
    caw,F,15
    dog,M,66,,
    fly,F,61
    ant,F,21""")
    df = pd.read_csv(data, sep=',', on_bad_lines='warn')

    b'Skipping line 4: expected 3 fields, saw 4\nSkipping line 6: expected 3 fields, saw 5\n'

    df.head(10)
    #    nom  f  nb
    # 0  bat  F  52
    # 1  caw  F  15
    # 2  fly  F  61
    # 3  ant  F  21

But when a line has fewer delimiters (here sep=',') than the main lines, the line
is still loaded, with the missing fields filled with NaN:

    import pandas as pd
    from io import StringIO
    data = StringIO("""
    nom,f,nb
    bat,F,52
    catM66,
    caw,F,15
    dog,M66
    fly,F,61
    ant,F,21""")
    df = pd.read_csv(data, sep=',', on_bad_lines='warn', dtype=str)
    df.head(10)

    #       nom    f   nb
    # 0     bat    F   52
    # 1  catM66  NaN  NaN            <==
    # 2     caw    F   15
    # 3     dog  M66  NaN            <==
    # 4     fly    F   61
    # 5     ant    F   21

Is there a way to make read_csv skip lines with fewer column delimiters than
the main lines?
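
One possible workaround, offered as a sketch rather than a built-in read_csv option, is to pre-filter the rows with the standard csv module and keep only those whose field count matches the header. The helper name read_csv_strict is hypothetical. Note that it also drops rows with too many fields and leaves every value as a string (similar to dtype=str).

```python
import csv
from io import StringIO

import pandas as pd

raw = """nom,f,nb
bat,F,52
catM66,
caw,F,15
dog,M66
fly,F,61
ant,F,21"""


def read_csv_strict(text, sep=","):
    """Hypothetical helper: keep only rows with as many fields as the header."""
    rows = [r for r in csv.reader(StringIO(text), delimiter=sep) if r]  # skip blank lines
    header, body = rows[0], rows[1:]
    good = [r for r in body if len(r) == len(header)]
    skipped = len(body) - len(good)  # number of rows rejected for a wrong field count
    return pd.DataFrame(good, columns=header), skipped


df, skipped = read_csv_strict(raw)
print(df)
print(f"Skipped {skipped} malformed line(s)")
```

Filtering before parsing avoids having to tell a genuinely empty field apart from a missing one after the NaN padding has already happened.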

More Details About Repo
Owner Name pandas-dev
Repo Name pandas
Full Name pandas-dev/pandas
Language Python
Created Date 2010-08-24
Updated Date 2022-09-29
Star Count 35374
Watcher Count 1122
Fork Count 15034
Issue Count 3579
