Sunday 29 October 2023

Replace missing or NaN values in a Pandas DataFrame

We can fill the missing values in a DataFrame using the fillna method.

Example

df_without_missing_values = df.fillna(0)

 

The snippet above replaces all missing values with 0 and assigns the result to the variable ‘df_without_missing_values’. The original DataFrame is not affected.

 

To apply the change to the original DataFrame itself, set the argument inplace to True.

 

df.fillna(0, inplace=True)
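One way to confirm the difference between the two forms is to count the remaining missing values after each call. The sketch below uses a small hypothetical two-column frame (not the dataset from this post) and the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for this check: two NaN values in total
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, 5, 6]})

# Without inplace, fillna returns a new DataFrame and leaves df as it was
filled = df.fillna(0)
missing_in_original = int(df.isna().sum().sum())
missing_in_copy = int(filled.isna().sum().sum())
print(missing_in_original, missing_in_copy)

# With inplace=True, df itself is modified
df.fillna(0, inplace=True)
missing_after_inplace = int(df.isna().sum().sum())
print(missing_after_inplace)
```

Here isna().sum().sum() adds up the per-column counts of missing values, so a result of 0 means nothing is left to fill.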

 

A complete working example follows.

 

fill_missing_values.py

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'A': [1, 2, np.nan, 3, 4, np.nan, 5],
    'B': [1, np.nan, np.nan, 3, 4, 5, 6],
    'C': [1, 2, np.nan, 4, 5, np.nan, 6],
    'D': ['a', 'b', None, None, 'c', 'd', 'e']
}

df = pd.DataFrame(data)
df_without_missing_values = df.fillna(0)

print('Original DataFrame')
print(df)

print('\nDataFrame by filling with 0')
print(df_without_missing_values)

df.fillna(0, inplace=True)
print('\nOriginal DataFrame by filling with the argument inplace=True')
print(df)

Output

Original DataFrame
     A    B    C     D
0  1.0  1.0  1.0     a
1  2.0  NaN  2.0     b
2  NaN  NaN  NaN  None
3  3.0  3.0  4.0  None
4  4.0  4.0  5.0     c
5  NaN  5.0  NaN     d
6  5.0  6.0  6.0     e

DataFrame by filling with 0
     A    B    C  D
0  1.0  1.0  1.0  a
1  2.0  0.0  2.0  b
2  0.0  0.0  0.0  0
3  3.0  3.0  4.0  0
4  4.0  4.0  5.0  c
5  0.0  5.0  0.0  d
6  5.0  6.0  6.0  e

Original DataFrame by filling with the argument inplace=True
     A    B    C  D
0  1.0  1.0  1.0  a
1  2.0  0.0  2.0  b
2  0.0  0.0  0.0  0
3  3.0  3.0  4.0  0
4  4.0  4.0  5.0  c
5  0.0  5.0  0.0  d
6  5.0  6.0  6.0  e

Fill the missing values in a specific column

 

df['City'] = df['City'].fillna('not_found')

The statement above fills the missing values in the ‘City’ column with the value ‘not_found’.

 

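df['Age'] = df['Age'].fillna(0)

The statement above fills the missing values in the ‘Age’ column with the value 0. Assigning the result back to the column is preferred over calling df['Age'].fillna(0, inplace=True): recent pandas versions warn that the in-place call on a selected column may act on a temporary copy, and with Copy-on-Write enabled it leaves df unchanged.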

 

A complete working example follows.

 

fill_missing_values_in_a_column.py

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Ram', 'Joel', 'Gopi', 'Jitendra', 'Raj'],
        'Age': [34, np.nan, 29, 41, 52, np.nan],
        'City': ['Bangalore', None, 'Hyderabad', None, 'Bangalore', 'Chennai']}

df = pd.DataFrame(data)
print(df)

df['City'] = df['City'].fillna('not_found')
# Assign the result back; an in-place fillna on a selected column may not update df
df['Age'] = df['Age'].fillna(0)

print('\nAfter filling missing values')
print(df)

Output

       Name   Age       City
0   Krishna  34.0  Bangalore
1       Ram   NaN       None
2      Joel  29.0  Hyderabad
3      Gopi  41.0       None
4  Jitendra  52.0  Bangalore
5       Raj   NaN    Chennai

After filling missing values
       Name   Age       City
0   Krishna  34.0  Bangalore
1       Ram   0.0  not_found
2      Joel  29.0  Hyderabad
3      Gopi  41.0  not_found
4  Jitendra  52.0  Bangalore
5       Raj   0.0    Chennai
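When several columns each need their own replacement value, fillna also accepts a dictionary that maps column names to fill values, so the two column statements can be combined into one call. A minimal sketch, using a shortened version of the sample data above:

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({
    'Name': ['Krishna', 'Ram', 'Joel'],
    'Age': [34, np.nan, 29],
    'City': ['Bangalore', None, 'Hyderabad'],
})

# Keys are column names, values are the replacement used for that column
filled = df.fillna({'Age': 0, 'City': 'not_found'})
print(filled)
```

Columns that are not listed in the dictionary are left as they are, and df itself is not modified because the result is assigned to a new variable.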


  
