By default, the ‘sort_values’ method places missing values last. To confirm this, let’s experiment with the dataset below.
       Name   Age       City  Gender
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
2      Joel  29.0       None    Male
3     Chamu   NaN    Chennai  Female
4  Jitendra  52.0       None    Male
5       Raj   NaN    Chennai    Male
Sort the DataFrame by Age column
sort_by_age_ascending_1 = df.sort_values('Age')
The above snippet sorts the DataFrame by the ‘Age’ column and assigns the sorted DataFrame to the variable sort_by_age_ascending_1. The content of the DataFrame ‘sort_by_age_ascending_1’ is given below.
       Name   Age       City  Gender
2      Joel  29.0       None    Male
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
4  Jitendra  52.0       None    Male
3     Chamu   NaN    Chennai  Female
5       Raj   NaN    Chennai    Male
As the output above shows, all NaN values in the Age column are moved to the end, and the remaining rows are sorted in ascending order of Age.
You can achieve the same result by explicitly setting the argument na_position to 'last'.
sort_by_age_ascending_2 = df.sort_values('Age', na_position='last')
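To check that the default call and the explicit na_position='last' call really agree, a minimal sketch can compare the two results with DataFrame.equals. It assumes the same sample data as this post (only the Name and Age columns are kept for brevity), and the variable names default_sort, explicit_last and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in this post, reduced to two columns
df = pd.DataFrame({
    'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Raj'],
    'Age': [34, 35, 29, np.nan, 52, np.nan],
})

default_sort = df.sort_values('Age')                        # na_position not given
explicit_last = df.sort_values('Age', na_position='last')   # na_position spelled out

# equals() compares values and index labels; NaNs in the same position count as equal
same = default_sort.equals(explicit_last)
print(same)
```

If the two calls behave identically, this prints True.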
Sort the DataFrame by Age column and move all the missing values to the top
By passing the argument na_position='first', we can move all the missing values to the top.
sort_by_age_ascending_none_to_first_1 = df.sort_values('Age', na_position='first')
The above snippet generates the DataFrame below.
       Name   Age       City  Gender
3     Chamu   NaN    Chennai  Female
5       Raj   NaN    Chennai    Male
2      Joel  29.0       None    Male
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
4  Jitendra  52.0       None    Male
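The na_position argument works independently of the sort direction. The post itself only covers ascending sorts, so the sketch below is an extra illustration: it combines na_position with the ascending parameter of sort_values on the same sample data (reduced to two columns). The variable names desc_nan_last and desc_nan_first are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in this post, reduced to two columns
df = pd.DataFrame({
    'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Raj'],
    'Age': [34, 35, 29, np.nan, 52, np.nan],
})

# Descending sort: missing values still go last by default
desc_nan_last = df.sort_values('Age', ascending=False)

# Descending sort with missing values moved to the top
desc_nan_first = df.sort_values('Age', ascending=False, na_position='first')

print(desc_nan_last)
print(desc_nan_first)
```

In both results the non-missing ages are ordered from 52.0 down to 29.0. Only the placement of the NaN rows changes.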
Find the complete working application below.
missing_values_handling.py
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Raj'],
        'Age': [34, 35, 29, np.nan, 52, np.nan],
        'City': ['Bangalore', 'Hyderabad', None, 'Chennai', None, 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male']}
df = pd.DataFrame(data)
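# Default behaviour: missing values are placed last
sort_by_age_ascending_1 = df.sort_values('Age')
# Same result with na_position given explicitly
sort_by_age_ascending_2 = df.sort_values('Age', na_position='last')
# Missing values are placed first
sort_by_age_ascending_none_to_first_1 = df.sort_values('Age', na_position='first')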
print('df : \n', df)
print('\nsort_by_age_ascending_1 : \n', sort_by_age_ascending_1)
print('\nsort_by_age_ascending_2 : \n', sort_by_age_ascending_2)
print('\nsort_by_age_ascending_none_to_first_1 : \n', sort_by_age_ascending_none_to_first_1)
Output
df : 
        Name   Age       City  Gender
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
2      Joel  29.0       None    Male
3     Chamu   NaN    Chennai  Female
4  Jitendra  52.0       None    Male
5       Raj   NaN    Chennai    Male

sort_by_age_ascending_1 : 
        Name   Age       City  Gender
2      Joel  29.0       None    Male
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
4  Jitendra  52.0       None    Male
3     Chamu   NaN    Chennai  Female
5       Raj   NaN    Chennai    Male

sort_by_age_ascending_2 : 
        Name   Age       City  Gender
2      Joel  29.0       None    Male
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
4  Jitendra  52.0       None    Male
3     Chamu   NaN    Chennai  Female
5       Raj   NaN    Chennai    Male

sort_by_age_ascending_none_to_first_1 : 
        Name   Age       City  Gender
3     Chamu   NaN    Chennai  Female
5       Raj   NaN    Chennai    Male
2      Joel  29.0       None    Male
0   Krishna  34.0  Bangalore    Male
1     Sailu  35.0  Hyderabad  Female
4  Jitendra  52.0       None    Male