Tuesday 12 December 2023

Pandas: Extract random rows or columns from a DataFrame

 

Using the sample() method, we can extract n random rows (or columns) from a DataFrame. By default it samples without replacement, so no row is returned twice.

 

The examples below show the most common usages.

 

Example 1: Get 1 random row from a DataFrame

df.sample()

 

Example 2: Get n random rows from a DataFrame

The snippet below returns 3 random rows from the dataset.

df.sample(3)
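Because sample() draws rows at random, each call can return different rows. As a minimal sketch (the small DataFrame and the seed value 42 are illustrative assumptions, not part of the original example), passing random_state makes the draw repeatable:

```python
import pandas as pd

# Hypothetical small DataFrame used only for illustration
df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', 'Raj'],
                   'Age': [34, 35, 29, 35, 52, 34]})

# The same seed gives the same rows on every call
first_draw = df.sample(3, random_state=42)
second_draw = df.sample(3, random_state=42)

print(first_draw)
print(first_draw.equals(second_draw))  # True: both draws are identical
```

Leaving random_state out keeps the default behaviour, where every call may return a different selection.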

 

Example 3: Get a random 20% of the rows from a DataFrame

df.sample(frac=0.2)

 

Similarly, get a random 45% of the rows from a DataFrame

df.sample(frac=0.45)
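The number of rows returned for a given frac depends on the length of the DataFrame. The following is a minimal sketch, assuming a 6-row DataFrame like the one used later in this post. As far as I understand, pandas rounds frac * number_of_rows to the nearest integer, so 20% of 6 rows gives 1 row and 45% gives 3 rows:

```python
import pandas as pd

# Hypothetical 6-row DataFrame used only for illustration
df = pd.DataFrame({'Age': [34, 35, 29, 35, 52, 34],
                   'Rating': [81, 76, 67, 100, 87, 89]})

twenty_percent = df.sample(frac=0.2)        # 0.2 * 6 = 1.2, rounded to 1 row
forty_five_percent = df.sample(frac=0.45)   # 0.45 * 6 = 2.7, rounded to 3 rows

print(len(twenty_percent), len(forty_five_percent))  # 1 3
```

Note that n and frac cannot be passed together; use one or the other.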

 

Example 4: Get a random column from a DataFrame

df.sample(axis=1)

df.sample(axis='columns')

 

Example 5: Get random columns from a DataFrame

The snippet below returns 3 random columns from the dataset. Both forms are equivalent: axis=1 and axis='columns' refer to the same axis.

df.sample(3, axis=1)

df.sample(3, axis='columns')
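When sampling along the column axis, every row is kept and only the set of columns changes. A minimal sketch, using a hypothetical 3-row DataFrame and an arbitrary seed chosen only for illustration:

```python
import pandas as pd

# Hypothetical DataFrame with 4 columns, used only for illustration
df = pd.DataFrame({'Name': ['Krishna', 'Sailu', 'Joel'],
                   'Age': [34, 35, 29],
                   'City': ['Bangalore', 'Hyderabad', 'Hyderabad'],
                   'Rating': [81, 76, 67]})

# Pick 2 of the 4 columns at random; all 3 rows are preserved
two_columns = df.sample(2, axis='columns', random_state=0)

print(two_columns.shape)          # (3, 2)
print(list(two_columns.columns))  # two of the original column names
```

Asking for more columns than the DataFrame has raises a ValueError unless replace=True is passed.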

 

Find the complete working application below.

 

random_row_selection.py

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Krishna', 'Sailu', 'Joel', 'Chamu', 'Jitendra', "Raj"],
        'Age': [34, 35, 29, 35, 52, 34],
        'City': ['Bangalore', 'Hyderabad', 'Hyderabad', 'Chennai', 'Bangalore', 'Chennai'],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
        'Rating': [81, 76, 67, 100, 87, 89]}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

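# Select one random row
one_random_row = df.sample()

# Select three random rows
three_random_rows = df.sample(3)

# Select 20% and 45% of the rows (the row count is rounded)
twenty_percent_samples = df.sample(frac=0.2)
forty_five_percen_samples = df.sample(frac=0.45)

# Select one random column (both forms are equivalent)
random_single_column_1 = df.sample(axis=1)
random_single_column_2 = df.sample(axis='columns')

# Select three random columns (both forms are equivalent)
random_three_column_1 = df.sample(3, axis=1)
random_three_column_2 = df.sample(3, axis='columns')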

print('\none_random_row: \n', one_random_row)
print('\nthree_random_rows: \n', three_random_rows)
print('\ntwenty_percent_samples: \n', twenty_percent_samples)
print('\nforty_five_percen_samples: \n', forty_five_percen_samples)
print('\nrandom_single_column_1: \n', random_single_column_1)
print('\nrandom_single_column_2: \n', random_single_column_2)
print('\nrandom_three_column_1: \n', random_three_column_1)
print('\nrandom_three_column_2: \n', random_three_column_2)

 

Output (rows and columns are chosen at random, so your results will differ from run to run)

Original DataFrame
       Name  Age       City  Gender  Rating
0   Krishna   34  Bangalore    Male      81
1     Sailu   35  Hyderabad  Female      76
2      Joel   29  Hyderabad    Male      67
3     Chamu   35    Chennai  Female     100
4  Jitendra   52  Bangalore    Male      87
5       Raj   34    Chennai    Male      89

one_random_row: 
        Name  Age       City Gender  Rating
4  Jitendra   52  Bangalore   Male      87

three_random_rows: 
       Name  Age       City  Gender  Rating
5      Raj   34    Chennai    Male      89
0  Krishna   34  Bangalore    Male      81
3    Chamu   35    Chennai  Female     100

twenty_percent_samples: 
     Name  Age     City  Gender  Rating
3  Chamu   35  Chennai  Female     100

forty_five_percen_samples: 
       Name  Age       City  Gender  Rating
1    Sailu   35  Hyderabad  Female      76
5      Raj   34    Chennai    Male      89
0  Krishna   34  Bangalore    Male      81

random_single_column_1: 
    Age
0   34
1   35
2   29
3   35
4   52
5   34

random_single_column_2: 
    Age
0   34
1   35
2   29
3   35
4   52
5   34

random_three_column_1: 
    Rating  Age       City
0      81   34  Bangalore
1      76   35  Hyderabad
2      67   29  Hyderabad
3     100   35    Chennai
4      87   52  Bangalore
5      89   34    Chennai

random_three_column_2: 
    Age  Gender  Rating
0   34    Male      81
1   35  Female      76
2   29    Male      67
3   35  Female     100
4   52    Male      87
5   34    Male      89

 

 

 
