Top 71 Pandas Interview Questions in Python (Intermediate)

This article covers intermediate-level Pandas interview questions and answers in Python to help you prepare for interviews.

These questions are intended to test your knowledge and understanding of Pandas beyond the basics.

Each question comes with example code to help you understand the answer.

Pandas Interview Questions & Answers in Python

1. What is the difference between loc and iloc in Pandas?

loc selects data by label, while iloc selects data by integer position. For example:

# access a single value using loc
df.loc[row_label, column_label]

# access a single value using iloc
df.iloc[row_index, column_index]
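The difference is easiest to see when the index labels are not integers. Below is a minimal sketch; the column names and index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data: string index labels, so labels and positions differ.
df = pd.DataFrame(
    {"price": [10, 20, 30], "qty": [1, 2, 3]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "price"]   # row labeled "b", column labeled "price"
by_position = df.iloc[1, 0]       # second row, first column

# Slicing also differs: loc includes the end label, iloc excludes the end position.
label_slice = df.loc["a":"b"]     # rows "a" and "b"
position_slice = df.iloc[0:1]     # only the row at position 0
```

Both lookups return the same cell here, but the two slices have different lengths.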

2. How to filter rows of a DataFrame based on a condition?

# Filter the DataFrame based on a condition

df_filtered = df[df['column_name'] > value]

3. How to filter rows of a DataFrame based on multiple conditions?

# Filter the rows of a DataFrame based on multiple conditions

df_filtered = df[(df['column_name1'] > value1) & (df['column_name2'] == value2)]
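A small runnable sketch of the filtering pattern above, using invented column names. It also shows query(), which expresses the same filter as a string.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "age": [25, 40, 35, 52],
    "city": ["Oslo", "Rome", "Oslo", "Oslo"],
})

# Each condition needs its own parentheses because & binds tighter than > or ==.
mask = (df["age"] > 30) & (df["city"] == "Oslo")
df_filtered = df[mask]

# The same filter written with query().
df_query = df.query("age > 30 and city == 'Oslo'")
```

Use & for "and", | for "or", and ~ for "not"; the Python keywords and/or do not work on boolean Series.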

4. How to sort a DataFrame by a column in ascending order?

# sort a DataFrame by a column in ascending order

df_sorted = df.sort_values('column_name')

5. How to sort a DataFrame by a column in descending order?

# sort a DataFrame by a column in descending order

df_sorted = df.sort_values('column_name', ascending=False)

6. How to group a DataFrame by a column and calculate the mean of another column?

# group a DataFrame by a column and calculate the mean of another

df_grouped = df.groupby('column_name1')['column_name2'].mean()

7. How to group a DataFrame by multiple columns and calculate the mean of another column?

# group a DataFrame by multiple columns and get the mean of another column

df_grouped = df.groupby(['column_name1', 'column_name2'])['column_name3'].mean()
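To make the shape of the result concrete, here is a short sketch with made-up sales data. Note that selecting one column before aggregating returns a Series, not a DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "product": ["x", "x", "x", "y"],
    "sales": [10.0, 20.0, 30.0, 50.0],
})

# One key: a Series indexed by region.
by_region = df.groupby("region")["sales"].mean()

# Two keys: a Series with a two-level MultiIndex (region, product).
by_region_product = df.groupby(["region", "product"])["sales"].mean()

# reset_index() turns the group keys back into ordinary columns.
flat = by_region_product.reset_index()
```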

8. How to merge two DataFrames on a common column?

# merge two DataFrames on a common column

df_merged = pd.merge(df1, df2, on='common_column')

9. How to merge two DataFrames on multiple common columns?

# merge two DataFrames on multiple common columns

df_merged = pd.merge(df1, df2, on=['common_column1', 'common_column2'])
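Interviewers often follow up by asking what happens to rows without a match. The sketch below, with invented data, compares the default inner join against left and outer joins via the how parameter.

```python
import pandas as pd

# Hypothetical data: id 1 exists only in df1, id 4 only in df2.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

inner = pd.merge(df1, df2, on="id")               # default how="inner": matching ids only
left = pd.merge(df1, df2, on="id", how="left")    # every row of df1; missing scores become NaN
outer = pd.merge(df1, df2, on="id", how="outer")  # union of ids from both frames
```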

10. How to concatenate two DataFrames vertically?

# Concatenate two DataFrames vertically

df_concatenated = pd.concat([df1, df2], axis=0)

11. How to concatenate two DataFrames horizontally?

# Concatenate two DataFrames horizontally

df_concatenated = pd.concat([df1, df2], axis=1)
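Two details of concat() tend to surprise people: vertical concatenation keeps the original row labels, and horizontal concatenation aligns on the index rather than on row position. A minimal sketch with made-up frames:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Vertical: without ignore_index=True the labels 0, 1 would appear twice.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Horizontal: df3 has index labels 1 and 2, so only label 1 lines up with df1.
df3 = pd.DataFrame({"b": [10, 20]}, index=[1, 2])
side_by_side = pd.concat([df1, df3], axis=1)
```

In side_by_side, labels that exist in only one frame get NaN in the other frame's columns.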

12. How to rename the columns of a DataFrame?

# Rename the columns of a DataFrame

df_renamed = df.rename(columns={'old_column_name1': 'new_column_name1', 'old_column_name2': 'new_column_name2'})

13. How to rename the index of a DataFrame?

# Rename the index of a DataFrame

df_renamed = df.rename(index={'old_index_name1': 'new_index_name1', 'old_index_name2': 'new_index_name2'})

14. How to drop a column from a DataFrame?

# Drop a column from a DataFrame

df_dropped = df.drop('column_name', axis=1)

15. How to drop a row from a DataFrame?

# Drop a row from a DataFrame (row_index is the row's index label)

df_dropped = df.drop(index=row_index)

16. How to apply a function element-wise to a column of a DataFrame?

# Apply a function element-wise to a column of a DataFrame

df['new_column'] = df['column'].apply(function)

17. How to apply a function element-wise to multiple columns of a DataFrame?

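# Apply a function element-wise to multiple columns of a DataFrame
# (DataFrame.applymap was renamed to DataFrame.map in pandas 2.1; use applymap on older versions)

df[['new_column1', 'new_column2']] = df[['column1', 'column2']].map(function)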

18. How to pivot a DataFrame?

df_pivot = df.pivot(index='index_column', columns='column_to_pivot', values='column_to_aggregate')

19. How to melt a DataFrame?

df_melted = pd.melt(df, id_vars=['column_to_keep'], value_vars=['column_to_melt'], var_name='new_column_name1', value_name='new_column_name2')
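pivot() and melt() are roughly inverse operations, which a round trip makes visible. This is a sketch with invented temperature data; the column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, city) pair.
long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "temp": [3, 15, 5, 17],
})

# pivot: long -> wide. Each (date, city) pair must be unique, or pivot raises an error.
wide = long_df.pivot(index="date", columns="city", values="temp")

# melt: wide -> long again.
melted = wide.reset_index().melt(id_vars=["date"], var_name="city", value_name="temp")
```

If the index/column pairs are not unique, pivot_table() with an aggregation function is the usual alternative.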

20. How to convert a DataFrame from wide format to long format?

df_long = pd.wide_to_long(df, stubnames='column_to_split', i=['index_column'], j='new_column_name')
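wide_to_long() expects the wide columns to share a common prefix (the stub) followed by a suffix. A minimal sketch, assuming hypothetical columns named score2020 and score2021:

```python
import pandas as pd

# Hypothetical wide data: the stub is "score", the suffixes are years.
df = pd.DataFrame({
    "id": [1, 2],
    "score2020": [70, 80],
    "score2021": [75, 85],
})

# The numeric suffix becomes a new index level named "year".
df_long = pd.wide_to_long(df, stubnames="score", i="id", j="year")
```

The result is indexed by (id, year) and has a single score column.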

21. How to apply a custom function to each row of a DataFrame?

def custom_function(row):
    # do something with the row
    return result

df['new_column'] = df.apply(custom_function, axis=1)
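As a concrete instance of the template above, the sketch below computes a line total per row. The function and column names are hypothetical. It also shows the vectorized equivalent, which is usually faster when the logic can be written as column arithmetic.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

def line_total(row):
    # row is a Series holding one row's values, keyed by column name
    return row["price"] * row["qty"]

df["total"] = df.apply(line_total, axis=1)

# Same result without a Python-level loop over rows.
df["total_vectorized"] = df["price"] * df["qty"]
```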

22. How to fill missing values in a DataFrame?

df_filled = df.fillna(value)

23. How to interpolate missing values in a DataFrame?

df_interpolated = df.interpolate()
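The difference between filling and interpolating shows up clearly on a small Series. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

filled_const = s.fillna(0)        # replace NaN with a constant
filled_mean = s.fillna(s.mean())  # replace NaN with the mean of the non-missing values
interpolated = s.interpolate()    # default method is linear between neighboring values
```

Here the mean of the observed values is 3.0, while linear interpolation fills the gaps with 2.0 and 4.0.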

24. How to remove duplicate rows from a DataFrame?

df_deduplicated = df.drop_duplicates()

25. How to convert a column of a DataFrame from one data type to another?

df['new_column'] = df['old_column'].astype(new_data_type)

26. How to convert a string column to a datetime column?

df['new_column'] = pd.to_datetime(df['old_column'])

27. How to convert a datetime column to a string column?

df['new_column'] = df['old_column'].dt.strftime('%Y-%m-%d')

28. How to create a new column based on a condition?

df['new_column'] = np.where(df['column_to_check'] > value, 'Yes', 'No')

29. How to replace values in a column based on a condition?

df['column_to_replace'] = np.where(df['column_to_check'] > value, new_value, df['column_to_replace'])
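Both np.where() patterns above need NumPy imported as np. The sketch below runs them on invented scores and adds np.select(), which handles more than two outcomes.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"score": [35, 60, 85]})

# New column from a condition.
df["passed"] = np.where(df["score"] >= 50, "Yes", "No")

# Conditional replacement: cap values above 80, keep the rest unchanged.
df["capped"] = np.where(df["score"] > 80, 80, df["score"])

# More than two outcomes: conditions are checked in order, first match wins.
df["grade"] = np.select(
    [df["score"] >= 80, df["score"] >= 50],
    ["A", "B"],
    default="C",
)
```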

30. How to count the number of occurrences of each value in a column?

df_count = df['column_to_count'].value_counts()

31. How to calculate the cumulative sum of a column?

df_cumsum = df['column_to_sum'].cumsum()

32. How to calculate the percentage change of a column?

df_pct_change = df['column_to_change'].pct_change()

33. How to calculate the rolling mean of a column?

df_rolling_mean = df['column_to_mean'].rolling(window_size).mean()

34. How to calculate the exponential weighted mean of a column?

df_ewm_mean = df['column_to_mean'].ewm(span=span_size).mean()
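A quick sketch of how the two window types behave at the start of a Series. The window and span sizes are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A fixed window of 3 yields NaN until three values are available.
rolling_mean = s.rolling(3).mean()

# min_periods=1 averages whatever is available so far instead.
rolling_partial = s.rolling(3, min_periods=1).mean()

# Exponentially weighted mean: recent values weigh more, and there is no leading NaN.
ewm_mean = s.ewm(span=3).mean()
```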

35. How to calculate the correlation between two columns?

corr = df['column1'].corr(df['column2'])

36. How to calculate the covariance between two columns?

# Calculate the covariance between two columns

cov = df['column1'].cov(df['column2'])

37. How to create a histogram of a column?

# Create a histogram of a column

df['column_to_plot'].hist()

38. How to create a scatter plot of two columns?

# Create a scatter plot of two columns

df.plot.scatter(x='column1', y='column2')

39. How to create a box plot of a column?

# Create a box plot of a column

df.boxplot(column='column_to_plot')

40. How to create a bar plot of a column?

# Creating a bar plot of a column

df['column_to_plot'].value_counts().plot(kind='bar')

41. How to create a pie chart of a column?

# creating a pie chart of a column

df['column_to_plot'].value_counts().plot(kind='pie')

42. How to create a heatmap of a correlation matrix?

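# creating a heatmap of a correlation matrix (requires seaborn: import seaborn as sns)
# numeric_only=True skips non-numeric columns, which pandas 2.0+ no longer drops automatically

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')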

43. How to create a line plot of a column with multiple groups?

# creating a line plot of a column with multiple groups

df.groupby('group_column')['column_to_plot'].plot(legend=True)

44. How to join two DataFrames based on their indexes?

joined_df = df1.join(df2)

45. How to stack multiple DataFrames vertically?

stacked_df = pd.concat([df1, df2, df3], axis=0)

46. How to stack multiple DataFrames horizontally?

stacked_df = pd.concat([df1, df2, df3], axis=1)

47. How to group a DataFrame by multiple columns?

grouped_df = df.groupby(['column1', 'column2'])['value_column'].sum()

48. How to pivot a table with hierarchical indexes?

df_pivot = df.pivot_table(index=['index_column1', 'index_column2'], columns='column_to_pivot', values='value_column')

49. How to rename columns in a DataFrame?

df.rename(columns={'old_name': 'new_name'}, inplace=True)

50. How to sort a DataFrame by one or more columns?

df.sort_values(['column1', 'column2'], ascending=[True, False], inplace=True)

51. How to select rows based on a condition?

df[df['column'] > 50]

52. How to select rows based on multiple conditions?

df[(df['column1'] > 50) & (df['column2'] == 'value')]

53. How to group a DataFrame by a time interval?

# '1H' means hourly; pandas 2.2+ prefers the lowercase alias '1h' and warns about 'H'
df.groupby(pd.Grouper(key='timestamp_column', freq='1H'))['value_column'].mean()

54. How to apply a function to a DataFrame column?

df['new_column'] = df['old_column'].apply(lambda x: x**2)

55. How to convert a DataFrame column from string to datetime?

df['datetime_column'] = pd.to_datetime(df['datetime_string_column'])

56. How to fill missing values in a DataFrame column?

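# Assign the result back; calling fillna(..., inplace=True) on a selected column
# may not update df and triggers a warning in recent pandas versions
df['value_column'] = df['value_column'].fillna(df['value_column'].mean())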

57. How to create a new DataFrame column based on the values of other columns?

df['new_column'] = df['column1'] + df['column2']

58. How to select a random sample of rows from a DataFrame?

df.sample(n=10, random_state=42)

59. How to create a new DataFrame column based on conditions?

df['new_column'] = np.where(df['column'] > 50, 'high', 'low')

60. How to convert a DataFrame column from object to category?

df['category_column'] = df['object_column'].astype('category')

61. How to split a DataFrame into two or more subsets based on a condition?

high_values = df[df['value_column'] > 50]
low_values = df[df['value_column'] <= 50]

62. How to merge two DataFrames based on their indexes?

merged_df = df1.merge(df2, left_index=True, right_index=True)

63. How to pivot a DataFrame with multiple value columns?

df_pivot = df.pivot_table(index='index_column', columns='column_to_pivot', values=['value_column1', 'value_column2'])

64. How to select the top n rows within each group in a DataFrame?

df.groupby('group_column').apply(lambda x: x.nlargest(3, 'value_column'))
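An alternative that avoids groupby().apply() is to sort first and then take the head of each group. This is a sketch with invented data, not the only way to do it.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "points": [5, 9, 7, 4, 8],
})

# Sort so the largest values come first, then keep the first 2 rows of each group.
top2 = (
    df.sort_values("points", ascending=False)
      .groupby("team")
      .head(2)
      .sort_values(["team", "points"], ascending=[True, False])
)
```

Unlike the apply() version, this keeps the original flat index rather than adding the group key as an index level.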

65. How to create a new DataFrame column based on the difference between two columns?

df['difference_column'] = df['column1'] - df['column2']

66. How to resample a DataFrame with a datetime index?

df.resample('1D').mean()
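resample() needs a datetime-like index. The sketch below builds one from made-up timestamps and also shows the equivalent groupby with pd.Grouper for the case where the timestamps live in a column (here hypothetically named timestamp).

```python
import pandas as pd

# Hypothetical data: two readings on Jan 1, one on Jan 2.
idx = pd.to_datetime(["2024-01-01 08:00", "2024-01-01 20:00", "2024-01-02 09:00"])
df = pd.DataFrame({"value": [10.0, 20.0, 40.0]}, index=idx)

# Daily mean using the datetime index.
daily = df.resample("1D").mean()

# Same aggregation when the timestamps are an ordinary column.
df_col = df.rename_axis("timestamp").reset_index()
daily_grouped = df_col.groupby(pd.Grouper(key="timestamp", freq="1D"))["value"].mean()
```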

67. How to calculate the rolling average of a DataFrame column?

df['rolling_average'] = df['value_column'].rolling(window=3).mean()

68. How to create a rolling window calculation in Pandas?

You can use the rolling() method in Pandas to create a rolling window calculation on a DataFrame.

For example, you can create a rolling average of the ‘Value’ column of a DataFrame with a window size of 3 using the following code:

# Creating a rolling window calculation in Pandas

df['rolling_average'] = df['Value'].rolling(window=3).mean()

This code adds a new column to the DataFrame called ‘rolling_average’ which contains the rolling average of the ‘Value’ column with a window size of 3.

69. How to merge two DataFrames with different column names?

You can use the merge() function in Pandas to combine two DataFrames whose columns differ. For example, suppose you have two DataFrames: one containing ‘ID’ and ‘Value1’ columns, and the other containing ‘ID’ and ‘Value2’ columns. If the key columns themselves had different names, you would pass left_on and right_on instead of on. Here, both DataFrames share ‘ID’, so you can merge on it using the following code:

merged_df = pd.merge(df1, df2, on='ID')

This code merges df1 and df2 on the ‘ID’ column, creating a new DataFrame called ‘merged_df’ that contains the ‘ID’, ‘Value1’, and ‘Value2’ columns.
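When the key column is named differently in each DataFrame, merge() takes left_on and right_on. A minimal sketch, assuming the second frame calls its key customer_id (a hypothetical name):

```python
import pandas as pd

# Hypothetical data: same keys, different key column names.
df1 = pd.DataFrame({"ID": [1, 2], "Value1": ["a", "b"]})
df2 = pd.DataFrame({"customer_id": [1, 2], "Value2": ["x", "y"]})

merged = pd.merge(df1, df2, left_on="ID", right_on="customer_id")

# Both key columns are kept, so drop the redundant one.
merged = merged.drop(columns="customer_id")
```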

70. How to group a DataFrame by multiple columns and apply a custom function?

You can use the groupby() method in Pandas to group a DataFrame by multiple columns, and then apply a custom function to the resulting groups.

For example, suppose you have a DataFrame containing ‘Year’, ‘Month’, and ‘Value’ columns, and you want to group the DataFrame by both ‘Year’ and ‘Month’, and then calculate the standard deviation of the ‘Value’ column for each group. You can use the following code:

# Grouping a DataFrame by multiple columns and applying a custom function

grouped_df = df.groupby(['Year', 'Month'])['Value'].apply(lambda x: x.std())

This code groups the DataFrame df by both ‘Year’ and ‘Month’, and applies a lambda function to the ‘Value’ column that calculates the standard deviation for each group.

The resulting grouped_df is a Series (not a DataFrame) with a multi-level index built from ‘Year’ and ‘Month’; its values are the standard deviation of ‘Value’ for each group. Call reset_index() if you need a DataFrame with ordinary columns.
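The sketch below runs the same grouping on invented data and compares the lambda with the built-in std(), which gives the same numbers. It also shows a function with no built-in shortcut (max minus min) passed through agg().

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Year": [2023, 2023, 2023, 2024, 2024],
    "Month": [1, 1, 2, 1, 1],
    "Value": [10.0, 14.0, 7.0, 3.0, 5.0],
})

grouped = df.groupby(["Year", "Month"])["Value"].apply(lambda x: x.std())

# The built-in aggregation is equivalent and usually faster.
builtin = df.groupby(["Year", "Month"])["Value"].std()

# A genuinely custom aggregation: the range of each group.
value_range = df.groupby(["Year", "Month"])["Value"].agg(lambda x: x.max() - x.min())
```

A group with a single row, such as (2023, 2) here, has a standard deviation of NaN because the sample standard deviation needs at least two values.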

71. How to convert a DataFrame to a dictionary with column names as keys and values as lists?

You can use the to_dict() method in Pandas to convert a DataFrame to a dictionary with column names as keys and values as lists.

For example, suppose you have a DataFrame containing ‘Name’, ‘Age’, and ‘Gender’ columns, and you want to convert it to a dictionary with ‘Name’, ‘Age’, and ‘Gender’ as keys and lists of corresponding values as values. You can use the following code:

dict_data = df.to_dict('list')

This code converts the DataFrame df to a dictionary called dict_data, with ‘Name’, ‘Age’, and ‘Gender’ as keys and lists of corresponding values as values.
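The argument to to_dict() controls the shape of the output. A short sketch with made-up rows, comparing 'list' with 'records':

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Age": [30, 25],
    "Gender": ["F", "M"],
})

as_lists = df.to_dict("list")       # {column name: list of values}
as_records = df.to_dict("records")  # one dict per row
```

The 'records' form is handy when each row needs to be serialized on its own, for example as JSON objects.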

Conclusion

These are some of the intermediate-level Pandas interview questions with solutions in Python.

Practicing these questions will give you a good understanding of Pandas and help prepare you for your next interview.