In this article, we cover intermediate-level Pandas interview questions and answers in Python to help you prepare for interviews.
These questions test your knowledge and understanding of Pandas beyond the basics.
Pandas Interview Questions & Answers in Python
1. What is the difference between loc and iloc in Pandas?
loc accesses data by label (row index labels and column names), while iloc accesses data by integer position (0-based). For example:
# access a single value by label using loc
df.loc[row_label, column_label]
# access a single value by integer position using iloc
df.iloc[row_index, column_index]
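The sketch below uses a small, made-up DataFrame whose index labels (10, 20, 30) differ from its positions (0, 1, 2), which makes the difference visible. It also shows that a loc slice includes its end label, while an iloc slice stops before its end position.

```python
import pandas as pd

# Hypothetical data: index labels 10, 20, 30 sit at positions 0, 1, 2
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]}, index=[10, 20, 30])

by_label = df.loc[20, "score"]   # row labelled 20 -> 2
by_position = df.iloc[2, 1]      # third row, second column -> 3

# loc slices include the end label; iloc slices exclude the end position
loc_slice = df.loc[10:30, "score"]   # labels 10, 20, 30 -> 3 rows
iloc_slice = df.iloc[0:2, 1]         # positions 0 and 1 -> 2 rows
```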
2. How to filter rows of a DataFrame based on a condition?
# filter the DataFrame based on a condition
df_filtered = df[df['column_name'] > value]
3. How to filter rows of a DataFrame based on multiple conditions?
# filter the rows of a DataFrame based on multiple conditions
df_filtered = df[(df['column_name1'] > value1) & (df['column_name2'] == value2)]
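Interviewers often ask why the parentheses are required. The answer is that Python's `&`, `|` and `~` bind more tightly than `>` and `==`. Python's `and`/`or` also cannot be used, because they try to turn a whole Series into one boolean. Here is a minimal sketch on hypothetical `age`/`city` data:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"age": [25, 35, 45, 55], "city": ["NY", "LA", "NY", "SF"]})

both = df[(df["age"] > 30) & (df["city"] == "NY")]     # AND
either = df[(df["age"] > 50) | (df["city"] == "LA")]   # OR
negated = df[~(df["city"] == "NY")]                    # NOT

# `and` asks for the truth value of a whole Series, which raises ValueError
used_and_failed = False
try:
    df[(df["age"] > 30) and (df["city"] == "NY")]
except ValueError:
    used_and_failed = True
```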
4. How to sort a DataFrame by a column in ascending order?
# sort a DataFrame by a column in ascending order
df_sorted = df.sort_values('column_name')
5. How to sort a DataFrame by a column in descending order?
# sort a DataFrame by a column in descending order
df_sorted = df.sort_values('column_name', ascending=False)
6. How to group a DataFrame by a column and calculate the mean of another column?
# group a DataFrame by a column and calculate the mean of another column
df_grouped = df.groupby('column_name1')['column_name2'].mean()
7. How to group a DataFrame by multiple columns and calculate the mean of another column?
# group a DataFrame by multiple columns and get the mean of another column
df_grouped = df.groupby(['column_name1', 'column_name2'])['column_name3'].mean()
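Grouping by several columns returns a Series indexed by a MultiIndex. A common follow-up is to turn the group keys back into ordinary columns with `reset_index()`. A minimal sketch on hypothetical sales data:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "A", "A", "B"],
    "sales": [10, 20, 30, 40],
})

grouped = df.groupby(["region", "product"])["sales"].mean()
# grouped is a Series indexed by (region, product)

flat = grouped.reset_index()  # region, product and sales as regular columns
```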
8. How to merge two DataFrames on a common column?
# merge two DataFrames on a common column
df_merged = pd.merge(df1, df2, on='common_column')
9. How to merge two DataFrames on multiple common columns?
# merge two DataFrames on multiple common columns
df_merged = pd.merge(df1, df2, on=['common_column1', 'common_column2'])
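By default, `merge()` performs an inner join. The `how` parameter controls which keys survive. A minimal sketch with two hypothetical tables that share an `id` column:

```python
import pandas as pd

# Hypothetical tables sharing the key column "id"
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

inner = pd.merge(left, right, on="id")                  # only ids in both: 2, 3
left_join = pd.merge(left, right, on="id", how="left")  # all ids from left; id 1 gets NaN score
outer = pd.merge(left, right, on="id", how="outer")     # ids from either table
```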
10. How to concatenate two DataFrames vertically?
# concatenate two DataFrames vertically
df_concatenated = pd.concat([df1, df2], axis=0)
11. How to concatenate two DataFrames horizontally?
# concatenate two DataFrames horizontally
df_concatenated = pd.concat([df1, df2], axis=1)
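A detail worth knowing is that `concat(..., axis=1)` aligns rows by index label rather than by position, so mismatched indexes produce NaN. For vertical stacking, `ignore_index=True` gives a fresh 0..n-1 index. A minimal sketch on made-up frames:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
b = pd.DataFrame({"y": [3, 4]}, index=[1, 2])

# Rows are matched on index labels 0, 1, 2; missing cells become NaN
side_by_side = pd.concat([a, b], axis=1)

# Vertical stack with a renumbered index instead of the repeated 0, 1, 0, 1
stacked = pd.concat([a, a], axis=0, ignore_index=True)
```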
12. How to rename the columns of a DataFrame?
# rename the columns of a DataFrame
df_renamed = df.rename(columns={'old_column_name1': 'new_column_name1', 'old_column_name2': 'new_column_name2'})
13. How to rename the index of a DataFrame?
# rename index labels (use df.rename_axis('new_name') to rename the index itself)
df_renamed = df.rename(index={'old_index_label1': 'new_index_label1', 'old_index_label2': 'new_index_label2'})
14. How to drop a column from a DataFrame?
# drop a column from a DataFrame
df_dropped = df.drop('column_name', axis=1)
15. How to drop a row from a DataFrame?
# drop a row from a DataFrame by its index label
df_dropped = df.drop(index=row_label)
16. How to apply a function element-wise to a column of a DataFrame?
# apply a function element-wise to a column of a DataFrame
df['new_column'] = df['column'].apply(function)
17. How to apply a function element-wise to multiple columns of a DataFrame?
# DataFrame.map requires pandas 2.1+; on older versions use .applymap(function)
df[['new_column1', 'new_column2']] = df[['column1', 'column2']].map(function)
18. How to pivot a DataFrame?
# pivot() reshapes without aggregating; each (index, column) pair must be unique
df_pivot = df.pivot(index='index_column', columns='column_to_pivot', values='value_column')
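Because `pivot()` cannot aggregate, it fails when an (index, column) pair occurs more than once. `pivot_table()` handles this case by aggregating, using the mean by default. A minimal sketch on hypothetical temperature data with one duplicated pair:

```python
import pandas as pd

# Hypothetical long-format data; ("d1", "NY") appears twice
df = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2"],
    "city": ["NY", "LA", "NY", "NY"],
    "temp": [10, 20, 30, 40],
})

try:
    df.pivot(index="date", columns="city", values="temp")
    pivot_failed = False
except ValueError:  # duplicate entries cannot be reshaped
    pivot_failed = True

# pivot_table() averages the duplicates; missing combinations become NaN
table = df.pivot_table(index="date", columns="city", values="temp", aggfunc="mean")
```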
19. How to melt a DataFrame?
df_melted = pd.melt(df, id_vars=['column_to_keep'], value_vars=['column_to_melt'], var_name='new_column_name1', value_name='new_column_name2')
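To see what `melt()` produces, take a hypothetical wide table with one column per year. Each year column becomes a set of rows, with the former column name stored in `var_name` and the cell value stored in `value_name`:

```python
import pandas as pd

# Hypothetical wide table: one column per year
wide = pd.DataFrame({"country": ["A", "B"], "2020": [1, 2], "2021": [3, 4]})

long = pd.melt(wide, id_vars=["country"], value_vars=["2020", "2021"],
               var_name="year", value_name="value")
# 2 countries x 2 years -> 4 rows with columns country, year, value
```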
20. How to convert a DataFrame from wide format to long format?
df_long = pd.wide_to_long(df, stubnames='column_to_split', i=['index_column'], j='new_column_name')
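`wide_to_long()` expects column names made of a stub plus a suffix, such as `score2020` and `score2021`. The sketch below uses hypothetical column names with that pattern. With the default numeric suffix pattern, the suffix becomes an integer level of the resulting (id, year) index:

```python
import pandas as pd

# Hypothetical wide data: stub "score" + year suffix
wide = pd.DataFrame({"id": [1, 2], "score2020": [5, 6], "score2021": [7, 8]})

long = pd.wide_to_long(wide, stubnames="score", i="id", j="year")
# long is indexed by (id, year) and has a single "score" column
```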
21. How to apply a custom function to each row of a DataFrame?
def custom_function(row):
    # hypothetical example: combine two values from the same row
    return row['column1'] + row['column2']

df['new_column'] = df.apply(custom_function, axis=1)
22. How to fill missing values in a DataFrame?
df_filled = df.fillna(value)
23. How to interpolate missing values in a DataFrame?
df_interpolated = df.interpolate()
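The difference between `fillna()` and `interpolate()` shows up clearly on a small made-up Series. `fillna()` writes a constant into every gap, while the default linear `interpolate()` estimates each gap from its neighbours:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 7.0])

filled = s.fillna(0)            # every NaN becomes 0
interpolated = s.interpolate()  # linear: 2.0 between 1 and 3, 5.0 between 3 and 7
```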
24. How to remove duplicate rows from a DataFrame?
df_deduplicated = df.drop_duplicates()
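By default, `drop_duplicates()` only removes rows that are identical in every column. The `subset` and `keep` parameters narrow the comparison and choose which copy survives. A minimal sketch on a hypothetical event log:

```python
import pandas as pd

# Hypothetical event log: the first two rows differ only in "ts"
df = pd.DataFrame({"user": ["a", "a", "b"], "event": ["x", "x", "y"], "ts": [1, 2, 3]})

exact = df.drop_duplicates()                              # no fully identical rows -> 3 rows
by_cols = df.drop_duplicates(subset=["user", "event"])    # keeps the first ("a", "x") row
keep_last = df.drop_duplicates(subset=["user", "event"], keep="last")
```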
25. How to convert a column of a DataFrame from one data type to another?
df['new_column'] = df['old_column'].astype(new_data_type)
26. How to convert a string column to a datetime column?
df['new_column'] = pd.to_datetime(df['old_column'])
27. How to convert a datetime column to a string column?
df['new_column'] = df['old_column'].dt.strftime('%Y-%m-%d')
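Real data often contains strings that cannot be parsed as dates. The sketch below round-trips hypothetical date strings. An explicit `format` is passed, and `errors="coerce"` turns unparseable strings into `NaT` instead of raising. `.dt.strftime` then leaves those values missing:

```python
import pandas as pd

df = pd.DataFrame({"raw": ["2024-01-05", "2024-02-10", "not a date"]})

# Unparseable strings become NaT instead of raising an error
df["parsed"] = pd.to_datetime(df["raw"], format="%Y-%m-%d", errors="coerce")

# Format back to strings; NaT stays missing
df["label"] = df["parsed"].dt.strftime("%d/%m/%Y")
```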
28. How to create a new column based on a condition?
import numpy as np
df['new_column'] = np.where(df['column_to_check'] > value, 'Yes', 'No')
29. How to replace values in a column based on a condition?
import numpy as np
df['column_to_replace'] = np.where(df['column_to_check'] > value, new_value, df['column_to_replace'])
30. How to count the number of occurrences of each value in a column?
df_count = df['column_to_count'].value_counts()
31. How to calculate the cumulative sum of a column?
df_cumsum = df['column_to_sum'].cumsum()
32. How to calculate the percentage change of a column?
df_pct_change = df['column_to_change'].pct_change()
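`pct_change()` computes (current − previous) / previous for each row. The first row has no predecessor, so it is NaN. A worked example on made-up prices:

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])
change = prices.pct_change()
# row 1: (110 - 100) / 100 = 0.10; row 2: (99 - 110) / 110 = -0.10
```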
33. How to calculate the rolling mean of a column?
df_rolling_mean = df['column_to_mean'].rolling(window_size).mean()
34. How to calculate the exponential weighted mean of a column?
df_ewm_mean = df['column_to_mean'].ewm(span=span_size).mean()
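The rolling mean and the exponentially weighted mean treat history differently. A rolling window weights its last `window` values equally and is NaN until the window fills, unless `min_periods` is lowered. The exponentially weighted mean decays older values with `alpha = 2 / (span + 1)`. A sketch on a made-up Series, using `adjust=False` so each step is simply `alpha * x + (1 - alpha) * previous`:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

rolling = s.rolling(window=2).mean()                         # NaN, 1.5, 2.5, 3.5
rolling_partial = s.rolling(window=2, min_periods=1).mean()  # 1.0, 1.5, 2.5, 3.5
ewm = s.ewm(span=3, adjust=False).mean()                     # alpha = 0.5: 1.0, 1.5, 2.25, 3.125
```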
35. How to calculate the correlation between two columns?
corr = df['column1'].corr(df['column2'])
36. How to calculate the covariance between two columns?
# calculate the covariance between two columns
cov = df['column1'].cov(df['column2'])
37. How to create a histogram of a column?
# create a histogram of a column
df['column_to_plot'].hist()
38. How to create a scatter plot of two columns?
# create a scatter plot of two columns
df.plot.scatter(x='column1', y='column2')
39. How to create a box plot of a column?
# create a box plot of a column
df.boxplot(column='column_to_plot')
40. How to create a bar plot of a column?
# create a bar plot of the value counts of a column
df['column_to_plot'].value_counts().plot(kind='bar')
41. How to create a pie chart of a column?
# create a pie chart of the value counts of a column
df['column_to_plot'].value_counts().plot(kind='pie')
42. How to create a heatmap of a correlation matrix?
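import seaborn as sns

# create a heatmap of a correlation matrix
# numeric_only=True (pandas 1.5+) skips non-numeric columns, which raise an error in pandas 2.0+
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')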
43. How to create a line plot of a column with multiple groups?
# create a line plot of a column with one line per group
df.groupby('group_column')['column_to_plot'].plot(legend=True)
44. How to join two DataFrames based on their indexes?
joined_df = df1.join(df2)
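`join()` matches rows on the index and defaults to a left join. If both frames have a column with the same name, it raises `ValueError` unless you pass `lsuffix`/`rsuffix`. A minimal sketch with hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"value": [3, 4]}, index=["b", "c"])

# Both frames have "value", so suffixes are required
joined = df1.join(df2, lsuffix="_left", rsuffix="_right")
# keeps df1's index ("a", "b"); "a" has no match in df2 -> NaN
```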
45. How to stack multiple DataFrames vertically?
stacked_df = pd.concat([df1, df2, df3], axis=0)
46. How to stack multiple DataFrames horizontally?
stacked_df = pd.concat([df1, df2, df3], axis=1)
47. How to group a DataFrame by multiple columns?
grouped_df = df.groupby(['column1', 'column2'])['value_column'].sum()
48. How to pivot a table with hierarchical indexes?
df_pivot = df.pivot_table(index=['index_column1', 'index_column2'], columns='column_to_pivot', values='value_column')
49. How to rename columns in a DataFrame?
df.rename(columns={'old_name': 'new_name'}, inplace=True)
50. How to sort a DataFrame by one or more columns?
df.sort_values(['column1', 'column2'], ascending=[True, False], inplace=True)
51. How to select rows based on a condition?
df[df['column'] > 50]
52. How to select rows based on multiple conditions?
df[(df['column1'] > 50) & (df['column2'] == 'value')]
53. How to group a DataFrame by a time interval?
# '1h' means one-hour bins ('1H' is deprecated as of pandas 2.2)
df.groupby(pd.Grouper(key='timestamp_column', freq='1h'))['value_column'].mean()
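`pd.Grouper` assigns each timestamp to a fixed bin. Readings at 00:10 and 00:50 therefore share the 00:00 bucket. A minimal sketch with hypothetical timestamps:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp_column": pd.to_datetime(
        ["2024-01-01 00:10", "2024-01-01 00:50", "2024-01-01 01:30"]
    ),
    "value_column": [1.0, 3.0, 5.0],
})

hourly = df.groupby(pd.Grouper(key="timestamp_column", freq="1h"))["value_column"].mean()
# 00:00 bin -> mean(1.0, 3.0) = 2.0; 01:00 bin -> 5.0
```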
54. How to apply a function to a DataFrame column?
df['new_column'] = df['old_column'].apply(lambda x: x**2)
55. How to convert a DataFrame column from string to datetime?
df['datetime_column'] = pd.to_datetime(df['datetime_string_column'])
56. How to fill missing values in a DataFrame column?
# assign the result back; inplace=True on a selected column may not update df in pandas 2.x
df['value_column'] = df['value_column'].fillna(df['value_column'].mean())
57. How to create a new DataFrame column based on the values of other columns?
df['new_column'] = df['column1'] + df['column2']
58. How to select a random sample of rows from a DataFrame?
df.sample(n=10, random_state=42)
59. How to create a new DataFrame column based on conditions?
import numpy as np
df['new_column'] = np.where(df['column'] > 50, 'high', 'low')
60. How to convert a DataFrame column from object to category?
df['category_column'] = df['object_column'].astype('category')
61. How to split a DataFrame into two or more subsets based on a condition?
high_values = df[df['value_column'] > 50]
low_values = df[df['value_column'] <= 50]
62. How to merge two DataFrames based on their indexes?
merged_df = df1.merge(df2, left_index=True, right_index=True)
63. How to pivot a DataFrame with multiple value columns?
df_pivot = df.pivot_table(index='index_column', columns='column_to_pivot', values=['value_column1', 'value_column2'])
64. How to select the top n rows within each group in a DataFrame?
df.groupby('group_column').apply(lambda x: x.nlargest(3, 'value_column'))
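Another common idiom is to sort once and then take the first `n` rows of each group with `groupby().head(n)`. This avoids `groupby().apply()`, which emits a deprecation warning about operating on the grouping columns in recent pandas versions (2.2+). A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "group_column": ["a", "a", "a", "b", "b"],
    "value_column": [5, 9, 7, 1, 3],
})

n = 2
top_n = (
    df.sort_values("value_column", ascending=False)
      .groupby("group_column")
      .head(n)  # first n rows per group in descending order of value_column
)
```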
65. How to create a new DataFrame column based on the difference between two columns?
df['difference_column'] = df['column1'] - df['column2']
66. How to resample a DataFrame with a datetime index?
df.resample('1D').mean()
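`resample()` needs a DatetimeIndex, or the `on=` parameter naming a datetime column. It also creates a bin for every period, including periods with no data. A minimal sketch with hypothetical readings that skip a day:

```python
import pandas as pd

# Hypothetical readings; 2024-01-02 is missing
df = pd.DataFrame(
    {"value": [1.0, 3.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-03"]),
)

daily = df.resample("1D").mean()  # 3 daily rows; the empty day is NaN
```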
67. How to calculate the rolling average of a DataFrame column?
df['rolling_average'] = df['value_column'].rolling(window=3).mean()
68. How to create a rolling window calculation in Pandas?
You can use the rolling() method in Pandas to create a rolling window calculation on a DataFrame.
For example, you can create a rolling average of the ‘Value’ column with a window size of 3 using the following code:
# create a rolling window calculation in Pandas
df['rolling_average'] = df['Value'].rolling(window=3).mean()
This code adds a new column called ‘rolling_average’ to the DataFrame, containing the average of the ‘Value’ column over a window of 3 rows. The first two rows are NaN because the window is not yet full.
69. How to merge two DataFrames with different column names?
You can use the merge() function in Pandas to merge two DataFrames with different column names. For example, suppose you have two DataFrames: one containing ‘ID’ and ‘Value1’ columns, and the other containing ‘ID’ and ‘Value2’ columns. You can merge them on the ‘ID’ column using the following code:
merged_df = pd.merge(df1, df2, on='ID')
This code merges df1 and df2 on the ‘ID’ column, creating a new DataFrame called merged_df that contains the ‘ID’, ‘Value1’, and ‘Value2’ columns.
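The key columns themselves may also have different names. In that case, `merge()` takes `left_on` and `right_on` instead of `on`. Both key columns are kept in the result. A minimal sketch with hypothetical customer and order tables:

```python
import pandas as pd

# Hypothetical tables whose key columns are named differently
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})
orders = pd.DataFrame({"cust": [1, 1, 2], "amount": [10, 20, 30]})

merged = pd.merge(customers, orders, left_on="customer_id", right_on="cust")
merged = merged.drop(columns="cust")  # drop the now-redundant right-hand key
```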
70. How to group a DataFrame by multiple columns and apply a custom function?
You can use the groupby() method in Pandas to group a DataFrame by multiple columns, and then apply a custom function to each resulting group.
For example, suppose you have a DataFrame containing ‘Year’, ‘Month’, and ‘Value’ columns. You want to group it by both ‘Year’ and ‘Month’ and calculate the standard deviation of the ‘Value’ column for each group. You can use the following code:
# group a DataFrame by multiple columns and apply a custom function to each group
grouped_df = df.groupby(['Year', 'Month'])['Value'].apply(lambda x: x.std())
This code groups df by both ‘Year’ and ‘Month’ and applies a lambda function that calculates the standard deviation of ‘Value’ for each group. For a built-in statistic like this one, .std() alone gives the same result; apply() is what lets you plug in an arbitrary function.
The result, grouped_df, is a Series (not a DataFrame) named ‘Value’, with a two-level index of ‘Year’ and ‘Month’ and one standard deviation per group.
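A custom function can also be a named function. Passing it to `agg()` together with built-in statistics gives a DataFrame with one column per function. In the sketch below, `value_range` is a hypothetical example function and the data is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2023, 2023, 2023, 2024],
    "Month": [1, 1, 2, 1],
    "Value": [2.0, 4.0, 5.0, 7.0],
})

def value_range(x):
    """Hypothetical custom aggregation: max minus min within a group."""
    return x.max() - x.min()

per_group = df.groupby(["Year", "Month"])["Value"].apply(value_range)   # Series
summary = df.groupby(["Year", "Month"])["Value"].agg(["std", value_range])  # DataFrame
# single-row groups have an undefined sample std, so they show NaN
```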
71. How to convert a DataFrame to a dictionary with column names as keys and values as lists?
You can use the to_dict() method in Pandas to convert a DataFrame to a dictionary with column names as keys and values as lists.
For example, suppose you have a DataFrame containing ‘Name’, ‘Age’, and ‘Gender’ columns, and you want to convert it to a dictionary with ‘Name’, ‘Age’, and ‘Gender’ as keys and lists of the corresponding values as values. You can use the following code:
dict_data = df.to_dict('list')
This code converts the DataFrame df to a dictionary called dict_data, with ‘Name’, ‘Age’, and ‘Gender’ as keys and lists of the corresponding values as values.
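The `orient` argument controls the dictionary's shape. `'list'` maps each column to a list of values, while `'records'` produces one dictionary per row, which is handy for JSON-style output. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [30, 40]})

as_lists = df.to_dict("list")       # {column: [values, ...]}
as_records = df.to_dict("records")  # [{column: value, ...}, ...], one dict per row
```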
Conclusion
These are some intermediate-level Pandas interview questions with solutions in Python.
Practicing them will give you a solid understanding of Pandas and help you prepare for your next interview.
Meet Nitin, a seasoned professional in the field of data engineering. With a Post Graduation in Data Science and Analytics, Nitin is a key contributor to the healthcare sector, specializing in data analysis, machine learning, AI, blockchain, and various data-related tools and technologies. As the Co-founder and editor of analyticslearn.com, Nitin brings a wealth of knowledge and experience to the realm of analytics. Join us in exploring the exciting intersection of healthcare and data science with Nitin as your guide.