Top 50 Pandas Interview Questions in Python (Basic)


In this article, we will go through 50 basic Pandas interview questions in Python, each with a short answer and a code example.

Pandas is an open-source data manipulation and analysis library for the Python programming language.

It provides data structures and functions needed to work with structured data seamlessly.

At the core of Pandas are two primary data structures, the Series (1-dimensional) and the DataFrame (2-dimensional), which allow for easy manipulation and analysis of tabular data.

Pandas provides a wide range of functions for tasks such as reading/writing data to files, cleaning, transforming, filtering, merging, grouping, pivoting, and visualizing data.

It is widely used in data analysis, data science, and machine learning. Here are the top 50 basic Pandas interview questions and answers in Python.

Pandas Interview Questions and Answers in Python

1. How to import the Pandas library in Python?

# Importing the Pandas Library

import pandas as pd

2. How to read a CSV file using Pandas?

# Reading the CSV file

df = pd.read_csv('filename.csv')

3. How to check the first 5 rows of a DataFrame?

df.head()

4. How to check the last 5 rows of a DataFrame?

df.tail()

5. How to check the shape of a DataFrame?

df.shape

6. How to check the data types of columns in a DataFrame?

df.dtypes
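
Questions 2 to 6 can be tried together without a file on disk. The sketch below is only an illustration: the CSV text and its column names are made up, and it is passed to read_csv through io.StringIO in place of a real filename.

```python
import io
import pandas as pd

# Hypothetical CSV content, used here instead of a real file such as 'filename.csv'
csv_text = "name,age\nAsha,31\nRavi,27\nMeera,45\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows (up to 5)
print(df.tail())   # last rows (up to 5)
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
```

Both head() and tail() take an optional row count, for example df.head(10).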

7. How to select a single column from a DataFrame?

df['column_name']

8. How to select multiple columns from a DataFrame?

df[['column_name1', 'column_name2']]

9. How to filter rows based on a condition in a DataFrame?

df[df['column_name'] > value]

10. How to filter rows based on multiple conditions in a DataFrame?

df[(df['column_name1'] > value1) & (df['column_name2'] < value2)]
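
As a small worked example of questions 9 and 10, here is a sketch on a made-up DataFrame (the column names and values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Tom"],
    "age": [31, 27, 45, 19],
    "score": [88, 92, 70, 65],
})

# Single condition: the comparison builds a boolean mask, and df[mask] keeps the True rows
older_than_25 = df[df["age"] > 25]

# Multiple conditions: wrap each condition in parentheses and combine
# them with & (and), | (or) or ~ (not) instead of Python's and/or/not
older_and_below_90 = df[(df["age"] > 25) & (df["score"] < 90)]

print(older_than_25)
print(older_and_below_90)
```

The parentheses matter because & binds more tightly than the comparison operators in Python.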

11. How to group data by a column in a DataFrame?

df.groupby('column_name')  # returns a GroupBy object; apply an aggregation such as .mean() to get a result

12. How to get the mean of a column in a grouped DataFrame?

df.groupby('column_name')['column_to_calculate'].mean()

13. How to get the maximum value of a column in a grouped DataFrame?

df.groupby('column_name')['column_to_calculate'].max()
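
To see questions 11 to 13 working together, here is a minimal sketch on made-up data (the team and points columns are assumptions).

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "points": [10, 20, 5, 15, 25],
})

# groupby alone only describes the grouping; nothing is aggregated yet
grouped = df.groupby("team")

# Selecting a column and calling an aggregation returns one value per group
mean_points = grouped["points"].mean()
max_points = grouped["points"].max()

print(mean_points)
print(max_points)
```

The result is a Series indexed by the group keys, so mean_points["a"] looks up the mean for team a.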

14. How to sort a DataFrame by a column?

df.sort_values('column_name')

15. How to sort a DataFrame by multiple columns?

df.sort_values(['column_name1', 'column_name2'])
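
A short sketch of both sorting questions on made-up data follows. The ascending argument, which is not used in the answers above, is shown as one possible way to control the direction for each column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "team": ["b", "a", "b", "a"],
    "points": [5, 20, 25, 10],
})

# Sort by one column (ascending by default)
by_points = df.sort_values("points")

# Sort by team A-Z, then by points from high to low within each team
by_team_then_points = df.sort_values(["team", "points"], ascending=[True, False])

print(by_points)
print(by_team_then_points)
```

sort_values returns a new DataFrame, so the original df keeps its row order unless the result is assigned back.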

16. How to drop a column from a DataFrame?

df.drop('column_name', axis=1)

17. How to drop multiple columns from a DataFrame?

df.drop(['column_name1', 'column_name2'], axis=1)

18. How to drop a row from a DataFrame?

df.drop(index=0)

19. How to drop multiple rows from a DataFrame?

df.drop(index=[0,1])
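
One detail worth knowing for questions 16 to 19 is that drop returns a new DataFrame and leaves the original untouched. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Drop a column; df.drop(columns="b") is an equivalent spelling
without_b = df.drop("b", axis=1)

# Drop rows by their index labels
without_first_two = df.drop(index=[0, 1])

print(without_b)
print(without_first_two)
print(df)  # still has all three columns and all three rows
```

To keep the change, assign the result back, for example df = df.drop('column_name', axis=1).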

20. How to fill missing values in a DataFrame?

df.fillna(value)

21. How to replace values in a DataFrame?

df.replace(old_value, new_value)

22. How to merge two DataFrames?

pd.merge(df1, df2, on='column_name')

23. How to concatenate two DataFrames?

pd.concat([df1, df2])
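
The difference between merging and concatenating is easier to see on a small example. The two DataFrames below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Pune", "Delhi", "Goa"]})

# merge joins columns from both frames on matching key values.
# The default is an inner join, so only ids 2 and 3 survive.
merged = pd.merge(df1, df2, on="id")

# concat stacks frames on top of each other (rows) by default.
# ignore_index=True renumbers the rows from 0.
stacked = pd.concat([df1, df1], ignore_index=True)

print(merged)
print(stacked)
```

Passing how='left', how='right' or how='outer' to pd.merge changes which keys are kept.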

24. How to pivot a DataFrame?

df.pivot(index='column_name1', columns='column_name2', values='column_name3')
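
Here is a small sketch of what pivot does, reshaping made-up long-format data (one row per date and city) into a wide table.

```python
import pandas as pd

# Hypothetical long-format data
long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "temp": [30, 35, 31, 36],
})

# One row per date, one column per city, cells filled from temp
wide = long_df.pivot(index="date", columns="city", values="temp")

print(wide)
print(wide.loc["d1", "Pune"])
```

pivot raises an error when the same index and column pair appears more than once. In that case pivot_table, which aggregates the duplicates, is the usual alternative.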

25. How to convert a DataFrame into a numpy array?

df.to_numpy()  # recommended in current pandas; df.values also returns a NumPy array

26. How to rename a column in a DataFrame?

df.rename(columns={'old_column_name':'new_column_name'}, inplace=True)

27. How to rename multiple columns in a DataFrame?

df.rename(columns={'old_column_name1':'new_column_name1', 'old_column_name2':'new_column_name2'}, inplace=True)

28. How to set a column as the index of a DataFrame?

df.set_index('column_name', inplace=True)

29. How to reset the index of a DataFrame?

df.reset_index(inplace=True)
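
Questions 26 to 29 can be chained on a small made-up DataFrame. The sketch below assigns the results to new names rather than using inplace=True, which is an equally valid style.

```python
import pandas as pd

# Hypothetical sample data with a poorly named column
df = pd.DataFrame({"id": [10, 20], "nm": ["Asha", "Ravi"]})

# Rename a column; keys are old names, values are new names
df = df.rename(columns={"nm": "name"})

# Move the id column into the index so rows can be looked up by id
indexed = df.set_index("id")
print(indexed.loc[20, "name"])

# Move the index back into a regular column and restore a 0..n-1 index
restored = indexed.reset_index()
print(restored)
```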

30. How to get the unique values in a column of a DataFrame?

df['column_name'].unique()

31. How to count the number of unique values in a column of a DataFrame?

df['column_name'].nunique()

32. How to get the value counts of each unique value in a column of a DataFrame?

df['column_name'].value_counts()
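
The three methods in questions 30 to 32 return different things, which a quick sketch on made-up data makes clear.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

# Array of distinct values, in order of first appearance
uniques = df["color"].unique()

# A single integer: how many distinct values there are
n_unique = df["color"].nunique()

# A Series mapping each value to its count, most frequent first
counts = df["color"].value_counts()

print(uniques)
print(n_unique)
print(counts)
```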

33. How to apply a function to a column of a DataFrame?

df['column_name'].apply(function_name)

34. How to apply a function to multiple columns of a DataFrame?

df[['column_name1', 'column_name2']].apply(function_name)
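
The two apply calls behave differently: on a single column the function receives one value at a time, while on several columns it receives each column as a whole Series by default. The function double below and the sample data are hypothetical, used only for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

def double(x):
    """Hypothetical example function: returns twice its input."""
    return x * 2

# Series.apply: double is called once per value in the column
doubled = df["price"].apply(double)

# DataFrame.apply: the function is called once per column (axis=0 by default)
# and receives that column as a Series
col_ranges = df[["price", "qty"]].apply(lambda col: col.max() - col.min())

print(doubled)
print(col_ranges)
```

Passing axis=1 to DataFrame.apply calls the function once per row instead.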

35. How to create a new column in a DataFrame based on a calculation?

df['new_column_name'] = df['column_name1'] + df['column_name2']

36. How to create a new column in a DataFrame based on a condition?

import numpy as np  # np.where comes from NumPy

df['new_column_name'] = np.where(df['column_name'] > value, 'yes', 'no')
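
Questions 35 and 36 are often used one after the other. A minimal sketch on made-up marks data, where the pass threshold of 120 is an arbitrary assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"math": [40, 75, 90], "science": [55, 60, 85]})

# New column from a calculation: columns are added element by element
df["total"] = df["math"] + df["science"]

# New column from a condition: 'yes' where the condition holds, 'no' elsewhere
df["passed"] = np.where(df["total"] > 120, "yes", "no")

print(df)
```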

37. How to drop rows with missing values in a DataFrame?

df.dropna(inplace=True)

38. How to drop rows with missing values in a specific column of a DataFrame?

df.dropna(subset=['column_name'], inplace=True)

39. How to replace missing values in a specific column of a DataFrame with the mean value?

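# Assign the result back; inplace=True on a selected column is not guaranteed to update df in recent pandas versions
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())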

40. How to replace missing values in a specific column of a DataFrame with the median value?

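# Assign the result back; inplace=True on a selected column is not guaranteed to update df in recent pandas versions
df['column_name'] = df['column_name'].fillna(df['column_name'].median())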

41. How to replace missing values in a specific column of a DataFrame with the mode value?

# mode() returns a Series because there can be ties, so [0] picks the first mode
df['column_name'] = df['column_name'].fillna(df['column_name'].mode()[0])
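
To compare the three fill strategies, here is a sketch on made-up data with one missing age and one missing city. It uses the assign-back form, which works the same way across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 20.0],
    "city": ["Pune", None, "Pune", "Goa"],
})

# mean() and median() skip missing values by default
age_filled_mean = df["age"].fillna(df["age"].mean())
age_filled_median = df["age"].fillna(df["age"].median())

# The mode suits text columns, where a mean or median is not defined
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(age_filled_mean)
print(age_filled_median)
print(df)
```

The mean and the median can differ noticeably when the column has outliers, which is why the median is often preferred for skewed data.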

42. How to drop duplicate rows in a DataFrame?

df.drop_duplicates(inplace=True)

43. How to drop duplicate rows based on a subset of columns in a DataFrame?

df.drop_duplicates(subset=['column_name1', 'column_name2'], inplace=True)
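
The effect of the subset argument is easiest to see side by side. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 are exact duplicates
df = pd.DataFrame({
    "name": ["Asha", "Asha", "Ravi", "Asha"],
    "city": ["Pune", "Pune", "Delhi", "Goa"],
})

# Without subset, a row is a duplicate only if every column matches
exact = df.drop_duplicates()

# With subset, only the listed columns are compared;
# the first occurrence is kept by default (keep='first')
by_name = df.drop_duplicates(subset=["name"])

print(exact)
print(by_name)
```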

44. How to create a DataFrame from a dictionary?

data = {'column_name1': [value1, value2, value3], 'column_name2': [value4, value5, value6]}
df = pd.DataFrame(data)

45. How to create a DataFrame from a list of dictionaries?

data = [{'column_name1': value1, 'column_name2': value2}, {'column_name1': value3, 'column_name2': value4}]
df = pd.DataFrame(data)
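
Questions 44 and 45 describe the same table from two directions: a dictionary lists the data column by column, while a list of dictionaries lists it row by row. A sketch with made-up values:

```python
import pandas as pd

# Column-oriented: each key is a column name, each list holds that column's values
by_columns = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [31, 27]})

# Row-oriented: each dictionary is one row
by_rows = pd.DataFrame([
    {"name": "Asha", "age": 31},
    {"name": "Ravi", "age": 27},
])

print(by_columns.equals(by_rows))

# A key missing from one row becomes a missing value (NaN) in that cell
ragged = pd.DataFrame([{"name": "Asha", "age": 31}, {"name": "Ravi"}])
print(ragged)
```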

46. How to create a DataFrame from a CSV file with a custom delimiter?

df = pd.read_csv('filename.csv', delimiter=';')

47. How to create a DataFrame from a JSON file?

df = pd.read_json('filename.json')

48. How to create a DataFrame from an Excel file?

df = pd.read_excel('filename.xlsx')

49. How to export a DataFrame to a CSV file?

df.to_csv('filename.csv', index=False)

50. How to export a DataFrame to an Excel file?

df.to_excel('filename.xlsx', index=False)
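
A quick way to check an export is to read it straight back. The sketch below does a CSV round trip in memory on made-up data: calling to_csv without a path returns the CSV text as a string, so no file is written.

```python
import io
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Asha", "Ravi"], "score": [88, 92]})

# With no path, to_csv returns the CSV as a string.
# index=False leaves out the row labels, so no extra unnamed column appears on re-read.
csv_text = df.to_csv(index=False)
print(csv_text)

# Read the text back into a DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```

Note that read_excel and to_excel rely on an extra package for .xlsx files, commonly openpyxl, which has to be installed separately.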

Conclusion

In conclusion, Pandas is a powerful and widely used Python library for data manipulation and analysis.

It provides a rich set of tools and functions for tasks such as data cleaning, transformation, filtering, merging, grouping, pivoting, and visualization.

By leveraging the Series and DataFrame data structures, Pandas allows for easy and intuitive manipulation of tabular data.

With its simple syntax and powerful capabilities, Pandas has become a standard tool for data analysts, data scientists, and machine learning practitioners.

These 50 basic Pandas interview questions and answers in Python should give you a good overview of the library and help with interview preparation. Happy learning!


Nitin Khandare

Meet Nitin, a seasoned professional in the field of data engineering. With a Post Graduation in Data Science and Analytics, Nitin is a key contributor to the healthcare sector, specializing in data analysis, machine learning, AI, blockchain, and various data-related tools and technologies. As the Co-founder and editor of analyticslearn.com, Nitin brings a wealth of knowledge and experience to the realm of analytics. Join us in exploring the exciting intersection of healthcare and data science with Nitin as your guide.