What are the key steps in a data analytics project?
Define the problem, collect data, clean and preprocess, explore (EDA), model, interpret results, and present findings.
What is the difference between Data Mining and Data Analysis?
Data Mining is the discovery of previously unknown patterns in large datasets, usually with automated algorithms. Data Analysis is the broader process of inspecting, cleaning, and modeling data to extract insights and support decision-making.
Explain Data Cleaning and why it's important.
The process of detecting and then fixing or removing incorrect, incomplete, inconsistent, or duplicate data. It is crucial because of "garbage in, garbage out": poor data leads to inaccurate results.
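A minimal cleaning sketch in Python with pandas. The table, the column names (customer, amount), and the choice to drop unparseable rows are assumptions made up for illustration; a real project would pick rules that fit its data.

```python
import pandas as pd

# Hypothetical raw data with typical quality problems:
# stray whitespace, inconsistent case, a duplicate, a bad number, a missing name.
raw = pd.DataFrame({
    "customer": [" Alice", "Bob", "Bob", "carol ", None],
    "amount": ["10.5", "20", "20", "abc", "7"],
})

cleaned = (
    raw
    .dropna(subset=["customer"])  # drop rows with no customer name
    .assign(
        # Trim whitespace and normalize capitalization.
        customer=lambda d: d["customer"].str.strip().str.title(),
        # Convert text to numbers; unparseable values become NaN.
        amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"),
    )
    .drop_duplicates()            # remove exact duplicate rows
    .dropna(subset=["amount"])    # drop rows whose amount could not be parsed
    .reset_index(drop=True)
)

print(cleaned)
```

Only the Alice row and one Bob row survive: the duplicate Bob row, the row with the unparseable amount, and the row with no customer name are removed.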
What is EDA (Exploratory Data Analysis)?
Using summary statistics and visualizations to understand data characteristics, spot patterns, and detect anomalies before formal modeling.
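A small EDA sketch in Python with pandas, assuming a hypothetical series of order values. It shows summary statistics plus the 1.5 * IQR rule of thumb for flagging anomalies; other outlier rules are equally valid.

```python
import pandas as pd

# Hypothetical order values; 400 is planted as an obvious anomaly.
orders = pd.Series([20, 22, 19, 24, 21, 23, 400, 18], name="order_value")

summary = orders.describe()  # count, mean, std, min, quartiles, max

# Flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = orders.quantile(0.25), orders.quantile(0.75)
iqr = q3 - q1
outliers = orders[(orders < q1 - 1.5 * iqr) | (orders > q3 + 1.5 * iqr)]

print(summary)
print("median:", orders.median())
print("outliers:", outliers.tolist())
```

The mean is pulled far above the median by the single extreme value, which is the kind of signal EDA is meant to surface before modeling.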
What is the difference between Supervised and Unsupervised Learning?
Supervised: Uses labeled data to predict outcomes (e.g., classification, regression). Unsupervised: Finds hidden patterns in unlabeled data (e.g., clustering, association).
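A toy contrast in plain Python, assuming made-up "hours studied" data. The 1-nearest-neighbor predictor and the tiny 1-D k-means below are simplified stand-ins written for illustration; in practice a library such as Scikit-learn would normally be used.

```python
# Supervised: labeled examples (hours studied -> outcome) drive the prediction.
labeled = [(1.0, "fail"), (2.0, "fail"), (8.0, "pass"), (9.0, "pass")]

def predict_1nn(x):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label

# Unsupervised: similar numbers but no labels; the algorithm only groups them.
unlabeled = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]

def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then update centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d(unlabeled, centers=[0.0, 10.0])

print(predict_1nn(7.0))   # predicted label for a new, unseen value
print(centers, clusters)  # discovered groups, with no labels involved
```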
What is a JOIN in SQL? Name the most common types.
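Combines rows from two or more tables based on a related column. Common types: INNER (only matching rows), LEFT (all rows from the left table plus matches), RIGHT (all rows from the right table plus matches), and FULL (all rows from both tables).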
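A short sketch using Python's built-in sqlite3 module, with hypothetical customers and orders tables. Only INNER and LEFT are shown because older SQLite versions do not support RIGHT and FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL when there is no match.
left_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Carol has no orders, so she is absent from the INNER result and appears in the LEFT result with a NULL (None in Python) total.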
What is the difference between WHERE and HAVING in SQL?
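WHERE filters individual rows before aggregation and cannot reference aggregate functions. HAVING filters groups after aggregation, so it can use aggregates such as SUM or COUNT (typically together with GROUP BY).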
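A sketch with Python's built-in sqlite3 module, assuming a hypothetical sales table. The same query uses both clauses so the order of filtering is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, status TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', 'paid', 100), ('North', 'paid', 300), ('North', 'refunded', 500),
        ('South', 'paid', 50),  ('South', 'paid', 60);
""")

rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE status = 'paid'        -- row filter: applied before grouping
    GROUP BY region
    HAVING SUM(amount) > 200     -- group filter: applied after aggregation
    ORDER BY region
""").fetchall()

print(rows)
```

The refunded North row is excluded by WHERE before any sums are computed, and the South group is then dropped by HAVING because its paid total is below the threshold.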
What is a subquery?
A query nested inside another query (e.g., in a SELECT, FROM, or WHERE clause).
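A sketch with Python's built-in sqlite3 module, assuming a hypothetical employees table. The inner query computes the average salary, and the outer WHERE clause compares each row against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL);
    INSERT INTO employees VALUES ('Ann', 50000), ('Ben', 70000), ('Cat', 90000);
""")

# The subquery in parentheses runs first and yields a single value.
above_avg = conn.execute("""
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

print(above_avg)
```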
What is a pivot table?
A data summarization tool used in spreadsheets or BI tools to count, average, or sum data, grouped by categories.
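The same idea can be sketched in Python with pandas, assuming a hypothetical sales table with region, product, and revenue columns. The pivot_table method groups by the row and column categories and applies the chosen aggregation.

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 80, 20, 60],
})

# Rows = region, columns = product, cells = summed revenue.
pivot = sales.pivot_table(
    index="region", columns="product", values="revenue", aggfunc="sum"
)

print(pivot)
```

Changing aggfunc to "mean" or "count" gives the averaging and counting behavior mentioned above.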
What is the difference between a Primary Key and a Foreign Key?
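A Primary Key uniquely identifies each row in its own table and cannot be NULL. A Foreign Key is a column in one table that references a Primary Key (or another unique key) in another table, establishing a relationship and enforcing referential integrity.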
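A sketch with Python's built-in sqlite3 module, assuming hypothetical departments and employees tables. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and try_insert is a helper invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Analytics');
    INSERT INTO employees VALUES (1, 'Ann', 1);
""")

def try_insert(sql):
    """Hypothetical helper: run an INSERT and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Primary key violation: department id 1 already exists.
dup_pk = try_insert("INSERT INTO departments VALUES (1, 'Duplicate')")
# Foreign key violation: there is no department with id 99.
bad_fk = try_insert("INSERT INTO employees VALUES (2, 'Ben', 99)")
# Valid row: department 1 exists.
good = try_insert("INSERT INTO employees VALUES (3, 'Cat', 1)")

print(dup_pk)
print(bad_fk)
print(good)
```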
How do you handle missing or null values in a dataset?
First identify how much is missing and why. Then either remove the affected rows/columns or impute (fill) values using the mean, median, mode, or a predictive model, and document the choice.
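A minimal sketch in Python with pandas, assuming a hypothetical table with a numeric age column and a categorical city column. Median and mode imputation are shown as examples only; the right strategy depends on why the values are missing.

```python
import pandas as pd

df = pd.DataFrame({
    "age":  [25, None, 40, 31, None],
    "city": ["Paris", "Lyon", None, "Paris", "Paris"],
})

# Step 1: quantify the gaps per column.
missing_counts = df.isna().sum()

# Option A: impute. Median for the numeric column, mode for the categorical one.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

# Option B: keep only rows with no missing values.
dropped = df.dropna()

print(missing_counts)
print(imputed)
print(len(dropped), "complete rows")
```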
How would you explain a complex data model to a non-technical stakeholder?
Use simple analogies and focus on the business impact, not the technical details. Relate it to their goals and use clear visuals.
Describe a project where you used data to solve a problem.
Use the STAR method: Describe the Situation, Task, Action you took (tools, analysis), and the Result/impact.
What is A/B testing?
A controlled experiment that randomly splits users between two versions of a webpage or app and compares them on a specific metric (e.g., conversion rate) to determine whether one performs better by a statistically significant margin.
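One common way to evaluate such a test is a two-proportion z-test. The sketch below uses only the Python standard library; the visitor and conversion counts are invented, and two_proportion_z_test is a helper written for this example, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: A converts 200 of 2000 visitors, B converts 260 of 2000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)

print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below the significance level chosen before the test (often 0.05) suggests the difference is unlikely to be due to chance alone.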
How do you ensure your analysis is accurate?
Perform data validation, peer review of code/queries, sanity-check results, and document assumptions.
What BI tools are you familiar with?
Name specific ones, such as Tableau, Power BI, Looker, or Qlik, and be ready to discuss one you've used in depth.
What is your experience with Python/R for data analysis?
Mention key libraries. For Python: pandas for data manipulation, NumPy for numerical computation, scikit-learn for ML, and Matplotlib/Seaborn for visualization. For R: dplyr for data manipulation and ggplot2 for visualization.
What is the difference between Correlation and Causation?
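Correlation means two variables tend to move together. Causation means a change in one variable directly produces a change in the other. Correlation does not imply causation: the link may be coincidental or driven by a third (confounding) variable.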
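A small illustration in plain Python, using invented numbers. Two series that both depend on temperature are perfectly correlated with each other even though neither causes the other; pearson is a helper written for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: temperature is the confounder driving both series.
temperature = [10, 15, 20, 25, 30]
ice_cream_sales = [2 * t + 5 for t in temperature]
sunburn_cases = [3 * t - 10 for t in temperature]

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 6))
```

The coefficient is at its maximum, yet ice cream does not cause sunburn; both simply rise with temperature.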
How do you prioritize tasks when working on multiple projects?
Based on business impact, deadlines, and project dependencies. Clear communication with stakeholders is key.
How do you stay updated with the latest trends in data analytics?
Follow industry blogs (Towards Data Science), take online courses (Coursera, Udemy), and practice with projects on Kaggle.
Bonus Question: What is a Normal Distribution?
A symmetric, bell-shaped probability distribution where most values cluster around the mean. It's fundamental for many statistical tests.
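The clustering around the mean can be quantified with the well-known 68-95-99.7 rule. A quick check with NormalDist from Python's standard statistics module, using the standard normal distribution (mean 0, standard deviation 1):

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Probability of falling within 1, 2, and 3 standard deviations of the mean.
within_1 = standard.cdf(1) - standard.cdf(-1)
within_2 = standard.cdf(2) - standard.cdf(-2)
within_3 = standard.cdf(3) - standard.cdf(-3)

print(f"{within_1:.4f} {within_2:.4f} {within_3:.4f}")
```

Roughly 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.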