DBMS Interview Questions
Last updated on Feb 04, 2026

To consolidate your knowledge of DBMS concepts, we've listed the most commonly asked DBMS interview questions to help you ace your interview!
We have classified them into the following sections:
- DBMS Basic Interview Questions
- Intermediate DBMS Interview Questions
- Advanced DBMS Interview Questions
DBMS Basic Interview Questions
1. What is DBMS and what is its utility? Explain RDBMS with examples.
DBMS stands for Database Management System. It is a set of applications or programs that enable users to create and maintain a database. A DBMS provides a tool or an interface for performing various operations such as inserting, deleting, and updating data in a database. It is software that enables the storage of data more compactly and securely as compared to a file-based system. A DBMS helps a user overcome problems like data inconsistency and data redundancy in a database and makes it more convenient and organized to use.
Examples of non-relational, file-oriented storage of this kind are plain file systems, XML stores, the Windows Registry, etc.
RDBMS stands for Relational Database Management System and was introduced in the 1970s to access and store data more efficiently than DBMS. RDBMS stores data in the form of tables as compared to DBMS which stores data as files. Storing data as rows and columns makes it easier to locate specific values in the database and makes it more efficient as compared to DBMS.
Examples of popular RDBMS systems are MySQL, Oracle DB, etc.
2. What is a Database?
A Database is an organized, consistent, and logical collection of data that can easily be updated, accessed, and managed. Database mostly contains sets of tables or objects (anything created using create command is a database object) which consist of records and fields. A tuple or a row represents a single entry in a table. An attribute or a column represents the basic units of data storage, which contain information about a particular aspect of the table. DBMS extracts data from a database in the form of queries given by the user.
3. Mention the issues with traditional file-based systems that make DBMS a better choice?
The absence of indexing in a traditional file-based system leaves us with the only option of scanning the full file, making access to content tedious and very slow. Another issue is redundancy and inconsistency: files contain much duplicate and redundant data, and changing one copy makes the others inconsistent. Accessing data is also harder in traditional file-based systems because the data in them is unorganized.
A further issue is the lack of concurrency control, which leads to one operation locking the entire file, as compared to a DBMS, where multiple operations can work on a single file simultaneously.
Integrity check, data isolation, atomicity, security, etc. are some other issues with traditional file-based systems for which DBMSs have provided some good solutions.
4. Explain a few advantages of a DBMS.
Following are the few advantages of using a DBMS.
- Data Sharing: Data from a single database can be simultaneously shared by multiple users. Such sharing also enables end-users to react to changes quickly in the database environment.
- Integrity constraints: The existence of such constraints allows storing of data in an organized and refined manner.
- Controlling redundancy in a database: Eliminates redundancy in a database by providing a mechanism that integrates all the data in a single database.
- Data Independence: This allows changing the data structure without altering the composition of any of the executing application programs.
- Provides backup and recovery facility: It can be configured to automatically create the backup of the data and restore the data in the database whenever required.
- Data Security: DBMS provides the necessary tools to make the storage and transfer of data more reliable and secure. Authentication (the process of giving restricted access to a user) and encryption (encrypting sensitive data such as OTP, credit card information, etc.) are some popular tools used to secure data in a DBMS.
5. Explain different languages present in DBMS.
Following are the various languages present in DBMS:
- DDL (Data Definition Language): It contains commands which are required to define the database. E.g., CREATE, ALTER, DROP, TRUNCATE, RENAME, etc.
- DML (Data Manipulation Language): It contains commands which are required to manipulate the data present in the database. E.g., SELECT, UPDATE, INSERT, DELETE, etc.
- DCL (Data Control Language): It contains commands which are required to deal with the user permissions and controls of the database system. E.g., GRANT and REVOKE.
- TCL (Transaction Control Language): It contains commands which are required to deal with the transactions of the database. E.g., COMMIT, ROLLBACK, and SAVEPOINT.
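A minimal sketch of these language categories using Python's built-in `sqlite3` module (table and column names are made up for illustration; SQLite does not support DCL commands like GRANT/REVOKE, so that category is omitted, and TCL is shown via the connection's commit/rollback):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the schema
cur.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# DML: manipulate the data
cur.execute("INSERT INTO student (student_id, name) VALUES (1, 'Asha')")
cur.execute("UPDATE student SET name = 'Asha R' WHERE student_id = 1")
rows = cur.execute("SELECT name FROM student").fetchall()

# TCL: COMMIT / ROLLBACK control transaction boundaries
conn.commit()
cur.execute("DELETE FROM student")
conn.rollback()  # undo the uncommitted DELETE
count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Because the DELETE was rolled back, the committed row survives, which is exactly the transactional behavior TCL commands control.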
6. What is meant by ACID properties in DBMS?
ACID stands for Atomicity, Consistency, Isolation, and Durability. In a DBMS, these are the properties that ensure a safe and secure way of sharing data among multiple users.
- Atomicity: This property reflects the concept of either executing the whole query or executing nothing at all, which implies that if an update occurs in a database then that update should either be reflected in the whole database or should not be reflected at all.
- Consistency: This property ensures that the data remains consistent before and after a transaction in a database.
- Isolation: This property ensures that each transaction is occurring independently of the others. This implies that the state of an ongoing transaction doesn’t affect the state of another ongoing transaction.
- Durability: This property ensures that the data is not lost in cases of a system failure or restart and is present in the same state as it was before the system failure or restart.
7. Are NULL values in a database the same as blank spaces or zeros?
No, a NULL value is very different from zero and blank space: it represents a value that is unassigned, unknown, unavailable, or not applicable, whereas a blank space represents a character and zero represents a number.
Example: NULL value in “number_of_courses” taken by a student represents that its value is unknown whereas 0 in it means that the student hasn’t taken any courses.
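The "number_of_courses" example above can be sketched with Python's `sqlite3` (a hypothetical `student` table, made up for illustration), showing how SQL treats NULL differently from 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (name TEXT, number_of_courses INTEGER)")
# 0 means "took no courses"; NULL means "unknown"
cur.executemany("INSERT INTO student VALUES (?, ?)",
                [("A", 0), ("B", 3), ("C", None)])

# Aggregates skip NULLs: COUNT(col) counts only the 2 known values
count_known = cur.execute(
    "SELECT COUNT(number_of_courses) FROM student").fetchone()[0]
avg_known = cur.execute(
    "SELECT AVG(number_of_courses) FROM student").fetchone()[0]

# NULL is not equal to anything, not even NULL; use IS NULL to test it
eq_match = cur.execute(
    "SELECT COUNT(*) FROM student WHERE number_of_courses = NULL").fetchone()[0]
is_match = cur.execute(
    "SELECT COUNT(*) FROM student WHERE number_of_courses IS NULL").fetchone()[0]
```

Note that the `=` comparison against NULL matches nothing, which is a common interview follow-up.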
Intermediate DBMS Interview Questions
8. What is meant by normalization and denormalization?
Normalization is the process of reducing redundancy by organizing the data into multiple tables. Normalization leads to better usage of disk space and makes it easier to maintain the integrity of the database.
Denormalization is the reverse process of normalization as it combines the tables which have been normalized into a single table so that data retrieval becomes faster. JOIN operation allows us to create a denormalized form of the data by reversing the normalization.
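A small sketch of this idea using Python's `sqlite3` (the `course`/`enrollment` tables are made up for illustration): course titles are stored once in a normalized table, and a JOIN reproduces the denormalized view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Normalized: course titles live in their own table, referenced by id
cur.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("CREATE TABLE enrollment (student TEXT, course_id INTEGER)")
cur.execute("INSERT INTO course VALUES (1, 'DBMS'), (2, 'Networks')")
cur.execute("INSERT INTO enrollment VALUES ('Asha', 1), ('Asha', 2), ('Ravi', 1)")

# A JOIN reconstructs the combined (denormalized) view of the data
rows = cur.execute("""
    SELECT e.student, c.title
    FROM enrollment e JOIN course c ON e.course_id = c.course_id
    ORDER BY e.student, c.title
""").fetchall()
```

Renaming a course now requires updating one row in `course`, not every enrollment row, which is the redundancy reduction normalization is after.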
10. What is Data Warehousing?
The process of collecting, extracting, transforming, and loading data from multiple sources and storing them in one database is known as data warehousing. A data warehouse can be considered as a central repository where data flows from transactional systems and other relational databases and is used for data analytics. A data warehouse comprises a wide variety of an organization’s historical data that supports the decision-making process in an organization.
11. Explain different levels of data abstraction in a DBMS.
The process of hiding irrelevant details from users is known as data abstraction. Data abstraction can be divided into 3 levels:
- Physical Level: It is the lowest level and is managed by the DBMS. This level consists of data storage descriptions, and its details are typically hidden from system admins, developers, and users.
- Conceptual or Logical Level: It is the level on which developers and system admins work, and it determines what data is stored in the database and what the relationships between the data points are.
- External or View Level: It is the level that describes only part of the database and hides the details of the table schema and its physical storage from the users. The result of a query is an example of view-level data abstraction. A view is a virtual table created by selecting fields from one or more tables present in the database.
12. What is meant by an entity-relationship (E-R) model? Explain the terms Entity, Entity Type, and Entity Set in DBMS.
An entity-relationship model is a diagrammatic approach to a database design where real-world objects are represented as entities and relationships between them are mentioned.
- Entity: An entity is defined as a real-world object having attributes that represent characteristics of that particular object. For example, a student, an employee, or a teacher represents an entity.
- Entity Type: An entity type is defined as a collection of entities that have the same attributes. One or more related tables in a database can represent an entity type. Attributes are the characteristics that describe (and can uniquely identify) the entities of that type. For example, a student is an entity type with attributes such as student_id, student_name, etc.
- Entity Set: An entity set can be defined as a set of all the entities present in a specific entity type in a database. For example, a set of all the students, employees, teachers, etc. represent an entity set.
13. Explain different types of relationships amongst tables in a DBMS.
Following are different types of relationship amongst tables in a DBMS system:
- One to One Relationship: This type of relationship is applied when a particular row in table X is linked to a singular row in table Y.
- One to Many Relationship: This type of relationship is applied when a single row in table X is related to many rows in table Y.
- Many to Many Relationship: This type of relationship is applied when multiple rows in table X can be linked to multiple rows in table Y.
- Self Referencing Relationship: This type of relationship is applied when a particular row in table X is associated with another row in the same table X.
14. Explain the difference between intension and extension in a database.
Following is the major difference between intension and extension in a database:
- Intension: Intension, popularly known as the database schema, defines the description of the database. It is specified during the design of the database and mostly remains unchanged.
- Extension: Extension on the other hand is the measure of the number of tuples present in the database at any given point in time. The extension of a database is also referred to as the snapshot of the database and its value keeps changing as and when the tuples are created, updated, or destroyed in a database.
15. Explain the difference between the DELETE and TRUNCATE command in a DBMS.
DELETE command: this command is needed to delete rows from a table based on the condition provided by the WHERE clause.
- It deletes only the rows which are specified by the WHERE clause.
- It can be rolled back if required.
- It maintains a log and locks each row of the table before deleting it, and hence it's slow.
TRUNCATE command: this command is needed to remove the complete data from a table in a database. It is like a DELETE command which has no WHERE clause.
- It removes the complete data from a table in a database.
- It generally can't be rolled back (TRUNCATE can be rolled back in some databases depending on their version, but it can be tricky and can lead to data loss).
- It doesn't log individual row deletions and removes the whole table's data at once, and hence it's fast.
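A sketch of the DELETE side using Python's `sqlite3` (the `orders` table is made up for illustration; note SQLite has no TRUNCATE command at all — a DELETE with no WHERE clause is its equivalent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "open"), (2, "shipped"), (3, "open")])
conn.commit()

# DELETE removes only the rows matched by WHERE, and can be rolled back
cur.execute("DELETE FROM orders WHERE status = 'open'")
remaining = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]  # 1
conn.rollback()
restored = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]   # 3

# No TRUNCATE in SQLite; DELETE with no WHERE clause empties the table
cur.execute("DELETE FROM orders")
conn.commit()
emptied = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]    # 0
```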
Advanced DBMS Interview Questions
16. Explain the difference between a 2-tier and 3-tier architecture in a DBMS.
The 2-tier architecture refers to the client-server architecture in which applications at the client end directly communicate with the database at the server end without any middleware involved.
Example – Contact Management System created using MS-Access or Railway Reservation System, etc.
The 3-tier architecture contains another layer between the client and the server to provide GUI to the users and make the system much more secure and accessible. In this type of architecture, the application present on the client end interacts with an application on the server end which further communicates with the database system.
Example – Designing registration form which contains a text box, label, button or a large website on the Internet, etc.
17. Explain different types of keys in a database.
There are mainly 7 types of keys in a database:
- Candidate Key: The candidate key represents a set of attributes that can uniquely identify a tuple in a table. Each table may have multiple candidate keys, and one key amongst all candidate keys can be chosen as the primary key. In the below example, studentId and firstName can each be considered a candidate key since they can uniquely identify every tuple.
- Super Key: The super key defines a set of attributes that can uniquely identify a tuple. Candidate key and primary key are subsets of the super key, in other words, the super key is their superset.
- Primary Key: The primary key defines a set of attributes that are used to uniquely identify every tuple. In the below example studentId and firstName are candidate keys and any one of them can be chosen as a Primary Key. In the given example studentId is chosen as the primary key for the student table.
- Unique Key: The unique key is very similar to the primary key except that primary keys don't allow NULL values in the column but unique keys do allow them. So essentially a unique key is like a primary key that permits NULL values.
- Alternate Key: All the candidate keys which are not chosen as the primary key are considered alternate keys. In the below example, since studentId is chosen as the primary key, firstName becomes an alternate key.
- Foreign Key: The foreign key defines an attribute in one table that refers to a key (usually the primary key) of another table, so it can only take values present in the referenced column. In the below example, courseId from the Student table is a foreign key to the Course table, as both tables contain courseId as one of their attributes.
- Composite Key: A composite key refers to a combination of two or more columns that can uniquely identify each tuple in a table. In the below example the studentId and firstname can be grouped to uniquely identify every tuple in the table.
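The key types above can be sketched with Python's `sqlite3` (hypothetical `course`/`student` tables for illustration) — a primary key, a unique key that permits NULL, and a foreign key whose violation the database rejects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
cur = conn.cursor()

cur.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY)")
cur.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,                  -- primary key
        email      TEXT UNIQUE,                          -- unique key (NULL allowed)
        course_id  INTEGER REFERENCES course(course_id)  -- foreign key
    )""")
cur.execute("INSERT INTO course VALUES (10)")
cur.execute("INSERT INTO student VALUES (1, 'a@x.com', 10)")
cur.execute("INSERT INTO student VALUES (2, NULL, 10)")  # unique key permits NULL

# The foreign key rejects a course_id that doesn't exist in course
try:
    cur.execute("INSERT INTO student VALUES (3, 'c@x.com', 99)")
    fk_violated = False
except sqlite3.IntegrityError:
    fk_violated = True
```

The `PRAGMA` line matters: without it, SQLite silently accepts dangling foreign-key values.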
18. Explain different types of Normalization forms in a DBMS.
Following are the major normalization forms in a DBMS:
Considering the above Table-1 as the reference example for understanding different normalization forms.
- 1NF: It is known as the first normal form and is the simplest type of normalization that you can implement in a database. A table to be in its first normal form should satisfy the following conditions:
- Every column must have a single value and should be atomic.
- Duplicate columns from the same table should be removed.
- Separate tables should be created for each group of related data and each row should be identified with a unique column.
Table-1 converted to 1NF form
- 2NF: It is known as the second normal form. A table to be in its second normal form should satisfy the following conditions:
- The table should be in its 1NF i.e. satisfy all the conditions of 1NF.
- Every non-prime attribute of the table should be fully functionally dependent on the primary key, i.e., no non-key attribute should depend on only a part of a composite primary key (there should be no partial dependency).
Breaking Table-1 into 2 different tables to move it to 2NF.
- 3NF: It is known as the third normal form. A table to be in its third normal form should satisfy the following conditions:
- The table should be in its 2NF i.e. satisfy all the conditions of 2NF.
- There should be no transitive functional dependency, i.e., no non-prime attribute should depend on the primary key only through another non-key attribute in the same table.
Breaking Table-1 into 3 different tables to move it to 3NF.
- BCNF: BCNF stands for Boyce-Codd Normal Form and is an advanced form of 3NF, which is why it is also referred to as 3.5NF. A table to be in BCNF should satisfy the following conditions:
- The table should be in its 3NF i.e. satisfy all the conditions of 3NF.
- For every functional dependency of any attribute B on an attribute set A (A -> B), A should be a super key of the table. This simply implies that A can't be a non-prime attribute if B is a prime attribute.
DBMS Indexing & Query Optimization
19. What is selectivity/cardinality and why does it matter for indexing?
Selectivity refers to how unique the values in a column are. High selectivity means most values are unique, like user IDs. Low selectivity means many rows share the same value, like status flags. Indexes work best on high-selectivity columns. Indexing low-selectivity columns often gives little benefit. Understanding data distribution helps decide which columns should be indexed.
20. What is an index and why can indexing slow down INSERT/UPDATE/DELETE?
An index is a data structure that helps the database find rows faster without scanning the full table. It improves read performance by reducing the amount of data searched. However, every time you insert, update, or delete data, the index also needs to be updated. This extra work slows down write operations. The more indexes a table has, the slower writes become. That’s why indexing is always a balance between read speed and write performance.
21. Difference between clustered and non-clustered index
A clustered index defines the physical order of data in the table. Because of this, a table can have only one clustered index. Non-clustered indexes store a separate structure that points to the actual rows. They don’t change how data is stored on disk. Clustered indexes are faster for range queries, while non-clustered indexes are better for lookups. Choosing the wrong type can lead to inefficient queries.
22. When does a query not use an index even if one exists?
A query may ignore an index if the optimizer thinks a full table scan is faster. This often happens when a large percentage of rows match the condition. Using functions on indexed columns can also prevent index usage. Mismatched data types or implicit conversions can break index access. Poorly written queries confuse the optimizer. Indexes help only when the query structure allows them to be used.
23. What is a composite index and the leftmost prefix rule?
A composite index is an index created on multiple columns together. The leftmost prefix rule means the index is used only if the query filters starting from the first column. For example, an index on (user_id, order_date) works for queries filtering by user_id, but not only by order_date. Column order matters a lot in composite indexes. Designing them without understanding query patterns leads to wasted indexes. Always match index order with real query usage.
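The leftmost prefix rule can be observed directly through SQLite's query planner from Python (the `orders` table and `idx_user_date` index are made up for illustration; the exact plan wording varies slightly across SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (user_id INTEGER, order_date TEXT, amount REAL)")
cur.execute("CREATE INDEX idx_user_date ON orders (user_id, order_date)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Filtering on the leftmost indexed column can use the composite index
p1 = plan("SELECT * FROM orders WHERE user_id = 42")

# Filtering only on the second column cannot: the planner falls back to a scan
p2 = plan("SELECT * FROM orders WHERE order_date = '2024-01-01'")
```

Printing `p1` and `p2` shows a SEARCH using `idx_user_date` versus a SCAN of the table, which is the rule in action.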
24. What is a covering index and how does it reduce table lookups?
A covering index contains all the columns needed by a query. This allows the database to fetch results directly from the index without accessing the table. As a result, fewer disk reads are needed. This significantly improves performance for read-heavy queries. Covering indexes are especially useful for reporting queries. However, they increase index size and slow down writes.
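A covering index can likewise be seen in the query plan with Python's `sqlite3` (same hypothetical `orders` table as an assumption) — SQLite explicitly reports "COVERING INDEX" when no table lookup is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (user_id INTEGER, order_date TEXT, amount REAL)")
cur.execute("CREATE INDEX idx_user_date ON orders (user_id, order_date)")

def plan(sql):
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Every selected column is in the index, so no table lookup is needed
covered = plan("SELECT user_id, order_date FROM orders WHERE user_id = 42")

# 'amount' is not in the index, so the table must still be visited
not_covered = plan("SELECT amount FROM orders WHERE user_id = 42")
```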
25. What is an execution plan and what do you check first?
An execution plan shows how the database executes a query step by step. It tells you which indexes are used and how tables are accessed. The first thing to check is whether indexes are being used or ignored. Look for full table scans on large tables. Also check estimated vs actual row counts. A bad execution plan usually points to missing or incorrect indexes.
26. Index scan vs index seek and when they occur
An index seek happens when the database jumps directly to matching rows using the index. This is fast and efficient. An index scan means the database scans a large part or all of the index. Scans happen when many rows match the condition. Seeks usually indicate good indexing and selective filters. Scans are not always bad, but frequent scans on large datasets can hurt performance.
27. What is index fragmentation or bloat and its impact?
Index fragmentation happens when index pages become disorganized over time. This is caused by frequent inserts, updates, and deletes. Fragmented indexes require more disk reads. As a result, query performance slowly degrades. Rebuilding or reorganizing indexes helps fix this. Regular maintenance is important for long-running production systems.
28. Common production reasons for slow queries and debugging flow
Slow queries are often caused by missing indexes, bad joins, or large data growth. Sometimes queries were fast initially but slow down as data increases. The first step is checking query execution time and execution plans. Then look for full scans, high I/O, or blocking locks. Index tuning usually fixes most issues. Performance debugging is about data size, not just SQL syntax.
DBMS Security & Governance Interview Questions
29. Risk of giving apps superuser DB permissions
Superuser access allows full control over the database. If an app is compromised, attackers gain unlimited power. This can lead to data loss or system damage. It also increases the impact of bugs or bad queries. Production apps should never use superuser accounts. Limiting permissions reduces risk significantly.
30. What is SQL injection and how do parameterized queries prevent it?
SQL injection happens when user input is treated as part of a SQL command. An attacker can modify the query to access or change data. Parameterized queries separate the SQL logic from user input. The database treats inputs only as values, not executable code. This completely blocks injection attacks. It is the safest and most common defense.
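A minimal demonstration with Python's `sqlite3` (the `users` table and the attack string are made up for illustration), contrasting unsafe string concatenation with a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, secret TEXT)")
cur.execute("INSERT INTO users VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

malicious = "nobody' OR '1'='1"

# UNSAFE: string concatenation lets the input rewrite the query,
# which here becomes ... WHERE name = 'nobody' OR '1'='1'
leaked = cur.execute(
    "SELECT * FROM users WHERE name = '" + malicious + "'").fetchall()

# SAFE: the placeholder sends the input as a pure value, never as SQL
safe = cur.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()
```

The concatenated query leaks every row, while the parameterized query matches nothing because no user is literally named `nobody' OR '1'='1`.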
32. What is row-level security and a real use-case?
Row-level security restricts which rows a user can see in a table. The same query returns different results for different users. A common use-case is multi-tenant systems. Each customer sees only their own data. The logic is enforced by the database itself. This prevents accidental data leaks at the app layer.
33. Encryption at rest vs encryption in transit
Encryption at rest protects data stored on disk. It prevents data theft if disks or backups are accessed. Encryption in transit protects data moving between the app and database. It uses protocols like TLS. Both are required for full security. One does not replace the other.
34. How do you store DB credentials safely?
DB credentials should never be hardcoded in source code. They are stored in a secrets manager or secure vault. Applications fetch credentials at runtime. Access is controlled using IAM or service identities. Secrets can be rotated without code changes. This reduces the risk of leaks.
35. What does least privilege mean in DB access control?
Least privilege means giving only the permissions that are strictly required. Applications should not have admin or schema-altering access. Read-only users should not be able to write data. This limits damage if credentials are compromised. It also reduces the chance of human error. Least privilege is a basic security principle.
36. What is auditing and what should be logged?
Auditing tracks important actions performed in the database. This includes logins, permission changes, and schema updates. Sensitive data access is also commonly logged. Audit logs help in security investigations and compliance. They provide visibility into who did what and when. Logs must be protected from tampering.
37. How do you protect PII data?
PII can be protected using masking, tokenization, or encryption. Masking hides parts of the data for non-privileged users. Tokenization replaces sensitive values with safe references. Encryption protects data at rest and in backups. Different techniques are used for different access needs. The goal is minimizing exposure.
DBMS SQL & Querying Interview Questions
38. Difference between WHERE and HAVING
WHERE is used to filter rows before any grouping or aggregation happens in the query. It works on individual records and cannot use aggregate functions like COUNT, SUM, or AVG. HAVING is applied after GROUP BY and is meant for filtering aggregated results. For example, filtering orders by status goes in WHERE, but filtering customers with total orders greater than 5 goes in HAVING. Using HAVING instead of WHERE usually makes queries slower and harder to read. A good rule is to filter early using WHERE whenever possible.
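The orders example above can be sketched with Python's `sqlite3` (a hypothetical `orders` table): WHERE keeps only paid rows, then HAVING keeps only customers with more than one of them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (customer TEXT, status TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("asha", "paid"), ("asha", "paid"), ("asha", "cancelled"),
    ("ravi", "paid"),
])

# WHERE filters rows before grouping; HAVING filters the groups after
rows = cur.execute("""
    SELECT customer, COUNT(*) AS paid_orders
    FROM orders
    WHERE status = 'paid'          -- row-level filter
    GROUP BY customer
    HAVING COUNT(*) > 1            -- group-level (aggregate) filter
""").fetchall()
```

Only asha survives both filters: ravi's single paid order passes WHERE but fails the HAVING condition.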
39. INNER JOIN vs LEFT JOIN and when LEFT JOIN behaves like INNER JOIN
An INNER JOIN returns only rows that exist in both tables. A LEFT JOIN returns all rows from the left table, even if there is no matching row on the right side. However, a LEFT JOIN can accidentally behave like an INNER JOIN if you add conditions on the right table in the WHERE clause. This removes rows where the right table values are NULL. To avoid this issue, conditions related to the right table should be written inside the ON clause. This mistake is very common in real production queries.
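This pitfall is easy to reproduce with Python's `sqlite3` (hypothetical `customer`/`orders` tables): the same condition placed in WHERE versus ON gives different results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT)")
cur.execute("INSERT INTO customer VALUES (1, 'asha'), (2, 'ravi')")
cur.execute("INSERT INTO orders VALUES (1, 'paid')")  # ravi has no orders

# Condition on the right table in WHERE drops the NULL-extended rows,
# so the LEFT JOIN behaves like an INNER JOIN
inner_like = cur.execute("""
    SELECT c.name FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'paid'
""").fetchall()

# Putting the condition in the ON clause keeps every left-side row
true_left = cur.execute("""
    SELECT c.name FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'
""").fetchall()
```

ravi disappears from the first result but survives (with NULL order columns) in the second.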
40. What causes duplicate rows after a JOIN, and how to fix it
Duplicate rows usually appear when one row in a table matches multiple rows in another table. This often happens in one-to-many relationships, like customers and orders. SQL is doing the correct thing, but the result may look wrong if you are not expecting multiple matches. To fix this, you can aggregate the data before joining or ensure you are joining on the correct keys. Sometimes DISTINCT helps, but it can hide real data issues. Understanding the data structure is more important than forcing uniqueness.
42. UNION vs UNION ALL from a performance viewpoint
UNION combines results from multiple queries and removes duplicate rows from the final output. To do this, the database has to sort and compare rows, which takes extra time. UNION ALL simply appends the results without checking for duplicates. Because of this, UNION ALL is faster and uses fewer resources. If you are sure that the datasets do not overlap, UNION ALL should always be preferred. In real-world systems, UNION ALL is commonly used for better performance.
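A quick sketch with Python's `sqlite3` (tables `a` and `b` are made up for illustration), showing the deduplication difference that drives the performance gap:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (x INTEGER)")
cur.execute("CREATE TABLE b (x INTEGER)")
cur.execute("INSERT INTO a VALUES (1), (2)")
cur.execute("INSERT INTO b VALUES (2), (3)")

# UNION deduplicates (extra sort/compare work); UNION ALL just appends
union_rows = cur.execute("SELECT x FROM a UNION SELECT x FROM b").fetchall()
union_all_rows = cur.execute("SELECT x FROM a UNION ALL SELECT x FROM b").fetchall()
```

The overlapping value 2 appears once under UNION and twice under UNION ALL; skipping the duplicate check is what makes UNION ALL cheaper.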
43. How to write Top-N per group queries
One common approach is using window functions like ROW_NUMBER() with PARTITION BY. This allows ranking rows within each group and then filtering the top N rows. Another approach is using a correlated subquery that compares values within the same group. Older databases often rely on subqueries, while modern systems prefer window functions. Window functions are clearer, faster, and easier to maintain. The best approach depends on database support and data size.
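A sketch of the window-function approach with Python's `sqlite3` (a hypothetical `sales` table; window functions require SQLite 3.25+, which ships with all recent Python builds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("north", "a", 100), ("north", "b", 300), ("north", "c", 200),
    ("south", "d", 500), ("south", "e", 400),
])

# Top-1 per region: rank rows inside each group, then keep rank 1
top = cur.execute("""
    SELECT region, rep, amount FROM (
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sales
    ) WHERE rn = 1
    ORDER BY region
""").fetchall()
```

Changing `WHERE rn = 1` to `WHERE rn <= N` generalizes this to Top-N per group.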
44. What is a CTE and how it differs from a subquery
A CTE, written using the WITH clause, is a temporary named result set used within a query. It makes complex queries easier to read and understand compared to nested subqueries. CTEs can be referenced multiple times in the same query, which improves clarity. Recursive CTEs are used to handle hierarchical data like employee reporting structures. Some databases materialize CTEs, which can impact performance. CTEs are mainly a readability and maintainability feature.
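The employee-hierarchy case can be sketched as a recursive CTE in Python's `sqlite3` (a hypothetical three-level `employee` table), walking the reporting chain upward from one employee:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, manager_id INTEGER)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "ceo", None), (2, "vp", 1), (3, "eng", 2),
])

# Anchor: start at employee 3; recursive step: join each row to its manager
chain = cur.execute("""
    WITH RECURSIVE chain(id, name, manager_id) AS (
        SELECT id, name, manager_id FROM employee WHERE id = 3
        UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM employee e JOIN chain c ON e.id = c.manager_id
    )
    SELECT name FROM chain
""").fetchall()
```

The recursion stops on its own when `manager_id` is NULL and the join finds no more rows.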
45. What is a VIEW and when should you avoid it
A view is a stored SQL query that behaves like a virtual table. It is used to simplify complex queries and provide consistent access to data. Views do not store data, so the underlying query runs every time the view is accessed. This can cause performance issues for large or complex queries. Views can also hide inefficient SQL, making debugging harder. Avoid using views for heavy calculations or frequently accessed large datasets.
46. What is a materialized view and its trade-offs
A materialized view stores the actual result of a query instead of recalculating it every time. This makes read operations very fast and is useful for reporting and analytics. The main drawback is that the data can become stale. Materialized views need to be refreshed, either manually or on a schedule. Refreshing them can be expensive and impact write performance. They are best used when fast reads are more important than real-time accuracy.
Replication, Sharding & Distributed DB Concepts
47. What is partitioning or sharding and when should you shard?
Sharding splits data across multiple databases based on a key. It is used when a single database can no longer handle the load or data size. Sharding improves scalability by distributing reads and writes. However, it increases system complexity. Queries across shards become harder. Teams usually shard only after vertical scaling is no longer enough.
48. What is replication lag and how should apps handle it?
Replication lag is the delay between when data is written on the primary and when it appears on replicas. It can be detected using timestamps, log positions, or monitoring tools. Apps must not assume replicas are always up to date. Critical reads should go to the primary database. Less critical reads can tolerate some delay. Handling lag correctly avoids user-facing inconsistencies.
49. Leader–follower vs multi-leader replication
In leader–follower replication, one node handles writes and others only replicate changes. It is simpler and safer but has a single write bottleneck. Multi-leader replication allows writes on multiple nodes. This improves write availability but introduces conflict risks. Resolving conflicts adds complexity and can cause data inconsistencies. Most teams prefer leader–follower unless multi-region writes are required.
50. What is replication, and why do teams add read replicas?
Replication means copying data from one database to one or more other databases. The main reason teams add read replicas is to scale read traffic. Reads can be sent to replicas while writes go to the primary database. This reduces load on the main system. Replicas also help with availability if the primary has issues.
51. Hash-based vs range-based sharding
Hash-based sharding distributes data evenly using a hash function. It avoids hot spots but makes range queries harder. Range-based sharding groups data by value ranges like date or ID. This is good for range queries but can cause uneven load. The choice depends on query patterns. Workload characteristics should guide the sharding strategy.
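The two routing strategies can be sketched in plain Python (the shard count, ranges, and use of MD5 are illustrative assumptions, not a production design):

```python
import hashlib

NUM_SHARDS = 4

def hash_shard(user_id: int) -> int:
    # Hash-based: keys spread evenly, but adjacent ids land on unrelated shards,
    # so a range query must touch every shard
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Range-based: contiguous id ranges map to the same shard, which keeps
# range queries local but can concentrate load on one "hot" range
RANGES = [(0, 999, 0), (1000, 1999, 1), (2000, 2999, 2), (3000, 3999, 3)]

def range_shard(user_id: int) -> int:
    for low, high, shard in RANGES:
        if low <= user_id <= high:
            return shard
    raise ValueError("no shard covers this id")

same_range_shard = range_shard(1001) == range_shard(1500)  # neighbors stay together
spread = {hash_shard(i) for i in range(100)}               # hashing spreads keys out
```

Real systems typically use a consistent-hashing ring or a lookup service instead of a fixed modulus, so that shards can be added without re-mapping most keys.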
52. What is a hot shard or partition and how do you mitigate it?
A hot shard occurs when too much traffic hits a single shard. This usually happens due to skewed data or popular keys. Hot shards cause performance issues and uneven load. Mitigation includes better shard keys, adding randomness, or splitting hot shards. Caching hot data also helps. Monitoring is key to early detection.
53. What is eventual consistency in simple terms?
Eventual consistency means data will become consistent over time, not immediately. Different nodes may temporarily show different values. This happens because updates take time to propagate. The system prioritizes availability and performance over immediate accuracy. Eventual consistency works well for social feeds or analytics. It is not suitable for financial transactions.
54. Strong reads vs stale reads and when to use stale reads
Strong reads always return the latest committed data. They usually go to the primary database. Stale reads may return slightly outdated data from replicas. Stale reads are faster and more scalable. They are acceptable for non-critical data like dashboards. Choosing the right type depends on business correctness needs.
55. Why are distributed transactions hard and how do teams avoid them?
Distributed transactions span multiple databases or services. They require coordination and can fail partially. This makes error handling and recovery complex. Two-phase commit is slow and fragile at scale. Teams avoid distributed transactions by redesigning workflows. Event-driven systems and eventual consistency are common alternatives.
56. Designing a DB setup for multi-region high availability
Multi-region setups replicate data across geographic locations. One region usually acts as the primary for writes. Others serve reads and act as failover targets. Automated failover and health checks are critical. Latency and replication lag must be handled at the app level. The design balances availability, consistency, and cost.
Storage, Logging & Recovery
57. How do you design safe schema migrations with minimal downtime?
Safe migrations avoid locking tables for long periods. Changes are broken into small, backward-compatible steps. New columns are added first, then code is updated, and old columns are removed later. Migrations are tested on production-like data before rollout. This approach reduces downtime and rollback risk.
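This expand-then-contract pattern can be sketched with `sqlite3` (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Step 1 (expand): add the new nullable column; old code keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2 (backfill): copy data over, in small batches on a real system
# (a single statement here for brevity).
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Step 3 (contract): only after all code reads display_name would the old
# column be dropped. (Omitted here; DROP COLUMN support varies by engine.)
row = conn.execute("SELECT display_name FROM users").fetchone()
print(row[0])  # Ada Lovelace
```

Each step is backward compatible on its own, so the application can be deployed and rolled back between steps without downtime.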
58. What are RPO and RTO, and why do interviewers ask about them?
RPO (Recovery Point Objective) defines how much data loss is acceptable after a failure. RTO (Recovery Time Objective) defines how long the system can stay down. These metrics connect database design to business requirements. Interviewers ask about them to check real-world thinking: they show you understand backups, recovery, and system reliability. Databases are not just about queries.
59. Difference between replication and backup
Replication copies data continuously to another system, usually for availability. It helps keep systems running if one node fails. Backups are snapshots taken at intervals for recovery purposes. Replication does not protect against bad data changes. Backups allow restoring old correct data. Both are needed but serve different goals.
60. What happens during crash recovery (high-level steps)?
When a crash occurs, the database starts by reading the transaction logs. It redoes committed transactions that were not written to disk. Then it undoes uncommitted transactions to remove partial changes. This process restores the database to a consistent state. Recovery happens automatically when the database restarts.
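The redo-then-undo sequence can be illustrated with a toy log replay. This is a simplified model, not any real engine's recovery code; real logs also carry checkpoints and LSNs:

```python
# Toy recovery: each log record is (txn, key, old_value, new_value), and
# `committed` is the set of transactions whose COMMIT record is in the log.
def recover(pages: dict, log: list, committed: set) -> dict:
    # Redo pass: reapply every logged change in log order.
    for txn, key, old, new in log:
        pages[key] = new
    # Undo pass: roll back uncommitted transactions in reverse order.
    for txn, key, old, new in reversed(log):
        if txn not in committed:
            pages[key] = old
    return pages

log = [("T1", "x", 0, 10), ("T2", "y", 5, 99), ("T1", "x", 10, 20)]
pages = recover({"x": 0, "y": 5}, log, committed={"T1"})
print(pages)  # {'x': 20, 'y': 5} — committed T1 redone, uncommitted T2 undone
```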
61. Redo log vs undo log (conceptual difference)
Redo logs record changes that need to be reapplied after a crash. They help complete committed transactions during recovery. Undo logs store old values so changes can be rolled back. They are used when transactions fail or are aborted. Redo ensures durability, while undo ensures consistency. Both work together to keep data correct.
62. What are checkpoints and how do they reduce recovery time?
Checkpoints flush modified data from memory to disk at regular intervals. This reduces the amount of log data needed during recovery. After a crash, the database starts recovery from the last checkpoint instead of the beginning. This significantly shortens startup time. Frequent checkpoints improve recovery speed but can add overhead.
63. What is Point-in-Time Recovery (PITR)?
Point-in-Time Recovery allows restoring the database to an exact moment in the past. It uses a full backup along with transaction logs like WAL. This is useful when data is accidentally deleted or corrupted. Instead of restoring to the last backup, you recover just before the mistake happened. PITR is essential for real-world failure recovery.
64. Full vs incremental backups and when to use each
A full backup captures the entire database at a specific point in time. It is simple to restore but takes more time and storage. Incremental backups store only the changes since the last backup. They are faster and smaller but require multiple steps to restore. Full backups are used periodically, while incremental backups are taken frequently. Most production systems use a mix of both.
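The restore path is the key difference, and a toy model makes it concrete. This treats a backup as a key-value snapshot and each incremental as a delta, which is a simplification of how real tools work:

```python
# Toy restore: a full backup is a complete snapshot; incrementals are
# deltas of keys changed since the previous backup.
def restore(full_backup: dict, incrementals: list) -> dict:
    state = dict(full_backup)
    for delta in incrementals:   # must be applied in order, oldest first
        state.update(delta)
    return state

full = {"a": 1, "b": 2}
incs = [{"b": 3}, {"c": 4}]      # two nightly incrementals
print(restore(full, incs))       # {'a': 1, 'b': 3, 'c': 4}
```

This also shows the operational risk of incrementals: losing or corrupting any one delta in the chain breaks every restore point after it, which is why periodic fresh full backups are taken.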
65. Difference between logical backup and physical backup
A logical backup stores data as SQL statements or logical records, like tables and rows. It is portable and can be restored on different systems or versions. A physical backup copies raw database files as they are on disk. Physical backups are much faster to restore but less flexible. Logical backups are easier to inspect, while physical backups are better for large databases.
66. What is Write-Ahead Logging (WAL) and why is it critical for durability?
Write-Ahead Logging means all changes are first written to a log before being applied to the actual data files. This ensures that even if the system crashes, the database knows what changes were intended. During recovery, the database replays the log to restore consistency. WAL is critical because it guarantees committed data is not lost. Without WAL, crashes could corrupt data permanently.
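SQLite exposes WAL directly, so the mechanism is easy to observe: once WAL mode is on, committed changes land in a `-wal` file next to the database before being checkpointed into the main file.

```python
import os
import sqlite3
import tempfile

# WAL requires a real file; in-memory databases don't support it.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)

mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# The write-ahead log sits alongside the database until checkpointed.
print(os.path.exists(path + "-wal"))  # True
```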
Transactions, Isolation & Concurrency
67. When should you use a read-only transaction or snapshot?
Read-only transactions are useful when you need consistent data for reports. Snapshot reads show data as it existed at a specific time. They prevent data changes from affecting long-running reads. Writers are not blocked during these operations. This improves concurrency and consistency.
68. What is a transaction and what does autocommit mean?
A transaction is a set of database operations treated as one logical unit. All changes succeed together or are rolled back together. Autocommit means each statement is automatically committed as soon as it runs. In this mode, every query acts like its own transaction. For multi-step operations, autocommit is usually turned off.
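A minimal sketch of "all succeed together or roll back together", using `sqlite3` (note that Python's `sqlite3` module implicitly opens a transaction before DML, i.e. it is not in autocommit mode by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

# A transfer as one unit: both updates commit together or not at all.
try:
    conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
    (bal,) = conn.execute(
        "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
    if bal < 0:
        raise ValueError("insufficient funds")  # business rule violated
    conn.commit()
except ValueError:
    conn.rollback()  # neither update survives

print(conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0])  # 100
```

Because the transfer would overdraw alice's account, the rollback discards both updates, leaving the balances unchanged.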
69. Explain the four isolation levels
Read Uncommitted allows reading uncommitted data and can give incorrect results. Read Committed ensures only committed data is visible, but values can change between reads. Repeatable Read guarantees that rows already read won’t change during the transaction. Serializable is the strictest and behaves as if transactions run one by one. Higher isolation reduces concurrency but improves consistency.
70. Dirty read, non-repeatable read, and phantom read
A dirty read happens when a transaction reads uncommitted data that may be rolled back. A non-repeatable read occurs when the same row shows different values within a transaction. Phantom reads happen when new rows appear in repeated queries with the same condition. These issues occur due to concurrent data access. Isolation levels exist to control these problems.
71. What is a lost update and how do you prevent it?
A lost update happens when two transactions update the same data based on an old value. The second update overwrites the first without any warning. This usually occurs under low isolation. It can be prevented using proper isolation levels or version checks. Optimistic locking is a common solution.
72. Optimistic locking vs pessimistic locking
Optimistic locking assumes conflicts are rare and checks for changes before committing. It usually relies on a version number or timestamp. Pessimistic locking locks data upfront to block other transactions. This avoids conflicts but reduces concurrency. Optimistic locking suits read-heavy systems.
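The version-number technique can be sketched with `sqlite3` (the `items` table and helper are illustrative): the `UPDATE` only matches if the version is still the one we read, so a concurrent writer makes it affect zero rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 10, 1)")
conn.commit()

def optimistic_update(conn, item_id, new_qty, expected_version):
    """Succeeds only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE items SET qty = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_qty, item_id, expected_version))
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched means a conflicting write won

print(optimistic_update(conn, 1, 7, expected_version=1))  # True
print(optimistic_update(conn, 1, 3, expected_version=1))  # False: version is now 2
```

On a `False` return the application typically re-reads the row and retries, rather than silently overwriting the other writer's change.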
73. What is MVCC and how does it improve read concurrency?
MVCC allows multiple versions of data to exist at the same time. Readers see a consistent snapshot without blocking writers. Writers create new versions instead of modifying existing rows. This greatly improves performance under heavy read load. Old versions are cleaned up later.
74. What is a deadlock and how do databases resolve it?
A deadlock happens when transactions wait on each other’s locks indefinitely. Each transaction holds a lock the other needs. Databases detect deadlocks by checking for wait cycles. One transaction is rolled back to resolve the issue. The rolled-back transaction can be retried.
75. What is lock escalation and why can it slow systems down?
Lock escalation replaces many small locks with a larger lock. For example, row-level locks may become a table-level lock. This reduces lock overhead but blocks more queries. As a result, system-wide slowdown can occur. It usually happens under heavy load.
76. What is write skew and why is it tricky?
Write skew occurs when two transactions read the same data and then update different rows. Each transaction sees valid data in isolation, but the combined result breaks an invariant, such as "at least one doctor must stay on call." This can happen even under Repeatable Read isolation. Since neither transaction writes the rows the other one read, row locks never conflict and cannot catch it. Serializable isolation is usually required.