Skip to content

File Organization & Indexing

Short Notes

Storage of relations on physical disk and data structures to speed up retrieval.

Index Types

  • Primary Index: Ordered file, search key is primary key. (Sparse).
  • Clustering Index: Ordered file, search key is non-key. (Sparse).
  • Secondary Index: Unordered file or search key is not the ordering key. (Dense).

Key Theories & Formulas

1. B-Trees and B+ Trees

  • B-Tree: Data pointers in all nodes.
  • B+ Tree: Data pointers only in leaf nodes. Leaves are linked. (Better for range queries).
  • Order (\(p\)): Max number of pointers in a node.
  • Leaf Node size (B+ Tree): \((p \cdot \text{KeySize}) + (p \cdot \text{DataPtr}) + \text{NodePtr} \le \text{BlockSize}\).

Example Problems

Problem: How many levels in a B+ tree with 1 million records and order 100?

  • \(100^h \ge 10^6 \implies h=3\). Result: 3 levels (excluding root maybe).

Hardest GATE Questions

Topic: Node Capacity Calculation and Splitting Tricky Question (GATE 2011/2015/2021): Calculate the max order of a B+ tree node given Block Size, Key size, Pointer size.

  • Analysis:
  • Internal Node: \((p \cdot P) + (p-1) \cdot K \le B\).
  • Solving for \(p\) often gives a non-integer, you must floor it (\(p = \lfloor \dots \rfloor\)).
  • The "Trap": Leaf node vs Internal node structure.
  • Leaves in B+ trees have a "next leaf pointer", so their capacity formula is slightly different: \(p(K+P) + P_{next} \le B\).
  • Hard Aspect: Number of blocks accessed for a query.
  • For an index: \(Height + 1\) (for data block access).
  • Complexity: Static vs Dynamic Hashing.
  • Extendible Hashing: Uses a directory. Directory size doubles when a bucket overflows and local depth equals global depth

References