7. Find Median from Data Stream

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.For example,

[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) - Add a integer number from the data stream to the data structure.

  • double findMedian() - Return the median of all elements so far.

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3) 
findMedian() -> 2

Solution: (Bruteforce)

Inserting into a vector. Sorting the array and finding the median Time Complexity: (n log n) for 1 query

Solution: (Insertion Sort)

Median can be accessed in O(1) time Inserting the num always in the sorted position (Finding the position and inserting) O(n)

class MedianFinder
{
public:
      vector<int> v;
    
    int findPos(vector<int> v,int num){
        for(int i=0;i<v.size();i++){
            if(num<v[i]){
                return i;
            }
        }
        return -1;
    }
    /** initialize your data structure here. */
  
    
    MedianFinder()
    {
    }

    void addNum(int num)
    {
        int x = findPos(v,num);
        if(v.size() == 0 || x == -1){
            v.push_back(num);
        }
        else{
            
            v.insert(v.begin()+x,num);
            
        }
        
        
    }

    double findMedian()
    {

        double med;
        if (v.size() % 2 == 0)
        {
            int mid = v.size() / 2;
            med = ((double)v[mid] + (double)v[mid - 1]) / 2;
        }
        else
        {
            int mid = v.size() / 2;
            med = v[mid];
        }

        return med;
    }
};

Time Complexity: O(n)

Solution: (Using Heap)

We can maintain two heap :

Two priority queues:

  1. A max-heap lo to store the smaller half of the numbers

  2. A min-heap hi to store the larger half of the numbers

Insertion will take O(log n) . The max-heap lo is allowed to store, at worst, one more element more than the min-heap hi. When the heaps are perfectly balanced, the median can be derived from the tops of both heaps. Median can be assessed in O(1) time.

Adding a number num:

  • Add num to max-heap lo. Since lo received a new element, we must do a balancing step for hi. So remove the largest element from lo and offer it to hi.

  • The min-heap hi might end holding more elements than the max-heap lo, after the previous operation. We fix that by removing the smallest element from hi and offering it to lo.

Approach: [41, 35, 62, 5, 97, 108]

Adding number 41
MaxHeap lo: [41]           // MaxHeap stores the largest value at the top (index 0)
MinHeap hi: []             // MinHeap stores the smallest value at the top (index 0)
Median is 41
=======================

Adding number 35
MaxHeap lo: [35]
MinHeap hi: [41]
Median is 38
=======================

Adding number 62
MaxHeap lo: [41, 35]
MinHeap hi: [62]
Median is 41
=======================

Adding number 4
MaxHeap lo: [35, 4]
MinHeap hi: [41, 62]
Median is 38
=======================

Adding number 97
MaxHeap lo: [41, 35, 4]
MinHeap hi: [62, 97]
Median is 41
=======================

Adding number 108
MaxHeap lo: [41, 35, 4]
MinHeap hi: [62, 97, 108]
Median is 51.5
class MedianFinder
{
public:

    /** initialize your data structure here. */
    priority_queue<int> maxh;
    priority_queue<int, vector<int>, greater<int>> minh;
    
    MedianFinder()
    {
    }

    void addNum(int num)
    {

        maxh.push(num);

        minh.push(maxh.top());
        maxh.pop();

        if (minh.size() > maxh.size())
        {
            maxh.push(minh.top());
            minh.pop();
        }
    }

    double findMedian()
    {

        if (maxh.size() > minh.size())
        {
            return maxh.top();
        }

        return ((double)(maxh.top()) + (double)(minh.top())) * 0.5;
    }
};

Time Complexity: O(log n)

Last updated