# Tight Bounds for Hashing Block
Sources

Kai-Min Chung and Salil Vadhan

### Abstract

It is known that if a 2-universal hash function H is applied
to elements of a *block source* (X_{1},…,X_{T}),
where each item X_{i} has enough min-entropy conditioned on the
previous items, then the output distribution (H,H(X_{1}),…,H(X_{T}))
will be “close” to the uniform distribution. We provide improved
bounds on how much min-entropy per item is required for this to hold, both when
we ask that the output be close to uniform in statistical distance and when we
only ask that it be statistically close to a distribution with small collision
probability. In both cases, we reduce
the dependence of the min-entropy on the number T of items from 2log T in
previous work to log T, which we show to be optimal. This leads to corresponding improvements to
the recent results of Mitzenmacher and Vadhan (SODA `08) on the analysis of
hashing-based algorithms and data structures when the data items come from a
block source.

### Versions

- In
*Proceedings of the* *12th
International Workshop on Randomization and Computation (RANDOM `08),volume* 5171
of* Lecture Notes in Computer Science,
*pages 357-370.* *Springer-Verlag, 25-27 August 2008. [pdf][Springer page]
- Full version, June 2008. [pdf][arXiv:0806.1948v1]
- Slides from half-hour talk [pps]

[ back to Salil Vadhan's research]