Filtering a set of items, for instance to select photographs with certain properties, is a common application of crowdsourcing.
Because workers are error-prone, each item is presented to several workers to limit the probability of misclassification. Since the crowd is a relatively expensive resource, minimizing the number of questions asked per item can yield substantial savings. Parameswaran et al. presented several algorithms for this minimization problem in the CrowdScreen framework. However, those algorithms do not scale well and therefore cannot be used in scenarios where high accuracy is required despite high worker error rates.
We propose optimizations of those algorithms, as well as new theoretical insights that lead to scalable filtering algorithms.