Modern Web applications employ sophisticated Machine Learning models to rank news, posts, products, and other items presented to or contributed by users. To keep these models useful, one has to constantly train, evaluate, and monitor them on freshly annotated data, which crowdsourcing can provide.
In this tutorial we present a portion of our six-year experience in solving real-world tasks with human-in-the-loop pipelines that combine the efforts of humans and machines. We introduce data labeling via public crowdsourcing marketplaces and present the critical components of efficient data labeling. Then, we run a practical session in which participants address a challenging real-world Information Retrieval task for e-Commerce, experiment with settings for the labeling process, and launch their label-collection projects on a real crowd during the session. We present useful quality control techniques and give attendees an opportunity to discuss their annotation ideas. The methods and techniques described in this tutorial apply to any crowdsourced data and are not bound to any specific crowdsourcing platform.
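One widely used quality control technique is seeding the task stream with control ("golden") tasks whose correct answers are known in advance, then measuring each worker's accuracy on them. A minimal sketch in plain Python (the function names and the 0.7 threshold are illustrative, not from the tutorial):

```python
def worker_accuracy(annotations, golden):
    """Accuracy of each worker on control ("golden") tasks.

    annotations: list of (worker, task, label) tuples.
    golden: dict mapping control task id -> known true label.
    """
    hits, total = {}, {}
    for worker, task, label in annotations:
        if task not in golden:
            continue  # only control tasks count toward accuracy
        total[worker] = total.get(worker, 0) + 1
        hits[worker] = hits.get(worker, 0) + (label == golden[task])
    return {w: hits[w] / total[w] for w in total}

def filter_workers(annotations, golden, threshold=0.7):
    """Drop annotations from workers below the accuracy threshold.

    Workers who answered no control tasks are dropped as well,
    since their quality cannot be estimated.
    """
    acc = worker_accuracy(annotations, golden)
    return [(w, t, l) for w, t, l in annotations
            if acc.get(w, 0.0) >= threshold]
```

In practice, crowdsourcing platforms apply such rules continuously while the project runs, banning or retraining low-accuracy workers on the fly.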
The Role of Human-in-the-Loop in Building Search Engines
Ranking and Quality Metrics
Hands-On Practice Session
Results Aggregation and Integration into the ML Pipeline
Results Discussion and Conclusion
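Results aggregation resolves disagreement between overlapping annotations before the labels enter the ML pipeline. A minimal pure-Python sketch of the classic Dawid-Skene EM aggregator, which weights workers by their estimated reliability (Crowd-Kit ships production implementations of this and other methods; the code below is an illustration under simplifying assumptions, not the tutorial's implementation):

```python
def dawid_skene(annotations, n_iter=20):
    """Aggregate noisy labels with Dawid-Skene EM.

    annotations: list of (worker, task, label) tuples.
    Returns: dict mapping task -> most probable true label.
    """
    workers = sorted({w for w, _, _ in annotations})
    tasks = sorted({t for _, t, _ in annotations})
    labels = sorted({l for _, _, l in annotations})

    # Initialize task posteriors with a soft majority vote.
    post = {t: {l: 0.0 for l in labels} for t in tasks}
    for _, t, l in annotations:
        post[t][l] += 1.0
    for t in tasks:
        s = sum(post[t].values())
        post[t] = {l: post[t][l] / s for l in labels}

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices.
        prior = {l: sum(post[t][l] for t in tasks) / len(tasks) for l in labels}
        conf = {w: {l: {k: 1e-6 for k in labels} for l in labels}  # smoothed
                for w in workers}
        for w, t, obs in annotations:
            for true in labels:
                conf[w][true][obs] += post[t][true]
        for w in workers:
            for true in labels:
                s = sum(conf[w][true].values())
                conf[w][true] = {k: conf[w][true][k] / s for k in labels}

        # E-step: recompute posteriors over true labels.
        new_post = {}
        for t in tasks:
            scores = {l: prior[l] for l in labels}
            for w, t2, obs in annotations:
                if t2 != t:
                    continue
                for true in labels:
                    scores[true] *= conf[w][true][obs]
            s = sum(scores.values())
            new_post[t] = {l: scores[l] / s for l in labels}
        post = new_post

    return {t: max(post[t], key=post[t].get) for t in tasks}
```

The aggregated labels can then be fed directly into model training or evaluation, closing the human-in-the-loop cycle.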
PyData Global 2022
Crowd-Kit: A scikit-learn for crowdsourced annotations. The talk includes a demonstration of Crowd-Kit, an open-source computational quality control library.
WSDM 2023 Crowd Science Workshop
CANDLE: Collaboration of Humans and Learning Algorithms for Data Labeling.