NLP A-to-Z: From data collection to a trained model

In these talks you will learn how to gather quality data starting from data scraping and finishing with building the full model.

Watch on demand


Data is everywhere. It is used in every aspect of our lives: health, insurance, finance, travel, science, you name it. Collecting quality data efficiently is a challenging task: we need to define the right target in advance, aim at the right audience, and make sure the data quality is high. Good data can then be used to train ML models that serve the applications above. In these talks you will learn how to gather quality data, starting from data scraping and data annotation and finishing with building the full model and examining its effect on the quality of results.

Data scraping

Fact: The internet is the largest database ever created. It is where our market, industry and the public’s reality happen by the second. To remain competitive and relevant, every company, organization and business must tap into web data. Even organizations that were once the most reluctant, such as banks and financial services firms, have now turned to web data.

In this session, Itamar Abramovich, Director of Data Products at Bright Data, will discuss and show real-life examples of why and how web data has made, and is still making, a huge difference in a company’s growth strategy.

Bright Data is the industry-leading web data platform with over 15,000 customers and partners from across every industry. The company has made it its mission to deliver quality, reliable public web data with ease and simplicity.

Join this expert presentation to learn up close how web data can help you solve some of your most critical challenges today.

Itamar Abramovich
Bright Data, Director of Data Products

Off-the-shelf solutions will only get you so far

While knowledge is power, it’s often fragmented, disorganized, and inaccessible. SparkBeyond’s Knowledge Mining system parses the web’s wealth of unstructured data to deliver contextual answers to high-stakes problems. Our knowledge graph generator, Knomi, strings together multiple search engines with state-of-the-art (SOTA) language models to produce structured responses. Off-the-shelf models produced impressive results, yet for many applications the accuracy was not sufficient.

We needed to adapt the models to our needs. We will describe how we incorporate real-life labeled data to add a bespoke model on top of the off-the-shelf models. We will discuss the importance of diverse datasets and how this work improved accuracy and customer satisfaction.

Shay Hummel
SparkBeyond, Director of Knowledge Mining

End-to-end question answering on a handheld device for the benefit of people with reading difficulties

Dyslexia affects 15-20% of the world's population; it is a language-based learning disability that results in difficulties with specific language skills, particularly reading. Dyslexic people usually experience difficulties with other language skills such as spelling, pronouncing words, and reading comprehension.

In this talk, I will present a question-answering feature that helps to improve comprehension capabilities. Using a voice-based interface, the user can query physical documents (e.g., books, newspapers) captured by the OrCam device, and the answer is played through the speakers. This feature incorporates models from multiple domains such as computer vision (CV), optical character recognition (OCR), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS).

Tal Rosenwein
OrCam, VP of R&D, AI and Algorithms

Toloka with Adaptive ML models

David Baron
Toloka, Regional Account Lead

