
Hands on With Data Prep Kit (2025 Mar 20)

Event Details

Event sign up
🗓️: Thursday, March 20, 2025
⏰: 9 am PST / 11 am CST / 12 pm EST / 5 pm GMT
Duration: 1 hour

Event recording will be available soon

Check resources: code, presentation slides, etc.

Q & A section


Agenda

Workshop: Hands-on with Data Prep Kit

Overview

When building machine learning and data applications, a significant portion of your time goes to data wrangling: content extraction, cleaning, de-duplication, and filtering out problematic data. In this hands-on session we will explore Data Prep Kit, an open source toolkit designed to streamline these essential tasks. Attendees will learn firsthand how to use Data Prep Kit to accelerate data preparation, improve overall data quality, and build robust LLM applications more efficiently.

Description

Data Prep Kit is a comprehensive Python library that democratizes and accelerates data preparation by providing out-of-the-box solutions for common tasks. Engineered to scale from a single laptop to large cloud clusters, it has been successfully used to process terabytes of data for training IBM Granite Large Language Models (LLMs).

Data Prep Kit offers a robust feature set including duplicate elimination, advanced document and code handling, language detection (for both spoken and programming languages), removal of personally identifiable information (PII), as well as spam, hate speech, and malware detection.
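
To make the duplicate-elimination task above concrete, here is a minimal sketch of exact de-duplication by content hashing. It uses only the Python standard library and does not use or represent Data Prep Kit's own API; the function names `normalize` and `dedupe_exact` are hypothetical, chosen for illustration.

```python
import hashlib


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return " ".join(text.split()).lower()


def dedupe_exact(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document.

    Documents are compared by the SHA-256 digest of their normalized text,
    so only exact (post-normalization) duplicates are dropped.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    docs = ["Hello  world", "hello world", "Goodbye"]
    print(dedupe_exact(docs))  # ['Hello  world', 'Goodbye']
```

Hashing keeps memory use proportional to the number of unique documents rather than their total size. Catching near-duplicates (documents that differ by a few words) needs a different technique, such as similarity-based fuzzy matching, which this sketch does not attempt.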

More about Data Prep Kit : https://github.com/IBM/data-prep-kit

Join us for this hands-on session to explore how to use Data Prep Kit to accelerate data preparation and enhance data quality.

In this workshop we will do the following:

What do you need to participate in this workshop?

Session Type:
Hands on workshop

Audience:
LLM app developers, data scientists, data engineers

Technical Level:
Intermediate

Prerequisites:
None

Duration:
45 minutes

Resources

Resources will be available soon.

Speaker: Sujee Maniyam

AI Engineer, Developer Advocate @ Node51 (Consulting for IBM / The AI Alliance)

Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.

sujee@node51.com   •   LinkedIn   •   Portfolio


Q & A

Please review the session recording.