Popular Python libraries for data analytics, such as NumPy, pandas, and scikit-learn, work well when the dataset fits into the RAM of a single machine. With larger datasets, working around memory constraints becomes a challenge. This course introduces scalable and accelerated data analytics with Dask and RAPIDS. Dask provides a framework and libraries for handling large datasets on a single multi-core machine or across multiple machines in a cluster. RAPIDS, on the other hand, accelerates data analytics by offloading workloads to GPUs, with few code changes required.
Level: Introductory
Length: Two 3-Hour Sessions (2 Days)
Format: Lecture + Hands-on
Prerequisites:
- Alliance Account
- Basic Python and Linux command-line experience
Teacher: Jinhui Qin