CS 488/588 – Applied Data Science with Python
Course Syllabus
Course Description
Textbooks
There are no required textbooks for this course.
Learning Outcomes
Upon completion of the course, students should be able to:
Attain proficiency with commonly used Python frameworks for managing the life cycle of data science projects.
Develop pipelines for integrating data from multiple sources, designing predictive models, and deploying the models.
Apply Python tools for data collection, analysis, and visualization, such as NumPy, Pandas, Matplotlib, and Seaborn, to real-world datasets.
Implement machine learning algorithms for image processing, natural language processing, and time series analysis using Python-based frameworks, such as Scikit-Learn, Keras, TensorFlow, and PyTorch.
Understand the principles of model selection and evaluation, including hyperparameter tuning, cross-validation, and regularization.
Understand the primary characteristics of current Python libraries for deployment, continuous integration, and monitoring of data science projects.
Deploy data science projects as web applications using Flask, FastAPI, and Django, and to cloud servers using Microsoft’s Azure platform.
Prerequisites
The course requires basic programming skills in Python. Prior knowledge of data science methods is advantageous but not mandatory.
Grading
Student assessment will be based on 6 homework assignments (worth 60 marks), 3 quizzes (worth 30 marks), and class participation and engagement (worth 10 marks).
Lectures
- Lecture 6 - NumPy for Array Operations
- Lecture 7 - Data Manipulation with pandas
  - 7.1 Introduction to pandas
  - 7.2 Importing Data and Summary Statistics
  - 7.3 Rename, Index, and Slice
  - 7.4 Creating New Columns, Reordering
  - 7.5 Removing Columns and Rows
  - 7.6 Merging DataFrames
  - 7.7 Calculating Unique and Missing Values
  - 7.8 Dealing With Missing Values: Boolean Indexing
  - 7.9 Exporting A DataFrame to csv
  - References
- Lecture 8 - Data Visualization with Matplotlib
- Lecture 9 - Data Visualization with Seaborn
- Lecture 10 - Statistical Data Analysis
- Lecture 11 - Databases and SQL
  - 11.1 Introduction to SQL
  - 11.2 Using SQLite with Python
  - 11.3 Create a New Table
  - 11.4 Database Example
  - 11.5 Querying Databases with SELECT
  - 11.6 Sorting Data with ORDER BY
  - 11.7 Filtering Data
  - 11.8 Conditional Expressions
  - 11.9 Joining Multiple Tables
  - 11.10 Return Data Statistics
  - 11.11 Grouping Data
  - 11.12 Modifying Data
  - 11.13 Working with Tables
  - 11.14 Constraints
  - 11.15 Subqueries
  - 11.16 Connect to an Existing Database
  - References
- Lecture 12 - Data Exploration and Preprocessing
- Lecture 13 - Scikit-Learn Library for Data Science
  - 13.1 Introduction to Scikit-Learn
  - 13.2 Supervised Learning: Classification
  - 13.3 Supervised Learning: Regression
  - 13.4 Unsupervised Learning: Clustering
  - 13.5 Hyperparameter Tuning
  - 13.6 Cross-Validation
  - 13.7 Performance Metrics
  - 13.8 Model Pipelines
  - 13.9 Flow Chart: How to Choose an Estimator
  - Appendix
  - References
- Lecture 14 - Ensemble Methods
- Lecture 15 - Artificial Neural Networks with Keras-TensorFlow
- Lecture 16 - Convolutional Neural Networks with Keras-TensorFlow
- Lecture 17 - Model Selection, Hyperparameter Tuning
- Lecture 18 - Neural Networks with PyTorch
- Lecture 19 - Natural Language Processing
- Lecture 20 - Transformer Networks
- Lecture 21 - NLP with Hugging Face
- Lecture 22 - Large Language Models
- Tutorial 1 - Working with Jupyter Notebooks
- Tutorial 2 - Terminal and Command Line
- Tutorial 3 - Python IDEs, VS Code
- Tutorial 4 - Virtual Environments
- Tutorial 5 - Web Scraping
- Tutorial 6 - Google Colab
- Tutorial 7 - Image Processing with Python
- Tutorial 8 - TensorFlow
- Tutorial 9 - PyTorch
- Tutorial 10 - Bash Scripting
- Tutorial 11 - GitHub
- Tutorial 12 - TensorFlow Serving