Project: Final Project (100 Points)

Purpose

This capstone project allows you to integrate technical skills in machine learning and responsible AI design with the communication skills needed to ensure explainability, reproducibility, and ethical reasoning.

Assignment Goals

The goals of this assignment are:
  1. Design, implement, and evaluate a complete machine learning or AI project using appropriate methodologies.
  2. Justify the rationale for algorithm choice, model architecture, hyperparameters, normalization, and regularization.
  3. Analyze and interpret results through explainable AI techniques such as SHAP, LIME, or PCA.
  4. Communicate the technical and ethical aspects of the model to a lay audience with transparency and clarity.

Concepts

The key concepts to be developed include:
  1. Model selection and tuning require reasoned justification based on data properties and task objectives.
  2. Normalization, regularization, and cross-validation are essential for generalization and fair performance evaluation.
  3. Explainable AI techniques help bridge the gap between model complexity and human interpretability.
  4. Effective communication of AI system behavior enhances trust and supports responsible deployment.

Tasks

Your tasks in this assignment are to:
  1. Identify a dataset and define a meaningful prediction or classification problem.
  2. Select and justify an appropriate learning algorithm or model family (e.g., regression, tree-based, neural network).
  3. Discuss preprocessing, feature engineering, normalization, and regularization choices and their rationale.
  4. Train, evaluate, and compare models using appropriate metrics and validation methods.
  5. Use explainability techniques (e.g., SHAP values, PCA, feature importance) to interpret and communicate your model.
  6. Prepare a technical report and presentation demonstrating the process, results, and implications.

The Assignment

Overview

The final project synthesizes all concepts explored throughout the term in machine learning, model design, and AI explainability. You will select a dataset and problem domain, construct a suitable model, and justify every methodological choice you make. You will then assess both quantitative performance and qualitative interpretability of your model.


Stage 1 — Project Proposal

Submit a one-page proposal describing:

  • The dataset and its source (including ethical or bias considerations).
  • The problem to be solved and why it matters.
  • Preliminary thoughts on the model, algorithm, and features you might use.
  • How you intend to evaluate and interpret model results.

Stage 2 — Model Development and Justification

Develop your model in an iterative, documented manner. Each major design decision should be motivated by data analysis or experimental reasoning. A code sketch illustrating one possible workflow appears after the checklist below.

Checklist:

  • Perform exploratory data analysis and justify preprocessing decisions.
  • Normalize or standardize features as appropriate, explaining why your choice matters for your algorithm.
  • Select your model and justify its form (linear, tree-based, neural network, etc.).
  • Describe hyperparameters and regularization strategies to prevent overfitting.
  • Evaluate the model using validation splits or cross-validation.
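
To make this checklist concrete, below is a minimal sketch of one possible Stage 2 workflow in Python with scikit-learn. The synthetic dataset, variable names, and hyperparameter grid are illustrative assumptions only; substitute your own data, model family, and search space.

    # Minimal Stage 2 sketch: standardization, regularization, and
    # cross-validated tuning in one scikit-learn pipeline.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Stand-in data so the sketch runs end to end; use your own dataset.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Keeping the scaler inside the pipeline means it is refit on each
    # training fold, so cross-validation scores are not leaked.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    # C is the inverse L2 regularization strength: smaller C, stronger penalty.
    grid = GridSearchCV(
        pipe,
        param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
        cv=5,            # 5-fold cross-validation on the training split
        scoring="f1",
    )
    grid.fit(X_train, y_train)

    print("Best C:", grid.best_params_["clf__C"])
    print("Mean CV F1:", grid.best_score_)
    print("Held-out test F1:", grid.score(X_test, y_test))

Whatever validation scheme you choose, report it explicitly; the point of the sketch is that scaling, regularization, and evaluation decisions are encoded and reproducible rather than performed ad hoc.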

Stage 3 — Explainability and Usability

Demonstrate interpretability and usability by applying explainable AI methods. Your audience should understand how and why your model behaves as it does.

Possible techniques include:

  • Feature Importance Analysis (tree-based models)
  • Partial Dependence Plots
  • SHAP or LIME visualizations for feature contributions
  • Principal Component Analysis (PCA) for understanding data or latent structure

You must also explain your model’s predictions in non-technical terms suitable for an informed lay audience, such as policymakers, educators, or end users. Discuss how explainability contributes to trust and responsible deployment.
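
As one illustration, here is a minimal sketch of two of these techniques, PCA and permutation-based feature importance, using only scikit-learn. It assumes the fitted grid pipeline and train/test split from the Stage 2 sketch; SHAP and LIME are provided by separate libraries (shap, lime) whose APIs are not shown here.

    # Minimal explainability sketch, reusing `grid`, X_train, X_test,
    # and y_test from the Stage 2 example above.
    from sklearn.decomposition import PCA
    from sklearn.inspection import permutation_importance

    # PCA: how much variance do the leading components capture?
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X_train)
    print("Explained variance ratios:", pca.explained_variance_ratio_)

    # Permutation importance: how much does shuffling each feature hurt
    # the held-out score? Large drops mark influential features.
    result = permutation_importance(
        grid.best_estimator_, X_test, y_test, n_repeats=10, random_state=0
    )
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")

Visualizations such as SHAP summary plots or partial dependence plots can then turn these numbers into the stakeholder-facing explanations described above.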


Stage 4 — Evaluation and Reflection

You will assess the effectiveness of your approach using appropriate metrics (e.g., accuracy, F1, RMSE, AUC). Your report should compare multiple models or configurations and include a critical reflection on trade-offs between accuracy, interpretability, and ethical implications.
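
For instance, a minimal sketch of computing several of these metrics with scikit-learn, again reusing the hypothetical grid pipeline and test split from the earlier sketches:

    # Minimal evaluation sketch for a binary classifier.
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

    y_pred = grid.predict(X_test)
    # AUC needs scores or probabilities, not hard class labels.
    y_prob = grid.predict_proba(X_test)[:, 1]

    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("F1:", f1_score(y_test, y_pred))
    print("AUC:", roc_auc_score(y_test, y_prob))

    # For regression tasks, RMSE can be computed analogously:
    # from sklearn.metrics import mean_squared_error
    # rmse = mean_squared_error(y_true, y_hat) ** 0.5  # hypothetical arrays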


Stage 5 — Submission and Presentation

Deliverables:

  1. Full project code and a README with reproducible setup instructions (a brief seeding sketch follows this list).
  2. A 3–5 page report covering data, modeling choices, evaluation, and explainability analysis.
  3. A 10-minute presentation summarizing your project for a mixed technical/non-technical audience.
  4. Supporting figures, tables, and visuals that clarify your results and reasoning.
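
For the code deliverable, one common ingredient of reproducible setup instructions is fixing random seeds and recording dependency versions. A minimal sketch, with an illustrative seed value:

    # Minimal reproducibility sketch; the seed value is an arbitrary example.
    import random
    import numpy as np

    SEED = 42  # document whatever seed you actually use
    random.seed(SEED)
    np.random.seed(SEED)

    # Pass the same seed to anything that accepts random_state, e.g.
    # train_test_split(X, y, random_state=SEED), and record exact package
    # versions in your README (e.g. via `pip freeze > requirements.txt`).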

Submission Rubric

See the rubric section in this assignment for the detailed evaluation breakdown. Each stage contributes proportionally to your final project score.

Submission

In your submission, please include answers to any questions asked on the assignment page, as well as to the questions listed below, in your README file. If you wrote code as part of this assignment, please describe your design, approach, and implementation in a separate document prepared using a word processor or typesetting program such as LaTeX. This document should include specific instructions on how to build and run your code, along with a description of each code module or function you created, suitable for reuse by a colleague. In your README, please include answers to the following questions:
  • Describe what you did, how you did it, what challenges you encountered, and how you solved them.
  • Please answer any questions found throughout the narrative of this assignment.
  • If collaboration with a buddy was permitted, did you work with a buddy on this assignment? If so, who? If not, do you certify that this submission represents your own original work?
  • Please identify any and all portions of your submission that were not originally written by you (for example, code originally written by your buddy, or anything taken or adapted from a non-classroom resource). It is always OK to use your textbook and instructor notes; however, you are certifying that any portions not designated as coming from an outside person or source are your own original work.
  • Approximately how many hours did it take you to finish this assignment? (I will not judge you for this at all; I am simply using it to gauge whether the assignments are too easy or too hard.)
  • Your overall impression of the assignment. Did you love it, hate it, or were you neutral? One-word answers are fine, but if you have any suggestions for the future, let me know.
  • Using the grading specifications on this page, discuss briefly the grade you would give yourself and why. Discuss each item in the grading specification.
  • Any other concerns that you have. For instance, if you have a bug that you were unable to solve but you made progress, write that here. The more clearly you articulate the problem, the more partial credit you will receive (it is fine to leave this blank).

Assignment Rubric

Each criterion is scored at one of four levels: Pre-Emerging (< 50%), Beginning (50%), Progressing (85%), or Proficient (100%).

Model Design and Rationale (30%)
  • Pre-Emerging: Selects a model without clear justification or parameter reasoning.
  • Beginning: Identifies a model type and describes basic parameters with limited rationale.
  • Progressing: Provides detailed justification for model selection and tuning choices aligned with the problem domain.
  • Proficient: Demonstrates expert reasoning for model architecture, parameters, and alternatives with evidence-based justification.

Evaluation and Analysis (25%)
  • Pre-Emerging: Reports raw performance metrics without evaluation methodology.
  • Beginning: Applies basic validation and discusses results in general terms.
  • Progressing: Uses appropriate evaluation metrics, cross-validation, and clear comparison among methods.
  • Proficient: Provides a comprehensive performance analysis, discusses trade-offs, and interprets statistical reliability.

Explainability and Communication (20%)
  • Pre-Emerging: Minimal or unclear explanation of model decisions or features.
  • Beginning: Uses basic interpretability methods but lacks connection to stakeholder understanding.
  • Progressing: Employs established explainability tools (e.g., SHAP, PCA) and interprets key findings clearly.
  • Proficient: Integrates explainability throughout, presenting results transparently for both expert and lay audiences.

Implementation and Technical Quality (15%)
  • Pre-Emerging: Provides incomplete or poorly documented implementation.
  • Beginning: Functional implementation with limited structure or testing.
  • Progressing: Implements well-organized, reproducible code with modular design and documentation.
  • Proficient: Delivers a robust, maintainable, and reproducible implementation with version control and testing.

Presentation and Report (10%)
  • Pre-Emerging: Summarizes results without structure or clarity.
  • Beginning: Provides basic overview with limited visual or narrative support.
  • Progressing: Produces a coherent technical report and slides with meaningful visuals and data narratives.
  • Proficient: Delivers a professional presentation and a concise, well-structured report integrating visualizations, interpretations, and reflections.

Please refer to the Style Guide for code quality examples and guidelines.