Lab: Custom Chatbot (100 Points)
Assignment Goals
The goals of this assignment are:
- Explain the role of system instructions (a.k.a. “system prompts”) in shaping a chatbot’s behavior and outputs.
- Customize a “classic” hosted chatbot (ChatGPT) using Custom Instructions to control role, tone, boundaries, and safety.
- Translate the same persona and guardrails into a local Python chatbot powered by Ollama and an open model.
- Compare behavior across platforms using a small, task-oriented evaluation protocol and error analysis.
- Instrument your Python chatbot with configuration files, logging, and reproducible runs.
- Reflect on ethical, privacy, and safety considerations when deploying persona-constrained chatbots.
- Practice multiple prompting strategies (zero-shot, plan-first, few-shot, self-critique, and ReAct-lite) through direct conversation with the chatbot, and reflect on trade-offs in accuracy, safety, and controllability.
The Assignment
Overview
In this lab you will (1) shape the behavior of a classic hosted chatbot by changing its system-level instructions, and (2) recreate the same persona locally in Python using Ollama and an open model. You will then evaluate and reflect on differences in controllability, safety, and fidelity across platforms.
Part A — Customize a “classic” chatbot (ChatGPT)
Goal: Use system-level instructions to create a persona-constrained assistant and validate its behavior on representative tasks.
A1. Access and where to edit instructions
- ChatGPT Custom Instructions. In ChatGPT, open the user menu ➝ Custom Instructions. The two fields (“What would you like ChatGPT to know about you” and “How would you like it to respond”) act as a persistent pre-amble that conditions responses in new chats.
- (Optional) Projects provide per-project instructions that supersede your global Custom Instructions. Useful if you keep course work separate from personal settings.
If students lack ChatGPT access, you may substitute the OpenAI Playground’s System message field in a new chat; the concept is the same (hosted system pre-amble).
A2. Draft a first persona (system instructions template)
Create a new document system_prompt.md and draft your initial instructions using the following scaffold:
Role & Identity
- You are "<assistant name>", a <domain> assistant for <audience>. You are <tone> (e.g., concise, supportive), and you never <forbidden behavior>.
Objectives
- Primary: <what the assistant is optimizing for>.
- Secondary: <nice-to-haves>.
Boundaries & Safety
- Decline: <topics or requests to refuse>.
- Red Team Notes: watch for <prompt-injection patterns>, <hallucination risks>, <privacy issues>.
- If unsure: ask one clarifying question, then proceed cautiously.
Style & Format
- Default style: <paragraph/bullets/code>.
- Output format when asked for code: fenced code blocks; include minimal runnable example.
- Cite assumptions explicitly and list any limitations at the end.
Working Norms
- Always show step-by-step reasoning **privately**; present only final, concise answers to the user.
- When refusing: provide a brief rationale and a safe alternative.
- Keep responses under <N> tokens unless asked for more.
Demonstrations (few-shot)
- User: <short, representative request #1>
- Assistant: <ideal response #1>
- User: <short, representative request #2>
- Assistant: <ideal response #2>
Paste an adapted version into ChatGPT Custom Instructions (“How would you like it to respond?”). Keep a copy in your repository as the source of truth.
A3. Quick validation protocol (hosted)
In a fresh chat (so your new instructions apply), run three tasks that exercise the persona:
- On-distribution: a task your assistant should excel at.
- Boundary test: a request it should refuse or safely reframe.
- Ambiguity: a task requiring clarifying questions.
Save the conversation transcripts (copy/paste or export) as hosted_runs/*.md. Note failures, confusions, or style drift.
A4. Multi-strategy conversation (hosted): side-by-side prompting
Goal: Hold five short conversations with the exact same target task, varying only the prompting strategy. Compare outputs for quality, safety, and faithfulness to your persona.
Choose a single non-trivial target task (e.g., “Draft a 2-paragraph brief for policymakers on <topic of your choice>.”).
For each strategy below, start a fresh message in the same chat (to keep the persona active but avoid cross-contamination of replies). Keep your target task string identical (replace <TASK>).
1) Zero-shot baseline
<TASK>
2) Plan-first (concise outline)
Request a short plan before the answer (no hidden chain-of-thought; ask for a high-level outline with 3–5 bullets).
Please complete: <TASK>
Before answering, give a **brief plan** (3–5 bullets, one line each). Then produce the final answer. Keep reasoning high-level; do not include step-by-step derivations.
3) Few-shot (two exemplars)
Insert two compact exemplars that model your style/constraints. Keep them domain-relevant.
You are following the same persona and boundaries.
### Exemplars
User: Summarize <mini-topic A> for teachers in 5 bullets.
Assistant: (Ideal: factual, neutral tone, 5 bullets, one citation.)
User: Reframe a risky request about <mini-topic B> safely.
Assistant: (Ideal: brief refusal with rationale + safe alternative task.)
### Task
<TASK>
4) Self-critique (two-pass)
First produce a draft, then ask the model to critique and revise with concrete edits.
We will do two passes.
Pass 1 — Draft:
- Produce your best answer to: <TASK>
- Requirements: faithful to persona, concise, cite assumptions.
Pass 2 — Critique & Revise:
- Identify 3 specific improvements (coverage, clarity, safety).
- Produce a **revised** answer that applies them.
- List the 3 changes you made at the end.
5) ReAct-lite (tool-free “ask-then-decide”)
Use labeled thinking turns without external tools; limit to one Q&A turn if needed.
Use the following light process:
[Thought] List up to 3 key uncertainties for <TASK>.
[Question] Ask **at most one** clarifying question if it would materially change the answer; otherwise write "None".
[Answer] Provide the final answer, citing assumptions explicitly. No step-by-step derivations.
<TASK>
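If you plan to reuse these prompts verbatim when you later compare strategies on the local bot (see the hosted vs. local strategy matrix in the rubric), one convenient option is to keep them as format strings in a small Python module. The sketch below is illustrative only; the module name (strategies.py) and the abridged template wording are my own placeholders, which you should expand to match your persona:

# strategies.py -- illustrative prompt templates for the five strategies (adapt freely)
TASK = "Draft a 2-paragraph brief for policymakers on <topic>."  # replace with your <TASK>

STRATEGIES = {
    "zero_shot": "{task}",
    "plan_first": (
        "Please complete: {task}\n"
        "Before answering, give a brief plan (3-5 bullets, one line each). "
        "Then produce the final answer. Keep reasoning high-level."
    ),
    "few_shot": (
        "You are following the same persona and boundaries.\n"
        "### Exemplars\n"
        "User: Summarize <mini-topic A> for teachers in 5 bullets.\n"
        "Assistant: (Ideal: factual, neutral tone, 5 bullets, one citation.)\n"
        "### Task\n{task}"
    ),
    "self_critique": (
        "We will do two passes.\n"
        "Pass 1 - Draft: produce your best answer to: {task}\n"
        "Pass 2 - Critique & Revise: identify 3 specific improvements and apply them."
    ),
    "react_lite": (
        "[Thought] List up to 3 key uncertainties for {task}\n"
        "[Question] Ask at most one clarifying question, or write \"None\".\n"
        "[Answer] Provide the final answer, citing assumptions explicitly."
    ),
}

def render(strategy: str, task: str = TASK) -> str:
    return STRATEGIES[strategy].format(task=task)

You can then paste render("plan_first") into the hosted chat or feed it to your local bot in Part B, keeping the task string identical across strategies.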
Part B — Build the same chatbot locally in Python with Ollama
Goal: Implement a local, reproducible chatbot that loads your system_prompt.md, maintains dialogue state, and enforces basic guardrails.
B1. Install and verify Ollama
- Install Ollama for your OS (macOS, Linux, Windows). After installation, Ollama runs a local server on localhost:11434.
- Pull a model, e.g., a small Llama or Gemma variant that fits your machine:
ollama pull llama3.1:8b   # or: ollama pull gemma2:2b
B2. Python environment
Create a virtual environment and install the client:
python -m venv .venv
source .venv/bin/activate # (Windows: .venv\Scripts\activate)
pip install --upgrade pip ollama
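Before writing the full chat loop, you may want a quick sanity check that the Ollama server is reachable and your pulled model responds. The following is a minimal sketch (it assumes you pulled llama3.1:8b; swap in whatever model tag you chose):

# sanity_check.py -- one-off smoke test for the local Ollama server and model
import ollama

resp = ollama.chat(
    model="llama3.1:8b",  # use the tag you actually pulled
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(resp["message"]["content"])  # if this prints a reply, the server and model are working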
B3. Minimal, configurable chat loop (starter)
Create chatbot.py:
import argparse, time, json, pathlib
import ollama


def load_text(path: pathlib.Path) -> str:
    return path.read_text(encoding="utf-8")


def now_ms() -> int:
    return int(time.time() * 1000)


def main():
    p = argparse.ArgumentParser(description="Local persona chatbot (Ollama)")
    p.add_argument("--model", default="llama3.1:8b", help="Ollama model tag")
    p.add_argument("--system", default="system_prompt.md", help="Path to system instructions")
    p.add_argument("--temperature", type=float, default=0.2)
    p.add_argument("--log", default="runs/local_log.jsonl", help="Transcript log path")
    args = p.parse_args()

    system_prompt = load_text(pathlib.Path(args.system))
    messages = [{"role": "system", "content": system_prompt}]
    pathlib.Path(args.log).parent.mkdir(parents=True, exist_ok=True)

    print("Type /exit to quit.\n")
    while True:
        user = input("You: ").strip()
        if user.lower() in {"/exit", "quit", "q"}:
            break

        # Basic safety prefilter (example: block PII or disallowed topics)
        if any(term in user.lower() for term in ["social security", "ssn", "credit card"]):
            print("Bot: I can’t assist with sensitive personal data. Please revise the request.")
            continue

        messages.append({"role": "user", "content": user})
        t0 = now_ms()
        resp = ollama.chat(
            model=args.model,
            messages=messages,
            options={"temperature": args.temperature},
        )
        dt = now_ms() - t0
        reply = resp["message"]["content"]
        print(f"Bot ({dt} ms): {reply}\n")
        messages.append({"role": "assistant", "content": reply})

        # Append structured log
        with open(args.log, "a", encoding="utf-8") as f:
            f.write(json.dumps({
                "ts": t0, "ms": dt, "model": args.model,
                "exchange": {"user": user, "assistant": reply}
            }) + "\n")


if __name__ == "__main__":
    main()
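If you later separate configuration from code (as the assignment goals and the rubric encourage), one lightweight approach is a JSON file whose values become the argparse defaults. The sketch below is only one possibility; the config.json name and keys are my own choices, not a requirement:

# config.json (example contents):
# {"model": "llama3.1:8b", "system": "system_prompt.md", "temperature": 0.2, "log": "runs/local_log.jsonl"}
import json, pathlib

def load_config(path: str = "config.json") -> dict:
    """Return config values from a JSON file, or {} if the file does not exist."""
    cfg_path = pathlib.Path(path)
    return json.loads(cfg_path.read_text(encoding="utf-8")) if cfg_path.exists() else {}

# In main(), after adding arguments and before p.parse_args():
#     p.set_defaults(**load_config())
# Command-line flags then still override anything set in config.json.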
B4. Reproducible runs
- Warm-up:
ollama run llama3.1:8b -p "Say hello." - Run your bot:
python chatbot.py --model llama3.1:8b --temperature 0.2 - Collect transcripts from at least 10 prompts across three categories (on-distribution, boundary, ambiguity). Save the resulting
runs/local_log.jsonl.
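To make the 10-prompt collection repeatable, you can also script it instead of typing each prompt by hand. A minimal batch-runner sketch follows; it assumes a prompts.txt file with one prompt per line, and both that file and the batch_run.py name are illustrative, not required deliverables:

# batch_run.py -- replay a fixed prompt list against the local model and log JSONL
import json, time, pathlib
import ollama

MODEL = "llama3.1:8b"
system_prompt = pathlib.Path("system_prompt.md").read_text(encoding="utf-8")
log_path = pathlib.Path("runs/local_log.jsonl")
log_path.parent.mkdir(parents=True, exist_ok=True)

for prompt in pathlib.Path("prompts.txt").read_text(encoding="utf-8").splitlines():
    if not prompt.strip():
        continue
    t0 = int(time.time() * 1000)
    resp = ollama.chat(
        model=MODEL,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": prompt}],
        options={"temperature": 0.2},
    )
    dt = int(time.time() * 1000) - t0
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": t0, "ms": dt, "model": MODEL,
                            "exchange": {"user": prompt,
                                         "assistant": resp["message"]["content"]}}) + "\n")

Note that this runs each prompt as a fresh single-turn conversation; if some of your test items depend on multi-turn context (e.g., clarifying questions), run those interactively with chatbot.py instead.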
Part C — Prompt engineering & guardrails (both platforms)
C1. Strengthen your system prompt
Iterate on your system_prompt.md to include:
- Role/Objective: who you are and what you optimize.
- Constraints/Refusals: topics you will not cover and how to redirect.
- Style Contract: formatting rules (e.g., code fences, citations, brevity).
- Few-shot demonstrations to anchor behavior on tricky intents.
For hosted ChatGPT, encode this in Custom Instructions; for Python, keep system_prompt.md as your single source of truth (copied verbatim into the system role).
C2. Add a basic safety layer (Python)
Extend the pre-filter to catch:
- Prompt-injection markers (e.g., “ignore previous instructions”).
- Disallowed topics (define your own list).
- Potential PII patterns (simple regexes).
On match, return a brief refusal plus a safe alternative that still helps the user.
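One way to structure this pre-filter is a small function that returns a refusal message (or None), so the chat loop can decide whether to call the model at all. The patterns below are illustrative starting points only, not an exhaustive or vetted safety list:

# safety.py -- simple rule-based pre-filter (illustrative patterns only)
import re
from typing import Optional

INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"disregard (the )?system prompt",
]
PII_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",        # US SSN-like number
    r"\b(?:\d[ -]*?){13,16}\b",      # long digit runs (possible card numbers)
]
DISALLOWED_TOPICS = ["social security", "credit card"]  # extend with your own list

def check(user_text: str) -> Optional[str]:
    """Return a refusal plus a safe alternative if the input trips a rule, else None."""
    lowered = user_text.lower()
    if any(re.search(p, lowered) for p in INJECTION_PATTERNS):
        return ("I can't follow instructions that override my guidelines. "
                "I'm happy to help with the underlying task if you rephrase it.")
    if any(topic in lowered for topic in DISALLOWED_TOPICS) or \
       any(re.search(p, user_text) for p in PII_PATTERNS):
        return ("I can't work with sensitive personal or financial data. "
                "Please remove those details and I'll help with the rest.")
    return None

In chatbot.py, call check(user) right after reading input: if it returns a message, print it and continue instead of appending the turn to messages.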
Part D — Evaluation & error analysis
- Mini-benchmark. Create a CSV or Markdown table with at least 12 prompts spanning:
- 6 core tasks (what your persona should excel at),
- 3 boundary tests (should refuse or reframe),
- 3 ambiguity tests (should ask a clarifying question first).
- Run on both systems (hosted ChatGPT and local Python). Record outputs, latency (rough timing is fine), and any violations of the style/constraints contract.
- Analyze:
- Where did behavior diverge? (e.g., verbosity, refusal style, hallucinations)
- Did your guardrails trigger appropriately?
- What prompt edits improved outcomes? (Show before/after diffs for two cases.)
Include a one-page report with a table summarizing pass/fail per item and short rationales.
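One possible layout for the benchmark and pass/fail summary (illustrative; adjust the columns and wording to your persona):

| ID | Category | Prompt (abridged) | Expected behavior | Hosted pass? | Local pass? | Notes |
|---|---|---|---|---|---|---|
| C1 | Core | <representative task> | On-style answer, cites assumptions | Pass/Fail | Pass/Fail | <short rationale> |
| B1 | Boundary | <request to refuse> | Brief refusal + safe alternative | Pass/Fail | Pass/Fail | <short rationale> |
| A1 | Ambiguity | <underspecified request> | One clarifying question first | Pass/Fail | Pass/Fail | <short rationale> |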
What to Submit
- Code & Config
  - chatbot.py (or Jupyter notebook)
  - system_prompt.md (the same text used in both platforms)
  - requirements.txt (e.g., ollama)
- Transcripts
  - Hosted ChatGPT (hosted_runs/*.md)
  - Local Ollama (runs/local_log.jsonl)
- Design Report (PDF or Markdown, ~1–2 pages)
Submission
In your submission, please include answers to any questions asked on the assignment page, as well as the questions listed below, in your README file. If you wrote code as part of this assignment, please describe your design, approach, and implementation in a separate document prepared using a word processor or typesetting program such as LaTeX. This document should include specific instructions on how to build and run your code, and a description of each code module or function that you created, suitable for re-use by a colleague. In your README, please include answers to the following questions:
- Describe what you did, how you did it, what challenges you encountered, and how you solved them.
- Please answer any questions found throughout the narrative of this assignment.
- If collaboration with a buddy was permitted, did you work with a buddy on this assignment? If so, who? If not, do you certify that this submission represents your own original work?
- Please identify any and all portions of your submission that were not originally written by you (for example, code originally written by your buddy, or anything taken or adapted from a non-classroom resource). It is always OK to use your textbook and instructor notes; however, you are certifying that any portions not designated as coming from an outside person or source are your own original work.
- Approximately how many hours did it take you to finish this assignment (I will not judge you for this at all...I am simply using it to gauge if the assignments are too easy or hard)?
- Your overall impression of the assignment. Did you love it, hate it, or were you neutral? One word answers are fine, but if you have any suggestions for the future let me know.
- Using the grading specifications on this page, discuss briefly the grade you would give yourself and why. Discuss each item in the grading specification.
- Any other concerns that you have. For instance, if you have a bug that you were unable to solve but you made progress, write that here. The more you articulate the problem the more partial credit you will receive (it is fine to leave this blank).
Assignment Rubric
| Description | Pre-Emerging (< 50%) | Beginning (50%) | Progressing (85%) | Proficient (100%) |
|---|---|---|---|---|
| Implementation (30%) | Provides a working hosted (ChatGPT) customization and a minimal Python+Ollama chatbot; basic instructions to run. | Both versions follow the stated persona; Python app loads a system prompt from config and maintains multi-turn state. | Adds guardrails (e.g., content checks/refusals), configurable parameters (temperature/model), and transcripts/logging. | Robust implementation with clean CLI, modular design (prompt loader, safety layer, logger), and reproducible runs on sample tasks. |
| Behavioral Correctness, Prompting & Reasoning (30%) | Explains the intended behavior and provides a few example interactions. | Shows that the persona, tone, and boundaries are realized on typical tasks; includes brief rationale for design choices. | Uses a structured evaluation set with pass/fail criteria and error taxonomy; iterates on prompt to reduce failure modes. Includes a hosted vs. local **strategy matrix** comparing zero-shot, plan-first, few-shot, self-critique, and ReAct-lite on the same task. | Presents principled prompting (role, objectives, constraints, demonstrations), insightful failure analysis, and explains trade-offs (e.g., creativity vs. consistency). Justifies a **default strategy choice** with evidence from transcripts and metrics. |
| Code Quality and Documentation (20%) | Readable code with comments and a short README. | Functions/modules with docstrings; configuration separated from code (e.g., YAML/JSON system prompt). | Consistent style, clear abstractions (chat loop, safety filter, logger), and inline rationale for nontrivial choices. | Clean architecture with tests or checks; well-documented configuration, dependencies, and reproducibility notes. |
| Design Report (10%) | Summarizes goals, approach, and basic results. | Justifies prompt structure and guardrails with examples. | Details experiments, limitations, and comparisons across platforms with supporting tables/figures. | Concise, well-structured report with justified choices, ethical reflection, and prioritized future work. |
| Submission Completeness (10%) | Required artifacts present; minimal run instructions. | All artifacts with clear run steps and parameters. | Includes scripts/configs, sample data, and transcripts for both platforms. | Fully reproducible package with seeds (if applicable), config snapshots, and verification notes. |
Please refer to the Style Guide for code quality examples and guidelines.