P3
Project Idea: Auto-Adaptive Fine-tuning for Jac byLLM using RPG Game Generation#
This project focuses on creating an intelligent byLLM system that automatically collects training data from large model interactions and continuously fine-tunes smaller, local models to replace them. Using RPG game level generation as the primary use case, the system will learn to replicate the performance of large models with efficient, locally-running alternatives.
Problem Statement#
Current byLLM implementations rely heavily on expensive, cloud-based large language models for complex structured data generation tasks. This creates several challenges:
- High API Costs: Continuous use of large models for repetitive structured generation tasks becomes prohibitively expensive
- Latency Issues: Network calls to external APIs introduce delays in real-time applications
- Dependency Risk: Applications become dependent on external service availability and pricing changes
Proposed Solution#
-
Dataset Generation Using Large Models
-
Complex Use Case Selection: Focus on RPG game level generation as a representative complex structured data generation task that requires:
- Understanding of game mechanics and spatial relationships
- Generation of valid, interconnected game objects (players, enemies, terrain)
- Adherence to strict structural constraints and playability rules
-
Automated Game Level Generation Pipeline:
- Implement Jac abilities that use large models (GPT-4, Claude, Gemini Pro) via
by <llm>
calls to generate complete RPG game levels - Execute multiple generation runs to create a diverse corpus of level designs
- Each generation includes the original prompt context and the structured level output from
by <llm>
call sites
- Implement Jac abilities that use large models (GPT-4, Claude, Gemini Pro) via
-
Automated Evaluation and Filtering (Optional):
- Develop generalized automated evaluation methodologies that can assess output quality across different structured generation tasks
- For RPG levels specifically, implement optional filtering that validates:
- Presence of required player spawn points
- Inclusion of appropriate enemy placements
- Reachability analysis ensuring all areas are accessible
- Structural integrity of the level layout
- The system should support both filtered high-quality datasets and unfiltered comprehensive datasets depending on use case requirements
-
-
Small LLM Training Pipeline
-
Model Selection: Use TinyLLaMA as the target model for fine-tuning due to its balance of capability and efficiency for local deployment.
-
Training Methodology:
- Implement LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) techniques for parameter-efficient fine-tuning
- Utilize the dataset of RPG level generation examples from
by <llm>
interactions - Apply quantization techniques to further optimize model size and inference speed
-
-
byLLM Plugin Integration
-
Automatic Data Collection:
- Integrate data collection capabilities directly into the byLLM plugin
- Automatically capture prompt-response pairs from successful large model
by <llm>
interactions - Continuously expand the training dataset in the background
-
Dynamic Model Training:
- Implement background training processes that periodically fine-tune local models using accumulated data
- Schedule training during low-usage periods to minimize system impact
-
Intelligent Model Switching:
- Develop logic to automatically switch from large models to fine-tuned local models when confidence thresholds are met
- Implement per-
by <llm>
-call-site model caching, allowing different call locations to use specialized models
-
Model Persistence:
- Cache and save trained models specific to each
by <llm>
call site - Enable model versioning and rollback capabilities
- Cache and save trained models specific to each
-
-
Evaluation Framework
- Manual Correctness Assessment:
- Establish systematic manual evaluation protocols for generated RPG levels
- Create scoring rubrics that assess:
- Gameplay viability and fun factor
- Structural correctness and consistency
- Adherence to specified constraints and themes
- Compare performance between large model outputs and fine-tuned local model outputs from respective
by <llm>
calls
- Manual Correctness Assessment:
This approach transforms byLLM from a static large-model-dependent system into an intelligent, self-improving platform that automatically optimizes for cost, speed, and performance while maintaining output quality across all by <llm>
interactions.