
CorridorKey: LLM Handover Guide

Welcome, fellow AI Assistant! You are picking up a highly specialized computer vision project called CorridorKey, an AI Chroma Keying engine designed for professional VFX pipelines.

This document is your technical entry point. It outlines the architecture, dataflow, design decisions, and common pitfalls of this codebase to help you assist the human user effectively.


1. Project Overview & Architecture

CorridorKey is a neural-network-based green screen removal tool. It takes an RGB image plus a "Coarse Alpha Hint" (generated by the user with a rough chroma key or AI roto, or via the GVM or VideoMaMa modules) and produces a physically unmixed Alpha and a straight (un-premultiplied) Foreground color, with the greenscreen unmixed from semi-transparent pixels.

Core Architecture (The GreenFormer):

  • Backbone: A timm implementation of hiera_base_plus_224.mae_in1k_ft_in1k.
  • Input Modification: We patched the first layer to accept 4 channels (RGB + Coarse Alpha Hint).
  • Decoders: Multiscale feature fusion heads that predict "Coarse" Alpha (1ch) and Foreground (3ch) logits.
  • Refiner (CNNRefinerModule): A custom CNN head (dilated residual blocks) that takes the original RGB input and the Coarse predictions, outputting purely additive "Delta Logits" that are applied directly to the backbone's outputs before the final Sigmoid activation.
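The refiner's additive hookup can be sketched roughly like this (hypothetical function and tensor names; the real module lives in model_transformer.py):

```python
import torch

# Rough sketch of the additive refinement step described above (hypothetical
# names; the real code lives in model_transformer.py). The CNNRefinerModule
# predicts Delta Logits that are simply summed onto the backbone's coarse
# logits before the final sigmoid.
def apply_refiner_deltas(coarse_alpha_logits,  # [B, 1, H, W]
                         coarse_fg_logits,     # [B, 3, H, W]
                         delta_alpha_logits,   # refiner output, same shapes
                         delta_fg_logits):
    alpha = torch.sigmoid(coarse_alpha_logits + delta_alpha_logits)
    fg = torch.sigmoid(coarse_fg_logits + delta_fg_logits)
    return alpha, fg
```

Because the deltas are purely additive in logit space, zeroing the refiner degrades gracefully back to the coarse prediction.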

Key Files:

  • CorridorKeyModule/core/model_transformer.py: The PyTorch architecture described above.
  • CorridorKeyModule/inference_engine.py: The CorridorKeyEngine class. It loads the CorridorKey.pth weights and handles the resizing API.
  • CorridorKeyModule/core/color_utils.py: Pure math functions for digital compositing. Crucial: pay attention to srgb_to_linear(), premultiply(), and the luminance-preserving despill().
  • clip_manager.py: The user-facing Command Line Wizard. It handles scanning directories, prompting the user for inference settings, and piping data into the engine.


2. Critical Dataflow Properties (Do Not Break These)

The biggest challenge in this codebase revolves around Color Space and Gamma Math. When assisting the user with compositing bugs, check these rules first:

  1. Model Input/Output is strictly [0.0, 1.0] Float Tensors.
    • The model assumes inputs are sRGB.
    • The predicted Output Foreground (res['fg']) is natively sRGB; the model is currently trained to predict the un-multiplied (straight) FG color element.
    • The predicted Output Alpha (res['alpha']) is inherently Linear.
  2. EXR Handling (Processed Output pass):
    • EXRs are stored as Linear float data, premultiplied.
    • To build the Processed EXR, we take the sRGB foreground, pass it through cu.srgb_to_linear(), premultiply it by the Linear Alpha, pack the channels, and save via OpenCV as half-float (cv2.IMWRITE_EXR_TYPE_HALF).
    • Bug History: Do not apply a pure mathematical Gamma 2.2 curve; use the piecewise real sRGB transfer functions defined in color_utils.py.
  3. Inference Resizing (img_size):
    • The engine is strictly trained on 2048x2048 crops.
    • In inference_engine.py, the process_frame() method uses OpenCV (Lanczos4) to upscale/downscale the user's arbitrary input resolution to 2048x2048, feeds the model, and then resizes the predictions back to the original resolution.
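The EXR-prep math from rules 1-2 can be sketched as follows (illustrative only; the real implementations live in color_utils.py, and the actual EXR write goes through OpenCV):

```python
import numpy as np

# Illustrative sketch of the Processed-pass math described above.
def srgb_to_linear(srgb):
    """Real piecewise sRGB transfer, NOT a pure Gamma 2.2 curve (see Bug History)."""
    srgb = np.asarray(srgb, dtype=np.float32)
    return np.where(srgb <= 0.04045,
                    srgb / 12.92,
                    ((srgb + 0.055) / 1.055) ** 2.4)

def pack_processed_rgba(fg_srgb, alpha_linear):
    """Straight sRGB FG -> linear, premultiply by Linear Alpha, pack RGBA."""
    fg_linear = srgb_to_linear(fg_srgb)                       # [H, W, 3] linear
    premult = fg_linear * alpha_linear[..., None]             # premultiplied
    return np.concatenate([premult, alpha_linear[..., None]], axis=-1)  # [H, W, 4]
```

The resulting [H, W, 4] float array matches the "Linear, Premultiplied" contract of the Processed directory.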

3. The Inference Pipeline (clip_manager.py)

Users generally run the system via local shell launcher scripts (CorridorKey_DRAG_CLIPS_HERE_local.bat or CorridorKey_DRAG_CLIPS_HERE_local.sh) which boot the clip_manager.py wizard.

The pipeline works as follows:

  1. Scan: Looks for folders (or dragged-and-dropped paths) containing an Input sequence (RGB) and an AlphaHint sequence (BW).
  2. Config: Prompts the user for settings (Gamma space, Despill strength, Auto-Despeckle threshold, Refiner Strength).
  3. Execution: Loops frame-by-frame, passing [H, W, 3] Numpy arrays to engine.process_frame().
  4. Export:
    • FG directory: Half-float EXR, RGB (sRGB — the model predicts straight FG in sRGB; convert to linear before compositing).
    • Matte directory: Half-float EXR, Grayscale (Linear).
    • Processed directory: Half-float EXR, RGBA (Linear, Premultiplied).
    • Comp directory: 8-bit PNG (sRGB composite over a checkerboard, for quick preview).
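The Comp preview step could look roughly like this (a hypothetical NumPy sketch; the final linear-to-sRGB encode and 8-bit quantization for the PNG are omitted):

```python
import numpy as np

# Hypothetical sketch of the Comp preview: composite the premultiplied linear
# result over a checkerboard using the standard "over" operator.
def checkerboard(h, w, tile=16, lo=0.25, hi=0.75):
    ys, xs = np.mgrid[0:h, 0:w]
    board = np.where(((ys // tile + xs // tile) % 2) == 0, lo, hi)
    return np.repeat(board[..., None], 3, axis=-1).astype(np.float32)

def comp_over_checker(premult_linear, alpha_linear):
    """Over operator for premultiplied FG: out = fg + bg * (1 - alpha)."""
    bg = checkerboard(*alpha_linear.shape)
    return premult_linear + bg * (1.0 - alpha_linear[..., None])
```

Because the FG is already premultiplied, the over operator needs no extra multiply by alpha on the foreground term.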


4. Helpful Pointers for Future Work

  • Training Code: We deliberately stripped training-specific logic (returning coarse logits, .detach() calls, gradient checkpointing) out of the inference tool to keep model_transformer.py pristine for inference speed; that logic lives in a separate program. If the user wants to resume training the Hiera backbone, use the CorridorKey trainer (coming soon, maybe?).
  • PointRend: You may see "PointRend" mentioned in old commit messages. It was entirely replaced by the CNN Refiner.
  • GVM / VideoMaMa: There are sub-modules for generating the Coarse Alpha Hints. clip_manager.py --action generate_alphas handles piping footage into these external repos.

5. Directives for the AI

  • Be Proactive: The user is highly technical (a VFX professional/coder). Skip basic tutorials and dive straight into advanced implementation, but be sure to document math thoroughly.
  • Prioritize Performance: This is video processing. Every .numpy() transfer or cv2.resize matters in a loop running on 4K footage.
  • Verify Gamma: If the user complains about "crushed shadows" or "dark fringes", the problem is almost certainly an sRGB-to-Linear conversion step happening in the wrong order inside color_utils.py.
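A tiny numeric check of that failure mode (premultiplying before linearizing darkens soft edges; srgb_to_linear here is the standard piecewise transfer, standing in for the color_utils.py version):

```python
import numpy as np

# Standard piecewise sRGB EOTF, standing in for color_utils.srgb_to_linear().
def srgb_to_linear(srgb):
    srgb = np.asarray(srgb, dtype=np.float32)
    return np.where(srgb <= 0.04045, srgb / 12.92,
                    ((srgb + 0.055) / 1.055) ** 2.4)

# A mid-grey edge pixel at 50% alpha:
fg_srgb, alpha = 0.5, 0.5
correct = srgb_to_linear(fg_srgb) * alpha   # linearize, THEN premultiply
wrong = srgb_to_linear(fg_srgb * alpha)     # premultiplied in sRGB: too dark
# wrong < correct: the edge pixel loses energy, i.e. the classic dark fringe.
```

Running both orders on the same pixel makes the bug obvious in a Python REPL before touching the pipeline.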

Good luck, and build cool tools!