
题外话:最近身体不太舒服,之前一直硬扛着,后来去医院检查后发现必须调整生活方式,于是踏实休息了一阵。现在状态已经稳定下来,继续恢复学习节奏。也希望大家都多注意身体,不舒服千万别硬扛,及时检查、及时休息。
项目位置:
data/课程练习/RAG技术与应用/disney_help_rag.py目标:把“迪士尼知识库”做成可运行的 双索引 RAG(文本向量 + 图片向量),并在交互模式下默认接入 百炼全模态大模型(MultiModalConversation)。

text-embedding-v1)→ LangChain FAISS(L2);IndexFlatL2(L2);rank / l2_distance / metadata,便于后续拼 prompt 与路由。build_disney_rag_prompt 把文本 chunk 与图片(OCR + path)统一整理为结构化背景,且做 max_context_chars 截断。python disney_help_rag.py 无子命令时进入 ask,并 **默认 use_llm=True**,即:每轮先检索、再调用百炼全模态生成回答;可用 --no-use-llm 只检索。MultiModalConversation 的 messages[].content = [{"text":...}, {"image":...}, ...] 传入:text 块)image 块)doc_file_utils 提供:export_doc_and_docx_to_markdown、export_parsed_markdown_chunks_for_doc_paths、chunks_json_dir_to_faiss_chunks。source_file / chunk_id / department / update_time)。image_file_utils.image_to_text(Tesseract 风格)。openai/clip-vit-base-patch32)提取 512 维视觉投影向量。ocr_text 与 path_raw(后续给全模态模型附图/定位用)。text-embedding-v1)。FAISS.save_local 产物(index.faiss + index.pkl)。FAISS.similarity_search_with_score 返回 (Document, L2 distance)。faiss.IndexFlatL2(dim),落盘 images.index.faiss + images.index.pkl。index.search(query_vec, k) → l2_distance + metadata_store[idx] 映射。_retrieve_dual_async 使用 asyncio.gather + asyncio.to_thread 并发:_gate_image_by_vector_a_route_need_image_local_c + _tokenize_zh_simple_text_suggests_need_imageto_thread,避免主线程卡住交互。bailian_omni_multimodal.call_omni_multimodal → dashscope.MultiModalConversation.call--chat-modelDASHSCOPE_OMNI_MODEL / DASHSCOPE_CHAT_MODELDEFAULT_OMNI_MODEL = "qwen3.5-omni-plus-2026-03-15"build_disney_rag_omni_user_message:只生成本轮 user(含 RAG 文本 + 附图)ask 交互:[system] + history_textual + [user_rag]omni_answer_text 优先从 output.choices[0].message.content(字符串或多块)提取文本。get_image_features 返回结构变化导致向量 shape 错ndim!=1 或出现 (seq, hidden) 这类 2D/3D shape。transformers 里 get_image_features() 可能返回 BaseModelOutputWithPooling,直接 out[0] 会取到 last_hidden_state,不是最终投影向量。torch.Tensor 直接用;pooler_output,没有则手动取 last_hidden_state[:,0,:] 再过投影层。KMP_DUPLICATE_LIB_OK=TRUE(工程上是折中方案,确保练习可跑)。status_code=403,message 包含 Access denied。qwen3.5-omni-plus-2026-03-15)调用权限。omni_answer_text 报错时附带 code 字段并给出排查/替代建议;--chat-model qwen-vl-plus 或 --no-use-llm。python disney_help_rag.py
python disney_help_rag.py ask --no-use-llm
python disney_help_rag.py ask --print-prompt
python disney_help_rag.py ask --chat-model qwen-vl-plus
flowchart TB
subgraph CLI["disney_help_rag.py(CLI)"]
MAIN["main()"]
PARSER["_build_cli_parser()"]
ASK["_cmd_ask() 默认入口"]
RETR["_retrieve_dual_async() 并发检索+路由"]
PROMPT["build_disney_rag_prompt()"]
OMNIU["build_disney_rag_omni_user_message()"]
CALL["generate_answer_with_dashscope()"]
end
subgraph TEXT["文本向量(DashScope + LangChain FAISS)"]
EMB["dashscope_embedding.get_dashscope_embeddings()"]
TSS["similarity_search_text_topk()"]
LFAISS["LangChain FAISS.load_local / similarity_search_with_score"]
end
subgraph IMG["图片向量(CLIP + 原生 FAISS)"]
CLIP["CLIP: get_text_features / get_image_features"]
ISS["similarity_search_image_topk_clip()"]
IFAISS["faiss.IndexFlatL2.search + metadata_store"]
OCR["image_to_text()"]
end
subgraph ROUTE["图片路由 A/C/D"]
A["_gate_image_by_vector_a()"]
C["_route_need_image_local_c() + _tokenize_zh_simple()"]
D["_text_suggests_need_image()"]
end
subgraph LLM["百炼全模态(DashScope MultiModalConversation)"]
OMNICALL["bailian_omni_multimodal.call_omni_multimodal()"]
SDK["dashscope.MultiModalConversation.call"]
PARSE["bailian_omni_multimodal.omni_answer_text()"]
end
MAIN --> PARSER --> ASK
ASK --> RETR
RETR --> TSS --> LFAISS
TSS --> EMB
RETR --> ISS --> IFAISS
ISS --> CLIP
IFAISS --> OCR
RETR --> A
RETR --> C
RETR --> D
ASK --> PROMPT
ASK --> OMNIU --> CALL --> OMNICALL --> SDK --> PARSE
sequenceDiagram
participant U as User
participant ASK as _cmd_ask
participant RET as _retrieve_dual_async
participant OMNI as MultiModalConversation
U->>ASK: 输入问题 q
ASK->>RET: 并发检索(文本Top-K + 图片预取 + 路由)
RET-->>ASK: text_hits + image_hits
ASK->>OMNI: messages = [system] + history_textual + [user_rag(text + images)]
OMNI-->>ASK: assistant answer
ASK-->>U: 输出回答
ASK->>ASK: history_textual += (user纯文本, assistant纯文本)
disney_help_rag.py(双路检索 + 全模态对话)调用关系图适用:
data/课程练习/RAG技术与应用/disney_help_rag.py特点:文本向量(DashScope embedding + LangChain FAISS) 与 图片向量(CLIP + 原生 FAISS) 两套索引;ask默认进入 百炼全模态对话(MultiModalConversation)。
flowchart TB
subgraph disney["disney_help_rag.py"]
MAIN["main()"]
PARSER["_build_cli_parser()"]
ASK["_cmd_ask() 默认入口"]
RETR["_retrieve_dual_async()"]
TR["similarity_search_text_topk()"]
IR["similarity_search_image_topk_clip()"]
ROUTE_A["_gate_image_by_vector_a()"]
ROUTE_C["_route_need_image_local_c()"]
ROUTE_D["_text_suggests_need_image()"]
PROMPT["build_disney_rag_prompt()"]
OMNIU["build_disney_rag_omni_user_message()"]
LLM["generate_answer_with_dashscope()"]
end
subgraph ds["DashScope / 百炼"]
EMB["dashscope_embedding.get_dashscope_embeddings()"]
OMNICALL["bailian_omni_multimodal.call_omni_multimodal()"]
SDK["dashscope.MultiModalConversation.call"]
PARSE["bailian_omni_multimodal.omni_answer_text()"]
end
subgraph stores["本地向量库"]
TXT["LangChain FAISS (index.faiss/index.pkl)"]
IMG["faiss.IndexFlatL2 (images.index.faiss/images.index.pkl)"]
end
subgraph models["本地模型/工具"]
CLIP["CLIP: get_text_features / get_image_features"]
OCR["image_to_text (OCR)"]
end
MAIN --> PARSER --> ASK
ASK --> RETR
RETR --> TR --> TXT
TR --> EMB
RETR --> IR --> IMG
IR --> CLIP
IMG --> OCR
RETR --> ROUTE_A
RETR --> ROUTE_C
RETR --> ROUTE_D
ASK --> PROMPT
ASK --> OMNIU --> LLM --> OMNICALL --> SDK --> PARSE
课程练习 RAG技术与应用 目录(含 disney_help_rag.py 等):
Cyning12/auto-gpt-work-demo · data/课程练习/RAG技术与应用