Hugging Face：Blog（RSS）

精选65

Job Searcher

2026-06-06 23:36·26天前

精选理由

这个 hackathon 项目把教师蒸馏和 LoRA 微调 8B 模型的流程全部开源在 HF 上，做模型定制和部署的开发者能直接抄作业，尤其是推理部署踩的坑（ZeroGPU 上下文重用）很实用。

AI 摘要

Hugging Face 发布 Job Searcher，一个基于 AI 的求职搜索工具。用户上传简历并设定偏好后，系统使用教师模型 DeepSeek V4 Pro 生成 LinkedIn 搜索查询，通过 JobSpy 抓取职位，再对学生模型 Qwen3-8B（8B 参数）进行 LoRA 微调，对每个职位从技能匹配、经验相关性、教育背景、行业领域契合度和资历对齐五个维度给出评分和推理。训练在 Modal 平台单张 A100 上完成。推理部署于 Hugging Face ZeroGPU Space，使用 llama.cpp 实现流式输出。项目开源。

原文 · 未翻译

Job Searcher

Published June 6, 2026

Emre

emrekuruu

build-small-hackathon

Job hunting as a new grad is a full-time job by itself. You sift through hundreds of postings every week to find a handful worth applying to. You click "Easy Apply" until your eyes hurt. You write the same cover letter forty times. By month two of a search, you're applying to roles you wouldn't take, in industries you don't care about, because at that point the cost of thinking about each listing is higher than the cost of submitting to one.

Watch the short tour: drop a resume, watch the queries stream, read the per-job reasoning.

How it works

A run has three steps.

Queries. The student reads the resume and the preferences you set (job type, work modality, location, free-form notes) and drafts a small set of LinkedIn-shaped search queries, reasoning out loud as it goes.
Search. Those queries hit LinkedIn through JobSpy, one at a time.
Scoring. For each posting, the model reads the (resume, job) pair and writes a five-dimension fit score:
- skills match
- experience relevance
- education and certifications
- industry / domain fit
- seniority alignment

Figure 1. End-to-end steps of the framework.

What you get back isn't a list of fifty roles. It's a small shortlist with defensible reasoning. You can read why the model thinks the second-ranked job beats the third.

Technical Details

Dataset Curation - The teacher and the student

The teacher is DeepSeek V4 Pro. Strong at structured reasoning, willing to follow a strict output schema, cheap enough to run once over a large corpus offline. It is used as a label generator, not as an inference-time dependency.

The student is Qwen3-8B. Small enough to fit on a single ZeroGPU slice once quantized to Q4_K_M, large enough to absorb the teacher's structured judgement.

The corpus came from a closed loop, resume-aware end-to-end:

Resumes. 2,500, built on Divyaamith/Kaggle-Resume.
Queries. The teacher first drafted LinkedIn-shaped search queries from each resume.
Jobs. JobSpy then scraped LinkedIn for what those queries actually returned. About 10,000 postings, every one of them surfaced by a query the teacher itself wrote for that specific resume.
Labels. The teacher then scored every resulting (resume, job) pair across the same five dimensions used at inference, with one sentence of reasoning per dimension.

Everything ships in four foreign-key-clean configs at build-small-hackathon/job-search-distill.

Training (Modal)

Two LoRA SFT runs on a single A100 via Modal, one per task:

Adapter. Rank 16, alpha 16, dropout off, attention plus MLP projections.
Schedule. One epoch per task. Mid-epoch checkpoints every 200 steps so a partial run could be sanity-checked before the full one finished.
Output. Safetensors at build-small-hackathon/job-searcher-qwen3-8B, and a Q4_K_M base plus LoRA-GGUF sidecars at build-small-hackathon/job-searcher-qwen3-8B-gguf for the llama.cpp serving path.

LoraConfig(
    r=16,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

The Space - Inference (llama.cpp)

The Space runs llama-cpp-python with the pre-built CUDA wheel on a HuggingFace ZeroGPU Space. Two design choices that matter:

Llama inside @spaces.GPU. ZeroGPU recycles the CUDA context per call, so a module-level instance would hold a dead context on the second use.
One GPU call per submission, not per job. All fit evaluations for one submission run inside a single @spaces.GPU call. The model loads once and yields events for every job, instead of paying a fresh cold start and a fresh proxy-token request per posting.

Streaming uses the OpenAI-shaped create_chat_completion(stream=True) so the reasoning lands in the UI token by token. The live demo is at build-small-hackathon/job-search-assistant.

The traces

The entire Claude Code session that built this Space is published as an HuggingFace agent-traces dataset at build-small-hackathon/job-search-assistant-agent-trace. Raw JSONL events, native HuggingFace trace viewer, every dead end and recovery on the record. Useful if you want to see how this thing actually came together rather than read the cleaned-up version of it.

Try it

Drop your resume at huggingface.co/spaces/build-small-hackathon/job-search-assistant. Stop sifting.

What I learned

Two adapters beat one. I tried folding query generation and fit evaluation into a single LoRA. The model leaked formatting both ways, JSON on the query task and prose on the eval. Splitting them into two heads on the same base, hot-swapped per call, killed the whole class of bugs.

The teacher's prompt mattered more than the student's size. Rewriting the teacher's labelling prompt to score against specific resume details ("four years of Rust; the role asks for five" instead of "strong technical match") propagated through distillation. The student picked up the same habit.

Models mentioned in this article 2

Datasets mentioned in this article 3

Spaces mentioned in this article 1

Community

ThetMyoe

6 days ago

it says paused

emrekuruu

Article author 6 days ago

Spaces gets put to sleep if they are not used for a while. Feel free to restart and give it a try :)

Please let me know if there are any issues or questions about the project !

· or to comment

Models mentioned in this article 2

Datasets mentioned in this article 3

Spaces mentioned in this article 1

Hugging Face：Blog（RSS）

精选65导出 Markdown

Job Searcher

2026-06-06 23:36·26天前

阅读原文· huggingface.co

精选理由

AI 摘要

原文 · 保持原样，未翻译

Job Searcher

Published June 6, 2026

Emre

emrekuruu

build-small-hackathon

Watch the short tour: drop a resume, watch the queries stream, read the per-job reasoning.

How it works

A run has three steps.

Queries. The student reads the resume and the preferences you set (job type, work modality, location, free-form notes) and drafts a small set of LinkedIn-shaped search queries, reasoning out loud as it goes.
Search. Those queries hit LinkedIn through JobSpy, one at a time.
Scoring. For each posting, the model reads the (resume, job) pair and writes a five-dimension fit score:
- skills match
- experience relevance
- education and certifications
- industry / domain fit
- seniority alignment

Figure 1. End-to-end steps of the framework.

What you get back isn't a list of fifty roles. It's a small shortlist with defensible reasoning. You can read why the model thinks the second-ranked job beats the third.

Technical Details

Dataset Curation - The teacher and the student

The student is Qwen3-8B. Small enough to fit on a single ZeroGPU slice once quantized to Q4_K_M, large enough to absorb the teacher's structured judgement.

The corpus came from a closed loop, resume-aware end-to-end:

Resumes. 2,500, built on Divyaamith/Kaggle-Resume.
Queries. The teacher first drafted LinkedIn-shaped search queries from each resume.
Jobs. JobSpy then scraped LinkedIn for what those queries actually returned. About 10,000 postings, every one of them surfaced by a query the teacher itself wrote for that specific resume.
Labels. The teacher then scored every resulting (resume, job) pair across the same five dimensions used at inference, with one sentence of reasoning per dimension.

Everything ships in four foreign-key-clean configs at build-small-hackathon/job-search-distill.

Training (Modal)

Two LoRA SFT runs on a single A100 via Modal, one per task:

Adapter. Rank 16, alpha 16, dropout off, attention plus MLP projections.
Schedule. One epoch per task. Mid-epoch checkpoints every 200 steps so a partial run could be sanity-checked before the full one finished.
Output. Safetensors at build-small-hackathon/job-searcher-qwen3-8B, and a Q4_K_M base plus LoRA-GGUF sidecars at build-small-hackathon/job-searcher-qwen3-8B-gguf for the llama.cpp serving path.

LoraConfig(
    r=16,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

The Space - Inference (llama.cpp)

The Space runs llama-cpp-python with the pre-built CUDA wheel on a HuggingFace ZeroGPU Space. Two design choices that matter:

Llama inside @spaces.GPU. ZeroGPU recycles the CUDA context per call, so a module-level instance would hold a dead context on the second use.
One GPU call per submission, not per job. All fit evaluations for one submission run inside a single @spaces.GPU call. The model loads once and yields events for every job, instead of paying a fresh cold start and a fresh proxy-token request per posting.

Streaming uses the OpenAI-shaped create_chat_completion(stream=True) so the reasoning lands in the UI token by token. The live demo is at build-small-hackathon/job-search-assistant.