#Docling — blogs.social

Hugging Face Forums [Unofficial] @discuss.huggingface.co.web.brid.gy

Jun 3

Is it possible to create a Résumé parser using a Huggingface model?

For now, this is the picture as of June 2026. It probably changes quite a bit depending on whether you specifically need to make it work with LayoutLMv3, or whether any model or tool is acceptable as long as the résumé parsing goal is met:

TL;DR

I would split this into two different tracks:

If you specifically need LayoutLMv3, the main problem is not just “which model?” or “which…


Hugging Face Forums [Unofficial] @discuss.huggingface.co.web.brid.gy

Jun 2

Creating a New App for Making BRD

For now, it looks like many relevant building blocks already exist, but there still seems to be plenty of room:

I think this is a strong idea, especially because you are focusing on existing-process transformation rather than generic document generation.

A BRD is usually not hard because of the writing itself. It is hard because the source material is messy, incomplete, inconsistent,…


t01 KI-Journal @hello.t01.li.ap.brid.gy

May 22

GDPR-compliant AI for mid-sized businesses: 40 percent automatable locally, or a harebrained idea?

Where else but in the shower did a number appear before my mind's eye: 40 percent. That, I thought, was the share of business processes in a typical small or medium-sized company that can sensibly be handled with targeted workflows and AI automation, entirely locally, without a single byte of personal data touching a US hyperscaler. The…


Hugging Face Forums [Unofficial] @discuss.huggingface.co.web.brid.gy

May 8

How can I create my own synthetic dataset based on my PDF?

Since PDFs come in a wide range of formats—from relatively well-structured documents to those that are almost entirely images—I believe OCR is the biggest challenge when creating a dataset based on PDFs. In any case, we can’t expect the same level of precision as with LaTeX. Furthermore, if we have an LLM generate the “Answer” part of Q&A pairs, the resulting data may be of questionable quality…
