Hugging Face Forums [Unofficial]
@discuss.huggingface.co.web.brid.gy
7,956 documents
Since Feb 2026
Error Generating Deep RL Course Certificate

Hi.

I am not a maintainer, but I investigated this in a duplicated version of the Certification Space.

Using your username, the app was able to find your models, parse the model-card metadata, and compute a pass percentage of about 81.8%, which should qualify for the completion certificate. So this does not look like a simple “below 80%” or “models not found” case.

In my duplicate, the…

HoLo-ToLk: tokenizer-free speech (STT + TTS) on the 0-parameter HSL byte substrate

I think that’s probably the safer direction. If I were organizing that route, I’d frame it like this:


Short read

This route becomes easier to evaluate if each branch gets its own narrow label.

For the speech branch:

tokenizer-free speech on the fixed HSL substrate

For the BPE text-generation branch:

embedding-table-free text generation over BPE-segmented HSL…


AgentSeal: A Corpus-Availability Audit of SWE-bench Pro

SWE-bench Pro was introduced as a harder and more contamination-resistant successor to earlier software-engineering benchmarks. We built AgentSeal v5, a deterministic audit tool, to test one narrow question:

Are SWE-bench Pro public instances, gold patches, or test signals already visible in public code sources or corpus-like…

Isn't there a simpler way to run LLMs / models locally?

There are a few fully multimodal / omni-style large models, but if the more general goal is “I want my OSS local chat/RAG setup to call T2I/T2V”, I would usually make the image/video models separate local server processes and connect them as a pipeline. The execution cost, debugging cost, and replacement cost are usually lower that way. Existing frameworks already cover a lot of this:


I analyzed hidden-state dynamics across 7 open-weight LLMs and found recurring functional patterns. Looking for feedback

Thanks, I really appreciate the thoughtful feedback.

I completely agree that probing and causality are two different questions. At this stage, my work is about measuring the organization of hidden-state trajectories, not claiming that the decoded properties are themselves causal. Causal interventions (activation patching, steering or targeted perturbations) are definitely where I’d like to…

I analyzed hidden-state dynamics across 7 open-weight LLMs and found recurring functional patterns. Looking for feedback

Thank you for taking the time to write such a detailed review. This is exactly the kind of feedback I was hoping for.

I agree with your central point: decodability, proxy labeling, and causal function should remain clearly separated. At the moment I only claim that certain properties are linearly decodable from hidden-state trajectories under a specific extraction pipeline. Whether the model…

We all start somewhere

Well, if I can assume your technical stack, the explanation can be fairly dense:


Direct answer

I would not start by looking for “the best model.” I would first split the problem into layers:

  • local runtime
  • model format and quantization
  • chat template
  • a small eval set
  • RAG / retrieval
  • fine-tuning / adapters
  • model choice
  • tool use / agents
  • offline and…
[Research] From Functional Geometry to Dynamic Grammar: New LIMEN Audits (V23–V24) Across 7 Architectures

Hi everyone,
I am sharing recent results from my independent research project, LIMEN (Liminal Internal Metric for Emergent Navigation), which aims to characterize the internal dynamics of Transformers through hidden state analysis.
Following our previous findings that functional information is encoded in the relative geometry of representations rather than individual neurons (V22), this new phase…

🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

@KnackAU I just realized we’re on the exact same wavelength! I’ve been meaning to circle back to something you mentioned in your earlier post — that idea about taking Gemma MTP heads and finetuning them for specific tasks like Python generation. That really stuck with me because I’ve actually been down a very similar rabbit hole, just from a slightly different angle.

A while back I took Google’s…

DNA, LLM and Wick-Ledger Correspondance (2nd Rosetta Stone)

Yes, I think the most useful missing layer is probably not the broad DNA/LLM analogy itself, but the smaller HeTu–LuoShu control scaffold underneath it.

The way I am currently using HeTu–LuoShu is not as mysticism. I am treating it as a compact invariant grammar:

LuoShu = an upon-collapse (question asked but token not yet fixed) trace board with fixed slot capacities and line invariants.

HeTu…

Hey everyone! 👋 I just published an open-source, bilingual (EN/ES) guide on the inner workings of Transformers

Hey everyone! I just published an open-source, bilingual (EN/ES) guide on the inner workings of Transformers.

If you are interested in the exact math and mechanics behind attention collapse, KV-cache compression, or just want a solid visual step-by-step from scratch, you might find this very useful. It includes reproducible code and connects with my TAF Agent project for practical…

🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

We are very much aligned in our thinking. I too used FunctionGemma on mobile, and I also fine-tuned it for grammar correction and as a small model router. It is extremely useful, and I love playing around with small, focused models. I used Termux and Shizuku to get around most obstacles. I agree with you that the Aiden project is a great idea, but...

I wish I could say that the Aiden project was mine. But…

Cannot restart my private space: "503. Something went wrong when restarting this Space."

Um, this is a forum for users to communicate with each other. In rare cases where the issue is caused by a widespread infrastructure problem, HF staff might check it out or take action, but that’s the exception rather than the rule.

Well, if it’s an infrastructure issue, it might resolve itself if you just wait.
However, if it’s an individual account-level issue, it generally won’t…

🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

Weekend Research Update — SFT Experiments, BPB Metric & New Pretraining BEST

Project Inkblot | Mateusz Piesiak | 26–28 June 2026

A quick personal note before diving in. I genuinely wish I had more time to participate actively in the discussion here. Life has been quite full lately: day job as a Manufacturing Controller, family, kids, the household, and everything else that comes with…

Cannot restart my private space: "503. Something went wrong when restarting this Space."

I think there is enough information in my post. The “503. Something went wrong when restarting this Space.” is a known HF-issued error that is user-unfixable. This is why I included the “Request ID : Root=1-6a42562f-1dfaffa84dbd985f5b11284a”.

Restart / Factory rebuild fails, but no new Build or Container logs appear at all.
In that case, I would stop treating it as only an…

Shannon Prime Lattice

The Latent Interceptor framework:

Draft body = the shared latent processor. The finetuned 4-layer draft, vocab head ripped off. It runs once per intercept, producing a 1024-d latent. Because there’s no 262k projection, it’s ~ms and CPU/Hexagon-pinnable (your <2 ms point holds — the body is tiny; the vocab matrix was the whole cost).
A registry of specialized heads tapping that latent,…

DNA, LLM and Wick-Ledger Correspondance (2nd Rosetta Stone)

I just had the time to give this a quick glance, but I like where your head seems to be at. I am working on a very similar framework. It’s an extremely complicated and hard project to take on. Kudos to you for attempting it and for how far along you are.

When I get the time, I will see if there is anything I can contribute after your latest post. In the meantime, if you have any…


Project UCTF: An Open Research Program on Machine-Native AI Training Representations

This post is a follow-up to my earlier concept proposal on the Universal Compressed Training Format (UCTF).

The discussion that followed—especially the detailed technical feedback from @John6666—made me realize that the original concept combined several different research questions into a single proposal.…

[Concept] The Generational Context Architecture (GCA)

I am working on something very similar, so I don’t want to contaminate what you are doing too much. But my immediate thoughts jump to using an “offline” fine-tuned version of the same model, or a completely newly trained model, that acts as a curator over the files. Store every output, store every input, store every injection, store every file access, store every datapoint you can as the system is in…

Cannot restart my private space: "503. Something went wrong when restarting this Space."

Hmm… with only a 503 error, it is hard to narrow this down. If you can see a concrete error in the build or container logs, there may be cases where this can be fixed from the user/repo side. But if no useful logs appear, issues like this often need Hugging Face support to check the Space from their side, e.g. via website@huggingface.co:


I would first separate this into two different…

Error Generating Deep RL Course Certificate

Hello,

I have tried generating a certificate of completion for the aforementioned course. I entered my username and my first and last name, and I see an error. I attach a screenshot of the result I get. I couldn’t find any guidance online. How can I get it to work?

Thanks in advance

Isn't there a simpler way to run LLMs / models locally?

This is a topic I am interested in myself. I often travel for work to countries where the internet is far from secure or reliable, so I have had to lean on offline installs. Ideally, I would have a Claude or Sonnet model running offline from a USB drive, even better if it had text-to-image/text-to-video capabilities, but I am probably pushing the limits. Along with this, I would like…

🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

Yes - it is more of a human thing, but it is most obvious in the USA, though also noticeable in the UK, and in both countries it started getting worse at the same time (Thatcher / Reagan).

But also yes - we cannot fix society or really even influence it - but we might perhaps be able to make a difference in a much smaller area of AI.

[Concept] UCTF — Universal Compressed Training Format: A Mediator Layer for Multilingual AI Training

Thank you. My intention is not to claim that existing multilingual embeddings already solve this problem. My goal is to investigate whether a machine-native intermediate representation for training can reduce cross-lingual redundancy while preserving the information required for downstream generation. I agree that a bottleneck probe and degradation analysis are the right first experiments.

🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

@ProtopiaUk I am pretty sure we are both on the same page. I did not take any offense, by the way. But as I say, the only solution to the problem that I have come up with is to engage and interact with others who are interested in, or doing, the same things I am. I don’t try to push my ideas; rather, I try to engage in a way that perhaps gets them past a bump in the road or…

🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

@KnackAU Ray - I probably should have gone and looked at your contributions here to see what you had done, but I failed to do this earlier, for which I apologise. However, I honestly wasn’t pointing at you when I was commenting generally about a lack of teamworking.

I guess both you and @Mati83moni Mateusz essentially are doing fundamental experimental research - you have synergistic ideas on how…
