Local AI and Vibecoding for FluentControl

Hey all,

I’ve been working on a small hobby project where I’m trying to use a local LLM to help write Tecan FluentControl protocols. I’d like to ask the community for input, advice, or experiences from anyone who has tried something similar.

The main point is honestly to get some practice with vibe coding and local AI, but if it ends up
being useful too, that would be a nice bonus.

The core limitation I’m working around is that local models don’t really have enough domain
knowledge, or enough coherent planning, to reliably pick sensible labware and use physical tools in a reasonable way.

A bit about the setup:

The model is Qwen3.6-27b-Q4_K_S, served through LM Studio. So far it has worked better than the other local models I tried.

The model gets access to a few tools to help it navigate a request:

  • a small SQL database of labware, based on the local FluentControl install
  • a Python-like DSL for writing the protocol
  • a simulator that checks whether the generated protocol has obvious problems, similar to things FluentControl context check would catch

The model produces a DSL .py file, which renders into an .xscr, which then imports into
FluentControl.

Very simple scripts work fine. A basic bead enrichment protocol also “works”, with a generous
definition of working and depending somewhat on moon phase. The simulator still needs some love.

More documentation is in the repo:
[repo link here]

If it feels very vibe coded, that is because it is very vibe coded.
Current best effort on a bead enrichment (still has issues):

I’m curious whether anyone here has tried anything similar, and what your experience was.

A few specific things I’d be interested in:

  • Did giving the model more detailed simulator feedback help, or did it mostly confuse things? I had an earlier, more lightweight iteration where the domain handling was worse, but the syntax was better.
  • Is it better to use structured error categories, or more natural-language repair hints?
  • Has anyone had better results with smaller, focused, or fine-tuned models instead of larger
    general models?
  • Any general tips for making an LLM handle protocol logic more reliably?
  • Any opinions on multi-agent setups with different roles? I tried this in an earlier iteration, but
    it was a bit of a mixed bag.

A lot of the issues would probably be different or non-existent if I used a frontier model, but the point is to a large degree to figure out what is possible within a local setup.

Truth be told, most of what this tries is one-shotting a complete protocol and I know that’s a gimmick. The useful route would probably be fleshing out the DSL and slot the AI in for autocomplete, but trying to one shot is more fun. :man_shrugging:

Let me know what you think and thanks for reading

1 Like

Cool project! Im confused about the DSL thing. My interpretation of what DSL means is like, a whole programming language. Did you invent a whole programming language for this? I think better understanding how your setup works would be helpful for giving advice.

I have had a lot of success with LLMs for basic workflows somewhat like what you’ve shown. I think you’re more likely to have success relying on context and examples rather than having an agent run simulations to do trial and error. Most biology protocols are very standardized and if you give enough examples your agent shouldn’t need to do trial and error. If it makes a mistake that usually means that something is fundamentally missing in the context. An agent is very unlikely to have good judgement at correcting its mistake with respect to a robot protocol in my opinion.

The SQL database seems like overkill too, since you’re relying on an agent doing tool use to make queries, and it could easily miss something important. I bet if you just put the relevant deck and labware information into the context window in JSON or markdown it will work better.

My last point is that it depends what you want to use the LLM for. I’ve found github copilot to be very good at giving autocomplete suggestions or translating inline comments to functional code. To me that’s as good as I could need. I’ve tried uploading a protocol PDF to Claude along with other code examples and context and it’ll do okay with oneshotting it (~80% correct) but you still have to manually inspect every single line anyway. That’s probably good for getting started with a template but I don’t think oneshotting protocols is the most productive use of LLMs.

Question for you, how are you deploying it? Does dropping the .scr file into the folder automatically pick it up?

With that said, I think you’d have a lot more success by remembering that these machines are general purposes machines, extended beyond their initial capabilities to maximize their market reach. They have capabilities, functions and labware that you will ever use and that perhaps are best avoided. Your lab is probably not a general purpose lab. You work at a company that does a thing and it’s centered around the execution of a series of protocols with a finite number of workflows. You probably have a subset of tools that you use, tips, liquid classes, plating strategies, etc… If you go through the exercise of defining the aforementioned, you can narrow the LLM’s scope with Skills (.MD files) that capture your best practices, your ontologies, and your lab.

BTW if you don’t have any of the aforementioned, defining them is still a good skillset!

DSL’s can be super powerful and this is a fun project! Thanks for sharing.

1 Like

Hey Stefan,
I wouldn’t say the DSL is a whole new language. I think it qualifies as a very basic embedded DSL though: structurally it is just Python, but the Python is used to build a protocol representation that is rendered into .xscr. Having a compact Python-shaped representation helps a lot with readability and saves a lot of tokens for the model compared with raw .xscr.

The issue that I had with examples is mostly how to get them and how to serve them. How did you get examples into context? Retrieval on keyword? A whole protocol or some sort of abstraction? I found that feeding whole protocols in xscr just overflows context, but I should probably try again since the .py abstraction is much smaller token wise.

The “simulator” does not really check protocol logic that would be learned from examples, I think. It really just takes the script the model outputs and walks through it to check that for this specific script it didn’t overdraw a tip or used the same position twice. It basically is supposed to be the build-in auto context check of Fuentcontrol.

How would you pass the information about labware? Specify exactly which to use for what?
In any case I would disagree the SQL is overkill, all components and deck segments are stored in the same folder as xcmp files with uuids for names. Even if I end up manually providing deck and labware, having the whole install in a place that I can search is really helpful.

For what kind of system does copilot autocomplete work? Like, I can see this working with something based on PyLabRobot or similar, but for FluentControl I just wouldn’t know how that would work. Maybe I’m missing some really obvious way of writing fluent scripts though, all I have found is the gui and the xscr :man_shrugging:t2:

Hey Luis

“Proper” deployment drops it into fcs database folder under a fresh GUID. It’s found by Fluentcontrol on startup. Since this would clutter the database folder with a lot of garbage, most of what I do for inspection in Fluentcontrol is to copy the body of the script into an existing one that is designated just for that. The checksum is wrong but it seems to work.

Your point about most labs having a narrow scope regarding labware and workflows is very true but since I don’t work at a company that does a thing, I wouldn’t really measure success in solving a defined problem, but learning where the limit is. I would actually argue that my scope is much narrower than the collection of workflows and tools we had in my last lab. Things like worklists or tip sorting are out of scope for now and liquid classes are really simplified at the moment, I.e. “make a liquid class variable for each reagent, put water free single as default value”

Defining Skills is probably very good advice, I’ll check how that’s implemented with langchain and try that. Any chance you would be able to share an example of a skill that worked for you?

Yes Python will be vastly easier for an LLM to reason about than xsrc files. Not only will it consume far fewer tokens but Python is probably one of the most represented languages in the training data and the semantics of variable and function names make it easy to intuit what is actually happening in a protocol.

Qwen27B 3.6 should support many thousands of lines of code in a context window. This could include whole protocols but more efficiently would be like annotated guides of how to do common steps. Think of how you would write a markdown tutorial for a human programmer to learn your system. An LLM will consume the tutorial in a similar way.

Likewise for your labware data. I don’t think SQL is totally wrong for this but it’s worth thinking about what the most efficient way is to explain to an LLM what labware is available and what properties the labware has. You can give all the information in context, or show the LLM how to retrieve the information, or something in between. What do you think the sql representation offers that the existing file system storage does not? SQL requires the overhead of showing the model what your table schemas are in context, and having the agent iteratively query until it finds what it needs. Is this the most efficient possible way for the model to figure our what labware to use? Maybe, but that is the question you want to ask to decide what approach to use.

My overarching suggestion is to find a set of notes that give you the best performance for your workflow. This can take some iterative refinement where you find what the model fails at and then include more notes on that. This is broadly the meta for basically all LLM productivity usecases.

As for the simulator I totally agree with that approach of checking for correctness, I guess I usually have found that agentic reasoning is not that good for fixing mistakes but if you’ve found otherwise then thats very good to hear.