In early February 2023, I built a Slack bot to make OpenAI LLMs readily accessible within our company for experimentation. It wasn’t planned, budgeted, or part of any official initiative—just something I hacked together out of excitement for large language models. I called it HAL, after the computer in 2001: A Space Odyssey. What started as a fun experiment quickly became a core tool that many of us now use every single day. HAL originally ran on OpenAI’s GPT-3.5-turbo model, and since then we’ve been quick to adopt each major new model as it has become available. After the initial push to get core functionality working and deployed to all employees, HAL has been extended organically, mostly in my spare time as a hobby—guided by user feedback, the release of new models and features, and whatever seemed fun or useful to implement.
Why Slack?
Slack was a natural choice. It was already the company’s preferred communication platform, and by embedding HAL there:
- Everyone could use it instantly on desktop, web, and mobile.
- User identification, auditing, and access controls worked out of the box.
- It felt familiar from day one, no training required.
- It let us move fast, deliver something that worked almost immediately, and keep ongoing support to a minimum.
- Running the HAL code on-prem also makes it easier to integrate with internal systems and gives us full flexibility to customise it exactly the way we want.
Fun fact: our CEO, David, was one of the first to try HAL—and of course, one of his very first requests was to ask it to open the pod bay doors. True to its 2001: A Space Odyssey roots, HAL politely refused.
Ways to Interact with HAL
The main way people interact with HAL is through direct messages. Beyond that, HAL makes use of several other Slack features and interaction modes:
- Threaded conversations – Each thread acts like its own private HAL session.
- Channels – HAL can join public or private channels. It only responds when mentioned, unless the channel is dedicated to HAL, in which case it behaves like a DM (optionally with multiple participants).
- File uploads – Users drag and drop code snippets, PDFs, or API specs into HAL chats and then ask questions directly about the content.
- Formatted replies – HAL returns its answers as Slack messages using markdown and emojis, supporting inline URLs, code blocks for short snippets, and other pre-formatted output.
- File outputs – For longer replies with code or structured text, HAL uploads them to the channel as files. Slack automatically applies syntax highlighting for supported languages, making the output easy to read, share, and reuse.
- Channel-based personalisation – Different channels can run different models or settings. For example, I use lightweight models for quick questions and heavier models like GPT-5 (reasoning high) or o3-pro in coding channels.
- HAL CLI – There’s also an experimental command-line version of HAL, allowing interaction directly from a terminal session for those who prefer working outside of Slack.
One underrated feature is HAL’s ability to summarise recent discussions in a channel—perfect for catching up after being away.
Under the Hood: LLMs
While HAL started with OpenAI models, it was designed to be model-agnostic. Over time, we’ve run it against on-prem LLaMA and DeepSeek models, and others such as Claude are coming soon. HAL itself is written in Python and built on the Slack Bolt API, while the HAL CLI is written in Go for easy distribution as a single binary.
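For a flavour of what the Slack side involves, here is a minimal, self-contained sketch of a Bolt-style DM listener. The `generate_reply` helper is a hypothetical stand-in for the call into the LLM backend; HAL’s real handlers are considerably more involved.

```python
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])


def generate_reply(text: str) -> str:
    """Hypothetical placeholder for the call into the LLM backend."""
    return f"(model reply to: {text})"


@app.event("message")
def handle_dm(event, say):
    # Only respond to plain direct messages (ignore bot echoes and edits).
    if event.get("channel_type") == "im" and "subtype" not in event:
        reply = generate_reply(event["text"])
        # Replying with thread_ts keeps each thread as its own session.
        say(text=reply, thread_ts=event.get("thread_ts", event["ts"]))


if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```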
Currently:
- Default model – GPT-5 (reasoning: low), so that responses are quick by default. Since the release of GPT-5, we are still experimenting with which model works best as the default.
- Model routing – HAL can automatically consult with a stronger model for certain queries (see details on Spock below).
- Switching logic – HAL dynamically routes requests between OpenAI direct and Azure endpoints. On Azure we provision each model in multiple regions, and HAL automatically picks the fastest endpoint available for a given model.
- Per-channel models – Users can select different models per channel and set various parameters (such as verbosity or reasoning effort) to match their use case.
- Chat history management – All conversation histories are fully managed by HAL itself. It preserves encrypted tokens, injects full usernames and timestamps into messages, and handles continuity. This approach has proven the most flexible, avoids lock-in to any specific solution, and makes it easier to integrate with other LLM providers. When HAL was first built, LangChain wasn’t ready yet—later, there was never a real need to add it.
- JSON mode – HAL currently requests models to reply in JSON following a provided schema (OpenAI's JSON mode), which made implementing plugins far simpler. It also has logic to retry a query a few times if the model responds with invalid JSON or something that doesn’t match the schema—usually prompting the model to fix it itself. More recent models rarely respond with invalid JSON. A rough sketch of this retry loop follows below. At some point, I plan to switch to Structured Outputs.
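As a rough sketch of that retry logic (not HAL’s actual code): the schema, model name, and error handling below are illustrative, using OpenAI’s JSON mode plus the `jsonschema` library for validation.

```python
import json

from jsonschema import ValidationError, validate
from openai import OpenAI

client = OpenAI()

# Illustrative schema: a reply plus an optional plugin invocation.
SCHEMA = {
    "type": "object",
    "properties": {
        "reply": {"type": "string"},
        "plugin": {"type": ["string", "null"]},
    },
    "required": ["reply"],
}


def ask_json(messages, model="gpt-4o", max_attempts=3):
    """Ask the model for JSON matching SCHEMA, retrying if it fails.

    Assumes the prompt in `messages` already instructs the model to
    answer in JSON matching SCHEMA (required for JSON mode).
    """
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            response_format={"type": "json_object"},  # OpenAI JSON mode
        )
        content = response.choices[0].message.content
        try:
            data = json.loads(content)
            validate(data, SCHEMA)
            return data
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the error back so the model can fix its own output.
            messages = messages + [
                {"role": "assistant", "content": content},
                {"role": "user", "content": f"That was not valid: {err}. "
                                            "Reply again with JSON matching the schema."},
            ]
    raise RuntimeError("Model did not return valid JSON after retries")
```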
Plugins
HAL has a concept of plugins that add additional capabilities beyond standard chat. Some are enabled by default, while others are opt-in depending on user needs. Here are some of the main ones:
- Spock – Our “power mode.” HAL calls a stronger model (GPT-5 reasoning high or o3-pro) with code execution + web search for tough science, math, and programming tasks.
- Web Search – GPT WebSearch or Google Search for real-time info and for anything not already covered by the model’s training data.
- Download URL – Pulls web page contents (including some internal sites) into the conversation context.
- Confluence Search – Allows HAL to retrieve and use selected internal Confluence pages for context.
- Image generation/editing – From DALL·E to GPT-image-1.
- Email – HAL can send a response directly to a user’s inbox.
- TTS (Text-to-Speech) – Generates an audio file of a response using OpenAI’s TTS models and uploads it to Slack.
- LaTeX rendering – Converts math into images (useful for equations).
- VDI ADM – Allows users to perform actions like reboot, reset, and similar operations on their VDI workstations.
Other plugins, such as the Wolfram Alpha and PagerDuty integrations, have been more experimental, but they’ve demonstrated just how flexible the architecture is.
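To give a flavour of how a plugin hangs together, here is a heavily simplified sketch of the kind of registry such a design implies. The names (`Plugin`, `REGISTRY`, the LaTeX example) are illustrative rather than HAL’s actual internals; the idea is simply that each plugin advertises a description and a JSON Schema, and the model asks for one by name in its JSON reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Plugin:
    name: str
    description: str           # Shown to the model so it knows when to call it
    parameters: dict            # JSON Schema describing the arguments
    handler: Callable[..., str]


REGISTRY: Dict[str, Plugin] = {}


def register(plugin: Plugin) -> None:
    REGISTRY[plugin.name] = plugin


# Illustrative plugin: render a LaTeX expression to an image URL.
register(Plugin(
    name="latex_render",
    description="Render a LaTeX expression to an image and return its URL.",
    parameters={
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
    handler=lambda expression: f"https://internal.example/render?tex={expression}",
))


def dispatch(name: str, arguments: dict) -> str:
    """Called when the model's JSON reply asks for a plugin by name."""
    plugin = REGISTRY[name]
    return plugin.handler(**arguments)
```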
Three Notable Extensions
1. Spock
Spock is HAL’s “specialist mode.” It uses the heaviest model available, with web + code execution, to tackle deep technical queries. HAL auto-routes to Spock when users ask about physics, biology, or advanced coding—but users can also explicitly request it.
Yes, the answers may take minutes, but they’re often worth the wait.
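The escalation itself is conceptually simple: the same question is handed to a heavier model with high reasoning effort. The real Spock also enables web search and code execution, which this sketch omits, and the model name and helper below are purely illustrative.

```python
from openai import OpenAI

client = OpenAI()


def ask_spock(question: str) -> str:
    """Escalate a tough question to a heavier reasoning model (sketch only)."""
    response = client.chat.completions.create(
        model="gpt-5",            # illustrative; the real setup also uses o3-pro
        reasoning_effort="high",  # trade latency for answer quality
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```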
2. VDI ADM
HAL can also help employees manage their virtual desktops, letting them reboot, reset, or perform similar operations on their VDI workstations directly from Slack.
This feature has greatly reduced the number of callouts to our Helpdesk for requests to restart hung VDI workstations (researchers always find a way to hang theirs while experimenting). It’s also very easy for users to remember—no need to learn the name or syntax of yet another tool; you just ask HAL in whatever language you like, even from your mobile phone while away from your desk, and it handles it.
A humorous anecdote: early on, someone asked HAL if it could restart a specific application on their workstation. HAL responded that it couldn’t—but since it could restart the entire workstation (which would restart the application too), it went ahead and did just that. The potential for this behaviour was known, and it hasn’t actually happened for quite some time: since that incident, the plugin requires the user to confirm the requested action before it is executed—HAL can no longer run an action without explicit confirmation from the user.
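One way to implement such a confirmation step is with Slack’s interactive buttons via Bolt, sketched below. HAL’s actual flow may differ (it could, for instance, simply ask the user to confirm in chat), and the action IDs and the `reboot_workstation` helper are hypothetical.

```python
import os

from slack_bolt import App

app = App(token=os.environ["SLACK_BOT_TOKEN"])


def reboot_workstation(workstation: str) -> None:
    """Hypothetical call into the VDI management backend."""
    ...


def ask_for_confirmation(say, workstation: str) -> None:
    # Post the pending action with explicit Confirm / Cancel buttons.
    say(
        text=f"Reboot {workstation}?",
        blocks=[
            {"type": "section",
             "text": {"type": "mrkdwn",
                      "text": f"Reboot *{workstation}*? This will close all open sessions."}},
            {"type": "actions", "elements": [
                {"type": "button", "action_id": "vdi_confirm", "style": "danger",
                 "value": workstation,
                 "text": {"type": "plain_text", "text": "Confirm"}},
                {"type": "button", "action_id": "vdi_cancel",
                 "text": {"type": "plain_text", "text": "Cancel"}},
            ]},
        ],
    )


@app.action("vdi_confirm")
def on_confirm(ack, action, respond):
    ack()  # Acknowledge the button click within Slack's time limit
    reboot_workstation(action["value"])
    respond(text=f"Rebooting {action['value']} now.", replace_original=True)


@app.action("vdi_cancel")
def on_cancel(ack, respond):
    ack()
    respond(text="Okay, nothing was changed.", replace_original=True)
```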
3. Confluence Search
HAL can also search and process selected internal Confluence pages. This works in two ways:
- Force-feeding technique – A given documentation dump is exposed to HAL as a plugin/function it can call. When triggered, the entire dataset (with some metadata and URLs) is preloaded into a model with a large context window, along with instructions and the user query. HAL is instructed when to use it and can retrieve or synthesise information based on the plugin's reply without polluting the main chat history.
- Download URL approach – For certain topics, HAL is instructed to fetch relevant Confluence pages directly at query time, pulling them into the conversation context for question answering, analysis, or summarisation.
Together these techniques make Confluence search surprisingly effective, even without embedding databases (which we'll most likely add in the future).
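A condensed sketch of the force-feeding idea, assuming the page dump already sits on disk; the directory layout, metadata format, and model name are all illustrative.

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()


def ask_docs(question: str, dump_dir: str = "confluence_dump") -> str:
    """Answer a question by preloading an entire documentation dump
    into a large-context model, separate from the main chat history."""
    pages = []
    for path in sorted(Path(dump_dir).glob("*.md")):
        # Page text plus whatever metadata/URLs were exported with it.
        pages.append(f"--- {path.name} ---\n{path.read_text()}")
    corpus = "\n\n".join(pages)

    response = client.chat.completions.create(
        model="gpt-4.1",  # illustrative large-context model
        messages=[
            {"role": "system",
             "content": "Answer using only the documentation below. "
                        "Cite page URLs where relevant.\n\n" + corpus},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```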
Cost vs. Value
Despite usage growing, our actual cost of using HAL (API calls) has been going down. Model prices are dropping faster than usage is rising, and features like input token caching help further. Given the productivity boost HAL brings, the costs are negligible compared to the value delivered.
Looking Ahead
The LLM landscape is moving at lightning speed. On HAL’s roadmap:
- Support for MCP servers.
- Wider integration with non-OpenAI models.
- Deeper coding integrations.
- User customisations – the ability for users to provide their own additional instructions for HAL to follow, making interactions more tailored to their style or needs.
Lessons Learned
While having an LLM assistant directly in Slack is extremely convenient, it also comes with significant limitations. Some of the more notable ones include:
- Message size limits – Longer replies must be split into multiple messages, and Slack formatting often breaks when messages are split. This can be partially worked around in code (see the sketch after this list), but it adds complexity.
- No “thinking” indicator – Slack’s old “user is typing…” signal isn’t supported by newer APIs. Injecting emojis or similar tricks isn’t the same.
- No streaming support – Users have to wait for an entire reply to be generated before they see anything, unlike in a dedicated client.
- Limited Slack markdown – Not rich enough to render more complex messages.
- File size constraints – File snippets have limits; larger files redirect to a browser.
- Basic interactivity – Support for interactive buttons and messages is limited, with little control over how they behave on mobile or iPad.
- File upload limits – Maximum of 10 files per upload.
- Missing context signals – Slack doesn’t provide current estimated user location, which could sometimes be useful.
- Many smaller quirks – Dozens of other small issues that add friction.
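As an example of the kind of workaround the first point requires, here is a simplified sketch of splitting a long reply into Slack-sized chunks while keeping code fences intact. The size limit and helper name are illustrative, and real splitting logic has to handle more edge cases than this.

```python
SLACK_LIMIT = 3900  # illustrative; Slack truncates long messages


def split_for_slack(text: str, limit: int = SLACK_LIMIT) -> list[str]:
    """Split a long reply into Slack-sized chunks, closing and reopening
    code fences so formatting survives the split (simplified sketch)."""
    chunks, current, in_code = [], "", False
    for line in text.splitlines(keepends=True):
        if len(current) + len(line) > limit:
            # Close an open code fence before cutting the message.
            chunks.append(current + ("```" if in_code else ""))
            current = "```\n" if in_code else ""
        if line.strip().startswith("```"):
            in_code = not in_code
        current += line
    if current:
        chunks.append(current)
    return chunks
```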
Providing an LLM bot via Slack is always going to be constrained compared to a dedicated client. However, if your company already uses Slack, having the LLM available there is still very useful—even if for certain tasks it makes more sense to switch to another tool. In our case, many of us use both HAL in Slack and GitHub Copilot inside VS Code, depending on the situation. Sometimes one is more convenient, sometimes the other.
*Image: Our headquarters in London... :)*