Compare commits

394 Commits

Author SHA1 Message Date
Evan
879d178057 add kimi tool parsing 2026-01-21 16:53:38 +00:00
Evan
daa31b4472 implement mlx-lm tool calling 2026-01-21 16:53:38 +00:00
ciaranbor
6a9251b920 Add mflux type stubs (#1234)
## Motivation

Simplify image generation review
2026-01-21 15:07:42 +00:00
rltakashige
758464703d Fix GPT OSS tensor sharding with upstream MLX LM (#1223)
## Motivation
MLX LM has given GPT OSS a shard method, but MLX does not have an update
to match.

2026-01-20 18:24:54 +00:00
rltakashige
9e2179c848 Register original layer in CustomMlxLayer (#1229)
## Motivation
Kimi K2 Thinking Pipeline RDMA was broken before.

## Why It Works
No clue tbh

## Test Plan

### Manual Testing
Kimi K2 Thinking and GPT OSS work at the same time on Pipeline RDMA.
Needs exo bench to check more thoroughly

### Automated Testing
Layer composition tests still pass.
2026-01-20 18:20:01 +00:00
Evan Quiney
22b5d836ef swap all instances of model_id: str for model_id: ModelId (#1221)
This change uses the stronger typed ModelId, and introduces some
convenience methods. It also cleans up some code left over from #1204.

## Changes

`model_id: str -> model_id: ModelId`
`repo_id: str -> model_id: ModelId`

Introduces methods on ModelId, in particular ModelId.normalize() to
replace `/` with `--`.
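As an illustration, a normalize() like the one described might look as follows (the class body here is a hypothetical sketch, not exo's actual ModelId):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelId:
    """Hypothetical sketch of a strongly typed model identifier."""

    value: str

    def normalize(self) -> str:
        # HuggingFace repo ids contain "/", which is unsafe in
        # filesystem paths; replace it with "--".
        return self.value.replace("/", "--")
```

For example, `ModelId("mlx-community/Llama-3.2-3B-Instruct-4bit").normalize()` would give `"mlx-community--Llama-3.2-3B-Instruct-4bit"`.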

This PR introduced some circular imports, so some code has been moved
around to limit them.

## Test Plan

Tests still pass, types still check. As this is about metadata, I
haven't tested inference.
2026-01-20 17:38:06 +00:00
Alex Cheema
ea9c6d6bdf Remove dead local paths code from download_shard (#1227)
## Motivation

The `download_progress_for_local_path` function and the "Handle local
paths" code block in `download_shard` are dead code that cannot be
reached in normal usage. The code checks if `model_id` (e.g.,
"mlx-community/Llama-3.2-3B-Instruct-4bit") exists as a filesystem path,
but model IDs are constrained to HuggingFace repo format and there's no
API pathway to pass local paths.

## Changes

- Removed `download_progress_for_local_path()` function (45 lines)
- Removed the "Handle local paths" block in `download_shard()` (7 lines)

## Why It Works

This code was added in PR #669 as part of a "feature-local-models"
branch, but the feature was never fully integrated. The check
`aios.path.exists(str(shard.model_card.model_id))` would only return
true if a directory literally named
"mlx-community/Llama-3.2-3B-Instruct-4bit" existed in the cwd, which
doesn't happen in practice. Offline caching is already handled by
`fetch_file_list_with_cache`.

## Test Plan

### Manual Testing
- Run exo normally and verify downloads still work

### Automated Testing
- Existing tests pass (this code had no test coverage)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 17:07:27 +00:00
Alex Cheema
4ea66d427b Reduce download log spam (#1225)
## Motivation

When `skip_download=True`, exo was logging a lot of unnecessary messages during periodic download status checks. This resulted in spammy logs that made it hard to see important messages.

## Changes

- Only log "Downloading ... with allow_patterns=..." when actually downloading (not when skip_download is true)
- Changed periodic download progress check logs from INFO to DEBUG level

## Why It Works

The `skip_download=True` parameter is used when checking download status without actually downloading. By guarding the log behind `if not skip_download:`, we avoid logging on every status check. Changing the periodic emitting logs to DEBUG level reduces noise while still keeping them available for debugging.

## Test Plan

### Manual Testing
- Run exo and observe that logs are less spammy during normal operation
- Use -v or -vv flags to see DEBUG logs when needed

### Automated Testing
- Existing tests cover this code path
2026-01-20 16:57:05 +00:00
rltakashige
8b709e68b2 Mark slow tests as slow (#1220)
2026-01-20 15:03:46 +00:00
Evan Quiney
4da6eeb11f fix a test broken by #1204 (#1219)
bad merge broke a test - fix it
2026-01-20 14:56:20 +00:00
Evan
3d2eee4884 quiet localhost log
this log is just noise - remove it
2026-01-20 14:51:26 +00:00
Evan
116558839e don't clear mdns discovered connections
the pinger currently removes mDNS-discovered connections - these systems
should be independent
2026-01-20 14:46:20 +00:00
Evan Quiney
d4f551c602 Simplify model cards (#1204)
## Motivation

We have a lot of unneeded data in the model card - let's just keep the
necessary stuff and add back more data when we need it.

## Test Plan

EXO still runs! (pipeline on 2)

Co-authored-by: rltakashige <rl.takashige@gmail.com>
2026-01-20 11:01:19 +00:00
Alex Cheema
176ab5ba40 Add GLM-4.7-Flash model cards (4bit, 5bit, 6bit, 8bit) (#1214)
## Motivation

Add support for GLM-4.7-Flash, a lighter variant of GLM-4.7 with the
`glm4_moe_lite` architecture. These models are smaller and faster while
maintaining good performance.

## Changes

1. **Added 4 new model cards** for GLM-4.7-Flash variants:
   - `glm-4.7-flash-4bit` (~18 GB)
   - `glm-4.7-flash-5bit` (~21 GB)
   - `glm-4.7-flash-6bit` (~25 GB)
   - `glm-4.7-flash-8bit` (~32 GB)

   All variants have:
   - `n_layers`: 47 (vs 91 in GLM-4.7)
   - `hidden_size`: 2048 (vs 5120 in GLM-4.7)
   - `supports_tensor`: True (native `shard()` method)

2. **Bumped mlx from 0.30.1 to 0.30.3** - required by mlx-lm 0.30.4

3. **Updated mlx-lm from 0.30.2 to 0.30.4** - adds `glm4_moe_lite`
architecture support

4. **Added type ignores** in `auto_parallel.py` for stricter type
annotations in new mlx-lm

5. **Fixed EOS token IDs** for GLM-4.7-Flash - uses different tokenizer
with IDs `[154820, 154827, 154829]` vs other GLM models' `[151336,
151329, 151338]`

6. **Renamed `MLX_IBV_DEVICES` to `MLX_JACCL_DEVICES`** - env var name
changed in new mlx

## Why It Works

The model cards follow the same pattern as existing GLM-4.7 models.
Tensor parallel support is enabled because GLM-4.7-Flash implements the
native `shard()` method in mlx-lm 0.30.4, which is automatically
detected in `auto_parallel.py`.

GLM-4.7-Flash uses a new tokenizer with different special token IDs.
Without the correct EOS tokens, generation wouldn't stop properly.

## Test Plan

### Manual Testing
Tested generation with GLM-4.7-Flash-4bit - now correctly stops at EOS
tokens.

### Automated Testing
- `basedpyright`: 0 errors
- `ruff check`: All checks passed
- `pytest`: 162/162 tests pass (excluding pre-existing
`test_distributed_fix.py` timeout failures)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 03:58:09 +00:00
rltakashige
f5e6aa82d2 Load layers individually (#1211)
## Motivation

Certain models hang at model loading in tensor parallel. 

Hopefully closes #1205 

## Changes

- Load layer by layer for tensor parallel sharding
- Move eval_with_timeout to auto_parallel.py to resolve circular import.

## Why It Works

The naive fix is to load the model with lazy=False and then shard for
tensor parallel. However, that requires the entire model to be loaded
into memory at once.

Instead, we can load layer by layer and shard after loading. This adds a
small memory overhead, but it is negligible.

I tried loading layer by layer after the sharding, and this allowed
model loading but got stuck at warming up.

## Test Plan

### Manual Testing
GPT OSS loads with TP and FAST SYNCH. Kimi does too.

### Automated Testing
We need to run a suite of exo_bench before merging this!
2026-01-20 03:26:51 +00:00
Alex Cheema
39f0ed6018 Prepend <think> tag to stream for thinking models like GLM-4.7 (#1186)
## Motivation

For thinking models like GLM-4.7, the `<think>` tag is inserted by the
tokenizer's `apply_chat_template()` into the **prompt** (input). The
model generates tokens starting *after* this tag, so `<think>` never
appears in the streamed output. The frontend expects
`<think>...</think>` tags to extract and display thinking content.

**Log evidence:**
```
[gMASK]<sop><|system|>...<|user|>...<|assistant|><think>
```
The prompt ends with `<think>`, so the model generates content after it,
never returning the opening tag.

## Changes

- Added `detect_thinking_prompt_suffix()` helper function in
`utils_mlx.py` to detect if a prompt ends with `<think>` tag
- Added `parse_thinking_models()` generator wrapper in `runner.py` that
prepends the thinking tag to the output stream
- Modified the main generation loop to use the thinking wrapper for
non-GptOssModel models when a thinking prefix is detected
- Updated test mocks to handle the new `apply_chat_template` call

## Why It Works

The solution follows the same pattern as `parse_gpt_oss()` - a generator
wrapper that transforms the output stream. When the chat template ends
with `<think>`, we prepend this tag to the first generated token so the
frontend receives the complete `<think>...</think>` structure it
expects.
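The wrapper pattern can be sketched roughly like this (names and signature are illustrative, not the exact helpers from runner.py):

```python
from typing import Iterator


def prepend_think_tag(tokens: Iterator[str], prompt: str) -> Iterator[str]:
    # If the chat template already closed the prompt with <think>,
    # the model starts generating mid-thought; re-emit the tag so the
    # stream carries a complete <think>...</think> block.
    if prompt.endswith("<think>"):
        first = next(tokens, None)
        if first is not None:
            yield "<think>" + first
    yield from tokens
```

When the prompt does not end with `<think>`, the wrapper passes tokens through unchanged.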

## Test Plan

### Manual Testing
- Run exo: `uv run exo`
- Send a chat request to GLM-4.7:
  ```bash
curl http://localhost:52415/v1/chat/completions -H "Content-Type:
application/json" -d '{
    "model": "mlx-community/GLM-4.7-8bit-gs32",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "stream": true
  }'
  ```
- Verify the streamed response starts with `<think>` tag
- Verify the frontend dashboard correctly shows the thinking section
collapsed

### Automated Testing
- All 72 worker tests pass: `uv run pytest src/exo/worker/`
- Type checker passes: `uv run basedpyright`
- Linter passes: `uv run ruff check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
2026-01-19 19:44:51 +00:00
Alex Cheema
ee43b598fe Split NodePerformanceProfile into granular state mappings (#1209)
## Motivation

The current `NodePerformanceProfile` is a monolithic object where every
update (even 1-second memory updates) replaces the entire profile,
touching unrelated data. Different fields update at vastly different
frequencies:

| Data | Update Frequency |
|------|------------------|
| Memory, System | 1 second |
| Thunderbolt | 5 seconds |
| Network interfaces | 10 seconds |
| Friendly name | 60 seconds |
| Model/Chip ID | Once at startup |

## Changes

Split into separate state mappings so each data type updates
independently:

- `node_identities`: Static and slow-changing data (model_id, chip_id,
friendly_name)
- `node_memory`: RAM and swap usage
- `node_system`: GPU usage, temperature, power, CPU metrics
- `node_network`: Network interface information
- `node_thunderbolt`: Thunderbolt interface identifiers

Added a backwards-compatible `node_profiles` property that reconstructs
`NodePerformanceProfile` from the granular mappings for dashboard
compatibility.

**Files modified:**
- `src/exo/shared/types/profiling.py` - Added `NodeIdentity`,
`NodeNetworkInfo`, `NodeThunderboltInfo` types
- `src/exo/shared/types/state.py` - Added 5 new mappings +
`node_profiles` property
- `src/exo/shared/apply.py` - Updated `apply_node_gathered_info` and
`apply_node_timed_out`

## Why It Works

Each info type now writes only to its specific mapping, avoiding
unnecessary updates to unrelated data. The `MacThunderboltConnections`
handler reads from `node_thunderbolt` instead of the old `node_profiles`
for RDMA connection mapping. The backwards-compatible property ensures
the dashboard continues to work unchanged.

## Test Plan

### Manual Testing
- Start exo and verify dashboard shows node info
- Verify memory/GPU updates stream correctly
- Check that node timeout properly cleans up all mappings

### Automated Testing
- All 162 existing tests pass
- basedpyright: 0 errors
- ruff check: All checks passed
- nix fmt: Applied

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 18:24:15 +00:00
rltakashige
5fd55594c9 Wrap pipeline models for explicit mx.depends between cache and logits (#1206)
## Motivation

GPU timeouts often occur when the prompt size exceeds
prefill_step_size. They also happen for seemingly random models.

## Changes

Add mx.depends so the logits depend on the cache state.
All-gather at the model level rather than the layer level, reducing the
amount of data sent.

## Why It Works

mlx_lm's prefill loop only evaluates cache state, not logits.
When the prompt is longer than prefill_step_size, the all_gather is
never evaluated, causing a GPU timeout.

## Test Plan


### Automated Testing
Added failing test cases and then resolved them.
2026-01-19 17:49:42 +00:00
Jake Hillion
5ab1f8b3e2 NetworkSetupHelper: detect stale startup script content
The daemonAlreadyInstalled() function checked that the startup script
file existed and validated plist properties, but did not compare the
actual script content. If the setupScript constant was updated in a new
app version, the stale on-disk script would not be detected or replaced.

Added a guard clause that reads the installed script from disk and
compares it against the expected setupScript content (with whitespace
normalization). When content differs, the function returns false,
triggering the reinstallation flow with an admin privileges prompt.
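The whitespace-normalized comparison might look like this in outline (a Python analogue of the Swift check; the function name is made up):

```python
import re


def scripts_match(installed: str, expected: str) -> bool:
    """Compare script contents with whitespace normalized, so that
    line-ending or indentation differences alone don't trigger a
    reinstall prompt."""

    def norm(s: str) -> str:
        # Collapse every run of whitespace to a single space.
        return re.sub(r"\s+", " ", s).strip()

    return norm(installed) == norm(expected)
```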

Test plan:
- Installed on a cluster that had the previous network config. Got the
  popup asking for permissions. After accepting I could run Kimi K2
  Thinking Tensor RDMA on all 4 nodes.
2026-01-19 17:36:15 +00:00
Evan Quiney
2202685c3e refactor all information sources (including ipless rdma discovery) (#928)
## Motivation

Information gathering is tightly coupled to MacMon - we should start
generalizing our information sources so we can add more in future.

## Changes

Added a new system to gather any information. Currently, it is attached
to the Worker - though this is mostly to keep the data processing logic
simple. It could be made independent quite easily.

I also refactored topology to include different kinds of connections as
we can gather RDMA connections without having a pre-existing socket
connection, and made the relevant placement updates. We should no longer
need the network locations script in the app.

Other sources of information now include:
- static node information like "model" and "chip" (macos, "Unknown"
fallback)
- device friendly name (macos, falls back to device hostname)
- network interfaces + ips (cross platform)
- thunderbolt interfaces (macos)
- thunderbolt connections (macos)
- RAM usage (cross platform)
- per-device configuration written to EXO_HOME/config.toml

## Limitations

Model and Chip are not cross platform concepts.

We do not differentiate between unified and non-unified memory systems.

A lot of this data collection is based on simple timers. Watching the SC
store on macOS is the correct way to gather some of this information,
but requires a detour into Rust on macOS.

## Why It Works

The InfoGatherer is a generic subsystem which returns a union of metric
datatypes. It writes them to an event, which is applied to state. It is
currently re-spawned with the worker so each cluster receives the
correct information.

As for topology, macOS identifies TB ports with a UUID in
SPThunderboltDataType, and also stores remote UUIDs where it can find
them. These changes read that data via system_profiler, hopefully not so
often as to cause notable performance impact (though this should be
tuned) but frequently enough for moderate responsiveness.
As we can identify TB connections between devices without needing ips
attached to each interface, we can remove the network setup script
(almost) completely.

## Test Plan

### Manual Testing
Spawn RDMA instances without enabling DHCP on the RDMA interfaces.

### Automated Testing
Updated the current master and shared tests to cover the topology
refactor and new events.

---------

Co-authored-by: Sami Khan <smsak99@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: Jake Hillion <jake@hillion.co.uk>
2026-01-19 16:58:09 +00:00
Andrei Onel
ce3ad391b1 Update README.md with some changes from release 1.0.61 (#1157)
Updated README.md with documentation for four new features:

- added a "Benchmarking" section documenting the exo-bench tool for
measuring model performance across different placement configurations
- documented the custom namespace feature for cluster isolation in the
macOS app section
- added a "Configuration Options" subsection explaining the --no-worker
CLI flag for coordinator-only nodes
- added a "File Locations (Linux)" subsection documenting XDG Base
Directory Specification compliance on Linux systems

Issue #930
2026-01-19 16:43:18 +00:00
Jake Hillion
fb0151630d shard_downloader: make on_progress callback async
The on_progress callback was synchronous but always invoked from async
contexts, forcing the use of send_nowait() which could raise WouldBlock
if the channel buffer was full, potentially dropping progress updates.

Changed the callback type from `Callable[[ShardMetadata,
RepoDownloadProgress], None]` to return a coroutine, updated all
implementations to be async, and replaced send_nowait() with await
send() in the worker's download progress handler.

This allows proper backpressure handling when sending download progress
events through the channel, eliminating the "Footgun!" that was
previously documented in the code.
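The backpressure difference can be illustrated with a stdlib analogue (exo uses channels rather than asyncio.Queue, but the put vs. put_nowait distinction is the same):

```python
import asyncio


async def main() -> list[int]:
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=1)
    received: list[int] = []

    async def on_progress(update: int) -> None:
        # `await queue.put(...)` waits for room in the buffer instead
        # of raising QueueFull the way `queue.put_nowait(...)` would,
        # so no progress updates are dropped.
        await queue.put(update)

    async def consume() -> None:
        for _ in range(3):
            received.append(await queue.get())

    consumer = asyncio.create_task(consume())
    for i in range(3):
        await on_progress(i)
    await consumer
    return received
```

With a synchronous callback the producer has no way to wait, which is exactly the "Footgun!" scenario described above.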

Test plan:
- Built a DMG and ran it on one node. All existing models showed as
  downloaded.
- Downloaded a new model. The progress bar on the download page worked.
- Downloaded another new model. The progress bar on the home page
  worked.
2026-01-19 16:19:37 +00:00
Alex Cheema
346b13e2c9 Enhance LaTeX rendering in dashboard markdown (#1197)
## Motivation

When models output LaTeX-formatted math proofs, the dashboard was not
rendering them correctly. Issues included:
- `\documentclass`, `\begin{document}`, `\usepackage` showing as raw
text
- `$...$` inline math with complex expressions (like `\frac`, `\ldots`)
not rendering due to markdown escaping backslashes
- `\begin{align*}...\end{align*}` and other math environments showing as
raw text
- `\emph{...}`, `\textbf{...}` LaTeX formatting commands not being
converted
- `$\require{...}$` (MathJax-specific) causing KaTeX errors
- `\begin{proof}...\end{proof}` showing as raw text

## Changes

Enhanced `MarkdownContent.svelte` with comprehensive LaTeX support:

**Math extraction before markdown processing:**
- Extract `$...$`, `$$...$$`, `\(...\)`, `\[...\]` into placeholders
before markdown processes the text
- Use alphanumeric placeholders (`MATHPLACEHOLDERINLINE0END`) that won't
be interpreted as HTML tags
- Restore and render with KaTeX after markdown processing

**LaTeX document command removal:**
- Strip `\documentclass{...}`, `\usepackage{...}`, `\begin{document}`,
`\end{document}`
- Strip `\maketitle`, `\title{...}`, `\author{...}`, `\date{...}`
- Strip `\require{...}` (MathJax-specific, not KaTeX)
- Replace `tikzpicture` environments with `[diagram]` placeholder
- Strip `\label{...}` cross-reference commands

**LaTeX math environments:**
- Convert `\begin{align*}`, `\begin{equation}`, `\begin{gather}`, etc.
to display math blocks

**LaTeX text formatting:**
- `\emph{...}` and `\textit{...}` → `<em>...</em>`
- `\textbf{...}` → `<strong>...</strong>`
- `\texttt{...}` → `<code>...</code>`
- `\underline{...}` → `<u>...</u>`

**LaTeX environments styling:**
- `\begin{proof}...\end{proof}` → styled proof block with QED symbol
- `\begin{theorem}`, `\begin{lemma}`, etc. → styled theorem blocks

**Display math enhancements:**
- Wrapped in styled container with subtle gold border
- "LaTeX" label and copy button appear on hover
- Dark theme KaTeX color overrides for better readability
- Custom scrollbar for overflow

## Why It Works

The key insight is that markdown processing was escaping backslashes in
LaTeX before KaTeX could see them. By extracting all math expressions
into alphanumeric placeholders *before* markdown runs, then restoring
them *after*, the LaTeX content passes through to KaTeX unmodified.

Using purely alphanumeric placeholders like `MATHPLACEHOLDERINLINE0END`
instead of `<<MATH_INLINE_0>>` prevents markdown from interpreting them
as HTML tags and stripping them.
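A stripped-down sketch of that round trip (the real implementation lives in MarkdownContent.svelte and handles more delimiters; this Python version covers only inline `$...$`):

```python
import re


def extract_math(text: str) -> tuple[str, list[str]]:
    """Swap inline $...$ spans for alphanumeric placeholders before
    markdown runs, so backslashes in the LaTeX survive untouched."""
    stash: list[str] = []

    def _stash(m: re.Match[str]) -> str:
        stash.append(m.group(0))
        return f"MATHPLACEHOLDERINLINE{len(stash) - 1}END"

    return re.sub(r"\$[^$\n]+\$", _stash, text), stash


def restore_math(text: str, stash: list[str]) -> str:
    # After markdown processing, put the original LaTeX back for KaTeX.
    for i, expr in enumerate(stash):
        text = text.replace(f"MATHPLACEHOLDERINLINE{i}END", expr)
    return text
```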

## Test Plan

### Manual Testing
- Hardware: Any machine with the dashboard
- What you did:
  - Ask model to "write a proof in latex"
  - Verify inline math like `$x \in S$` renders correctly
- Verify display math like `\begin{align*}...\end{align*}` renders as
block
  - Verify `\documentclass`, `\begin{document}` are stripped (not shown)
  - Verify `\emph{...}` converts to italics
  - Verify copy button works on display math blocks
- Test edge cases: `$5` (currency) stays as text, `\$50` (escaped)
becomes `$50`

Before:
<img width="799" height="637" alt="Screenshot 2026-01-19 at 11 51 22 AM"
src="https://github.com/user-attachments/assets/62a705b8-b3c2-47b8-afd0-5d0c1b240e44"
/>

After:
<img width="809" height="642" alt="Screenshot 2026-01-19 at 11 46 58 AM"
src="https://github.com/user-attachments/assets/4f35fa1d-333c-4285-bc68-58a50f8f148e"
/>


### Automated Testing
- Dashboard builds successfully with `npm run build`
- Existing functionality preserved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 14:50:41 +00:00
rltakashige
ea0588429b Custom mlx layer composition (#1201)
## Motivation

With a single pipeline layer, PipelineFirstLayer gets composed with
PipelineLastLayer.

## Test Plan

### Automated Testing
Made failing tests. Fixed them!
2026-01-19 12:36:25 +00:00
rltakashige
73b3f87e07 Set swa_idx and ga_idx for single layer (#1202)
## Motivation

`layer_types` contains neither "sliding_attention" nor
"full_attention" for pipeline parallel (single layer).


## Test Plan

### Manual Testing
Manually tested single layer of GPT OSS. Doesn't crash

2026-01-19 12:31:11 +00:00
Evan Quiney
746589ba6b tidy: remove context manager from api (#1199) 2026-01-19 11:58:13 +00:00
rltakashige
f82f862fd7 Fix several issues with placement (#1200)
## Motivation

Uneven placements were causing issues for some users with lopsided
setups. While fixing that, I ran into another issue: impossible memory
allocations.

## Changes

- Allocate at least 1 layer per device.
- Catch overallocation of memory with an error.
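A toy sketch of the at-least-one-layer constraint (hypothetical; the real placement algorithm considers more than free memory):

```python
def place_layers(n_layers: int, free_mem_gb: list[float]) -> list[int]:
    """Split layers proportionally to free memory, guaranteeing every
    device at least one layer and erroring on impossible allocations."""
    if n_layers < len(free_mem_gb):
        raise ValueError("more devices than layers")
    total = sum(free_mem_gb)
    # Start everyone at 1 layer, then distribute the remainder
    # proportionally to free memory.
    alloc = [1] * len(free_mem_gb)
    remaining = n_layers - len(free_mem_gb)
    shares = [m / total * remaining for m in free_mem_gb]
    alloc = [a + int(s) for a, s in zip(alloc, shares)]
    # Hand out any leftover layers to the devices with the largest
    # fractional shares.
    leftover = n_layers - sum(alloc)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

With a very lopsided setup, the small device still receives one layer instead of zero.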


## Test Plan

### Manual Testing
Tested that GPT OSS is placed correctly.

### Automated Testing
Added breaking tests in the first commit. Resolved with new placement
algorithm in the second one.
2026-01-19 11:52:35 +00:00
Alex Cheema
7ff937d8a1 Add dashboard screenshots to README (#1185)
## Motivation

The README showcases exo's features and benchmarks but doesn't show what
the dashboard actually looks like. Adding a screenshot helps users
understand what they'll get when they run exo.

## Changes

- Added dashboard screenshot to `docs/imgs/dashboard-cluster-view.png`:
Shows the cluster topology view with 4 × 512GB M3 Ultra Mac Studio
running DeepSeek v3.1 (8-bit) and Kimi-K2-Thinking (4-bit)
- Added a new "Dashboard" section to README.md below Features,
displaying the screenshot with caption

## Why It Works

Visual documentation helps users understand what exo offers before they
install it. The screenshot demonstrates the cluster management
capabilities.

## Test Plan

### Manual Testing
- Verified image renders correctly in GitHub markdown preview

### Automated Testing
- N/A - documentation only change

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 10:43:27 +00:00
Evan Quiney
d19bf02404 re-raise exceptions in the runner (#1198)
## Motivation

Runners that crash can swallow errors - we should re-raise. Also the
exception handler annoyed me.

## Changes

The try: except in the runner's chat now re-raises.
2026-01-19 10:35:23 +00:00
rltakashige
618cee5223 Resolve test event ordering flakiness (#1194)
## Motivation

The mp sender occasionally does not have time to flush its events before
collect() is called, making the event-ordering test fail.

## Changes

- Replace mp_channel with a simple collector for the event-ordering test
- Also suppress the `<frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute` warning



## Test Plan


### Automated Testing
Ran the test 100 times without it failing.
2026-01-18 20:33:20 +00:00
Antonio Lujano Luna
9c29eb7d48 Add proxy and custom SSL certificate support for corporate networks (#1189)
Support HTTPS_PROXY/HTTP_PROXY environment variables for proxy
configuration and SSL_CERT_FILE for custom CA certificates, enabling use
in corporate environments with SSL inspection.

## Motivation
Users in corporate environments often need to route traffic through HTTP
proxies and use custom CA certificates for SSL inspection. Without this
support, exo cannot download models in these network configurations.

## Changes
- Added `HTTPS_PROXY`/`HTTP_PROXY` environment variable support to
`create_http_session()` in `download_utils.py`
- Added `SSL_CERT_FILE` environment variable support for custom CA
certificate bundles, falling back to certifi's default bundle

## Why It Works
- `aiohttp.ClientSession` natively supports the `proxy` parameter for
routing requests through HTTP proxies
- `ssl.create_default_context(cafile=...)` accepts a custom CA bundle
path, allowing corporate CAs to be trusted
- Using environment variables is consistent with the codebase's existing
configuration patterns (e.g., `EXO_HOME`, `HF_ENDPOINT`)

## Test Plan
### Manual Testing
- Set `HTTPS_PROXY` environment variable and verified model downloads
route through proxy
- Set `SSL_CERT_FILE` to custom CA bundle and verified SSL verification
succeeds with corporate SSL inspection

### Automated Testing
- No automated tests added; this change is configuration-only and does
not alter existing behavior when environment variables are unset
2026-01-18 12:05:50 +00:00
Alex Cheema
c5158bee53 Add pre-commit checks documentation to AGENTS.md (#1184)
## Motivation

CI failures can be avoided by running checks locally before committing.
This adds clear documentation to AGENTS.md so that AI agents (and
humans) know exactly which checks must pass before pushing code.

## Changes

Added a new "Pre-Commit Checks (REQUIRED)" section to AGENTS.md that:
- Lists all 4 required checks (basedpyright, ruff, nix fmt, pytest)
- Provides a one-liner to run all checks in sequence
- Notes that `nix fmt` changes must be staged before committing
- Explains that CI runs `nix flake check` which verifies everything

## Why It Works

Clear documentation prevents CI failures by ensuring contributors run
checks locally first. The one-liner command makes it easy to run all
checks before committing.

## Test Plan

### Manual Testing
- Verified the documented commands work correctly

### Automated Testing
- N/A - documentation only change

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:50:24 +00:00
rltakashige
5c8a237940 Handle model timeouts (#1177)
- Add eval with a timeout.
- Add fast synch flag

## Motivation

Because of the experimental FAST SYNCH flag, some models may not work.
This PR catches when this occurs and allows users to specify a run
without fast synch.

## Changes

- Adds a flag to enable or disable fast synch (--fast-synch and
--no-fast-synch)
- Adds a heuristic timeout
- Reduces exo_bench default timeout to 10 minutes.

## Why It Works

The heuristic timeout assumes normal loading times on Mac devices: 60
seconds plus the model size in GB divided by 5. For example, DeepSeek
takes up to 120 seconds to load with tensor parallel, and its timeout
works out to 60 + 120 = 180 s.

We could raise this value if necessary.
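As a sketch (assuming the formula reads as 60 + size_gb / 5):

```python
def load_timeout_s(model_size_gb: float) -> float:
    # 60 s base, plus one extra second per 5 GB of model weights.
    return 60 + model_size_gb / 5
```

A ~600 GB model would get 60 + 120 = 180 s; the 600 GB figure is an assumed size that reproduces the DeepSeek example above.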

## Test Plan

### Manual Testing
Catches that GPT OSS fails to load with Tensor RDMA.
Can launch with the --no-fast-synch flag to run GPT OSS.

**GPT OSS 20B**
TP with fast synch
<img width="3064" height="456" alt="image"
src="https://github.com/user-attachments/assets/f6e25cd8-8621-4e99-99fe-292ee05c4035"
/>

TP without fast synch
<img width="3098" height="496" alt="image"
src="https://github.com/user-attachments/assets/d36453d9-6686-4cfe-aa7c-a7d458369d4d"
/>
[Note: the performance is really not great as fast synch is off]

(As a sanity check)
PP with fast synch
<img width="3124" height="496" alt="image"
src="https://github.com/user-attachments/assets/e97d4547-c6fa-483d-badb-4b371b900b4c"
/>

PP without fast synch
<img width="3078" height="508" alt="image"
src="https://github.com/user-attachments/assets/b2e20dfd-4b0e-4295-8a92-417dfe745c28"
/>

PP without RDMA
<img width="3070" height="498" alt="image"
src="https://github.com/user-attachments/assets/a8509d68-0aef-4cda-bca5-a67d39a0801e"
/>

TP without RDMA
<img width="3068" height="496" alt="image"
src="https://github.com/user-attachments/assets/b5691429-89f4-4369-bcf2-8fde2ad7154a"
/>
2026-01-16 20:25:12 +00:00
rltakashige
745343c705 Return error responses for Chat Completions (#1173)
- Error chunks
- Use error handling in exo_bench.py

## Motivation

Return when an error occurs so that generation stops. Adding timeouts is
a separate TODO for model loading and chat completions.

## Changes

- Return HTTP exceptions as JSON responses in an OpenAI compatible
format.
- Context manager for generation to catch and return error messages.
- Use error handling in exo_bench.py.

## Test Plan

### Manual Testing
Manually tested that exo_bench returns on failures within and outside
generation

2026-01-16 19:24:37 +00:00
Alex Cheema
5e28664c41 Fix draft release detection (attempt 3) (#1176)
## Motivation

Previous fix still failed in CI. Suspected a permissions issue:
GITHUB_TOKEN may not be able to see draft releases via the API.

## Changes

1. Add explicit `permissions: contents: write` to the job
2. Use `gh release list` first to check if draft exists (this uses a
different code path that might work better)
3. Add debug echo statements

## Test Plan

Delete v1.0.63 tag and re-push after merging.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 17:26:06 +00:00
Alex Cheema
ae0a804ccb Fix draft release detection query (#1175)
## Motivation

Fixes the draft release detection that failed on the v1.0.63 release
attempt.

## Changes

The jq query was piped to `head -1` which truncated multi-line JSON
output to just `{`, causing the empty check to fail.

Changed to use `first // empty` in jq instead.

## Test Plan

Tested locally:
```bash
GITHUB_REF_NAME="v1.0.63"
gh api repos/exo-explore/exo/releases --jq "[.[] | select(.draft == true) | select(.name == \"$GITHUB_REF_NAME\")] | first // empty"
# Returns the full draft release JSON (2711 chars)
```

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 17:05:24 +00:00
Alex Cheema
07cf2c1aa1 Add GitHub releases with Sparkle release notes integration (#1172)
## Motivation

Closes #1140

Currently releases are uploaded to S3 for Sparkle updates but there's no
GitHub Release created, and Sparkle update dialogs don't show release
notes. Users have no visibility into what changed.

## Changes

- Added release workflow documentation comment at top of `build-app.yml`
- Added "Fetch release notes for Sparkle" step that converts markdown
from draft GitHub release to HTML
- Added "Inject release notes into appcast" step that embeds HTML in
appcast.xml with CDATA
- Added "Publish GitHub Release" step that attaches DMG and publishes
the draft

## Why It Works

- Sparkle's `<description>` tag supports HTML wrapped in CDATA for
rendering in update dialogs
- GitHub's markdown API (`/markdown`) converts the release notes to HTML
with proper formatting
- Draft releases allow writing polished notes before the build, then the
workflow publishes them automatically
- The workflow fails if no draft release exists, ensuring release notes
are always provided

## Test Plan

### Manual Testing
1. Create a draft GitHub release for a new tag with markdown release
notes
2. Push the tag to trigger the workflow
3. Verify the GitHub release is published with DMG attached
4. Download appcast.xml from S3 and verify
`<description><![CDATA[...]]></description>` contains HTML
5. Test Sparkle update dialog on macOS to confirm release notes appear

### Automated Testing
No automated tests added - this is CI workflow configuration.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 16:47:33 +00:00
Evan
83c5285a80 reduce logs
the previous commit's logs were too verbose; this tones them down a bit
2026-01-16 14:05:47 +00:00
Evan Quiney
39ee2bf7bd switch from synchronous threaded pinging to an async implementation (#1170)
still seeing churn in our networking - let's properly rate limit it

## changes

added a persistent httpx AsyncClient with a cap on max connections
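
The effect of capping concurrent requests can be sketched with stdlib asyncio; `PingLimiter` is a hypothetical name, and the semaphore stands in for the connection limit on a persistent httpx `AsyncClient`:

```python
import asyncio

class PingLimiter:
    """Hypothetical sketch: bound concurrent pings with a semaphore, the
    same effect as giving a persistent AsyncClient a max-connections limit."""

    def __init__(self, max_concurrent: int = 8):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def ping(self, peer: str) -> str:
        async with self._sem:
            await asyncio.sleep(0)  # stand-in for the HTTP round trip
            return f"pong:{peer}"

async def ping_all(peers):
    # Only two pings are in flight at any moment; the rest queue up.
    limiter = PingLimiter(max_concurrent=2)
    return await asyncio.gather(*(limiter.ping(p) for p in peers))

results = asyncio.run(ping_all(["a", "b", "c"]))
```

The key design point is that the limit lives in one shared object rather than per-call timeouts, so discovery load stays bounded no matter how many peers appear.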

## testing

deployed on cluster, discovery VASTLY more stable (the only deleted
edges were those discovered by mdns)
2026-01-16 13:20:03 +00:00
Sami Khan
991adfbd6f fix local network warning (#1136)
## Motivation

Local network warning banner was showing on fresh install even though
mDNS was working. The check would fail before the user had a chance to
grant permission via the macOS prompt.

## Changes

- Added `hasWorkedBefore` flag persisted in UserDefaults
- Only show warning if permission previously worked but now doesn't

## Why It Works

On fresh install, the check may fail (no permission yet), but
`hasWorkedBefore` is false so no warning shows. Once the user grants
permission and a check succeeds, we record it. Future failures (zombie
permission after restart) will show the warning since `hasWorkedBefore`
is now true.
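
The banner logic above is tiny; a minimal sketch in Python (the Swift implementation persists `hasWorkedBefore` in UserDefaults, and the function name here is hypothetical):

```python
def should_show_warning(check_passed: bool, has_worked_before: bool) -> bool:
    # Only warn when permission previously worked but the current check
    # fails - a fresh install with no recorded success stays quiet.
    return has_worked_before and not check_passed
```

This avoids a false-positive banner during the window between first launch and the user granting the macOS permission prompt.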

## Test Plan

### Manual Testing
Run locally

### Automated Testing
N/A
2026-01-16 13:10:50 +00:00
rltakashige
4b3de6b984 Fix exo bench for transformers 5.x (#1168)
## Motivation
Prompt Sizer was broken: transformers 5.x tokenizers return BatchEncoding
objects, essentially a dictionary of `{input_ids: [...]}`, instead of the
plain list of input ids that 4.x returned.
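
The fix amounts to normalising both shapes; a sketch with a hypothetical `extract_input_ids` helper (exo's actual function name may differ):

```python
def extract_input_ids(encoded):
    # transformers 5.x tokenizers return a BatchEncoding (dict-like, with
    # the ids under "input_ids"); 4.x returned a plain list of token ids.
    # Accept either shape so the prompt sizer works on both versions.
    if isinstance(encoded, dict) or hasattr(encoded, "keys"):
        return encoded["input_ids"]
    return encoded
```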

## Test Plan

### Manual Testing
Tested that exo bench runs as expected.

### Automated Testing
2026-01-16 12:39:22 +00:00
Evan
c8de3b90ea quiet rust logs
rust logs were too verbose - now only warnings propagate to python

entirely happy not to merge this and to clean up rust logging instead,
but this felt saner right now
2026-01-16 12:34:28 +00:00
Sami Khan
6e6567a802 resolve issue #1070 (#1076)
## Motivation

https://github.com/exo-explore/exo/issues/1070

## Changes

Added check in ChatForm.svelte to reset selectedChatModel when it no
longer matches any running instance.

## Why It Works

The $effect now detects when the selected model is stale (not in
availableModels()) and resets to the first available model.
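
The reset rule can be sketched outside Svelte; `resolve_selection` is a hypothetical name for the logic the `$effect` applies:

```python
def resolve_selection(selected, available):
    # Keep the selection if it still matches a running instance; otherwise
    # fall back to the first available model, or None when nothing runs.
    if selected in available:
        return selected
    return available[0] if available else None
```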

## Test Plan

### Manual Testing

1. Create instance of Model A → Delete it → Create instance of Model B →
Chat
2. Verify request goes to Model B (not Model A)

---------

Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>
2026-01-15 20:00:41 +00:00
rltakashige
a735dad667 Parse GPT OSS in runner (#1160)
## Motivation

Simplification of API + moving model specific code to the runner


## Test Plan

### Manual Testing
Tested that GPT OSS outputs are parsed correctly on the dashboard.

### Automated Testing
2026-01-15 19:53:55 +00:00
rltakashige
aaf4e36bc3 FIX GPT OSS (#1165)
## Motivation

Adds several unmerged fixes for GPT OSS.
Also adds GPT OSS 20B MXFP4 Q8 instead of Q4 for numerical stability (as
this is unstable for MLX LM too)


## Test Plan

### Manual Testing
Manually tested. No further gibberish responses.

### Automated Testing
Ran EXO Bench - pipeline, tensor and single node work on both 20B and
120B models
2026-01-15 19:20:17 +00:00
Evan Quiney
3e623ccf0d up http timeout to 3 seconds and retry on BadStatusLine (#1164)
we're seeing a lot of network churn - perhaps this is a connection
timing out issue? let's also retry after a second

## testing
none yet

---------

Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 18:15:12 +00:00
Evan Quiney
c22dad8a7d dashboard: add peer: true to package lock (#1162)
this happens every time i run npm install - let's upstream it

## testing
dashboard builds and renders
2026-01-15 17:01:43 +00:00
Evan
4bc4d50685 rust: remove dead code
the system custodian has been made unnecessary with the swift app - we
can remove it

## testing
everything still builds
2026-01-15 16:51:46 +00:00
Jake Hillion
e0aab46fd8 model_cards.py: clean up commented out code
Clean up the commented out code and make sure the comments are unified.
Carrying around the commented out code means people making changes to
model_cards are supposed to update it, but that's not clear and won't be
picked up by type checking etc. Drop it for now - it's in the git
history.

Also make the rest of the comments a bit more uniform, and place
comments about a specific model card inside the model card (instead of
above) so they don't get lost when code is added/moved around.

Test plan:
- my eyes
2026-01-15 13:21:58 +00:00
Evan Quiney
82ba42bae9 add glm-47, minimax-m21 (#1147)
Adds support for GLM 4.7 and MiniMax M2.1

Manual testing:
Tensor + Pipeline execution of both models.

Closes #1141 and #1142
2026-01-14 16:33:17 +00:00
Jake Hillion
3671528fa4 nix: add dashboard build with dream2nix
Continue working towards a fully Nix based build by building the
dashboard with Nix. Continuing the theme of using the existing lock
files, use dream2nix to parse the lock file and build the tree of
dependency derivations.

dream2nix doesn't like the bundleDependencies, so we apply a small patch
to the lock file that drops all dependencies that are bundled. This
should ideally be contributed upstream but that can be done later.

Use this new dashboard build in the build-app CI workflow, meaning
future macOS apps will include this reproducible dashboard.

Test plan:
- Built a DMG, shipped to a cluster, loaded in a browser with no cache
  and the dashboard looks good.

- Directory layout is as expected:
```
$ nix build .#dashboard
$ find result/
...
result/_app/immutable/entry
result/_app/immutable/entry/app.CTPAnMjf.js
result/_app/immutable/entry/start.fUSEa-2O.js
result/_app/immutable/nodes
result/_app/immutable/nodes/3.DqQr1Obm.js
result/_app/immutable/nodes/0.DgEY44RO.js
result/_app/immutable/nodes/2.BjZg_lJh.js
result/_app/immutable/nodes/1.D6vGUYYT.js
result/_app/env.js
result/_app/version.json
result/exo-logo.png
result/favicon.ico
result/index.html
```
2026-01-14 15:58:16 +01:00
Jake Hillion
e6434ec446 nix: add Rust builds with crane and fenix
The Rust workspace lacked Nix build support, making it difficult to
build packages reproducibly or run checks in CI.

Added a flake-parts module at rust/parts.nix that uses crane for Rust
builds and fenix for the nightly toolchain. The source filter isolates
rust/ and root Cargo files to prevent Python/docs changes from
triggering Rust rebuilds. Exports packages (system_custodian,
exo_pyo3_bindings wheel, exo-rust-workspace) and checks (cargo-nextest,
cargo-doc) for all three target platforms.

The devShell now uses inputsFrom to inherit build dependencies from the
workspace package, removing the need for manual pkg-config/openssl setup.

Test plan:
- Ran `nix flake check` successfully
- Built `nix build ".#checks.x86_64-linux.cargo-nextest"` and tests pass
- Built `nix build ".#exo_pyo3_bindings"` and wheel is produced
2026-01-14 11:52:29 +00:00
Jake Hillion
bdb43e1dbb nix: drop noisy echos from devshell
Drop all the printing when entering a devshell. It's annoying, and not a
super accurate description of how to develop exo anyway.
2026-01-14 10:04:57 +00:00
Jake Hillion
e4a01e2b0e chore(deps): nix lock file maintenance
Update nix flake inputs. Add a second input as Swift is currently broken
in nixpkgs on Linux for `swift-format` as we want `nix fmt` to continue
being reproducible everywhere.
2026-01-13 19:57:14 +01:00
Evan Quiney
1200a7db64 Add tensor sharding for GPT-OSS (#1144)
## Motivation

GPT OSS did not previously support tensor sharding

## Changes

Add GPT sharding support in tensor_auto_parallel.
Code is mostly @rltakashige's

## Test Plan

### Manual Testing
Tested GPT-OSS - MLX Fast Sync causes issues in Tensor RDMA - this is a general problem at the moment.
2026-01-13 17:25:52 +00:00
Evan Quiney
47ceb54bc1 up the rlimit (#1148)
Fixes #1117 

Manual testing:
Launched 100 instances. worked. yay.
2026-01-13 15:00:54 +00:00
Jake Hillion
f8112fdf25 nix: convert to flake-parts
Preparing to add a flake-parts module for Rust builds. The flake-utils
library doesn't support the module system needed for cleanly separating
the Rust build configuration.

Converted from flake-utils to flake-parts, switching to the treefmt-nix
flakeModule import pattern. The devShell and formatter outputs remain
functionally equivalent.

Test plan:
- Ran `nix flake check` successfully
- Verified `nix develop` provides the same environment
2026-01-13 15:06:44 +01:00
Alex Cheema
e388f59480 docs: add AGENTS.md for AI coding agents guidance (#1132)
## Motivation

Add documentation to help AI coding agents (Claude Code, Cursor, GitHub
Copilot, etc.) understand the exo codebase and contribute effectively.

## Changes

- Add `AGENTS.md` with guidance for AI agents working on the codebase
- Add symlink `CLAUDE.md -> AGENTS.md` for backwards compatibility with
Claude Code

## Why It Works

`AGENTS.md` is becoming a standard convention for AI agent instructions.
The symlink ensures Claude Code (which looks for `CLAUDE.md`) continues
to work while supporting the broader `AGENTS.md` convention.

## Test Plan

### Manual Testing
- Verified symlink works correctly

### Automated Testing
- N/A (documentation only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 13:05:47 +00:00
Alex Cheema
e5e74e1eef Upgrade mlx-lm to 0.30.2 with transformers 5.x compatibility (#1125)
## Motivation

Upgrade mlx-lm to version 0.30.2 which requires transformers 5.0.0rc2 as
a prerelease dependency. This enables support for newer models like Kimi
K2 Thinking while maintaining compatibility with existing models.

The transformers 5.x release includes breaking changes that affect
custom tokenizers like Kimi's TikTokenTokenizer, requiring compatibility
fixes.

## Changes

### Core Changes
- **mlx-lm upgrade**: Bump to 0.30.2 with locked exact versions for
mlx/mlx-lm to prevent breaking changes
- **transformers 5.x compatibility**: Enable prerelease transformers
dependency

### Kimi K2 Tokenizer Fixes
- Add `bytes_to_unicode` monkey-patch to restore function moved in
transformers 5.0.0rc2
- Load `TikTokenTokenizer` directly instead of via `AutoTokenizer` to
bypass transformers 5.x bug with `auto_map` fallback
- Patch `encode()` to use tiktoken directly with `allowed_special="all"`
to handle special tokens from chat templates

### Other Changes
- Dashboard: Show disk usage for completed model downloads
- CI: Add `workflow_dispatch` trigger to build-app workflow
- Docs: Add basic API documentation

### Testing
- Add comprehensive tokenizer unit tests for all supported models
- Tests verify encode/decode, special token handling, and chat template
encoding

## Why It Works

**bytes_to_unicode issue**: transformers 5.0.0rc2 moved
`bytes_to_unicode` from `transformers.models.gpt2.tokenization_gpt2` to
`transformers.convert_slow_tokenizer`. Kimi's `tokenization_kimi.py`
imports from the old location. The monkey-patch restores it at module
load time.

**AutoTokenizer issue**: transformers 5.x has a bug where
`tokenizer_class_from_name('TikTokenTokenizer')` returns `None` for
custom tokenizers with `auto_map`. Loading the tokenizer directly
bypasses this.

**encode() issue**: transformers 5.x's `pad()` method fails for slow
tokenizers. Using tiktoken's encode directly with
`allowed_special="all"` avoids this path and properly handles special
tokens like `<|im_user|>` from chat templates.
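
The bytes_to_unicode shim is a standard monkey-patch; a self-contained sketch using a stand-in module (`legacy_gpt2` here is hypothetical, playing the role of `transformers.models.gpt2.tokenization_gpt2`):

```python
import sys
import types

def bytes_to_unicode():
    # The classic GPT-2 byte-to-unicode table: map all 256 byte values to
    # printable unicode characters, as the relocated transformers helper does.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Restore the function at its old import path so legacy code that does
# `from legacy_gpt2 import bytes_to_unicode` keeps working unchanged.
legacy = types.ModuleType("legacy_gpt2")
legacy.bytes_to_unicode = bytes_to_unicode
sys.modules["legacy_gpt2"] = legacy
```

The real patch assigns the function back onto the existing transformers module at import time, before `tokenization_kimi.py` is loaded.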

## Test Plan

### Manual Testing
- Hardware: 2x Mac Studios connected via Thunderbolt 5 (mike22 and
james21)
- Tested Kimi K2 Thinking, GPT-OSS-120B, GPT-OSS-20B, LLama-3.1-8B-bf16, qwen3-30B-A3B-8bit model with pipeline parallelism across both
nodes
- Verified warmup inference completes successfully
- Verified chat completions work with special tokens

### Automated Testing
- Added `test_tokenizers.py` with 31 tests covering:
- Basic encode/decode for all model families (deepseek, kimi, llama,
qwen, gpt-oss, glm)
  - Special token encoding (critical for chat templates)
  - Chat template application and encoding
  - Kimi-specific and GLM-specific edge cases
- All tests pass: `uv run pytest
src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py`

### Failing Tests
RDMA with all models.

---------

Co-authored-by: Evan <evanev7@gmail.com>
2026-01-13 12:06:04 +00:00
Jake Hillion
b968d6f0a0 ci: remove old commented out job 2026-01-13 12:42:04 +01:00
Jake Hillion
3bfffd9b4f ci: build all Nix outputs on all platforms and push to cachix
The CI was only running `nix flake check` on ubuntu-latest, missing
builds for other platforms and not caching packages or devShells.

Added a matrix-based `nix-build` job that runs on macos-26 (aarch64-darwin),
ubuntu-latest (x86_64-linux), and ubuntu-24.04-arm (aarch64-linux). Each
job enumerates all packages and devShells via `nix flake show --json`,
builds them in a single `nix build` call for parallelization, then runs
`nix flake check`. The cachix-action pushes all built outputs automatically.

This ensures all Nix outputs are built and cached for every supported
platform, speeding up local development and CI runs.

Test plan:
- Tested jq enumeration command locally, correctly outputs devShell paths
- Verified xargs pipeline works with the enumerated outputs
2026-01-13 12:37:12 +01:00
Jake Hillion
007eb80029 nix: enable cachix
Enable cachix and push to it in the pipeline.yml workflow. This won't
cache a huge amount yet but will automatically extend our caching as we
build more of the repo with Nix in CI. It can also be used by local
users by accepting our cache to improve the speed of local builds.

Test plan:
- CI
2026-01-12 17:24:59 +01:00
Jake Hillion
8d7b6789b3 dashboard: show disk usage for completed models
The downloads dashboard showed "Completed" for finished model downloads
but provided no indication of how much disk space each model or the
total models on a node were using.

Added total_bytes field to DownloadCompleted type so the size is
preserved when a download completes. Updated the dashboard to display
the model size next to "Completed" status (e.g., "Completed (251.1GB)")
and a total disk usage line below the model count for each node (e.g.,
"502.2GB on disk").

Test plan:
- Ran unit tests for download apply and planning logic
- Type checked all modified files with basedpyright
2026-01-12 16:34:29 +01:00
Jake Hillion
3c5b7ea670 ci: add workflow_dispatch trigger to build-app
Build app is the most convenient way to get a DMG for testing, but
currently it's a bit limited. You have to push to test-app every time
which is far from ideal and requires a bit too much force pushing for my
liking.

Add the workflow_dispatch trigger. This adds a button in the actions UI
to trigger a workflow for a named branch, which means you can use your
normal dev branch instead of having to push to test-app. We'll leave
that behaviour there for now too, though it may change in future.

Filter on `"${{ github.event_name }}" == "workflow_dispatch"` and set
those to alpha as well. Will verify by pushing the first version from
`main` just in case. Unfortunately we do have to merge this before we
can test it.

Test plan:
- Looking really hard.
2026-01-12 12:14:21 +01:00
PG
b74a610537 Add a basic documentation to the api interface (#1122)
## Motivation

Adds basic api documentation

## Changes

- Add docs/api.md
- Modify README.md
2026-01-11 18:44:40 +00:00
Jake Hillion
18c4e49f91 nix: put treefmt in devshell
treefmt is useful to have directly accessible for some formatters, like
`jj fix`. Expose it in the devshell.

Test plan:
- Used with `jj fix` on a large branch. It worked.
2026-01-09 17:53:50 +01:00
Sami Khan
d85b5d3781 feat: uninstall button (#1077)
## Motivation

https://github.com/exo-explore/exo/issues/1075

## Changes

- Added in-app "Uninstall" option under Advanced menu that cleanly
removes all system components
- Added NetworkSetupHelper.uninstall() to remove LaunchDaemon, scripts,
logs, and restore network settings
- Added LaunchAtLoginHelper.disable() to unregister from login items
- Created standalone uninstall-exo.sh script for users who already
deleted the app
- Added uninstall documentation to README

<img width="386" height="577" alt="image"
src="https://github.com/user-attachments/assets/6bbcd18a-992a-409d-8791-ed5e13bbcfe0"
/>
<img width="372" height="432" alt="image"
src="https://github.com/user-attachments/assets/ee76b45d-c111-4807-ab28-3f2f20e01140"
/>


## Why It Works

The in-app uninstaller runs a privileged shell script (via AppleScript)
to launchctl bootout the daemon, remove files, and restore the
"Automatic" network location. The standalone script provides the same
cleanup for users who already deleted the app.

## Test Plan

### Manual Testing
Hardware: MacBook Pro
- Built and ran app, verified LaunchDaemon and network location were
created
- Used in-app Uninstall, verified all components removed and network
restored to Automatic
- Rebuilt app, quit normally, ran sudo ./uninstall-exo.sh, verified same
cleanup

### Automated Testing
N/A

---------

Co-authored-by: Evan <evanev7@gmail.com>
2026-01-09 14:49:08 +00:00
Evan Quiney
caafc48693 Forward tools to the models chat template properly (#1106)
We did not properly forward tools to the chat template before. This is not a full tool-calling implementation, but it should improve things slightly.

## Changes made

Pass tools to the HF tokenizer's chat template
Join message chunks into a larger message (opencode sometimes splits messages into chunks - we were ignoring them before)
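
Chunk joining can be sketched like this (a minimal version; the real code operates on the OpenAI-style message dicts before templating):

```python
def join_message_chunks(messages):
    # Merge consecutive chunks with the same role into a single message,
    # so the chat template sees one coherent turn per speaker.
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += msg["content"]
        else:
            merged.append(dict(msg))
    return merged
```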

## Future work

We need to parse the model output and normalise the return format to be compatible with the openai api.
2026-01-09 13:28:41 +00:00
Evan
cca8c9984a cleanup unused dependencies
we have a lot of dependencies we have no intent of using. kill them with
fire!

## testing
exo still launches and does the worst inference known to man on my Qwen3
instance. tests pass too!!
2026-01-09 13:11:58 +00:00
Sami Khan
d1e88def42 scrollbars fixed (#1113)
## Motivation

Fixes https://github.com/exo-explore/exo/issues/1107 - Horizontal
scrollbar always appears in instances section, and vertical scrollbar
appears too early (with just 1-2 instances on large screens).


## Changes

- Added overflow-x-hidden to remove horizontal scrollbar
- Added xl:max-h-96 for responsive vertical height (384px on xl+ screens
vs 288px default)
- Added py-px to accommodate corner accent decorations that extend 1px
outside cards

## Why It Works

- overflow-x-hidden prevents horizontal scroll regardless of content
- Larger max-height on xl screens fits 2 instances without scrollbar;
3rd triggers it
- 1px vertical padding accommodates the -top-px/-bottom-px positioned
corner accents that caused tiny overflow

## Test Plan

### Manual Testing
<img width="1190" height="868" alt="image"
src="https://github.com/user-attachments/assets/2a582328-5b4f-4490-a488-52106f2e85ef"
/>

### Automated Testing
N/A
2026-01-09 12:51:05 +00:00
Sami Khan
59e7594e34 UNKNOWN to PREPARING (#1112)
## Motivation

The "UNKNOWN" status shown when first launching an instance is confusing
and unhelpful. "PREPARING" better describes what's actually happening.

![telegram-cloud-photo-size-4-5981245965962251168-x](https://github.com/user-attachments/assets/65b0802b-fb64-4fa7-bff7-c13757035b3a)


## Changes

- Renamed status from "UNKNOWN" to "PREPARING" in dashboard
(+page.svelte)
- Renamed unknown state to preparing in macOS app
(InstanceViewModel.swift, InstanceRowView.swift)

## Why It Works

The status appears when an instance exists but runners haven't reported
status yet. "PREPARING" accurately describes this transitional state.

## Test Plan

### Manual Testing
Hardware: MacBook Pro
<img width="319" height="200" alt="image"
src="https://github.com/user-attachments/assets/9a1c3caf-026d-47ea-80d1-63c6e41d93aa"
/>

### Automated Testing
N/A
2026-01-09 11:46:51 +00:00
Chris A
c65320acd3 Fix mlx seed (#1094)
---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
2026-01-09 01:40:15 +00:00
Jake Hillion
b9a78f6f3a ci: compute CURRENT_PROJECT_VERSION from semver
Previous Sparkle builds were cut from a different repo with different
build numbers, breaking version ordering. Users aren't receiving updates
because CFBundleVersion values don't reflect the actual version sequence.

Added a step to compute the build version deterministically from semver:
PRERELEASE + (1000 * PATCH) + (1_000_000 * MINOR) + (1_000_000_000 * MAJOR).
Release versions use prerelease=999 to ensure they're always higher than
their prereleases (e.g., 1.0.61 > 1.0.61-alpha.3).

This ensures consistent version ordering across repos, allowing Sparkle
to correctly identify and deliver updates to users.

Test plan:
- Verified formula with test script:

```sh
compute_version() {
  VERSION="$1"
  BASE_VERSION="${VERSION%%-*}"
  MAJOR=$(echo "$BASE_VERSION" | cut -d. -f1)
  MINOR=$(echo "$BASE_VERSION" | cut -d. -f2)
  PATCH=$(echo "$BASE_VERSION" | cut -d. -f3)

  if [[ "$VERSION" == *-* ]]; then
    PRERELEASE_PART="${VERSION#*-}"
    PRERELEASE_NUM="${PRERELEASE_PART##*.}"
    if ! [[ "$PRERELEASE_NUM" =~ ^[0-9]+$ ]]; then
      PRERELEASE_NUM=0
    fi
  else
    PRERELEASE_NUM=999
  fi

  BUILD_VERSION=$((PRERELEASE_NUM + 1000 * PATCH + 1000000 * MINOR + 1000000000 * MAJOR))
  printf "%-20s -> %12s\n" "$VERSION" "$BUILD_VERSION"
}

compute_version "1.0.61-alpha.2"
compute_version "1.0.61-alpha.3"
compute_version "1.0.61"
compute_version "1.0.62-alpha.1"
compute_version "1.1.0-alpha.1"
compute_version "2.0.0-alpha.1"
compute_version "0.0.0-alpha.0"
compute_version "0.0.1-alpha.1"
compute_version "1.2.3"
compute_version "1.2.3-beta.5"
```

- Output:

```sh
Version              -> Build Number
----------------------------------------
1.0.61-alpha.2       ->   1000061002
1.0.61-alpha.3       ->   1000061003
1.0.61               ->   1000061999
1.0.62-alpha.1       ->   1000062001
1.1.0-alpha.1        ->   1001000001
2.0.0-alpha.1        ->   2000000001
0.0.0-alpha.0        ->            0
0.0.1-alpha.1        ->         1001
1.2.3                ->   1002003999
1.2.3-beta.5         ->   1002003005
```

- Confirmed ordering: alpha.2 < alpha.3 < release < next-alpha
2026-01-08 19:52:33 +01:00
Jake Hillion
8f7f0e893a ci: avoid uploading alpha appcasts
Currently alpha appcasts get uploaded. It turns out these overwrite the
standard appcast, so even though no one will update to the alpha
channel, everyone will miss regular updates while the latest build was
an alpha one.

Ideally we should combine the source of truth for both the alpha and
release channels, but as no one is using the alpha channel yet, let's
stop uploading it for now.

Test plan:

![eyes](https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExeGNwdDk0dmdscjlkZnd6eGxhcjJzdDBsYndmc2t2cnlpZDNxZnZhYSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/gKHGnB1ml0moQdjhEJ/giphy.gif)
2026-01-08 18:52:10 +01:00
Alex Cheema
4759b09d4c Use presigned URLs for bug report uploads (#1109)
## Motivation

Previously we hardcoded AWS credentials into the app.
This is not good practice.

## Changes

Use presigned URLs instead.

## Why It Works

Presigned URLs are an S3 feature for this kind of thing. They provide an
expiring presigned URL with certain permissions. In this case we have a
presigned URL with `s3:PutObject` permission that expires after 5
minutes. The client uses this presigned URL to upload a bug report
instead of using its own credentials to sign a request. This also
simplifies a lot of the Swift code.

## Test Plan

### Manual Testing
On a single MacBook, I downloaded the app and sent a bug report. It
worked and appeared in the bucket.
2026-01-08 17:17:48 +00:00
Alex Cheema
ca680185f3 Display RDMA debug info in macOS app. (#1072)
## Motivation

Often users are running into issues with RDMA. See
https://github.com/exo-explore/exo/issues?q=is%3Aissue%20rdma
Having some debug info in the macOS app will help to debug these issues.

## Changes

Displays output of the following commands in the debug info section of
the macOS app:

1. `rdma_ctl status`
2. `ibv_devices`
3. `ibv_devinfo`

## Why It Works

It displays RDMA debug info in the debug info section of the macOS app.

## Test Plan

### Manual Testing
We need to make a new build of the macOS app and check the output under
the following conditions:

1. No RDMA enabled.
2. RDMA enabled but no devices connected over TB5.
3. RDMA enabled and devices connected over TB5.
2026-01-08 15:17:00 +00:00
Jake Hillion
383309e24e fmt: add typescript formatting
Add typescript auto formatting with Prettier and treefmt-nix. Added a
.prettierrc to useTabs, which isn't the default, to reduce churn. The
rest looks okay and will be checked by CI.

Test plan:
- CI
2026-01-08 13:47:27 +00:00
Jake Hillion
55463a9806 fmt: add swift formatting
Swift code currently has no auto formatting. Add `swift-format` to the
`treefmt-nix` config to get this formatted.

As our existing Swift code uses 4-space indentation instead of the
default 2-space, also add a custom `.swift-format` configuration.
Test plan:
- CI
2026-01-08 13:34:45 +00:00
Evan Quiney
56af61fac9 add a server for distributed testing in /tests until we work out a stable solution. (#1098)
## Motivation

Testing multiple devices simultaneously requires coordination, and we
don't necessarily want to run a full EXO to test single components. We
need a mid-scale integration testing framework for distributed tests.

## Changes

Add a simple python server + bash query that runs Jaccl and Ring tests
without constructing a worker/master/networking. The query relies on all
devices being accessible over tailscale, currently.

## Test Plan

Manually tested RDMA + Ring inference on 2 nodes.
2026-01-08 12:50:04 +00:00
Evan Quiney
f76d543d98 We shouldn't fail on an HTTPException in the tier-2 discovery system. (#1104)
## Motivation

Fixed a crash we found

## Changes

Catch the exception and return None instead of crashing exo
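
The guard is simple; a hedged sketch (`safe_probe` is a hypothetical name, not exo's actual discovery function):

```python
def safe_probe(probe):
    # Swallow the discovery probe's exception and report "no peer found"
    # rather than letting an HTTPException take down the whole process.
    try:
        return probe()
    except Exception:
        return None
```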

## Test Plan

### Manual Testing
Exo launches. Couldn't repro the original case this arose.
2026-01-08 12:43:34 +00:00
Sami Khan
ea841aca37 local network check (#1103)
## Motivation

After machine restart, macOS local network permission can appear enabled
in System Settings but not actually work. EXO fails to discover other
machines, and the only fix is manually toggling the permission off/on
and relaunching. Users had no way to know this was happening.

## Changes

- Added LocalNetworkChecker service that detects if local network access
is actually functional
- Added warning banner with instructions and "Open Settings" button when
blocked
- Added NSLocalNetworkUsageDescription and NSBonjourServices to
Info.plist (required by macOS)

<img width="386" height="712" alt="image"
src="https://github.com/user-attachments/assets/c6fc873d-2c6a-4c9b-89cb-f7bc7322e25b"
/>

## Why It Works

Uses NWConnection to UDP multicast address 224.0.0.251:5353 (mDNS),
which is subject to the app's actual TCC permission state. Other
approaches (NWBrowser, dns-sd subprocess) either require additional
entitlements or run with their own permissions, giving false results.
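
The probe idea translates to any language: craft a minimal mDNS (DNS PTR) query and send it to the multicast group, so the OS applies the app's real permission. A Python sketch of the packet construction (the Swift code uses NWConnection instead):

```python
import socket
import struct

MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353

def mdns_query(name: bytes = b"_services._dns-sd._udp.local") -> bytes:
    # Minimal DNS PTR query for the mDNS service-enumeration name.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)  # id, flags, 1 question
    qname = b"".join(bytes([len(p)]) + p for p in name.split(b".")) + b"\x00"
    return header + qname + struct.pack("!2H", 12, 1)  # QTYPE=PTR, QCLASS=IN

# Actually sending it (not run here) would look like:
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(mdns_query(), (MDNS_GROUP, MDNS_PORT))
```

Whether the datagram goes out (or any response arrives) reflects the TCC local-network permission state, which is exactly what the checker needs to observe.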

## Test Plan

### Manual Testing
Hardware: MacBook Pro
  - Toggle local network OFF in System Settings → warning banner appears
  - Toggle local network ON → warning disappears
  - Verified detection correctly reflects actual permission state

### Automated Testing
N/A
2026-01-08 12:24:46 +00:00
rltakashige
077b1bc732 exo-bench (Benchmark model pp & tg speed) (#1099)
## Motivation

This PR implements benchmarking in the style of llama-bench. The main
difficulty here is the fact that exo is not a library - it exposes an
endpoint. This means that benchmarking numbers will be inaccurate if the
API is measured.

The solution assumes nodes are set up with uv run exo (or via the app),
and then hits the new endpoint /bench/chat/completions to retrieve
generation statistics directly from mlx_lm.

This will allow us to release benchmarks for models and perform
regression tests.

TODO: Performance benchmarking.

## Changes

- Adds /bench/chat/completions endpoint
- Adds BenchChatCompletion/Response
- Adds a logits processor to prevent response from ending early
- Adds a "Prompt Sizer" which downloads the tokenizer and dynamically
adjusts a prompt of repeated "a" tokens to fit the desired prompt size.
- Reduce prefill step size to 2048 for now (in future, dynamically
adjust this value)
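The "Prompt Sizer" idea above can be sketched as follows. This is an illustrative shape only (`size_prompt` and the tokenizer interface are assumptions, not exo's actual code): grow a prompt of repeated "a" tokens until the tokenized length reaches the target, then trim back down.

```python
from typing import Callable

def size_prompt(encode: Callable[[str], list], target_tokens: int) -> str:
    """Return a prompt whose tokenized length matches target_tokens."""
    words = ["a"]
    # Double the prompt until we reach or pass the target...
    while len(encode(" ".join(words))) < target_tokens:
        words *= 2
    # ...then trim one word at a time back down to the target.
    while len(encode(" ".join(words))) > target_tokens and len(words) > 1:
        words.pop()
    return " ".join(words)

if __name__ == "__main__":
    # Stand-in tokenizer: one token per whitespace-separated word.
    prompt = size_prompt(str.split, 128)
    print(len(prompt.split()))  # 128
```

A real tokenizer won't map words to tokens one-to-one, which is why the trim loop stops at (rather than exactly on) the target for some tokenizers.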


## Test Plan

### Manual Testing
Benchmarked Llama, Qwen, DeepSeek and Kimi models. Will require several
fixes to run consistently on all configurations (to be done in the
future).
Manually tested the normal API to verify chat requests complete as
expected.

### Automated Testing
Not really possible. Type checker passes.
2026-01-06 17:39:09 +00:00
Alex Cheema
4963c33162 Fix Discord link in README.md. Fixes #1096 (#1097)
## Motivation

Discord link expired.

## Changes

Replace discord invite link with permanent link.

## Why It Works

It's permanent now.

## Test Plan

Clicked the link. It works.
2026-01-06 14:05:09 +00:00
madanlalit
4f6fcd9e93 feat(macos-app): add custom namespace UI for cluster isolation
Add Advanced Options section with custom namespace field that allows
users to override EXO_LIBP2P_NAMESPACE environment variable. This
enables splitting machines that can see each other into separate
clusters.

- Added customNamespace property with UserDefaults persistence
- Added Advanced Options collapsible section with text field
- Added Save & Restart button that auto-restarts exo process
- Namespace replaces buildTag when custom value is set
- Falls back to buildTag (version) when namespace is empty
2026-01-05 15:25:00 +01:00
Evan Quiney
839b67f318 [feat] Add an option to disable the worker (#1091)
## Motivation

Workerless machines can be used for networking without running any gpu
jobs - add a cli flag that adds this basic functionality.

## Changes

Adds the --no-worker cli flag

## Test Plan

### Manual Testing

Exo starts as expected

### Automated Testing

None
2026-01-05 12:05:03 +00:00
Drifter4242
47b8e0ce12 feat: remember last launch settings (model, sharding, instance type) (#1028)
## Motivation

Saves the last launch settings, so that the next time you run exo it
will default to the same launch settings.
This is just a small quality of life improvement.

## Changes

On launch, the settings are saved to the browser's local storage. When
the model list is filled out, the saved settings are read and applied as
the default.

I reviewed, tested and edited the code, but some of the code was written
by Claude Opus. I hope that's ok.

## Why It Works

See above

## Test Plan

### Manual Testing

I have two Mac Studio M3 Ultras, each with 512GB RAM, connected via
Thunderbolt 5. I ran Kimi K2 Thinking with MLX Ring and Tensor Split.
I ran exo multiple times to confirm that the default works.

### Automated Testing

No changes to automated testing.
2026-01-05 11:27:14 +00:00
Evan Quiney
17f9b583a4 Task Deduplication (#1062) 2026-01-03 20:01:49 +00:00
RickyChen / 陳昭儒
844bcc7ce6 fix: prevent form submission during IME composition (#1069)
## Problem
When typing in Chinese (or other IME-based languages like
Japanese/Korean), pressing Enter to select a character from the IME
candidate list would incorrectly submit the message instead of
confirming the character selection.

## Solution
Added IME composition state detection in the `handleKeydown` function in
`ChatForm.svelte`:
- Check `event.isComposing` to detect active IME composition
- Fallback to `event.keyCode === 229` for broader browser compatibility
- Return early when IME is active, allowing normal character selection

## Changes
- Modified `dashboard/src/lib/components/ChatForm.svelte` 
- Added IME composition check before Enter key handling

Co-authored-by: Ricky Chen <rickychen@Rickys-MacBook-Pro.local>
2025-12-31 17:11:04 +00:00
Evan Quiney
c1be5184b2 Fix tests broken by 283c (#1063)
Some tests were broken by #1058 and #1046 - this fixes them.
2025-12-31 01:53:55 +00:00
Alex Cheema
1ec550dff1 Emit download progress on start, and change downloads to be keyed by model_id (#1044)
## Motivation

We added a download page to the dashboard which shows the current
download status of each model on each node. Users have reported this to
be extremely useful.

However, we don't currently fetch the download progress on start, so it
doesn't show any model's download status.

## Changes

Fetch and emit model download status on worker start, and periodically
every 5 minutes.
Also to support this, I changed download_status to be keyed by model_id
instead of shard, since we want download_status of each model, not each
shard.
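Re-keying by model means per-shard progress has to be rolled up into a single per-model figure. A minimal sketch of that aggregation (names and data shapes are assumptions, not exo's actual types):

```python
from collections import defaultdict

def aggregate_by_model(
    shard_status: dict[tuple[str, int], tuple[int, int]],
) -> dict[str, float]:
    """shard_status maps (model_id, shard_index) -> (downloaded_bytes, total_bytes).

    Returns per-model download fraction in [0, 1].
    """
    downloaded: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for (model_id, _shard), (done, size) in shard_status.items():
        downloaded[model_id] += done
        total[model_id] += size
    return {m: downloaded[m] / total[m] for m in total if total[m] > 0}
```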

## Why It Works

The dashboard already implements the correct functionality, we just
weren't populating the download status in the state. Now it gets
populated and shows correctly.

## Test Plan

### Manual Testing
On a cluster of 2 x 512GB M3 Ultra Mac Studio, I launched an instance
onto one node that hadn't been downloaded. I checked the download page
and it showed the in progress download. I downloaded it to completion,
restarted exo on both nodes, and then opened the download page and it
showed the model as 100% downloaded and other models as 0% that hadn't
been downloaded.

---------

Co-authored-by: Evan <evanev7@gmail.com>
2025-12-31 01:18:10 +00:00
Alex Cheema
283c0e39e4 Placement filters for tensor parallel supports_tensor, tensor dimension and pipeline parallel deepseek v3.1 (#1058)
## Motivation

Certain placements are not valid, but invalid placement previews were being shown in the dashboard and would then fail when the user actually tried to launch an instance with that placement. Added filters to exclude these placements.


## Changes

Three filters added:

1. Certain models do not support tensor parallel at all. Checks `supports_tensor` on the model_meta.
2. For models that do support tensor parallelism, certain tensor parallel sizes are not valid. This check is not strictly correct yet, but works for now; the fully correct check is more involved.
3. For unknown reasons, deepseek v3.1 (8-bit) does not work with tensor parallelism.

## Why It Works

`place_instance` now raises an `Exception` for invalid placements.

## Test Plan

### Manual Testing
Since `/instance/previews` enumerates all possible placements and runs `place_instance`, I checked the dashboard to see if invalid placements are still shown.
2025-12-31 00:33:40 +00:00
Alex Cheema
35be4c55c3 prioritise mlx jaccl coordinator ip (en0 -> en1 -> non-TB5 -> other) 2025-12-31 00:10:19 +00:00
Alex Cheema
31d4cd8409 set KV_CACHE_BITS to None to disable quantized kv cache 2025-12-31 00:03:30 +00:00
Alex Cheema
8a6da58404 remove mx.set_cache_limit 2025-12-30 23:58:15 +00:00
Alex Cheema
16e2bfd3b3 log EXO_LIBP2P_NAMESPACE on start 2025-12-30 04:08:47 +00:00
Alex Cheema
ade3ee7ec5 fix warmup order. should be rank!=0 then rank=0 2025-12-30 03:29:34 +00:00
Evan Quiney
fea42473dd Place local node at the top of the dashboard. (#1033)
@samiamjidkhan and @AlexCheema's work moving the topology to place the
local node at the top of the topology in the app dashboard.
2025-12-28 21:12:47 +00:00
Alex Cheema
ca7adcc2a8 Update README.md with instructions to enable RDMA. (#1031)
## Motivation

We didn't have instructions for enabling RDMA on macOS.

## Changes

I added instructions for enabling RDMA on macOS.

## Why It Works

Tried it on my M4 Max MacBook Pro and works.

## Test Plan

### Manual Testing
Tried it on my M4 Max MacBook Pro and works.

### Automated Testing
In the future, we could automate this from fresh macOS builds using KVM
over IP. See #1030
2025-12-28 20:56:26 +00:00
Evan Quiney
9d9e24f969 some dashboard updates (#1017)
Mostly @samiamjidkhan and @AlexCheema's work in progress.

---------

Co-authored-by: Sami Khan <smsak99@gmail.com>
Co-authored-by: Alex Cheema
2025-12-28 20:50:23 +00:00
Jake Hillion
b5d424b658 placement: generate per-node host lists for MLX ring backend
Pipeline + MLX Ring worked with 2 nodes but failed to initialize with
3 or more nodes. The MLX ring backend requires each node to know its
specific left and right neighbors in the ring, but the previous
implementation provided a single flat host list shared by all nodes.

With 2 nodes, a flat list [host0, host1] accidentally worked because
each node could find its only neighbor. With 3+ nodes, each node needs
a customized view:
- Rank 0: [self, right_neighbor, placeholder]
- Rank 1: [left_neighbor, self, right_neighbor]
- Rank 2: [placeholder, left_neighbor, self]

Changed MlxRingInstance from `hosts: list[Host]` to
`hosts_by_node: dict[NodeId, list[Host]]` with `ephemeral_port: int`.

Added `get_mlx_ring_hosts_by_node()` which generates per-node host
lists where:
- Self position uses 0.0.0.0 for local binding
- Left/right neighbors use actual connection IPs
- Non-neighbors use 198.51.100.1 (RFC 5737 TEST-NET-2 placeholder)
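The per-rank scheme above can be sketched as a small function (illustrative names only, not exo's actual API): each rank sees itself as 0.0.0.0 for local binding, its adjacent ranks by real IP, and everyone else as the placeholder.

```python
PLACEHOLDER = "198.51.100.1"  # RFC 5737 TEST-NET-2, never routed

def ring_hosts_by_node(ips: list[str], port: int) -> dict[int, list[str]]:
    """Generate each rank's customized view of the ring host list."""
    n = len(ips)
    out: dict[int, list[str]] = {}
    for rank in range(n):
        hosts = []
        for other in range(n):
            if other == rank:
                hosts.append(f"0.0.0.0:{port}")        # self: bind locally
            elif abs(other - rank) == 1:
                hosts.append(f"{ips[other]}:{port}")   # left/right neighbor
            else:
                hosts.append(f"{PLACEHOLDER}:{port}")  # non-neighbor
        out[rank] = hosts
    return out
```

With 2 nodes every entry is either self or a neighbor, which is why the old flat list accidentally worked; with 3+ nodes the placeholder entries appear.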

Also added IP prioritization (en0 > en1 > non-Thunderbolt > any) to
prefer stable network interfaces.

Fixed topology discovery recording loopback addresses (127.0.0.1) as
valid connections to remote nodes. The reachability check now verifies
node identity via HTTP GET /node_id rather than just checking if the
port is open.

Test plan:

- Built a DMG [0]
- Installed on all Macs and started cluster.
- Requested a 3 node Pipeline + MLX Ring Llama 3.3 70B (FP16).
- It started and I was able to send a few chat messages.

Eventually my instance seemed to get into a broken state and chat
stopped working, but this commit is a clear step forward.

[0] https://github.com/exo-explore/exo/actions/runs/20473983471/job/58834969418
2025-12-28 20:38:20 +00:00
Drifter4242
b465134012 Fix Kimi K2 Thinking download by adding tiktoken.model to download patterns (#1024)
Kimi-K2 Thinking uses tiktoken.model for its tokenizer, which wasn't
being downloaded. This adds it to the default_patterns alongside
tokenizer.model.
I'm a bit confused why this isn't a problem for other people - I know
that others have used Kimi K2 (I wonder if they manually fixed the
download).

## Motivation

I downloaded Kimi K2 Thinking and it didn't work because it didn't
download tiktoken.model file.

## Changes

Added tiktoken.model to the default patterns.
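To illustrate why the file was skipped: download allow patterns are glob-matched against repo filenames, and `tiktoken.model` matches neither `tokenizer.model` nor the wildcard patterns. A sketch with assumed pattern values (not exo's real configuration):

```python
from fnmatch import fnmatch

def allowed(filename: str, patterns: list[str]) -> bool:
    """True when the filename matches any download allow pattern."""
    return any(fnmatch(filename, p) for p in patterns)

old_patterns = ["*.json", "*.safetensors", "tokenizer.model"]
new_patterns = old_patterns + ["tiktoken.model"]
```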

## Why It Works

Now downloads the file.

## Test Plan

### Manual Testing

I have two Mac Studio M3 Ultras, each with 512GB RAM, connected via
Thunderbolt 5. I ran Kimi K2 Thinking with MLX Ring and Tensor Split. It
ran successfully.

### Automated Testing
No automated test changes. I don't think they are needed.
2025-12-28 19:30:31 +00:00
Matiwos Kebede
eabdcab978 Fix linux docs (#1022)
This PR updates the "Run from Source (Mac & Linux)" section in README.md
to clarify Linux instructions.

Changes include:
- Split the section into macOS and Linux subsections.
- Added native Linux package manager commands (apt, dnf, pacman) for
dependencies: uv, node, npm.
- Clarified that macmon is macOS-only.
- Noted that Homebrew on Linux is optional, with native package managers
preferred.

These changes improve clarity for Linux users and fix confusion from the
previous macOS-centric instructions.
2025-12-27 19:56:44 +00:00
Evan Quiney
8e9332d6a7 Separate out the Runner's behaviour into a "connect" phase and a "load" phase (#1006)
## Motivation

We should ensure all runners are connected before loading the model -
this gives us finer grained control in the future for the workers
planning mechanism over the runners state.

## Changes

- Introduced task ConnectToGroup, preceding LoadModel
- Introduced runner statuses Idle, Connecting, Connected
- Separated out initialize_mlx from shard_and_load
- Single instances never go through the connecting phase

## Test Plan

### Automated Testing
Added a test for checking event ordering in a standard workflow.

### Manual Testing
Tested Llama 3.2 1b and Kimi K2 Thinking loads and shuts down repeatedly
on multiple configurations.
Not exhaustive, however.

---------

Co-authored-by: rltakashige <rl.takashige@gmail.com>
2025-12-27 16:28:42 +00:00
Heath Dutton🕴️
4b65d5f896 Fix race condition in mlx_distributed_init with concurrent instances (#1012)
## Motivation

Fixes #1005

When multiple instances initialize concurrently with the same rank, they
overwrite each other's coordination files (hosts_{rank}.json), causing
"[jaccl] Malformed device file" errors and initialization failures.

## Changes

- Changed coordination filename from `./hosts_{rank}.json` to
`./hosts_{instance_id}_{rank}.json` to make it unique per instance
- Added cleanup in a finally block to remove coordination files after
initialization completes
- Applied fix to both MlxRingInstance and MlxJacclInstance cases

## Why It Works

Each instance now gets a unique coordination file based on its
instance_id, preventing concurrent instances from overwriting each
other's files. The cleanup logic ensures files are removed after use,
preventing accumulation and handling both success and failure cases.
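The fix boils down to a unique-per-instance filename plus cleanup in a `finally` block. A minimal sketch (the filename format follows the commit; the helper shapes are assumptions):

```python
import json
import os
from typing import Callable

def coordination_path(instance_id: str, rank: int) -> str:
    # Unique per instance *and* rank, so concurrent instances don't collide.
    return f"./hosts_{instance_id}_{rank}.json"

def with_coordination_file(instance_id: str, rank: int, hosts: list[str],
                           init_fn: Callable[[str], None]) -> None:
    """Write the coordination file, run init, and always clean up."""
    path = coordination_path(instance_id, rank)
    with open(path, "w") as f:
        json.dump(hosts, f)
    try:
        init_fn(path)  # stand-in for the distributed init that reads the file
    finally:
        os.remove(path)  # removed on success and failure alike
```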

## Test Plan

### Manual Testing
Code review and logic verification. The fix prevents the race condition
by ensuring filename uniqueness per instance.

### Automated Testing
No new tests added. Existing tests continue to pass.

---------

Co-authored-by: Ryuichi Leo Takashige <rl.takashige@gmail.com>
2025-12-27 16:13:26 +00:00
Jake Hillion
1c1792f5e8 mlx: update to 0.30.1 and align coordinator naming with MLX conventions
The Jaccl distributed backend requires MLX 0.30.1+, which includes the
RDMA over Thunderbolt support. The previous minimum version (0.29.3)
would fail at runtime with "The only valid values for backend are
'any', 'mpi' and 'ring' but 'jaccl' was provided."

Bump MLX dependency to >=0.30.1 and rename ibv_coordinators to
jaccl_coordinators to match MLX's naming conventions. This includes
the environment variable change from MLX_IBV_COORDINATOR to
MLX_JACCL_COORDINATOR.

Test plan:

Hardware setup: 3x Mac Studio M3 Ultra connected all-to-all with TB5

- Built a DMG [0]
- Installed on all Macs and started cluster.
- Requested a 2 node Tensor + MLX RDMA instance of Llama 3.3 70B (FP16).
- It started successfully.
- Queried the chat a few times. All was good. This didn't work
  previously.
- Killed the instance and spawned Pipeline + MLX Ring Llama 3.3 70B (FP16).
  Also started successfully on two nodes and could be queried.

Still not working:
- Pipeline + MLX Ring on 3 nodes is failing. Haven't debugged that yet.

[0] https://github.com/exo-explore/exo/actions/runs/20467656904/job/58815275013
2025-12-24 16:47:01 +00:00
Jake Hillion
9afc1043ef exo: handle -c flag for multiprocessing helpers in frozen apps
When Python's multiprocessing spawns child processes on macOS (using the
"spawn" method), it also spawns helper processes like the resource tracker
by executing:

    ./frozen_app -c "from multiprocessing.resource_tracker import main; main()"

A frozen PyInstaller app doesn't understand `-c` natively - it just runs
main(). This causes the resource tracker to fail silently.

This adds a minimal `-c` handler that intercepts the flag, extracts the
inline code, and exec()s it before main() runs. This is required for the
Process() spawn in runner_supervisor.py to work correctly in the DMG.

Note that the pyinstaller docs say `freeze_support` is supposed to make
this work, but it doesn't.
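The handler described above can be sketched roughly like this (an assumed shape, not the exact exo code): inspect argv before main() runs, and if the process was spawned with `-c`, exec the inline code instead.

```python
def handle_dash_c(argv: list[str]) -> bool:
    """If argv looks like ['prog', '-c', code, ...], exec the inline code.

    Returns True when the -c path was taken (caller should exit instead of
    running main()).
    """
    if len(argv) >= 3 and argv[1] == "-c":
        # Run the code the way `python -c` would, as the __main__ module.
        exec(argv[2], {"__name__": "__main__"})
        return True
    return False
```

In the frozen entry point this would run first, e.g. `if handle_dash_c(sys.argv): sys.exit(0)`, so helper processes like the resource tracker execute their inline code rather than falling through to main().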

Test plan:

Hardware setup: 3x Mac Studio M3 Ultra connected all-to-all with TB5

- Built a DMG[0].
- Installed on the Macs.
- Started an instance. Got an error this time in ~/.exo/exo.log. The
  last DMG from main doesn't show anything when an instance starts, this
  now shows the errors.

[0] https://github.com/exo-explore/exo/actions/runs/20464409279/job/58804485197
2025-12-23 17:08:50 +00:00
Evan Quiney
70c423f5e0 feat: conform to XDG Base Directory Specification on Linux (#988)
This is an extension of #964 with some cleanup.

---------

Co-authored-by: majiayu000 <1835304752@qq.com>
2025-12-23 17:02:55 +00:00
Jake Hillion
a24bdf7680 exo: enable multiprocessing support in PyInstaller bundles
Model loading fails silently when running from the DMG-packaged app,
despite working correctly with `uv run exo`. The bundled app spawns
child processes for model inference via multiprocessing, but these
processes fail to start in a frozen (PyInstaller) environment.

Add `freeze_support()` which is required for multiprocessing to work
in frozen applications.
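This is the standard pattern from the multiprocessing docs; `main` here stands in for exo's real entry point.

```python
from multiprocessing import freeze_support

def main() -> None:
    print("app running")

if __name__ == "__main__":
    freeze_support()  # no-op in a normal interpreter; required in frozen apps
    main()
```

`freeze_support()` must be called before any other multiprocessing use, which is why it goes first in the entry point.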

Test plan:

Hardware setup: 3x Mac Studio M3 Ultra connected all-to-all with TB5

- Built a DMG using a modified .github/workflows/build-app.yml[0] to avoid
  publishing it.
- Installed on all 3 Macs, replacing the existing Exo.
- Downloaded Llama 3.3 70B (FP16).
- Downloaded Qwen3 Coder 235B A22B (8-bit).

Things that work now but didn't on the previous app:
- Topology looks good, previously there was no discovery.

What didn't work:
- Started an instance with Pipeline + MLX Ring + 3 Nodes. Failed.
- Started an instance with Tensor + MLX RDMA + 2 Nodes. Failed.

Will continue debugging the instance starting issues separately.

[0] https://github.com/exo-explore/exo/actions/runs/20461320368
2025-12-23 14:34:21 +00:00
Jake Hillion
e8855959c1 build-app: add branch trigger from named branch
As I've been working on the .dmg, it's become clear we need a way to
test changes to the app. It's too hard to reproduce the full DMG locally
to be reasonable and much more convenient to test if it's signed.

Add a feature to the build-app workflow where if you push specifically
to the `test-app` branch it'll perform a build. The version is stubbed
to `0.0.0-alpha.0`, which is about as low as it gets in semver so you'll
always update away from it automatically with Sparkle. The resulting DMG
won't be pushed to S3 but will be uploaded as a GitHub Actions artifact.

I've been using similar commits to this for a while for testing. It's
worked well and not interfered with auto updating at all.

Test plan:
- Pushed this change to `test-app`.
- Generated action at
  https://github.com/exo-explore/exo/actions/runs/20447213358/job/58752909332
- Installed the DMG on a Mac. It worked as intended.
2025-12-23 12:53:30 +00:00
Jake Hillion
0a7fe5d943 ci: migrate build-app to github hosted runners 2025-12-22 19:51:48 +00:00
rltakashige
51a5191ff3 format readme (#978)
## Motivation

README looks weird after last update. 

## Test Plan

### Manual Testing
I actually checked the file on GitHub this time.
2025-12-22 18:06:27 +00:00
Evan Quiney
1efbd26388 add architecture.md, move images to docs/imgs (#968)
## Motivation

Documentation will make contribution easier and communicate our
development philosophy and decision process. Closes #967

## Changes

Added `architecture.md` to docs/ and moved the images out of docs and
into their own docs/imgs/ folder
2025-12-22 17:57:43 +00:00
Jake Hillion
02c915a88d pyproject: drop pathlib dependency 2025-12-22 17:52:44 +00:00
rltakashige
fc41bfa1f1 Add all prerequisites to README (#975)
## Motivation

Addresses #974 
```
INFO: pip is looking at multiple versions of exo to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement exo-pyo3-bindings (from exo) (from versions: none)
ERROR: No matching distribution found for exo-pyo3-bindings
```

## Changes

Describes Rust dependency for building from source


## Test Plan

### Manual Testing
Tested locally and runs after this setup without exo-pyo3-bindings error

2025-12-22 17:38:51 +00:00
Jake Hillion
dd0638b74d pyproject: add pyinstaller to dev-dependencies 2025-12-22 15:49:27 +00:00
majiayu000
e06830ce0b fix: update macOS app to use correct API port (52415)
Fixes #960

The macOS app was incorrectly using port 8000 instead of the default
exo API port 52415. This caused confusion as the README correctly
documents port 52415 but the app was connecting to a different port.
2025-12-22 13:24:09 +00:00
Jake Hillion
1df5079b98 ci: avoid pushing alpha build as latest 2025-12-22 13:00:49 +00:00
Nightguarder
1e75aeb2c2 Add Prerequisites to Readme (#936)
## Motivation
Users need to know what **prerequisites** they need in order to run exo.
Simple addition to docs prevents future raised issues.

## Changes

Updated ``README.md``:
- to include installation instructions for
**[uv](https://github.com/astral-sh/uv)** and
**[macmon](https://github.com/vladkens/macmon)**.

Updated ``CONTRIBUTING.md``:
-  to verify these prerequisites are met before starting development.

- Standardized on brew installation instructions for macOS users to keep
the guide simple.

## Why It Works

By listing these prerequisites upfront, users will set up their
environment correctly before attempting to run exo.

## Test Plan

### Manual Testing
MacBook Pro M4
- Verified that ``uv`` and ``macmon`` were missing initially, causing
failures
- After installing them via brew (as documented), ``uv run exo`` starts
successfully.


---------

Co-authored-by: Evan Quiney <evanev7@gmail.com>
2025-12-22 02:28:08 +00:00
Heath Dutton🕴️
c582bdd673 bugfix: Handle MacMon errors gracefully 2025-12-22 02:21:29 +00:00
Jake Hillion
1bae8ebbf6 ci: add build-app workflow 2025-12-22 02:12:30 +00:00
Alex Cheema
abaeb0323d Update README.md. (#956)
## Motivation

Made a mistake on the merge of the last PR.
2025-12-21 23:09:44 +00:00
Alex Cheema
7d15fbdaab readme tweaks5 (#954)
2025-12-21 22:48:35 +00:00
Alex Cheema
4a6e0fe171 Update README.md. (#949)
2025-12-21 18:31:23 +00:00
Olimbek Nizomov
f4792dce14 fix(downloads): use certifi for robust SSL certificate verification (#941)

## Description
This change updates the SSL context creation in `download_utils.py` to
explicitly use the `certifi` CA bundle. This ensures that the
application has access to a reliable, up-to-date set of root
certificates, which is critical for verifying SSL connections to
external services like Hugging Face.
## Problem
On macOS environments (and potentially others), Python's default SSL
context often fails to locate the system's root certificates. This leads
to \`aiohttp.client_exceptions.ClientConnectorCertificateError\` errors
when attempting to download models.

## Solution
By passing \`cafile=certifi.where()\` to
\`ssl.create_default_context()\`, we force the application to use the
trusted certificate store provided by the \`certifi\` package. This is a
standard best practice for cross-platform Python applications and
resolves the verification failure.
2025-12-21 12:03:52 +00:00
rltakashige
a1b14a272e Extend eos_token_id fix for other models (#938)
## Motivation

We currently use mlx_lm's load_tokenizer instead of load. This means
that some models are missing some configurations, such as eos_token_id.
This is clear for a model like GLM, which does not finish token
generation.

## Changes

A small stopgap, to allow eos_token_ids to be added, and a TODO for us
to migrate to load. The reason we don't want to do this now is that a
solid testing framework is not configured in this repo yet.

## Why It Works

It just uses the eos_token_ids I obtained by loading a tokenizer in
mlx_lm and calling `tokenizer.eos_token_ids`.
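The stopgap's shape can be sketched like this. The table below uses placeholder model ids and token ids for illustration only; the real values come from mlx_lm's full `load`.

```python
# Placeholder table; real entries would come from mlx_lm's full `load`.
EXTRA_EOS_TOKEN_IDS: dict[str, set[int]] = {
    "example-org/some-glm-model": {100, 101},  # placeholder values
}

def patched_eos_ids(model_id: str, base: set[int]) -> set[int]:
    """Union the tokenizer's own eos ids with any known-missing extras."""
    return base | EXTRA_EOS_TOKEN_IDS.get(model_id, set())
```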

## Test Plan

### Manual Testing
Tested on several Macs.

### Automated Testing
None yet, as described.

---------

Co-authored-by: Evan <evanev7@gmail.com>
2025-12-20 20:18:17 +00:00
Alex Cheema
f8483cfc18 Update README.md. (#932)
2025-12-19 21:23:25 +00:00
Alex Cheema
8bafd6fe68 Update README.md (#925)
2025-12-19 14:38:40 +00:00
Jake Hillion
f16afd723d nix: get rust build working on linux 2025-12-19 13:51:15 +00:00
Alex Cheema
4da0043253 Update README.md (#917) 2025-12-18 20:38:00 +00:00
Jake Hillion
9e2bdeef92 LICENSE: Fix company name/year 2025-12-18 20:24:44 +00:00
Jake Hillion
379744fe5c exo: open source mac app and build process 2025-12-18 20:06:03 +00:00
Jake Hillion
74bae3ba6d Update README.md 2025-12-18 19:18:59 +00:00
Evan Quiney
9815283a82 8000 -> 52415 (#915)
* 8000 -> 52415

* dont grab the api port for placement

---------

Co-authored-by: rltakashige <rl.takashige@gmail.com>
2025-12-18 18:39:44 +00:00
Evan Quiney
5bd39e84d9 Merge pull request #914 from exo-explore/remove-old-cli-flag
remove old tb_only flag from master
2025-12-18 18:30:45 +00:00
Evan
658cf5ccf9 remove tb_only from master 2025-12-18 17:39:02 +00:00
rltakashige
170d2dcbaf Add Windows as a potential planned platform 2025-12-18 17:33:25 +00:00
Evan Quiney
ba66f14299 Merge pull request #912 from exo-explore/update-dashboard-error-message 2025-12-18 17:12:28 +00:00
Evan
274e35f926 update readme 2025-12-18 17:05:35 +00:00
Evan
3fe7bd250f update error message 2025-12-18 17:02:52 +00:00
Evan
004fea6935 clarify platform support 2025-12-18 16:27:43 +00:00
Evan
5c2d254fd1 add platform support information 2025-12-18 15:45:53 +00:00
Jake Hillion
19ca48c4f1 more readme fixups 2025-12-18 14:47:04 +00:00
Jake Hillion
57d3813692 re-add LICENSE 2025-12-18 14:35:40 +00:00
Evan
7cd1527ce3 update CONTRIBUTING 2025-12-18 14:35:20 +00:00
Evan Quiney
423c066ecc Merge pull request #906 from exo-explore/jj/sluxkvlmwons
re-add logos
2025-12-18 14:29:29 +00:00
Jake Hillion
ebf0e18c0e re-add logos 2025-12-18 14:26:27 +00:00
Evan
28a6151b8e remove discord link from README 2025-12-18 14:02:38 +00:00
Jake Hillion
2c16e00be9 github docs 2025-12-18 13:49:07 +00:00
Jake Hillion
f64d17fac0 exo v1 2025-12-18 13:46:40 +00:00
Jake Hillion
0fcee70833 prep repo for v1 2025-12-17 15:31:02 +00:00
Evan Quiney
09593c5e85 backport the dashboard to staging 2025-12-17 12:22:22 +00:00
Evan Quiney
880a18d205 fix disconnects
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
2025-12-15 15:23:13 +00:00
rltakashige
70298ce0a9 Negative index nack request 2025-12-09 07:57:28 -08:00
Jake Hillion
ac3a0a6b47 ci: enable ruff check in CI through nix 2025-12-09 12:26:56 +00:00
rltakashige
859233a279 Reduce RequestEventLog spam 2025-12-09 11:43:54 +00:00
Evan Quiney
c9e2062f6e switch from uvicorn to hypercorn 2025-12-05 17:29:06 +00:00
Jake Hillion
e8566a3f95 placement: pass different ibv_coordinator per node 2025-12-05 17:23:22 +00:00
Jake Hillion
39d76aa0a5 nix: move formatting checks to nix and enable in ci 2025-12-05 17:00:33 +00:00
Jake Hillion
5629983809 fmt: format all python/rust/nix files 2025-12-05 16:58:55 +00:00
Evan Quiney
7312a7e000 plan fix 2025-12-05 16:43:11 +00:00
Evan Quiney
9e0a1c23ef rename ibv to jaccl inline with mlx 2025-12-05 16:42:43 +00:00
Evan Quiney
f5783d6455 proper collection of rdma ports in placement 2025-12-05 16:42:20 +00:00
Evan Quiney
e702313b32 pingers
Co-authored-by: Jake Hillion <jake@hillion.co.uk>
2025-12-05 16:41:19 +00:00
Evan
a3f8ecba9e prioritise LL4 2025-12-05 15:08:18 +00:00
Jake Hillion
5ef1df1e10 rust: move Cargo.toml to the root 2025-12-05 12:01:44 +00:00
Evan
40a0d47de8 jaccl 2025-12-03 13:53:12 +00:00
rltakashige
2b243bd80e Consolidate!!! Fixes 2025-12-03 12:19:25 +00:00
Evan Quiney
10c905c8dd worker no longer gets stuck after shutdown 2025-12-02 11:35:02 +00:00
Evan
93f699b660 add aarch64-linux for the spark 2025-11-28 11:08:18 +00:00
Alex Cheema
b43d30563d todo for layer-independent parameters in get_allow_patterns 2025-11-27 19:26:02 +00:00
Alex Cheema
20d73e90cd fix dashboard case sensitive model id 2025-11-26 18:16:32 +00:00
Alex Cheema
e56daa7c23 render download progress properly 2025-11-26 11:48:30 +00:00
Alex Cheema
63c85e1298 get rid of spammy Finished tokenizing log 2025-11-25 13:02:06 +00:00
Evan
7088988a65 bump pyo3 stub-gen 2025-11-25 12:13:53 +00:00
rltakashige
7b3e3fd66c Worker tests 2 2025-11-21 16:42:52 +00:00
rltakashige
de50811313 Worker tests on staging 1
Test plan
2025-11-21 15:22:40 +00:00
rltakashige
b45cbdeecd Consolidate cleanup 2025-11-21 14:54:02 +00:00
rltakashige
28a91787e8 Demo
Co-authored-by: Evan <evanev7@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-11-20 20:03:51 +00:00
Alex Cheema
d793f5f96c fix kimi eos token ids 2025-11-13 18:39:14 +00:00
Evan Quiney
b62f68474a improved master error handling
Co-authored-by: Ryuichi Leo Takashige <rl.takashige@gmail.com>
2025-11-11 18:04:40 +00:00
Alex Cheema
631cb81009 kimi k2 thinking 2025-11-11 18:03:39 +00:00
Evan Quiney
364087b91f five billion percent better shutdown handling 2025-11-11 17:43:53 +00:00
Evan Quiney
aa519b8c03 Worker refactor
Co-authored-by: rltakashige <rl.takashige@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-11-10 23:31:53 +00:00
Alex Cheema
9058b117c0 pipeline parallel fix 2025-11-08 02:19:19 +00:00
rltakashige
612f58c78d Revert dumb merge mistake 2025-11-07 02:39:08 +00:00
Evan
6bcac37d98 stop benching on all pushes 2025-11-06 22:26:30 +00:00
rltakashige
ff00b165c5 MLX LM type stubs 2025-11-06 21:59:29 +00:00
Alex Cheema
19e90572e6 set max_transmit_size on gossipsub to 1MB. Fixes large message error 2025-11-06 19:18:48 +00:00
Alex Cheema
e60681963f show ips on dashboard 2025-11-06 19:18:07 +00:00
rltakashige
0bb621b653 Add mlx nn stubs 2025-11-06 11:59:37 +00:00
Alex Cheema
699fd9591e fix exo scripts 2025-11-05 21:47:08 -08:00
rltakashige
6bbb6344b6 mlx.distributed.Group type stubs 2025-11-06 05:26:04 +00:00
rltakashige
16f724e24c Update staging 14
Co-authored-by: Evan <evanev7@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: David Munha Canas Correia <dmunha@MacBook-David.local>
Co-authored-by: github-actions bot <github-actions@users.noreply.github.com>
2025-11-05 01:44:24 +00:00
Evan Quiney
3b409647ba Squash merge merging_clusters into tensor_parallel94 2025-10-31 17:41:57 +00:00
Alex Cheema
d46c7e6a76 fix race condition with downloads where it cancels the download before renaming 2025-10-30 19:03:23 -07:00
rltakashige
91c635ca7a Update mlx and mlx-lm packages
Co-authored-by: Evan <evanev7@gmail.com>
2025-10-31 01:34:43 +00:00
Alex Cheema
5f18faec17 Update. 2025-10-30 11:59:59 -07:00
Alex Cheema
a346af3477 download fixes 2025-10-22 11:56:52 +01:00
Alex Cheema
56f783b38d Update. 2025-10-21 17:29:48 +01:00
Evan Quiney
363c98a872 leaf placement
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-10-15 12:47:26 +01:00
Evan Quiney
f25689d9c2 fix a race condition 2025-10-15 10:49:53 +01:00
Evan Quiney
1c6b5ce911 new tagged union
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Sorry Andrei!
2025-10-10 16:22:09 +01:00
Alex Cheema
76ed8a516b typecheck on ubuntu with install-nix-action
Co-authored-by: Evan <evanev7@gmail.com>
2025-10-10 16:15:39 +01:00
Evan Quiney
e8a6efe281 add kimi k2 2025-10-07 17:17:06 +01:00
Evan Quiney
a4e8335241 add just clean 2025-10-07 16:29:51 +01:00
Alex Cheema
84dfc8a738 Fast memory profiling
Co-authored-by: Evan <evanev7@gmail.com>
2025-10-07 16:23:51 +01:00
Alex Cheema
e01f9cf739 Disable build macos app 2025-10-07 15:39:15 +01:00
Alex Cheema
35ab6b376e fix: master tests
Co-authored-by: Evan <evanev7@gmail.com>
2025-10-07 15:36:05 +01:00
Evan Quiney
962e5ef40d version bump for brew consistency 2025-10-07 15:18:54 +01:00
Evan Quiney
b1721e941b nix cleanup 2025-10-01 09:47:00 +01:00
Evan Quiney
22f0ca2a59 FIX: OpenWebUI compat 2025-09-30 16:28:38 +01:00
Evan Quiney
57486a4305 kill go
Farewell Gelu, Chief Lunch Officer
2025-09-30 11:10:55 +01:00
Evan Quiney
38ff949bf4 big refactor
Fix. Everything.

Co-authored-by: Andrei Cravtov <the.andrei.cravtov@gmail.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
2025-09-30 11:03:04 +01:00
Matt Beton
7040c9508f Multiprocessing Runner 2025-09-17 09:31:49 +01:00
Matt Beton
35c4311587 Dashboard Status & Bugfixes 2025-08-29 17:34:17 +01:00
Matt Beton
a33787f5fd Prompt length 2025-08-29 16:07:36 +01:00
Matt Beton
1b8b456ced full mlx caching implementation 2025-08-26 17:15:08 +01:00
Matt Beton
84c90a6d35 feat: mlx memory cache for faster ttft
Co-authored-by: Evan <evanev7@gmail.com>
Co-authored-by: s17 <s17@s17s-Mac-Studio.local>
2025-08-26 13:05:42 +01:00
Evan Quiney
5efe5562d7 feat: single entrypoint and logging rework 2025-08-26 11:08:09 +01:00
Andrei Cravtov
ef5c5b9654 changes include: ipc, general utilities, flakes stuff w/ just, autopull script 2025-08-25 17:33:40 +01:00
Alex Cheema
5bfc99b415 add EXO logo to dashboard 2025-08-25 16:41:13 +01:00
Evan Quiney
11f8b4ef33 tidy: fix justfile, run.sh, run formatter 2025-08-21 18:44:53 +01:00
Evan Quiney
be6f5ae7f1 feat: build system and homebrew compatibility 2025-08-21 16:07:37 +01:00
Evan Quiney
40efed4436 unvendored macmon 2025-08-20 13:04:46 +01:00
Gelu Vrabie
ea9e573409 Refactor runner supervisor
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-08-18 18:37:52 +01:00
Gelu Vrabie
345fafd80d Forwarder versioning
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-08-18 15:08:50 +01:00
Evan Quiney
ea3eeea826 improved go caching with nix
Co-authored-by: Gelu Vrabie <gelu.vrabie.univ@gmail.com>
2025-08-15 15:24:58 +01:00
Gelu Vrabie
a2a37c0ebe discovery fixed
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-08-15 15:23:20 +01:00
Gelu Vrabie
57073f35c3 collection of fixes for Shanghai demo
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-08-15 15:21:51 +01:00
Andrei Cravtov
7e19804aa5 Integrate flake parts 2025-08-13 09:55:22 +01:00
Matt Beton
dbcd09aa53 No 70b 2025-08-12 18:42:27 +01:00
Matt Beton
c1d5b381f4 70B model unit test only runs if its downloaded 2025-08-07 10:41:56 +01:00
Alex Cheema
473512ddd0 r1 size 2025-08-04 22:57:31 +08:00
Alex Cheema
817c5993f0 fix dem model cards yo 2025-08-04 22:56:06 +08:00
Gelu Vrabie
75ecda55a9 fix gitignore
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
2025-08-04 13:49:49 +01:00
Alex Cheema
c560c55c4e build and release on staging 2025-08-04 07:41:09 +08:00
Sami Khan
f51f8f72f8 app launches python modules 2025-08-04 06:18:31 +08:00
Seth Howes
407796d18f Minor dashboard fixes 2025-08-04 06:15:01 +08:00
Alex Cheema
6daf7f31f7 clean model cards 2025-08-04 05:31:30 +08:00
Alex Cheema
f352ddfc5f run configure_mlx.sh in run.sh 2025-08-04 03:59:42 +08:00
Alex Cheema
6855a7727d set a 15 sec timeout for getting initial download progress 2025-08-03 20:37:20 +08:00
Matt Beton
1fe4ed3442 Worker Exception & Timeout Refactor
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
2025-08-02 08:28:37 -07:00
Alex Cheema
92c9688bf0 Remove rust 2025-08-02 08:16:39 -07:00
Sami Khan
a46f8c3cd1 app
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-08-01 19:14:27 -07:00
Seth Howes
71bafabc63 Dashboard with instances 2025-08-01 14:38:07 +01:00
Gelu Vrabie
0e32599e71 fix libp2p + other prs that were wrongly overwritten before (111,112,117,118,1119 + misc commits from Alex)
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>
Co-authored-by: Seth Howes <71157822+sethhowes@users.noreply.github.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-31 20:36:47 +01:00
Alex Cheema
2031d9481d fix api get_state 2025-07-30 07:15:15 -07:00
Matt Beton
b350ededb2 Test Supervisor Errors. 2025-07-30 13:30:54 +01:00
Gelu Vrabie
ff3d11c748 just run
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-29 16:58:27 +01:00
Gelu Vrabie
25fa46c6f6 Update CODEOWNERS 2025-07-29 13:08:29 +01:00
Seth Howes
3f192f20cc Reinstate dashboard 2025-07-28 23:18:23 +01:00
Alex Cheema
a2b4093d25 add metrics: gpu_usage, temp, sys_power, pcpu_usage, ecpu_usage, ane_… 2025-07-28 23:02:33 +01:00
Alex Cheema
12566865d5 better profiling 2025-07-28 22:15:04 +01:00
Gelu Vrabie
b88abf1cc2 fix topology disconnects and add heartbeat
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-28 22:00:05 +01:00
Alex Cheema
dbd0bdc34b fix ci linter 2025-07-28 20:12:48 +01:00
Alex Cheema
20241e3290 some finishing touches to get this working e2e 2025-07-28 13:07:29 +01:00
Seth Howes
176d077c87 Fix IPv4 serialisation for topology 2025-07-28 13:07:10 +01:00
Gelu Vrabie
c3c8ddbce8 fix forwarder supervisor tests
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-28 13:03:43 +01:00
Matt Beton
36a5d75efd Fix download tests 2025-07-28 12:51:10 +01:00
Seth Howes
e9b803604b Add Multiaddr type and refactor Hosts type for creating shard placement 2025-07-28 11:39:46 +01:00
Alex Cheema
b285a9f0b7 fix placement tests 2025-07-28 11:18:32 +01:00
Alex Cheema
57ca487fde Fixes for running this end to end
Co-authored-by: Gelu Vrabie <gelu.vrabie.univ@gmail.com>
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-28 10:51:03 +01:00
Andrei Cravtov
b687dec6b2 Discovery integration master
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-27 13:43:59 +01:00
Alex Cheema
98f204d14a Fix placement single node 2025-07-26 20:08:37 +01:00
Matt Beton
93330f0283 Inference Integration Test
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-26 20:08:25 +01:00
Gelu Vrabie
2e4635a8f5 add node started event
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-26 19:12:26 +01:00
Gelu Vrabie
261e575262 Serialize topology
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-25 15:09:03 +01:00
Alex Cheema
a97fb27c64 Glue TWO 2025-07-25 14:32:34 +01:00
Gelu Vrabie
9be08ec7dd add resource monitor
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-25 13:10:53 +01:00
Alex Cheema
a241c92dd1 Glue 2025-07-25 13:10:29 +01:00
Seth Howes
6f8e3419d5 Placement strategy
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-24 20:22:40 +01:00
Gelu Vrabie
4c0e4ef853 Go build
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-24 19:45:45 +01:00
Matt Beton
f41531d945 Worker Loop
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-24 18:44:31 +01:00
Alex Cheema
67c70b22e4 Best master 2025-07-24 17:12:52 +01:00
Andrei Cravtov
3730160477 Fix the node-ID test
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
2025-07-24 17:09:12 +01:00
Gelu Vrabie
df1fe3af26 Topology apply
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-24 14:27:09 +01:00
Matt Beton
5097493a42 Fix tests 2025-07-24 13:22:58 +01:00
Alex Cheema
a6b3ab6332 Worker plan
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Seth Howes <71157822+sethhowes@users.noreply.github.com>
Co-authored-by: Gelu Vrabie <gelu.vrabie.univ@gmail.com>
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Andrei Cravtov <the.andrei.cravtov@gmail.com>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
2025-07-24 12:45:27 +01:00
Gelu Vrabie
56d3565781 Add apply functions
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-24 11:02:20 +01:00
Andrei Cravtov
3ab5609289 wrote race-condition-free persistent NodeID-getting function 2025-07-23 20:18:56 +01:00
Matt Beton
7a452c3351 Fix tests 2025-07-23 18:25:50 +01:00
Seth Howes
7ac23ce96b Refactor tasks / commands / api 2025-07-23 15:52:29 +01:00
Andrei Cravtov
81060b7062 Made basedpyright work with Jetbrains environment
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
2025-07-23 14:12:11 +01:00
Andrei Cravtov
8d2536d926 Implemented basic discovery library in Rust + python bindings
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
2025-07-23 13:11:29 +01:00
Gelu Vrabie
76f903504c fix
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-22 22:29:35 +01:00
Seth Howes
cd9a1a9192 Topology update 2025-07-22 22:29:17 +01:00
Matt Beton
14b3c4a6be New API! 2025-07-22 21:21:12 +01:00
Gelu Vrabie
596d9fc9d0 add forwarder service
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-22 20:53:26 +01:00
Matt Beton
53c652c307 Fix tests! 2025-07-22 15:20:32 +01:00
Matt Beton
5adad08e09 New events 2025-07-22 15:16:06 +01:00
Gelu Vrabie
108128b620 fix sqlite connector
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-21 22:43:09 +01:00
Alex Cheema
449fdac27a Downloads 2025-07-21 22:42:37 +01:00
Seth Howes
cb101e3d24 Refactor model types 2025-07-21 20:35:27 +01:00
Gelu Vrabie
54efd01d77 add forwarder supervisor
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-21 20:21:43 +01:00
Seth Howes
bae58dd368 Refactor worker + master state into single state 2025-07-21 19:36:54 +01:00
Seth Howes
d19aa4f95a Simplify Task type + merge control & data plane types into single type 2025-07-21 17:10:09 +01:00
Gelu Vrabie
2f64e30dd1 Add sqlite connector
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-21 14:10:29 +01:00
Alex Cheema
bb7f1ae994 New worker
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
2025-07-18 10:08:56 +01:00
Matt Beton
cc45c7e9b9 Fixed events issue. 2025-07-17 12:21:01 +01:00
Arbion Halili
038cc4cdfa fix: Normalize Naming 2025-07-16 16:11:51 +01:00
Arbion Halili
e2a7935019 fix: Fix incorrect logic 2025-07-16 14:39:20 +01:00
Arbion Halili
6a671908a3 fix: FrozenSet Related Bits 2025-07-16 13:45:57 +01:00
Arbion Halili
520b1122a3 fix: Many Fixes 2025-07-16 13:35:31 +01:00
Arbion Halili
d9b9aa7ad2 Merge branch 'master-node' into staging 2025-07-15 16:32:08 +01:00
Arbion Halili
7fa7de8e83 more incomplete trash 2025-07-15 13:42:17 +01:00
Arbion Halili
9f96b6791f fix: Some, still broken 2025-07-15 13:11:21 +01:00
Arbion Halili
9b3c105bea fix: Save Andrei's sanity 2025-07-15 13:11:20 +01:00
Arbion Halili
8060120136 tweak 2025-07-14 22:37:53 +01:00
Arbion Halili
df6626fa31 fix: Event definitions, state definitions 2025-07-14 21:41:14 +01:00
Arbion Halili
70f0f09c05 Tweaked, Still Broken tho 2025-07-14 21:19:39 +01:00
Arbion Halili
8799c288b0 BROKEN: work thus far 2025-07-14 21:09:08 +01:00
Arbion Halili
4e4dbf52ec fix: Use Nix-compatible LSP set-up 2025-07-14 21:08:43 +01:00
Matt Beton
21acd3794a New Runner! 2025-07-10 16:34:35 +01:00
Arbion Halili
b0bd951005 Merge Basic Interfaces
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Andrei Cravtov <the.andrei.cravtov@gmail.com>
2025-07-09 19:04:21 +01:00
Arbion Halili
74d56e52ff fix: Improve naming 2025-07-07 20:22:27 +01:00
Arbion Halili
fe17aaf9f8 fix: Make master hold a queue of task data 2025-07-07 20:22:00 +01:00
Arbion Halili
e1894bc106 refactor: A Lot 2025-07-07 20:19:08 +01:00
Arbion Halili
81cf6bce64 refactor: Simplify networking 2025-07-07 19:33:14 +01:00
Andrei Cravtov
6c8b8b30ae added rust to flake 2025-07-07 18:11:40 +01:00
Matt Beton
0425422f55 Simple fix 2025-07-07 17:18:43 +01:00
Matt Beton
03a1cf59a6 Matt's interfaces
Added interfaces for chunks, worker, runner, supervisor, resourcemonitor, etc.
2025-07-07 16:42:52 +01:00
Arbion Halili
367e76c8fa fix: Fix validation over Task types 2025-07-04 17:25:14 +01:00
Arbion Halili
cda3de2a28 fix: Use state for tasks 2025-07-04 15:08:54 +01:00
Arbion Halili
10224d09de refactor: Distinguish the topology of the control plane from that of the data plane 2025-07-03 15:45:54 +01:00
Arbion Halili
c456934342 refactor: Remove timestamp from Wrapped Events 2025-07-03 13:05:35 +01:00
Arbion Halili
0b6aadf576 refactor: Add safe state mutation method .apply() 2025-07-03 12:33:29 +01:00
Arbion Halili
f8039e20e0 feature: Add pretty_name to ModelMetadata 2025-07-03 12:32:32 +01:00
Arbion Halili
4bb3a995a4 feature: Interfaces for graph interfaces 2025-07-02 22:44:55 +01:00
Arbion Halili
7dd8a979d2 feature: Simplest utilities for logging 2025-07-02 22:13:42 +01:00
Arbion Halili
40793f1d86 refactor: Refactor most things 2025-07-02 21:11:49 +01:00
Arbion Halili
8596d5c5b1 refactor: Fix UUID implementation 2025-07-02 11:04:52 +01:00
Arbion Halili
6de1f2883f feat: Update Interfaces 2025-07-01 18:41:37 +01:00
Arbion Halili
73ac8969bc feat: Add ResourceGraph, runner types, etc. 2025-07-01 13:14:26 +01:00
Arbion Halili
df824e2e87 fix: Ensure MasterState inherits from SharedState 2025-07-01 12:18:54 +01:00
Seth Howes
d5033e658c refactor: Replace Literal with Enum in sources.py 2025-07-01 12:15:28 +01:00
Arbion Halili
c0df8e5463 feat: Implement Many Interfaces 2025-07-01 01:37:00 +01:00
Arbion Halili
899d8820dd Merge Seth's Control Plane API Work into Alex's Events Branch
Co-authored-by: Seth Howes <sethshowes@gmail.com>
2025-06-30 23:54:41 +01:00
Arbion Halili
53d5d23898 refactor: Use enums 2025-06-30 23:45:27 +01:00
Arbion Halili
b758df83cf Chore: Tweak CI 2025-06-30 22:41:33 +01:00
Alex Cheema
133ab70d67 chore: Run formatter 2025-06-30 09:48:03 +01:00
Alex Cheema
aae3e4a82d refactor: Put type defs on one line 2025-06-30 09:46:44 +01:00
Alex Cheema
596b069f84 chore: Fail pipeline if working tree changes instead of committing them in CI 2025-06-30 09:40:47 +01:00
Alex Cheema
c0b8bb9c98 chore: Rename conditional-commit.yml to action.yml 2025-06-29 22:34:04 +01:00
Alex Cheema
0c46adc298 refactor: Use official OpenAI types 2025-06-29 22:30:18 +01:00
Alex Cheema
4b3e60f899 refactor: Add types for model downloading 2025-06-29 21:59:06 +01:00
Alex Cheema
784f0ec423 chore: Skip protobuf generation if no .proto files exist 2025-06-29 21:52:46 +01:00
Alex Cheema
38dcf698eb chore: Fix typecheck job in GitHub workflow 2025-06-29 21:47:23 +01:00
Alex Cheema
c9d44a1658 chore: Fix typecheck job in GitHub workflow 2025-06-29 21:45:41 +01:00
Alex Cheema
bbdfdac7be refactor: Remove redundant comment 2025-06-29 21:42:00 +01:00
Alex Cheema
5ba230ed16 refactor: Add all event types with Event implementations 2025-06-29 21:41:00 +01:00
Arbion Halili
5abf03e31b Scaffold Event Sourcing 2025-06-29 19:44:58 +01:00
Arbion Halili
d8459358cf Refactor CI 2025-06-28 14:42:53 +01:00
Arbion Halili
c977ce9419 Ensure exo-shared is a Dependency of exo-master and exo-worker 2025-06-28 14:34:49 +01:00
Arbion Halili
74adbc4280 Remove PoeThePoet 2025-06-28 14:33:01 +01:00
Arbion Halili
587a52a944 Remove Bad UUID Implementation 2025-06-28 14:08:18 +01:00
Arbion Halili
885c7d5cd8 Add RULES.md and .cursorrules 2025-06-28 14:03:01 +01:00
Arbion Halili
e4c4b3e95a Overhaul CI Design 2025-06-28 12:29:01 +01:00
Arbion Halili
f7f779da19 Fix Type Checker; Improve Protobuf Generation 2025-06-28 12:28:26 +01:00
Arbion Halili
38bc8ea7e4 Keep Protobuf Directories 2025-06-28 01:32:10 +01:00
Arbion Halili
b53c1ba999 Use Hatch Build System 2025-06-28 01:28:52 +01:00
Arbion Halili
423efe10b8 Add Protobuf Support 2025-06-28 01:27:25 +01:00
Arbion Halili
61b8b1cb18 Add Protobuf Support 2025-06-28 01:26:49 +01:00
Arbion Halili
7f0f71b9eb Add .gitignore 2025-06-28 01:25:51 +01:00
Arbion Halili
da50da2b43 Add Simple env.py 2025-06-27 11:57:03 +01:00
Arbion Halili
3564d77e58 Add Sync to Runner 2025-06-27 11:56:02 +01:00
Arbion Halili
77546b951e Update pyproject.toml 2025-06-17 22:28:48 +01:00
Arbion Halili
c15e402f3b Add Simple Groundwork 2025-06-17 22:23:01 +01:00
Arbion Halili
c57ed32fc5 Add Initial Contribution Rules 2025-06-17 16:11:15 +01:00
Arbion Halili
41085eef7b Prepare Environment Parser 2025-06-17 16:10:58 +01:00
Arbion Halili
685c8eff58 Configure Runner Tasks to Cover "engines/" 2025-06-17 07:37:08 +01:00
Arbion Halili
13b6043c09 Add Linter 2025-06-17 07:32:33 +01:00
Arbion Halili
180748ee83 Update Workspace Configuration, Configure Build Backend 2025-06-17 06:45:25 +01:00
Arbion Halili
043253a55d Add ML Engines (Backend) 2025-06-17 05:55:43 +01:00
Arbion Halili
090265a374 Add Formatter To CI 2025-06-17 05:46:33 +01:00
Arbion Halili
e2508f3419 Add Type Checker In CI 2025-06-17 05:46:08 +01:00
Arbion Halili
ac2dfa6565 Initial Structure 2025-06-17 03:55:41 +01:00
Alex Cheema
db1a5252a2 Add CODEOWNERS. 2025-06-14 23:32:30 -07:00
Alex Cheema
e4238f9ef3 Merge pull request #800 from exo-explore/grpcio1.71.0
downgrade grpcio, grpcio-tools to 1.70.0
2025-03-21 15:23:32 -07:00
Alex Cheema
ad3bc6ceaa downgrade grpcio, grpcio-tools to 1.70.0 2025-03-21 15:23:11 -07:00
Alex Cheema
04d5dca18f Merge pull request #778 from exo-explore/grpcio1.71.0
upgrade grpcio and grpcio-tools to 1.71.0
2025-03-12 06:24:57 +00:00
Alex Cheema
50b6800a61 m3 ultra flops estimates based on some quick profiling 2025-03-11 22:51:23 -07:00
Alex Cheema
2857975bf3 upgrade grpcio and grpcio-tools to 1.71.0 2025-03-11 17:23:37 -07:00
Alex Cheema
854f515cf5 Merge pull request #763 from deftdawg/amdgpu
AMD/ROCm: Changes required to detect and inference on AMD GPUs
2025-03-06 16:07:05 +00:00
DeftDawg
f98d9bac53 Changes required to detect AMD GPUs 2025-03-05 22:49:29 -05:00
Alex Cheema
017bf93cf5 Merge pull request #753 from mags0ft/patch-1
remove dead links in README
2025-03-03 23:01:34 +00:00
mags0ft
013d2573e7 remove dead links in README 2025-03-02 18:37:59 +01:00
Alex Cheema
2702975762 Merge pull request #746 from exo-explore/grpcio1.70.0
downgrade grpc to 1.67.0. waiting for fix
2025-02-28 21:26:11 +00:00
Alex Cheema
30c3f58a00 downgrade grpc to 1.67.0. waiting for fix bd8f8a86e0 2025-02-28 21:25:11 +00:00
Alex Cheema
1bbbb1e1d8 Merge pull request #745 from exo-explore/grpcio1.70.0
Grpcio1.70.0
2025-02-28 21:05:41 +00:00
Alex Cheema
4081305e60 adjust grpc settings, ensure connected before sending any grpc commands 2025-02-28 20:52:12 +00:00
Alex Cheema
52a21645c6 Merge pull request #742 from samiamjidkhan/main
build fix
2025-02-28 12:29:58 +00:00
Sami Khan
63570c7b8b Merge pull request #1 from samiamjidkhan/build-fix
build fix
2025-02-28 15:47:36 +05:00
Sami Khan
971f5240bf build fix 2025-02-28 15:45:57 +05:00
Alex Cheema
36a6389af0 bump grpcio and grpcio-tools to 1.70.0 2025-02-27 01:40:04 +00:00
688 changed files with 62860 additions and 22463 deletions


@@ -1,376 +0,0 @@
version: 2.1
orbs:
  python: circleci/python@2
commands:
  run_chatgpt_api_test:
    parameters:
      inference_engine:
        type: string
      model_id:
        type: string
      expected_output:
        type: string
      prompt:
        type: string
    steps:
      - run:
          name: Run chatgpt api integration test (<<parameters.inference_engine>>, <<parameters.model_id>>)
          command: |
            source env/bin/activate
            # Set CLANG=1 for tinygrad only
            if [ "<<parameters.inference_engine>>" = "tinygrad" ]; then
              pip install llvmlite
              export TOKENIZERS_PARALLELISM=true SUPPORT_BF16=0 CLANG=1
            fi
            # Start first instance
            EXO_HOME="$(pwd)/.exo_cache_node1" DEBUG_DISCOVERY=7 DEBUG=7 exo --inference-engine <<parameters.inference_engine>> \
              --node-id "node1" --listen-port 5678 --broadcast-port 5679 --chatgpt-api-port 8000 \
              --chatgpt-api-response-timeout 900 --disable-tui > output1.log &
            PID1=$!
            tail -f output1.log &
            TAIL1=$!
            # Start second instance
            EXO_HOME="$(pwd)/.exo_cache_node2" DEBUG_DISCOVERY=7 DEBUG=7 exo --inference-engine <<parameters.inference_engine>> \
              --node-id "node2" --listen-port 5679 --broadcast-port 5678 --chatgpt-api-port 8001 \
              --chatgpt-api-response-timeout 900 --disable-tui > output2.log &
            PID2=$!
            tail -f output2.log &
            TAIL2=$!
            # Remember to kill the tail processes at the end
            trap 'kill $TAIL1 $TAIL2' EXIT
            # Wait for discovery
            sleep 10
            # Function to check if processes are still running
            check_processes() {
              if ! kill -0 $PID1 2>/dev/null; then
                echo "First instance (PID $PID1) died unexpectedly. Log output:"
                cat output1.log
                exit 1
              fi
              if ! kill -0 $PID2 2>/dev/null; then
                echo "Second instance (PID $PID2) died unexpectedly. Log output:"
                cat output2.log
                exit 1
              fi
            }
            # Check processes before proceeding
            check_processes
            echo "Sending request to first instance..."
            response_1=$(curl -s http://localhost:8000/v1/chat/completions \
              -H "Content-Type: application/json" \
              -d '{
                "model": "<<parameters.model_id>>",
                "messages": [{"role": "user", "content": "<<parameters.prompt>>"}],
                "temperature": 0.7
              }')
            echo "Response 1: $response_1"
            # Check processes after first response
            check_processes
            echo "Sending request to second instance..."
            response_2=$(curl -s http://localhost:8001/v1/chat/completions \
              -H "Content-Type: application/json" \
              -d '{
                "model": "<<parameters.model_id>>",
                "messages": [{"role": "user", "content": "<<parameters.prompt>>"}],
                "temperature": 0.7
              }')
            echo "Response 2: $response_2"
            # Check processes after second response
            check_processes
            # Stop both instances
            kill $PID1 $PID2
            echo ""
            # Extract content using jq and check if it contains expected output
            content1=$(echo "$response_1" | jq -r '.choices[0].message.content')
            content2=$(echo "$response_2" | jq -r '.choices[0].message.content')
            if [[ "$content1" != *"<<parameters.expected_output>>"* ]] || [[ "$content2" != *"<<parameters.expected_output>>"* ]]; then
              echo "Test failed: Response does not match '<<parameters.expected_output>>'"
              echo "Response 1 content: $content1"
              echo ""
              echo "Response 2 content: $content2"
              echo "Output of first instance:"
              cat output1.log
              echo "Output of second instance:"
              cat output2.log
              exit 1
            else
              echo "Test passed: Response from both nodes matches '<<parameters.expected_output>>'"
            fi
jobs:
  unit_test:
    macos:
      xcode: "16.0.0"
    resource_class: m2pro.large
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            brew install python@3.12
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
      - run:
          name: Run tests
          command: |
            source env/bin/activate
            # set TEMPERATURE to 0 for deterministic sampling
            echo "Running inference engine tests..."
            METAL_DEVICE_WRAPPER_TYPE=1 METAL_DEBUG_ERROR_MODE=0 METAL_XCODE=1 TEMPERATURE=0 python3 -m exo.inference.test_inference_engine
            echo "Running tokenizer tests..."
            python3 ./test/test_tokenizers.py
            python3 ./test/test_model_helpers.py
  discovery_integration_test:
    macos:
      xcode: "16.0.0"
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            brew install python@3.12
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
      - run:
          name: Run discovery integration test
          command: |
            source env/bin/activate
            DEBUG_DISCOVERY=7 DEBUG=7 exo --node-id "node1" --listen-port 5678 --broadcast-port 5679 --chatgpt-api-port 8000 --disable-tui > output1.log 2>&1 &
            PID1=$!
            DEBUG_DISCOVERY=7 DEBUG=7 exo --node-id "node2" --listen-port 5679 --broadcast-port 5678 --chatgpt-api-port 8001 --disable-tui > output2.log 2>&1 &
            PID2=$!
            sleep 10
            kill $PID1 $PID2
            if grep -q "Peer statuses: {\\'node2\\': \\'is_connected=True, health_check=True" output1.log && ! grep -q "Failed to connect peers:" output1.log && grep -q "Peer statuses: {\\'node1\\': \\'is_connected=True, health_check=True" output2.log && ! grep -q "Failed to connect peers:" output2.log; then
              echo "Test passed: Both instances discovered each other"
              exit 0
            else
              echo "Test failed: Devices did not discover each other"
              echo "Output of first instance:"
              cat output1.log
              echo "Output of second instance:"
              cat output2.log
              exit 1
            fi
  chatgpt_api_integration_test_mlx:
    macos:
      xcode: "16.0.0"
    resource_class: m2pro.large
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            brew install python@3.12
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
      - run_chatgpt_api_test:
          inference_engine: mlx
          model_id: llama-3.2-1b
          prompt: "Keep responses concise. Who was the king of pop?"
          expected_output: "Michael Jackson"
  chatgpt_api_integration_test_dummy:
    macos:
      xcode: "16.0.0"
    resource_class: m2pro.large
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            brew install python@3.12
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
      - run_chatgpt_api_test:
          inference_engine: dummy
          model_id: dummy
          prompt: "Dummy prompt."
          expected_output: "dummy"
  chatgpt_api_integration_test_tinygrad:
    macos:
      xcode: "16.0.0"
    resource_class: m2pro.large
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            brew install python@3.12
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
      - run_chatgpt_api_test:
          inference_engine: tinygrad
          model_id: llama-3.2-1b
          prompt: "Keep responses concise. Who was the king of pop?"
          expected_output: "Michael Jackson"
  chatgpt_api_integration_test_tinygrad_linux:
    machine:
      image: ubuntu-2204:current
    resource_class: xlarge
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            export DEBIAN_FRONTEND=noninteractive
            export DEBCONF_NONINTERACTIVE_SEEN=true
            sudo apt-get update
            sudo add-apt-repository -y ppa:deadsnakes/ppa
            sudo apt-get update
            sudo apt-get install -y python3.12 python3.12-venv clang
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
      - run_chatgpt_api_test:
          inference_engine: tinygrad
          model_id: llama-3.2-1b
          prompt: "Keep responses concise. Who was the king of pop?"
          expected_output: "Michael Jackson"
  measure_pip_sizes:
    macos:
      xcode: "16.0.0"
    steps:
      - checkout
      - run:
          name: Set up Python
          command: |
            brew install python@3.12
            python3.12 -m venv env
            source env/bin/activate
      - run:
          name: Install dependencies and measure sizes
          command: |
            source env/bin/activate
            pip install --upgrade pip
            pip install .
            python ./extra/pipsize.py --json ./pipsize.json
      - store_artifacts:
          path: ./pipsize.json
          destination: pip-sizes.json
  check_line_count:
    docker:
      - image: cimg/python:3.10
    steps:
      - checkout
      - run:
          name: Setup git for PR comparison
          command: |
            if [[ -n "$CIRCLE_PULL_REQUEST" ]]; then
              PR_NUMBER=$(echo $CIRCLE_PULL_REQUEST | rev | cut -d'/' -f1 | rev)
              BASE_BRANCH=$(curl -s -H "Circle-Token: $CIRCLE_TOKEN" \
                "https://circleci.com/api/v2/project/github/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME/pipeline/$CIRCLE_WORKFLOW_ID" \
                | jq -r '.target_branch')
              git clone -b $BASE_BRANCH --single-branch \
                https://github.com/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME.git \
                base_branch
            fi
      - run:
          name: Install dependencies
          command: |
            python -m pip install --upgrade pip
            pip install tabulate
      - run:
          name: Run line count check
          command: |
            if [[ -n "$CIRCLE_PULL_REQUEST" ]]; then
              python extra/line_counter.py base_branch .
            else
              python extra/line_counter.py .
            fi
      - store_artifacts:
          path: line-count-snapshot.json
          destination: line-count-snapshot.json
      - store_artifacts:
          path: line-count-diff.json
          destination: line-count-diff.json
      - run:
          name: Create test results directory
          command: |
            mkdir -p test-results/line-count
            cp line-count-*.json test-results/line-count/
      - store_test_results:
          path: test-results
workflows:
  version: 2
  build_and_test:
    jobs:
      - check_line_count:
          filters:
            branches:
              only: /.*/
            tags:
              only: /.*/
      - unit_test
      - discovery_integration_test
      - chatgpt_api_integration_test_mlx
      - chatgpt_api_integration_test_tinygrad
      - chatgpt_api_integration_test_tinygrad_linux
      - chatgpt_api_integration_test_dummy
      - measure_pip_sizes

.clauderules Normal file

@@ -0,0 +1,63 @@
# Claude Code Rules - Follow Every Rule Exactly
You must prioritize straightforward code semantics, well-named types, clear function signatures, and robust, carefully-chosen abstractions. Think about how your decisions might impact these aspects of code quality before proposing any changes.
You have access to all modern Python features from Python 3.13, 3.12, 3.11...
**When you're done making changes, remove any redundant comments; remaining comments should only apply to complex code segments, adding relevant context.**
## 1. Code Discipline
* Eliminate superfluous `try`/`catch` and `if` branches through strict typing and static analysis.
* Use pure functions unless you must mutate fixed state—then wrap that state in a class.
* Every function is **referentially transparent**: same inputs ⇒ same outputs, no hidden state, no unintended I/O.
* Put side-effects in injectable "effect handlers"; keep core logic pure.
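As one way of reading the rule above — a minimal sketch, not code from this repository (the names `plan_layers`, `run_placement`, and `EventSink` are hypothetical): the decision logic is a pure, referentially transparent function, and the only side effect goes through an injected handler.

```python
from typing import Protocol

class EventSink(Protocol):
    """Injectable effect handler; implementations may log, publish, etc."""
    def emit(self, event: str) -> None: ...

def plan_layers(total_layers: int, node_count: int) -> list[int]:
    # Pure core: same inputs always produce the same partition.
    base, extra = divmod(total_layers, node_count)
    return [base + (1 if i < extra else 0) for i in range(node_count)]

def run_placement(total_layers: int, node_count: int, sink: EventSink) -> list[int]:
    # The side effect is confined to the injected sink; the logic stays pure
    # and can be tested without any I/O.
    plan = plan_layers(total_layers, node_count)
    sink.emit(f"planned {plan}")
    return plan
```

In tests, `sink` can be a trivial in-memory recorder, so the pure core is exercised with no hidden state or I/O.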
## 2. Naming
* Choose descriptive, non-abbreviated names—no 3-letter acronyms or non-standard contractions.
* Anyone reading a function's type signature alone should grasp its purpose without extra context.
## 3. Typing
* Maintain **strict, exhaustive** typing; never bypass the type-checker.
* Default to `Literal[...]` when an enum-like set is needed.
* Prefer built-in types; when two values share structure but differ in meaning, enforce separation:
* Use `typing.NewType` for primitives (zero runtime cost).
* For serializable objects, add a `type: str` field that states the object's identity.
## 4. Pydantic
* Read, respect, and rely on Pydantic documentation.
* Centralize a common `ConfigDict` with `frozen=True` and `strict=True` (or stricter) and reuse it everywhere.
* For hierarchies of `BaseModel` variants, declare a discriminated union with `typing.Annotated[Base, Field(discriminator='variant')]`; publish a single `TypeAdapter[Base]` so all variants share one strict validator.
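The discriminated-union rule above can be sketched as follows (the `Circle`/`Square` models are hypothetical stand-ins for a real `BaseModel` hierarchy):

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, ConfigDict, Field, TypeAdapter

# Centralized strict, frozen config, reused by every model.
STRICT = ConfigDict(frozen=True, strict=True)

class Circle(BaseModel):
    model_config = STRICT
    variant: Literal["circle"] = "circle"
    radius: float

class Square(BaseModel):
    model_config = STRICT
    variant: Literal["square"] = "square"
    side: float

# Single strict validator shared by all variants.
Shape = Annotated[Union[Circle, Square], Field(discriminator="variant")]
shapes = TypeAdapter(Shape)

print(shapes.validate_python({"variant": "circle", "radius": 2.0}))
```

The `variant` field doubles as the serialized `type`-style identity, and `TypeAdapter` dispatches on it without any `isinstance` branching in calling code.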
## 5. IDs & UUIDs
* Subclass Pydantic's `UUID4` for custom ID types.
* Generate fresh IDs with `uuid.uuid4()`.
* Create idempotency keys by hashing *persisted* state plus a **function-specific salt** to avoid collisions after crashes.
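A sketch of the idempotency-key rule, assuming the persisted state has already been serialized to bytes; the salt value is illustrative:

```python
import hashlib
import uuid

# Unique per function, so two functions hashing identical state
# still produce distinct keys after a crash-and-retry.
FUNCTION_SALT = b"create-invoice-v1"

def idempotency_key(persisted_state: bytes) -> str:
    # Deterministic: same persisted state => same key on retry.
    return hashlib.sha256(FUNCTION_SALT + persisted_state).hexdigest()

def fresh_id() -> uuid.UUID:
    # Fresh identifiers always come from uuid4(), per the rule above.
    return uuid.uuid4()

print(idempotency_key(b'{"order": 7}'))
```

Because the key is derived from persisted state rather than a random value, a retried call after a crash reproduces the same key and the operation deduplicates cleanly.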
## 6. Error Handling
* Catch an exception **only** where you can handle or transform it meaningfully.
* State in the docstring **where** each exception is expected to be handled and **why**.
## 7. Dependencies
* Introduce new external dependencies only after approval.
* Request only libraries common in production environments.
## 8. Use of `@final` & Freezing
* Mark classes, methods, and variables as `@final` or otherwise immutable wherever applicable.
## 9. Repository Workflow
If you spot a rule violation within code that you've not been asked to work on directly, inform the user rather than patching it ad-hoc.
---
### One-Sentence Summary
Write strictly-typed, pure, self-describing Python that uses Pydantic, well-scoped side-effects, immutable state, approved dependencies, and explicit error handling.

64
.cursorrules Normal file

@@ -0,0 +1,64 @@
# follow **every** rule exactly; report any violation instead of silently fixing it.
You must prioritize straightforward code semantics, well-named types, clear function signatures, and robust, carefully-chosen abstractions. Think about how your decisions might impact these aspects of code quality before proposing any changes.
You can use the advanced features of `typing`. You have access to all of the new features from Python 3.13, 3.12, 3.11...
**When you're done making your changes, remove any redundant comments that you may have left; the comments that remain should only apply to complex segments of code, adding relevant context.**
## 1. Code Discipline
* Eliminate superfluous `try` / `catch` and `if` branches through strict typing and static analysis.
* Use pure functions unless you must mutate fixed state—then wrap that state in a class.
* Every function is **referentially transparent**: same inputs ⇒ same outputs, no hidden state, no unintended I/O.
* Put side-effects in injectable “effect handlers”; keep core logic pure.
## 2. Naming
* Choose descriptive, non-abbreviated names—no 3-letter acronyms or non-standard contractions.
* Anyone reading a function's type signature alone should grasp its purpose without extra context.
## 3. Typing
* Maintain **strict, exhaustive** typing; never bypass the type-checker.
* Default to `Literal[...]` when an enum-like set is needed.
* Prefer built-in types; when two values share structure but differ in meaning, enforce separation:
* Use `typing.NewType` for primitives (zero runtime cost).
* For serialisable objects, add a `type: str` field that states the object's identity.
## 4. Pydantic
* Read, respect, and rely on Pydantic docs.
* Centralise a common `ConfigDict` with `frozen=True` and `strict=True` (or stricter) and reuse it everywhere.
* For hierarchies of `BaseModel` variants, declare a discriminated union with `typing.Annotated[Base, Field(discriminator='variant')]`; publish a single `TypeAdapter[Base]` so all variants share one strict validator.
## 5. IDs & UUIDs
* Subclass Pydantic's `UUID4` for custom ID types.
* Generate fresh IDs with `uuid.uuid4()`.
* Create idempotency keys by hashing *persisted* state plus a **function-specific salt** to avoid collisions after crashes.
## 6. Error Handling
* Catch an exception **only** where you can handle or transform it meaningfully.
* State in the docstring **where** each exception is expected to be handled and **why**.
## 7. Dependencies
* Introduce new external dependencies only after approval.
* Request only libraries common in production environments.
## 8. Use of `@final` & Freezing
* Mark classes, methods, and variables as `@final` or otherwise immutable wherever applicable.
## 9. Repository Workflow
If you spot a rule violation within code that you've not been asked to work on directly, inform the user rather than patching it ad-hoc.
---
### One-Sentence Summary
Write strictly-typed, pure, self-describing Python that uses Pydantic, well-scoped side-effects, immutable state, approved dependencies, and explicit error handling.

1
.envrc Normal file

@@ -0,0 +1 @@
use flake

2
.gitattributes vendored

@@ -1,2 +0,0 @@
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text

3
.githooks/post-checkout Executable file

@@ -0,0 +1,3 @@
#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || { printf >&2 "\n%s\n\n" "This repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting the 'post-checkout' file in the hooks directory (set by 'core.hookspath'; usually '.git/hooks')."; exit 2; }
git lfs post-checkout "$@"

3
.githooks/post-commit Executable file

@@ -0,0 +1,3 @@
#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || { printf >&2 "\n%s\n\n" "This repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting the 'post-commit' file in the hooks directory (set by 'core.hookspath'; usually '.git/hooks')."; exit 2; }
git lfs post-commit "$@"

3
.githooks/post-merge Executable file

@@ -0,0 +1,3 @@
#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || { printf >&2 "\n%s\n\n" "This repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting the 'post-merge' file in the hooks directory (set by 'core.hookspath'; usually '.git/hooks')."; exit 2; }
git lfs post-merge "$@"

3
.githooks/pre-push Executable file

@@ -0,0 +1,3 @@
#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || { printf >&2 "\n%s\n\n" "This repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting the 'pre-push' file in the hooks directory (set by 'core.hookspath'; usually '.git/hooks')."; exit 2; }
git lfs pre-push "$@"

3
.github/CODEOWNERS vendored Normal file

@@ -0,0 +1,3 @@
* @ToxicPine
* @AlexCheema
* @GeluVrabie

43
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@@ -0,0 +1,43 @@
---
name: Bug Report
about: Create a report to help us improve
title: '[BUG] '
labels: bug
assignees: ''
---
## Describe the bug
A clear and concise description of what the bug is.
## To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
## Expected behavior
A clear and concise description of what you expected to happen.
## Actual behavior
A clear and concise description of what actually happened.
## Environment
- macOS Version:
- EXO Version:
- Hardware:
- Device 1: (e.g., MacBook Pro M1 Max, 32GB RAM)
- Device 2: (e.g., Mac Mini M2, 16GB RAM)
- Additional devices:
- Interconnection:
- (e.g., Thunderbolt 4 cable between Device 1 and 2)
- (e.g., WiFi 6 for Device 3)
- (e.g., 10GbE Ethernet between all devices)
## Additional context
Add any other context about the problem here.


@@ -0,0 +1,11 @@
---
name: Feature Request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''
---
<!-- Please use a clear, descriptive title above -->
Describe what you'd like to see added to EXO.


@@ -0,0 +1,16 @@
name: Commit if changed
description: "Create a commit when the working tree is dirty"
inputs:
message:
description: "Commit message"
required: true
runs:
using: composite
steps:
- name: Commit changed files
shell: bash
run: |
git diff --quiet && exit 0
git commit -am "${{ inputs.message }}"

10
.github/actions/format/action.yml vendored Normal file

@@ -0,0 +1,10 @@
name: Format Code
description: "Run code formatter"
runs:
using: "composite"
steps:
- name: Format code
run: nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just fmt
shell: bash

10
.github/actions/lint-check/action.yml vendored Normal file

@@ -0,0 +1,10 @@
name: Lint Check
description: "Check for lint errors"
runs:
using: "composite"
steps:
- name: Lint check
run: nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just lint-check
shell: bash

10
.github/actions/lint/action.yml vendored Normal file

@@ -0,0 +1,10 @@
name: Lint Code
description: "Run code linter"
runs:
using: "composite"
steps:
- name: Lint code
run: nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just lint
shell: bash


@@ -0,0 +1,10 @@
name: Regenerate Protobufs
description: "Regenerate protobuf files"
runs:
using: "composite"
steps:
- name: Regenerate protobufs
run: nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just regenerate-protobufs
shell: bash


@@ -0,0 +1,20 @@
name: Setup Python & uv
description: "Regenerate Python environment from uv.lock"
runs:
using: "composite"
steps:
- name: Install uv
uses: astral-sh/setup-uv@v6
with:
enable-cache: true
cache-dependency-glob: uv.lock
- name: Install Python
run: uv python install
shell: bash
- name: Sync
run: uv sync --locked --all-extras --dev
shell: bash

12
.github/actions/typecheck/action.yml vendored Normal file

@@ -0,0 +1,12 @@
name: Type Check
description: "Run type checker"
runs:
using: "composite"
steps:
- name: Run type checker
run: |
nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just sync
nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just check
shell: bash

12
.github/actions/unit-test/action.yml vendored Normal file

@@ -0,0 +1,12 @@
name: Unit Test
description: "Run unit tests"
runs:
using: "composite"
steps:
- name: Run unit tests
run: |
nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just sync-clean
nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just test-fast
shell: bash

20
.github/actions/verify-clean/action.yml vendored Normal file

@@ -0,0 +1,20 @@
name: Verify Clean Working Tree
description: "Fail the job if the previous step left the working tree dirty"
inputs:
step:
description: "The name of the step that just executed"
required: true
runs:
using: composite
steps:
- name: Check git diff
shell: bash
run: |
if ! git diff --quiet; then
echo "Error: ${{ inputs.step }} left working tree dirty." >&2
git --no-pager diff >&2
exit 1
fi

401
.github/bench.py vendored

@@ -1,401 +0,0 @@
import aiohttp
import asyncio
import time
import json
import os
import boto3
from typing import Dict, Any
from datetime import datetime
import subprocess
import psutil
import platform
from pathlib import Path
def check_system_state():
print("\n=== System State Check ===", flush=True)
# Add macOS-specific checks
try:
# Check powermetrics with sudo
try:
power_metrics = subprocess.run(
['sudo', 'powermetrics', '-n', '1', '-i', '1000', '--samplers', 'cpu_power'],
capture_output=True, text=True
)
print("\nPower Metrics:", power_metrics.stdout, flush=True)
except Exception as e:
print(f"Error getting power metrics: {e}", flush=True)
# Check thermal state
thermal_state = subprocess.run(['pmset', '-g', 'therm'], capture_output=True, text=True)
print("\nThermal State:", thermal_state.stdout, flush=True)
# Check if running under Rosetta
arch = subprocess.run(['arch'], capture_output=True, text=True)
print("\nArchitecture:", arch.stdout, flush=True)
# Check MLX compilation mode - only if mlx is available
try:
import mlx.core as mx
if hasattr(mx, 'build_info'):
print("\nMLX Build Info:", mx.build_info(), flush=True)
else:
print("\nMLX Build Info: Not available in this version", flush=True)
except ImportError:
print("\nMLX: Not installed", flush=True)
except Exception as e:
print(f"\nError checking MLX: {e}", flush=True)
except Exception as e:
print(f"Error in macOS checks: {e}", flush=True)
# CPU Info
print("\nCPU Information:", flush=True)
try:
if platform.system() == 'Darwin' and platform.processor() == 'arm':
# Use sysctl for Apple Silicon Macs
cpu_info = subprocess.run(['sysctl', 'machdep.cpu'], capture_output=True, text=True)
if cpu_info.returncode == 0:
print(f"CPU Info (Apple Silicon):", cpu_info.stdout, flush=True)
# Parse powermetrics output for clearer CPU frequency display
try:
power_metrics = subprocess.run(
['sudo', 'powermetrics', '-n', '1', '-i', '100', '--samplers', 'cpu_power'],
capture_output=True, text=True
)
if power_metrics.returncode == 0:
output = power_metrics.stdout
print("\nDetailed CPU Frequency Information:")
# Extract cluster frequencies and max frequencies
current_cluster = None
max_freqs = {'E': 0, 'P0': 0, 'P1': 0}
for line in output.split('\n'):
# Track which cluster we're processing
if "E-Cluster" in line:
current_cluster = 'E'
elif "P0-Cluster" in line:
current_cluster = 'P0'
elif "P1-Cluster" in line:
current_cluster = 'P1'
# Get current frequencies
if "HW active frequency:" in line:
freq = line.split(':')[1].strip()
if freq != "0 MHz":
print(f"Current {current_cluster}-Cluster Frequency: {freq}")
# Get max frequencies from residency lines
if current_cluster and "active residency:" in line and "MHz:" in line:
try:
# Extract all frequency values
freqs = []
parts = line.split('MHz:')[:-1] # Skip last part as it's not a frequency
for part in parts:
freq_str = part.split()[-1]
try:
freq = float(freq_str)
freqs.append(freq)
except ValueError:
continue
if freqs:
max_freqs[current_cluster] = max(max_freqs[current_cluster], max(freqs))
except Exception:
continue
# Print max frequencies
print("\nMaximum Available Frequencies:")
for cluster, max_freq in max_freqs.items():
if max_freq > 0:
print(f"{cluster}-Cluster Max: {max_freq:.0f} MHz")
except Exception as e:
print(f"Error parsing powermetrics: {e}", flush=True)
else:
# Use psutil for other systems
cpu_freq = psutil.cpu_freq()
print(f"CPU Frequency - Current: {cpu_freq.current:.2f}MHz, Min: {cpu_freq.min:.2f}MHz, Max: {cpu_freq.max:.2f}MHz", flush=True)
print(f"\nCPU Usage per Core: {psutil.cpu_percent(percpu=True)}%", flush=True)
# Check if running in low power mode
power_mode = subprocess.run(['pmset', '-g'], capture_output=True, text=True)
print("\nPower Settings:", power_mode.stdout, flush=True)
except Exception as e:
print(f"Error getting CPU info: {e}", flush=True)
# Memory Info
print("\nMemory Information:", flush=True)
try:
mem = psutil.virtual_memory()
print(f"Total: {mem.total/1024/1024/1024:.2f}GB", flush=True)
print(f"Available: {mem.available/1024/1024/1024:.2f}GB", flush=True)
print(f"Used: {mem.used/1024/1024/1024:.2f}GB ({mem.percent}%)", flush=True)
# Check swap
swap = psutil.swap_memory()
print(f"Swap Used: {swap.used/1024/1024/1024:.2f}GB of {swap.total/1024/1024/1024:.2f}GB", flush=True)
except Exception as e:
print(f"Error getting memory info: {e}", flush=True)
# GPU Info
print("\nGPU Information:", flush=True)
try:
# Check MLX GPU settings
print("MLX Environment Variables:", flush=True)
mlx_vars = {k: v for k, v in os.environ.items() if k.startswith('MLX')}
print(json.dumps(mlx_vars, indent=2), flush=True)
# Check Metal GPU memory allocation
gpu_mem = subprocess.run(['sysctl', 'iogpu'], capture_output=True, text=True)
print("GPU Memory Settings:", gpu_mem.stdout, flush=True)
except Exception as e:
print(f"Error getting GPU info: {e}", flush=True)
# Process Priority
print("\nProcess Priority Information:", flush=True)
try:
current_process = psutil.Process()
print(f"Process Nice Value: {current_process.nice()}", flush=True)
# Only try to get ionice if the platform supports it
if hasattr(current_process, 'ionice'):
print(f"Process IO Nice Value: {current_process.ionice()}", flush=True)
except Exception as e:
print(f"Error getting process priority info: {e}", flush=True)
# System Load
print("\nSystem Load:", flush=True)
try:
load_avg = psutil.getloadavg()
print(f"Load Average: {load_avg}", flush=True)
# Get top processes by CPU and Memory
print("\nTop Processes by CPU Usage:", flush=True)
processes = []
for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_percent']):
try:
pinfo = proc.info
if pinfo['cpu_percent'] is not None and pinfo['memory_percent'] is not None:
processes.append(pinfo)
except (psutil.NoSuchProcess, psutil.AccessDenied):
continue
# Sort and display top 5 CPU-consuming processes
sorted_by_cpu = sorted(processes, key=lambda x: x['cpu_percent'] or 0, reverse=True)[:5]
for proc in sorted_by_cpu:
print(f"PID: {proc['pid']}, Name: {proc['name']}, CPU: {proc['cpu_percent']}%, Memory: {proc['memory_percent']:.1f}%")
except Exception as e:
print(f"Error getting system load info: {e}", flush=True)
print("\n=== End System State Check ===\n", flush=True)
def check_gpu_access():
try:
# Check if MLX can see the GPU
import mlx.core as mx
print("MLX device info:", mx.default_device())
# Check Metal device availability
result = subprocess.run(['system_profiler', 'SPDisplaysDataType'], capture_output=True, text=True)
print("GPU Info:", result.stdout)
except Exception as e:
print(f"Failed to check GPU access: {e}")
async def measure_performance(api_endpoint: str, prompt: str, model: str) -> Dict[str, Any]:
"""
Measures the performance of an API endpoint by sending a prompt and recording metrics.
Args:
api_endpoint (str): The API endpoint URL.
prompt (str): The prompt to send to the API.
Returns:
Dict[str, Any]: A dictionary containing performance metrics or error information.
"""
results = {
'model': model,
'run_id': os.environ.get('GITHUB_RUN_ID', 'unknown'),
'branch': os.environ.get('GITHUB_REF_NAME', 'unknown'),
'commit': os.environ.get('GITHUB_SHA', 'unknown'),
'configuration': json.loads(os.environ.get('HARDWARE_CONFIG', '{}'))
}
# Get token count
session = aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=600, connect=10, sock_read=600, sock_connect=10))
try:
response = await session.post(
"http://localhost:52415/v1/chat/token/encode",
json={
"model": model,
"messages": [{"role": "user", "content": prompt}]
}
)
response.raise_for_status()
token_data = await response.json()
results['prompt_len'] = token_data['num_tokens']
except Exception as e:
await session.close()
raise RuntimeError(f"Failed to get token count: {str(e)}")
# Measure completion performance
try:
start_time = time.time()
response = await session.post(
api_endpoint,
json={
"model": model,
"messages": [{"role": "user", "content": prompt}],
"temperature": 0,
"stream": True
}
)
response.raise_for_status()
first_token_time = None
total_tokens = 0
async for line in response.content.iter_chunks():
line = line[0].decode('utf-8').strip()
if not line.startswith('data: '):
continue
data = json.loads(line[6:]) # Skip 'data: ' prefix
if content := data.get('choices', [{}])[0].get('delta', {}).get('content'):
print(f"Received content: {content}", flush=True)
if first_token_time is None:
first_token_time = time.time()
ttft = first_token_time - start_time
results.update({
'ttft': ttft,
'prompt_tps': results['prompt_len'] / ttft
})
total_tokens += 1
total_time = time.time() - start_time
results.update({
'generation_tps': total_tokens / total_time,
'response_len': total_tokens,
'total_time': total_time
})
except Exception as e:
raise RuntimeError(f"Performance measurement failed: {str(e)}")
finally:
await session.close()
return results
async def main() -> None:
api_endpoint = "http://localhost:52415/v1/chat/completions"
# Define prompts
prompt_warmup = "what is the capital of France?"
prompt_essay = "write an essay about cats"
model = os.environ.get('model', 'llama-3.2-1b')
# Warmup request
print("\nPerforming warmup request...", flush=True)
try:
warmup_results = await measure_performance(api_endpoint, prompt_warmup, model)
print("Warmup completed successfully", flush=True)
except Exception as e:
print(f"Warmup request failed: {e}", flush=True)
# Measure performance for the essay prompt
print("\nMeasuring performance for the essay prompt...", flush=True)
results = await measure_performance(api_endpoint, prompt_essay, model)
try:
s3_client = boto3.client(
's3',
aws_access_key_id=os.environ.get('aws_access_key_id'),
aws_secret_access_key=os.environ.get('aws_secret_key')
)
job_name = os.environ.get('GITHUB_JOB')
# Create S3 key with timestamp and commit info
now = datetime.utcnow()
timestamp = now.strftime('%H-%M-%S')
commit_sha = os.environ.get('GITHUB_SHA', 'unknown')[:7]
s3_key = f"{job_name}/{model}/{now.year}/{now.month}/{now.day}/{timestamp}_{commit_sha}.json"
# Upload to S3
s3_client.put_object(
Bucket='exo-benchmarks',
Key=s3_key,
Body=json.dumps(results),
ContentType='application/json'
)
print(f"Performance metrics uploaded to S3: s3://exo-benchmarks/{s3_key}", flush=True)
except Exception as e:
print(f"Failed to upload metrics to S3: {e}", flush=True)
# Optionally print the metrics for visibility
print("Performance metrics:", flush=True)
print(json.dumps(results, indent=4), flush=True)
def optimize_system_performance():
"""Set optimal system performance settings before running benchmark."""
try:
# Try to set high performance power mode
subprocess.run(['sudo', 'pmset', '-a', 'powermode', '2'], check=False)
# Ensure MLX uses performance cores and GPU
os.environ['MLX_FORCE_P_CORES'] = '1'
os.environ['MLX_METAL_PREWARM'] = '1'
os.environ['MLX_USE_GPU'] = '1'
# Set process priority
current_process = psutil.Process()
try:
# Set highest priority
subprocess.run(['sudo', 'renice', '-n', '-20', '-p', str(current_process.pid)], check=False)
# Print current process state
print("\nProcess State Before Benchmark:", flush=True)
proc_info = subprocess.run(
['ps', '-o', 'pid,ppid,user,%cpu,%mem,nice,stat,pri,command', '-p', str(current_process.pid)],
capture_output=True, text=True
)
print(proc_info.stdout, flush=True)
# Verify power mode
power_info = subprocess.run(['pmset', '-g'], capture_output=True, text=True)
if 'powermode 0' in power_info.stdout:
print("\nWarning: System still in normal power mode. Trying to set high performance mode again...", flush=True)
subprocess.run(['sudo', 'pmset', '-a', 'powermode', '2'], check=False)
except Exception as e:
print(f"Warning: Could not set process priority: {e}", flush=True)
except Exception as e:
print(f"Warning: Could not optimize system performance: {e}", flush=True)
# Print optimization status
print("\nOptimization Settings:", flush=True)
print("MLX Environment Variables:", flush=True)
for var in ['MLX_FORCE_P_CORES', 'MLX_METAL_PREWARM', 'MLX_USE_GPU']:
print(f"{var}: {os.environ.get(var, 'Not set')}", flush=True)
try:
nice_value = psutil.Process().nice()
print(f"Process Nice Value: {nice_value}", flush=True)
if nice_value != -20:
print("Warning: Process not running at highest priority", flush=True)
except Exception:
pass
if __name__ == "__main__":
check_system_state()
check_gpu_access()
optimize_system_performance()
asyncio.run(main())

330
.github/bootstrap.sh vendored

@@ -1,330 +0,0 @@
#!/bin/bash
set -e
command_exists() {
command -v "$1" >/dev/null 2>&1
}
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
if [ "$EUID" -eq 0 ]; then
log "Please do not run as root. Run as regular user with sudo access."
exit 1
fi
# Check for required arguments
if [ -z "$1" ]; then
log "Error: Runner token is required"
log "Usage: $0 <runner-token> [tailscale-auth-key]"
exit 1
fi
RUNNER_TOKEN=$1
TAILSCALE_AUTH_KEY=$2
REPO="exo-explore/exo"
# Add sudoers configuration
log "Configuring sudo access..."
SUDOERS_CONTENT="$(whoami) ALL=(ALL) NOPASSWD: ALL"
echo "$SUDOERS_CONTENT" | sudo tee /etc/sudoers.d/github-runner > /dev/null
sudo chmod 440 /etc/sudoers.d/github-runner
log "Configuring privacy permissions..."
sudo tccutil reset All
sudo tccutil reset SystemPolicyAllFiles
sudo tccutil reset SystemPolicyNetworkVolumes
# Configure power management for maximum performance
log "Configuring power management..."
sudo pmset -a powermode 2 # Force highest performance mode
sudo pmset -a gpuswitch 2 # Force discrete/high-performance GPU
sudo pmset -a lowpowermode 0
sudo pmset -a lessbright 0
sudo pmset -a disablesleep 1
sudo pmset -a sleep 0
sudo pmset -a hibernatemode 0
sudo pmset -a autopoweroff 0
sudo pmset -a standby 0
sudo pmset -a powernap 0
# For Python specifically
PYTHON_PATH="/opt/homebrew/bin/python3.12"
sudo chmod 755 "$PYTHON_PATH"
# Add to firewall
log "Configuring firewall access..."
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add "$PYTHON_PATH"
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock "$PYTHON_PATH"
# Set Homebrew paths based on architecture
if [ "$(uname -p)" = "arm" ]; then
BREW_PREFIX="/opt/homebrew"
else
BREW_PREFIX="/usr/local"
fi
# Install Homebrew if not present
if ! command_exists brew; then
log "Installing Homebrew..."
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
eval "$(/opt/homebrew/bin/brew shellenv)"
fi
# Install required packages
log "Installing required packages..."
export HOMEBREW_NO_AUTO_UPDATE=1
brew install python@3.12 coreutils
# Optional Tailscale setup if auth key is provided
if [ -n "$TAILSCALE_AUTH_KEY" ]; then
log "Installing and configuring Tailscale..."
brew install --quiet tailscale
sudo brew services stop tailscale 2>/dev/null || true
sudo rm -f /var/db/tailscale/tailscaled.state 2>/dev/null || true
sudo brew services start tailscale
sleep 2
sudo tailscale up --authkey=$TAILSCALE_AUTH_KEY
# Enable SSH and Screen Sharing
log "Enabling remote access services..."
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist
sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart \
-activate \
-configure -access -on \
-configure -allowAccessFor -allUsers \
-configure -restart -agent -privs -all
# Create launch daemon for remote access
sudo bash -c 'cat > /Library/LaunchDaemons/com.remote.access.setup.plist' << 'EOL'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.remote.access.setup</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>-c</string>
<string>
launchctl load -w /System/Library/LaunchDaemons/ssh.plist;
/System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart -activate -configure -access -on
</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
EOL
sudo chmod 644 /Library/LaunchDaemons/com.remote.access.setup.plist
sudo launchctl load -w /Library/LaunchDaemons/com.remote.access.setup.plist
fi
# Configure GitHub Actions Runner
log "Gathering system metadata..."
MACHINE_NAME=$(scutil --get ComputerName)
MACHINE_NAME="runner-$(echo -n "$MACHINE_NAME" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]-')"
# Enhanced Apple Silicon detection
MACHINE_INFO=$(system_profiler SPHardwareDataType)
CHIP_FULL=$(echo "$MACHINE_INFO" | grep "Chip" | cut -d: -f2 | xargs)
if [[ $CHIP_FULL =~ "Apple" ]]; then
CHIP_MODEL=$(echo "$CHIP_FULL" | sed 's/^Apple //' | tr -d ' ' | tr '[:lower:]' '[:upper:]')
GPU_CORES=$(ioreg -l | grep "gpu-core-count" | awk -F'= ' '{print $2}')
if [ -z "$GPU_CORES" ]; then
GPU_CORES="N/A"
fi
else
CHIP_MODEL="Intel"
GPU_CORES="N/A"
fi
MEMORY=$(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024))
# Set up GitHub Runner
RUNNER_DIR="$HOME/actions-runner"
# Check if runner is already configured
if [ -f "$RUNNER_DIR/.runner" ]; then
log "Runner already configured. Stopping existing service..."
sudo launchctl unload /Library/LaunchDaemons/com.github.runner.plist 2>/dev/null || true
fi
# Create runner directory if it doesn't exist
mkdir -p "$RUNNER_DIR"
cd "$RUNNER_DIR"
CUSTOM_LABELS="self-hosted,macos,arm64,${CHIP_MODEL}_GPU${GPU_CORES}_${MEMORY}GB"
# Only download and extract if not already present or if forced
if [ ! -f "$RUNNER_DIR/run.sh" ] || [ "${FORCE_SETUP:-false}" = "true" ]; then
log "Downloading GitHub Actions runner..."
RUNNER_VERSION=$(curl -s https://api.github.com/repos/actions/runner/releases/latest | grep '"tag_name":' | cut -d'"' -f4)
curl -o actions-runner.tar.gz -L "https://github.com/actions/runner/releases/download/${RUNNER_VERSION}/actions-runner-osx-arm64-${RUNNER_VERSION#v}.tar.gz"
tar xzf actions-runner.tar.gz
rm actions-runner.tar.gz
else
log "Runner already downloaded, skipping download step"
fi
log "Configuring runner with labels: $CUSTOM_LABELS"
./config.sh --unattended \
--url "https://github.com/${REPO}" \
--token "${RUNNER_TOKEN}" \
--name "${MACHINE_NAME}" \
--labels "${CUSTOM_LABELS}" \
--work "_work"
# Set optimal performance settings
log "Configuring system for optimal performance..."
# Configure CPU performance
log "Setting CPU performance controls..."
# Disable timer coalescing
sudo sysctl -w kern.timer.coalescing_enabled=0
sudo sysctl -w kern.timer_coalesce_bg_scale=-5
sudo sysctl -w kern.timer_resort_threshold_ns=0
# Set minimum timer intervals
sudo sysctl -w kern.wq_max_timer_interval_usecs=1000
sudo sysctl -w kern.timer_coalesce_bg_ns_max=1000
# Set minimum timer coalescing for all tiers
sudo sysctl -w kern.timer_coalesce_tier0_scale=-5
sudo sysctl -w kern.timer_coalesce_tier0_ns_max=1000
sudo sysctl -w kern.timer_coalesce_tier1_scale=-5
sudo sysctl -w kern.timer_coalesce_tier1_ns_max=1000
sudo sysctl -w kern.timer_coalesce_tier2_scale=-5
sudo sysctl -w kern.timer_coalesce_tier2_ns_max=1000
sudo sysctl -w kern.timer_coalesce_tier3_scale=-5
sudo sysctl -w kern.timer_coalesce_tier3_ns_max=1000
sudo sysctl -w kern.timer_coalesce_tier4_scale=-5
sudo sysctl -w kern.timer_coalesce_tier4_ns_max=1000
# Disable QoS restrictions
sudo sysctl -w net.qos.policy.restricted=0
sudo sysctl -w net.qos.policy.restrict_avapps=0
sudo sysctl -w net.qos.policy.wifi_enabled=0
sudo sysctl -w net.qos.policy.capable_enabled=0
# Set scheduler parameters
sudo sysctl -w kern.sched_rt_avoid_cpu0=0
sudo sysctl -w debug.sched=2
sudo sysctl -w net.pktsched.netem.sched_output_ival_ms=1
# Clean up any existing runner services
log "Cleaning up existing runner services..."
for service in com.github.runner com.github.runner.monitor com.github.runner.cpuaffinity com.github.runner.affinity; do
sudo launchctl bootout system/$service 2>/dev/null || true
sudo rm -f /Library/LaunchDaemons/$service.plist
done
# Create a simple runner service configuration
sudo tee /Library/LaunchDaemons/com.github.runner.plist > /dev/null << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.github.runner</string>
<key>UserName</key>
<string>$(whoami)</string>
<key>GroupName</key>
<string>staff</string>
<key>WorkingDirectory</key>
<string>$RUNNER_DIR</string>
<key>ProgramArguments</key>
<array>
<string>$RUNNER_DIR/run.sh</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<dict>
<key>SuccessfulExit</key>
<false/>
<key>Crashed</key>
<true/>
</dict>
<key>ProcessType</key>
<string>Interactive</string>
<key>LowPriorityIO</key>
<false/>
<key>AbandonProcessGroup</key>
<false/>
<key>EnableTransactions</key>
<true/>
<key>ThrottleInterval</key>
<integer>0</integer>
<key>HardResourceLimits</key>
<dict>
<key>NumberOfFiles</key>
<integer>524288</integer>
<key>MemoryLock</key>
<integer>-1</integer>
</dict>
<key>SoftResourceLimits</key>
<dict>
<key>NumberOfFiles</key>
<integer>524288</integer>
<key>MemoryLock</key>
<integer>-1</integer>
</dict>
<key>QOSClass</key>
<string>User-Interactive</string>
<key>StandardOutPath</key>
<string>$RUNNER_DIR/_diag/runner.log</string>
<key>StandardErrorPath</key>
<string>$RUNNER_DIR/_diag/runner.err</string>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
</dict>
<key>Nice</key>
<integer>-20</integer>
</dict>
</plist>
EOF
# Set proper permissions for the LaunchDaemon
sudo chown root:wheel /Library/LaunchDaemons/com.github.runner.plist
sudo chmod 644 /Library/LaunchDaemons/com.github.runner.plist
# Remove any existing service
sudo launchctl bootout system/com.github.runner 2>/dev/null || true
# Load the new service using bootstrap
sudo launchctl bootstrap system /Library/LaunchDaemons/com.github.runner.plist
# Add Runner.Listener permissions (after runner installation)
RUNNER_PATH="$RUNNER_DIR/bin/Runner.Listener"
sudo chmod 755 "$RUNNER_PATH"
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add "$RUNNER_PATH"
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock "$RUNNER_PATH"
# Create connection info file if Tailscale is configured
if [ -n "$TAILSCALE_AUTH_KEY" ]; then
TAILSCALE_IP=$(tailscale ip)
cat > "$HOME/remote_access_info.txt" << EOL
Mac Remote Access Information
============================
Computer Name: $MACHINE_NAME
Username: $USER
Tailscale IP: $TAILSCALE_IP
SSH Command: ssh $USER@$TAILSCALE_IP
Screen Sharing: vnc://$TAILSCALE_IP
EOL
chmod 600 "$HOME/remote_access_info.txt"
fi
log "Verifying runner service status..."
if sudo launchctl list | grep com.github.runner > /dev/null; then
log "GitHub Actions runner service is running successfully!"
log "Runner labels: $CUSTOM_LABELS"
[ -n "$TAILSCALE_AUTH_KEY" ] && log "Remote access details saved to: $HOME/remote_access_info.txt"
else
log "Error: Failed to start GitHub Actions runner service"
exit 1
fi


@@ -1,95 +0,0 @@
#!/bin/bash
set -e
# Function to log with timestamp
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
log "Applying comprehensive performance optimizations..."
# System-wide power management
log "Configuring power management..."
sudo pmset -a lessbright 0
sudo pmset -a disablesleep 1
sudo pmset -a sleep 0
sudo pmset -a hibernatemode 0
sudo pmset -a autopoweroff 0
sudo pmset -a standby 0
sudo pmset -a powernap 0
sudo pmset -a proximitywake 0
sudo pmset -a tcpkeepalive 1
sudo pmset -a powermode 2
sudo pmset -a gpuswitch 2
sudo pmset -a displaysleep 0
sudo pmset -a disksleep 0
# Memory and kernel optimizations
log "Configuring memory and kernel settings..."
sudo sysctl -w kern.memorystatus_purge_on_warning=0
sudo sysctl -w kern.memorystatus_purge_on_critical=0
sudo sysctl -w kern.timer.coalescing_enabled=0
# Metal and GPU optimizations
log "Configuring Metal and GPU settings..."
defaults write com.apple.CoreML MPSEnableGPUValidation -bool false
defaults write com.apple.CoreML MPSEnableMetalValidation -bool false
defaults write com.apple.CoreML MPSEnableGPUDebug -bool false
defaults write com.apple.Metal GPUDebug -bool false
defaults write com.apple.Metal GPUValidation -bool false
defaults write com.apple.Metal MetalValidation -bool false
defaults write com.apple.Metal MetalCaptureEnabled -bool false
defaults write com.apple.Metal MTLValidationBehavior -string "Disabled"
defaults write com.apple.Metal EnableMTLDebugLayer -bool false
defaults write com.apple.Metal MTLDebugLevel -int 0
defaults write com.apple.Metal PreferIntegratedGPU -bool false
defaults write com.apple.Metal ForceMaximumPerformance -bool true
defaults write com.apple.Metal MTLPreferredDeviceGPUFrame -bool true
# Create MPS cache directory with proper permissions
sudo mkdir -p /tmp/mps_cache
sudo chmod 777 /tmp/mps_cache
# Process and resource limits
log "Configuring process limits..."
sudo launchctl limit maxfiles 524288 524288
ulimit -n 524288 || log "Warning: Could not set file descriptor limit"
ulimit -c 0
ulimit -l unlimited || log "Warning: Could not set memory lock limit"
# Export performance-related environment variables
cat << 'EOF' > /tmp/performance_env.sh
# Metal optimizations
export MTL_DEBUG_LAYER=0
export METAL_DEVICE_WRAPPER_TYPE=1
export METAL_DEBUG_ERROR_MODE=0
export METAL_FORCE_PERFORMANCE_MODE=1
export METAL_DEVICE_PRIORITY=high
export METAL_MAX_COMMAND_QUEUES=1024
export METAL_LOAD_LIMIT=0
export METAL_VALIDATION_ENABLED=0
export METAL_ENABLE_VALIDATION_LAYER=0
export OBJC_DEBUG_MISSING_POOLS=NO
export MPS_CACHEDIR=/tmp/mps_cache
# MLX optimizations
export MLX_USE_GPU=1
export MLX_METAL_COMPILE_ASYNC=1
export MLX_METAL_PREALLOCATE=1
export MLX_METAL_MEMORY_GUARD=0
export MLX_METAL_CACHE_KERNELS=1
export MLX_PLACEMENT_POLICY=metal
export MLX_METAL_VALIDATION=0
export MLX_METAL_DEBUG=0
export MLX_FORCE_P_CORES=1
export MLX_METAL_MEMORY_BUDGET=0
export MLX_METAL_PREWARM=1
# Python optimizations
export PYTHONUNBUFFERED=1
export PYTHONOPTIMIZE=2
export PYTHONHASHSEED=0
export PYTHONDONTWRITEBYTECODE=1
EOF
log "Performance optimizations completed. Environment variables written to /tmp/performance_env.sh"

.github/pull_request_template.md

@@ -0,0 +1,23 @@
## Motivation
<!-- Why is this change needed? What problem does it solve? -->
<!-- If it fixes an open issue, please link to the issue here -->
## Changes
<!-- Describe what you changed in detail -->
## Why It Works
<!-- Explain why your approach solves the problem -->
## Test Plan
### Manual Testing
<!-- Hardware: (e.g., MacBook Pro M1 Max 32GB, Mac Mini M2 16GB, connected via Thunderbolt 4) -->
<!-- What you did: -->
<!-- - -->
### Automated Testing
<!-- Describe changes to automated tests, or how existing tests cover this change -->
<!-- - -->


@@ -1,207 +0,0 @@
# This is the reusable workflow file
name: Distributed Job Runner
on:
workflow_call:
inputs:
config:
required: true
type: string
model:
required: true
type: string
calling_job_name:
required: true
type: string
network_interface:
required: true
type: string
jobs:
generate-matrix:
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- id: set-matrix
env:
CONFIG: ${{ inputs.config }}
run: |
MATRIX=$(echo $CONFIG | jq -c '{cpu: [to_entries | .[] | .key as $k | range(.value) | $k]}')
echo "matrix=$MATRIX" >> $GITHUB_OUTPUT
run-distributed-job:
needs: generate-matrix
strategy:
matrix: ${{fromJson(needs.generate-matrix.outputs.matrix)}}
runs-on: ['self-hosted', 'macOS', '${{ matrix.cpu }}']
env:
HARDWARE_CONFIG: ${{ inputs.config }}
model: ${{ inputs.model }}
# Add performance-related environment variables
MTL_DEBUG_LAYER: 0
METAL_VALIDATION_ENABLED: 0
MLX_METAL_VALIDATION: 0
MLX_METAL_DEBUG: 0
MLX_FORCE_P_CORES: 1
MLX_METAL_PREWARM: 1
PYTHONOPTIMIZE: 2
steps:
- name: Cleanup workspace
run: |
sudo rm -rf "$GITHUB_WORKSPACE"
sudo mkdir -p "$GITHUB_WORKSPACE"
sudo chown -R $(whoami):$(id -g) "$GITHUB_WORKSPACE"
- uses: actions/checkout@v4
- name: Install dependencies
run: |
export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"
python3.12 -m venv .venv || {
echo "Failed to find python3.12. Checking installation locations:"
ls -l /usr/local/bin/python* /opt/homebrew/bin/python* 2>/dev/null || true
exit 1
}
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
pip install boto3==1.35.76
- name: Apply Performance Optimizations
run: |
# Export performance-related environment variables
cat << 'EOF' > /tmp/performance_env.sh
# MLX and Metal optimizations
export MTL_DEBUG_LAYER=0
export METAL_VALIDATION_ENABLED=0
export MLX_METAL_VALIDATION=0
export MLX_METAL_DEBUG=0
export MLX_FORCE_P_CORES=1
export MLX_METAL_PREWARM=1
export PYTHONOPTIMIZE=2
EOF
# Source the performance environment variables
source /tmp/performance_env.sh
# MLX Memory Settings
./configure_mlx.sh
# Verify optimizations
echo "Verifying performance settings..."
env | grep -E "MLX_|METAL_|MTL_"
- name: Run exo
env:
aws_access_key_id: ${{ secrets.S3_EXO_BENCHMARKS_AWS_ACCESS_KEY_ID }}
aws_secret_key: ${{ secrets.S3_EXO_BENCHMARKS_AWS_SECRET_ACCESS_KEY }}
run: |
# Source performance environment variables
source /tmp/performance_env.sh
# Debug information
echo "Current commit SHA: $GITHUB_SHA"
git rev-parse HEAD
git status
CALLING_JOB="${{ inputs.calling_job_name }}"
UNIQUE_JOB_ID="${CALLING_JOB}_${model}_${GITHUB_RUN_ID}"
ALL_NODE_IDS=$(for i in $(seq ${{ strategy.job-total }} -1 0); do echo -n "${UNIQUE_JOB_ID}_${i},"; done | sed 's/,$//')
MY_NODE_ID="${UNIQUE_JOB_ID}_${{ strategy.job-index }}"
source .venv/bin/activate
export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"
echo "=== Before starting exo ==="
ps -eo pid,ppid,user,%cpu,%mem,nice,state,pri,command | head -1
ps -eo pid,ppid,user,%cpu,%mem,nice,state,pri,command | grep -i python
echo "Starting exo daemon..."
echo "Power mode settings:"
sudo pmset -g
# Start exo with explicit process control
sudo taskpolicy -d default -g default -a -t 0 -l 0 .venv/bin/exo \
--node-id="${MY_NODE_ID}" \
--node-id-filter="${ALL_NODE_IDS}" \
--interface-type-filter="${{ inputs.network_interface }}" \
--disable-tui \
--max-generate-tokens 250 \
--chatgpt-api-response-timeout 900 \
--chatgpt-api-port 52415 > output1.log 2>&1 &
PID1=$!
echo "Exo process started with PID: $PID1"
tail -f output1.log &
TAIL1=$!
# Give process time to start
sleep 2
# Set additional process priorities
sudo renice -n -20 -p $PID1
sudo taskpolicy -t 4 -p $PID1
echo "=== After starting exo ==="
ps -eo pid,ppid,user,%cpu,%mem,nice,state,pri,command | head -1
ps -eo pid,ppid,user,%cpu,%mem,nice,state,pri,command | grep $PID1
echo "Additional process details:"
sudo powermetrics -n 1 -i 1000 --show-process-energy | grep -A 5 $PID1 || true
trap 'kill $TAIL1' EXIT
trap 'kill $PID1' EXIT
echo "Waiting for all nodes to connect..."
for i in {1..20}; do
echo "Attempt $i: Checking node count..."
nodes=$(curl -s http://localhost:52415/topology | jq ".nodes | length")
echo "Current node count: $nodes"
if [ "$nodes" -eq "${{ strategy.job-total }}" ]; then
echo "All nodes connected successfully!"
break
fi
if [ $i -eq 20 ]; then
echo "ERROR: Failed to connect all nodes after 20 attempts. Expected ${{ strategy.job-total }} nodes, but got $nodes"
exit 1
fi
sleep 5
done
if ! kill -0 $PID1 2>/dev/null; then
echo "ERROR: Instance (PID $PID1) died unexpectedly. Full log output:"
cat output1.log
exit 1
fi
if [ "${{ strategy.job-index }}" -eq "0" ]; then
sleep 10
echo "This is the primary node (index 0). Running benchmark..."
GITHUB_JOB=$CALLING_JOB python .github/bench.py
else
echo "This is a secondary node (index ${{ strategy.job-index }}). Waiting for completion..."
sleep 10
while true; do
echo "Checking if primary node is still running..."
nodes=$(curl -s http://localhost:52415/topology | jq ".nodes | length")
echo "Current node count: $nodes"
if [ "$nodes" -lt "${{ strategy.job-total }}" ]; then
echo "Primary node completed, exiting..."
break
fi
sleep 5
done
fi
- name: Check Final System State
if: always()
run: |
echo "=== Final System State ==="
sudo pmset -g
sudo powermetrics -n 1 -i 1000 --show-process-energy || true
system_profiler SPDisplaysDataType
sysctl iogpu
ps -eo pid,ppid,user,%cpu,%mem,nice,state,command | grep -i python
env | grep -E "MLX_|METAL_|MTL_"
echo "=== End Final System State ==="
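The `generate-matrix` step above expands a hardware config like `{"M4PRO_GPU16_24GB": 2}` into one matrix entry per requested node via jq (`{cpu: [to_entries | .[] | .key as $k | range(.value) | $k]}`). A minimal Python sketch of the same expansion (the helper name is illustrative, not part of the repo):

```python
import json

def expand_matrix(config_json: str) -> dict:
    """Mirror the jq expression: emit one 'cpu' entry per requested node.

    {"M4PRO_GPU16_24GB": 2} -> {"cpu": ["M4PRO_GPU16_24GB", "M4PRO_GPU16_24GB"]}
    """
    config = json.loads(config_json)
    return {"cpu": [label for label, count in config.items() for _ in range(count)]}

print(json.dumps(expand_matrix('{"M4PRO_GPU16_24GB": 2, "M3MAX_GPU40_128GB": 1}')))
```

Each entry then becomes one `runs-on: ['self-hosted', 'macOS', '<label>']` job, so a count of 2 schedules two jobs on runners with that label.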


@@ -1,71 +0,0 @@
name: Build and Test
on:
push:
branches: [ '*' ]
tags: [ '*' ]
pull_request:
branches: [ '*' ]
jobs:
single-m4-pro:
strategy:
matrix:
model: ['llama-3.2-1b', 'llama-3.2-3b', 'llama-3.1-8b']
uses: ./.github/workflows/bench_job.yml
with:
config: '{"M4PRO_GPU16_24GB": 1}'
model: ${{ matrix.model }}
calling_job_name: 'single-m4-pro'
network_interface: 'Ethernet'
secrets: inherit
two-m4-pro-cluster:
strategy:
matrix:
model: ['llama-3.2-1b', 'llama-3.2-3b', 'llama-3.1-8b']
uses: ./.github/workflows/bench_job.yml
with:
config: '{"M4PRO_GPU16_24GB": 2}'
model: ${{ matrix.model }}
calling_job_name: 'two-m4-pro-cluster'
network_interface: 'Ethernet'
secrets: inherit
# two-m4-pro-cluster-thunderbolt:
# strategy:
# matrix:
# model: ['llama-3.2-1b', 'llama-3.2-3b', 'llama-3.1-8b']
# uses: ./.github/workflows/bench_job.yml
# with:
# config: '{"M4PRO_GPU16_24GB": 2}'
# model: ${{ matrix.model }}
# calling_job_name: 'two-m4-pro-cluster-thunderbolt'
# network_interface: 'Thunderbolt'
# secrets: inherit
three-m4-pro-cluster:
strategy:
matrix:
model: ['llama-3.2-1b', 'llama-3.2-3b', 'llama-3.1-8b', 'llama-3.3-70b']
fail-fast: false
uses: ./.github/workflows/bench_job.yml
with:
config: '{"M4PRO_GPU16_24GB": 3}'
model: ${{ matrix.model }}
calling_job_name: 'three-m4-pro-cluster'
network_interface: 'Ethernet'
secrets: inherit
# test-m3-single-node:
# strategy:
# matrix:
# model: ['llama-3.2-1b']
# fail-fast: false
# uses: ./.github/workflows/bench_job.yml
# with:
# config: '{"M3MAX_GPU40_128GB": 1}'
# model: ${{ matrix.model }}
# calling_job_name: 'test-m3-cluster'
# network_interface: 'Ethernet'
# secrets: inherit

.github/workflows/build-app.yml

@@ -0,0 +1,442 @@
name: Build EXO macOS DMG
# Release workflow:
# 1. Create a draft GitHub Release with the tag name (e.g. v1.0.0) and write release notes in markdown
# 2. Push the tag: git tag v1.0.0 && git push origin v1.0.0
# 3. This workflow builds, signs, and notarizes the DMG
# 4. Release notes are embedded in appcast.xml for Sparkle (rendered as markdown)
# 5. DMG and appcast.xml are uploaded to S3
# 6. The draft GitHub Release is published with the DMG attached
#
# For alpha releases (e.g. v1.0.0-alpha.1): draft release and notes are optional.
# If no draft exists, a release is auto-created with generated notes.
on:
workflow_dispatch:
push:
tags:
- "v*"
branches:
- "test-app"
jobs:
build-macos-app:
runs-on: "macos-26"
permissions:
contents: write
env:
SPARKLE_VERSION: 2.9.0-beta.1
SPARKLE_DOWNLOAD_PREFIX: ${{ secrets.SPARKLE_DOWNLOAD_PREFIX }}
SPARKLE_FEED_URL: ${{ secrets.SPARKLE_FEED_URL }}
SPARKLE_ED25519_PUBLIC: ${{ secrets.SPARKLE_ED25519_PUBLIC }}
SPARKLE_ED25519_PRIVATE: ${{ secrets.SPARKLE_ED25519_PRIVATE }}
SPARKLE_S3_BUCKET: ${{ secrets.SPARKLE_S3_BUCKET }}
SPARKLE_S3_PREFIX: ${{ secrets.SPARKLE_S3_PREFIX }}
EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT: ${{ secrets.EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT }}
AWS_REGION: ${{ secrets.AWS_REGION }}
EXO_BUILD_NUMBER: ${{ github.run_number }}
EXO_LIBP2P_NAMESPACE: ${{ github.ref_name }}
steps:
# ============================================================
# Checkout and tag validation
# ============================================================
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Derive release version from tag
run: |
if [[ "$GITHUB_REF_NAME" == "test-app" || "${{ github.event_name }}" == "workflow_dispatch" ]]; then
VERSION="0.0.0-alpha.0"
echo "IS_ALPHA=true" >> $GITHUB_ENV
else
VERSION="${GITHUB_REF_NAME#v}"
if [[ "$VERSION" == *-alpha* ]]; then
echo "IS_ALPHA=true" >> $GITHUB_ENV
else
echo "IS_ALPHA=false" >> $GITHUB_ENV
fi
fi
echo "RELEASE_VERSION=$VERSION" >> $GITHUB_ENV
- name: Compute build version from semver
run: |
VERSION="$RELEASE_VERSION"
# Extract major.minor.patch (strip prerelease suffix)
BASE_VERSION="${VERSION%%-*}"
MAJOR=$(echo "$BASE_VERSION" | cut -d. -f1)
MINOR=$(echo "$BASE_VERSION" | cut -d. -f2)
PATCH=$(echo "$BASE_VERSION" | cut -d. -f3)
# Extract prerelease number (e.g., "alpha.2" -> 2, or 999 for releases)
if [[ "$VERSION" == *-* ]]; then
PRERELEASE_PART="${VERSION#*-}"
PRERELEASE_NUM="${PRERELEASE_PART##*.}"
# Default to 0 if not a number
if ! [[ "$PRERELEASE_NUM" =~ ^[0-9]+$ ]]; then
PRERELEASE_NUM=0
fi
else
PRERELEASE_NUM=999
fi
# Compute: PRERELEASE + (1000 * PATCH) + (1_000_000 * MINOR) + (1_000_000_000 * MAJOR)
BUILD_VERSION=$((PRERELEASE_NUM + 1000 * PATCH + 1000000 * MINOR + 1000000000 * MAJOR))
echo "EXO_BUILD_VERSION=$BUILD_VERSION" >> $GITHUB_ENV
echo "Computed build version: $BUILD_VERSION from $VERSION"
- name: Ensure tag commit is on main
if: github.ref_type == 'tag'
run: |
git fetch origin main
# Alpha tags can be on any branch, production tags must be on main
if [[ "$IS_ALPHA" == "true" ]]; then
echo "Alpha tag detected, skipping main branch check"
elif ! git merge-base --is-ancestor origin/main HEAD; then
echo "Production tag must point to a commit on main"
exit 1
fi
- name: Fetch and validate release notes
if: github.ref_type == 'tag'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Find draft release by name using gh release list (more reliable with default token)
echo "Looking for draft release named '$GITHUB_REF_NAME'..."
DRAFT_EXISTS=$(gh release list --json name,isDraft --jq ".[] | select(.isDraft == true) | select(.name == \"$GITHUB_REF_NAME\") | .name" 2>/dev/null || echo "")
if [[ -z "$DRAFT_EXISTS" ]]; then
if [[ "$IS_ALPHA" == "true" ]]; then
echo "No draft release found for alpha tag $GITHUB_REF_NAME (optional for alphas)"
echo "HAS_RELEASE_NOTES=false" >> $GITHUB_ENV
exit 0
fi
echo "ERROR: No draft release found for tag $GITHUB_REF_NAME"
echo "Please create a draft release with release notes before pushing the tag."
exit 1
fi
# Fetch full release details via API to get body and ID
echo "Found draft release, fetching details..."
RELEASE_JSON=$(gh api repos/${{ github.repository }}/releases --jq ".[] | select(.draft == true) | select(.name == \"$GITHUB_REF_NAME\")" 2>/dev/null || echo "")
# Extract release notes
NOTES=$(echo "$RELEASE_JSON" | jq -r '.body // ""')
if [[ -z "$NOTES" || "$NOTES" == "null" ]]; then
if [[ "$IS_ALPHA" == "true" ]]; then
echo "Draft release has no notes (optional for alphas)"
echo "HAS_RELEASE_NOTES=false" >> $GITHUB_ENV
exit 0
fi
echo "ERROR: Draft release exists but has no release notes"
echo "Please add release notes to the draft release before pushing the tag."
exit 1
fi
# Save release ID for later publishing
RELEASE_ID=$(echo "$RELEASE_JSON" | jq -r '.id')
echo "DRAFT_RELEASE_ID=$RELEASE_ID" >> $GITHUB_ENV
echo "HAS_RELEASE_NOTES=true" >> $GITHUB_ENV
echo "Found draft release (ID: $RELEASE_ID), saving release notes..."
echo "$NOTES" > /tmp/release_notes.md
echo "RELEASE_NOTES_FILE=/tmp/release_notes.md" >> $GITHUB_ENV
# ============================================================
# Install dependencies
# ============================================================
- name: Select Xcode 26.2
run: |
sudo xcode-select -s /Applications/Xcode_26.2.app
if ! xcrun -f metal >/dev/null 2>&1; then
echo "Metal toolchain is not installed."
exit 1
fi
- name: Install Homebrew packages
run: brew install just awscli macmon
- name: Install UV
uses: astral-sh/setup-uv@v6
with:
enable-cache: true
cache-dependency-glob: uv.lock
- name: Setup Python
run: |
uv python install
uv sync --locked
- name: Install Nix
uses: cachix/install-nix-action@v31
with:
nix_path: nixpkgs=channel:nixos-unstable
- name: Configure Cachix
uses: cachix/cachix-action@v14
with:
name: exo
authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
- name: Build dashboard
run: |
DASHBOARD_OUT=$(nix build .#dashboard --print-build-logs --no-link --print-out-paths)
mkdir -p dashboard/build
cp -r "$DASHBOARD_OUT"/* dashboard/build/
- name: Install Sparkle CLI
run: |
CLI_URL="${SPARKLE_CLI_URL:-https://github.com/sparkle-project/Sparkle/releases/download/${SPARKLE_VERSION}/Sparkle-${SPARKLE_VERSION}.tar.xz}"
echo "Downloading Sparkle CLI from: $CLI_URL"
mkdir -p /tmp/sparkle
curl --fail --location --output /tmp/sparkle.tar.xz "$CLI_URL"
tar -xJf /tmp/sparkle.tar.xz -C /tmp/sparkle --strip-components=1
echo "SPARKLE_BIN=/tmp/sparkle/bin" >> $GITHUB_ENV
- name: Prepare code-signing keychain
env:
MACOS_CERTIFICATE: ${{ secrets.MACOS_CERTIFICATE }}
MACOS_CERTIFICATE_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PASSWORD }}
PROVISIONING_PROFILE: ${{ secrets.PROVISIONING_PROFILE }}
run: |
KEYCHAIN_PATH="$HOME/Library/Keychains/build.keychain-db"
# Create fresh keychain
security create-keychain -p "$MACOS_CERTIFICATE_PASSWORD" "$KEYCHAIN_PATH"
# Disable auto-lock (no timeout, no lock-on-sleep)
security set-keychain-settings "$KEYCHAIN_PATH"
# Add to search list while preserving existing keychains
security list-keychains -d user -s "$KEYCHAIN_PATH" $(security list-keychains -d user | tr -d '"')
# Set as default and unlock
security default-keychain -s "$KEYCHAIN_PATH"
security unlock-keychain -p "$MACOS_CERTIFICATE_PASSWORD" "$KEYCHAIN_PATH"
# Import certificate with full access for codesign
echo "$MACOS_CERTIFICATE" | base64 --decode > /tmp/cert.p12
security import /tmp/cert.p12 -k "$KEYCHAIN_PATH" -P "$MACOS_CERTIFICATE_PASSWORD" \
-T /usr/bin/codesign -T /usr/bin/security -T /usr/bin/productbuild
rm /tmp/cert.p12
# Allow codesign to access the key without prompting
security set-key-partition-list -S apple-tool:,apple:,codesign: -s -k "$MACOS_CERTIFICATE_PASSWORD" "$KEYCHAIN_PATH"
# Verify keychain is unlocked and identity is available
echo "Verifying signing identity..."
security find-identity -v -p codesigning "$KEYCHAIN_PATH"
# Setup provisioning profile
mkdir -p "$HOME/Library/Developer/Xcode/UserData/Provisioning Profiles"
echo "$PROVISIONING_PROFILE" | base64 --decode > "$HOME/Library/Developer/Xcode/UserData/Provisioning Profiles/EXO.provisionprofile"
# Export keychain path for other steps
echo "BUILD_KEYCHAIN_PATH=$KEYCHAIN_PATH" >> $GITHUB_ENV
# ============================================================
# Build the bundle
# ============================================================
- name: Build PyInstaller bundle
run: uv run pyinstaller packaging/pyinstaller/exo.spec
- name: Build Swift app
env:
MACOS_CERTIFICATE_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PASSWORD }}
SPARKLE_FEED_URL: ${{ secrets.SPARKLE_FEED_URL }}
SPARKLE_ED25519_PUBLIC: ${{ secrets.SPARKLE_ED25519_PUBLIC }}
run: |
cd app/EXO
security unlock-keychain -p "$MACOS_CERTIFICATE_PASSWORD" "$BUILD_KEYCHAIN_PATH"
SIGNING_IDENTITY=$(security find-identity -v -p codesigning "$BUILD_KEYCHAIN_PATH" | awk -F '"' '{print $2}')
xcodebuild clean build \
-scheme EXO \
-configuration Release \
-derivedDataPath build \
MARKETING_VERSION="$RELEASE_VERSION" \
CURRENT_PROJECT_VERSION="$EXO_BUILD_VERSION" \
EXO_BUILD_TAG="$RELEASE_VERSION" \
EXO_BUILD_COMMIT="$GITHUB_SHA" \
SPARKLE_FEED_URL="$SPARKLE_FEED_URL" \
SPARKLE_ED25519_PUBLIC="$SPARKLE_ED25519_PUBLIC" \
EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT="$EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT" \
CODE_SIGNING_IDENTITY="$SIGNING_IDENTITY" \
CODE_SIGN_INJECT_BASE_ENTITLEMENTS=YES
mkdir -p ../../output
cp -R build/Build/Products/Release/EXO.app ../../output/EXO.app
- name: Inject PyInstaller runtime
run: |
rm -rf output/EXO.app/Contents/Resources/exo
mkdir -p output/EXO.app/Contents/Resources
cp -R dist/exo output/EXO.app/Contents/Resources/exo
- name: Codesign PyInstaller runtime
env:
MACOS_CERTIFICATE_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PASSWORD }}
run: |
cd output
security unlock-keychain -p "$MACOS_CERTIFICATE_PASSWORD" "$BUILD_KEYCHAIN_PATH"
SIGNING_IDENTITY=$(security find-identity -v -p codesigning "$BUILD_KEYCHAIN_PATH" | awk -F '"' '{print $2}')
RUNTIME_DIR="EXO.app/Contents/Resources/exo"
find "$RUNTIME_DIR" -type f \( -perm -111 -o -name "*.dylib" -o -name "*.so" \) -print0 |
while IFS= read -r -d '' file; do
/usr/bin/codesign --force --timestamp --options runtime \
--sign "$SIGNING_IDENTITY" "$file"
done
- name: Sign, notarize, and create DMG
env:
MACOS_CERTIFICATE_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PASSWORD }}
APPLE_NOTARIZATION_USERNAME: ${{ secrets.APPLE_NOTARIZATION_USERNAME }}
APPLE_NOTARIZATION_PASSWORD: ${{ secrets.APPLE_NOTARIZATION_PASSWORD }}
APPLE_NOTARIZATION_TEAM: ${{ secrets.APPLE_NOTARIZATION_TEAM }}
run: |
cd output
security unlock-keychain -p "$MACOS_CERTIFICATE_PASSWORD" "$BUILD_KEYCHAIN_PATH"
SIGNING_IDENTITY=$(security find-identity -v -p codesigning "$BUILD_KEYCHAIN_PATH" | awk -F '"' '{print $2}')
/usr/bin/codesign --deep --force --timestamp --options runtime \
--sign "$SIGNING_IDENTITY" EXO.app
mkdir -p dmg-root
cp -R EXO.app dmg-root/
ln -s /Applications dmg-root/Applications
DMG_NAME="EXO-${RELEASE_VERSION}.dmg"
hdiutil create -volname "EXO" -srcfolder dmg-root -ov -format UDZO "$DMG_NAME"
/usr/bin/codesign --force --timestamp --options runtime \
--sign "$SIGNING_IDENTITY" "$DMG_NAME"
if [[ -n "$APPLE_NOTARIZATION_USERNAME" ]]; then
SUBMISSION_OUTPUT=$(xcrun notarytool submit "$DMG_NAME" \
--apple-id "$APPLE_NOTARIZATION_USERNAME" \
--password "$APPLE_NOTARIZATION_PASSWORD" \
--team-id "$APPLE_NOTARIZATION_TEAM" \
--wait --timeout 15m 2>&1)
echo "$SUBMISSION_OUTPUT"
SUBMISSION_ID=$(echo "$SUBMISSION_OUTPUT" | awk 'tolower($1)=="id:" && $2 ~ /^[0-9a-fA-F-]+$/ {print $2; exit}')
STATUS=$(echo "$SUBMISSION_OUTPUT" | awk 'tolower($1)=="status:" {print $2; exit}')
if [[ -n "$SUBMISSION_ID" ]]; then
xcrun notarytool log "$SUBMISSION_ID" \
--apple-id "$APPLE_NOTARIZATION_USERNAME" \
--password "$APPLE_NOTARIZATION_PASSWORD" \
--team-id "$APPLE_NOTARIZATION_TEAM" > notarization-log.txt || true
echo "===== Notarization Log ====="
cat notarization-log.txt
echo "============================"
fi
if [[ "$STATUS" != "Accepted" ]]; then
echo "Notarization failed with status: ${STATUS:-Unknown}"
exit 1
fi
xcrun stapler staple "$DMG_NAME"
fi
- name: Generate Sparkle appcast
env:
SPARKLE_DOWNLOAD_PREFIX: ${{ env.SPARKLE_DOWNLOAD_PREFIX }}
SPARKLE_ED25519_PRIVATE: ${{ secrets.SPARKLE_ED25519_PRIVATE }}
IS_ALPHA: ${{ env.IS_ALPHA }}
run: |
set -euo pipefail
cd output
DOWNLOAD_PREFIX="${SPARKLE_DOWNLOAD_PREFIX:-https://assets.exolabs.net}"
echo "$SPARKLE_ED25519_PRIVATE" > sparkle_ed25519.key
chmod 600 sparkle_ed25519.key
CHANNEL_FLAG=""
if [[ "$IS_ALPHA" == "true" ]]; then
CHANNEL_FLAG="--channel alpha"
echo "Generating appcast for alpha channel"
fi
$SPARKLE_BIN/generate_appcast \
--ed-key-file sparkle_ed25519.key \
--download-url-prefix "$DOWNLOAD_PREFIX" \
$CHANNEL_FLAG \
.
- name: Inject release notes into appcast
if: github.ref_type == 'tag' && env.HAS_RELEASE_NOTES == 'true'
env:
RELEASE_VERSION: ${{ env.RELEASE_VERSION }}
run: |
# Inject markdown release notes with sparkle:format="markdown" (Sparkle 2.9+)
export NOTES=$(cat "$RELEASE_NOTES_FILE")
# Insert description after the enclosure tag for this version
awk '
/<enclosure[^>]*>/ && index($0, ENVIRON["RELEASE_VERSION"]) {
print
print " <description sparkle:format=\"markdown\"><![CDATA["
print ENVIRON["NOTES"]
print " ]]></description>"
next
}
{ print }
' output/appcast.xml > output/appcast.xml.tmp && mv output/appcast.xml.tmp output/appcast.xml
echo "Injected markdown release notes for version $RELEASE_VERSION"
# ============================================================
# Upload artifacts
# ============================================================
- name: Upload DMG
uses: actions/upload-artifact@v4
with:
name: EXO-dmg-${{ env.RELEASE_VERSION }}
path: output/EXO-${{ env.RELEASE_VERSION }}.dmg
- name: Upload to S3
if: env.SPARKLE_S3_BUCKET != '' && github.ref_type == 'tag'
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ env.AWS_REGION }}
SPARKLE_S3_BUCKET: ${{ env.SPARKLE_S3_BUCKET }}
SPARKLE_S3_PREFIX: ${{ env.SPARKLE_S3_PREFIX }}
IS_ALPHA: ${{ env.IS_ALPHA }}
run: |
set -euo pipefail
cd output
PREFIX="${SPARKLE_S3_PREFIX:-}"
if [[ -n "$PREFIX" && "${PREFIX: -1}" != "/" ]]; then
PREFIX="${PREFIX}/"
fi
DMG_NAME="EXO-${RELEASE_VERSION}.dmg"
aws s3 cp "$DMG_NAME" "s3://${SPARKLE_S3_BUCKET}/${PREFIX}${DMG_NAME}"
if [[ "$IS_ALPHA" != "true" ]]; then
aws s3 cp "$DMG_NAME" "s3://${SPARKLE_S3_BUCKET}/${PREFIX}EXO-latest.dmg"
aws s3 cp appcast.xml "s3://${SPARKLE_S3_BUCKET}/${PREFIX}appcast.xml" --content-type application/xml --cache-control no-cache
fi
- name: Publish GitHub Release
if: github.ref_type == 'tag'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
DMG_PATH="output/EXO-${RELEASE_VERSION}.dmg"
if [[ "$HAS_RELEASE_NOTES" == "true" ]]; then
# Update the draft release with the tag and upload DMG
gh api --method PATCH "repos/${{ github.repository }}/releases/$DRAFT_RELEASE_ID" \
-f tag_name="$GITHUB_REF_NAME" \
-F draft=false
gh release upload "$GITHUB_REF_NAME" "$DMG_PATH" --clobber
echo "Published release $GITHUB_REF_NAME with DMG attached"
else
# Alpha without draft release - create one with auto-generated notes
gh release create "$GITHUB_REF_NAME" "$DMG_PATH" \
--title "$GITHUB_REF_NAME" \
--generate-notes \
--prerelease
echo "Created alpha release $GITHUB_REF_NAME with auto-generated notes"
fi
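The "Compute build version from semver" step above packs a tag into a single monotonic integer (prerelease number, then patch × 1 000, minor × 1 000 000, major × 1 000 000 000, with final releases taking prerelease slot 999 so they sort above their alphas). A sketch of the same mapping in Python, assuming the same numbering scheme (this helper is illustrative, not part of the repo):

```python
def build_version(version: str) -> int:
    """Map a semver tag (without the leading 'v') to a monotonic build number.

    Prerelease builds (e.g. "1.0.0-alpha.2") sort below the final release
    of the same patch, which gets the reserved prerelease slot 999.
    """
    base, _, prerelease = version.partition("-")
    major, minor, patch = (int(p) for p in base.split("."))
    if prerelease:
        # "alpha.2" -> 2; non-numeric suffixes default to 0, as in the workflow
        num = prerelease.rsplit(".", 1)[-1]
        pre = int(num) if num.isdigit() else 0
    else:
        pre = 999
    return pre + 1000 * patch + 1_000_000 * minor + 1_000_000_000 * major

print(build_version("1.0.0"))          # 1000000999
print(build_version("1.0.0-alpha.2"))  # 1000000002
```

This gives roughly 1 000 prerelease slots per patch and 1 000 patches per minor before adjacent versions could collide.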

.github/workflows/pipeline.yml

@@ -0,0 +1,136 @@
name: ci-pipeline
on:
push:
pull_request:
branches:
- staging
- main
jobs:
typecheck:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
lfs: false
- uses: cachix/install-nix-action@v31
with:
nix_path: nixpkgs=channel:nixos-unstable
- uses: cachix/cachix-action@v14
name: Configure Cachix
with:
name: exo
authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
- name: Configure git user
run: |
git config --local user.email "github-actions@users.noreply.github.com"
git config --local user.name "github-actions bot"
shell: bash
- name: Pull LFS files
run: |
echo "Pulling Git LFS files..."
git lfs pull
shell: bash
- name: Setup Nix Environment
run: |
echo "Checking for nix installation..."
# Check if nix binary exists directly
if [ -f /nix/var/nix/profiles/default/bin/nix ]; then
echo "Found nix binary at /nix/var/nix/profiles/default/bin/nix"
export PATH="/nix/var/nix/profiles/default/bin:$PATH"
echo "PATH=$PATH" >> $GITHUB_ENV
nix --version
elif [ -f /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh ]; then
echo "Found nix profile script, sourcing..."
source /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh
nix --version
elif command -v nix >/dev/null 2>&1; then
echo "Nix already in PATH"
nix --version
else
echo "Nix not found. Debugging info:"
echo "Contents of /nix/var/nix/profiles/default/:"
ls -la /nix/var/nix/profiles/default/ 2>/dev/null || echo "Directory not found"
echo "Contents of /nix/var/nix/profiles/default/bin/:"
ls -la /nix/var/nix/profiles/default/bin/ 2>/dev/null || echo "Directory not found"
exit 1
fi
shell: bash
- name: Configure basedpyright include for local MLX
run: |
RUNNER_LABELS='${{ toJSON(runner.labels) }}'
if echo "$RUNNER_LABELS" | grep -q "local_mlx"; then
if [ -d "/Users/Shared/mlx" ]; then
echo "Updating [tool.basedpyright].include to use /Users/Shared/mlx"
awk '
BEGIN { insec=0 }
/^\[tool\.basedpyright\]/ { insec=1; print; next }
insec && /^\[/ { insec=0 }  # next section
insec && /^[ \t]*include[ \t]*=/ {
print "include = [\"/Users/Shared/mlx\"]"
next
}
{ print }
' pyproject.toml > pyproject.toml.tmp && mv pyproject.toml.tmp pyproject.toml
echo "New [tool.basedpyright] section:"
sed -n '/^\[tool\.basedpyright\]/,/^\[/p' pyproject.toml | sed '$d' || true
else
echo "local_mlx tag present but /Users/Shared/mlx not found; leaving pyproject unchanged."
fi
else
echo "Runner does not have 'local_mlx' tag; leaving pyproject unchanged."
fi
shell: bash
- uses: ./.github/actions/typecheck
nix:
name: Build and check (${{ matrix.system }})
runs-on: ${{ matrix.runner }}
strategy:
fail-fast: false
matrix:
include:
- runner: macos-26
system: aarch64-darwin
- runner: ubuntu-latest
system: x86_64-linux
- runner: ubuntu-24.04-arm
system: aarch64-linux
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
lfs: false
- uses: cachix/install-nix-action@v31
with:
nix_path: nixpkgs=channel:nixos-unstable
- uses: cachix/cachix-action@v14
name: Configure Cachix
with:
name: exo
authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
- name: Build all Nix outputs
run: |
nix flake show --json | jq -r '
[
(.packages."${{ matrix.system }}" // {} | keys[] | ".#packages.${{ matrix.system }}.\(.)"),
(.devShells."${{ matrix.system }}" // {} | keys[] | ".#devShells.${{ matrix.system }}.\(.)")
] | .[]
' | xargs nix build
- name: Run nix flake check
run: nix flake check
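The "Build all Nix outputs" step above turns `nix flake show --json` into a list of installables by enumerating every package and devShell attribute for the matrix system. A Python sketch of that jq filter, assuming the standard `nix flake show --json` layout of `{"packages": {<system>: {<name>: ...}}}` (the helper name is illustrative):

```python
def flake_targets(flake_show: dict, system: str) -> list:
    """Mirror the jq filter: list every package and devShell attribute
    for one system as a `nix build` installable reference."""
    targets = []
    for kind in ("packages", "devShells"):
        # `// {}` in jq: tolerate a flake with no outputs of this kind
        for name in flake_show.get(kind, {}).get(system, {}):
            targets.append(".#{}.{}.{}".format(kind, system, name))
    return targets

show = {"packages": {"aarch64-darwin": {"dashboard": {}}},
        "devShells": {"aarch64-darwin": {"default": {}}}}
print(flake_targets(show, "aarch64-darwin"))
```

The resulting list is piped to `xargs nix build`, so every flake output for the current system is built in one invocation.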

.gitignore

@@ -1,175 +1,30 @@
__pycache__/
.venv*
test_weights.npz
.exo_used_ports
.exo_node_id
# gitingest
digest.txt
# python
**/__pycache__
# nix
.direnv/
# IDEA (PyCharm)
.idea
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# xcode / macos
*.xcuserstate
*.xcuserdata
*.xcuserdatad/
**/.DS_Store
app/EXO/build/
dist/
# C extensions
*.so
# Distribution / packaging
/.Python
/develop-eggs/
/dist/
/downloads/
/eggs/
/.eggs/
/lib/
/lib64/
/parts/
/sdist/
/var/
/wheels/
/share/python-wheels/
/*.egg-info/
/.installed.cfg
/*.egg
/MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
# rust
target/
**/*.rs.bk
*.pdb
# Jupyter Notebook
.ipynb_checkpoints
Untitled.ipynb
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
**/*.xcodeproj/*
.aider*
exo/tinychat/images/*.png
# svelte
dashboard/build/
dashboard/node_modules/
dashboard/.svelte-kit/

.idea/.gitignore generated vendored Normal file

@@ -0,0 +1,9 @@
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
workspace.xml

.idea/LanguageServersSettings.xml generated Normal file

@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="LanguageServerSettingsState">
<state>
<map>
<entry key="com.insyncwithfoo.pyright">
<value>
<LanguageServerDefinitionSettings>
<option name="errorReportingKind" value="in_log" />
</LanguageServerDefinitionSettings>
</value>
</entry>
</map>
</state>
</component>
</project>

.idea/exo-v2.iml generated Normal file

@@ -0,0 +1,31 @@
<?xml version="1.0" encoding="UTF-8"?>
<module type="EMPTY_MODULE" version="4">
<component name="FacetManager">
<facet type="Python" name="Python facet">
<configuration sdkName="Python 3.13 virtualenv at ~/Desktop/exo/.venv" />
</facet>
</component>
<component name="Go" enabled="true" />
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$">
<sourceFolder url="file://$MODULE_DIR$/scripts/src" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/src" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/rust/exo_pyo3_bindings/src" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/rust/exo_pyo3_bindings/tests" isTestSource="true" />
<sourceFolder url="file://$MODULE_DIR$/rust/util/src" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/rust/networking/examples" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/rust/networking/src" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/rust/networking/tests" isTestSource="true" />
<sourceFolder url="file://$MODULE_DIR$/rust/system_custodian/src" isTestSource="false" />
<excludeFolder url="file://$MODULE_DIR$/.venv" />
<excludeFolder url="file://$MODULE_DIR$/.direnv" />
<excludeFolder url="file://$MODULE_DIR$/build" />
<excludeFolder url="file://$MODULE_DIR$/dist" />
<excludeFolder url="file://$MODULE_DIR$/.go_cache" />
<excludeFolder url="file://$MODULE_DIR$/rust/target" />
</content>
<orderEntry type="jdk" jdkName="Python 3.13 (exo)" jdkType="Python SDK" />
<orderEntry type="sourceFolder" forTests="false" />
<orderEntry type="library" name="Python 3.13 virtualenv at ~/Desktop/exo/.venv interpreter library" level="application" />
</component>
</module>

.idea/externalDependencies.xml generated Normal file

@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ExternalDependencies">
<plugin id="systems.fehn.intellijdirenv" />
</component>
</project>


@@ -0,0 +1,14 @@
<component name="InspectionProjectProfileManager">
<profile version="1.0">
<option name="myName" value="Project Default" />
<inspection_tool class="PyCompatibilityInspection" enabled="true" level="WARNING" enabled_by_default="true">
<option name="ourVersions">
<value>
<list size="1">
<item index="0" class="java.lang.String" itemvalue="3.14" />
</list>
</value>
</option>
</inspection_tool>
</profile>
</component>

.idea/misc.xml generated Normal file

@@ -0,0 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="Black">
<option name="sdkName" value="Python 3.13 (exo)" />
</component>
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.13 (exo)" project-jdk-type="Python SDK" />
<component name="PythonCompatibilityInspectionAdvertiser">
<option name="version" value="3" />
</component>
</project>

.idea/modules.xml generated Normal file

@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectModuleManager">
<modules>
<module fileurl="file://$PROJECT_DIR$/.idea/exo.iml" filepath="$PROJECT_DIR$/.idea/exo.iml" />
</modules>
</component>
</project>

.idea/pyright-overrides.xml generated Normal file

@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="com.insyncwithfoo.pyright.configurations.Override">
<option name="names">
<map>
<entry key="configurationFile" value="true" />
<entry key="diagnosticMode" value="true" />
<entry key="inlayHintsGenericTypes" value="true" />
<entry key="prefixTooltipMessages" value="true" />
<entry key="runningMode" value="true" />
<entry key="smartExecutableResolution" value="true" />
<entry key="smartLanguageServerExecutableResolution" value="true" />
<entry key="useEditorFontForTooltips" value="true" />
<entry key="useTypingExtensions" value="true" />
</map>
</option>
</component>
</project>

.idea/pyright.xml generated Normal file

@@ -0,0 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="com.insyncwithfoo.pyright.configurations.Local">
<option name="diagnosticMode" value="WORKSPACE" />
<option name="inlayHintsGenericTypes" value="true" />
<option name="prefixTooltipMessages" value="true" />
<option name="useEditorFontForTooltips" value="true" />
</component>
</project>

.idea/vcs.xml generated Normal file

@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="" vcs="Git" />
</component>
</project>


@@ -0,0 +1,7 @@
"""
This type stub file was generated by pyright.
"""
import os
if "TOKENIZERS_PARALLELISM" not in os.environ: ...


@@ -0,0 +1,3 @@
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,47 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import PIL.Image
import tqdm
from typing import Protocol
from mflux.models.common.config.config import Config
class BeforeLoopCallback(Protocol):
def call_before_loop(
self,
seed: int,
prompt: str,
latents: mx.array,
config: Config,
canny_image: PIL.Image.Image | None = ...,
depth_image: PIL.Image.Image | None = ...,
) -> None: ...
class InLoopCallback(Protocol):
def call_in_loop(
self,
t: int,
seed: int,
prompt: str,
latents: mx.array,
config: Config,
time_steps: tqdm.tqdm,
) -> None: ...
class AfterLoopCallback(Protocol):
def call_after_loop(
self, seed: int, prompt: str, latents: mx.array, config: Config
) -> None: ...
class InterruptCallback(Protocol):
def call_interrupt(
self,
t: int,
seed: int,
prompt: str,
latents: mx.array,
config: Config,
time_steps: tqdm.tqdm,
) -> None: ...
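The callback classes above are `typing.Protocol`s, so implementers conform structurally rather than by inheritance. A minimal sketch of the pattern with simplified stand-in names (not the real mflux signatures):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class AfterLoop(Protocol):
    # Simplified stand-in for AfterLoopCallback above.
    def call_after_loop(self, seed: int, prompt: str) -> None: ...

class PrintingCallback:
    # No inheritance needed: matching the method shape is enough.
    def call_after_loop(self, seed: int, prompt: str) -> None:
        print(f"seed={seed} prompt={prompt!r}")

def finish(cb: AfterLoop) -> None:
    cb.call_after_loop(7, "a cat")

finish(PrintingCallback())  # prints: seed=7 prompt='a cat'
```

With `@runtime_checkable`, `isinstance(PrintingCallback(), AfterLoop)` also holds at runtime, which is how a registry can sort mixed callbacks into per-hook lists.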


@@ -0,0 +1,24 @@
"""
This type stub file was generated by pyright.
"""
from typing import TYPE_CHECKING
from mflux.callbacks.callback import (
AfterLoopCallback,
BeforeLoopCallback,
InLoopCallback,
InterruptCallback,
)
from mflux.callbacks.generation_context import GenerationContext
from mflux.models.common.config.config import Config
if TYPE_CHECKING: ...
class CallbackRegistry:
def __init__(self) -> None: ...
def register(self, callback) -> None: ...
def start(self, seed: int, prompt: str, config: Config) -> GenerationContext: ...
def before_loop_callbacks(self) -> list[BeforeLoopCallback]: ...
def in_loop_callbacks(self) -> list[InLoopCallback]: ...
def after_loop_callbacks(self) -> list[AfterLoopCallback]: ...
def interrupt_callbacks(self) -> list[InterruptCallback]: ...


@@ -0,0 +1,29 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import PIL.Image
import tqdm
from typing import TYPE_CHECKING
from mflux.callbacks.callback_registry import CallbackRegistry
from mflux.models.common.config.config import Config
if TYPE_CHECKING: ...
class GenerationContext:
def __init__(
self, registry: CallbackRegistry, seed: int, prompt: str, config: Config
) -> None: ...
def before_loop(
self,
latents: mx.array,
*,
canny_image: PIL.Image.Image | None = ...,
depth_image: PIL.Image.Image | None = ...,
) -> None: ...
def in_loop(self, t: int, latents: mx.array, time_steps: tqdm.tqdm = ...) -> None: ...
def after_loop(self, latents: mx.array) -> None: ...
def interruption(
self, t: int, latents: mx.array, time_steps: tqdm.tqdm = ...
) -> None: ...


@@ -0,0 +1,3 @@
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,22 @@
"""
This type stub file was generated by pyright.
"""
import os
BATTERY_PERCENTAGE_STOP_LIMIT = ...
CONTROLNET_STRENGTH = ...
DEFAULT_DEV_FILL_GUIDANCE = ...
DEFAULT_DEPTH_GUIDANCE = ...
DIMENSION_STEP_PIXELS = ...
GUIDANCE_SCALE = ...
GUIDANCE_SCALE_KONTEXT = ...
IMAGE_STRENGTH = ...
MODEL_CHOICES = ...
MODEL_INFERENCE_STEPS = ...
QUANTIZE_CHOICES = ...
if os.environ.get("MFLUX_CACHE_DIR"):
MFLUX_CACHE_DIR = ...
else:
MFLUX_CACHE_DIR = ...
MFLUX_LORA_CACHE_DIR = ...


@@ -0,0 +1,3 @@
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,3 @@
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,3 @@
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,8 @@
"""
This type stub file was generated by pyright.
"""
from mflux.models.common.config.config import Config
from mflux.models.common.config.model_config import ModelConfig
__all__ = ["Config", "ModelConfig"]


@@ -0,0 +1,66 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from pathlib import Path
from typing import Any
from tqdm import tqdm
from mflux.models.common.config.model_config import ModelConfig
logger = ...
class Config:
def __init__(
self,
model_config: ModelConfig,
num_inference_steps: int = ...,
height: int = ...,
width: int = ...,
guidance: float = ...,
image_path: Path | str | None = ...,
image_strength: float | None = ...,
depth_image_path: Path | str | None = ...,
redux_image_paths: list[Path | str] | None = ...,
redux_image_strengths: list[float] | None = ...,
masked_image_path: Path | str | None = ...,
controlnet_strength: float | None = ...,
scheduler: str = ...,
) -> None: ...
@property
def height(self) -> int: ...
@property
def width(self) -> int: ...
@width.setter
def width(self, value): # -> None:
...
@property
def image_seq_len(self) -> int: ...
@property
def guidance(self) -> float: ...
@property
def num_inference_steps(self) -> int: ...
@property
def precision(self) -> mx.Dtype: ...
@property
def num_train_steps(self) -> int: ...
@property
def image_path(self) -> Path | None: ...
@property
def image_strength(self) -> float | None: ...
@property
def depth_image_path(self) -> Path | None: ...
@property
def redux_image_paths(self) -> list[Path] | None: ...
@property
def redux_image_strengths(self) -> list[float] | None: ...
@property
def masked_image_path(self) -> Path | None: ...
@property
def init_time_step(self) -> int: ...
@property
def time_steps(self) -> tqdm: ...
@property
def controlnet_strength(self) -> float | None: ...
@property
def scheduler(self) -> Any: ...


@@ -0,0 +1,86 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from functools import lru_cache
from typing import Literal
class ModelConfig:
precision: mx.Dtype = ...
def __init__(
self,
priority: int,
aliases: list[str],
model_name: str,
base_model: str | None,
controlnet_model: str | None,
custom_transformer_model: str | None,
num_train_steps: int | None,
max_sequence_length: int | None,
supports_guidance: bool | None,
requires_sigma_shift: bool | None,
transformer_overrides: dict | None = ...,
) -> None: ...
@staticmethod
@lru_cache
def dev() -> ModelConfig: ...
@staticmethod
@lru_cache
def schnell() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_kontext() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_fill() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_redux() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_depth() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_controlnet_canny() -> ModelConfig: ...
@staticmethod
@lru_cache
def schnell_controlnet_canny() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_controlnet_upscaler() -> ModelConfig: ...
@staticmethod
@lru_cache
def dev_fill_catvton() -> ModelConfig: ...
@staticmethod
@lru_cache
def krea_dev() -> ModelConfig: ...
@staticmethod
@lru_cache
def flux2_klein_4b() -> ModelConfig: ...
@staticmethod
@lru_cache
def flux2_klein_9b() -> ModelConfig: ...
@staticmethod
@lru_cache
def qwen_image() -> ModelConfig: ...
@staticmethod
@lru_cache
def qwen_image_edit() -> ModelConfig: ...
@staticmethod
@lru_cache
def fibo() -> ModelConfig: ...
@staticmethod
@lru_cache
def z_image_turbo() -> ModelConfig: ...
@staticmethod
@lru_cache
def seedvr2_3b() -> ModelConfig: ...
def x_embedder_input_dim(self) -> int: ...
def is_canny(self) -> bool: ...
@staticmethod
def from_name(
model_name: str, base_model: Literal["dev", "schnell", "krea-dev"] | None = ...
) -> ModelConfig: ...
AVAILABLE_MODELS = ...
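The `@staticmethod` plus `@lru_cache` combination on the factories above gives memoized zero-argument constructors: each named configuration is built once and every later call returns the same instance. A generic sketch of the pattern (hypothetical class, not mflux's real fields):

```python
from functools import lru_cache

class Cfg:
    def __init__(self, name: str) -> None:
        self.name = name

    @staticmethod
    @lru_cache
    def dev() -> "Cfg":
        # Construction runs once; later Cfg.dev() calls return the
        # cached instance from lru_cache.
        return Cfg("dev")

a, b = Cfg.dev(), Cfg.dev()
print(a is b)  # True
```

Note the decorator order: `lru_cache` wraps the plain function, and `staticmethod` sits outermost so the cached callable is reachable as `Cfg.dev` without an instance.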


@@ -0,0 +1,7 @@
"""
This type stub file was generated by pyright.
"""
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,49 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from pathlib import Path
from typing import TYPE_CHECKING, TypeAlias
from mlx import nn
from mflux.models.common.vae.tiling_config import TilingConfig
from mflux.models.fibo.latent_creator.fibo_latent_creator import FiboLatentCreator
from mflux.models.flux.latent_creator.flux_latent_creator import FluxLatentCreator
from mflux.models.qwen.latent_creator.qwen_latent_creator import QwenLatentCreator
from mflux.models.z_image.latent_creator.z_image_latent_creator import (
ZImageLatentCreator,
)
if TYPE_CHECKING:
LatentCreatorType: TypeAlias = type[
FiboLatentCreator | FluxLatentCreator | QwenLatentCreator | ZImageLatentCreator
]
class Img2Img:
def __init__(
self,
vae: nn.Module,
latent_creator: LatentCreatorType,
sigmas: mx.array,
init_time_step: int,
image_path: str | Path | None,
tiling_config: TilingConfig | None = ...,
) -> None: ...
class LatentCreator:
@staticmethod
def create_for_txt2img_or_img2img(
seed: int, height: int, width: int, img2img: Img2Img
) -> mx.array: ...
@staticmethod
def encode_image(
vae: nn.Module,
image_path: str | Path,
height: int,
width: int,
tiling_config: TilingConfig | None = ...,
) -> mx.array: ...
@staticmethod
def add_noise_by_interpolation(
clean: mx.array, noise: mx.array, sigma: float
) -> mx.array: ...
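`add_noise_by_interpolation` above presumably follows the usual flow-matching noising convention, a linear blend of clean latents and noise controlled by sigma. A numpy sketch under that assumption (not mflux's exact code):

```python
import numpy as np

def add_noise_by_interpolation(clean, noise, sigma):
    # Assumed convention: sigma=0 -> pure clean latents, sigma=1 -> pure noise.
    return (1.0 - sigma) * clean + sigma * noise

clean = np.full((2, 2), 4.0)
noise = np.zeros((2, 2))
mixed = add_noise_by_interpolation(clean, noise, 0.25)
print(float(mixed[0, 0]))  # 0.75 * 4.0 = 3.0
```

This is the img2img entry point: the encoded input image is pushed part-way toward noise, and `init_time_step` then resumes sampling from the matching sigma.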


@@ -0,0 +1,3 @@
"""
This type stub file was generated by pyright.
"""


@@ -0,0 +1,13 @@
"""
This type stub file was generated by pyright.
"""
from mlx import nn
from mflux.models.common.lora.layer.linear_lora_layer import LoRALinear
class FusedLoRALinear(nn.Module):
def __init__(
self, base_linear: nn.Linear | nn.QuantizedLinear, loras: list[LoRALinear]
) -> None: ...
def __call__(self, x): # -> array:
...


@@ -0,0 +1,22 @@
"""
This type stub file was generated by pyright.
"""
from mlx import nn
class LoRALinear(nn.Module):
@staticmethod
def from_linear(
linear: nn.Linear | nn.QuantizedLinear, r: int = ..., scale: float = ...
): # -> LoRALinear:
...
def __init__(
self,
input_dims: int,
output_dims: int,
r: int = ...,
scale: float = ...,
bias: bool = ...,
) -> None: ...
def __call__(self, x): # -> array:
...
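`LoRALinear.from_linear` above presumably follows the standard LoRA construction: a frozen base weight augmented by a rank-`r` product scaled by `scale`, with the up-projection zero-initialized so the adapter starts as a no-op. A numpy sketch of that forward pass (generic LoRA, not mflux's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, scale = 8, 4, 2, 0.5

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-init
x = rng.normal(size=(3, d_in))

# y = x W^T + scale * (x A^T) B^T
y = x @ W.T + scale * (x @ A.T) @ B.T
print(np.allclose(y, x @ W.T))  # True: zero-init B means no change at start
```

The zero-initialized `B` is why freshly attached LoRA layers leave the model's output unchanged until training (or loaded adapter weights) populate them.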


@@ -0,0 +1,26 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
from collections.abc import Callable
from dataclasses import dataclass
from mflux.models.common.lora.mapping.lora_mapping import LoRATarget
@dataclass
class PatternMatch:
source_pattern: str
target_path: str
matrix_name: str
transpose: bool
transform: Callable[[mx.array], mx.array] | None = ...
class LoRALoader:
@staticmethod
def load_and_apply_lora(
lora_mapping: list[LoRATarget],
transformer: nn.Module,
lora_paths: list[str] | None = ...,
lora_scales: list[float] | None = ...,
) -> tuple[list[str], list[float]]: ...


@@ -0,0 +1,21 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from collections.abc import Callable
from dataclasses import dataclass
from typing import List, Protocol
@dataclass
class LoRATarget:
model_path: str
possible_up_patterns: List[str]
possible_down_patterns: List[str]
possible_alpha_patterns: List[str] = ...
up_transform: Callable[[mx.array], mx.array] | None = ...
down_transform: Callable[[mx.array], mx.array] | None = ...
class LoRAMapping(Protocol):
@staticmethod
def get_mapping() -> List[LoRATarget]: ...


@@ -0,0 +1,9 @@
"""
This type stub file was generated by pyright.
"""
import mlx.nn as nn
class LoRASaver:
@staticmethod
def bake_and_strip_lora(module: nn.Module) -> nn.Module: ...


@@ -0,0 +1,35 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
class LoraTransforms:
@staticmethod
def split_q_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_k_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_v_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_q_down(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_k_down(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_v_down(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_q_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_k_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_v_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_mlp_up(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_q_down(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_k_down(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_v_down(tensor: mx.array) -> mx.array: ...
@staticmethod
def split_single_mlp_down(tensor: mx.array) -> mx.array: ...


@@ -0,0 +1,17 @@
"""
This type stub file was generated by pyright.
"""
from mflux.models.common.resolution.config_resolution import ConfigResolution
from mflux.models.common.resolution.lora_resolution import LoraResolution
from mflux.models.common.resolution.path_resolution import PathResolution
from mflux.models.common.resolution.quantization_resolution import (
QuantizationResolution,
)
__all__ = [
"ConfigResolution",
"LoraResolution",
"PathResolution",
"QuantizationResolution",
]


@@ -0,0 +1,39 @@
"""
This type stub file was generated by pyright.
"""
from enum import Enum
from typing import NamedTuple
class QuantizationAction(Enum):
NONE = ...
STORED = ...
REQUESTED = ...
class PathAction(Enum):
LOCAL = ...
HUGGINGFACE_CACHED = ...
HUGGINGFACE = ...
ERROR = ...
class LoraAction(Enum):
LOCAL = ...
REGISTRY = ...
HUGGINGFACE_COLLECTION_CACHED = ...
HUGGINGFACE_COLLECTION = ...
HUGGINGFACE_REPO_CACHED = ...
HUGGINGFACE_REPO = ...
ERROR = ...
class ConfigAction(Enum):
EXACT_MATCH = ...
EXPLICIT_BASE = ...
INFER_SUBSTRING = ...
ERROR = ...
class Rule(NamedTuple):
priority: int
name: str
check: str
action: QuantizationAction | PathAction | LoraAction | ConfigAction
...


@@ -0,0 +1,14 @@
"""
This type stub file was generated by pyright.
"""
from typing import TYPE_CHECKING
from mflux.models.common.config.model_config import ModelConfig
if TYPE_CHECKING: ...
logger = ...
class ConfigResolution:
RULES = ...
@staticmethod
def resolve(model_name: str, base_model: str | None = ...) -> ModelConfig: ...


@@ -0,0 +1,21 @@
"""
This type stub file was generated by pyright.
"""
from pathlib import Path
logger = ...
class LoraResolution:
RULES = ...
_registry: dict[str, Path] = ...
@staticmethod
def resolve(path: str) -> str: ...
@staticmethod
def resolve_paths(paths: list[str] | None) -> list[str]: ...
@staticmethod
def resolve_scales(scales: list[float] | None, num_paths: int) -> list[float]: ...
@staticmethod
def get_registry() -> dict[str, Path]: ...
@staticmethod
def discover_files(library_paths: list[Path]) -> dict[str, Path]: ...


@@ -0,0 +1,12 @@
"""
This type stub file was generated by pyright.
"""
from pathlib import Path
logger = ...
class PathResolution:
RULES = ...
@staticmethod
def resolve(path: str | None, patterns: list[str] | None = ...) -> Path | None: ...


@@ -0,0 +1,12 @@
"""
This type stub file was generated by pyright.
"""
logger = ...
class QuantizationResolution:
RULES = ...
@staticmethod
def resolve(
stored: int | None, requested: int | None
) -> tuple[int | None, str | None]: ...


@@ -0,0 +1,26 @@
"""
This type stub file was generated by pyright.
"""
from .flow_match_euler_discrete_scheduler import FlowMatchEulerDiscreteScheduler
from .linear_scheduler import LinearScheduler
from .seedvr2_euler_scheduler import SeedVR2EulerScheduler
__all__ = [
"LinearScheduler",
"FlowMatchEulerDiscreteScheduler",
"SeedVR2EulerScheduler",
]
class SchedulerModuleNotFound(ValueError): ...
class SchedulerClassNotFound(ValueError): ...
class InvalidSchedulerType(TypeError): ...
SCHEDULER_REGISTRY = ...
def register_contrib(scheduler_object, scheduler_name=...): # -> None:
...
def try_import_external_scheduler(
scheduler_object_path: str,
): # -> type[BaseScheduler]:
...


@@ -0,0 +1,16 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from abc import ABC, abstractmethod
class BaseScheduler(ABC):
@property
@abstractmethod
def sigmas(self) -> mx.array: ...
@abstractmethod
def step(
self, noise: mx.array, timestep: int, latents: mx.array, **kwargs
) -> mx.array: ...
def scale_model_input(self, latents: mx.array, t: int) -> mx.array: ...


@@ -0,0 +1,26 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from typing import TYPE_CHECKING
from mflux.models.common.config.config import Config
from mflux.models.common.schedulers.base_scheduler import BaseScheduler
if TYPE_CHECKING: ...
class FlowMatchEulerDiscreteScheduler(BaseScheduler):
def __init__(self, config: Config) -> None: ...
@property
def sigmas(self) -> mx.array: ...
@property
def timesteps(self) -> mx.array: ...
def set_image_seq_len(self, image_seq_len: int) -> None: ...
@staticmethod
def get_timesteps_and_sigmas(
image_seq_len: int, num_inference_steps: int, num_train_timesteps: int = ...
) -> tuple[mx.array, mx.array]: ...
def step(
self, noise: mx.array, timestep: int, latents: mx.array, **kwargs
) -> mx.array: ...
def scale_model_input(self, latents: mx.array, t: int) -> mx.array: ...
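A flow-match Euler `step` typically moves the latents along the model's predicted velocity by the gap between adjacent sigmas. A generic sketch of that update (assumed form, not mflux's exact scheduler):

```python
import numpy as np

def euler_step(latents, model_out, sigmas, t):
    # x_{t+1} = x_t + (sigma_{t+1} - sigma_t) * v_t  (generic flow-match Euler)
    return latents + (sigmas[t + 1] - sigmas[t]) * model_out

sigmas = np.linspace(1.0, 0.0, 5)   # [1.0, 0.75, 0.5, 0.25, 0.0]
x = np.ones((1, 4))
v = np.full((1, 4), 2.0)
x = euler_step(x, v, sigmas, 0)
print(float(x[0, 0]))  # 1 + (0.75 - 1.0) * 2 = 0.5
```

Because the sigma schedule decreases toward zero, each step subtracts a shrinking multiple of the predicted velocity until the latents land at sigma = 0.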


@@ -0,0 +1,20 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from typing import TYPE_CHECKING
from mflux.models.common.config.config import Config
from mflux.models.common.schedulers.base_scheduler import BaseScheduler
if TYPE_CHECKING: ...
class LinearScheduler(BaseScheduler):
def __init__(self, config: Config) -> None: ...
@property
def sigmas(self) -> mx.array: ...
@property
def timesteps(self) -> mx.array: ...
def step(
self, noise: mx.array, timestep: int, latents: mx.array, **kwargs
) -> mx.array: ...


@@ -0,0 +1,20 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from typing import TYPE_CHECKING
from mflux.models.common.config.config import Config
from mflux.models.common.schedulers.base_scheduler import BaseScheduler
if TYPE_CHECKING: ...
class SeedVR2EulerScheduler(BaseScheduler):
def __init__(self, config: Config) -> None: ...
@property
def timesteps(self) -> mx.array: ...
@property
def sigmas(self) -> mx.array: ...
def step(
self, noise: mx.array, timestep: int, latents: mx.array, **kwargs
) -> mx.array: ...


@@ -0,0 +1,24 @@
"""
This type stub file was generated by pyright.
"""
from mflux.models.common.tokenizer.tokenizer import (
BaseTokenizer,
LanguageTokenizer,
Tokenizer,
VisionLanguageTokenizer,
)
from mflux.models.common.tokenizer.tokenizer_loader import TokenizerLoader
from mflux.models.common.tokenizer.tokenizer_output import TokenizerOutput
"""
This type stub file was generated by pyright.
"""
__all__ = [
"Tokenizer",
"BaseTokenizer",
"LanguageTokenizer",
"VisionLanguageTokenizer",
"TokenizerLoader",
"TokenizerOutput",
]


@@ -0,0 +1,74 @@
"""
This type stub file was generated by pyright.
"""
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable
from PIL import Image
from transformers import PreTrainedTokenizer
from mflux.models.common.tokenizer.tokenizer_output import TokenizerOutput
"""
This type stub file was generated by pyright.
"""
@runtime_checkable
class Tokenizer(Protocol):
tokenizer: PreTrainedTokenizer
def tokenize(
self,
prompt: str | list[str],
images: list[Image.Image] | None = ...,
max_length: int | None = ...,
**kwargs,
) -> TokenizerOutput: ...
class BaseTokenizer(ABC):
def __init__(
self, tokenizer: PreTrainedTokenizer, max_length: int = ...
) -> None: ...
@abstractmethod
def tokenize(
self,
prompt: str | list[str],
images: list[Image.Image] | None = ...,
max_length: int | None = ...,
**kwargs,
) -> TokenizerOutput: ...
class LanguageTokenizer(BaseTokenizer):
def __init__(
self,
tokenizer: PreTrainedTokenizer,
max_length: int = ...,
padding: str = ...,
return_attention_mask: bool = ...,
template: str | None = ...,
use_chat_template: bool = ...,
chat_template_kwargs: dict | None = ...,
add_special_tokens: bool = ...,
) -> None: ...
def tokenize(
self,
prompt: str | list[str],
images: list[Image.Image] | None = ...,
max_length: int | None = ...,
**kwargs,
) -> TokenizerOutput: ...
class VisionLanguageTokenizer(BaseTokenizer):
def __init__(
self,
tokenizer: PreTrainedTokenizer,
processor,
max_length: int = ...,
template: str | None = ...,
image_token: str = ...,
) -> None: ...
def tokenize(
self,
prompt: str | list[str],
images: list[Image.Image] | None = ...,
max_length: int | None = ...,
**kwargs,
) -> TokenizerOutput: ...


@@ -0,0 +1,22 @@
"""
This type stub file was generated by pyright.
"""
from typing import TYPE_CHECKING
from mflux.models.common.tokenizer.tokenizer import BaseTokenizer
from mflux.models.common.weights.loading.weight_definition import TokenizerDefinition
"""
This type stub file was generated by pyright.
"""
if TYPE_CHECKING: ...
class TokenizerLoader:
@staticmethod
def load(definition: TokenizerDefinition, model_path: str) -> BaseTokenizer: ...
@staticmethod
def load_all(
definitions: list[TokenizerDefinition],
model_path: str,
max_length_overrides: dict[str, int] | None = ...,
) -> dict[str, BaseTokenizer]: ...


@@ -0,0 +1,17 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from dataclasses import dataclass
"""
This type stub file was generated by pyright.
"""
@dataclass
class TokenizerOutput:
input_ids: mx.array
attention_mask: mx.array
pixel_values: mx.array | None = ...
image_grid_thw: mx.array | None = ...


@@ -0,0 +1,8 @@
"""
This type stub file was generated by pyright.
"""
from mflux.models.common.vae.tiling_config import TilingConfig
from mflux.models.common.vae.vae_tiler import VAETiler
__all__ = ["TilingConfig", "VAETiler"]


@@ -0,0 +1,13 @@
"""
This type stub file was generated by pyright.
"""
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class TilingConfig:
vae_decode_tiles_per_dim: int | None = ...
vae_decode_overlap: int = ...
vae_encode_tiled: bool = ...
vae_encode_tile_size: int = ...
vae_encode_tile_overlap: int = ...


@@ -0,0 +1,27 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from typing import Callable
class VAETiler:
@staticmethod
def encode_image_tiled(
*,
image: mx.array,
encode_fn: Callable[[mx.array], mx.array],
latent_channels: int,
tile_size: tuple[int, int] = ...,
tile_overlap: tuple[int, int] = ...,
spatial_scale: int = ...,
) -> mx.array: ...
@staticmethod
def decode_image_tiled(
*,
latent: mx.array,
decode_fn: Callable[[mx.array], mx.array],
tile_size: tuple[int, int] = ...,
tile_overlap: tuple[int, int] = ...,
spatial_scale: int = ...,
) -> mx.array: ...
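Tiled VAE encode/decode as in `VAETiler` above rests on covering each spatial axis with overlapping tiles so seams can be blended away. A 1-D sketch of computing tile start offsets (hypothetical helper, not the mflux implementation):

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    # Slide a window of `tile` pixels with a `tile - overlap` stride, then
    # add a final tile flush with the end if anything would be missed.
    step = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

print(tile_starts(10, 4, 2))  # [0, 2, 4, 6] -- covers 0..10 with 2 px overlap
```

The 2-D tiler would take the cross product of the start lists for each axis, run `encode_fn`/`decode_fn` per tile, and average the overlapping regions back together.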


@@ -0,0 +1,17 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from mlx import nn
from mflux.models.common.vae.tiling_config import TilingConfig
class VAEUtil:
@staticmethod
def encode(
vae: nn.Module, image: mx.array, tiling_config: TilingConfig | None = ...
) -> mx.array: ...
@staticmethod
def decode(
vae: nn.Module, latent: mx.array, tiling_config: TilingConfig | None = ...
) -> mx.array: ...


@@ -0,0 +1,18 @@
"""
This type stub file was generated by pyright.
"""
from mflux.models.common.weights.loading.loaded_weights import LoadedWeights, MetaData
from mflux.models.common.weights.loading.weight_applier import WeightApplier
from mflux.models.common.weights.loading.weight_definition import ComponentDefinition
from mflux.models.common.weights.loading.weight_loader import WeightLoader
from mflux.models.common.weights.saving.model_saver import ModelSaver
__all__ = [
"ComponentDefinition",
"LoadedWeights",
"MetaData",
"ModelSaver",
"WeightApplier",
"WeightLoader",
]


@@ -0,0 +1,18 @@
"""
This type stub file was generated by pyright.
"""
from dataclasses import dataclass
@dataclass
class MetaData:
quantization_level: int | None = ...
mflux_version: str | None = ...
@dataclass
class LoadedWeights:
components: dict[str, dict]
meta_data: MetaData
def __getattr__(self, name: str) -> dict | None: ...
def num_transformer_blocks(self, component_name: str = ...) -> int: ...
def num_single_transformer_blocks(self, component_name: str = ...) -> int: ...


@@ -0,0 +1,30 @@
"""
This type stub file was generated by pyright.
"""
import mlx.nn as nn
from typing import TYPE_CHECKING
from mflux.models.common.weights.loading.loaded_weights import LoadedWeights
from mflux.models.common.weights.loading.weight_definition import (
ComponentDefinition,
WeightDefinitionType,
)
if TYPE_CHECKING: ...
class WeightApplier:
@staticmethod
def apply_and_quantize_single(
weights: LoadedWeights,
model: nn.Module,
component: ComponentDefinition,
quantize_arg: int | None,
quantization_predicate=...,
) -> int | None: ...
@staticmethod
def apply_and_quantize(
weights: LoadedWeights,
models: dict[str, nn.Module],
quantize_arg: int | None,
weight_definition: WeightDefinitionType,
) -> int | None: ...


@@ -0,0 +1,73 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from dataclasses import dataclass
from typing import Callable, List, TYPE_CHECKING, TypeAlias
from mflux.models.common.weights.mapping.weight_mapping import WeightTarget
from mflux.models.common.tokenizer.tokenizer import BaseTokenizer
from mflux.models.depth_pro.weights.depth_pro_weight_definition import (
DepthProWeightDefinition,
)
from mflux.models.fibo.weights.fibo_weight_definition import FIBOWeightDefinition
from mflux.models.fibo_vlm.weights.fibo_vlm_weight_definition import (
FIBOVLMWeightDefinition,
)
from mflux.models.flux.weights.flux_weight_definition import FluxWeightDefinition
from mflux.models.qwen.weights.qwen_weight_definition import QwenWeightDefinition
from mflux.models.seedvr2.weights.seedvr2_weight_definition import (
SeedVR2WeightDefinition,
)
from mflux.models.z_image.weights.z_image_weight_definition import (
ZImageWeightDefinition,
)
"""
This type stub file was generated by pyright.
"""
if TYPE_CHECKING:
WeightDefinitionType: TypeAlias = type[
FluxWeightDefinition
| FIBOWeightDefinition
| FIBOVLMWeightDefinition
| QwenWeightDefinition
| ZImageWeightDefinition
| SeedVR2WeightDefinition
| DepthProWeightDefinition
]
@dataclass
class ComponentDefinition:
name: str
hf_subdir: str
mapping_getter: Callable[[], List[WeightTarget]] | None = ...
model_attr: str | None = ...
num_blocks: int | None = ...
num_layers: int | None = ...
loading_mode: str = ...
precision: mx.Dtype | None = ...
skip_quantization: bool = ...
bulk_transform: Callable[[mx.array], mx.array] | None = ...
weight_subkey: str | None = ...
download_url: str | None = ...
weight_prefix_filters: List[str] | None = ...
weight_files: List[str] | None = ...
@dataclass
class TokenizerDefinition:
name: str
hf_subdir: str
tokenizer_class: str = ...
fallback_subdirs: List[str] | None = ...
download_patterns: List[str] | None = ...
encoder_class: type[BaseTokenizer] | None = ...
max_length: int = ...
padding: str = ...
template: str | None = ...
use_chat_template: bool = ...
chat_template_kwargs: dict | None = ...
add_special_tokens: bool = ...
processor_class: type | None = ...
image_token: str = ...
chat_template: str | None = ...


@@ -0,0 +1,23 @@
"""
This type stub file was generated by pyright.
"""
from typing import TYPE_CHECKING
from mflux.models.common.weights.loading.loaded_weights import LoadedWeights
from mflux.models.common.weights.loading.weight_definition import (
ComponentDefinition,
WeightDefinitionType,
)
if TYPE_CHECKING: ...
logger = ...
class WeightLoader:
@staticmethod
def load_single(
component: ComponentDefinition, repo_id: str, file_pattern: str = ...
) -> LoadedWeights: ...
@staticmethod
def load(
weight_definition: WeightDefinitionType, model_path: str | None = ...
) -> LoadedWeights: ...


@@ -0,0 +1,16 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from typing import Dict, List, Optional
from mflux.models.common.weights.mapping.weight_mapping import WeightTarget
class WeightMapper:
@staticmethod
def apply_mapping(
hf_weights: Dict[str, mx.array],
mapping: List[WeightTarget],
num_blocks: Optional[int] = ...,
num_layers: Optional[int] = ...,
) -> Dict: ...
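`WeightMapper.apply_mapping` evidently renames checkpoint keys to model keys using pattern pairs, optionally expanded per transformer block. A minimal hypothetical sketch of such block-indexed renaming (the `{i}` placeholder syntax and triple layout are assumptions, not mflux's actual `WeightTarget` semantics):

```python
def apply_mapping_sketch(hf_weights: dict, mapping: list, num_blocks: int = 1) -> dict:
    """Rename flat checkpoint keys via (from_pattern, to_pattern,
    transform) triples with an optional {i} block index."""
    out = {}
    for from_pat, to_pat, transform in mapping:
        for i in range(num_blocks):
            src = from_pat.format(i=i)
            if src in hf_weights:
                value = hf_weights[src]
                # Apply an optional per-tensor transform (e.g. a transpose).
                out[to_pat.format(i=i)] = transform(value) if transform else value
    return out
```

Patterns without `{i}` are unaffected by the block loop, so global tensors and per-block tensors can share one mapping list.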


@@ -0,0 +1,23 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol
"""
This type stub file was generated by pyright.
"""
@dataclass
class WeightTarget:
to_pattern: str
from_pattern: List[str]
transform: Optional[Callable[[mx.array], mx.array]] = ...
required: bool = ...
max_blocks: Optional[int] = ...
class WeightMapping(Protocol):
@staticmethod
def get_mapping() -> List[WeightTarget]: ...


@@ -0,0 +1,17 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
class WeightTransforms:
@staticmethod
def reshape_gamma_to_1d(tensor: mx.array) -> mx.array: ...
@staticmethod
def transpose_patch_embed(tensor: mx.array) -> mx.array: ...
@staticmethod
def transpose_conv3d_weight(tensor: mx.array) -> mx.array: ...
@staticmethod
def transpose_conv2d_weight(tensor: mx.array) -> mx.array: ...
@staticmethod
def transpose_conv_transpose2d_weight(tensor: mx.array) -> mx.array: ...
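These transforms suggest layout conversion between checkpoint and MLX conventions: PyTorch stores conv2d weights as (O, I, H, W) while MLX convolutions take channels-last weights, roughly (O, H, W, I). A pure-Python, shapes-only sketch of that permutation (illustrative, not mflux's implementation):

```python
def transpose_conv2d_weight_sketch(w: list) -> list:
    """Permute a nested-list tensor from an (O, I, H, W) layout to
    (O, H, W, I) by re-indexing the four axes."""
    o_n, i_n = len(w), len(w[0])
    h_n, w_n = len(w[0][0]), len(w[0][0][0])
    # out[o][h][x][i] == w[o][i][h][x]
    return [[[[w[o][i][h][x] for i in range(i_n)]
              for x in range(w_n)]
             for h in range(h_n)]
            for o in range(o_n)]
```

In practice this is a single `mx.transpose(w, (0, 2, 3, 1))`; the nested-list version just makes the index shuffle explicit.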


@@ -0,0 +1,14 @@
"""
This type stub file was generated by pyright.
"""
from typing import Any, TYPE_CHECKING
from mflux.models.common.weights.loading.weight_definition import WeightDefinitionType
if TYPE_CHECKING: ...
class ModelSaver:
@staticmethod
def save_model(
model: Any, bits: int, base_path: str, weight_definition: WeightDefinitionType
) -> None: ...


@@ -0,0 +1,9 @@
"""
This type stub file was generated by pyright.
"""
from mflux.models.depth_pro.model.depth_pro_model import DepthProModel
class DepthProInitializer:
@staticmethod
def init(model: DepthProModel, quantize: int | None = ...) -> None: ...


@@ -0,0 +1,10 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
class FeatureFusionBlock2d(nn.Module):
def __init__(self, num_features: int, deconv: bool = ...) -> None: ...
def __call__(self, x0: mx.array, x1: mx.array | None = ...) -> mx.array: ...


@@ -0,0 +1,17 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
class MultiresConvDecoder(nn.Module):
def __init__(self) -> None: ...
def __call__(
self,
x0_latent: mx.array,
x1_latent: mx.array,
x0_features: mx.array,
x1_features: mx.array,
x_global_features: mx.array,
) -> mx.array: ...


@@ -0,0 +1,10 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
class ResidualBlock(nn.Module):
def __init__(self, num_features: int) -> None: ...
def __call__(self, x: mx.array) -> mx.array: ...


@@ -0,0 +1,20 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from dataclasses import dataclass
from pathlib import Path
from PIL import Image
@dataclass
class DepthResult:
depth_image: Image.Image
depth_array: mx.array
min_depth: float
max_depth: float
...
class DepthPro:
def __init__(self, quantize: int | None = ...) -> None: ...
def create_depth_map(self, image_path: str | Path) -> DepthResult: ...


@@ -0,0 +1,12 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
class DepthProModel(nn.Module):
def __init__(self) -> None: ...
def __call__(
self, x0: mx.array, x1: mx.array, x2: mx.array
) -> tuple[mx.array, mx.array]: ...


@@ -0,0 +1,15 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
class DepthProUtil:
@staticmethod
def split(x: mx.array, overlap_ratio: float = ...) -> mx.array: ...
@staticmethod
def interpolate(x: mx.array, size=..., scale_factor=...): # -> array:
...
@staticmethod
def apply_conv(x: mx.array, conv_module: nn.Module) -> mx.array: ...
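`DepthProUtil.split` takes an `overlap_ratio`, implying the input is cut into overlapping patches. A hypothetical sketch of such window offsets along one axis (not Depth Pro's exact scheme):

```python
def split_starts(length: int, patch: int, overlap_ratio: float = 0.25) -> list[int]:
    """Start offsets of patch-sized windows along one axis with a
    fractional overlap between neighbours; always covers the tail."""
    stride = max(int(patch * (1 - overlap_ratio)), 1)
    starts = list(range(0, max(length - patch, 0) + 1, stride))
    if starts[-1] != length - patch and length > patch:
        starts.append(length - patch)  # extra window so the end is covered
    return starts
```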


@@ -0,0 +1,12 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
from mlx import nn
class Attention(nn.Module):
def __init__(
self, dim: int = ..., head_dim: int = ..., num_heads: int = ...
) -> None: ...
def __call__(self, x: mx.array) -> mx.array: ...


@@ -0,0 +1,10 @@
"""
This type stub file was generated by pyright.
"""
import mlx.core as mx
import mlx.nn as nn
class DinoVisionTransformer(nn.Module):
def __init__(self) -> None: ...
def __call__(self, x: mx.array) -> tuple[mx.array, mx.array, mx.array]: ...

Some files were not shown because too many files have changed in this diff.