mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-01-19 03:22:01 -05:00

Author	SHA1	Message	Date
Evan	ca321cfcc2	mapping of conns	2026-01-18 21:14:03 +00:00
Evan	bce4cb458d	fix tests	2026-01-18 21:14:03 +00:00
Evan	80a8e83348	review response	2026-01-18 21:14:03 +00:00
Evan	2645beea42	i hate that test	2026-01-18 21:14:03 +00:00
Evan	b8842e8081	rebase lint fmt	2026-01-18 21:14:03 +00:00
Evan	74a50d71af	think that was the bug	2026-01-18 21:14:03 +00:00
Evan	652528b32c	update log message + assertion	2026-01-18 21:14:03 +00:00
Evan	729c4ccaa2	add a test to gather TB connectivity data	2026-01-18 21:14:03 +00:00
Alex Cheema	7b6d49448b	fix: dashboard TypeScript errors and friendly name showing "Unknown" Dashboard fixes (TypeScript errors from `npm run check`): - TopologyGraph.svelte: remove reference to deleted sendBackMultiaddr property, fix type inference for debug edge labels - ModelCard.svelte: add missing topoWidth/topoHeight to early return - +page.svelte: fix nested property access for deviceRank Backend fix: - info_gatherer.py: send initial MiscData on startup so friendly name appears immediately instead of showing "Unknown" until it changes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 21:14:03 +00:00
Evan	5f31c7884c	lint fmt	2026-01-18 21:14:03 +00:00
Evan	46637e8ca9	bug	2026-01-18 21:14:03 +00:00
Evan	52d9ef17b2	still use ibv_devices	2026-01-18 21:14:03 +00:00
Evan	566a1688bd	fix the dashboard	2026-01-18 21:14:03 +00:00
Evan	912b8303ec	forgot how weird this platform is	2026-01-18 21:14:03 +00:00
Evan	aa4b0ede9f	fmt ts	2026-01-18 21:14:03 +00:00
Evan	64fab01822	remove the old network script functionality	2026-01-18 21:14:03 +00:00
Evan	9c5467aa35	add to the test server	2026-01-18 21:14:03 +00:00
Evan	d6951feac3	lint fmt	2026-01-18 21:14:03 +00:00
Evan	b714bc4562	switch from sequence to map of connections	2026-01-18 21:14:03 +00:00
Evan	bce23eac3f	pydantic types are now coherent	2026-01-18 21:14:03 +00:00
Sami Khan	5acb026c1e	parsing api fix	2026-01-18 21:14:03 +00:00
Evan	a7bba1e29b	code review followup	2026-01-18 21:14:03 +00:00
Evan	8add86fdd4	rename channel test	2026-01-18 21:14:03 +00:00
Evan	c289702ca4	move macmon test	2026-01-18 21:14:03 +00:00
Evan	63bd024d48	cleanup after rebase	2026-01-18 21:14:03 +00:00
Evan	741450987d	dedup connections	2026-01-18 21:14:03 +00:00
Evan	ae38c594e7	freeze those models	2026-01-18 21:14:03 +00:00
Evan	4a8a8fd296	format	2026-01-18 21:14:03 +00:00
Evan	9d552cdc38	tidy	2026-01-18 21:14:03 +00:00
Evan	695708ae27	all master tests pass	2026-01-18 21:14:03 +00:00
Evan	a282478951	ibv -> jaccl	2026-01-18 21:14:03 +00:00
Evan	9e4a0049f9	tidying some horrible logic	2026-01-18 21:14:02 +00:00
Evan	84780eb538	fix download test	2026-01-18 21:14:02 +00:00
Evan	7ba8217e64	fix all master tests except rdma placement	2026-01-18 21:14:02 +00:00
Evan	3159a2d038	fix topology tests	2026-01-18 21:14:02 +00:00
Evan	2b5a368977	actually update the topology	2026-01-18 21:14:02 +00:00
Evan	272e36345c	incorrect log	2026-01-18 21:14:02 +00:00
Evan	77062e4fef	handle an error	2026-01-18 21:14:02 +00:00
Evan	19c6758a87	fix pydantic validation	2026-01-18 21:14:02 +00:00
Evan	37ca32dc33	type checks outside of tests, time to test	2026-01-18 21:14:02 +00:00
Evan	e30c24aac8	wuff	2026-01-18 21:14:02 +00:00
Evan	fc5acf8cfb	rework topology	2026-01-18 21:14:02 +00:00
Evan	0fcbbfabac	update placement	2026-01-18 21:14:02 +00:00
Evan	00b15ce20d	mvp	2026-01-18 21:14:02 +00:00
Evan	287e03daa3	tidy config	2026-01-18 21:14:02 +00:00
rltakashige	618cee5223	Resolve test event ordering flakiness (#1194) ## Motivation mp sender occasionally does not have time to flush its events before collect() is called, making the event ordering test fail. ## Changes - Replace mp_channel with simple collector for event ordering test - Also suppress warning for <frozen importlib._bootstrap>:488 <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute ## Test Plan ### Automated Testing Ran the test 100 times without it failing.	2026-01-18 20:33:20 +00:00
Antonio Lujano Luna	9c29eb7d48	Add proxy and custom SSL certificate support for corporate networks (#1189) Support HTTPS_PROXY/HTTP_PROXY environment variables for proxy configuration and SSL_CERT_FILE for custom CA certificates, enabling use in corporate environments with SSL inspection. ## Motivation Users in corporate environments often need to route traffic through HTTP proxies and use custom CA certificates for SSL inspection. Without this support, exo cannot download models in these network configurations. ## Changes - Added `HTTPS_PROXY`/`HTTP_PROXY` environment variable support to `create_http_session()` in `download_utils.py` - Added `SSL_CERT_FILE` environment variable support for custom CA certificate bundles, falling back to certifi's default bundle ## Why It Works - `aiohttp.ClientSession` natively supports the `proxy` parameter for routing requests through HTTP proxies - `ssl.create_default_context(cafile=...)` accepts a custom CA bundle path, allowing corporate CAs to be trusted - Using environment variables is consistent with the codebase's existing configuration patterns (e.g., `EXO_HOME`, `HF_ENDPOINT`) ## Test Plan ### Manual Testing - Set `HTTPS_PROXY` environment variable and verified model downloads route through proxy - Set `SSL_CERT_FILE` to custom CA bundle and verified SSL verification succeeds with corporate SSL inspection ### Automated Testing - No automated tests added; this change is configuration-only and does not alter existing behavior when environment variables are unset	2026-01-18 12:05:50 +00:00
Alex Cheema	c5158bee53	Add pre-commit checks documentation to AGENTS.md (#1184) ## Motivation CI failures can be avoided by running checks locally before committing. This adds clear documentation to AGENTS.md so that AI agents (and humans) know exactly which checks must pass before pushing code. ## Changes Added a new "Pre-Commit Checks (REQUIRED)" section to AGENTS.md that: - Lists all 4 required checks (basedpyright, ruff, nix fmt, pytest) - Provides a one-liner to run all checks in sequence - Notes that `nix fmt` changes must be staged before committing - Explains that CI runs `nix flake check` which verifies everything ## Why It Works Clear documentation prevents CI failures by ensuring contributors run checks locally first. The one-liner command makes it easy to run all checks before committing. ## Test Plan ### Manual Testing - Verified the documented commands work correctly ### Automated Testing - N/A - documentation only change Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-17 21:50:24 +00:00
rltakashige	5c8a237940	Handle model timeouts (#1177) - Add eval with a timeout. - Add fast synch flag ## Motivation Because of the experimental FAST SYNCH flag, some models may not work. This PR catches when this occurs and allows users to specify a run without fast synch. ## Changes - Adds a flag to enable or disable fast synch (--fast-synch and --no-fast-synch) - Adds a heuristic timeout - Reduces exo_bench default timeout to 10 minutes. ## Why It Works Heuristic timeout assumes normal loading times on Mac devices (60 + model size in GB / 5; e.g. DeepSeek takes up to 120 seconds to load on tensor parallel, and timeout is set to 60 + 120 = 180s). We could raise this value if necessary. ## Test Plan ### Manual Testing Catches that GPT OSS fails to load in Tensor RDMA. Can launch with --no-fast-synch flag to launch GPT OSS. Screenshots (images not preserved): GPT OSS 20B TP with fast synch; TP without fast synch [Note: the performance is really not great as fast synch is off]; (as a sanity check) PP with fast synch; PP without fast synch; PP without RDMA; TP without RDMA.	2026-01-16 20:25:12 +00:00
rltakashige	745343c705	Return error responses for Chat Completions (#1173) - Error chunks - Use error handling in exo_bench.py ## Motivation Return when an error occurs so that generation stops. Adding timeouts is a separate TODO for model loading and chat completions. ## Changes - Return HTTP exceptions as JSON responses in an OpenAI compatible format. - Context manager for generation to catch and return error messages. - Use error handling in exo_bench.py. ## Test Plan ### Manual Testing Manually tested that exo_bench returns on failures within and outside generation.	2026-01-16 19:24:37 +00:00
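
The load-timeout heuristic described in commit 5c8a237940 (60 seconds plus the model size in GB divided by 5) can be sketched roughly as below. This is only an illustration, not exo's actual code: the function name `heuristic_load_timeout`, its parameter names, and the 600 GB input (chosen so the result matches the 180 s figure quoted in the commit message) are all assumptions.

```python
def heuristic_load_timeout(
    model_size_gb: float,
    base_seconds: float = 60.0,    # fixed allowance, per the commit message
    gb_per_second: float = 5.0,    # assumed load rate implied by "size in GB / 5"
) -> float:
    """Return a load timeout in seconds: base allowance plus size-dependent time.

    Hypothetical helper; the real implementation in exo may differ.
    """
    if model_size_gb < 0:
        raise ValueError("model_size_gb must be non-negative")
    return base_seconds + model_size_gb / gb_per_second


if __name__ == "__main__":
    # A model of about 600 GB (assumed size) gives 60 + 120 = 180 seconds.
    print(heuristic_load_timeout(600))
```

Keeping the base allowance and the rate as parameters would make it easy to raise the limit later, which the commit message notes may be necessary.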

1954 Commits