1833 Commits

Author SHA1 Message Date
Alex Cheema
bcd58938de clean debug logs 2024-07-14 23:28:55 -07:00
Alex Cheema
b9c323bb07 memory-efficient shard loading 2024-07-14 23:27:57 -07:00
Alex Cheema
53a5b3fc6a add uuid requirement 2024-07-14 21:46:41 -07:00
Alex Cheema
05b9fa497d initialize node id to uuid4 if not set 2024-07-14 21:46:30 -07:00
Alex Cheema
ff597d9551 fix discovery 2024-07-14 21:46:13 -07:00
Alex Cheema
a04974168e fix model import path 2024-07-14 21:46:00 -07:00
Alex Cheema
b8a2a0fbe0 update readme run instruction 2024-07-14 21:26:56 -07:00
Alex Cheema
a933352ac3 add DEBUG flag for controlling debug logs 2024-07-14 21:26:45 -07:00
Alex Cheema
dd882fe6bc experimental notice 2024-07-14 21:16:20 -07:00
Alex Cheema
c8753ba5fe reshuffle readme 2024-07-14 21:12:55 -07:00
Alex Cheema
ee5204fbca readme installation instructions 2024-07-14 21:12:17 -07:00
Alex Cheema
78da11e10b slightly nicer readme 2024-07-14 21:05:41 -07:00
Alex Cheema
2fc472c8fe slightly nicer readme 2024-07-14 21:03:48 -07:00
Alex Cheema
8ff3e263a0 slightly nicer readme 2024-07-14 21:02:25 -07:00
Alex Cheema
32f2e36fd3 main rename 2024-07-14 21:01:28 -07:00
Alex Cheema
5bbde22a23 move everything under exo module 2024-07-14 21:00:37 -07:00
Alex Cheema
c851644a43 update requirements, specify exact versions 2024-07-14 20:55:29 -07:00
Alex Cheema
32972033dd update readme 2024-07-14 18:38:48 -07:00
Alex Cheema
5ef07d41a5 readme 2024-07-14 18:09:38 -07:00
Alex Cheema
490fa102a4 tinygrad inference engine 2024-07-14 13:07:37 -07:00
Alex Cheema
e6f387a690 handle is_finished 2024-07-13 23:27:34 -07:00
Alex Cheema
b01f69bb6b add support for multiple concurrent requests with request ids 2024-07-13 23:11:01 -07:00
Alex Cheema
7077652c8e graceful node shutdown 2024-07-13 20:43:37 -07:00
Alex Cheema
ca6095c04d a generic test for every inference engine 2024-07-13 18:25:26 -07:00
Alex Cheema
850b72d3ea make StatefulShardedModel callable, add some tests for mlx sharded inference 2024-07-13 15:41:15 -07:00
Alex Cheema
6ee0547eff fix layer calculation for sharded llama 2024-07-13 15:39:31 -07:00
Alex Cheema
445eda156c dynamically assign shards to nodes deterministically weighted by memory 2024-06-25 21:17:58 +01:00
Alex Cheema
36b8456798 collect global topology with local peer visibility, ring memory weighted partitioning strategy 2024-06-25 12:32:16 +01:00
Alex Cheema
3a66a0a4a8 add requirements.txt 2024-06-24 21:00:04 +01:00
Alex Cheema
ee96c6b023 add another test for device capabilities on MacBook Air 2024-06-24 20:59:55 +01:00
Alex Cheema
6c8c9ee7b1 topology with partitioning strategy 2024-06-24 20:56:50 +01:00
Alex Cheema
563dcb56b0 mlx sharded implementation with example of distributed inference 2024-06-24 19:35:57 +01:00
Alex Cheema
a21f59ff45 scaffolding for networking, inference and orchestration 2024-06-23 23:28:10 +01:00