Alex Cheema | bcd58938de | clean debug logs | 2024-07-14 23:28:55 -07:00
Alex Cheema | b9c323bb07 | memory-efficient shard loading | 2024-07-14 23:27:57 -07:00
Alex Cheema | 53a5b3fc6a | add uuid requirement | 2024-07-14 21:46:41 -07:00
Alex Cheema | 05b9fa497d | initialize node id to uuid4 if not set | 2024-07-14 21:46:30 -07:00
Alex Cheema | ff597d9551 | fix discovery | 2024-07-14 21:46:13 -07:00
Alex Cheema | a04974168e | fix model import path | 2024-07-14 21:46:00 -07:00
Alex Cheema | b8a2a0fbe0 | update readme run instruction | 2024-07-14 21:26:56 -07:00
Alex Cheema | a933352ac3 | add DEBUG flag for controlling debug logs | 2024-07-14 21:26:45 -07:00
Alex Cheema | dd882fe6bc | experimental notice | 2024-07-14 21:16:20 -07:00
Alex Cheema | c8753ba5fe | reshuffle readme | 2024-07-14 21:12:55 -07:00
Alex Cheema | ee5204fbca | readme installation instructions | 2024-07-14 21:12:17 -07:00
Alex Cheema | 78da11e10b | slightly nicer readme | 2024-07-14 21:05:41 -07:00
Alex Cheema | 2fc472c8fe | slightly nicer readme | 2024-07-14 21:03:48 -07:00
Alex Cheema | 8ff3e263a0 | slightly nicer readme | 2024-07-14 21:02:25 -07:00
Alex Cheema | 32f2e36fd3 | main rename | 2024-07-14 21:01:28 -07:00
Alex Cheema | 5bbde22a23 | move everything under exo module | 2024-07-14 21:00:37 -07:00
Alex Cheema | c851644a43 | update requirements, specify exact versions | 2024-07-14 20:55:29 -07:00
Alex Cheema | 32972033dd | update readme | 2024-07-14 18:38:48 -07:00
Alex Cheema | 5ef07d41a5 | readme | 2024-07-14 18:09:38 -07:00
Alex Cheema | 490fa102a4 | tinygrad inference engine | 2024-07-14 13:07:37 -07:00
Alex Cheema | e6f387a690 | handle is_finished | 2024-07-13 23:27:34 -07:00
Alex Cheema | b01f69bb6b | add support for multiple concurrent requests with request ids | 2024-07-13 23:11:01 -07:00
Alex Cheema | 7077652c8e | graceful node shutdown | 2024-07-13 20:43:37 -07:00
Alex Cheema | ca6095c04d | a generic test for every inference engine | 2024-07-13 18:25:26 -07:00
Alex Cheema | 850b72d3ea | make StatefulShardedModel callable, add some tests for mlx sharded inference | 2024-07-13 15:41:15 -07:00
Alex Cheema | 6ee0547eff | fix layer calculation for sharded llama | 2024-07-13 15:39:31 -07:00
Alex Cheema | 445eda156c | dynamically assign shards to nodes deterministically weighted by memory | 2024-06-25 21:17:58 +01:00
Alex Cheema | 36b8456798 | collect global topology with local peer visibility, ring memory weighted partitioning strategy | 2024-06-25 12:32:16 +01:00
Alex Cheema | 3a66a0a4a8 | add requirements.txt | 2024-06-24 21:00:04 +01:00
Alex Cheema | ee96c6b023 | add another test for device capabiities on MacBook Air | 2024-06-24 20:59:55 +01:00
Alex Cheema | 6c8c9ee7b1 | topology with partitioning strategy | 2024-06-24 20:56:50 +01:00
Alex Cheema | 563dcb56b0 | mlx sharded implementation with example of distributed inference | 2024-06-24 19:35:57 +01:00
Alex Cheema | a21f59ff45 | scaffolding for networking, inference and orchestration | 2024-06-23 23:28:10 +01:00