This setup has both the server and the client written in Python and running on the same computer. This is the basic code if you want to call LLMs on your server or your own computer and make them accessible ...
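A minimal sketch of that setup, assuming a Flask server and a `requests` client; the `/generate` route, the port, and the `run_llm` stub are placeholders for whatever model call you actually use:

```python
# server.py -- minimal local LLM endpoint (sketch; route name and port are assumptions)
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_llm(prompt: str) -> str:
    # Placeholder: swap in your real model call (transformers, llama.cpp bindings, etc.)
    return f"echo: {prompt}"

@app.route("/generate", methods=["POST"])
def generate():
    data = request.get_json(force=True)
    return jsonify({"completion": run_llm(data.get("prompt", ""))})

if __name__ == "__main__":
    # Bind to localhost only, since server and client share the same machine
    app.run(host="127.0.0.1", port=8000)
```

```python
# client.py -- calls the local server from the same machine
import requests

resp = requests.post(
    "http://127.0.0.1:8000/generate",
    json={"prompt": "Hello from the client"},
    timeout=30,
)
print(resp.json()["completion"])
```

Binding to 127.0.0.1 keeps the endpoint reachable only from the local machine; binding to 0.0.0.0 instead would expose it to the rest of the network.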
I attempted to run the released USO inference code on macOS (Apple Silicon, MPS) following the README steps. The code loads successfully after some local fixes (merging projector.safetensors), but ...
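For context, the projector merge I applied locally was roughly the following sketch; the file names, the key-wise merge, and the MPS/CPU fallback are assumptions about my setup, not the USO repo's actual checkpoint layout:

```python
# Sketch: fold projector.safetensors into the main weight dict before loading.
# File names and keys are assumptions; adjust to the actual checkpoint layout.
import torch
from safetensors.torch import load_file, save_file

main_weights = load_file("model.safetensors")      # assumed main checkpoint file
projector = load_file("projector.safetensors")     # projector weights shipped separately

merged = {**main_weights, **projector}             # simple key-wise merge, no renaming
save_file(merged, "model_merged.safetensors")

# Prefer MPS on Apple Silicon when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"loading merged weights on {device}")
```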