How to configure LM Studio server settings
LM Studio v0.2+ installed
What this does
Adjusts runtime parameters of the embedded API server including port binding, context window size, GPU offload split, temperature, and maximum token output. After completion, the server serves requests with the desired behavior and performance.
Steps
Open the Server tab and load a model. Click the Server icon in the navigation bar. Use the model selector to choose a model, then click Load to place it in memory.
Set the bind host and port. In the server configuration panel, locate the Host and Port fields. The default host is
127.0.0.1(localhost only). To allow connections from other machines, change the host to0.0.0.0.Adjust inference parameters. Locate the inference sliders for context length, temperature, and max tokens. Slide each parameter to the desired value. Changes take effect on the next request.
Enable GPU acceleration if available. If the system has a compatible GPU, locate the GPU Offload toggle and enable it. Adjust the offload percentage to match available VRAM.
Restart the server to apply changes. After modifying settings, click Stop Server (if running) then Start Server again.
Verification
curl -s http://localhost:1234/v1/models
# Expected: a JSON response showing the loaded model with applied settings
Common failures
- Port already in use — The chosen port is claimed by another process. Run
netstat -an | grep <port>to identify the conflict, or choose a different port. - Connection from remote hosts fails — The host is still bound to
127.0.0.1. Change the binding to0.0.0.0and ensure the firewall allows incoming connections. - GPU offload causes out-of-memory errors — Reduce the offload split to 50% or lower.
- Temperature slider has no effect — Some model quantizations are tuned to specific temperature ranges. Verify the model's configuration card for recommended bounds.