

If it can power up and decrypt the docker volumes on its own without prompting you for a password in your basement, it will also power up and decrypt the docker volumes on its own without prompting the robbers for a password in their basement.
Pakistan has made guarantees that they will retaliate with their nuclear weapons if a nuclear strike occurs on Iran.
EDIT: I went to find a source, and Pakistan has walked this back and is denying that they ever said it.
This behavior is associated with autism
In most somewhat-but-not-completely sane places, the part of piracy that is illegal/criminal is distributing the copyrighted material, so downloading it to feed an LLM would already not really be an issue. The “problem” here is that most of these LLMs are used for commercial purposes, which is not allowed, and some claim LLMs are a form of distributing the copyrighted material.
If your LLM is only for personal use, then yeah that’s probably totally fine already, unless you live in one of the completely not sane places where both receiving and distributing copyrighted material are illegal/criminal.
Returning the shopping cart
Dunno why this is downvoted because RAG is the correct answer. Fine tuning/training is not the tool for this job. RAG is.
You can overwrite the model by using the same name instead of creating one with a new name if it bothers you. Either way there is no duplication of the LLM model file.
What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Agreed.
Because they were banned.
I didn’t make this claim, no
Because they would just be inactive rather than banned? You’re the one claiming that bans have occurred, which would be a major censorship issue for lemmy…
Can you try setting num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
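Something like this is what I mean. It's only a sketch: the base model name `llama3` and the values 8192 and 512 are placeholders I made up, so swap in whatever you actually run. The helper just renders the Modelfile text with the two PARAMETER lines.

```python
def render_modelfile(base_model: str, num_ctx: int, num_predict: int) -> str:
    """Return Modelfile text that sets num_ctx and num_predict on top of base_model."""
    lines = [
        f"FROM {base_model}",                    # base model name is an assumption
        f"PARAMETER num_ctx {num_ctx}",          # context window size in tokens
        f"PARAMETER num_predict {num_predict}",  # max tokens to generate
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, not recommendations.
    with open("Modelfile", "w", encoding="utf-8") as fh:
        fh.write(render_modelfile("llama3", 8192, 512))
```

After that, `ollama create my-model -f Modelfile` (the name `my-model` is also just an example) should give you a model with those parameters baked in.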
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vllm does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization.
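To show why a 4-bit quant can behave differently from the unquantized weights, here is a toy round trip. Big caveat: this is a plain symmetric uniform quantizer I wrote for illustration, not the scheme ollama's quants actually use, and the weights are made-up numbers. It only shows that fewer bits means a larger rounding error per weight; it does not by itself show why small models suffer more.

```python
def quantize_roundtrip(weights, bits=4):
    """Quantize to signed integers of the given width, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax   # one scale for the whole list
    if scale == 0:
        return list(weights)
    out = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # clamp to the integer range
        out.append(q * scale)
    return out


def max_error(weights, bits):
    """Largest absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return max(abs(a - b) for a, b in zip(weights, restored))


if __name__ == "__main__":
    toy_weights = [0.7, -0.35, 0.1, 0.02]  # hypothetical values
    print("4-bit max error:", max_error(toy_weights, 4))
    print("8-bit max error:", max_error(toy_weights, 8))
```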
I have no problems with changing num_ctx or num_predict
Um… modlog is public. Where’s your evidence?
Models are computed sequentially (the output of each layer is the input into the next layer in the sequence) so more GPUs do not offer any kind of performance benefit
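A toy sketch of what I mean, with hypothetical "layers" that are just small functions and GPU ids that are only labels. It assumes a single request and a plain layer split; it ignores batching entirely. Each layer needs the previous layer's output, so the number of serial steps is the same no matter how many GPUs the layers are spread over.

```python
def run_split(x, layers, num_gpus):
    """Run layers in order, assigning contiguous chunks to GPU ids.

    Returns the final output and a schedule of (layer_index, gpu_id) steps.
    """
    per_gpu = -(-len(layers) // num_gpus)  # ceiling division
    schedule = []
    for i, layer in enumerate(layers):
        gpu = i // per_gpu        # which (pretend) GPU holds this layer
        x = layer(x)              # must wait for the previous layer's output
        schedule.append((i, gpu))
    return x, schedule


if __name__ == "__main__":
    toy_layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
    for gpus in (1, 2, 4):
        out, steps = run_split(1, toy_layers, gpus)
        print(gpus, "GPU(s):", out, "in", len(steps), "serial steps")
```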
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
Notice how all those bot accounts that were so active leading up to the election have completely vanished from the internet now? Yeah.
My guess is an x86 32bit machine
4690k was solid! Mine is retired, though. Now I selfhost on ARM
The funny part is the leak could only have come from inside his administration and party because he withheld briefings from his political opponents