microsoft/Phi-3-mini-128-instruct
not loading:
This might be a library version mismatch that leaves out a required positional parameter, or a problem with the latest text-generation-inference. The maintainers can tell me if I'm misreading their codebase and the error message.
I opened an issue for this: "Regression: get_weights_col_packed_qkv() quantize parameter not included in calls so fails with error missing positional parameter - merge error?" (Issue #2236, huggingface/text-generation-inference on GitHub).
The log below looks relevant; the error is about a missing parameter:
2024-07-15T02:50:11.941076Z INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s.
2024-07-15T02:50:14.461678Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/home/workbench/.local/lib/python3.10/site-packages/typer/main.py", line 309, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/workbench/.local/lib/python3.10/site-packages/typer/core.py", line 723, in main
return _main(
File "/home/workbench/.local/lib/python3.10/site-packages/typer/core.py", line 193, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/workbench/.local/lib/python3.10/site-packages/typer/main.py", line 692, in wrapper
return callback(**use_params)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 647, in get_model
return FlashCausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 897, in __init__
model = model_class(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 479, in __init__
self.model = FlashLlamaModel(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 400, in __init__
[
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 401, in <listcomp>
FlashLlamaLayer(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 333, in __init__
self.self_attn = FlashLlamaAttention(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 150, in __init__
self.query_key_value = load_attention(config, prefix, weights, index)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 66, in load_attention
base_layer = TensorParallelColumnLinear.load_qkv(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/tensor_parallel.py", line 151, in load_qkv
weight = weights.get_weights_col_packed_qkv(
TypeError: Weights.get_weights_col_packed_qkv() missing 1 required positional argument: 'quantize'
2024-07-15T02:50:15.716159Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
and
TypeError: Weights.get_weights_col_packed_qkv() missing 1 required positional argument: 'quantize' rank=0
Error: ShardCannotStart
2024-07-15T21:52:01.495921Z ERROR text_generation_launcher: Shard 0 failed to start
2024-07-15T21:52:01.495947Z INFO text_generation_launcher: Shutting down shards
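For anyone following along, here is a minimal sketch of what I think is going on, assuming my reading of the traceback is right: a method gained a required `quantize` parameter, but a caller still uses the old argument list. The `Weights` class below is a hypothetical stand-in, not the real text-generation-inference code; its parameter list, the `load_qkv_stale` helper, and the prefix string are all assumptions made up for illustration. The `required_params` helper shows one way to check which arguments an installed function actually requires.

```python
import inspect


class Weights:
    """Hypothetical stand-in for the real class; this signature is an assumption."""

    def get_weights_col_packed_qkv(self, prefix, quantize, num_heads, num_key_value_heads):
        return (prefix, quantize, num_heads, num_key_value_heads)


def load_qkv_stale(weights, prefix, num_heads, num_key_value_heads):
    # Hypothetical stale caller: it predates the required `quantize`
    # parameter, so it never passes it.
    return weights.get_weights_col_packed_qkv(
        prefix,
        num_heads=num_heads,
        num_key_value_heads=num_key_value_heads,
    )


def required_params(func):
    """Return the names of positional parameters that have no default."""
    sig = inspect.signature(func)
    return [
        name
        for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]


def reproduce():
    """Trigger the mismatch and return the TypeError message, or None."""
    try:
        # The prefix is an illustrative placeholder, not a real weight name.
        load_qkv_stale(Weights(), "model.layers.0.self_attn.qkv_proj", 32, 32)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
    print(required_params(Weights().get_weights_col_packed_qkv))
```

If this is the real cause, the same `required_params` check run against the installed `Weights.get_weights_col_packed_qkv` would show whether the callee expects `quantize` while `load_qkv` in `tensor_parallel.py` does not pass it, which would point to two files from different versions being installed together.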