The demo on the homepage for the completion of the findMaxElement function is a good example of what is to come. Or maybe where we are at now?
The six lines of Python suggested for that function can also be replaced with a simple “return max(arr)”. The suggested code works but is absolute junior level.
I am terrified of what is to come. Not just horrible code but also how people who blindly “autocomplete” this code are going to stall in their skill level progress.
You may score some story points but did you actually get any better at your craft?
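For anyone who hasn't seen the demo, here is a rough sketch of the two versions being compared. The homepage snippet isn't quoted in this thread, so the loop version below is only a guess at what a six-line suggestion looks like, and both function names are made up for illustration.

```python
def find_max_element(arr):
    # Hypothetical reconstruction of the loop-style suggestion.
    max_element = arr[0]
    for element in arr:
        if element > max_element:
            max_element = element
    return max_element


def find_max_element_builtin(arr):
    # The one-line equivalent using Python's built-in max().
    return max(arr)
```

Both versions fail on an empty list (IndexError for the loop, ValueError for `max`), so neither is more robust than the other; the built-in is just shorter and harder to get wrong.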
This is self correcting. Code of this quality won't let you ship things. You are forced to understand the last 20%-30% of details the LLM can't help you with to pass all your tests. But, it also turns out, to understand the 20% of details the LLM couldn't handle, you need to understand the 80% the LLM could handle.
I'm just not worried about this, LLMs don't ship.
In the case where it writes functionally "good enough" code that performs terribly, it rewards the LLM vendor...since the LLM vendor is also often your IaC vendor. And now you need to buy more infra.
That's one hell of a synergy. Win-win-lose
I sense a new position coming up: slop cleanup engineer
So an engineer.
This needs to be shouted from the rooftops. If you could do it yourself then LLMs can be a great help, speeding things up, offering suggestions and alternatives etc.
But if you’re asking for something you don’t know how to do you might end up with junk and not even know it.
But if that junk doesn't work (which it likely won't for any worthwhile problem) then you have to get it working. And to get it working you almost always have to figure out how the junk code works. And that process, I've found, is where the real magic happens. You learn by fixing, pruning, optimizing.
I think there's a whole meta level of the actual dynamic between human<>LLM interactions that is not being sufficiently talked about. I think there are, potentially, many secondary benefits that can come from using them simply due to the ways you have to react to their outputs (if a person decides to rise to that occasion).
If the junk doesn't work right from the beginning, yes. The problem is that sometimes the junk might look like it works at first, and then later you find out that it doesn't, and you ended up having to make urgent fixes on a Friday night.
> And that process, I've found, is where the real magic happens
It might be a good way to learn if there's someone supervising the process, so they _know_ that the code is incorrect and can tell you to figure out what's wrong and how to fix it.
If you are shipping this stuff yourself, this sounds like a way of deploying giant foot-guns into production.
I still think it's better to learn by trying to understand the code from the beginning (in the same way that a person should try to understand code they read in tutorials and on Stack Overflow), rather than delaying the learning until something doesn't work. This is like trying to make yourself do reinforcement learning on the outputs of an LLM, which sounds really inefficient to me.
I see what you’re saying. Maybe in a novice or learning type situation, having the LLM generate code you need to check for errors could be educational. We all learn from debugging, after all. On the flip side, I suspect that for most people course questions might be better for that. For those already good at the craft, I agree there might be some unexplored secondary effects.
What I find (being in the latter category) is most LLM code output falls on the spectrum of “small snippets that work but wouldn’t have taken me long to type out anyway” to “large chunk that saves me time to write but that I have to thoroughly check/test/tweak”. In other words, the more time it saves typing the more time I have to spend on it afterwards. Novices probably spend more time on the former part of that spectrum and experienced devs on the latter. I suspect the average productivity increase across the spectrum is fairly level which means the benefits don’t really scale with user ability.
I think this tracks with the main thing people need to understand about LLMs: they are a tool. Like any tool simply having access to it doesn’t automatically make you good at the thing it helps with. It might help you learn and it might help you do the thing better, but it will not do your job for you.
There are real dangers in code that appears to run but contains sneaky problems. I once asked ChatGPT to take a data set and run a separate t-test on each group. The code it generated first took the average of each group and then ran the test on that one value. The results were wrong, but the code ran and handed off results to my downstream analysis.
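To make that failure mode concrete, here is a minimal sketch. It assumes a one-sample t-test against a reference value, which may not be the test the commenter actually ran, and the data is made up. The point is only that the test has to see the raw observations of each group: once a group is collapsed to its mean there is a single value left and no variance to test.

```python
import math
import statistics


def one_sample_t(values, popmean):
    """Return the t statistic for H0: mean(values) == popmean."""
    n = len(values)
    if n < 2:
        # A single value has no sample variance, so no t-test is possible.
        raise ValueError("need at least 2 observations per group")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return (mean - popmean) / (sd / math.sqrt(n))


# Made-up data, for illustration only.
groups = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]}

# Correct: run the test on the raw observations of each group.
t_by_group = {
    name: one_sample_t(vals, popmean=0.0) for name, vals in groups.items()
}

# The bug described above: each group is reduced to its mean first,
# so any test run on `collapsed` sees only one value per group.
collapsed = {name: [statistics.fmean(vals)] for name, vals in groups.items()}
```

A guard like the `n < 2` check is the kind of cheap sanity test that turns a silently wrong result into a loud error before it reaches downstream analysis.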
Wait till they come up with auto review/merge agents, or maybe there already are some. gulp
On the other hand it might become a next level of abstraction.
It compiles a human prompt into some intermediate code (in this case Python). The initial version of CPython was probably not perfect at all, and engineers were also terrified. If we are lucky, this new "compiler" will keep getting better and more efficient. Never perfect, but people will be paying the same price they are already paying for not dealing directly with ASM.
Something that you neglected to mention is, with every abstraction layer up to Python, everything is predictable and repeatable. With LLMs, we can give the exact same instructions, and not be guaranteed the same code.
I’m not sure why that matters here. Users want code that solves their business need. In general most don’t care about repeatability if someone else tries to solve their problem.
The question that matters is: can businesses solve their problems cheaper for the same quality, or at lower quality while beating the previous Pareto-optimal cost/quality frontier.
Recognizable repetition can be abstracted, reducing code base and its (running) support cost.
The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
Sure. You seem to think that LLMs will be unable to identify abstraction opportunities if the code is not identical; that’s not obvious to me. Indeed there are some good (but not certain) reasons to think LLMs will be better at broad-not-deep stuff like “load codebase into context window and spot conceptual repetition”. Though I think the creative insight of figuring out what kind of abstraction is needed may be the spark that remains human for a while.
Also, maybe recognizing the repetition remains the human's job, but refactoring is exponentially easier and so again we get better code as a result.
Seems to me to be pretty early to be making confident predictions about how this is all going to pan out.
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
But why doesn't that happen today? Cheap code can be had by hiring in cheap locations (outsourced for example).
The reality is that customers are the ultimate arbiters, and if it satisfies them, the business will not collapse. And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
The code quality translates to speed of introduction of changes, fixes of defects and amount of user-facing defects.
While customers may not express any care about code quality directly they can and will express (dis)satisfaction with performance and defects of the product.
It happens today. However, companies fail because of multiple problems that come together. Bad software quality (from whatever source) is typically not a very visible one among them because when business people take over, they only see (at most) that software development/maintenance costs more money than it yields.
It is happening. There is a lot of bad software out there. Terrible to use, but still functional enough that it keeps selling. The question is how much crap you can pile on top of that already bad code before it falls apart.
> Cheap code can be had by hiring in cheap locations (outsourced for example).
If you outsource and like what you get, you would assume the place you outsourced to can help provide continued support. What assurance do you have with LLMs? A working solution doesn't mean it can be easily maintained and/or evolved.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
That is true, but they will complain if bugs cannot be fixed and features cannot be added. It is true that customers don't care, and they shouldn't, until it does matter, of course.
The challenge with software development isn't necessarily with the first iteration, but rather it is with continued support. Where I think LLMs can really shine is in providing domain experts (those who understand the problem) with a better way to demonstrate their needs.
> Recognizable repetition can be abstracted
... which is the whole idea behind training, isn't it?
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
The problem is really the opposite -- most programmers are employed to create very minor variations on work done either by other programmers elsewhere, by other programmers in the same organization, or by their own younger selves. The resulting inefficiency is massive in human terms, not just in managerial metrics. Smart people are wasting their lives on pointlessly repetitive work.
When it comes to the art of computer programming, there are more painters than there are paintings to create. That's why a genuinely-new paradigm is so important, and so overdue... and it's why I get so frustrated when supposed "hackers" stand in the way.
>> Recognizable repetition can be abstracted
> ... which is the whole idea behind training, isn't it?
The comment I was answering specifically dismissed LLMs' inability to give the same answer to the same question as unimportant. My point is that this ability is crucial to software engineering: answers to similar problems should be as similar as possible.
Also, I bet that LLMs are not trained to abstract. In my experience they are lately trained to engage users in pointless dialogue for as long as possible.
No, only the spec is important. How the software implements the spec is not important in the least. (To the extent that's not true, fix the spec!)
Nor is whether the implementation is the same from one build to the next.
LLMs use pseudo-random numbers. You can set the seed and get exactly the same output with the same model and input.
You won't, because floating point arithmetic isn't associative.
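A quick illustration of the non-associativity point, using plain Python floats (IEEE 754 doubles). This says nothing about any particular inference stack; it only shows that regrouping the same additions can change the result, which is why a reduction that sums in a different order can produce slightly different numbers.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

The difference is tiny, but a tiny difference in a logit can be enough to flip which token ranks first, and every later token is conditioned on that choice.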
Unfortunately, this is only deterministic on the same hardware, but there is no reason why one couldn't write reasonably efficient deterministic LLM kernels. It just has not been a priority.
Nevertheless, I still agree with the main point that it is difficult to get LLMs to produce the same output reliably. A small change in the context might trigger all kinds of changes in the generated code.
> Something that you neglected to mention is, with every abstraction layer up to Python, everything is predictable and repeatable.
As long as you consider C and dragons flying out of your nose predictable.
(Insert similar quip about hardware)
There is no reason to assume that, say, a C compiler generates the same machine code for the same source code. AFAIK, a C compiler that chooses randomly between multiple C-semantically equivalent sequences of instructions is a valid C compiler.
> With LLMs, we can give the exact same instructions, and not be guaranteed the same code.
That's something we'll have to give up and get over.
See also: understanding how the underlying code actually works. You don't need to know assembly to use a high-level programming language (although it certainly doesn't hurt), and you won't need to know a high-level programming language to write the functional specs in English that the code generator model uses.
I say bring it on. 50+ years was long enough to keep doing things the same way.
Even compiling code isn't deterministic given different compilers and different items installed on a machine can influence the final resulting code, right? Ideally they shouldn't have any noticeable impact, but in edge cases it might, which is why you compile your code once during a build step and then deploy the same compiled code to different environments instead of compiling it per environment.
> With LLMs, we can give the exact same instructions, and not be guaranteed the same code.
Set temperature appropriately, that problem is then solved, no?
No, it is much more involved and not all providers allow the necessary tweaks. This means you will need to use local models (with hardware caveats) which will require us to ask:
- Are local models good enough?
- What are we giving up for deterministic behaviour?
For example, will it be much more difficult to write prompts? Will the output be nonsensical?
Aren't some models deterministic with temperature set to 0?
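For anyone unsure what temperature and seed actually control, here is a toy sampler in plain Python. It is a sketch of the general idea only, not any provider's API: `sample_token` and `generate` are made-up names, and treating temperature 0 as greedy argmax is an assumption about how implementations commonly handle that case.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from logits; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits, temperature, seed, n=5):
    """Draw n samples with a private, seeded random generator."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]


logits = [1.0, 3.0, 2.0]
print(generate(logits, temperature=0, seed=1))    # always index 1
print(generate(logits, temperature=1.0, seed=42))  # repeatable for a fixed seed
```

In this toy the sampling step is fully repeatable. The catch raised elsewhere in the thread is that the logits themselves come from floating point math on real hardware, so they may not be bit-identical from run to run even when the sampler is.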
Assuming you have full control over which compiler you're using for each step ;)
What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance?
> Assuming you have full control over which compiler you're using for each step ;)
With existing tools, we know if we need to do something, we can. The issue with LLMs is that they are very much black boxes.
> What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance?
Honestly, having a compiler interface for LLMs isn't a bad idea...for some use cases. What I don't see us being able to do is use natural language to build complex apps in a deterministic manner. Solving this problem would require turning LLMs into deterministic machines, which I don't believe will be an easy task, given how LLMs work today.
I'm a strong believer that LLMs will change how we develop and create software development tools. In the past, you would need Google or Microsoft levels of funding to integrate natural language into a tool, but with LLMs, we can have an LLM parse input and map it to deterministic functions in days.
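A minimal sketch of that "parse, then map to deterministic functions" pattern. Everything here is hypothetical: the JSON shape, the handler names, and the hardcoded `reply` string standing in for a model's output. No real LLM API is called.

```python
import json


def convert_temperature(celsius):
    return celsius * 9 / 5 + 32


def word_count(text):
    return len(text.split())


# Whitelist of deterministic functions the model is allowed to select.
HANDLERS = {
    "convert_temperature": convert_temperature,
    "word_count": word_count,
}


def dispatch(model_output):
    """Parse the model's JSON reply and call the matching function."""
    request = json.loads(model_output)
    name = request["function"]
    if name not in HANDLERS:
        raise ValueError(f"unknown function: {name}")
    return HANDLERS[name](**request["arguments"])


# Stand-in for what a model might return for "what's 100 C in Fahrenheit?"
reply = '{"function": "convert_temperature", "arguments": {"celsius": 100}}'
print(dispatch(reply))  # 212.0
```

The model only chooses which function to call and with what arguments; the actual work is done by ordinary code, so the same parsed request always gives the same result.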
It may be a “level of abstraction”, but not a good one, because it is imprecise.
When you want to make changes to the code (which is what we spend most of our time on), you’ll have to either (1) modify the prompt and accept the risk of using the new code or (2) modify the original code, which you can’t do unless you know the lower level of abstraction.
I have no goal of becoming a programmer, but I like to build programs.
I built a rather complex AI-ecosystem simulator with me as the director and GPT-4, now Claude 3.5, as the programmer.
Would never have been able to do this beforehand.
I think there is a big difference between an abstraction layer that can improve -- one where you maybe write "code" in prompts and then have a compiler build through real code, allowing that compiler to get better over time -- and an interactive tool that locks bad decisions autocompleted today into both your codebase and your brain, involving you still working at the lower layer but getting low quality "help" in your editor. I am totally pro- compilers and high-level languages, but I think the idea of writing assembly with the help of a partial compiler where you kind of write stuff and then copy/paste the result into your assembly file with some munging to fix issues is dumb.
By all means, though: if someone gets us to the point where the "code" I am checking in is a bunch of English -- for which I will likely need a law degree in addition to an engineering background to not get evil genie with a cursed paw results from it trying to figure out what I must have meant from what I said :/ -- I will think that's pretty cool and will actually be a new layer of abstraction in the same class as compiler... and like, if at that point I don't use it, it will only be because I think it is somehow dangerous to humanity itself (and even then I will admit that it is probably more effective)... but we aren't there yet and "we're on the way there" doesn't count anywhere near as much as people often want it to ;P.
The most underrated thing I do on nearly every cursor suggestion is to follow up with “are there any better ways to do this?”.
A deeper version of the same idea is to ask a second model to check the first model’s answers. aider’s “architect” is an automated version of this approach.
I always ask it to "analyze approaches to achieve X and then make a suggestion, no code" in the chat. Then comes a refinement step where I give feedback on the generated code. I also always try to give it an "out" between making changes and keeping things the same, to stave off the bias toward action.
Yea, the "analyze and explain but no code yet" approach works well. Lets me audit its approach beforehand.
I used to know things. Then they made Google, and I just looked things up. But at least I could still do things. Now we have AI, and I just ask it to do things for me. Now I don't know anything and I can't do anything.
Programmers (and adjacent positions) of late strike me as remarkably shortsighted and myopic.
Cheering for remote work leading to loads of new positions being offered overseas as opposed to domestically, and now loudly celebrating LLMs writing "boilerplate" for them.
How folks don't see the consequences of their actions is remarkable to me.
In both cases, you get what you pay for.
I feel like I've seen this comment so many times but actually genuine. The cult like dedication is kind of baffling.
I think that example says more about the company that chose to put that code as a demo in their homepage.
LLMs also love to double down on solutions that don't work.
Case in point, I'm working on a game that's essentially a website right now. Since I'm very very bad with web design I'm using an LLM.
It's perfect 75% of the time. The other 25% it just doesn't work. Multiple LLMs will misunderstand basic tasks. They'll add properties and invent functions.
It's like you had hired a college junior who insists they're never wrong and keeps pushing non-functional code.
The entire mindset is "whatever, it's close enough, good luck."
God forbid you need to do anything using an uncommon node module or anything like that.
> LLMs also love to double down on solutions that don't work.
“Often wrong but never in doubt” is not proprietary to LLMs. It’s off-putting and we want them to be correct and to have humility when they’re wrong. But we should remember LLMs are trained on work created by people, and many of those people have built successful careers being exceedingly confident in solutions that don’t work.
The issue is that, when it comes to programming, LLMs never say "I don't know how to do this". Tell me you don't know so I can do something else. I ended up just refactoring my UX to work around it. In this case it's a personal prototype so it's not a big deal.
That is definitely an issue with many LLMs. I've had limited success including instructions like "Don't invent facts" in the system prompt and more success saying "that was not correct. Please answer again and check to ensure your code works before giving it to me" within the context of chats. More success still comes from requesting second opinions from a different model -- e.g. asking Claude's opinion of Qwen's solution.
To the other point, not admitting to gaps in knowledge or experience is also something that people do all the time. "I copied & pasted that from the top answer in Stack Overflow so it must be correct!" is a direct analog.
So now you have an overconfident human using an overconfident tool, both of which will end up coding themselves into a corner? Compilers at least, for the most part, offer very definitive feedback that acts as guard rails for those overconfident humans.
Also, let's not forget LLMs are a product of the internet and anonymity. Human interaction on the internet is significantly different from in person interaction, where typically people are more humble and less overconfident. If someone at my office acted like some overconfident SO/reddit/HN users I would probably avoid them like the plague.
A compiler in the mix is very helpful. That and other sanity checks wielded by a skilled engineer doing code reviews can provide valuable feedback to other developers and to LLMs. The knowledgeable human in the loop makes the coding process and final products so much better. Two LLMs with tool usage capabilities reviewing the code isn't as good today but is available today.
The LLM's overconfidence is based on it spitting out the most-probable tokens based on its training data and your prompt. When LLMs learn real hubris from actual anonymous internet jackholes, we will have made significant progress toward AGI.
> people who blindly “autocomplete” this code are going to stall in their skill level progress
AI is just going to widen the skill level bell curve. Enables some people to get away with far more mediocre work than before, but also enables some people to become far more capable. You can't make someone put in more effort, but the ones who do will really shine.
Anybody care to comment whether the quality of the existing code influences how good the AI's assistance is? In other words, would they suggest sloppy code where the existing code is sloppy and better (?) code when the existing code is good?
What do you think? (I don't mean that in a snarky way.) Based on how LLMs work, I can't see how that would not be the case.
But in my experience there are nuances to this. It's less about "good" vs "bad"/"sloppy" code and more about how discernible it is. If it's discernibly sloppy (i.e. the type of sloppy a beginning programmer might produce, which is familiar to all of us) I would say that's better than opaque "good" code (good really only meaning functional).
These things predict tokens. So when you use them, help them increase their chances of predicting the thing you want. Good comments on code, good function names, explain what you don't know, etc. etc. The same things you would ideally do if working with another person on a codebase.
Reminds me of the 2000s outsourcing hype. I made a lot of money cleaning up that mess.
Entire projects late, buggy, unreadable and unmaintainable.
Businesses pay big when they need to recover from that kind of thing and save face with investors.
As a cybersecurity professional (as in, the more cybersecurity problems there are, the less likely I am to ever find myself out of a job), I'm rooting for AI!
Keep in mind that this is the stupidest the LLM will ever be and we can expect major improvements every few months. On the other hand junior devs will always be junior devs. At some point Python and C++ will be like assembly now, something that’s always out there but not something the vast majority of developers will ever need to read or write.
My experience observing commercial LLMs since the release of GPT-4 is actually the opposite of this.
Sure, they've gotten much cheaper on a per-token basis, but that cost reduction has come with a non-trivial accuracy/reliability cost.
The problem is, tokens that are 10x cheaper are still useless if what they say is straight up wrong.
> Sure, they've gotten much cheaper on a per-token basis, but that cost reduction has come with a non-trivial accuracy/reliability cost.
This only holds for OpenAI.
> Keep in mind that this is the stupidest the LLM will ever be and we can expect major improvements every few months.
We have seen no noticeable improvements (at usable prices) for 7 months, since the original Sonnet 3.5 came out.
Maybe specialized hardware for LLM inference will improve so rapidly that o1 (full) will be quick and cheap enough a year from now, but it seems extremely unlikely. For the end user, the top models hadn't gotten cheaper for more than a year until the release of Deepseek v3 a few weeks ago. Even that is currently very slow at non-Deepseek providers, and who knows just how subsidized the pricing and speed at Deepseek itself is, given political interests.
No major AI advancements for 7 months? Guess everyone's jobs are safe for another year, and after that we're all dead?
> No major AI advancements for 7 months?
For my caveat "at usable prices", no, there haven't been any. o1 (full) and now o3 have been advancements, but are hardly available for real-world use given limitations and pricing.
> we can expect major improvements every few months.
I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5. I do believe things will improve over time, mainly due to advancements in hardware. With better hardware, we can better brute force correct answers.
> junior devs will always be junior devs
Junior developers turn into senior developers over time.
> I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5.
Progress by Google, Meta, Microsoft, Qwen and Deepseek is unhampered by OpenAI’s schedule. Their latest models (including Gemini 2.0, Llama 3.3, Phi 4) and the coding fine tunes that follow are all pretty good.
> unhampered by OpenAI’s schedule
Sure, but if the advancements are to catch up to OpenAI, then major improvements by other vendors are nice and all, but I don't believe that was what the commenter was implying. Right now the leaders in my opinion are OpenAI and Anthropic and unless they are making major improvements every few months, the industry as a whole is not making major improvements.
OpenAI and Anthropic are definitely among the leaders. Playing catch-up to these leaders' mind-share and technology is some of the motivation for others. Calling the progress being made in the space by Google (Gemini), MSFT (Phi), Meta (llama), Alibaba (Qwen) "nice and all" is a position you might be pleasantly surprised to reconsider if this technology interests you. And don't sleep on Apple and AMZ -
In the space covered by Tabby, Copilot, aider, Continue and others, capabilities continue to improve considerably month-over-month.
In the segments of the industry I care most about, I agree 100% with what the commenter said w/r/t expecting major improvements every few months. Pay even passing attention to huggingface and github and see work being done by indies as well as corporate behemoths happening at breakneck pace. Some work is pushing the SOTA. Some is making the SOTA more widely available. Lots of it is different approaches to solving similar challenges. Most of it benefits consumers and creators looking to use and learn from all of this.
I wish this were true. Being a shitty programmer who is old, I would benefit from this as much as anyone here, but I think it is delusional.
From my experience I wouldn't even say LLMs are stupid. The LLM is a carrier and the intelligence is in the training data. Unfortunately, the training data is not going to get smarter.
If any of this had anything to do with reality then we should already have a programming specific model only trained on CS and math textbooks that is awesome. Of course, that doesn't work because the LLM is not abstracting concepts in the way we normally think of as necessary to be either stupid or intelligent.
It's hardly shocking that next token prediction on math and CS textbooks is of limited use. You hardly have to think about it to see how flawed the whole idea is.
GitHub Copilot came out in 2021.
> I am terrified of what is to come.
Don't worry. Like everything else in life, you get what you pay for.
The silver lining is that the value of your skills is going up.
> The suggested code works but is absolute junior level
This isn't far from the current status quo. Good software companies pay for people who write top quality code, and the rest pay juniors to work far above their pay grade or offshore it to the cheapest bidder. Now it will be offloaded to LLMs instead. Same code, different writer, same work for a contractor who knows what they're doing to come and fix later.
And so the cycle continues.
I mean you can treat it as just a general pseudocode-ish implementation of an O(n) find_max algorithm. Tons of people use Python to illustrate algorithms.
(Not to hide your point though -- people please review your LLM-generated code!)
Never imagined our project would make it to the HN front page on Sunday!
Tabby has undergone significant development since its launch two years ago [0]. It is now a comprehensive AI developer platform featuring code completion and a codebase chat, with a team [1] / enterprise focus (SSO, Access Control, User Authentication).
Tabby's adopters [2][3] have discovered that Tabby is the only platform providing a fully self-service onboarding experience as an on-prem offering. It also delivers performance that rivals other options in the market. If you're curious, I encourage you to give it a try!
As someone unfamiliar with local AIs and eager to try, how does the “run tabby in 1 minute”[1] compare to e.g. chatgpt’s free 4o-mini? Can I run that docker command on a medium specced macbook pro and have an AI that is comparably fast and capable? Or are we not there (yet)?
Edit: looks like there is a separate page with instructions for macbooks[2] that has more context.
> The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA or ROCm.
A teeny tiny model such as a 1.5B model is really dumb, and not good at interactively generating code in a conversational way, but models in the 3B or less size can do a good job of suggesting tab completions.
There are larger "open" models (in the 32B - 70B range) that you can run locally that should be much, much better than gpt-4o-mini at just about everything, including writing code. For a few examples, llama3.3-70b-instruct and qwen2.5-coder-32b-instruct are pretty good. If you're really pressed for RAM, qwen2.5-coder-7b-instruct or codegemma-7b-it might be okay for some simple things.
> medium specced macbook pro
medium specced doesn't mean much. How much RAM do you have? Each "B" (billion) of parameters is going to require about 1GB of RAM, as a rule of thumb. (500MB for really heavily quantized models, 2GB for un-quantized models... but, 8-bit quants use 1GB, and that's usually fine.)
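That rule of thumb is easy to turn into a back-of-the-envelope calculator. This is a rough sketch only: the function name is made up, it counts memory for the weights alone (using 1 GB = 1e9 bytes), and it ignores context/KV cache and runtime overhead, so real usage will be higher.

```python
def estimate_weight_memory_gb(params_billions, bits_per_weight=8):
    """Rough memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight


for name, size_b in [("7B", 7), ("32B", 32), ("70B", 70)]:
    row = {bits: estimate_weight_memory_gb(size_b, bits) for bits in (4, 8, 16)}
    print(name, row)
```

By this estimate a 32B model at 8 bits needs about 32 GB for weights and a 70B model at 4 bits about 35 GB, which is why those sizes are out of reach for most laptops.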
Also, context size significantly impacts RAM/VRAM usage, and in programming those chats get big quickly.
Thanks for your explanation! Very helpful!
Side question: open source models tend to be less "smart" than private ones. Do you intend to compensate by providing better context (e.g. querying relevant technology docs to feed the context)?
> Toggle IDE / Extensions telemetry
Cannot be turned off in the Community Edition. What does this telemetry data contain?
For something similar I use Continue.dev with ollama, it’s always nice to see more tools in the space! But as usual, you need pretty formidable hardware to run the actually good models, like the 32B version of Qwen2.5-coder.
All the examples are for code that would otherwise be found in a library. Some of the code is of dubious quality.
LLMs - a spam bot for your codebase?
> How to utilize multiple NVIDIA GPUs?
| Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm) accordingly.
So using 2 NVLinked GPUs with inference is not supported? Or is that situation different because NVLink treats the two GPUs as a single one?
> So using 2 NVLinked GPUs with inference is not supported?
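For the "multiple instances" workaround in the quoted FAQ, a small sketch of the environment setup. `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` come from the quote; the helper name is made up, and the command that actually starts Tabby is deliberately left out because I don't want to guess at its flags.

```python
import os


def per_gpu_environments(gpu_count, variable="CUDA_VISIBLE_DEVICES"):
    """Build one environment per instance, each seeing a single GPU."""
    environments = []
    for gpu_index in range(gpu_count):
        env = dict(os.environ)  # copy, so the parent process is untouched
        env[variable] = str(gpu_index)
        environments.append(env)
    return environments


envs = per_gpu_environments(2)
# Each env would then be passed to subprocess.Popen(command, env=env),
# where command is whatever you normally use to start Tabby on its own port.
```

For ROCm the same helper can be called with `variable="HIP_VISIBLE_DEVICES"`. Note that this gives you independent instances, each with its own copy of the model; it does not split one model across GPUs.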
I see. So this is like, I can have tabby be my LLM server with this limitation or I can just turn that feature off and point tabby at my self hosted LLM as any other OpenAI compatible endpoint?
Yes - however, the FIM model requires careful configuration to properly set the prompt template.
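For the one-instance-per-GPU workaround quoted above, a rough sketch of the bookkeeping might look like this. Only the two environment variable names come from the quoted docs; the function name, the port numbering, and the idea of launching via subprocess are assumptions, and the actual Tabby command line is deliberately left out.

```python
import os


def per_gpu_launch_plan(gpu_ids, base_port=8080, vendor="cuda"):
    """Return one (env, port) pair per GPU; each env exposes exactly one device."""
    var = {"cuda": "CUDA_VISIBLE_DEVICES", "rocm": "HIP_VISIBLE_DEVICES"}[vendor]
    plan = []
    for offset, gpu in enumerate(gpu_ids):
        env = dict(os.environ)  # copy, so the parent environment is untouched
        env[var] = str(gpu)     # this process will only see this one GPU
        plan.append((env, base_port + offset))
    return plan


for env, port in per_gpu_launch_plan([0, 1]):
    # Hypothetical next step: subprocess.Popen(<your Tabby command>, env=env)
    print("GPU", env["CUDA_VISIBLE_DEVICES"], "-> port", port)
```

You would still need something in front (a load balancer or per-team endpoints) to spread requests across the instances.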
Awesome project! I love the idea of not sending my data to a big company and trust their TOS.
The effectiveness of a coding assistant is directly proportional to context length, and the open models you can run on your computer usually have much smaller contexts. Would love to see something more quantified around the usefulness on more complex codebases.
I hope for proliferation of 100% local coding assistants, but for now the recommendation of "Works best on $10K+ GPU" is a show stopper, and we are forced to use the "big company". :(
It’s not really that bad. You can run some fairly big models on an Apple Silicon machine costing £2k (M4 Pro Mac Mini with 64GB RAM).
What is the recommended hardware? GPU required? Could this run OK on an older Ryzen APU (Zen 3 with Vega 7 graphics)?
The usual bottleneck for self-hosted LLMs is memory bandwidth. It doesn't really matter if there are integrated graphics or not... the models will run at the same (very slow) speed on CPU-only. Macs are only decent for LLMs because Apple has given Apple Silicon unusually high memory bandwidth, but they're still nowhere near as fast as a high-end GPU with extremely fast VRAM.
For extremely tiny models like you would use for tab completion, even an old AMD CPU is probably going to do okay.
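A back-of-the-envelope way to see the bandwidth point: generating each token means reading roughly all of the weights once, so bandwidth divided by model size gives an upper bound on tokens per second. The bandwidth figures below are illustrative placeholders, not measurements of any specific hardware.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on decode speed when memory bandwidth is the bottleneck."""
    return bandwidth_gb_s / model_size_gb


# Hypothetical bandwidth tiers (GB/s), chosen only to show the scaling.
for label, bandwidth in [("slow", 50), ("medium", 300), ("fast", 1000)]:
    for model_gb in (1, 8, 32):
        ceiling = max_tokens_per_second(bandwidth, model_gb)
        print(f"{label:>6} memory, {model_gb:>2} GB model: <= {ceiling:.1f} tok/s")
```

This is why a tiny completion model can feel fine on modest hardware while a 32 GB model crawls on the same machine.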
Good to know. It also looks like you can host TabbyML as an on-premise server with docker and serve requests over a private network. Interesting to think that a self-hosted GPU server might become a thing.
That thread doesn't seem to mention hardware. It would be really helpful to just put hardware requirements in the GitHub README.
Very cool. I'm especially happy to see that there is an Eclipse client[1]. One note though: I had to dig around a bit to find the info about the Eclipse client. It's not mentioned in the main readme, or in the list of IDE extensions in the docs. Not sure if that's an oversight or because it's not "ready for prime time" yet or what.
I’ve been using something similar called Twinny.
It’s a vscode extension that connects to an ollama locally hosted LLM of your choice and works like CoPilot.
It’s an extra step to install Ollama, so not as plug-and-play as TFA, but the license is MIT, which makes it worthwhile for me.
So does this run on your personal machine, or can you install it on a local company server and have everyone in the company connect to it?
Tabby is engineered for team usage, intended to be deployed on a shared server. However, with robust local computing resources, you can also run Tabby on your individual machine. Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.
I'm currently investigating a self hosted AI solution for my workplace.
I was wondering, how does this company make money?
From the pricing there is a free/community/opensource option, but how is the "up to 5 users" monitored?
Are you asking on a public forum, on how to get around using a product for a commercial setting by using the non-commercial version of the product?
I'm saying I don't understand their open source model. I thought open source meant you could use and modify code and run it yourself without having to pay a license. ie completely independent of the maintainer. So I was confused by this limit of how many were allowed to use something you are running yourself.
You have it wrong. Neither "open source" nor "free software" imply free-of-cost. Paid software with open and free license is very much a thing.
fyi the pricing page has a typo for "Singel Sign-On"
Appreciated! Fixed
Maybe a good product but terrible company to interview with. I went through several rounds and was basically ghosted after the 4th with no explanation or follow up. The last interview was to write a blog post for their blog, which I submitted, and then I didn’t hear back until after months of continuous nagging. It was pretty disheartening since all of the interviews were some form of a take-home and I spent a combined total of ~10 hours or more.
Such interview processes are big red flags. The company can't afford taking a risk with you and at the same time tests how desperate you are by making you work for free. They are likely short on cash and short on experience. Expect crunch and bad management. Run.
> The last interview was to write a blog post for their blog
Were you applying as a software dev? Because that's not a software (or an interview) assignment.
Yes I was applying for software engineer. I think they wanted engineers who were good at explaining the product to users.
Sure. Writing and a good command of the language is important. There are multiple ways to showcase that. Writing a blog post for their blog is not one of them.
I was willing to jump through hoops—I really wanted the job.
Did the blog post get published on their blog?
IMHO companies should aim for courteous interviews, with faster decisions, and if there's any take home work then it's fully paid. I've seen your work at Beaver.digital and on GetFractals.com. If you're still looking, feel free to contact me; I'm hiring for a startup doing AI/ML data analysis. Specifically Figma + DaisyUI + TypeScript + Python + Pandas + AWS + Postgres.
Did their engineers spend time with you or did they get their blog post otherwise? I once made 1 minute videos for interview process of an AI training data company. I have a hunch they were just harvesting the data.
They did get the blog post but I don’t believe they used it; it’s possible that they didn’t think it was well written and that’s why I was ghosted but I will never know. I know they were interviewing many very talented people for the position. It’s okay to be disorganized as a startup, but I think that keeping people happy, employee or otherwise, should always be the top priority. It would have taken just a few seconds to write an email to me to reject me, and by not doing so, this comment has probably evolved into a big nightmare for them. I didn’t expect it to get this much attention, but yeah; I guess my general sentiment is shared by many.
Were you at least paid?
you know that paid interview processes are not the norm, "at least" is unlikely
If I was paid, I probably wouldn't be complaining publicly. :-) It's probably better for both interests if these types of engagements are paid.
I've worked with paid take home tests for a while, but stopped again. Hiring managers started to make the assignments more convoluted, i.e. stopped respecting the candidate's time. Candidates, on the flip side, always said they don't want to bother with the bureaucracy of writing an invoice and reporting it for their taxes etc., so didn't want to be paid.
Now my logic is: If a take home test is designed to take more than two hours, we need to redesign it. Two hours of interviews, two hours of take home test, that ought to suffice.
If we're still unsure after that, I sometimes offered the candidate a time limited freelance position, paid obviously. We've ended up hiring everyone who went into that process though.
I just finished interviewing with a company called Infisical. The take-homes were crazy (the kind of thing that normally takes a few days or a week). I was paid but it took me 12 hours.
Hope they paid for the work.
Did they post the blog publicly?
your first mistake was doing any kind of take-home exercise at all.
At least per Github, the TabbyML project is older than the TabbyAPI project.
Also, wildly more popular, to the tune of several magnitudes more forks and stars. If anything, this question should be asked of the TabbyAPI project.
I'm not sure what's going on with TabbyAPI's GitHub metrics, but exl2 quants are very popular among the NVIDIA local LLM crowd, and TabbyAPI comes up in tons of Reddit posts from people using it. Might be just my bubble, not saying they're not accurate, just generally surprised such a useful project has under 1k stars. On the flip side, LLMs will hallucinate about TabbyML if you ask them TabbyAPI-related questions, so I'd agree the naming is unfortunate.
The demo on the homepage for the completion of the findMaxElement function is a good example of what is to come. Or maybe where we are at now?
The six lines of Python suggested for that function can also be replaced with a simple “return max(arr)”. The suggested code works but is absolute junior level.
I am terrified of what is to come. Not just horrible code but also how people who blindly “autocomplete” this code are going to stall in their skill level progress.
You may score some story points but did you actually get any better at your craft?
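For anyone who hasn't seen the demo, the contrast is roughly the following. The loop is my approximation of the kind of code being suggested, not a copy of what the homepage shows.

```python
def find_max_element(arr):
    """Hand-rolled loop, similar in spirit to the suggested completion."""
    max_element = arr[0]
    for element in arr:
        if element > max_element:
            max_element = element
    return max_element


def find_max_element_builtin(arr):
    """Same result using the built-in."""
    return max(arr)


sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(find_max_element(sample), find_max_element_builtin(sample))
```

Both versions raise on an empty list (IndexError for the loop, ValueError for max), so neither is safer; one is just shorter.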
This is self correcting. Code of this quality won't let you ship things. You are forced to understand the last 20%-30% of details the LLM can't help you with to pass all your tests. But, it also turns out, to understand the 20% of details the LLM couldn't handle, you need to understand the 80% the LLM could handle.
I'm just not worried about this, LLMs don't ship.
In the case where it writes functionally "good enough" code that performs terribly, it rewards the LLM vendor...since the LLM vendor is also often your IaC vendor. And now you need to buy more infra.
That's one hell of a synergy. Win-win-lose
I sense a new position coming up: slop cleanup engineer
So an engineer.
This needs to be shouted from the rooftops. If you could do it yourself then LLMs can be a great help, speeding things up, offering suggestions and alternatives etc.
But if you’re asking for something you don’t know how to do you might end up with junk and not even know it.
But if that junk doesn't work (which it likely won't for any worthwhile problem) then you have to get it working. And to get it working you almost always have to figure out how the junk code works. And in that process I've found is where the real magic happens. You learn by fixing, pruning, optimizing.
I think there's a whole meta level of the actual dynamic between human<>LLM interactions that is not being sufficiently talked about. I think there's, potentially, many secondary benefits that can come from using them simply due to the ways you have to react to their outputs (if a person decides to rise to that occasion).
If the junk doesn't work right from the beginning, yes. The problem is that sometimes the junk might look like it works at first, and then later you find out that it doesn't, and you ended up having to make urgent fixes on a Friday night.
> And in that process I've found is where the real magic happens
It might be a good way to learn if there's someone supervising the process, so they _know_ that the code is incorrect and tell you to figure out what's wrong and how to fix it.
If you are shipping this stuff yourself, this sounds like a way of deploying giant foot-guns into production.
I still think it's better to learn by trying to understand the code from the beginning (in the same way that a person should try to understand code they read in tutorials and on Stack Overflow), rather than delaying the learning until something doesn't work. This is like trying to make yourself do reinforcement learning on the outputs of an LLM, which sounds really inefficient to me.
I see what you’re saying. Maybe in a novice or learning type situation, having the LLM generate code you need to check for errors could be educational. We all learn from debugging, after all. On the flip side, I suspect that for most people course questions might be better for that. For those already good at the craft, I agree there might be some unexplored secondary effects.
What I find (being in the latter category) is most LLM code output falls on the spectrum of “small snippets that work but wouldn’t have taken me long to type out anyway” to “large chunk that saves me time to write but that I have to thoroughly check/test/tweak”. In other words, the more time it saves typing the more time I have to spend on it afterwards. Novices probably spend more time on the former part of that spectrum and experienced devs on the latter. I suspect the average productivity increase across the spectrum is fairly level which means the benefits don’t really scale with user ability.
I think this tracks with the main thing people need to understand about LLMs: they are a tool. Like any tool simply having access to it doesn’t automatically make you good at the thing it helps with. It might help you learn and it might help you do the thing better, but it will not do your job for you.
There are real dangers in code that appears to run but contains sneaky problems. I once asked ChatGPT to take a data set and run a separate t-test on each group. The code it generated first took the average of each group and then ran the test on that one value. The results were wrong, but the code ran and handed off results to my downstream analysis.
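A reconstruction of that kind of bug, as a sketch only: this is not the code ChatGPT produced, the data is made up, and I'm assuming a one-sample t-test per group. With the standard library the collapsed version fails loudly because one value has no variance; depending on the library, the same mistake can instead run and return something that looks like a result, which is the dangerous case described above.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean == mu0. Needs at least two observations."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # raises StatisticsError when n < 2
    return (mean - mu0) / (sd / math.sqrt(n))


# Made-up data for illustration.
groups = {"a": [5.1, 4.9, 5.3, 5.0], "b": [6.2, 5.8, 6.1, 6.4]}

# Intended analysis: test the raw observations of each group.
for name, vals in groups.items():
    print(name, "t =", round(one_sample_t(vals, mu0=5.0), 3))

# Buggy pattern: average first, then "test" the single remaining value.
for name, vals in groups.items():
    collapsed = [statistics.fmean(vals)]
    try:
        one_sample_t(collapsed, mu0=5.0)
    except statistics.StatisticsError as exc:
        print(name, "cannot test a single value:", exc)
```

A cheap guard is to assert the number of observations going into each test before trusting anything downstream.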
Wait till they come up with auto review/merge agents, or maybe there already are some. gulp
On the other hand it might become a next level of abstraction.
Machine -> Asm -> C -> Python -> LLM (Human language)
It compiles a human prompt into some intermediate code (in this case Python). The initial version of CPython was probably not perfect at all, and engineers were also terrified. If we are lucky, this new "compiler" will keep getting better and more efficient. Never perfect, but people will be paying the same price they are already paying for not dealing directly with ASM.
> Machine -> Asm -> C -> Python -> LLM (Human language)
Something that you neglected to mention is, with every abstraction layer up to Python, everything is predictable and repeatable. With LLMs, we can give the exact same instructions, and not be guaranteed the same code.
I’m not sure why that matters here. Users want code that solves their business need. In general most don’t care about repeatability if someone else tries to solve their problem.
The question that matters is: can businesses solve their problems cheaper for the same quality, or at lower quality while beating the previous Pareto-optimal cost/quality frontier.
Recognizable repetition can be abstracted, reducing code base and its (running) support cost.
The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
Sure. You seem to think that LLMs will be unable to identify abstraction opportunities if the code is not identical; that’s not obvious to me. Indeed there are some good (but not certain) reasons to think LLMs will be better at broad-not-deep stuff like “load codebase into context window and spot conceptual repetition”. Though I think the creative insight of figuring out what kind of abstraction is needed may be the spark that remains human for a while.
Also, maybe recognizing the repetition remains the human's job, but refactoring is exponentially easier and so again we get better code as a result.
Seems to me to be pretty early to be making confident predictions about how this is all going to pan out.
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
but why doesn't that happen today? Cheap code can be had by hiring in cheap locations (outsourced for example).
The reality is that customers are the ultimate arbiters, and if it satisfies them, the business will not collapse. And i have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
> And i have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
The code quality translates to speed of introduction of changes, fixes of defects and amount of user-facing defects.
While customers may not express any care about code quality directly they can and will express (dis)satisfaction with performance and defects of the product.
It happens today. However, companies fail because of multiple problems that come together. Bad software quality (from whatever source) is typically not a very visible one among them because when business people take over, they only see (at most) that software development/maintenance costs more money than it could yield.
It is happening. There is a lot of bad software out there. Terrible to use, but still functional enough that it keeps selling. The question is how much crap you can pile on top of that already bad code before it falls apart.
> Cheap code can be had by hiring in cheap locations (outsourced for example).
If you outsource and like what you get, you would assume the place you outsourced to can help provide continued support. What assurance do you have with LLMs? A working solution doesn't mean it can be easily maintained and/or evolved.
> And i have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
That is true, but they will complain if bugs cannot be fixed and features cannot be added. It is true that customers don't care, and they shouldn't, until it does matter, of course.
The challenge with software development isn't necessarily with the first iteration, but rather it is with continued support. Where I think LLMs can really shine is in providing domain experts (those who understand the problem) with a better way to demonstrate their needs.
> Recognizable repetition can be abstracted
... which is the whole idea behind training, isn't it?
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
The problem is really the opposite -- most programmers are employed to create very minor variations on work done either by other programmers elsewhere, by other programmers in the same organization, or by their own younger selves. The resulting inefficiency is massive in human terms, not just in managerial metrics. Smart people are wasting their lives on pointlessly repetitive work.
When it comes to the art of computer programming, there are more painters than there are paintings to create. That's why a genuinely-new paradigm is so important, and so overdue... and it's why I get so frustrated when supposed "hackers" stand in the way.
Also, I bet that LLMs are not trained to abstract. In my experience they lately are trained to engage users in pointless dialogue as long as possible.
No, only the spec is important. How the software implements the spec is not important in the least. (To the extent that's not true, fix the spec!)
Nor is whether the implementation is the same from one build to the next.
LLMs use pseudo-random numbers. You can set the seed and get exactly the same output with the same model and input.
you won't because floating point arithmetic isn't associative
and the GPU scheduler isn't deterministic
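The associativity point is easy to check in plain Python. The same sum grouped two ways gives two different floats, which is why a parallel reduction whose order is not fixed can change results even with a fixed seed.

```python
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(left_first == right_first)     # False
print(repr(left_first), repr(right_first))
print(abs(left_first - right_first))  # tiny, but not zero
```

The difference is on the order of one unit in the last place, but once it flips which token has the highest score, everything generated afterwards can diverge.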
You can set PyTorch to deterministic mode with a small performance penalty: https://pytorch.org/docs/stable/notes/randomness.html#avoidi...
Unfortunately, this is only deterministic on the same hardware, but there is no reason why one couldn't write reasonably efficient LLM kernels that are deterministic across hardware. It just has not been a priority.
Nevertheless, I still agree with the main point that it is difficult to get LLMs to produce the same output reliably. A small change in the context might trigger all kinds of changes in the generated code.
> > Machine -> Asm -> C -> Python -> LLM (Human language)
> Something that you neglected to mention is, with every abstraction layer up to Python, everything is predictable and repeatable.
As long as you consider C and dragons flying out of your nose predictable.
(Insert similar quip about hardware)
There is no reason to assume that, say, a C compiler generates the same machine code for the same source code. AFAIK, a C compiler that chooses randomly between multiple C-semantically equivalent sequences of instructions is a valid C compiler.
> With LLMs, we can give the exact same instructions, and not be guaranteed the same code.
That's something we'll have to give up and get over.
See also: understanding how the underlying code actually works. You don't need to know assembly to use a high-level programming language (although it certainly doesn't hurt), and you won't need to know a high-level programming language to write the functional specs in English that the code generator model uses.
I say bring it on. 50+ years was long enough to keep doing things the same way.
Even compiling code isn't deterministic given different compilers and different items installed on a machine can influence the final resulting code, right? Ideally they shouldn't have any noticeable impact, but in edge cases it might, which is why you compile your code once during a build step and then deploy the same compiled code to different environments instead of compiling it per environment.
> With LLMs, we can give the exact same instructions, and not be guaranteed the same code.
Set temperature appropriately, that problem is then solved, no?
No, it is much more involved and not all providers allow the necessary tweakings. This means you will need to use local models (with hardware caveats) which will require us to ask:
- Are local models good enough?
- What are we giving up for deterministic behaviour?
For example, will it be much more difficult to write prompts? Will the output be nonsensical? And more.
Aren't some models deterministic with temperature set to 0?
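A toy version of what temperature does at the sampling step, to make the question concrete. Treating temperature 0 as "pick the highest-scoring token" is a common convention and an assumption here, not a statement about any particular provider; the function name and logits are made up. It also ignores the floating-point and hardware effects raised elsewhere in the thread, which affect the scores themselves.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
rng = random.Random(42)  # fixed seed: same draws on every run
print([sample_token(logits, 0, rng) for _ in range(5)])    # always index 1
print([sample_token(logits, 1.0, rng) for _ in range(5)])  # varies by draw
```

So the sampling step can be made repeatable with temperature 0 or a fixed seed; whether the logits feeding it are bit-identical across runs is the separate problem.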
assuming you have full control over which compiler you're using for each step ;)
What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance
> assuming you have full control over which compiler you're using for each step ;)
With existing tools, we know if we need to do something, we can. The issue with LLMs is that they are very much black boxes.
> What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance
Honestly, having a compiler interface for LLMs isn't a bad idea...for some use cases. What I don't see us being able to do is use natural language to build complex apps in a deterministic manner. Solving this problem would require turning LLMs into deterministic machines, which I don't believe will be an easy task, given how LLMs work today.
I strongly believe that LLMs will change how we develop and create software development tools. In the past, you would need Google or Microsoft levels of funding to integrate natural language into a tool, but with LLMs, we can have a model parse the input and map it to deterministic functions in days.
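A minimal sketch of that "parse with an LLM, execute deterministically" pattern. The model call is stubbed out; `fake_llm_parse`, the handler names, and the JSON shape are all hypothetical and only there to show the dispatch step.

```python
import json


def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


def word_count(text: str) -> int:
    return len(text.split())


# Only functions listed here can ever run, whatever the model outputs.
HANDLERS = {
    "celsius_to_fahrenheit": celsius_to_fahrenheit,
    "word_count": word_count,
}


def fake_llm_parse(user_input: str) -> str:
    """Stand-in for a model call that turns free text into a JSON intent."""
    return json.dumps({"function": "celsius_to_fahrenheit", "args": {"celsius": 100}})


def dispatch(intent_json: str):
    """Validate the parsed intent and call the matching deterministic function."""
    intent = json.loads(intent_json)
    name = intent.get("function")
    if name not in HANDLERS:
        raise ValueError(f"unknown function: {name!r}")
    return HANDLERS[name](**intent.get("args", {}))


print(dispatch(fake_llm_parse("what is 100 degrees C in F?")))
```

The non-determinism is confined to the parsing step; once an intent is accepted, the result is as repeatable as any other function call.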
It may be a “level of abstraction”, but not a good one, because it is imprecise.
When you want to make changes to the code (which is what we spend most of our time on), you’ll have to either (1) modify the prompt and accept the risk of using the new code or (2) modify the original code, which you can’t do unless you know the lower level of abstraction.
Recommended reading: https://ian-cooper.writeas.com/is-ai-a-silver-bullet
Yup!
No goal to become a programmer– But I like to build programs.
Built a rather complex AI-ecosystem simulator with me as the director and GPT-4, now Claude 3.5, as the programmer.
Would never have been able to do this beforehand.
I think there is a big difference between an abstraction layer that can improve -- one where you maybe write "code" in prompts and then have a compiler build through real code, allowing that compiler to get better over time -- and an interactive tool that locks bad decisions autocompleted today into both your codebase and your brain, involving you still working at the lower layer but getting low quality "help" in your editor. I am totally pro- compilers and high-level languages, but I think the idea of writing assembly with the help of a partial compiler where you kind of write stuff and then copy/paste the result into your assembly file with some munging to fix issues is dumb.
By all means, though: if someone gets us to the point where the "code" I am checking in is a bunch of English -- for which I will likely need a law degree in addition to an engineering background to not get evil genie with a cursed paw results from it trying to figure out what I must have meant from what I said :/ -- I will think that's pretty cool and will actually be a new layer of abstraction in the same class as compiler... and like, if at that point I don't use it, it will only be because I think it is somehow dangerous to humanity itself (and even then I will admit that it is probably more effective)... but we aren't there yet and "we're on the way there" doesn't count anywhere near as much as people often want it to ;P.
The most underrated thing I do on nearly every cursor suggestion is to follow up with “are there any better ways to do this?”.
A deeper version of the same idea is to ask a second model to check the first model’s answers. aider’s “architect” is an automated version of this approach.
https://aider.chat/docs/usage/modes.html#architect-mode-and-...
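The "second model checks the first" idea can be sketched as a small loop. This is a generic illustration and not how aider implements its architect mode; `generate` and `review` are placeholders for whatever model calls you use, and the stubs below are made up so the loop can run.

```python
def generate_with_review(task, generate, review, max_rounds=3):
    """Ask one model for a draft, let another critique it, and retry on feedback."""
    draft = generate(task, None)
    for _ in range(max_rounds):
        feedback = review(task, draft)
        if feedback is None:  # reviewer has no objections
            return draft
        draft = generate(task, feedback)
    return draft  # best effort after max_rounds


# Stub "models" for illustration only.
def stub_generate(task, feedback):
    return "return max(arr)" if feedback else "for loop over arr"


def stub_review(task, draft):
    return None if "max(" in draft else "are there any better ways to do this?"


print(generate_with_review("find the max element", stub_generate, stub_review))
```

Capping the rounds matters: two models can disagree indefinitely, and you still want a human to look at whatever comes out.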
I always ask it to "analyze approaches to achieve X and then make a suggestion, no code" in the chat. Then comes a refinement step where I give feedback on the generated code. I also always try to give it an "out" between making changes and keeping it the same, to stave off the bias toward action.
Yeah, the "analyze and explain but no code yet" approach works well. Lets me audit its approach beforehand.
I used to know things. Then they made Google, and I just looked things up. But at least I could still do things. Now we have AI, and I just ask it to do things for me. Now I don't know anything and I can't do anything.
Programmers (and adjacent positions) of late strike me as remarkably shortsighted and myopic.
Cheering for remote work leading to loads of new positions being offered overseas as opposed to domestically, and now loudly celebrating LLMs writing "boilerplate" for them.
How folks don't see the consequences of their actions is remarkable to me.
In both cases, you get what you pay for.
I feel like I've seen this comment so many times but actually genuine. The cult like dedication is kind of baffling.
I think that example says more about the company that chose to put that code as a demo in their homepage.
LLMs also love to double down on solutions that don't work.
Case in point, I'm working on a game that's essentially a website right now. Since I'm very very bad with web design I'm using an LLM.
It's perfect 75% of the time. The other 25% it just doesn't work. Multiple LLMs will misunderstand basic tasks. They'll add properties and invent functions.
It's like you had hired a college junior who insists they're never wrong and keeps pushing non-functional code.
The entire mindset is "whatever, it's close enough, good luck."
God forbid you need to do anything using an uncommon node module or anything like that.
> LLMs also love to double down on solutions that don't work.
“Often wrong but never in doubt” is not proprietary to LLMs. It’s off-putting and we want them to be correct and to have humility when they’re wrong. But we should remember LLMs are trained on work created by people, and many of those people have built successful careers being exceedingly confident in solutions that don’t work.
The issue is LLMs never say "I don't know how to do this" when it comes to programming. Tell me you don't know so I can do something else. I ended up just refactoring my UX to work around it. In this case it's a personal prototype so it's not a big deal.
That is definitely an issue with many LLMs. I've had limited success including instructions like "Don't invent facts" in the system prompt and more success saying "that was not correct. Please answer again and check to ensure your code works before giving it to me" within the context of chats. More success still comes from requesting second opinions from a different model -- e.g. asking Claude's opinion of Qwen's solution.
To the other point, not admitting to gaps in knowledge or experience is also something that people do all the time. "I copied & pasted that from the top answer in Stack Overflow so it must be correct!" is a direct analog.
So now you have an overconfident human using an overconfident tool, both of which will end up coding themselves into a corner? Compilers at least, for the most part, offer very definitive feedback that acts as a guard rail for those overconfident humans.
Also, let's not forget LLMs are a product of the internet and anonymity. Human interaction on the internet is significantly different from in person interaction, where typically people are more humble and less overconfident. If someone at my office acted like some overconfident SO/reddit/HN users I would probably avoid them like the plague.
A compiler in the mix is very helpful. That and other sanity checks wielded by a skilled engineer doing code reviews can provide valuable feedback to other developers and to LLMs. The knowledgeable human in the loop makes the coding process and final products so much better. Two LLMs with tool usage capabilities reviewing the code isn't as good today but is available today.
The LLM's overconfidence is based on it spitting out the most-probable tokens based on its training data and your prompt. When LLMs learn real hubris from actual anonymous internet jackholes, we will have made significant progress toward AGI.
> people who blindly “autocomplete” this code are going to stall in their skill level progress
AI is just going to widen the skill level bell curve. Enables some people to get away with far more mediocre work than before, but also enables some people to become far more capable. You can't make someone put in more effort, but the ones who do will really shine.
Anybody care to comment whether the quality of the existing code influences how good the AI's assistance is? In other words, would they suggest sloppy code where the existing code is sloppy and better (?) code when the existing code is good?
What do you think? (I don't mean that in a snarky way.) Based on how LLMs work, I can't see how that would not be the case.
But in my experience there are nuances to this. It's less about "good" vs "bad"/"sloppy" code and more about discernable. If it's discernably sloppy (i.e. the type of sloppy a beginning programmer might do which is familiar to all of us) I would say that's better than opaque "good" code (good really only meaning functional).
These things predict tokens. So when you use them, help them increase their chances of predicting the thing you want. Good comments on code, good function names, explain what you don't know, etc. etc. The same things you would ideally do if working with another person on a codebase.
Reminds me of the 2000s outsourcing hype. I made a lot of money cleaning up that mess. Entire projects late, buggy, unreadable and unmaintainable.
Businesses pay big when they need to recover from that kind of thing and save face with investors.
As a cybersecurity professional (as in, the more cybersecurity problems there are, the less likely I am to ever find myself out of a job), I'm rooting for AI!
Keep in mind that this is the stupidest the LLM will ever be and we can expect major improvements every few months. On the other hand junior devs will always be junior devs. At some point python and C++ will be like assembly now, something that’s always out there but not something the vast majority of developers will ever need to read or write.
My experience observing commercial LLMs since the release of GPT-4 is actually the opposite of this.
Sure, they've gotten much cheaper on a per-token basis, but that cost reduction has come with a non-trivial accuracy/reliability cost.
The problem is, tokens that are 10x cheaper are still useless if what they say is straight up wrong.
> Sure, they've gotten much cheaper on a per-token basis, but that cost reduction has come with a non-trivial accuracy/reliability cost.
This only holds for OpenAI.
> Keep in mind that this is the stupidest the LLM will ever be and we can expect major improvements every few months.
We have seen no noticeable improvements (at usable prices) in the 7 months since the original Sonnet 3.5 came out.
Maybe specialized hardware for LLM inference will improve so rapidly that o1 (full) will be quick and cheap enough a year from now, but it seems extremely unlikely. For the end user, the top models hadn't gotten cheaper for more than a year until the release of Deepseek v3 a few weeks ago. Even that is currently very slow at non-Deepseek providers, and who knows just how subsidized the pricing and speed at Deepseek itself are, given political interests.
No major AI advancements for 7 months? Guess everyone's jobs are safe for another year, and after that we're all dead?
> No major AI advancements for 7 months?
For my caveat "at usable prices", no, there haven't been any. o1 (full) and now o3 have been advancements, but are hardly available for real-world use given limitations and pricing.
> we can expect major improvements every few months.
I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5. I do believe things will improve over time, mainly due to advancements in hardware. With better hardware, we can better brute force correct answers.
> junior devs will always be junior devs
Junior developers turn into senior developers over time.
> I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5.
Progress by Google, Meta, Microsoft, Qwen and Deepseek is unhampered by OpenAI’s schedule. Their latest models (including Gemini 2.0, Llama 3.3 and Phi 4) and the coding fine-tunes that follow are all pretty good.
> unhampered by OpenAI’s schedule
Sure, but if the advancements are to catch up to OpenAI, then major improvements by other vendors are nice and all, but I don't believe that was what the commenter was implying. Right now the leaders in my opinion are OpenAI and Anthropic and unless they are making major improvements every few months, the industry as a whole is not making major improvements.
OpenAI and Anthropic are definitely among the leaders. Playing catch-up to these leaders' mind-share and technology is some of the motivation for others. Calling the progress being made in the space by Google (Gemini), MSFT (Phi), Meta (Llama), Alibaba (Qwen) "nice and all" is a position you might be pleasantly surprised to reconsider if this technology interests you. And don't sleep on Apple and AMZ.
In the space covered by Tabby, Copilot, aider, Continue and others, capabilities continue to improve considerably month-over-month.
In the segments of the industry I care most about, I agree 100% with what the commenter said w/r/t expecting major improvements every few months. Pay even passing attention to huggingface and github and see work being done by indies as well as corporate behemoths happening at breakneck pace. Some work is pushing the SOTA. Some is making the SOTA more widely available. Lots of it is different approaches to solving similar challenges. Most of it benefits consumers and creators looking to use and learn from all of this.
I wish this were true. As a shitty programmer who is old, I would benefit from this as much as anyone here, but I think it is delusional.
From my experience I wouldn't even say LLMs are stupid. The LLM is a carrier and the intelligence is in the training data. Unfortunately, the training data is not going to get smarter.
If any of this had anything to do with reality then we should already have a programming-specific model, trained only on CS and math textbooks, that is awesome. Of course, that doesn't work, because the LLM is not abstracting concepts in the way we normally assume something must in order to be called stupid or intelligent.
It is hardly shocking that next token prediction on math and CS textbooks is of limited use. You hardly have to think about it to see how flawed the whole idea is.
GitHub Copilot came out in 2021.
> I am terrified of what is to come.
Don't worry. Like everything else in life, you get what you pay for.
The silver lining is that the value of your skills is going up.
> The suggested code works but is absolute junior level
This isn't far from the current status quo. Good software companies pay for people who write top quality code, and the rest pay juniors to work far above their pay grade or offshore it to the cheapest bidder. Now it will be offloaded to LLMs instead. Same code, different writer, same work for a contractor who knows what they're doing to come and fix later.
And so the cycle continues.
I mean you can treat it as just a general pseudocode-ish implementation of an O(n) find_max algorithm. Tons of people use Python to illustrate algorithms.
(Not to hide your point though -- people please review your LLM-generated code!)
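For reference, a minimal sketch of the two versions being compared in this thread. The function names are stand-ins (the six lines from the homepage demo are not reproduced here); both variants are O(n).

```python
def find_max_element(arr):
    """Explicit O(n) scan, the textbook form of the algorithm."""
    if not arr:
        raise ValueError("arr must not be empty")
    largest = arr[0]
    for value in arr[1:]:
        if value > largest:
            largest = value
    return largest


def find_max_builtin(arr):
    """Same result via the built-in; max() also raises ValueError on empty input."""
    return max(arr)
```

The explicit loop is useful for illustrating the algorithm; in production Python the built-in is the idiomatic choice.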
Never imagined our project would make it to the HN front page on Sunday!
Tabby has undergone significant development since its launch two years ago [0]. It is now a comprehensive AI developer platform featuring code completion and a codebase chat, with a team [1] / enterprise focus (SSO, Access Control, User Authentication).
Tabby's adopters [2][3] have discovered that Tabby is the only platform providing a fully self-service onboarding experience as an on-prem offering. It also delivers performance that rivals other options in the market. If you're curious, I encourage you to give it a try!
[0]: https://www.tabbyml.com
[1]: https://demo.tabbyml.com/search/how-to-add-an-embedding-api-...
[2]: https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ
[3]: https://www.linkedin.com/posts/kelvinmu_last-week-i-introduc...
Do you have a plugin for MSVC?
Not yet; consider subscribing to https://github.com/TabbyML/tabby/issues/322 for future updates!
https://github.com/codespin-ai/codespin-vscode-extension
Is it only compatible with Nvidia and Apple? Will this work with an AMD GPU?
Yes - AMD GPUs are supported through the Vulkan backend:
https://github.com/TabbyML/tabby/releases/tag/v0.23.0
https://tabby.tabbyml.com/blog/2024/05/01/vulkan-support/
As someone unfamiliar with local AIs and eager to try, how does the “run tabby in 1 minute”[1] compare to e.g. chatgpt’s free 4o-mini? Can I run that docker command on a medium specced macbook pro and have an AI that is comparably fast and capable? Or are we not there (yet)?
Edit: looks like there is a separate page with instructions for macbooks[2] that has more context.
> The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA or ROCm.
[1]: https://github.com/TabbyML/tabby#run-tabby-in-1-minute
[2]: https://tabby.tabbyml.com/docs/quick-start/installation/appl...
gpt-4o-mini might not be the best point of reference for what good LLMs can do with code: https://aider.chat/docs/leaderboards/#aider-polyglot-benchma...
A teeny tiny model such as a 1.5B model is really dumb, and not good at interactively generating code in a conversational way, but models in the 3B or less size can do a good job of suggesting tab completions.
There are larger "open" models (in the 32B - 70B range) that you can run locally that should be much, much better than gpt-4o-mini at just about everything, including writing code. For a few examples, llama3.3-70b-instruct and qwen2.5-coder-32b-instruct are pretty good. If you're really pressed for RAM, qwen2.5-coder-7b-instruct or codegemma-7b-it might be okay for some simple things.
> medium specced macbook pro
medium specced doesn't mean much. How much RAM do you have? Each "B" (billion) of parameters is going to require about 1GB of RAM, as a rule of thumb. (500MB for really heavily quantized models, 2GB for un-quantized models... but, 8-bit quants use 1GB, and that's usually fine.)
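The rule of thumb above can be turned into a quick back-of-the-envelope calculator. This is only a sketch: the precision labels ("4bit", "8bit", "fp16") are my assumed mapping onto "heavily quantized", "8-bit quant" and "un-quantized", and the result covers the weights only, not the context.

```python
# Approximate GB of RAM per billion parameters, following the rule of thumb above.
# The precision labels are an assumed mapping, not something stated in the comment.
GB_PER_BILLION_PARAMS = {
    "4bit": 0.5,  # heavily quantized: ~500MB per billion parameters
    "8bit": 1.0,  # 8-bit quant: ~1GB per billion parameters
    "fp16": 2.0,  # un-quantized 16-bit weights: ~2GB per billion parameters
}


def estimate_weights_gb(params_billion, precision="8bit"):
    """Rough size of the model weights in GB (weights only, no context)."""
    return params_billion * GB_PER_BILLION_PARAMS[precision]


if __name__ == "__main__":
    for size in (7, 32, 70):
        row = ", ".join(
            f"{p}: ~{estimate_weights_gb(size, p):.1f}GB" for p in GB_PER_BILLION_PARAMS
        )
        print(f"{size}B -> {row}")
```

By this estimate a 32B model at 8-bit needs roughly 32GB for weights alone, which is why the amount of RAM matters more than the "medium specced" label.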
Also, context size significantly impacts RAM/VRAM usage, and in programming those chats get big quickly.
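To put a rough number on the context cost: for a standard transformer, the key/value cache grows linearly with the number of tokens. A minimal sketch follows; the layer count, KV-head count and head dimension are hypothetical values chosen for illustration, not the specs of any model named in this thread.

```python
def kv_cache_bytes(context_tokens,
                   n_layers=32,        # hypothetical model depth
                   n_kv_heads=8,       # hypothetical number of key/value heads
                   head_dim=128,       # hypothetical dimension per head
                   bytes_per_value=2): # 16-bit cache entries
    """Memory for keys and values across all layers, for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    for tokens in (2048, 8192, 32768):
        print(f"{tokens} tokens -> {kv_cache_bytes(tokens) / 2**30:.2f} GiB")
```

With these assumed dimensions each token costs 128 KiB, so an 8192-token context adds about 1 GiB on top of the weights.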
Thanks for your explanation! Very helpful!
Side question: open source models tend to be less "smart" than private ones. Do you intend to compensate by providing better context (e.g. querying relevant technology docs to feed the context)?
> Toggle IDE / Extensions telemetry
Cannot be turned off in the Community Edition. What does this telemetry data contain?
For something similar I use Continue.dev with ollama, it’s always nice to see more tools in the space! But as usual, you need pretty formidable hardware to run the actually good models, like the 32B version of Qwen2.5-coder.
All the examples are for code that would otherwise be found in a library. Some of the code is of dubious quality.
LLMs - a spam bot for your codebase?
> How to utilize multiple NVIDIA GPUs?
| Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm) accordingly.
So using 2 NVLinked GPUs with inference is not supported? Or is that situation different because NVLink treats the two GPUs as a single one?
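The "one instance per GPU" workaround quoted above boils down to giving each process a different value for the visibility variable. A minimal sketch of just that part follows; the helper name is hypothetical, and the actual Tabby launch command is deliberately left out since its flags are not given in this thread.

```python
import os

# Environment variable names as quoted from the Tabby FAQ above.
VISIBILITY_VAR = {"cuda": "CUDA_VISIBLE_DEVICES", "rocm": "HIP_VISIBLE_DEVICES"}


def per_gpu_environments(gpu_ids, backend="cuda"):
    """Return one environment dict per GPU, each exposing only that GPU.

    Hypothetical helper for illustration; not part of Tabby.
    """
    var = VISIBILITY_VAR[backend]
    envs = []
    for gpu_id in gpu_ids:
        env = dict(os.environ)   # copy, so the parent environment is untouched
        env[var] = str(gpu_id)
        envs.append(env)
    return envs
```

Each dict can then be passed as `env=` to `subprocess.Popen` together with your own launch command, one process per GPU. Since these are independent servers, each instance would also need its own port, and something in front of them (a load balancer or per-user configuration) to spread requests.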
> So using 2 NVLinked GPUs with inference is not supported?
To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example
I see. So I can either have Tabby be my LLM server with this limitation, or turn that feature off and point Tabby at my self-hosted LLM as any other OpenAI-compatible endpoint?
Yes - however, the FIM model requires careful configuration to properly set the prompt template.
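For anyone unfamiliar with why the template matters: a fill-in-the-middle (FIM) model expects the code before and after the cursor to be wrapped in model-specific special tokens, and the wrong tokens give garbage completions. A minimal sketch using the StarCoder-style tokens follows; other model families use different tokens, and how the template is wired into Tabby's configuration is not shown here.

```python
# StarCoder-style FIM template. Other model families use different special
# tokens, so the template must match the model being served.
STARCODER_FIM_TEMPLATE = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


def build_fim_prompt(prefix, suffix, template=STARCODER_FIM_TEMPLATE):
    """Wrap the code before/after the cursor in the model's FIM tokens."""
    return template.format(prefix=prefix, suffix=suffix)


if __name__ == "__main__":
    before_cursor = "def add(a, b):\n    return "
    after_cursor = "\n"
    print(build_fim_prompt(before_cursor, after_cursor))
```

The model is then expected to generate the missing middle (here, something like `a + b`) after the final token.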
Awesome project! I love the idea of not sending my data to a big company and trusting their TOS.
The effectiveness of a coding assistant is directly proportional to context length, and the open models you can run on your computer are usually much smaller. Would love to see something more quantified around the usefulness on more complex codebases.
I hope for a proliferation of 100% local coding assistants, but for now the recommendation of "Works best on $10K+ GPU" is a show stopper, and we are forced to use the "big company". :(
It’s not really that bad. You can run some fairly big models on an Apple Silicon machine costing £2k (M4 Pro Mac Mini with 64GB RAM).
What is the recommended hardware? GPU required? Could this run OK on an older Ryzen APU (Zen 3 with Vega 7 graphics)?
The usual bottleneck for self-hosted LLMs is memory bandwidth. It doesn't really matter if there are integrated graphics or not... the models will run at the same (very slow) speed on CPU-only. Macs are only decent for LLMs because Apple has given Apple Silicon unusually high memory bandwidth, but they're still nowhere near as fast as a high-end GPU with extremely fast VRAM.
For extremely tiny models like you would use for tab completion, even an old AMD CPU is probably going to do okay.
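The bandwidth argument can be made concrete with a rough upper bound: generating one token requires reading (approximately) all of the weights once, so tokens per second cannot exceed memory bandwidth divided by model size. A sketch, with illustrative round bandwidth figures rather than measurements of any specific hardware:

```python
def max_tokens_per_second(model_size_gb, bandwidth_gb_per_s):
    """Upper bound on generation speed when memory bandwidth is the bottleneck.

    Assumes every generated token needs one full pass over the weights;
    real throughput will be lower.
    """
    return bandwidth_gb_per_s / model_size_gb


if __name__ == "__main__":
    # Bandwidth values are illustrative assumptions, not benchmarks.
    for label, bandwidth in (("desktop system RAM", 50), ("high-end GPU VRAM", 1000)):
        for size_gb in (1.5, 8, 32):
            bound = max_tokens_per_second(size_gb, bandwidth)
            print(f"{label}: {size_gb}GB model -> at most ~{bound:.0f} tokens/s")
```

This is also why tiny completion models stay usable on CPU: the bound scales inversely with model size.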
Good to know. It also looks like you can host TabbyML as an on-premise server with docker and serve requests over a private network. Interesting to think that a self-hosted GPU server might become a thing.
Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.
That thread doesn't seem to mention hardware. It would be really helpful to just put hardware requirements in the GitHub README.
Very cool. I'm especially happy to see that there is an Eclipse client[1]. One note though: I had to dig around a bit to find the info about the Eclipse client. It's not mentioned in the main readme, or in the list of IDE extensions in the docs. Not sure if that's an oversight or because it's not "ready for prime time" yet or what.
[1]: https://github.com/TabbyML/tabby/tree/3bd73a8c59a1c21312e812...
I’ve been using something similar called Twinny. It’s a VS Code extension that connects to a locally hosted Ollama LLM of your choice and works like Copilot.
It’s an extra step to install Ollama, so not as plug-and-play as tfa, but the license is MIT, which makes it worthwhile for me.
https://github.com/twinnydotdev/twinny
So does this run on your personal machine, or can you install it on a local company server and have everyone in the company connect to it?
Tabby is engineered for team usage, intended to be deployed on a shared server. However, with robust local computing resources, you can also run Tabby on your individual machine. Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.
I'm currently investigating a self hosted AI solution for my workplace.
I was wondering, how does this company make money?
From the pricing there is a free/community/opensource option, but how is the "up to 5 users" monitored?
https://www.tabbyml.com/pricing
* Up to 5 users
* Local deployment
* Code Completion, Answer Engine, In-line chat & Context Provider
What if we have more than 5 users?
Are you asking on a public forum, on how to get around using a product for a commercial setting by using the non-commercial version of the product?
I'm saying I don't understand their open source model. I thought open source meant you could use and modify code and run it yourself without having to pay for a license, i.e. completely independent of the maintainer. So I was confused by this limit on how many people are allowed to use something you are running yourself.
You have it wrong. Neither "open source" nor "free software" implies free-of-cost. Paid software with an open and free license is very much a thing.
If you want to drill into the details of the licenses: https://github.com/TabbyML/tabby/blob/main/LICENSE
Didn’t you mean to name it Spacey?
I've heard of tab vs spaces flamewars, but never heard of "space completion" camp.
It clearly refers to an LLM doing tab completion.
WHAT? TAB COMPLETION? YOU CAN'T BE SERIOUS.
Just joking. But yeah, Space completion is definitely a thing. Also triggering suggestions is often Ctrl+Space
"Ctrl+space" should be air/space related software house name :)
All these things that claim to be an alternative to GitHub Copilot, none of them seem to work in VS2022... So how is it really an alternative?
All I want is a self-hosted AI assistant for VS2022. VS2022 supports plugins yes, so what gives?
Not using VSCode, would be great to have Sublime Text or Zed support.
I will go out on a limb and predict that in the next 10 years AI code assistants will be forbidden :)
How would I tell this to use an API framework it doesn’t know?
Tabby comes with built-in RAG support, so you can add this API framework to it.
Example: https://demo.tabbyml.com/search/how-to-configure-sso-in-tabb...
Settings page: https://demo.tabbyml.com/settings/providers/doc
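For readers new to the term, RAG (retrieval-augmented generation) means looking up relevant documentation first and putting it into the prompt, so the model can use an API it was never trained on. The sketch below shows only the general idea; it is not Tabby's implementation, it uses naive word overlap where real systems typically use embeddings or a search index, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for a real embedding or index."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return the top_k documents sharing the most words with the question."""
    query = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(query & tokenize(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents, top_k=2):
    """Prepend the retrieved documentation snippets to the question."""
    context = "\n\n".join(retrieve(question, documents, top_k))
    return f"Use the documentation below to answer.\n\n{context}\n\nQuestion: {question}"
```

The resulting prompt is what gets sent to the model; the quality of the answer then depends mostly on whether the right snippets were retrieved.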
fyi the pricing page has a typo for "Singel Sign-On"
Appreciated! Fixed
Maybe a good product, but a terrible company to interview with. I went through several rounds and was basically ghosted after the 4th with no explanation or follow-up. The last interview was to write a blog post for their blog, which I submitted and then didn’t hear back about until I continuously nagged them months later. It was pretty disheartening, since all of the interviews were some form of a take-home and I spent a combined total of ~10 hours or more.
Such interview processes are big red flags. The company can't afford taking a risk with you and at the same time tests how desperate you are by making you work for free. They are likely short on cash and short on experience. Expect crunch and bad management. Run.
> The last interview was to write a blog post for their blog
Were you applying as a software dev? Because that's not a software (or an interview) assignment.
Yes I was applying for software engineer. I think they wanted engineers who were good at explaining the product to users.
Sure. Writing and a good command of the language is important. There are multiple ways to showcase that. Writing a blog post for their blog is not one of them.
I was willing to jump through hoops—I really wanted the job.
Did the blog post get published on their blog?
IMHO companies should aim for courteous interviews, with faster decisions, and if there's any take home work then it's fully paid. I've seen your work at Beaver.digital and on GetFractals.com. If you're still looking, feel free to contact me; I'm hiring for a startup doing AI/ML data analysis. Specifically Figma + DaisyUI + TypeScript + Python + Pandas + AWS + Postgres.
Did their engineers spend time with you or did they get their blog post otherwise? I once made 1 minute videos for interview process of an AI training data company. I have a hunch they were just harvesting the data.
They did get the blog post but I don’t believe they used it; it’s possible that they didn’t think it was well written and that’s why I was ghosted but I will never know. I know they were interviewing many very talented people for the position. It’s okay to be disorganized as a startup, but I think that keeping people happy, employee or otherwise, should always be the top priority. It would have taken just a few seconds to write an email to me to reject me, and by not doing so, this comment has probably evolved into a big nightmare for them. I didn’t expect it to get this much attention, but yeah; I guess my general sentiment is shared by many.
Were you at least paid?
you know that paid interview processes are not the norm, "at least" is unlikely
If I was paid, I probably wouldn't be complaining publicly. :-) It's probably better for both interests if these types of engagements are paid.
I've worked with paid take home tests for a while, but stopped again. Hiring managers started to make the assignments more convoluted, i.e. stopped respecting the candidate's time. Candidates, on the flip side, always said they don't want to bother with the bureaucracy of writing an invoice and reporting it for their taxes etc., so didn't want to be paid.
Now my logic is: If a take home test is designed to take more than two hours, we need to redesign it. Two hours of interviews, two hours of take home test, that ought to suffice.
If we're still unsure after that, I sometimes offered the candidate a time limited freelance position, paid obviously. We've ended up hiring everyone who went into that process though.
I just finished interviewing with a company called Infisical. The take-homes were crazy (the kind of thing that normally takes a few days or a week). I was paid but it took me 12 hours.
Hope they paid for the work.
Did they post the blog publicly?
your first mistake was doing any kind of take-home exercise at all.
Duplicated https://news.ycombinator.com/item?id=35470915
Not a dupe, as that was nearly two years ago. https://news.ycombinator.com/newsfaq.html#reposts
In that case I'm going to start reposting all good old links.
Unfortunate name. Can you connect Tabby to the OpenAI-compatible TabbyAPI? https://github.com/theroyallab/tabbyAPI
I thought that Tabby, the ssh client [1], got AI capabilities...
[1] https://github.com/Eugeny/tabby
At least per Github, the TabbyML project is older than the TabbyAPI project.
Also, wildly more popular, to the tune of several orders of magnitude more forks and stars. If anything, this question should be asked of the TabbyAPI project.
I'm not sure what's going on with TabbyAPI's GitHub metrics, but exl2 quants are very popular among the NVIDIA local LLM crowd, and TabbyAPI comes up in tons of Reddit posts from people using it. Might be just my bubble; I'm not saying the metrics are inaccurate, just generally surprised such a useful project has under 1k stars. On the flip side, LLMs will hallucinate about TabbyML if you ask them TabbyAPI-related questions, so I'd agree the naming is unfortunate.