Hacker News

> It's worth pointing out that on their eval set for "issues resolved" they are getting 13.86%. While visually this looks impressive compared to the others, anything that only really works 13.86% of the time, when the verification of the work takes nearly as much time as the work would have anyway, isn't useful.

Yeah, I remember speech recognition taking decades to improve, and being more of a novelty - not useful at all - even when it was at 95% accuracy (1 word in 20 wrong). It really had to get almost perfect before it was a time saver.

As far as coding goes, it'd be faster to write it yourself and get it right the first time than to have an LLM write something you can't trust and still have to check yourself.



You can't compare the accuracy of speech recognition to LLM task completion rates. A nearly-there yet incomplete solution to a Github issue is still valuable to an engineer who knows how to debug it.


Sure, and no doubt people paying for speech recognition 25 years ago were finding uses for it too. It depends on your use case.

A 13% success rate is both wildly impressive and also WAY below the level where I would personally find something like this useful. I can't even see reaching for a tool that I knew would fail 90% of the time, unless I was desperate and out of ideas.


I disagree. I think about this a bit as having a developer intern, on whom I can't rely to take much of a workload, and definitely nothing on the critical path, but I could say to them "Take a look at these particular well-defined tasks on the backlog and see which ones you could make some progress on" - I feel there's good value in that.

And the nice thing about an AI here is that I think it will actually find a different subset of these tasks to be easy than a human would.


Yeah, but a developer intern already has human-level AGI to support the on-the-job developer training you're going to help give them. Any LLM available today, or probably in the next 5-10 years for that matter, has neither AGI nor the ability to learn on the job.

My experience of working with interns, or low-skill developers, is that the benefit normally flows one way. You are taking time out from completing the project to help them learn. Someone/something of low capability isn't going to be relieving you of the large or complex tasks that would actually be useful, and be a time saver - they are going to try to do the small/simple tasks you could have breezed through, and suck up a lot of your time having to find out and explain to them how they messed up. Of course Devin doesn't even have online learning, so he'd be making the same mistakes over and over.


> A nearly-there yet incomplete solution to a Github issue is still valuable to an engineer who knows how to debug it.

Not sure I can agree. There would definitely be value in looking at what libraries the solution uses, but otherwise it may be easier to write it oneself, especially when the mistakes are not humanlike.


I can see this being useful already (assuming context length is not an issue) as some sort of GitHub service trying to solve issues throughout the day.

Or, for example, if you commit TODOs in your code, the AI will pick up on them and give you some options later on.

If the success rate is 14%, just let it try a bunch of times. (half joking here)
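Half joking or not, the retry math is worth a look: if each attempt succeeds independently with probability around 0.14 (the "issues resolved" rate quoted above), the chance of at least one success climbs quickly with retries. A quick sketch, assuming attempts are independent - which is a big assumption, since a model tends to fail the same issues the same way:

```python
def p_any_success(p: float, n: int) -> float:
    """Probability of at least one success in n independent attempts,
    each succeeding with probability p."""
    return 1 - (1 - p) ** n

p = 0.1386  # per-attempt "issues resolved" rate from the eval
for n in (1, 5, 10, 20):
    print(f"{n:>2} attempts: {p_any_success(p, n):.1%}")
```

Under the independence assumption, roughly ten attempts would push the odds of at least one success above 75% - though in practice correlated failures would make the real number much worse.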

The way I see it, at least the project issues are getting some attention, which is arguably better than no attention. If it can fix just the simple things, at least you can focus on the complex things and not worry about postponing the low-hanging fruit.


What about "You're Holding It Wrong"?

I.e.: maybe rather than throwing it at finding problems, also use it to just build things from scratch. As an example:

The side hustle: make that a product:

Let someone like me who is not a coder have access to Devin for a month with the only goal of building a side hustle that brings a solo person a monthly income.

Then sell that, so that the millions of people who have a solo idea and just need that "technical co-founder" can use it to build. And limit it to one Devin instance per person to start...

I don't want it to do it in one fell swoop - I'd like to say "build this module...

--

Have a contest: what can you build with Devin in TOPIC in 30 minutes?


I could perhaps see more value, at this level of capability, in writing test cases, where the project is set up in a way that lets them be run and give feedback.

This would be useful where test coverage is incomplete, maybe for auto-discovering or confirming bugs, and would really be needed if it's trying to fix bugs itself. Especially if one dared let it commit bug fixes, you'd want to know that the fix worked and didn't break anything else (i.e. run the other test cases too, as a regression test).


Even now, automatic speech recognition is a big timesaver, but you _need_ a human to look through the transcript to pick out the obviously wrong stuff, let alone the stuff that's wrong but could be right in context.



