This is the prompt describing the function call parameters:
When calling the automation, you need to provide three main parameters:
1. Title (title):
A brief descriptive name for the automation. This helps identify it at a glance. For example, "Check for recent news headlines".
2. Prompt (prompt):
The detailed instruction or request you want the automation to follow. For example:
"Search for the top 10 headlines from multiple sources, ensuring they are published within the last 48 hours, and provide a summary of any recent Russian military strikes in the Lviv Oblast."
3. Schedule (schedule):
This uses the iCalendar (iCal) VEVENT format to specify when the automation should run. For example, if you want it to run every day at 8:30 AM, you might provide:
BEGIN:VEVENT
RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
END:VEVENT
In summary, the call typically includes:
• title (string): A short name.
• prompt (string): What you want the automation to do.
• schedule (string): The iCal VEVENT defining when it should run.
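For anyone who wants to poke at this, here is a rough Python sketch of what such a call payload could look like. The three field names come from the prompt text above; the helper names `parse_rrule` and `build_automation_call` are made up for illustration, and the real function signature isn't known. It only uses the standard library and does a naive split of the RRULE line, not full iCalendar parsing.

```python
def parse_rrule(vevent: str) -> dict[str, str]:
    """Pull the RRULE parts out of a VEVENT string (naive, not full RFC 5545)."""
    for line in vevent.replace("\r\n", "\n").split("\n"):
        line = line.strip()
        if line.upper().startswith("RRULE:"):
            parts = line[len("RRULE:"):].split(";")
            return dict(part.split("=", 1) for part in parts if "=" in part)
    raise ValueError("no RRULE line found in VEVENT")


def build_automation_call(title: str, prompt: str, schedule: str) -> dict[str, str]:
    """Hypothetical helper: bundle the three parameters described above."""
    parse_rrule(schedule)  # fail early if the schedule has no RRULE
    return {"title": title, "prompt": prompt, "schedule": schedule}


# Daily at 8:30, as in the quoted example. Note the rule itself carries
# no time zone or start date.
SCHEDULE = (
    "BEGIN:VEVENT\n"
    "RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0\n"
    "END:VEVENT"
)

call = build_automation_call(
    title="Check for recent news headlines",
    prompt="Search for the top 10 headlines from multiple sources.",
    schedule=SCHEDULE,
)
rule = parse_rrule(call["schedule"])
print(rule["FREQ"], rule["BYHOUR"], rule["BYMINUTE"])
```

Spelling out the hour and minute in your request, so the model can fill in BYHOUR and BYMINUTE directly, seems to be the part that matters.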
The beta is inconsistently showing (required a few refreshes to get something to show up), but my limited usage of it showed a plethora of issues:
- Assumed UTC instead of EST. Corrected it and it still continued to bork
- Added random time deltas to my asked times (+2, -10 min).
- Couple notifications didn't go off at all
- The one that did go off didn't provide a push notification.
---
On top of that, only usable without search mode. In search mode, it was totally confused and gave me a Forbes article.
Seems half baked to me.
Doing scheduled research behind the scenes or sending a push notification to my phone would be cool, but surprised they thought this was OK for a public beta.
You'd think OpenAI's dev velocity and quality would be off the charts since they live and breathe "AI." If a company building ChatGPT itself often delivers buggy features then it doesn't bode well for this whole 'AI will eat the world' notion.
Well none of the labs have good frontend or mobile engineers or even infra engineers
Anthropic is ahead in this because they keep their UIs simplistic so the failure modes are also simple (bad connection)
OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).
Find it hilarious and sad that o1-pro just times out thinking on very long or image-intense chats. Need to reload page multiple times after it fails to reply and maybe answer will appear (or not? Or in 5 minutes?). Kinda shows they’re not testing enough and “not eating their own food” and feels like chatgpt 3.5 ui before the redesign
> Anthropic is ahead in this because they keep their UIs simplistic ... OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).
What's funny is that OpenAI's Canvas was their attempt to copy Anthropic's Artifacts! So it's not like Anthropic is stagnant and OpenAI is at least shipping, Anthropic is shipping and OpenAI can't even copy them right.
It's a good point, Anthropic is being VERY choosy and winds up knocking it out of the park with stuff like Artifacts. Meanwhile their macOS app is junk, but obviously not a priority.
> because they keep their UIs simplistic
How do I edit a sent message in the Claude Android app? It's so simplistic I can't find it.
You can’t edit on iOS either
So far, I've found AI to be a great force multiplier in greenfield, small projects. In a huge corporate codebase, it has the power of advanced refactoring (which doesn't touch more than a handful of files at a time) and a CSS wizard.
According to all the magazines I've been reading, all that is required is to just prompt it with "please fix all of these issues" and give it a bulleted list with a single sentence describing each issue. I mean, it's AI powered and therefore much better than overpaid prima-donna engineers, so obviously it should "just work" and all the problems will get fixed. I'm sure most of the bugs were the result of humans meddling in the AI's brilliant output.
Right now, in fact, my understanding is OpenAI is using their current LLMs to write the next generation ones which will far surpass anything a developer can currently do. Obviously we'll need to keep management around to tell these things what to do, but the days of being a paid software engineer are numbered.
I think you forgot the /s (sarcasm) in your post!
When I have it do a search I have to tell it to just get all the info it can in the search but wait for the next request. Then I explicitly tell it we’re done searching and to treat the next prompt as a new request but using the new info it found.
That’s the only way I get it to have a halfway decent brain after a web search. Something about that mode makes it more like a PR drone version of whatever I asked it to search, repeating things verbatim even when I ask for more specifics in follow-up.
Can you give an example prompt for this approach?
i posted the system prompt part describing the function call; if you read it and adjust your prompt for creating the task it works way better.
[deleted]
I'd rather have buggy things now than perfect things in a year.
Doesn't need to be perfect- but using this would actively reduce productivity
First impressions matter, if the experience is this bad you're probably waiting a year to come back anyway.
Worked out great for Sonos when their timers and alarms didn’t work.
Found the PM
DateTime stuff is generally super annoying to debug. Can't fault them too badly. Adding a scheduler is a key enabling idea for a ton of use cases
> Can't fault them too badly
The same company that touts their super hyper advanced AI tool that can do everyone's (except the C-level's, apparently) jobs to the world can't figure out how to make a functional cron job happen? And we're giving them a pass, despite the bajillions of dollars that M$ and VC is funneling their way?
Quite interesting they wouldn't just throw the "proven to be AGI cause it passes some IQ tests sometimes" tooling at it and be done with it.
it would explain the bugs if they used the AI to make the datetime implementation though
Agreed on date/time being a frustrating area of software development.
But wouldn't a company like OpenAI use a tick-based system in this architecture? i.e. there's an event emitter that ticks every second (or maybe minute), and consumers that operate based on these events in realtime? Obviously things get complicated due to the time consumed by inference models, but if OpenAI knows the task upfront it could make an allowance for the inference time?
If the logic is event driven and deterministic, it's easy to test and debug, right?
The original cron was programmed this way, but it has to examine every job every tick to check if it should run, which doesn't scale well. Instead, you predict when the next run for a job will be and insert that into an indexed schedule. Then each tick it checks the front of the schedule in ascending order of timestamps until the remaining jobs are in the future.
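A rough sketch of that indexed-schedule idea in Python, using a min-heap keyed on the next run time. The class and job names are made up for illustration, times are plain integer seconds, and this is not a claim about how cron or OpenAI actually implement it.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledJob:
    next_run: int                        # only field used for heap ordering
    name: str = field(compare=False)
    interval: int = field(compare=False)


class IndexedSchedule:
    """Keeps jobs ordered by next run so a tick only inspects due jobs."""

    def __init__(self) -> None:
        self._heap: list[ScheduledJob] = []

    def add(self, name: str, first_run: int, interval: int) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, ScheduledJob(first_run, name, interval))

    def tick(self, now: int) -> list[str]:
        """Return names of jobs due at or before `now`, rescheduling each."""
        due = []
        while self._heap and self._heap[0].next_run <= now:
            job = heapq.heappop(self._heap)
            due.append(job.name)
            # Predict the next run strictly in the future, skipping missed slots.
            while job.next_run <= now:
                job.next_run += job.interval
            heapq.heappush(self._heap, job)
        return due


schedule = IndexedSchedule()
schedule.add("a", first_run=10, interval=10)
schedule.add("b", first_run=25, interval=100)

early = schedule.tick(5)    # nothing due yet
first = schedule.tick(10)   # only "a" is due
later = schedule.tick(30)   # "a" (due at 20) and "b" (due at 25)
print(early, first, later)
```

Each tick stops at the first job whose timestamp is in the future, so the cost depends on how many jobs are due, not on how many exist.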
This is also a bad case in terms of queueing theory. Looking at Kingman's equation, the arrival variance is very high (a ton of jobs will run at 00:00 and far fewer at 00:01), and the service time also has pretty high variance. That combo will either require high queue delay variance, low utilization (i.e. over-provisioning), or a sophisticated auto-scaler that aggressively starts and stops instances to anticipate the schedule. Most of the time it's ok to let jobs queue since most use cases don't care if a daily or weekly job is 5 minutes late.
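To put rough numbers on that: Kingman's approximation for the mean queueing delay of a single-server queue is (ρ / (1 − ρ)) · ((c_a² + c_s²) / 2) · τ, where ρ is utilization, c_a and c_s are the coefficients of variation of inter-arrival and service times, and τ is the mean service time. Here is a small Python sketch; the input values are invented for illustration, not measurements of any real system.

```python
def kingman_wait(utilization: float, cv_arrival: float,
                 cv_service: float, mean_service_time: float) -> float:
    """Kingman's approximation of mean time spent waiting in queue (G/G/1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    load_term = utilization / (1 - utilization)
    variability_term = (cv_arrival ** 2 + cv_service ** 2) / 2
    return load_term * variability_term * mean_service_time


# Illustrative inputs only: 10 s mean service time.
smooth = kingman_wait(0.8, cv_arrival=1.0, cv_service=1.0, mean_service_time=10)
bursty = kingman_wait(0.8, cv_arrival=4.0, cv_service=2.0, mean_service_time=10)
overprovisioned = kingman_wait(0.5, cv_arrival=4.0, cv_service=2.0, mean_service_time=10)

# Roughly 40 s, 400 s, and 100 s respectively.
print(round(smooth), round(bursty), round(overprovisioned))
```

The bursty case waits about ten times longer at the same utilization, and dropping utilization to 50% only claws part of that back, which is the trade-off described above.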
Yeah, they're not exactly a scrappy startup- I'd be surprised if they had 0 QA.
Makes me wonder if they internally have "press releases / Q" as an internal metric to keep up the hype.
Maybe that's the Q* we've been hearing rumors about
Amazon had an insane number of people working on just the alarms feature in Alexa when they interviewed me for a position years ago. They had entire teams devoted to the tiniest edge case within the realm of scheduling things with Alexa. This is no doubt one of the biggest use cases in computing: getting your computer to tell you what to do at a given time.
Recurring schedules across time zones is an unbelievably maddening thing to implement. At first glance it seems simple, but it gets very weird very quickly.
this.
some people can't even wrap their heads around it, taking hours and hours of discussions. still my favourite problem though.
Yeah summer time in different countries switching on different days and often in a different direction (other hemisphere). I used to work on such matters and those weeks were the toughest.
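A quick Python illustration of the hemisphere point, using the standard-library `zoneinfo` module (it needs the system tz database). The helper name `local_run_to_utc` and the chosen dates are just for illustration: the same "8:30 every morning" lands on a different UTC time in January and July, and the shift goes in opposite directions for New York and Sydney.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_run_to_utc(year: int, month: int, day: int,
                     hour: int, minute: int, tz_name: str) -> datetime:
    """Convert a wall-clock run time in the given zone to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


# Same local schedule (08:30), two seasons, two hemispheres.
ny_jan = local_run_to_utc(2025, 1, 15, 8, 30, "America/New_York")
ny_jul = local_run_to_utc(2025, 7, 15, 8, 30, "America/New_York")
syd_jan = local_run_to_utc(2025, 1, 15, 8, 30, "Australia/Sydney")
syd_jul = local_run_to_utc(2025, 7, 15, 8, 30, "Australia/Sydney")

for label, run in [("NY Jan", ny_jan), ("NY Jul", ny_jul),
                   ("Sydney Jan", syd_jan), ("Sydney Jul", syd_jul)]:
    print(label, run.strftime("%Y-%m-%d %H:%M UTC"))
```

New York's run moves an hour earlier in UTC for the northern summer while Sydney's moves an hour later, so a scheduler that stores a fixed UTC time for a "local 8:30" job will be wrong for part of the year in both places.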
Developers when they first start working with time across timezones: "This is a technical problem."
Developers after more research: "Oh... this is a political problem."
[deleted]
Considering my iPhone alarm still sometimes fails to go off (it just shows the alarm screen silently), I'd be inclined to believe you.
Thanks for that. I thought I was going crazy (well still could be I guess) or had some strange habit or gesture I didn’t realize was silencing the alarm somehow.
Whenever I have to wake for something that I absolutely can’t miss, I set 2-3 extra reminders 5 minutes apart precisely because of this “silent alarm” bug. It’s only happened to me a couple of times but twice was enough to completely destroy my trust in the alarm. The first time I thought I just did something in my sleep to cause it, but the UI shows it as if the alarm worked. I’m lucky to have the privilege that if I oversleep an hour or so it’s no big deal, otherwise ye olde tabletop alarm clock would be back.
I love the questioning my sanity before I've completely opened my eyes part. It's like a jump start to my day.
Hah - I also just assumed that I was turning the alarm off in my sleep without noticing. I started doubting it and really wish there was a log of when you tapped snooze or stopped the alarm...
This is too much of a dev feature for apple to implement and there are probably third party apps that do this, but meh
OpenAI just needs to create & release their own phone with Microsoft's help! The phone from the movie Her.
Apple has not innovated in years and a GPT Phone where your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you would give Apple a run for its money! Pick up your phone & see your agent waiting to assist & it could be skinned to look like a deceased loved one (mom still guiding you through life).
To get things done it would interface with other AI Agents of businesses, companies, your doctor, friends & family to schedule things & used as a knowledgebase.
Maybe this is their step towards creating said agents?
> your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you
I just… don’t want this. I don’t think anyone I know wants this.
Cool, thanks for the comment!
I use ChatGPT now for almost everything, and in the car I have full back-and-forth conversations with it to get things done (or as a knowledge base). Recently I was discussing with it how to properly get rid of (junk) my old car in Pennsylvania. It provided all the steps and gave me local businesses, though it didn't call them or interface with them to find their available times and costs, tell me those details, and let me instruct it to schedule my preferred choice. I wish it did, and that prompted thoughts about how it could, since the tech that gets adopted is mostly tech that simplifies our lives.
I think my concept above is similar to what was seen in the movie Her (starring Joaquin Phoenix & Scarlett Johansson), so it's not that crazy or odd. Skinning it to look like whoever you want, such as a deceased loved one, probably is.
And gmail schedule delivery just won't work if you want to email yourself a month later.
I'm sure it's brilliant, but I have no idea what it's capable of. What will it do? Send me a push notification? Have an answer waiting for me when I come back to it in a while?
I switched over to the "GPT4o with scheduled tasks" model and there were no UI hints as to how I might use the feature. So I asked it "what you can you follow up later on and how?"
It replied "Could you clarify what specifically you’d like me to follow up on later?"
This is a truly awful way to launch a new product.
After asking it to schedule something, it prompted me to allow or block notifications, so sounds like this is just chatGPT scheduling push notifications? We'll see!
So basically cannibalizing Siri?
Siri has access to a wealth of private existing and future on-device APIs to fuel context sensitive responses to queries on vendor locked devices used all day long. (Which Apple has apparently decided to just not use yet.)
OpenAI doesn't, they just have a ton of funding and (up to recently) a good mass media story, and the best natural language responses.
The moat around Siri is much deeper, and I don't really see any evidence OpenAI has any special sauce that can't be reproduced by others.
My prediction is that OpenAI's reliance on AI doomerism to generate a regulatory moat falters as they become unable to produce step changes in new models, while Apple's efforts despite being halting and incomplete become ubiquitous thanks to market share and access to on device context.
I wouldn't (and don't) put my money in OpenAI anymore. I don't see a future for them beyond being the first mover in an "LLM as a service" space in which they have no moat. On top of that they've managed to absorb the worst of criticism as a sort of LLM lightning rod. Worst of all, it may turn out that off-device isn't even really necessary for most consumer applications in which case they'll start to have to make most of their money on corporate contracts.
Maybe something will change, but right now OpenAI is looking like a bubble company with no guarantee to its dominant position. Because it is what it is: simply the largest pooling of money to try to corner this market. What else do they have?
I think there is an argument that currently Google Gemini is best place to tie everything together. Assuming Google executes on it well.
Most people use Gmail, Docs, Google Maps, and Google Calendar over Apple's alternatives. Gemini could really tie them together well.
The counter argument is that Google doesn't maintain any of those services beyond the bare minimum for customer facing interactions, and exchanges between their services are even more poorly supported if they even exist at all.
Remember Google Sheets (already the Tonka Toys of spreadsheets) adding named tables to Sheets?
You can't use them in any of the AppsScript APIs. You have to fall back to manual searching for strings and index arithmetic.
Google Drive still barely supports anything like moving an entire folder to another folder.
They have failed at least a half dozen times now to deliver a functional chat/VOIP app after they already had one in Google Talk.
They regularly sunset products that actually have devoted and zealous user bases for indiscernible reasons.
Android is just chugging along doing nothing interesting and still carrying the same baggage it did before. It's a painful platform to develop for and the Jetpack Compose/Kotlin shift hasn't ameliorated much of that at all.
Their search offering is now worse than Bing, worse than Kagi, and worse than some of the LLM based tools that use their index. It's increasingly common that you can't even find a single link that you know an entire verbatim sentence from via Google search for inexplicable reasons. Exact keyword or phrase searches no longer work. You can't require a keyword in results.
I don't trust Google to deliver a single functional software product at this point, let alone a compelling integration of many different ones developed in different siloes.
About the only thing going for them is how many people still have Gmail accounts from that initial invite only and generous limits campaign... 20 years ago?
Google is not a healthy company. I don't invest in them anymore, and barring some major change I probably won't again. It's a dying blue chip which is a terrible position to have your money in.
P.S. oh, and Gemini is awful by comparison in both price and quality to competitors. It isn't saving them. It's just a "me too".
P.P.S. I'm personally just waiting for their next "game changing" announcement bound to fail to get in at the top floor on shorting what stock I have. It's one of those cases where finance has rose coloured glasses based on brand name that anyone who's used Google products for years would be thoroughly disabused of.
Gemini 2.0 is not bad in quality, and great in terms of speed.
There are so many opportunities for google to improve their services.
For example, I found myself asking Claude about places to see in a city I’m visiting while switching back and forth to gmaps. This would have been a much better experience integrated directly with gmaps knowledge graph
Yep, this is a truly bad feature launch. I have no clue what this model does. Did they somehow lose their competent product people?
It could get really interesting if they allow webhooks and structured output
Ah, I've just stumbled on some hints after clicking around.. click on your avatar image (top right) and then click "Tasks"
Then there are some UI hints.
"Remind me of your mom's birthday on [X] date"
Wow, really maximising that $10bn GPU investment!
Glad to see that the thriving 2010 market of TODO list apps will see a resurgence in the AI era.
A todo app that you can write and modify by editing a natural language prompt, and that can parse inputs from the whole web with flexibility and nuance, is not a small thing.
That also seems to not get timezones right, has a confusing search function...?
More seriously, todo apps are about productivity, not just about becoming a huge bucket of tasks. I've always found that the productivity comes from getting context out of my head and scheduled for the right time. This release appears to be more about that big bag of tasks and less about productivity. I'm all for AI in products, I think it can be powerful, but I've not had a use-case for it in my todo app.
> a todo app that you can write and modify by editing a natural language prompt
no.
"a todo app that you can interact with by writing natural language input?"
okay.
> nuance
really?!
I've got about six apps written by Claude from prompts, all quite simple but useful. If you don't believe it I get it, because I didn't either until I tried it.
As for nuance, I've seen an astounding amount of divergent context incorporated into LLM responses. Not always, but far more than I've ever been able to encode into a parsing script, which is exactly nothing not explicitly programmed.
Maybe it's effective at hitting a goal which you do not see.
Pretty useless so far. I'm not sure what the intended application of this is so far, but I wanted it to schedule some work for me.
It only scheduled the first thing, and that was after I had to be specific by saying "7:30pm-11pm". I wanted to say "from now to 11pm" but it couldn't process "now".
If you find a tool useless then it's likely that you lack imagination.
Okay, let's say I do lack imagination: please enlighten me after you've had a chance to actually use this half-baked feature.
Founder of Cronitor.io here — if you’re a developer considering using this, would it be valuable for you to be able to report in to Cronitor when it runs so we can keep an eye and alert you if your tasks are late, skipped or accidentally deleted?
We support just about every other job platform but I’d love to hear from potential users before I hack something together.
The UI is different in the desktop app for macOS. The ability to edit the schedule task is only available in the web UI for me.
I got the best results by not enabling Search the Web when I was trying to create tasks. It confuses the model. But scheduled tasks can successfully search the web.
It's flaky, but looks promising!
Less relevant but why isn't canvas available in the desktop app? I thought they had feature parity but it seems not.
Lots of complaints mentioned here. If you have a legitimate need for a product like Tasks that is more fully baked, I’d encourage you to check out lindy.ai (no affiliation). I’ve been using it to send periodic email updates on specific topics and it works flawlessly.
Whoops! Might have been built wrong? I'm seeing a source map error:
Source map error: Error: request failed with status 404
Stack in the worker:networkRequest@resource://devtools/client/shared/source-map-loader/utils/network-request.js:43:9
Also I am getting `Unable to display this message due to an error.` a lot.
So I opened "gpt4o with scheduled tasks" in the mobile app and there was no hint in the UI how to use it. I asked, "what's a scheduled task" and it answered with a generic response about scheduled tasks in general. Then I tried my luck and said, "remind me to pet my cat in 5 minutes," and it seemed to work. I then closed the mobile app, but no push notification came after 5 minutes, however I got an email, which I didn't expect (I expected push notifications). Clearly the feature needs more polish.
It seems there are some issues with the rollout.
Me:
> Give me positive feedback every hour
ChatGPT:
> Provide positive feedback
> Next run Jan 15, 2025
> Got it! I’ll send you positive feedback every hour.
An hour later, I received the following email:
```
Your scheduled task couldn't be completed
ChatGPT tried to complete Provide positive feedback multiple times, but it encountered an error and wasn't able to send. It will try again the next time this task is scheduled.
Open chat
If you have any questions, please contact through the help center.
All the best,
ChatGPT
```
This is interesting, although I am a little confused about the purpose of ChatGPT with this feature.
We already have many implementations where one could call the GPT APIs at a cron interval. And it's nice to monitor it and see how things are working etc.
So I am curious what the use case is for embedding a scheduler inside the ChatGPT infrastructure. Seems a little off its true purpose?
I think we all agree this feature seems broadly of use, and given that presumably a professional full-time 1P team was behind this feature I am gonna use this product over the other implementations
It's for normies.
There is an editable tasks list and in the settings menu you can choose to receive notifications via push and/or email.
[deleted]
It's a tech demo to get normies used to the idea of agents. HackerNews "20 years in industry" guys are flabbergasted because it defaults to UTC so is therefore totally useless, clearly. Perhaps you live in a bubble?
This seems like such a strange product decision - why clutter the interface with such a niche use case? I’m trying to imagine OpenAI’s reasoning - a new angle on long term memory maybe? Or a potential interface for their agents?
It's to warm normal people up to the fact that we have agents now.
how does it handle timezones?
i saw no mention of them on the help article, or the ui
if i ask for a daily early morning news summary will it show up in the middle of the night or around lunch time?
will it get updated when i travel?
seems interesting if what you're looking for is a reminder that is not time relevant, just a thing that should happen at some point with a time precision of about 1 day.
Many existing apps (like Todoist) have already had LLM integrations for a while now, and have more features like calendars and syncing.
Or do I completely not understand what this product is trying to be?
Why not? I already pay for chatgpt but I don't pay for todoist so that doesn't help me.
[deleted]
The link doesn't work, presumably because I won't pay OpenAI which stole my API credits by making them have an "expiration date".
This is shaping up to be as bad as the Sora release.
For those unable to find this, you can find it as a new model in the model drop-down menu.
These are best understood as scheduled tasks for the AI instead of tasks for the user.
The biggest outcome here is that now the app has memory.
why are they trying to be a model provider as well as service provider
Why wouldn’t they? Most big tech cos offer products at multiple layers of the stack.
Couldn't you do the same with giving an LLM access to your shell and a cron command?
Would you give an LLM that privilege?
lol. I was going to build this. Even purchased the domain alert.now
I even have a news-based version actively running at alarms.global. If you install it on your phone as a PWA, you get push notifications when something important happens in your region, and it can notify you before public holidays.
I even have an automated x account @alarmsglobal
Imagine being an engineer on the Siri team, must be so demoralizing.
A glorified reminder? Really?
Sorry, I simply cannot use OpenAI because its leadership is kissing the ring of Trump.
Friend, I've got some news about the leadership of the majority of tech services you will use over the next 4-8 years...
This is going to eat software, and is the beginning of agents. The orchestrator of these tasks will come, and OpenAI will turn into a general purpose compute system, the endgame of workflow software. Soon there will be a database, and your prompts will be able to directly read and write to an openai hosted postgres instance. And your CRUD app will begin to disappear. Programming will feel pointless
Possibly, but that's going to require 100% consistent, accurate outputs (tricky as that's not the nature of LLMs).
Otherwise, you'll have a lot of systems dependent on these orchestrators creating hard-to-debug mistakes up and down the pipeline. With software, you can reach a state where it does what you tell it to without having to worry if some model adjustment or API change is going to break the output.
If they solve that, then yes. Otherwise, what I personally expect is a lot of businesses rushing into implementing "agents" only to backpedal later when they start to have negative material effects on bottom lines.
It's inevitable. You can argue about what's possible right now, but I'm not looking at it from that angle. I think these issues will be solved with time.
They are using infinity compute and can’t do simple notifications. How will changing the architecture slightly or ingesting more data change that?
That belief is at odds with the mechanics of how LLMs work. It's not a question of more effort/investment/compute/whatever, it's just a reality of how the underlying systems work (non-deterministic). If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.
People want us to be at "Her" levels of AI, but we're at a far earlier stage. We can fake certain aspects of that (using TTS), but blindly trusting an AI to run everything is going to be a big mistake in the short-term. And in order for the inevitability of what you describe to take place, the predecessor(s) to that have to work in a way that doesn't scare people and businesses away.
The plowing of money and hype into the current forms of AI (not to mention the gaslighting about their ability) makes me think the real inevitability is a meltdown in the next 5-10 years which leads to AI-hesitancy on a mass scale.
Have you tried o1 pro? I find people who make these assertions are not deeply using the models on a daily basis. With each new release, I can see the increase in capability, and can do more. I have written software in the last year that is at a level of complexity beyond my skill set. I have 15 years of SWE experience, most at FAANG. You just aren't close enough to the metal to see what's coming. It's not about what we have now, it's about scaling and a reliable march of model improvements. The code has been cracked: given sufficient data, anything can be learned. Neural networks are generalized learners.
Yes, I use LLMs every day. Primarily for coding (a mix of Claude and OAI). I was trying to implement a simple CSS optimization step to my JS framework's build system last night and both kept hallucinating to the point (literally inventing non-existent APIs and config patterns) where I gave up and just did it by hand w/ Google and browsing docs.
The problem with your "close to the metal" assertion is that this has been parroted about every iteration of LLMs thus far. They've certainly gotten better (impressively so), but again, it doesn't matter. By their very nature (whether today or ten years from now), they're a big risk at the business level which is ultimately where the rubber has to hit the road.
Yeah I don't think we're going to come closer to a real AGI until we manage to make a model that can actually understand and think. An LLM sounds smart but it just picks the most likely response from the echoes of a billion human voices. I'm sure we'll get there but not with this tech. I'm pretty sure even OpenAI said this with their 5 steps to AGI: LLMs were only step 1. And probably the part that will do the talking in the final AI but not the thinking.
At the moment people are so wooed by the confidence of current LLMs that they forget that there's all sorts of types of AI models. I think the key is going to be to have them work together, each doing the part they're good at.
> An LLM sounds smart but it just picks the most likely response from the echoes of a billion human voices.
This is where reasoning models come in. Train models on many logical statements then give them enough time to produce a chain of thoughts that’s indistinguishable from “understanding” and “thinking”.
I’m not sure why this leap is so hard for some people to make.
I personally don't think that will go very far. It's just a way of extracting a little bit more out of a technology that's the wrong one for the purpose.
We just are not on your level of genius to understand these things.
So obviously completely full of shit.
Broadly I agree with your position, but:
> If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.
Human brains have a much smaller context window than AI do. We can't pay attention to the last 128,000 concepts that filtered past our sensory systems — our conscious considerations are for about seven things.
There's a lot of stuff that we don't yet understand well enough to reproduce with AI, but context length is the wrong criticism for these models.
> context length is the wrong criticism for these models
You're right. What I'm getting at is the overall speed, efficiency, and accuracy of the storage, retrieval, and processing capability of the human brain.
It's kinda crazy that it can run on a few slices of bread when LLMs need kilowatts of power to write a simple paragraph :)
Why? Past progress =/= equal rate of future progress.
Sure but do they have a moat here? Anyone that can connect to an LLM could make that app.
Yes, they have the name "ChatGPT". For non-technical people this appears to be the most important thing.
Is it a household name? Anecdotally, only two of my five millennial/gen-z siblings use an AI app at all, and one of them calls hers "Gary" instead of ChatGPT. I'd be interested in seeing some actual data showing how much ChatGPT is an actual household name versus one that us technical people assume is a household name due to its ubiquity in our space.
> Is it a household name?
I think it is, yes.
It was interviewed under that name on one of the UK's main news broadcasts almost immediately after it came out. Few hundred million users. Anecdotes about teachers whose students use it to cheat.
But who knows. I was surprising people about the existence of Wikipedia as late as 2004, and Google Translate's augmented reality mode some time around the start of the pandemic.
Does AWS have a moat on cloud computing?
Yes, it would take 10s of billions of dollars to recreate the infrastructure as far as servers and AWS has its own pipelines running under the oceans.
Then you have to recreate all of the services on top of the AWS.
Then you have to deal with regulations and certifications.
Then you have to convince decision makers to go against their own interests. “No one ever got fired for Amazon”.
Then you have to convince corporations to spend money to migrate.
Yes, that requires huge infrastructure investments. Creating an LLM requires huge investments. Running an LLM requires medium to big investments, but using one remotely requires very little investment.
This significantly overestimates the reliability of LLMs -- both their output integrity and their ability to understand context.
Bit of advice: you might want to actually use an offering before claiming it is revolutionary.
I've got 15 years of engineering experience, worked on some of the largest distributed systems at FAANG. It's coming.
> worked on some of the largest distributed systems at FAANG.
As have 10s of thousands of other people who could invert a btree on the whiteboard….
Oh wow good for you! Didn't realize you were a prodigy or that this was a contest. I take it all back. /s
Maybe try some humility. You're not helping yourself with the bragging about frankly underwhelming and common (here) experience.
This is the prompt describing the function call parameters:
When calling the automation, you need to provide three main parameters:
1. Title (title): A brief descriptive name for the automation. This helps identify it at a glance. For example, "Check for recent news headlines".
2. Prompt (prompt): The detailed instruction or request you want the automation to follow. For example: "Search for the top 10 headlines from multiple sources, ensuring they are published within the last 48 hours, and provide a summary of any recent Russian military strikes in the Lviv Oblast."
3. Schedule (schedule): This uses the iCalendar (iCal) VEVENT format to specify when the automation should run. For example, if you want it to run every day at 8:30 AM, you might provide:
BEGIN:VEVENT
RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
END:VEVENT
Optionally, you can also include:
• DTSTART (start time): If you have a specific starting point, you can include it. For example:
BEGIN:VEVENT
DTSTART:20250115T083000
RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
END:VEVENT
In summary, the call typically includes:
• title (string): A short name.
• prompt (string): What you want the automation to do.
• schedule (string): The iCal VEVENT defining when it should run.
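For anyone who wants to poke at this, here is a minimal Python sketch of assembling those three parameters. The function names `build_vevent` and `build_automation_call` are made up for illustration and are not part of any OpenAI API; only the three field names and the VEVENT text come from the prompt quoted above. Note that the VEVENT carries no time zone here, which may be related to the UTC confusion people report.

```python
from typing import Optional


def build_vevent(hour: int, minute: int, dtstart: Optional[str] = None) -> str:
    """Return a daily-recurrence VEVENT block as iCalendar text (hypothetical helper)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    lines = ["BEGIN:VEVENT"]
    if dtstart is not None:
        # Optional start point, e.g. "20250115T083000" as in the quoted prompt.
        lines.append(f"DTSTART:{dtstart}")
    lines.append(f"RRULE:FREQ=DAILY;BYHOUR={hour};BYMINUTE={minute};BYSECOND=0")
    lines.append("END:VEVENT")
    return "\n".join(lines)


def build_automation_call(title: str, prompt: str, schedule: str) -> dict:
    """Bundle the three parameters the quoted prompt describes (hypothetical helper)."""
    return {"title": title, "prompt": prompt, "schedule": schedule}


if __name__ == "__main__":
    call = build_automation_call(
        title="Check for recent news headlines",
        prompt="Search for the top 10 headlines from multiple sources.",
        schedule=build_vevent(8, 30, dtstart="20250115T083000"),
    )
    print(call["schedule"])
```

Being explicit about the schedule string like this is roughly what "adjust your prompt for creating the task" amounts to: you tell the model the exact recurrence instead of letting it guess.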
What is the source for that claim?
https://gist.github.com/thenameless7741/a1957c2898d80ce99ebd...
That's different text.
The beta is inconsistently showing (required a few refreshes to get something to show up), but my limited usage of it showed a plethora of issues:
- Assumed UTC instead of EST. Corrected it and it still continued to bork
- Added random time deltas to my asked times (+2, -10 min).
- Couple notifications didn't go off at all
- The one that did go off didn't provide a push notification.
---
On top of that, only usable without search mode. In search mode, it was totally confused and gave me a Forbes article.
Seems half baked to me.
Doing scheduled research behind the scenes or sending a push notification to my phone would be cool, but surprised they thought this was OK for a public beta.
You'd think OpenAI's dev velocity and quality would be off the charts since they live and breathe "AI." If a company building ChatGPT itself often delivers buggy features then it doesn't bode well for this whole 'AI will eat the world' notion.
Well none of the labs have good frontend or mobile engineers or even infra engineers
Anthropic is ahead in this because they keep their UIs simplistic so the failure modes are also simple (bad connection)
OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).
Find it hilarious and sad that o1-pro just times out thinking on very long or image-intense chats. Need to reload page multiple times after it fails to reply and maybe answer will appear (or not? Or in 5 minutes?). Kinda shows they’re not testing enough and “not eating their own food” and feels like chatgpt 3.5 ui before the redesign
> Anthropic is ahead in this because they keep their UIs simplistic ... OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).
What's funny is that OpenAI's Canvas was their attempt to copy Anthropic's Artifacts! So it's not like Anthropic is stagnant and OpenAI is at least shipping, Anthropic is shipping and OpenAI can't even copy them right.
It's a good point, Anthropic is being VERY choosy and winds up knocking it out of the park with stuff like Artifacts. Meanwhile their macOS app is junk, but obviously not a priority.
> because they keep their UIs simplistic
How do I edit a sent message in the Claude Android app? It's so simplistic I can't find it.
You can’t edit on iOS either
So far, I've found AI to be a great force multiplier in green field, small projects. In a huge corporate codebase, it has the power of advanced refactoring (which doesn't touch more than a handful of files at a time) and a CSS wizard.
According to all the magazines I've been reading, all that is required is to just prompt it with "please fix all of these issues" and give it a bulleted list with a single sentence describing each issue. I mean, it's AI powered and therefore much better than overpaid prima-donna engineers, so obviously it should "just work" and all the problems will get fixed. I'm sure most of the bugs were the result of humans meddling in the AI's brilliant output.
Right now, in fact, my understanding is OpenAI is using their current LLMs to write the next generation ones which will far surpass anything a developer can currently do. Obviously we'll need to keep management around to tell these things what to do, but the days of being a paid software engineer are numbered.
I think you forgot the /s (sarcasm) in your post!
When I have it do a search I have to tell it to just get all the info it can in the search but wait for the next request. Then I explicitly tell it we’re done searching and to treat the next prompt as a new request but using the new info it found.
That’s the only way I get it to have a halfway decent brain after a web search. Something about that mode makes it more like a PR drone version of whatever I asked it to search, repeating things verbatim even when I ask for more specifics in follow-up.
Can you give an example prompt for this approach?
I posted the system prompt part describing the function call; if you read it and adjust your prompt for creating the task, it works way better.
I'd rather have buggy things now than perfect things in a year.
Doesn't need to be perfect- but using this would actively reduce productivity
First impressions matter, if the experience is this bad you're probably waiting a year to come back anyway.
Worked out great for Sonos when their timers and alarms didn’t work.
Found the PM
DateTime stuff is generally super annoying to debug. Can't fault them too badly. Adding a scheduler is a key enabling idea for a ton of use cases
> Can't fault them too badly
The same company that touts their super hyper advanced AI tool that can do everyone's (except the C-level's, apparently) jobs to the world can't figure out how to make a functional cron job happen? And we're giving them a pass, despite the bajillions of dollars that M$ and VC is funneling their way?
Quite interesting they wouldn't just throw the "proven to be AGI cause it passes some IQ tests sometimes" tooling at it and be done with it.
it would explain the bugs if they used the AI to make the datetime implementation though
Agreed on date/time being a frustrating area of software development.
But wouldn't a company like OpenAI use a tick-based system in this architecture? i.e. there's an event emitter that ticks every second (or maybe minute), and consumers that operate based on these events in realtime? Obviously things get complicated due to the time consumed by inference models, but if OpenAI knows the task upfront it could make an allowance for the inference time?
If the logic is event driven and deterministic, it's easy to test and debug, right?
The original cron was programmed this way, but it has to examine every job every tick to check if it should run, which doesn't scale well. Instead, you predict when the next run for a job will be and insert that into an indexed schedule. Then each tick it checks the front of the schedule in ascending order of timestamps until the remaining jobs are in the future.
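To make the "indexed schedule" idea concrete, here is a minimal Python sketch using a binary heap keyed on the next run time. It is an illustration of the approach described above, not how cron or OpenAI actually implement it; `ScheduledJob`, `Scheduler`, and the integer timestamps are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class ScheduledJob:
    next_run: int                        # timestamp in seconds; the only field compared
    name: str = field(compare=False)
    interval: int = field(compare=False)  # seconds between runs, must be > 0


class Scheduler:
    """Keeps jobs ordered by next run time so a tick only inspects due jobs."""

    def __init__(self) -> None:
        self._heap: List[ScheduledJob] = []

    def add(self, name: str, first_run: int, interval: int) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, ScheduledJob(first_run, name, interval))

    def tick(self, now: int) -> List[str]:
        """Return the names of jobs due at `now` and reschedule them."""
        due = []
        # Only the front of the heap is examined; future jobs are never touched.
        while self._heap and self._heap[0].next_run <= now:
            job = heapq.heappop(self._heap)
            due.append(job.name)
            # Predict the next run; skip slots already missed so a late tick
            # runs the job once instead of once per missed slot.
            while job.next_run <= now:
                job.next_run += job.interval
            heapq.heappush(self._heap, job)
        return due


if __name__ == "__main__":
    s = Scheduler()
    s.add("daily-news", first_run=60, interval=60)
    s.add("weekly-report", first_run=120, interval=300)
    for t in (59, 60, 120):
        print(t, s.tick(t))
```

Each tick costs work proportional to the number of due jobs (times a log factor for the heap), rather than the total number of jobs, which is the scaling win over checking every job every tick.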
This is also a bad case in terms of queueing theory. Looking at Kingman's equation, the arrival variance is very high (a ton of jobs will run at 00:00 and much fewer at 00:01), and the service time also has pretty high variance. That combo will either require high queue delay variance, low utilization (i.e. over-provisioning), or a sophisticated auto-scaler that aggressively starts and stops instances to anticipate the schedule. Most of the time it's ok to let jobs queue since most use cases don't care if a daily or weekly job is 5 minutes late.
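For a feel of the numbers, here is a small Python sketch of Kingman's approximation for the mean queueing delay in a single-server (G/G/1) queue: wait ≈ ρ/(1−ρ) · (c_a² + c_s²)/2 · τ, where ρ is utilization, c_a² and c_s² are the squared coefficients of variation of inter-arrival and service times, and τ is the mean service time. The function name and the sample inputs are made up for illustration; a real fleet is multi-server, so treat this as a rough intuition pump only.

```python
def kingman_wait(utilization: float, ca2: float, cs2: float,
                 mean_service_time: float) -> float:
    """Approximate mean time a job waits in queue for a G/G/1 system.

    utilization: fraction of time the server is busy, 0 <= rho < 1
    ca2, cs2: squared coefficients of variation of arrivals and service
    mean_service_time: average time to process one job (any time unit)
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    if ca2 < 0 or cs2 < 0:
        raise ValueError("squared coefficients of variation cannot be negative")
    return (utilization / (1 - utilization)) * ((ca2 + cs2) / 2) * mean_service_time


if __name__ == "__main__":
    # Illustrative inputs only: 10-second jobs at two utilization levels,
    # with smooth arrivals (ca2 = 1) versus bursty top-of-the-hour arrivals (ca2 = 9).
    for rho in (0.5, 0.9):
        for ca2 in (1.0, 9.0):
            wait = kingman_wait(rho, ca2, 1.0, 10.0)
            print(f"rho={rho} ca2={ca2} wait={wait:.1f}s")
```

The two levers are visible in the formula: bursty arrivals scale the delay linearly through c_a², while running hot blows it up through ρ/(1−ρ), which is why the choices are queue delay, over-provisioning, or anticipatory scaling.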
Yeah, they're not exactly a scrappy startup- I'd be surprised if they had 0 QA.
Makes me wonder if they internally have "press releases / Q" as an internal metric to keep up the hype.
Maybe that's the Q* we've been hearing rumors about
Amazon had an insane number of people working on just the alarms feature in Alexa when they interviewed me for a position years ago. They had entire teams devoted to the tiniest edge case within the realm of scheduling things with Alexa. This is no doubt one of the biggest use cases in computing: getting your computer to tell you what to do at a given time.
Recurring schedules across time zones is an unbelievably maddening thing to implement. At first glance it seems simple, but it gets very weird very quickly.
this.
Some people can't even wrap their heads around it, taking hours and hours of discussions. Still my favourite problem though.
Yeah summer time in different countries switching on different days and often in a different direction (other hemisphere). I used to work on such matters and those weeks were the toughest.
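A small Python sketch of why this bites recurring schedules: the same "08:30 every day" local time maps to different UTC instants depending on the zone and on which side of that zone's DST switch the date falls. It uses the standard-library `zoneinfo` module (which needs the system or `tzdata` time zone database to be available); the helper name `local_run_in_utc` and the chosen dates are just for illustration.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_run_in_utc(day: date, local_time: time, tz_name: str) -> datetime:
    """Convert a wall-clock run time in a named zone to the matching UTC instant."""
    local = datetime.combine(day, local_time, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    run_at = time(8, 30)
    # Dates chosen to straddle the 2025 switches in each zone.
    cases = [
        ("America/New_York", date(2025, 3, 8)),
        ("America/New_York", date(2025, 3, 10)),
        ("Europe/London", date(2025, 3, 10)),
        ("Europe/London", date(2025, 3, 31)),
        ("Australia/Sydney", date(2025, 4, 1)),
        ("Australia/Sydney", date(2025, 4, 10)),
    ]
    for tz_name, day in cases:
        utc = local_run_in_utc(day, run_at, tz_name)
        print(f"{tz_name:18} {day} 08:30 local -> {utc:%Y-%m-%d %H:%M} UTC")
```

A scheduler that stores "08:30" as a fixed UTC offset will drift by an hour twice a year, on different days per country, and in the opposite direction in the southern hemisphere. Storing the zone name with the rule and resolving each occurrence separately is one way to sidestep that.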
Developers when they first start working with time across timezones: "This is a technical problem."
Developers after more research: "Oh... this is a political problem."
Considering my iPhone alarm still sometimes fails to go off (it just shows the alarm screen silently), I'd be inclined to believe you.
Thanks for that— I thought I was going crazy (well still could be I guess) or had some strange habit or gesture I didn’t realize was silencing the alarm somehow.
https://www.theverge.com/2025/1/9/24340238/apple-iphone-alar...
Whenever I have to wake for something that I absolutely can’t miss, I set 2-3 extra reminders 5 minutes apart precisely because of this “silent alarm” bug. It’s only happened to me a couple of times but twice was enough to completely destroy my trust in the alarm. The first time I thought I just did something in my sleep to cause it, but the UI shows it as if the alarm worked. I’m lucky to have the privilege that if I oversleep an hour or so it’s no big deal, otherwise ye olde tabletop alarm clock would be back.
I love the questioning my sanity before I've completely opened my eyes part. It's like a jump start to my day.
Hah - I also just assumed that I was turning the alarm off in my sleep without noticing. I started doubting it and really wish there was a log of when you tapped snooze or stopped the alarm...
This is too much of a dev feature for apple to implement and there are probably third party apps that do this, but meh
Open AI just needs to create & release their own phone with Microsoft's help! H.E.R. the movie phone.
Apple has not innovated in years and a GPT Phone where your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you would give Apple a run for its money! Pick up your phone & see your agent waiting to assist & it could be skinned to look like a deceased loved one (mom still guiding you through life).
To get things done it would interface with other AI Agents of businesses, companies, your doctor, friends & family to schedule things & used as a knowledgebase.
Maybe this is their step towards creating said agents?
> your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you
I just… don’t want this. I don’t think anyone I know wants this.
Cool, thanks for the comment!
I use ChatGPT now for almost everything, and when in the car I have full back-and-forth conversations to get things done (or use it as a knowledge base) there too. Recently I was discussing with it how to properly get rid of (junk) my old car in Pennsylvania. It provided all the steps and gave me local businesses. It didn't call them or interface with them to find their available times and costs, tell me those details, and let me instruct it to schedule my preferred choice, though. I wish it did, and that prompted thoughts about how it could, since the technology that gets adopted is mostly tech that simplifies our lives.
I think my concept above is similar to what was seen in the movie H.E.R. (Joaquin Phoenix & Scarlett Johansson starred) so it's not that crazy or odd. Throwing in skinning it to be whoever you like, such as a deceased loved one, might be, and probably is.
And gmail schedule delivery just won't work if you want to email yourself a month later.
I'm sure it's brilliant, but I have no idea what it's capable of. What will it do? Send me a push notification? Have an answer waiting for me when I come back to it in a while?
I switched over to the "GPT4o with scheduled tasks" model and there were no UI hints as to how I might use the feature. So I asked it "what you can you follow up later on and how?"
It replied "Could you clarify what specifically you’d like me to follow up on later?"
This is a truly awful way to launch a new product.
After asking it to schedule something, it prompted me to allow or block notifications, so sounds like this is just chatGPT scheduling push notifications? We'll see!
So basically cannibalizing Siri?
Siri has access to a wealth of private existing and future on-device APIs to fuel context sensitive responses to queries on vendor locked devices used all day long. (Which Apple has apparently decided to just not use yet.)
OpenAI doesn't, they just have a ton of funding and (up to recently) a good mass media story, and the best natural language responses.
The moat around Siri is much deeper, and I don't really see any evidence OpenAI has any special sauce that can't be reproduced by others.
My prediction is that OpenAI's reliance on AI doomerism to generate a regulatory moat falters as they become unable to produce step changes in new models, while Apple's efforts despite being halting and incomplete become ubiquitous thanks to market share and access to on device context.
I wouldn't (and don't) put my money in OpenAI anymore. I don't see a future for them beyond being the first mover in an "LLM as a service" space in which they have no moat. On top of that they've managed to absorb the worst of criticism as a sort of LLM lightning rod. Worst of all, it may turn out that off-device isn't even really necessary for most consumer applications in which case they'll start to have to make most of their money on corporate contracts.
Maybe something will change, but right now OpenAI is looking like a bubble company with no guarantee to its dominant position. Because it is what it is: simply the largest pooling of money to try to corner this market. What else do they have?
I think there is an argument that currently Google Gemini is best placed to tie everything together. Assuming Google executes on it well.
Most people use Gmail, Docs, Google Maps, Google Calendar above Apple's alternatives. Gemini could really tie them up well.
The counter argument is that Google doesn't maintain any of those services beyond the bare minimum for customer facing interactions, and exchanges between their services are even more poorly supported if they even exist at all.
Remember Google Sheets (already the Tonka Toys of spreadsheets) adding named tables to Sheets?
You can't use them in any of the AppsScript APIs. You have to fall back to manual searching for strings and index arithmetic.
Google Drive still barely supports anything like moving an entire folder to another folder.
They have failed at least a half dozen times now to deliver a functional chat/VOIP app after they already had one in Google Talk.
They regularly sunset products that actually have devoted and zealous user bases for indiscernible reasons.
Android is just chugging along doing nothing interesting and still carrying the same baggage it did before. It's a painful platform to develop for and the Jetpack Compose/Kotlin shift hasn't ameliorated much of that at all.
Their search offering is now worse than Bing, worse than Kagi, and worse than some of the LLM based tools that use their index. It's increasingly common that you can't even find a single link that you know an entire verbatim sentence from via Google search for inexplicable reasons. Exact keyword or phrase searches no longer work. You can't require a keyword in results.
I don't trust Google to deliver a single functional software product at this point, let alone a compelling integration of many different ones developed in different siloes.
About the only thing going for them is how many people still have Gmail accounts from that initial invite only and generous limits campaign... 20 years ago?
Google is not a healthy company. I don't invest in them anymore, and barring some major change I probably won't again. It's a dying blue chip which is a terrible position to have your money in.
P.S. oh, and Gemini is awful by comparison in both price and quality to competitors. It isn't saving them. It's just a "me too".
P.P.S. I'm personally just waiting for their next "game changing" announcement bound to fail to get in at the top floor on shorting what stock I have. It's one of those cases where finance has rose coloured glasses based on brand name that anyone who's used Google products for years would be thoroughly disabused of.
Gemini 2.0 is not bad in quality, and great in terms of speed.
There are so many opportunities for google to improve their services.
For example, I found myself asking Claude about places to see in a city I’m visiting while switching back and forth to gmaps. This would have been a much better experience integrated directly with gmaps knowledge graph
Yep, this is a truly bad feature launch. I have no clue what this model does. Did they somehow lose their competent product people?
It could get really interesting if they allow webhooks and structured output
Ah, I've just stumbled on some hints after clicking around.. click on your avatar image (top right) and then click "Tasks"
Then there are some UI hints.
"Remind me of your mom's birthday on [X] date"
Wow, really maximising that $10bn GPU investment!
Glad to see that the thriving 2010 market of TODO list apps will see a resurgence in the AI era.
A todo app that you can write and modify by editing a natural language prompt, and that can parse inputs from the whole web with flexibility and nuance, is not a small thing.
That also seems to not get timezones right, has a confusing search function...?
More seriously, todo apps are about productivity, not just about becoming a huge bucket of tasks. I've always found that the productivity comes from getting context out of my head and scheduled for the right time. This release appears to be more about that big bag of tasks and less about productivity. I'm all for AI in products, I think it can be powerful, but I've not had a use-case for it in my todo app.
> a todo app that you can write and modify by editing a natural language prompt
no.
"a todo app that you can interact with by writing natural language input?"
okay.
> nuance
really?!
I've got about six apps written by Claude from prompts, all quite simple but useful. If you don't believe it I get it, because I didn't either until I tried it.
As for nuance, I've seen an astounding amount of divergent context incorporated into LLM responses. Not always, but far more than I've ever been able to encode into a parsing script, which is exactly nothing not explicitly programmed.
Maybe it's effective at hitting a goal which you do not see.
Where are the release notes?
Edit: I suppose they'll be here at some point: https://help.openai.com/en/articles/9624314-model-release-no...
These seem like extremely shitty release notes. I have no clue why anybody pays for this model.
You might want this? It's more technical than the one you linked to:
https://platform.openai.com/docs/changelog
Does this show "Invalid DateTime" only for me? Kinda ironic! https://i.imgur.com/ZAcwhxT.png
Not for me, this time; they're "December, 2024", "Dec 18", "Dec 17" respectively.
Recently someone shared a link to one of their chat sessions here, and it reliably 404'd for me but not others.
It has consistently been the best model for the last two years and only Gemini is perhaps slightly better now.
Right, but free models you run on your local computer are just as good for 99% of use cases and don't cost an arm and a leg.
The docs for the beta seem to already be up: https://help.openai.com/en/articles/10291617-scheduled-tasks...
Nothing yet
Pretty useless so far. I'm not sure what the intended application of this is so far, but I wanted it to schedule some work for me.
It only scheduled the first thing and that was after having to be specific by saying "7:30pm-11pm". I wanted to say "from now to 11pm" but it couldn't process "now".
If you find a tool useless then it's likely that you lack imagination.
Okay, let's say I do lack imagination: please enlighten me after you've had a chance to actually use this half-baked feature.
https://www.theverge.com/2025/1/14/24343528/openai-chatgpt-r...
What am I supposed to see at the link?
You click the drop down menu for model selection and choose 4o with scheduled tasks
There is more information in these twitter threads:
https://x.com/karinanguyen_/status/1879270529066262733 https://x.com/OpenAI/status/1879267276291203329
Founder of Cronitor.io here — if you’re a developer considering using this, would it be valuable for you to be able to report in to Cronitor when it runs so we can keep an eye and alert you if your tasks are late, skipped or accidentally deleted?
We support just about every other job platform but I’d love to hear from potential users before I hack something together.
The UI is different in the desktop app for macOS. The ability to edit the schedule task is only available in the web UI for me.
I got the best results by not enabling Search the Web when I was trying to create tasks. It confuses the model. But scheduled tasks can successfully search the web.
It's flaky, but looks promising!
Less relevant but why isn't canvas available in the desktop app? I thought they had feature parity but it seems not.
Lots of complaints mentioned here. If you have a legitimate need for a product like Tasks that is more fully baked, I’d encourage you to check out lindy.ai (no affiliation). I’ve been using it to send periodic email updates on specific topics and it works flawlessly.
Whoops! Might have been built wrong? I'm seeing a source map error: Source map error: Error: request failed with status 404 Stack in the worker:networkRequest@resource://devtools/client/shared/source-map-loader/utils/network-request.js:43:9
Resource URL: https://cdn.oaistatic.com/assets/jbl0aowda306m4s1.js Source Map URL: jbl0aowda306m4s1.js.map
Also I am getting `Unable to display this message due to an error.` a lot.
So I opened "gpt4o with scheduled tasks" in the mobile app and there was no hint in the UI how to use it. I asked, "what's a scheduled task" and it answered with a generic response about scheduled tasks in general. Then I tried my luck and said, "remind me to pet my cat in 5 minutes," and it seemed to work. I then closed the mobile app, but no push notification came after 5 minutes, however I got an email, which I didn't expect (I expected push notifications). Clearly the feature needs more polish.
It seems there are some issues with the rollout.
Me:
> Give me positive feedback every hour
ChatGPT:
> Provide positive feedback
> Next run Jan 15, 2025
> Got it! I’ll send you positive feedback every hour.
An hour later, I received the following email:
```
Your scheduled task couldn't be completed
ChatGPT tried to complete Provide positive feedback multiple times, but it encountered an error and wasn't able to send. It will try again the next time this task is scheduled.
Open chat If you have any questions, please contact through the help center.
All the best, ChatGPT
```
This is interesting, although I am a little confused about the purpose of ChatGPT with this feature.
We already have many implementations where at a cron interval one could call the GPT APIs for stuff. And it's nice to monitor it and see how things are working etc.
So I am curious what's the use case to embed a schedule inside the ChatGPT infrastructure. Seems like a little off its true purpose?
I think we all agree this feature seems broadly of use, and given that presumably a professional full-time 1P team was behind this feature I am gonna use this product over the other implementations
It's for normies.
There is an editable tasks list and in the settings menu you can choose to receive notifications via push and/or email.
It's a tech demo to get normies used to the idea of agents. HackerNews "20 years in industry" guys are flabbergasted because it defaults to UTC so is therefore totally useless, clearly. Perhaps you live in a bubble?
This seems like such a strange product decision - why clutter the interface with such a niche use case? I’m trying to imagine OpenAI’s reasoning - a new angle on long term memory maybe? Or a potential interface for their agents?
It's to warm normal people up to the fact that we have agents now.
how does it handle timezones?
i saw no mention of them on the help article, or the ui
if i ask for a daily early morning news summary will it show up in the middle of the night or around lunch time? will it get updated when i travel? seems interesting if what you're looking for is a reminder that is not time relevant, just a thing that should happen at some point with a time precision of about 1 day.
https://help.openai.com/en/articles/10291617-scheduled-tasks...
Does the world need another reminder/todo app?
Many existing apps (like Todoist) have already had LLM integrations for a while now, and have more features like calendars and syncing.
Or do I completely not understand what this product is trying to be?
Why not? I already pay for chatgpt but I don't pay for todoist so that doesn't help me.
The link doesn't work, presumably because I won't pay OpenAI which stole my API credits by making them have an "expiration date".
This is shaping up to be as bad as the Sora release.
For those unable to find this, you can find it as a new model in the model drop-down menu.
These are best understood as scheduled tasks for the AI instead of tasks for the user.
The biggest outcome here is that now the app has memory.
why are they trying to be a model provider as well as service provider
Why wouldn’t they? Most big tech cos offer products at multiple layers of the stack.
Couldn't you do the same with giving an LLM access to your shell and a cron command?
Would you give an LLM that privilege?
lol. I was going to build this. Even purchased the domain alert.now. I even have an active news-based implementation at alarms.global: if you install it to your phone as a PWA, you get push notifications when something important happens in your region, or it can notify you before public holidays.
I even have an automated x account @alarmsglobal
Imagine being an engineer on the Siri team, must be so demoralizing.
A glorified reminder? Really?
Sorry, I simply cannot use OpenAI because its leadership is kissing the ring of Trump.
Friend, I've got some news about the leadership of the majority of tech services you will use over the next 4-8 years...
This is going to eat software, and is the beginning of agents. The orchestrator of these tasks will come, and OpenAI will turn into a general purpose compute system, the endgame of workflow software. Soon there will be a database, and your prompts will be able to directly read and write to an openai hosted postgres instance. And your CRUD app will begin to disappear. Programming will feel pointless
Possibly, but that's going to require 100% consistent, accurate outputs (tricky as that's not the nature of LLMs).
Otherwise, you'll have a lot of systems dependent on these orchestrators creating hard-to-debug mistakes up and down the pipeline. With software, you can reach a state where it does what you tell it to without having to worry if some model adjustment or API change is going to break the output.
If they solve that, then yes. Otherwise, what I personally expect is a lot of businesses rushing into implementing "agents" only to backpedal later when they start to have negative material effects on bottom lines.
It's inevitable. You can argue about what's possible right now, but I'm not looking at it from that angle. I think these issues will be solved with time.
They are using infinity compute and can’t do simple notifications. How will changing the architecture slightly or ingesting more data change that?
That belief is at odds with the mechanics of how LLMs work. It's not a question of more effort/investment/compute/whatever, it's just a reality of how the underlying systems work (non-deterministic). If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.
People want us to be at "Her" levels of AI, but we're at a far earlier stage. We can fake certain aspects of that (using TTS), but blindly trusting an AI to run everything is going to be a big mistake in the short-term. And in order for the inevitability of what you describe to take place, the predecessor(s) to that have to work in a way that doesn't scare people and businesses away.
The plowing of money and hype into the current forms of AI (not to mention the gaslighting about their ability) makes me think the real inevitability is a meltdown in the next 5-10 years which leads to AI-hesitancy on a mass scale.
Have you tried o1 pro? I find people that are making these assertions are not deeply using the models on a daily basis. With each new release, I can see the increase in capability, and can do more things. I have written software in the last year that is at a level of complexity beyond my skill set. I have 15 years of SWE experience, most at FAANG. You just aren't close enough to the metal to see what's coming. It's not about what we have now, it's about scaling and a reliable march of model improvements. The code has been cracked: given sufficient data, anything can be learned. Neural networks are generalized learners.
Yes, I use LLMs every day. Primarily for coding (a mix of Claude and OAI). I was trying to implement a simple CSS optimization step to my JS framework's build system last night and both kept hallucinating to the point (literally inventing non-existent APIs and config patterns) where I gave up and just did it by hand w/ Google and browsing docs.
The problem with your "close to the metal" assertion is that this has been parroted about every iteration of LLMs thus far. They've certainly gotten better (impressively so), but again, it doesn't matter. By their very nature (whether today or ten years from now), they're a big risk at the business level which is ultimately where the rubber has to hit the road.
Yeah I don't think we're going to come closer to a real AGI until we manage to make a model that can actually understand and think. An LLM sounds smart but it just picks the most likely response from the echoes of a billion human voices. I'm sure we'll get there but not with this tech. I'm pretty sure even OpenAI said this with their 5 steps to AGI: LLMs were only step 1. And probably the part that will do the talking in the final AI but not the thinking.
At the moment people are so wooed by the confidence of current LLMs that they forget that there's all sorts of types of AI models. I think the key is going to be to have them work together, each doing the part they're good at.
> An LLM sounds smart but it just picks the most likely response from the echoes of a billion human voices.
This is where reasoning models come in. Train models on many logical statements then give them enough time to produce a chain of thoughts that’s indistinguishable from “understanding” and “thinking”.
I’m not sure why this leap is so hard for some people to make.
I personally don't think that will go very far. It's just a way of extracting a little bit more out of a technology that's the wrong one for the purpose.
We just are not on your level of genius to understand these things.
So obviously completely full of shit.
Broadly I agree with your position, but:
> If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.
Human brains have a much smaller context window than AI do. We can't pay attention to the last 128,000 concepts that filtered past our sensory systems — our conscious considerations are for about seven things.
There's a lot of stuff that we don't yet understand well enough to reproduce with AI, but context length is the wrong criticism for these models.
> context length is the wrong criticism for these models
You're right. What I'm getting at is the overall speed, efficiency, and accuracy of the storage, retrieval, and processing capability of the human brain.
It's kinda crazy that it can run on a few slices of bread when LLMs need kilowatts of power to write a simple paragraph :)
Why? Past progress =/= equal rate of future progress.
Sure but do they have a moat here? Anyone that can connect to an LLM could make that app.
Yes, they have the name "ChatGPT". For non-technical people this appears to be the most important thing.
Is it a household name? Anecdotally, only two of my five millennial/gen-z siblings use an AI app at all, and one of them calls hers "Gary" instead of ChatGPT. I'd be interested in seeing some actual data showing how much ChatGPT is an actual household name versus one that we technical people assume is a household name due to its ubiquity in our space.
> Is it a household name?
I think it is, yes.
It was interviewed under that name on one of the UK's main news broadcasts almost immediately after it came out. Few hundred million users. Anecdotes about teachers whose students use it to cheat.
But who knows. I was surprising people about the existence of Wikipedia as late as 2004, and Google Translate's augmented reality mode some time around the start of the pandemic.
Does AWS have a moat on cloud computing?
Yes, it would take tens of billions of dollars to recreate the infrastructure as far as servers go, and AWS has its own pipelines running under the oceans.
Then you have to recreate all of the services on top of AWS.
Then you have to deal with regulations and certifications.
Then you have to convince decision makers to go against their own interests. “No one ever got fired for choosing Amazon”.
Then you have to convince corporations to spend money to migrate.
Yes, that requires huge infrastructure investments. Creating an LLM requires huge investments. Running an LLM requires medium to big investments, but using one remotely requires very little investment.
This significantly overestimates the reliability of LLMs -- both their output integrity and their ability to understand context.
Bit of advice: you might want to actually use an offering before claiming it is revolutionary.
I've got 15 years of engineering experience, worked on some of the largest distributed systems at FAANG. It's coming.
> worked on some of the largest distributed systems at FAANG.
As have 10s of thousands of other people who could invert a btree on the whiteboard….
Oh wow good for you! Didn't realize you were a prodigy or that this was a contest. I take it all back. /s
Maybe try some humility. You're not helping yourself with the bragging about frankly underwhelming and common (here) experience.