Reminiscing CGI Scripts (rednafi.com)
61 points by signa11 on Dec 27, 2023 | 58 comments


Wow, I haven't written a CGI script since before Christmas.

Stuff still works just fine, a nice simple way to expose things to users without them having to ssh into a server.

Not everyone is trying to build the next ad-laden POS unicorn financed by abusing millions of users for a couple of cents. Some are just trying to make people's lives a little easier and their jobs a little more efficient, and throwing some Perl, Python, PHP or whatever together often accomplishes that just fine.


Amen


CGI is still alive and well, just not popular. The Fossil SCM (https://fossil-scm.org/home) is one big dual-mode CGI/CLI app, written in C. That's what runs the sqlite source code repo (https://sqlite.org/src) and forum (https://sqlite.org/forum), too.


Fossil can work as a CGI app; it also acts as an HTTP server (which I recommend; put it behind an HTTP relay if you want), or, if you are weird, it can run as an SCGI server. Weird because Fossil is the only SCGI implementation that I know of.


> Fossil can work ... as an HTTP server

FWIW/FYI, Fossil's developers do not typically run it that way - we use its builtin server primarily for the "ui" command and ad-hoc syncing across systems where setting up a web server is unnecessary or undesirable. The public-facing Fossil instances, for the core project and all of its sibling projects[^1], have Fossil running as a CGI.

That's not to say that you cannot or should not run the Fossil standalone server, just that those closest to the project typically do not run it that way (though I believe that Warren does, via Docker, behind an nginx reverse proxy?).

[1]: that is, all projects headed up by Richard Hipp.
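
For reference, a Fossil CGI launcher is just a tiny script the web server executes; per the Fossil docs it is the fossil binary as interpreter plus a pointer at the repository (path here is illustrative):

    #!/usr/bin/fossil
    repository: /home/www/repos/project.fossil

That's the whole deployment: the web server runs the script, and the fossil binary handles the rest of the request over stdin/stdout, CGI-style.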


The elegance of CGI was in generically plumbing HTTP requests through to the standard Unix environment whilst facilitating arbitrary server metadata. The reason it fell out of fashion was partly its objective limitations: speed, security, and no standard method of debugging. But it was probably also, in large part, a generational shift toward interpreted languages, taken up by new developers based on and used to non-Unix systems, who quite pragmatically saw the server environment as a black box. They were consequently motivated to use simpler, more standard environments, such as the one provided by PHP and its extensions, in lieu of tediously hand-configuring every server path in every script via telnet (and later ssh). Their ISPs and web hosts, aware of the security challenges of granting Unix shells to inexperienced users, encouraged this, and consequently embraced tooling such as cPanel to reduce the support load on their staff and spin up users faster.

Having cPanel-style user instantiation and management functions then became more and more critical as IP addresses ran out and virtual hosting took hold - the amount of manual config tweaking otherwise required (vhost, database, filesystem, (s)ftp access, etc.) exceeded a rational human workload for medium sized service providers, especially those with large numbers of users spread across an array of non-uniform infrastructure.

By the time the 'cloud' arrived, this stuff at typical ISPs and web hosts had become a mish-mash of band-aids so advanced that in hindsight we might well wonder why it took so damn long for the paradigm to shift.


CGI offered much better security than the alternatives that replaced it.


How? CGI runs with the same permissions as the HTTP server; if you need to run privileged operations you have to let it sudo. FastCGI is safer.


It runs in a separate process, and is therefore isolated from the HTTP server. Making it run as the owner of the file is also quite common and trivially achieved (suExec).

mod_php, on the other hand, runs inside the HTTP server process and with the same uid and capabilities, with no way to change that.
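
A hedged sketch of that setup under Apache with mod_suexec (user, group, and paths invented):

    <VirtualHost *:80>
        ServerName example.com
        # CGI programs under this vhost run as alice, not as the server user
        SuexecUserGroup alice alice
        ScriptAlias /cgi-bin/ /home/alice/cgi-bin/
    </VirtualHost>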


Persistent servers at least moved the goal-posts for these problems. With stateless HTTP, maintaining state somewhere was always part of the challenge, so CGI programmers would be tempted to put too much into client-side cookies. That meant the user could see and tamper with that info. On the other hand, the brief existence of a CGI program meant that the OS cleaned up much of what the programmer may not have.


Many of us that old remember independently implementing session engines before they were standard features in the various web-oriented languages of the era (mostly perl and php)... this solved the issue of trusted client-side state by isolating the shared secret to a session identifier and storing real state server side, usually in a file or database. Naïve implementations did not, however, solve for issues such as locking, enforced sequencing for multi-stage actions, token loss, or multiple sessions or clients under an individual account, leading to a plethora of web engine state related bugs, some of which probably re-emerge to this day in novel and embedded session engines.
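
As a minimal illustration of that pattern, here is a Python CGI sketch of such a session engine: the client only ever holds an opaque id, and the real state lives server-side. The file layout and names are invented, and it deliberately omits the locking, expiry, and sequencing concerns mentioned above.

    #!/usr/bin/env python3
    import http.cookies, json, os, secrets

    SESSION_DIR = "/var/tmp/sessions"

    def load_session():
        cookie = http.cookies.SimpleCookie(os.environ.get("HTTP_COOKIE", ""))
        sid = cookie["sid"].value if "sid" in cookie else None
        if sid and sid.isalnum():  # reject ids that could escape SESSION_DIR
            path = os.path.join(SESSION_DIR, sid)
            if os.path.isfile(path):
                with open(path) as f:
                    return sid, json.load(f)   # known session: load its state
        return secrets.token_hex(16), {}       # new session: fresh state

    def save_session(sid, state):
        os.makedirs(SESSION_DIR, exist_ok=True)
        with open(os.path.join(SESSION_DIR, sid), "w") as f:
            json.dump(state, f)

    sid, state = load_session()
    state["hits"] = state.get("hits", 0) + 1   # real state stays on the server
    save_session(sid, state)

    print(f"Set-Cookie: sid={sid}; HttpOnly")  # only the opaque id goes out
    print("Content-Type: text/plain")
    print()                                    # blank line ends the CGI headers
    print(f"Visits this session: {state['hits']}")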


Old head here. CGI scripts fell out of fashion for the same reason that PHP exists: people wanted a slightly easier, more flexible way to write and update web apps. It wasn't really performance or security as much as the fact that C is annoying to write and debug a web app in.

> Modern application servers like Uvicorn, Gunicorn, Puma, Unicorn or even Go’s standard server have addressed these inefficiencies by maintaining persistent server processes.

PHP and Perl were doing this 20 years ago, slightly better / more flexibly than these modern app servers. Apache2 and mod_php or mod_perl would let you choose between a threaded or forked model and give you the same choices you have today. New languages are just more trendy so they reinvent the wheel in them, but it's the same old thing.

Another advantage of using Apache was you had a ton of modules, could use multiple languages to write different web apps in one "app server", and serve both static and dynamic content out of the one server, or choose to move static out to a separate array of servers to lighten load. Today you'd use Nginx for your static content and Unicorn for your dynamic, so there's more and different moving pieces to do the same thing, and you lack features in both.

And people today whine a lot about platforms and want to run their own VPS just to serve a CRUD app. But you know what we used to do? Pay $2 a month to someone who managed a web server for us, giving us a /cgi-bin/ we could upload our code to, along with .htaccess files to control Apache. Totally managed server, with a management web interface, and all you do is upload a script and you have a website. No server to maintain, no operations, just pure web dev. Now, they were laughably insecure, and things like Cron jobs and persistent dynamic applications kind of didn't exist. But as they say for VPSes today, "you probably don't need all that, you just need to run a basic web app", and a PHP or Perl script was good enough for 99% of the web.


> the fact that C is annoying to write and debug a web app in.

Weren’t most CGI scripts written in Perl? At least that’s what I remember.


PHP took the market away from Perl from the late 90s onward as it got more features. There were still tons of Perl CGI scripts, but PHP was miles easier, and it eclipsed Perl in adoption many times over. By the mid-2000s, a very large site written in Perl was oddly antiquated (the P in "LAMP" changed from Perl to PHP).

PHP was basically the server-side JavaScript of its time. But as programming trends changed, and more and more went "client-side", more JS frameworks popped up, until Node appeared, and then there was no need to learn two languages to make a web app. Python and Ruby had their time, but neither were as dominant as PHP was or JS became.


Or bash. Or python.


Yes, thank you. I was reading through this and just thinking... neither of those things is true. Especially since, by the tail end of their popularity, both issues had mostly been mitigated. FastCGI solved the issue of process startup latency by keeping worker processes running, and security was mostly a solved problem via user process isolation (at least on *nix hosts; I'm not sure about IIS CGI or ISAPI). Especially since mod_php was far better known for its security issues in those days.

The switch was 100% made due to ease of use and lack of desire to learn/manage/debug server architecture; the slight performance boost modules offered was a small cherry on top.



> ... and things like Cron jobs ... didn't exist

Funnily enough: I once maintained a site for a client whose hoster implemented cron jobs as client-defined URLs, which could point to your CGI or PHP scripts. The provider would hit the URL (which was presumably private) on a cron-style schedule, providing a castrated semblance of cron jobs.
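
On the hoster's side that scheme is just a crontab line fetching the client's URL (schedule and URL invented):

    */15 * * * * curl -fsS 'https://example.com/cgi-bin/nightly.pl?key=SECRET' >/dev/null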


That is still how my private site works, and how I do side gigs for Web development.


"why they didn't catch on" is sort of false because they were very dominant for many years. It might predate the author but we were doing a lot of CGI with ruby and PHP before the days of nginx or the fpm patch. In a time when Apache httpd lightpd were dominant. When BerkeleyDB was a standard. I think generations post a technology make some assumptions about their popularity given they were never the users of them. I assure you, perl that dominated all server scripting was primarily a CGI based model on the web. Eventually technologies die out to iteration in safety, security, ease of use and generally generational changeover in developers who always opt out of the current model for the new shiny thing. Anyway I'm hoping for the revival of CGI mostly because it just make sense.


That security issue is not particular to CGI. It is an application code problem. For example, the XSS risk is the same.

In general, CGI is more secure.


> Why they didn’t catch on

They actually did; they are called serverless nowadays, and the more the merrier, taken up by a generation that didn't understand why we moved away from them.


Could you elaborate? Moving away, that is.


They touch on that in the article, regarding why they didn't catch on.

Hence why we moved on to application servers built on top of Apache, IIS, Tomcat, and so on.

Basically, what people end up doing to try to improve their serverless workloads and reduce their execution costs is reinventing application servers, poorly.

Stuff like: let's put serverless in containers, managed by Kubernetes, compiled into WASM, for example.


WAGI and WCGI are the WASM based spiritual successors.

https://github.com/deislabs/wagi

https://wasmer.io/posts/announcing-wcgi


It's not yet dead, at least for me. Last year I wrote a Perl CGI script to generate a random flag and country name:

https://www.thran.uk/cgi-win/fleg.pl

Source: https://github.com/lordfeck/cgi-win

Needs improvement but it does the job, and I had fun.


The author accidentally left in a bit of ChatGPT's response after "here's the plan."


Which part are you referring to?


I would like a super lightweight host process for interpreters that is similar to php-fpm and Erlang but can have shared state for database connections but the main interpreter is not a separate process but simply a function call.

Serverless and WASM runtimes are similar, and Firecracker is in this space.

It would have to be multithreaded, it could communicate with nginx via domain socket as php-fpm does. I don't know if there's a way to multiplex data over a domain socket; maybe you could use multiple domain sockets and load-balance between them? Or custom framing.
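
For what it's worth, the "multiple domain sockets with load balancing" variant can be sketched in stock nginx config, since upstream servers may be unix: sockets (paths invented):

    upstream app_workers {
        least_conn;                         # send work to the least-busy socket
        server unix:/run/app/worker1.sock;
        server unix:/run/app/worker2.sock;
    }
    server {
        listen 80;
        location / {
            include fastcgi_params;         # pass request metadata, CGI-style
            fastcgi_pass app_workers;
        }
    }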

Does head of line blocking come into this too?


>I would like a super lightweight host process for interpreters that is similar to php-fpm

This. It's an area where PHP really pulls ahead. php-fpm has preloading, persistent DB connections, worker pre-warming, etc. all built in; all of these optimisations combined make CGI very viable.
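
For example, a php-fpm pool keeps a warmed, capped set of persistent workers with a few directives (values illustrative):

    ; php-fpm pool sketch: persistent, pre-warmed workers
    pm = dynamic
    ; workers warmed before the first request
    pm.start_servers = 4
    pm.min_spare_servers = 2
    pm.max_spare_servers = 6
    ; hard cap on concurrent workers
    pm.max_children = 16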


> can have shared state for database connections but the main interpreter is not a separate process but simply a function call.

> It would have to be multithreaded, it could communicate with nginx via domain socket as php-fpm does.

If we define the communication with nginx over domain socket to be HTTP/1.1, then you have basically reinvented Rails / Flask / Django / Hyper / ... - you can define each script as a Python/Ruby/Rust function and everything runs in shared processes so you can cache database connections and other things if you fancy.
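
A hedged sketch of that equivalence: each "script" becomes a function in a persistent process, and the process can listen on a unix domain socket for nginx (module and socket names invented):

    # app.py -- run behind nginx with: gunicorn --bind unix:/run/app.sock app:app
    from flask import Flask

    app = Flask(__name__)
    POOL = {}  # e.g. cached DB connections; survives across requests

    @app.route("/hello")
    def hello():
        # the "interpreter" is now just a function call in a long-lived worker
        return "Hello from a persistent worker\n"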


Isn't this FastCGI? Or a Python app server?


Oh, weird. I was around before CGI scripts became commonplace on the web, and worked on a handful of projects in that space -- mostly in Perl, but also a bit in ColdFusion (shudder). I think there are some inaccuracies here, but didn't expect to read about someone exploring "old" CGI script technology today.

> CGI scripts mostly went out of fashion because of their limitations around performance.

Nah, performance was fine. It's mostly always been the case that the code running on the server isn't the bottleneck.

CGI scripts went out of fashion for several reasons that fall into the categories "too complex to maintain" or "there's this new thing called PHP...".

CGI scripts were often written in Perl, but also sometimes good ol' C, and occasionally some other oddball thing. But, Perl ruled the CGI script space. Over time, CGI scripts became more complex, as software often does. When people wanted something more than a simple mail-form, or a page visit counter, or a guestbook (remember those?), they'd often be faced with downloading, installing, and then maintaining a complex Perl application, which was a nightmare. Wanna see what a big Perl application looks like? Check out https://github.com/movabletype/movabletype.

Perl just didn't lend itself to well-structured code. I say this as someone who really liked Perl, TMTOWTDI and all that, and resisted moving away from it for years. But really, once your CGI script started spanning multiple .pl files, you were gonna have a bad time.

The other thing we didn't have was good templating. Handlebars and the like just didn't really exist then. There were all kinds of ways to sort of gin up a templating system, but there were no standards and everything was homebrewed. This was really icky if you wanted to do something like make a site with a web editor -- forget all the wysiwyg stuff, just rendering the html in a clean and safe way and spitting it back out was a bit of a faff.

Along came PHP. It had a few advantages right out of the gate: (a) you could inline it in html, and I really can't overstate just how amazing that was at the time -- suddenly you didn't need templates anymore, you just used the html you already had; (b) it came with a good enough standard library of calls that were mostly comprehensible; and (c) contrary to Perl, which Perl hackers readily referred to as "line noise", PHP's syntax was pretty clean. (Younger developers may scoff at the idea of PHP being easier to read than any other language, but it was true at the time, and older developers may grumble that it was possible to write clean Perl, and that's true too, but it's also true that Perl culture encouraged and delighted at horrid and inscrutable gibberish.)

The one other thing PHP had going for it was mod_php, which was easy for sysadmins to install and worked right alongside mod_perl, which pretty much all of them knew how to install already. So, every little web host added support for PHP practically overnight.

> When a CGI script is executed, it initiates a new process for each request. While this approach is straightforward, it becomes increasingly inefficient as web traffic volume grows.

This is true, but largely irrelevant to why CGI scripts fell out of favor. Lots of people on-prem'd or colo'd their own stuff back then (I had a beige box on an ISDN once upon a time!), and it was really hard to get enough traffic to make a machine fall over because it was spawning too many processes. Usually your bandwidth would get saturated before that happened.

There was, maybe, a brief period where this was sort of a thing, where 56k modems were everywhere that DSL wasn't and people started to pay companies to run stuff for them, but even then -- as now -- the bottleneck was usually not in the number of running processes.

> Modern application servers like Uvicorn, Gunicorn, Puma, Unicorn or even Go’s standard server have addressed these inefficiencies by maintaining persistent server processes. This, along with the advantage of not having to bear the VM startup cost, has led people to opt for these alternatives.

Heh, heh. Some people may be using those because they read on somebody's blog that everybody else is using those so they should probably use those too, but I could become a wealthy man betting $100 to every dollar that any web dev who thinks switching from LAMP, php-fpm, nginx, or whathaveyou to Spangly Animal Server is gonna be their big performance win hasn't actually done a comprehensive performance profile of their application. You are burning waaaaay more milliseconds on your JS dependencies than you are on spawning a new thread.


> Perl just didn't lend itself to well-structured code. … once your CGI script started spanning multiple .pl files, you were gonna have a bad time.

You would split your app across .pm files (modules) not .pl (scripts). At the control layer this meant classes based in CGI::Prototype, CGI::Application or the like. At the model layer it would be DBIx::Class or Class::DBI. View would be Template. Then in each layer you’d have a hierarchy of classes like every other language (as .pm files).

Perl has its issues, but simply organizing code in a tidy way was not one of them. The problems had more to do with the readability of the code, the lack of basic OO facilities, and the awkward split between "references" (pointers) and direct values.

> Along came PHP. It had a few advantages right out of the gate: (a) you could inline it in html, and I really can't overstate just how amazing that was at the time -- suddenly you didn't need templates anymore, you just used the html you already had

While this was absolutely a win in terms of convenience when taking a site dynamic for the first time, and made PHP wildly popular especially for replacing old CGI scripts (vs. apps), it seems to have made PHP worse from a maintainability standpoint than well-written Perl when it comes to sizable web apps.

In terms of libraries Perl had a very nice standard lib plus CPAN where the code and docs were generally to a very high standard. I don’t think PHP won at a code architecture level, it won at the very things that tended to produce the messiest, most amateurish Perl code that gave the language a poor reputation - hacking up a quick and dirty solution. It was superb at that and I admire it for that. But let’s not overstate the thoughtfulness of early PHP. It was an abomination of a language in many ways.


Not the OP, but when I was doing CGI scripts, Perl 5 hadn't been released yet. And even when it was, it took a long while for people and systems to migrate. Quite a few of the early web apps were written in Perl 4 style.


Perl 5 has been around since late 1994.

I was already using it a year later, for our distributed computing labs, where doing a CGI in C and Perl was part of one exercise.


> CGI scripts went out of fashion for several reasons that fall into the categories "too complex to maintain" or "there's this new thing called PHP...".

This is how I remember it also. Doing CGI "well" was significantly harder, and new ways of making dynamic web content were rapidly coming out.

There were no HTTP server libraries in any language. All web servers were large codebases. Most web sites ran on Apache in multiuser environments. Configuring (and compiling) Apache correctly and securely took a bit of voodoo in those environments.

The security issues with CGI weren't just injection attacks. How should admins set up safeguards when every user could write a binary/script that remote people could execute? By the time best practices evolved, PHP was taking off. And PHP offered a little bit more of a sandbox.

And then... ColdFusion. Java Servlets. JSP. ASP. And many more ways of creating dynamic websites that were often easier, had better libraries tailored to webdev, and included better sandboxing.

Web servers started to become proxies to long running processes and stateful web apps became a thing. For better or worse. (for worse IMHO. :D)

CGI with Go or Rust could be pretty interesting and significantly easier than my first C-based CGI binaries. Mostly because of their extensive web-dev library options and dependency management tools.


Even in the early 2000s I was writing (new) projects in perl, using the excellent "CGI::Application" module/framework. That worked with both old-school CGI, and FastCGI.

https://metacpan.org/pod/CGI::Application

Using that module gave a good foundation for structuring code in a maintainable way, and also providing test-cases alongside it.

One of the things I like about coding in golang is the strong emphasis on testing, but I'd say that this was also a big deal for people writing in Perl. Sure, there is a reputation for line noise, but CPAN is/was full of well-tested modules and extensions, and I always appreciated the built-in TAP/prove support.

I like your comment about templating, it's something I'd never considered before. At the time I used HTML::Template, or some other module, and found it was "good enough".

At the time I remember bumping into a lot of PHP, but due to settings available to mod_php5 you'd find code that worked on one host didn't necessarily work on another host. That was something that was never a problem with perl. Though I suspect it wasn't so often that a site got so complex/slow that I had to resort to mod_perl.


CGI's performance may have been fine for Perl, which has very low startup overhead, but those who wanted to write apps in most other languages needed something else. And even then there was mod_perl, as you mentioned, which eliminated this overhead for Perl as well, suggesting that perhaps CGI performance was often not fine.


There were also FastCGI and mod_perl. Template::Toolkit was really nice. But yes, installing and configuring Perl apps could take days.


The Perl-to-PHP transition sounds like another "less is more" story.

I didn't know about the code readability issues, since the little Perl I've seen was more readable than the PHP I saw (the WP Sociable plugin still haunts me).

In the end, I thought it was mostly a mod_perl issue causing memory bloat and security issues (and thus costs).


The example code in the article is a bit weird. Why are the query parameters handled on a case-by-case basis in the cgiHandler function (in the web server)? CGI normally passes parameters to the script in the QUERY_STRING variable or on standard input for POST requests. See the spec[0] for details.

An older version of the article[1] also handled the headers in the web server instead of leaving it up to the script as you normally would with CGI scripts. It's really more of an example of executing a subprocess than implementing CGI.

Also, the "significant vulnerability to injection attacks" has nothing to do with CGI. It comes from taking plain text as input and then treating it as HTML without actually converting it to HTML. The solution is not to "sanitize" it but to encode it into the format you want to output.

[0] https://datatracker.ietf.org/doc/html/rfc3875.html

[1] https://web.archive.org/web/20231226033541/https://rednafi.c...
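
For reference, a spec-compliant script does that parsing itself. A minimal Python sketch (parameter name invented) that also encodes its output rather than "sanitizing" it:

    #!/usr/bin/env python3
    # Per RFC 3875: query parameters arrive in QUERY_STRING, POST bodies
    # arrive on standard input, sized by CONTENT_LENGTH.
    import html, os, sys
    from urllib.parse import parse_qs

    params = parse_qs(os.environ.get("QUERY_STRING", ""))

    if os.environ.get("REQUEST_METHOD") == "POST":
        length = int(os.environ.get("CONTENT_LENGTH") or 0)
        params.update(parse_qs(sys.stdin.read(length)))

    name = params.get("name", ["world"])[0]

    print("Content-Type: text/html")
    print()  # blank line terminates the CGI headers
    # Encode, don't "sanitize": the text survives intact but can't become markup.
    print(f"<p>Hello, {html.escape(name)}!</p>")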


I think the author missed the point of CGI. The point being that the web server doesn’t parse anything, it just passes the request to the CGI program via environment variables and standard input/output.
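
A sketch of that division of labor from the server's side (hypothetical helper; a real server also sets PATH_INFO, REMOTE_ADDR, and the rest of the RFC 3875 variables):

    import os, subprocess

    def run_cgi(script, method, query, body=b""):
        """Exec a CGI program: metadata in the environment, body on stdin."""
        env = dict(os.environ,
                   GATEWAY_INTERFACE="CGI/1.1",
                   REQUEST_METHOD=method,
                   QUERY_STRING=query,
                   CONTENT_LENGTH=str(len(body)))
        proc = subprocess.run([script], input=body, env=env, capture_output=True)
        return proc.stdout  # headers + body, produced entirely by the script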


My first CGI was using Informix-4GL. If you're familiar with 4GL, you might emote "you what!?". It was a very crude tech demo; it was the hammer I had and the shortest path to the database data, and it was not a good fit.

But that's kind of the point. Ye Olde "request -> stdin -> program -> stdout -> response" makes the computing piece very tech agnostic.

My second foray was using SIOD (Scheme in One Defun). It had marginal HTTP/CGI support. More than enough for my purposes: running, caching, and rendering SQL reports from text files to PDF so they can be downloaded.

Those were internal projects, nothing on the public internet. But it just helps to demonstrate the flexibility of CGI in a new world of interconnectivity when the tech and tools were still forming and cooling out of the heated plasma of the new age.


Very interesting read; like the author, I too recently went on almost exactly the same journey, except with Dart.

In my mind I was looking to do a bit of a compare and contrast exercise with a very modern solution like Envoy.


I've been funded (with later successful exit) on the basis of a quick'n'dirty demo that merely duct-taped CGI and sendmail.


Nothing wrong with CGI; it provides a distinct convention for interfacing a program with a web server. The real WTF is FastCGI. There is no good reason for FastCGI to exist, let alone be popular. It is almost like everybody got into the habit of thinking you needed CGI to hook your program to a web server. Then, when we wanted persistent processes for efficiency, there was a collective brain fart: we forgot that HTTP exists and went with an incompatible, worse protocol on the sole basis that it has CGI in the name.


The very first HTTP based application I inherited was an internal system served by Apache on a Windows "server" using CGI to a "backend" written in VB6...

We would write the code in VB, compile the exe file (files? I forget), and FTP them to somewhere in c:\program files\

Pretty sure that system was a result of "when all you have is a hammer" from the previous IT manager.


I suspect it may have worked better than the overcomplicated setups you see today where people are running Kubernetes and CDNs and whatever to serve the equivalent of a low-volume blog.


The line between overcomplicated and not is different for everyone. Spinning up a complete EKS cluster is simple, and it's something developers are familiar with deploying to.


I have a very similar experience to the parent's. Except I inherited the program after the original programmer was fired and the company hired two programmers to try to upgrade their backend to PHP. And then they hired me because some "consultant" (really just a cousin of a friend or something) told the company owner that they needed RoR, and so I was tasked with porting everything to RoR while the other two guys were still porting it to PHP...

Eventually the three of us managed to convince the owner that maybe RoR porting wasn't such a great idea and joined forces to make the PHP thing work.

It was the time when PHP had, basically, only Drupal and Joomla to organize a typical backend for this kind of business. Zend Framework was in its early stages but definitely hadn't caught on yet. Kohana and Laravel weren't around yet.

Now, I'm not really even a Web developer, and my familiarity with PHP is... not very deep. Before then, I'd only worked on Web projects that were in either Java, Python or RoR, and usually the Web part wasn't where I was assigned. I'd usually work on the company's internal infrastructure for the project and less on the project itself.

Anyways, the first thing that was drastically different in the existing CGI from anything I'd worked on before then was the "routing". I.e., all other Web frameworks tried to keep the definition of what's accessible on your Web site in one place, in some hierarchical fashion. This, I believe, is why ideas such as REST and Swagger caught on: conceptually they resembled the way developers thought about their Web applications. If there was a $site/user URL, then it was natural and expected that $site/user/shopping-cart would be where you'd find the information about what that user wanted to buy. CGI scripts, at least those I've inherited, had nothing of the sort. Since this was ASP Classic, it was just a directory with files named like shopping_cart.vb or checkout.js.

Why this was awful: in a hierarchical routing definition, a developer had no problem answering questions such as "is this endpoint supposed to be reachable?". Also, since these routing schemes often mapped to classes or some other kind of internal program hierarchy, it was quite obvious to the developer in what state (what session variables or cookies would be present) a certain endpoint might be visited. With CGI it was nigh impossible to tell whether it was legal for a particular page to be displayed when a particular session variable wasn't set.

Especially given the age of the project and the previous authors' lack of discipline and general understanding of version control, the project often contained multiple, slightly different copies of what conceptually would've been the same page. There'd be stuff like "shopping_cart Copy (1).vb", which you'd think was definitely garbage left over from someone accidentally copying files by dragging them with the mouse, and you'd be wrong... There'd also be "user_login_from_company_X_using_phone.js", where you'd later discover that company X had gone out of business at least five years ago.

A lot of these things are automatically prevented by using frameworks, where the authors would have to expend extra effort to cause the mayhem I've witnessed in this CGI application. While frameworks "stifle the creativity" of Web programmers by making them all follow the same beaten path, knowing the average quality of Web programmer output when it's not confined to the rules of such a framework, things go south very fast.

So, in practice, I'll say it's a good thing people don't use CGI as much anymore. The average programmer is a bad programmer. If you have commercial goals to hit, the loss of freedom and creativity is more than compensated for by better guarantees on the lower bound of product quality.


I learnt Perl to a fairly basic level, but enough to build quite complex CRUD applications and back ends for Flash apps. For a while I wrongly took CGI and Perl as pretty interchangeable in my mind, and in reality my actual usage of CGI probably didn't go much further than the shebang that invoked the Perl interpreter.

So this was quite an interesting read and long overdue!


I think the last time I wrote CGI was with Delphi 6 maybe. This article just brought up memories of old times.


I'm dying to know how Delphi fit into the CGI mix. Or are you just using Delphi 6 as an "era" marker? I too was using Delphi 5 and 6... and Kylix(!!!) during the CGI era, but all my web stuff was Perl/Python and some Tcl. Somehow the idea of piping CGI into a Delphi program never crossed my mind!


I write CGIs in Lua with OpenResty's content_by_lua_file to script stuff on my home server. It's easier than setting up the kind of environments I develop with for work, and it runs with little memory and few watts.
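
For the curious, that setup is roughly a location block pointing at a Lua file (paths invented):

    location /hello {
        default_type text/plain;
        content_by_lua_file /srv/lua/hello.lua;  # runs inside the nginx worker
    }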


Nowadays I run my 'cgi scripts' as service workers.


It is actually everywhere now, just under a fancier name. When you put CGI in a container, it is called AWS Lambda, Azure Functions, etc.


FCGI via mod_fcgid seems like a nice combination for most uses. mod_fcgid's process manager kills off unused FCGI processes, so they don't linger.
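
A sketch of the relevant mod_fcgid knobs (values illustrative):

    AddHandler fcgid-script .fcgi
    # cap on concurrent FastCGI processes
    FcgidMaxProcesses 32
    # idle workers are reaped after this many seconds
    FcgidIdleTimeout 120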

There are so many ways to do dynamic HTTP. The real win, though, is identifying how much can be done as client-side JavaScript, so that you don't have to dynamically generate a page and can just serve it as static.



