2022-09-23:

Crow HTTP framework use-after-free

vulnerability:crow

(Collaborative post by hebi and Gynvael Coldwind)

Crow is an asynchronous C++ HTTP/WebSocket framework for creating "flask-like" web services. In early August we discovered a pretty interesting use-after-free vulnerability. Since Crow takes advantage of the Asio library for asynchronous input/output operations, analysis of this vulnerability took a few long evenings since the cause was split between multiple interweaved tasks and callbacks. Eventually we traced the root cause to an interesting mismatch between two layers of code, one of which - the HTTP parser - was supporting HTTP pipelining (or rather was agnostic towards it, which resulted in pipelining being inadvertently supported), while the other - HTTP server logic - was not designed to take HTTP pipelining into account. This resulted in some interesting "race conditions" with one task "thinking" an HTTP connection was over (and deleting objects) while another still using them while processing a separate HTTP request.

One thing to note is that we never proved exploitability (as in: actual RCE, since it's pretty easy to just trigger this vulnerability to cause a DoS) due to ENOTIME, though we believe it should be possible, if highly complex.

The vulnerability in question was reported mid-August and fixed within 6 days.

CVSS, CVE, etc

Human readable details are in the next section.

  • CVE: CVE-2022-38667
  • CVSS 3.1: 8.1 High (AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H) [as originally reported]
  • CVSS 3.1: 9.8 Critical (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H) [as rated by NIST/NVD]

Timeline

  • 2022-08-11: Vulnerability discovered.
  • 2022-08-17: Vulnerability reported.
  • 2022-08-21: Public fix was proposed.
  • 2022-08-22: Public fix was merged in.
  • 2022-08-22: CVE requested and assigned.
  • 2022-09-23: Details were published.

Original report with details

*** Summary: Affected: Crow version 1.0+4 and older https://github.com/CrowCpp/Crow - maintained version (fork) https://github.com/ipkn/crow - original version Discovered by: hebi Gynvael Coldwind Method of discovery: Fuzzing, Manual Analysis (reading the code) "Crow is a C++ framework for creating HTTP or Websocket web services. It uses routing similar to Python's Flask which makes it easy to use. It is also extremely fast, beating multiple existing C++ frameworks as well as non C++ frameworks." (source: project's README.md) Crow version 1.0+4 and older are vulnerable to a use-after-free condition due to incorrect Connection object lifetime management (though partial HTTP pipelining support is the deeper underlying problem). On successful exploitation this can potentially lead to either Remote Code Execution or Information Disclosure (a failed exploitation will lead to a Denial of Service as the server will crash). See sections below for details. IMPORTANT: This vulnerability is reported under the 90-day policy (version 2021), i.e. this report will be shared publicly with the defensive community on 16th November 2022 if a patch/fix is not available by that time, or 30 days after the fix becomes available. For details please see: https://googleprojectzero.blogspot.com/2021/04/policy-and-disclosure-2021-edition.html *** Crow execution model and life of a request Note: If you understand Crow's execution model you can safely skip this section. Note: The description below is a bit simplified and contains only the information relevant to the vulnerability. Crow is using an asynchronous queued task execution model (implemented using the Asio C++ library - https://think-async.com/Asio/), which is not unlike typical JavaScript model. Technically there are actually N+1 task queues (where N denotes the number of threads chosen by the application developer) - one per each worker thread and an extra "main" queue for some timers, signals, etc. When a new client is connected, the connection (represented by a Connection object) is bound to a certain thread and its task queue. This means that all processing done by that connection is going to happen within the constrains of the same thread. In case the server handles more simultaneous connections than there are threads, each thread will handle tasks related to multiple connections. On the flip side, if there are very few connections, then basically a connection has a task queue (and thread) for itself. After the connection is established (and a Connection object made) a task tasked with receiving some data is added to the task queue. Once it completes, the completion callback is called (that's the lambda function inside the Connection::do_read() method in http_connection.h). When data is received this callback passes the data to the parser (call to parser_.feed() method; the implementation is split between parser.h and http_parser_merged.h). The parser (still operating within the same task) can then invoke a set of callbacks on various parsing events like method received, URL received, header name / value received, and so on. While all of these callbacks are technically processed within parser.h, some of them invoke methods of the Connection object. Sooner (error) or later (no error) one of these called methods will end up with a prepared ready-to-send response (e.g. a 404 response on error or 200 response with some app-generated data on success). In such case the response will either be immediately (synchronously) sent out or a new task will be added to the task queue to send the response out later and call a completion callback (see the lambda function inside the Connection::do_write()'s method). Once the parser returns, it might either require more data (in which case a new task to receive more data will be added to the task queue) or decide it's time to close the connection and finish the task. Next, if any data was asynchronously being scheduled to be sent out, a completion callback (do_write()'s lambda) might be called. The completion callback clears the response buffers and might close the connection as well if needed. The last task related to the given connection will delete the Connection object. To summarize, the list of tasks executed while processing a connection might look like this: async_accept → async_accept's completion cb (creates Connection object) → async_receive (receive HTTP request) → async_receive's completion cb (invokes parser/schedules sending) → async_write (send HTTP response) → aync_write's completion cb (deletes Connection object if needed). There might be more async_receive related tasks if data is trickling in slowly (multiple receives per HTTP request), and multiple async_write tasks if there are multiple responses to be sent out (Connection: keep-alive). As mentioned, the last task is responsible for deleting the Connection object. To control and decide when that should happen a set of two flags is used: Connection::is_reading Connection::is_writing If any receiving task is scheduled, is_reading flag is set. If any sending task is scheduled, is_writing flag is set. Once the tasks are complete and the connection is being closed, respective flags are cleared. In various places a method called Connection::check_destroy() is called, which checks the status of both these flags. If neither is set (i.e. both are cleared), then the Connection object is deemed to be no longed needed and is deleted. *** High-level root cause In essence, the vulnerability in this report boils down to incomplete HTTP pipelining support. When using HTTP pipelining the client sends multiple HTTP requests as part of the same connection one after another without waiting to receive HTTP responses for any of the previous requests (in opposition to the "normal" way of doing it, where only one request is sent, then the client waits for the complete response, and only then it proceeds with the next request). In theory this allows HTTP servers to process requests in parallel and then just send replies in order. Historically though HTTP pipelining proved to be tricky to implement correctly on both the client and server side. As such none of popular modern browsers have it enabled by default. In case of Crow the HTTP parser does support HTTP pipelining, or rather it continues to invoke callbacks for all HTTP packets that are present in the received TCP packet (perhaps it's more fair to say it's HTTP pipelining agnostic). That being said, the asynchronous Connection layer is unaware of pipelining, which leads to various logic errors born from more than one HTTP request being handled at the same time, while being limited to a state tailored to a single request and response. The simplest (and benign) way to demonstrate this is to create two endpoints that return 512KB of "1" and "2", as shown below: CROW_ROUTE(app, "/one") ([] { return std::string(1024 * 512, '1'); }); CROW_ROUTE(app, "/two") ([] { return std::string(1024 * 512, '2'); }); And then request data from both endpoints in the same TCP packet: GET /one HTTP/1.1 Host: example.com GET /two HTTP/1.1 Host: example.com In the response we would expect (in this order): 1. HTTP response header for endpoint /one. 2. 512KB of "1"s. 3. HTTP response header for endpoint /two. 4. 512KB of "2"s. Here's what we actually receive: $ cat packet.http - | nc -v 127.0.0.1 18080 | hexdump -C Connection to 127.0.0.1 18080 port [tcp/*] succeeded! 00000000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.| 00000010 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a |.Content-Length:| 00000020 20 35 32 34 32 38 38 0d 0a 53 65 72 76 65 72 3a | 524288..Server:| 00000030 20 43 72 6f 77 2f 6d 61 73 74 65 72 0d 0a 44 61 | Crow/master..Da| 00000040 74 65 3a 20 54 75 65 2c 20 31 36 20 41 75 67 20 |te: Tue, 16 Aug | 00000050 32 30 32 32 20 31 39 3a 33 36 3a 34 38 20 47 4d |2022 19:36:48 GM| 00000060 54 0d 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b |T..Connection: K| 00000070 65 65 70 2d 41 6c 69 76 65 0d 0a 0d 0a 31 31 31 |eep-Alive....111| 00000080 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111| * 00010000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.| 00010010 0a 63 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b 65 65 |.connection: Kee| 00010020 70 2d 41 6c 69 76 65 0d 0a 43 6f 6e 74 65 6e 74 |p-Alive..Content| 00010030 2d 4c 65 6e 67 74 68 3a 20 30 0d 0a 53 65 72 76 |-Length: 0..Serv| 00010040 65 72 3a 20 43 72 6f 77 2f 6d 61 73 74 65 72 0d |er: Crow/master.| 00010050 0a 44 61 74 65 3a 20 54 75 65 2c 20 31 36 20 41 |.Date: Tue, 16 A| 00010060 75 67 20 32 30 32 32 20 31 39 3a 33 36 3a 34 38 |ug 2022 19:36:48| 00010070 20 47 4d 54 0d 0a 43 6f 6e 6e 65 63 74 69 6f 6e | GMT..Connection| 00010080 3a 20 4b 65 65 70 2d 41 6c 69 76 65 0d 0a 31 31 |: Keep-Alive..11| 00010090 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111| * 00020080 31 31 31 31 31 31 31 31 31 31 31 31 31 31 0d 0a |11111111111111..| 00020090 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111| * 00080100 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111| 0008010d Note: The * sign above denotes a repeated row. To present the response in a form of the list, we get: 1. HTTP response header for endpoint /one. 2. 64KB (minus response header size) of "1"s. 3. HTTP response header for endpoint /two with Content-Length set to 0. 4. The remaining amount of "1"s. Pretty obviously responses to both packets were mixed, both in sending (HTTP response header of /two injected in the middle of /one's response) and in processing (Content-Length: 0 in /two's response). Note: As mentioned before, thankfully modern browsers disabled HTTP pipelining. Otherwise the above behavior under certain circumstances would allow for a UXSS (or straight-up JS injection). To summarize, the vulnerability discussed below exploits the fact that the Connection layer isn't aware it processes multiple requests before ever finalizing previous ones. *** Vulnerability (use-after-free) As mentioned before, if both the is_reading and is_writing flags are cleared, then the next call to Connection::check_destroy() will delete the Connection object. In case there would still be another task related to the Connection scheduled to be executed, this would lead to a use-after-free vulnerability (and potentially also a double-free in case Connection::check_destroy() is invoked again). One of the discovered ways to achieve this is to send two HTTP requests in one TCP packet: 1. The first requests any dynamic endpoint whose response is smaller than 1MB (required to fall into a specific code path in Connection::do_write_general()). 2. The second is both an invalid HTTP request (e.g. missing Host header) and is referring to a non-existing endpoint (to get a 404 error response). For example: GET /endpoint HTTP/1.1 Host: 127.0.0.1 GET /idontexist HTTP/1.1 The execution for the above example will flow as follows: async_accept → async_accept's completion cb → async_receive (receive HTTP request) → - The is_reading flag is set. async_receive's completion cb (lambda in do_read()) - The method parser_::feed() is invoked with the received data. - ... - Parser calls on_url callback, which in turn calls Connection::handle_url(), which establishes a route for /endpoint. - ... - Parser calls on_message_complete callback, which in turn invokes the endpoint's handler. - After the endpoint's handler finishes, the data is scheduled to be sent (i.e. a new "write" task is added to the task queue). This also causes the is_writing flag to be set. - ... parser continues with the second packet ... - Parser calls on_url callback, which in turn calls Connection::handle_url(), which establishes that there is no route specified for /idontexist. - This in turn causes a 404 packet to be scheduled to be sent to the client (i.e. a second "write" task is added to the task queue). This also causes the is_writing to be set (yes, it was already set). - The Parser notices there are no headers, which is invalid for HTTP/1.1 as at least Host: is required, and exists with an error. - Given Parser's error, the is_reading is cleared, the socket is closed, and this tasks finishes (note: Connection object is still alive, since is_writing is still set). → async_write (try to send first HTTP response) → aync_write's completion cb (lambda in do_write()) - The is_writing flag is cleared. - Since the socket was closed (and therefore sending failed), Connection::check_destroy() is called. - Since both is_reading and is_writing flags are cleared, the Connection object is deleted. - The task finishes. → async_write (try to send 404 HTTP response) → async_write's completion cb (lambda in do_write()) - Attempts to access the deleted Connection object, which by this point is already freed. Application crashes. In case of the example packet Crow will likely crash at that point, though it can potentially already crash when trying to execute the second scheduled async_write (since the buffers with the response data are already deleted). While we haven't attempted to exploit this vulnerability beyond triggering it, we believe successful use-after-free exploitation cannot be ruled out at this time. Exploitation would likely involve either using two Connections in the same thread (i.e. using the same task queue) with carefully chosen order of data transmission. Likely winning a race condition might be requires as well. Successful exploitation of this use-after-free vulnerability would lead to Remote Code Execution, i.e. executing arbitrary code chosen by the attacker in the context of the Crow app's process. *** Proposed fix: Given that HTTP pipelining support is not needed (browsers have it blocked, there is HTTP/2 which supports multiplexing), the best solution is to block HTTP pipelining altogether. This would likely require adding a mechanism to make sure that any evidence of multiple HTTP requests within a single parser_.feed() call (as indicated by the on_... callbacks being invoked) will result in the connection being dropped after the first reply, and no extra HTTP request will be processed. *** Other worrying patterns As mentioned before, the Connection::check_destroy() function potentially deletes the Connection object. Given this, if the object is deleted, no other operations can be performed on it (i.e. any other operation is technically a use-after-free). That said, there are 3 additional places where check_destroy() is called, but operations are still made on the object regardless. It must be said that we were not able to find a way to actually trigger these code paths, but these do look like accidents bound to happen some day, so perhaps some refactoring in the area might be in order. Place 1: End of do_write_static() function if (close_connection_) { adaptor_.shutdown_readwrite(); adaptor_.close(); CROW_LOG_DEBUG << this << " from write (static)"; check_destroy(); } res.end(); // Connection is potentially already deleted here. res.clear(); buffers_.clear(); } Place 2: End of do_write_general() for large (over 1MB) responses if (close_connection_) { adaptor_.shutdown_readwrite(); adaptor_.close(); CROW_LOG_DEBUG << this << " from write (res_stream)"; check_destroy(); } res.end(); // Connection is potentially already deleted here. res.clear(); buffers_.clear(); } } Place 3: do_write_sync()'s completion lambda (executed before function returns) Note: In this case the issue lies in functions calling do_write_sync() not being aware that the Connection object might be potentially deleted in this method. else { CROW_LOG_ERROR << ec << " - happened while sending buffers"; CROW_LOG_DEBUG << this << " from write (sync)(2)"; check_destroy(); return true; } }); Perhaps making check_destroy() return a boolean value indicating whether the object was or wasn't deleted might be a solution. That way the calling functions would know whether to immediately exit, or continue execution.

Add a comment:

Nick:
URL (optional):
Math captcha: 8 ∗ 4 + 1 =