Discussion:
[Dailydave] DARPA CGC Recap
Dan Guido
2017-04-04 04:56:10 UTC
Permalink
Hey DailyDave,

I wanted to share a keynote I delivered recently on the Cyber Grand
Challenge and the broader advances made lately in automated bug
finding. Dave was asking on Twitter whether anyone had released a
detailed teardown of the CGC final event, and I think my presentation
is the closest thing to it. It's pretty light, and might be fun to
watch on your way to Infiltrate.

https://blog.trailofbits.com/2017/02/16/the-smart-fuzzer-revolution/

Of course, DARPA has not released the raw data from the final event
yet, so it's impossible to produce the analysis that I know Dave is
looking for. Maybe soon?

Have fun at Infiltrate everyone. I'll see you there!

-Dan

Our original conversation on Twitter:
https://twitter.com/dguido/status/841705081988870145
David Manouchehri
2017-04-10 04:33:02 UTC
Permalink
DARPA released a couple of CFE datasets a few months ago.

https://cgcdist.s3.amazonaws.com/cfe/cfe-submissions.tgz
http://repo.cybergrandchallenge.com/cfe/

There are ~20k rcb binaries in there; what's still missing?

[image: Inline image 2]

The binaries look functional to me; I picked a couple at random and they
opened up fine.

[image: Inline image 1]
Chris Eagle
2017-04-11 06:33:39 UTC
Permalink
I don't speak for DARPA.

FWIW, various CGC final event data is available here:

http://repo.cybergrandchallenge.com/cfe/

In particular, the score_data.json files contained in the round-specific tar files in cfe-submissions.tgz allow you to see which teams fielded successful PoVs in each round.
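
For anyone who wants to pull those out without unpacking everything by hand,
here's a rough, unofficial sketch in Python. It assumes only the nesting
described above (round tars somewhere inside cfe-submissions.tgz, each holding
a score_data.json); the JSON field names aren't documented here, so it just
prints whatever structure it finds:

#!/usr/bin/env python3
# Unofficial sketch: walk cfe-submissions.tgz, open anything inside it that
# parses as a tar, and dump any score_data.json it contains. No schema assumed.
import io
import json
import tarfile

with tarfile.open("cfe-submissions.tgz", "r:gz") as outer:
    for member in outer.getmembers():
        if not member.isfile():
            continue
        blob = outer.extractfile(member).read()
        try:
            inner = tarfile.open(fileobj=io.BytesIO(blob), mode="r:*")
        except tarfile.TarError:
            continue  # not a round tar, skip it
        with inner:
            for m in inner.getmembers():
                if m.isfile() and m.name.endswith("score_data.json"):
                    data = json.loads(inner.extractfile(m).read())
                    print("%s :: %s" % (member.name, m.name))
                    print(json.dumps(data, indent=2)[:1000])  # peek at the shape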

Video of the dev team's CGC-related Shmoocon panel is here: http://bit.ly/2p1LGcb

Some summary stats:

There were 82 challenge sets fielded during CFE.
Vulnerabilities were proven in 20 of them.
Unintended vulnerabilities were found in at least 5 of those 20.
The majority of flaws found were stack overflows.
In my opinion, there was only one legitimate, successful heap corruption PoV. Keep in mind that all of the challenges used custom heap implementations that the competitors had not seen before the final event.

A browsable archive of CGC data will be available soon.

Many papers are in various stages of publication by competitor teams and DARPA's CGC team. These should shed a lot of light on what took place during the final event.

Regards,

Chris
Julio Auto
2017-04-11 14:36:44 UTC
Permalink
Thought I would add that Phrack has published a really nice paper by
Shellphish (one of the teams in the finals; they finished in 3rd place) on
CGC and their CRS (Cyber Reasoning System):
http://phrack.org/papers/cyber_grand_shellphish.html

That's the best technical write-up, to my knowledge, of the inner workings
of a top-notch CRS.

Julio Auto
Chris Eagle
2017-04-11 19:10:56 UTC
Permalink
If you want to be able to do all of the performance measurements, then yes, that code is missing. If you want to study the successful PoVs, then that code is not required. Most of them can be replayed on the publicly available VMs. However, some of them depend on the specific CPUID values returned by the CFE hardware, which you might need to emulate somehow. Even if all the code used to run the final event were released, the CPUID issue would continue to be a problem unless you are able to return the same CPUID values that the competitors saw during CFE.
* The kernel they ran the final event on
* The code they used to measure scores
This prevents a lot of analysis.
Dave Aitel
2017-04-20 19:51:53 UTC
Permalink
Ok, so the questions I have are still unanswered, I think, possibly because
answering them is a lot of work. But I think they're important.

1. Was there any REAL difference between the competitors? Everyone is all
"oooh, ahh" about Mayhem. But are there bugs or bug classes it can find that
the open-source Shellphish or ToB work cannot? I.e., is the final score
essentially noise for the thing we actually care about?
2. Is adding the SMT solver to the fuzzer 10% better, or ... 1%? Would we be
better off just special-casing certain things in the fuzzer?
3. What bugs could nobody find? Why?

-dave
David Manouchehri
2017-04-20 22:40:02 UTC
Permalink
Step 0.5: Figure out how to sanely sort and parse several thousand weird
CGC binaries.

e.g. CGC_Extended_Application.pdf is appended to the challenge binaries.
https://github.com/CyberGrandChallenge/cb-testing/blob/master/cgc-cb.mk#L194-L197
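If it helps anyone else getting started: per that Makefile, the PDF is just
concatenated onto the end of the ELF, so splitting on the first "%PDF" magic
is usually enough. The script below is my own unofficial sketch; the offset
heuristic could in principle false-positive if "%PDF" happens to appear inside
the binary itself.

#!/usr/bin/env python3
# Unofficial sketch: split a CGC challenge binary that has
# CGC_Extended_Application.pdf appended to it (cgc-cb.mk cats the PDF onto the
# ELF). Heuristic: truncate at the first "%PDF" magic. Writes <name>.stripped
# and <name>.appended.pdf next to the input file.
import sys

def split_appended_pdf(path):
    with open(path, "rb") as f:
        data = f.read()
    idx = data.find(b"%PDF")
    if idx <= 0:
        print("%s: no appended PDF found" % path)
        return
    with open(path + ".stripped", "wb") as out:
        out.write(data[:idx])
    with open(path + ".appended.pdf", "wb") as out:
        out.write(data[idx:])
    print("%s: %d bytes of binary, %d bytes of appendix" % (path, idx, len(data) - idx))

if __name__ == "__main__":
    for p in sys.argv[1:]:
        split_appended_pdf(p)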

Chris: Are there any official notes on the directory/file structure
of cfe-submissions.tgz?
Chris Eagle
2017-04-21 00:47:06 UTC
Permalink
As far as I know, there is no official document that describes the layout of that file. I can probably cobble together something unofficial.
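
In the meantime, here's a quick, unofficial way to get a feel for the layout
yourself: survey the archive and count members by top-level directory and
extension. It assumes nothing about what any of the names actually mean.

#!/usr/bin/env python3
# Unofficial sketch: summarize the layout of cfe-submissions.tgz by counting
# file members per (top-level directory, extension).
import collections
import os
import tarfile

counts = collections.Counter()
with tarfile.open("cfe-submissions.tgz", "r:gz") as tgz:
    for m in tgz.getmembers():
        if not m.isfile():
            continue
        top = m.name.split("/", 1)[0]
        ext = os.path.splitext(m.name)[1] or "(none)"
        counts[(top, ext)] += 1

for (top, ext), n in counts.most_common():
    print("%8d  %-40s %s" % (n, top, ext))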
David Manouchehri
2017-04-21 14:36:33 UTC
Permalink
Thanks Chris!

I'll pull up my old notes and put them on GitHub this weekend as a starting
point.

Ryan Hileman's usercorn is a great way to lower the barrier to entry for
getting CGC ELFs running: https://github.com/lunixbochs/usercorn (I know
this isn't news to any of you in the conversation; it's for the mailing
list lurkers.)

Related topic: Is anyone willing to mirror ~1 TB of CTF PCAPs for
long-term archival? Give me a ping if you can, or if you know someone who
might be able to.
Tyler Nighswander
2017-04-24 23:35:24 UTC
Permalink
Looks like a nice dump was just released that makes some of the CGC finals
info a little more friendly to look at: http://www.lungetech.com/cgc-corpus/
Doesn't answer all the questions I'm sure people have, but it looks like a
great start.

Kristian Erik Hermansen
2017-08-09 22:36:32 UTC
Permalink
A 2+ hour video recap was released with interesting visuals and technical
analysis:

Watch "Cyber Grand Challenge: The Analysis" on YouTube
http://youtu.be/SYYZjTx92KU

Jordan Wiens
2017-08-14 00:48:41 UTC
Permalink
Happy to answer any questions if there are any. (As best as I can remember,
anyway -- it's been a while since we first recorded it, and even longer
since most of the analysis.)

One of my favorite moments was when we found what looked like true
back-and-forth interaction between two of the CRSs. To be clear, we don't
know at all /why/ they behaved the way they did, since they were black
boxes from our perspective. Even some of the teams I've talked to after
the competition have no idea why their systems did what they did --
whether because of a lack of logging, or because the system architecture
made it difficult to introspect which component initiated which actions.

These two systems had multiple rounds of back-and-forth behavior where:

1) a stack-based BO was exploited against a service, and the payload
obfuscated the address of the flag page data it was stealing bytes from
(reading from the flag page was one mechanism for scoring).

2) a patch was submitted, in the minimum time possible, by the team being
scored upon; it generically protected the binary by remapping the stack as
non-executable (and made a few other changes as well -- they were all part
of the standard toolkit this team applied to some binaries).

3) the attacking team reformulated their payload to use ROP gadgets,
successfully evading the NX stack protection, but now exposing the "flag
page" address they were reading data from in cleartext on the wire.

4) the defending team deployed a network filter that fairly naively (but,
it turns out, effectively) blocked the first several bytes of the address
of the flag page, stopping the exploit (a rough sketch of that kind of
filter follows below).
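
For the curious, step 4 boils down to something like this. It's my own
reconstruction in Python, not the team's actual filter, and it assumes the
usual CFE flag page base of 0x4347C000; the point is just how little it takes
to break an exploit that leaks that pointer in cleartext.

# Rough reconstruction (mine, not the team's actual rule): drop any traffic
# that contains the high-order bytes of a pointer into the flag page region.
import struct

FLAG_PAGE_BASE = 0x4347C000  # assumed CFE flag page base

def looks_like_flag_page_pointer(payload: bytes) -> bool:
    # The two high-order bytes of any little-endian pointer into 0x4347xxxx.
    prefix = struct.pack("<I", FLAG_PAGE_BASE)[2:]  # b"\x47\x43"
    return prefix in payload

def filter_packet(payload: bytes) -> bool:
    """Return True to forward the packet, False to drop it."""
    return not looks_like_flag_page_pointer(payload)

# A payload that leaks a raw flag-page pointer in cleartext gets dropped:
assert filter_packet(b"GET / HTTP/1.1") is True
assert filter_packet(b"AAAA" + struct.pack("<I", 0x4347C123)) is False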

And all of it happened in less time than it would take even very good human
exploiters to land the bug in the first place (at least when forced to work
with unfamiliar tools in a stressful environment). We actually have
reasonably good data on that from last year's Infiltrate NOPCert challenge.

Dave Aitel
2017-08-17 19:28:29 UTC
Permalink
I just want a list of which vulnerabilities were exploited by which engines
and in what round + all the vulnerabilities in source (which is in the repo
I think). :)

In a way, having them be able to SEE people throw vulnerabilities at each
other corrupts the data a bit, I think, because you no longer know what they
FOUND versus what they SAW, if that makes sense?
-dave
Jordan Wiens
2017-08-17 19:41:06 UTC
Permalink
Bit of a crappy format, but here's a screenshot from the trace-api tool I
linked to in my other email that shows all the PoVs from each team (it's
sorted by the ones CRSPY got because I happened to have them selected),
but it shows all teams; just look for any non-"1" score.

Anyway, this, plus the source itself, is a starting point. The source
includes ifdefs around all the intended vulns. Of course, not all PoVs were
the intended ones. We did some analysis, but I forget the numbers offhand.

[image: Inline image 1]
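
If you want to regenerate the list of intended bugs from the released sources
yourself, the guards are greppable. Rough sketch below; the PATCHED /
PATCHED_<n> macro spelling is from memory of the samples repo, so double-check
it against whichever tree you're using.

#!/usr/bin/env python3
# Unofficial sketch: walk a checkout of the released challenge sources and
# list the lines guarded by PATCHED / PATCHED_<n> macros, i.e. the intended
# vulnerabilities.
import os
import re
import sys

GUARD = re.compile(r"#\s*(ifdef|ifndef|if\s+defined)\b.*PATCHED")

root = sys.argv[1] if len(sys.argv) > 1 else "."
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        if not name.endswith((".c", ".cc", ".h")):
            continue
        path = os.path.join(dirpath, name)
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if GUARD.search(line):
                    print("%s:%d: %s" % (path, lineno, line.rstrip()))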