
Introducing Daala part 2: Frequency Domain Intra Prediction

Now up, part two of the introduction I'm writing for Xiph's upcoming video codec Daala. The fact that we're using lapped transforms means we've had to apply a little cleverness to intra prediction, and so we've opted to do it in the frequency domain...
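Loosely, "intra prediction in the frequency domain" means predicting a block's transform coefficients directly as a trained linear function of neighboring blocks' coefficients, instead of extrapolating neighboring pixels as most codecs do. A minimal NumPy sketch of that idea follows; the function name, dense weight matrices, and shapes are illustrative assumptions, since Daala's actual predictors are sparse and trained per prediction mode:

```python
import numpy as np

def predict_block_freq(up, left, weights_up, weights_left):
    """Predict a 4x4 block's transform coefficients as a linear
    combination of the up and left neighbors' coefficients.

    up, left:                 4x4 arrays of neighbor coefficients
    weights_up, weights_left: 16x16 trained weight matrices
    (Illustrative sketch only; real predictors are sparse, per-mode.)
    """
    # Flatten both neighbors' coefficients into one input vector.
    x = np.concatenate([up.ravel(), left.ravel()])            # shape (32,)
    # Stack the two weight matrices side by side.
    W = np.concatenate([weights_up, weights_left], axis=1)    # shape (16, 32)
    # One matrix-vector product yields the predicted coefficients.
    return (W @ x).reshape(4, 4)
```

The key property is that the whole predictor is one linear map, so it composes naturally with the lapped transform without ever needing reconstructed pixels on the block boundary.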


It took me a bit to understand why TrueMotion is 10th if it was added to 8 existing coding modes. A small tweak might make it obvious right away that the 8 modes are in addition to DC prediction.

Also, your prediction error looks like the output of a highpass filter. Do you think you can predict the high frequencies better? Is inter prediction going to take care of it, or maybe intra can do better? I don't suppose Opus-style HF reconstruction would work well on images?

A few of my proofers mentioned this actually, and I simply forgot to make the tweak. Thanks for reminding me, I've made that clearer.

There's probably room to improve every aspect of the prediction yet. HF is inherently harder to predict well because high-frequency features are usually compactly localized. E.g., the VP8 predictor isn't actually doing better than Daala at 4x4 and 8x8 HF prediction; the nature of the artifacts is just very different.

Edited at 2013-07-25 06:18 am (UTC)

I was thinking: isn't the encoder trying to use bigger block sizes for smooth parts of the image, only switching to 4x4 when there's some detail? The decoder could then infer an edge from the fact that it sees a 4x4 block, with features copied from some neighbors. In other words, does it make sense to train the predictor not by splitting whole images into uniformly sized 4x4 or 8x8 blocks, but by choosing block sizes as the encoder would, since the choice of block size correlates with the data?

the short answer to all of the above is yes :-)

>the short answer to all of the above is yes :-)
And does this mean that it's already done or only planned to be done?

Both; the blocksize choice algorithm is as new and untrained as the predictors AFAIK, and the modes should be trained for the blocksizes that actually get chosen. That's not yet been done for the predictors as demonstrated.
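For a sense of what "training the predictors" could mean in the simplest case, here is a hedged sketch: a plain least-squares fit of the linear weights from (neighbor coefficients, true coefficients) pairs gathered from training images. The function name and the unconstrained least-squares objective are assumptions for illustration; the real training would also have to encourage sparsity and be done per prediction mode and per block size:

```python
import numpy as np

def train_predictor(neighbors, targets):
    """Least-squares fit of a linear frequency-domain predictor.

    neighbors: (N, K)  matrix, one row of neighbor coefficients per sample
    targets:   (N, 16) matrix, the true 4x4 coefficients to predict
    Returns W with shape (16, K) minimizing ||neighbors @ W.T - targets||.
    (Illustrative sketch; real training adds sparsity constraints.)
    """
    W, *_ = np.linalg.lstsq(neighbors, targets, rcond=None)
    return W.T
```

Training on blocks at the sizes the encoder actually chooses, rather than on a uniform tiling, just changes which (neighbors, targets) pairs go into this fit.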

The other part of the 'long answer' is that we're not even sure (and it's probably unlikely) that we'll use a straight prediction system with 4x4 predictors for 4x4 blocks, 8x8 predictors for 8x8 blocks, and 16x16 predictors for 16x16 blocks. There are a couple of reasons why that strategy is not likely to be optimal (for one thing, 16x16 predictor matrices are big), and we're also looking at several strategies for using 4x4 for everything via TF. But TF is the subject of the next demo.

Edited at 2013-07-26 08:58 pm (UTC)

Nice to read that Daala is sharing some similarities with Opus.

With every Git update I check how the quality has improved on a 10-second sample I have: a simple frame comparison against Theora and the previous Git revision of Daala.
Daala's output quality is improving quite well.

After re-encoding my DVDs to Theora-Opus I think I'll have to encode the next ones to Daala-Opus ^_^

...but a ways off as yet

Well, you'll have a few years yet before you need to worry about that.

I'm trying to keep interest up and document the potential future state-of-the-art for anyone who would like to get involved... not so much start any sort of 'any day now!' expectations. Our target for beta is 2015.

Error in VP8 samples

All of the VP8 "Prediction Error" images show an error, all right - the X and Y axes of each block have been swapped! It's especially visible when 16x16 blocks are selected.

Re: Error in VP8 samples

...huh, it certainly does. How very interesting. I wonder how that crept in. Well, time to go file a bug report...

[edit: fix pushed, corrected images uploaded]

Edited at 2013-07-27 09:39 am (UTC)

Very nice. Is it just me, or does Daala's predictor tend to lose more fine detail while preserving the general structure of the image, while VP8's predictor tends to lose small objects altogether if they straddle the border of two blocks? Mario is a good example of what I'm talking about. It'll be interesting to see which turns out to be the worse error.

It certainly does right now, but the predictors as demonstrated really are little more than early proof that this idea can work. I'd expect the behavior to change markedly before anything is finalized.

