Home > reduc > calcs > remove_background.m

remove_background

PURPOSE ^

MSBACKADJ provides background correction for a signal with peaks

SYNOPSIS ^

function Y = remove_background(X,Y,varargin)

DESCRIPTION ^

MSBACKADJ provides background correction for a signal with peaks

   YOUT = MSBACKADJ(X,Y) adjusts the variable background (baseline) of a
   signal with peaks by following three steps: 1) estimates the background
   within multiple shifted windows of width 200 separation units along the
   x-axis, 2) regresses the varying baseline to the window points using a
   spline approximation, and 3) adjusts the background of the input signal Y.  

   X and Y are column vectors where paired values represent points in the
   signal. Y can be a matrix with several signals, all sharing the same X
   scale. Units in the X scale (separation units or s.u.) may quantify
   wavelength, frequency, distance, time or m/z depending on the type of
   instrument that generates the signal. 

   MSBACKADJ(...,'WINDOWSIZE',WS) sets the width for the shifting window.
   The default is 200 s.u., which means a background point is estimated
   for windows of 200 s.u. in width. WS may also be a function handle.
   The referenced function is evaluated at each X value to compute 
   a variable width for the windows. This option is useful for cases where
   the resolution of the signal is dissimilar at different regions.

   MSBACKADJ(...,'STEPSIZE',SS) sets the steps for the shifting window.
   The default is 200 s.u., which means a background point is estimated
   for windows at every 200 s.u. SS may also be a function handle. The
   referenced function is evaluated at each X value to compute the
   distance between adjacent windows.

   MSBACKADJ(...,'REGRESSIONMETHOD',RM) sets the method used to regress
   the window estimated points to a soft curve. The default is 'pchip';
   i.e., shape-preserving piecewise cubic interpolation. Other options are
   'linear' and 'spline' interpolation.

   MSBACKADJ(...,'ESTIMATIONMETHOD',EM) sets the method used to find the
   likely background value at every window. Default is 'quantile', in
   which the quantile value is set to 10%. An alternative method is 'em',
   which assumes a doubly stochastic model; i.e., every sample is the
   i.i.d. draw of any of two normal distributed classes (background or
   peaks). Because the class label is hidden, the distributions are
   estimated with an expectation-maximization algorithm. The ultimate
   background value is the mean of the background class.

   MSBACKADJ(...,'SMOOTHMETHOD',SM) sets the method used to smooth the
   curve of estimated points, useful to eliminate the effect of possible
   outliers. Options are 'none' (default), 'lowess' (linear fit), 'loess'
   (quadratic fit), or 'rlowess' and 'rloess' (robust linear and quadratic
   fit).

   MSBACKADJ(...,'QUANTILEVALUE',QV) changes the default quantile value.
   The default is 0.10.

   MSBACKADJ(...,'PRESERVEHEIGHTS',true) sets the baseline subtraction
   mode to preserve the height of the tallest peak in the signal when
   subtracting the baseline. By default heights are not preserved.

   MSBACKADJ(...,'SHOWPLOT',SP) plots the background estimated points, the
   regressed baseline, and the original signal. When SP is TRUE, the first
   signal in Y is used. If MSBACKADJ is called without output arguments, a
   plot will be shown unless SP is FALSE. SP can also contain an index to
   one of the signals in Y.

   Examples: 

      % Correct the baseline of SELDI-TOF mass-spectrograms:
      load sample_lo_res

      % Adjust the baseline of a group of spectrograms.
      YB = msbackadj(MZ_lo_res,Y_lo_res);

      % Plot the third spectrogram in Y_lo_res and its estimated baseline.
      msbackadj(MZ_lo_res,Y_lo_res,'SHOWPLOT',3);

      % Plot the estimated baseline for the fourth spectrogram in Y_lo_res
      % using an anonymous function to describe a MZ dependent parameter.
      wf = @(mz) 200 + .001 .* mz;
      msbackadj(MZ_lo_res,Y_lo_res(:,4),'STEPSIZE',wf);


   See also MSALIGN, MSHEATMAP, MSLOWESS, MSNORM, MSPREPRODEMO,
   MSRESAMPLE, MSSGOLAY, MSVIEWER.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function Y = remove_background(X,Y,varargin)
0002 %MSBACKADJ provides background correction for a signal with peaks
0003 %
0004 %   YOUT = MSBACKADJ(X,Y) adjusts the variable background (baseline) of a
0005 %   signal with peaks by following three steps: 1) estimates the background
0006 %   within multiple shifted windows of width 200 separation units along the
0007 %   x-axis, 2) regresses the varying baseline to the window points using a
0008 %   spline approximation, and 3) adjusts the background of the input signal Y.
0009 %
0010 %   X and Y are column vectors where paired values represent points in the
0011 %   signal. Y can be a matrix with several signals, all sharing the same X
0012 %   scale. Units in the X scale (separation units or s.u.) may quantify
0013 %   wavelength, frequency, distance, time or m/z depending on the type of
0014 %   instrument that generates the signal.
0015 %
0016 %   MSBACKADJ(...,'WINDOWSIZE',WS) sets the width for the shifting window.
0017 %   The default is 200 s.u., which means a background point is estimated
0018 %   for windows of 200 s.u. in width. WS may also be a function handle.
0019 %   The referenced function is evaluated at each X value to compute
0020 %   a variable width for the windows. This option is useful for cases where
0021 %   the resolution of the signal is dissimilar at different regions.
0022 %
0023 %   MSBACKADJ(...,'STEPSIZE',SS) sets the steps for the shifting window.
0024 %   The default is 200 s.u., which means a background point is estimated
0025 %   for windows at every 200 s.u. SS may also be a function handle. The
0026 %   referenced function is evaluated at each X value to compute the
0027 %   distance between adjacent windows.
0028 %
0029 %   MSBACKADJ(...,'REGRESSIONMETHOD',RM) sets the method used to regress
0030 %   the window estimated points to a soft curve. The default is 'pchip';
0031 %   i.e., shape-preserving piecewise cubic interpolation. Other options are
0032 %   'linear' and 'spline' interpolation.
0033 %
0034 %   MSBACKADJ(...,'ESTIMATIONMETHOD',EM) sets the method used to find the
0035 %   likely background value at every window. Default is 'quantile', in
0036 %   which the quantile value is set to 10%. An alternative method is 'em',
0037 %   which assumes a doubly stochastic model; i.e., every sample is the
0038 %   i.i.d. draw of any of two normal distributed classes (background or
0039 %   peaks). Because the class label is hidden, the distributions are
0040 %   estimated with an expectation-maximization algorithm. The ultimate
0041 %   background value is the mean of the background class.
0042 %
0043 %   MSBACKADJ(...,'SMOOTHMETHOD',SM) sets the method used to smooth the
0044 %   curve of estimated points, useful to eliminate the effect of possible
0045 %   outliers. Options are 'none' (default), 'lowess' (linear fit), 'loess'
0046 %   (quadratic fit), or 'rlowess' and 'rloess' (robust linear and quadratic
0047 %   fit).
0048 %
0049 %   MSBACKADJ(...,'QUANTILEVALUE',QV) changes the default quantile value.
0050 %   The default is 0.10.
0051 %
0052 %   MSBACKADJ(...,'PRESERVEHEIGHTS',true) sets the baseline subtraction
0053 %   mode to preserve the height of the tallest peak in the signal when
0054 %   subtracting the baseline. By default heights are not preserved.
0055 %
0056 %   MSBACKADJ(...,'SHOWPLOT',SP) plots the background estimated points, the
0057 %   regressed baseline, and the original signal. When SP is TRUE, the first
0058 %   signal in Y is used. If MSBACKADJ is called without output arguments, a
0059 %   plot will be shown unless SP is FALSE. SP can also contain an index to
0060 %   one of the signals in Y.
0061 %
0062 %   Examples:
0063 %
0064 %      % Correct the baseline of SELDI-TOF mass-spectrograms:
0065 %      load sample_lo_res
0066 %
0067 %      % Adjust the baseline of a group of spectrograms.
0068 %      YB = msbackadj(MZ_lo_res,Y_lo_res);
0069 %
0070 %      % Plot the third spectrogram in Y_lo_res and its estimated baseline.
0071 %      msbackadj(MZ_lo_res,Y_lo_res,'SHOWPLOT',3);
0072 %
0073 %      % Plot the estimated baseline for the fourth spectrogram in Y_lo_res
0074 %      % using an anonymous function to describe a MZ dependent parameter.
0075 %      wf = @(mz) 200 + .001 .* mz;
0076 %      msbackadj(MZ_lo_res,Y_lo_res(:,4),'STEPSIZE',wf);
0077 %
0078 %
0079 %   See also MSALIGN, MSHEATMAP, MSLOWESS, MSNORM, MSPREPRODEMO,
0080 %   MSRESAMPLE, MSSGOLAY, MSVIEWER.
0081 
0082 %   Copyright 2003-2008 The MathWorks, Inc.
0083 %   $Revision: 1.2 $  $Date: 2014/05/21 21:29:51 $
0084 
0085 % References:
0086 % [1] Lucio Andrade and Elias Manolakos, "Signal Background Estimation and
0087 %     Baseline Correction Algorithms for Accurate DNA Sequencing" Journal
0088 %     of VLSI, special issue on Bioinformatics 35:3 pp 229-243 (2003)
0089 
0090 % check inputs
0091 %bioinfochecknargin(nargin,2,mfilename);
0092 % set defaults
0093 stepSize = 200;
0094 windowSize = 200;
0095 regressionMethod = 'pchip';
0096 estimationMethod = 'quantile';
0097 smoothMethod = 'none';
0098 quantileValue = 0.1;
0099 preserveHeights = false;
0100 maxNumWindows = 10000;
0101 if nargout == 1
0102     plotId = 0; 
0103 else
0104     plotId = 1;
0105 end
0106 
0107 if  nargin > 2
0108     if rem(nargin,2) == 1
0109         error('Bioinfo:msbackadj:IncorrectNumberOfArguments',...
0110             'Incorrect number of arguments to %s.',mfilename);
0111     end
0112     okargs = {'stepsize','windowsize','regressionmethod',...
0113               'estimationmethod','quantilevalue','preserveheights',...
0114               'smoothmethod','showplot'};
0115     for j=1:2:nargin-2
0116         pname = varargin{j};
0117         pval = varargin{j+1};
0118         k = find(strncmpi(pname, okargs,length(pname)));
0119         if isempty(k)
0120             error('Bioinfo:msbackadj:UnknownParameterName',...
0121                 'Unknown parameter name: %s.',pname);
0122         elseif length(k)>1
0123             error('Bioinfo:msbackadj:AmbiguousParameterName',...
0124                 'Ambiguous parameter name: %s.',pname);
0125         else
0126             switch(k)
0127                 case 1  % step size
0128                     stepSize = pval;
0129                     if ~isscalar(stepSize) && ~isa(stepSize,'function_handle')
0130                         error('Bioinfo:msbackadj:NotValidStepSize',...
0131                               'STEP must be a scalar or a function handle.')
0132                     end
0133                 case 2 % window size
0134                     windowSize = pval;
0135                     if ~isscalar(windowSize) && class(windowSize,'function_handler')
0136                         error('Bioinfo:msbackadj:NotValidWindowSize',...
0137                               'WIDTH must be a scalar or a function handle.')
0138                     end
0139                 case 3 % regression method
0140                     regressionMethods = {'cubic','pchip','spline','linear'};
0141                     regressionMethod = strmatch(lower(pval),regressionMethods); 
0142                     if isempty(regressionMethod) 
0143                         error('Bioinfo:msbackadj:NotValidRegressionMethod',...
0144                               'Not a valid regression method.')
0145                     end
0146                     regressionMethod = regressionMethods{max(2,regressionMethod)};
0147                 case 4 % estimation method
0148                     estimationMethods = {'quantile','em'};
0149                     estimationMethod = strmatch(lower(pval),estimationMethods); 
0150                     if isempty(estimationMethods) 
0151                         error('Bioinfo:msbackadj:NotValidEstimationMethod',...
0152                               'Not a valid estimation method.')
0153                     end
0154                     estimationMethod = estimationMethods{estimationMethod};
0155                 case 5 % quantile value
0156                     quantileValue = pval;
0157                 case 6 % preserve heights
0158                     preserveHeights = opttf(pval);
0159                 case 7 % smoothing method
0160                     smoothMethods = {'none','lowess','loess','rlowess','rloess'};
0161                     smoothMethod = strmatch(lower(pval),smoothMethods); 
0162                     if isempty(smoothMethod) 
0163                         error('Bioinfo:msbackadj:NotValidSmoothMethod',...
0164                               'Not a valid smoothing method.')
0165                     elseif length(smoothMethod)>1
0166                         error('Bioinfo:msbackadj:AmbiguousSmoothMethod',...
0167                               'Ambiguous smoothing method: %s.',pname);
0168                     end
0169                     smoothMethod = smoothMethods{smoothMethod};
0170                 case 8 % show
0171                     if opttf(pval) 
0172                         if isnumeric(pval)
0173                             if isscalar(pval)
0174                                 plotId = double(pval); 
0175                             else
0176                                 plotId = 1;
0177                                 warning('Bioinfo:msbackadj:SPNoScalar',...
0178                                 'SHOWPLOT must be a single index to one of the signals in Y or a logical. Plotting the first column in Y only.')
0179                             end 
0180                         else
0181                             plotId = 1;
0182                         end
0183                     else
0184                         plotId = 0;
0185                     end
0186             end
0187         end
0188     end
0189 end
0190 
0191 % validate X and Y
0192 
0193 if ~isnumeric(Y) || ~isreal(Y)
0194    error('Bioinfo:msbackadj:IntensityNotNumericAndReal',...
0195        'Intensities must be numeric and real.') 
0196 end
0197 
0198 if ~isnumeric(X) || ~isreal(X)
0199    error('Bioinfo:msbackadj:XNotNumericAndReal',...
0200        'X scale must be numeric and real.') 
0201 end
0202 
0203 if size(X,1) ~= size(Y,1)
0204    error('Bioinfo:msbackadj:NotEqualNumberOfSamples',...
0205        'The intensity vector(s) must have the same number of samples as the X scale.')
0206 end
0207  
0208 numSignals = size(Y,2);
0209 
0210 if (plotId~=0) && ~any((1:numSignals)==plotId)
0211     warning('Bioinfo:msbackadj:InvalidPlotIndex',...
0212     'SHOWPLOT value is not a valid signal index, no plot was generated.')
0213 end
0214 
0215 multiple_X = false;
0216 if size(X,2)>1
0217     multiple_X = true;
0218     if size(X,2) ~= numSignals
0219         error('Bioinfo:msbackadj:NotEqualNumberOfXScales',...
0220         'A X scale must be supplied for every intensity vector when size(X,2)>1.')
0221     end
0222 end
0223 
0224 % change scalars to function handlers
0225 if isnumeric(stepSize)   
0226     stepSize   = @(x) repmat(stepSize,size(x));   
0227 end
0228 if isnumeric(windowSize) 
0229     windowSize = @(x) repmat(windowSize,size(x)); 
0230 end
0231 
0232 % allocate space for Xp and WE
0233 Xp = zeros(maxNumWindows,1);
0234 WE = nan(maxNumWindows,1);
0235 
0236 % calculates the location of the windows (when it is the same for all the
0237 % signals)
0238 if ~multiple_X
0239     Xpid = max(0,X(1));
0240     Xend = X(end);
0241     id = 1;
0242     while Xpid <= Xend
0243         Xp(id) = Xpid;
0244         Xpid = Xpid + stepSize(Xpid);
0245         id = id + 1;
0246         if id > maxNumWindows
0247             error('Bioinfo:msbackadj:maxNumWindowsExceeded',...
0248             'Maximum number of windows exceeded, verify the STEP value or function handle.')
0249         end
0250     end
0251     numWindows = id-1;
0252 end
0253 
0254 % iterate for every signal
0255 for ns = 1:numSignals 
0256 if nargout>0 || (ns == plotId)
0257     % find the location of the windows (when it is different for every
0258     % signal, otherwise this was done out of the loop)
0259     if multiple_X
0260        Xpid = max(0,X(1,ns));
0261        Xend = X(end,ns);
0262        id = 1; Xp = zeros(1,maxNumWindows);
0263        while Xpid <= Xend
0264            Xp(id) = Xpid;
0265            Xpid = Xpid + stepSize(Xpid);
0266            id = id + 1;
0267            if id > maxNumWindows
0268                 error('Bioinfo:msbackadj:maxNumWindowsExceeded',...
0269                 'Maximum number of windows exceeded, verify the STEP value or function handle.')
0270            end
0271        end
0272        Xp(id:end)=[];
0273        numWindows = id-1;
0274        nnss = ns;
0275     else
0276        nnss = 1;
0277     end    
0278     Xpt = Xp(1:numWindows); 
0279     Xw = windowSize(Xpt);
0280     
0281     % find the estimated baseline for every window
0282     for nw = 1:numWindows
0283         subw = Y(X(:,nnss)>=Xpt(nw) & X(:,nnss)<= (Xpt(nw)+Xw(nw)),ns);
0284         switch estimationMethod
0285             case 'quantile'
0286                 WE(nw) = quantile(subw,quantileValue);
0287             case 'em'
0288                 WE(nw) = em2c1d(subw);
0289         end
0290     end % for nw = 1:numWindows
0291     
0292     % smooth the estimated points
0293     if ~isequal('none',smoothMethod)
0294         WE(1:numWindows) = ...
0295             masmooth(Xpt+Xw/2,WE(1:numWindows),10,smoothMethod,2);
0296     end
0297             
0298     % regress the estimated points
0299     b = interp1(Xpt+Xw/2,WE(1:numWindows),X(:,nnss),regressionMethod);
0300     
0301     if (ns == plotId)
0302        figure
0303        plot(X(:,nnss),Y(:,ns))
0304        hold on
0305        plot(X(:,nnss),b,'r','linewidth',2)
0306        plot(Xpt+Xw/2,WE(1:numWindows),'kx')
0307        title(sprintf('Signal ID: %d',ns));
0308        xlabel('Separation Units')
0309        ylabel('Relative Intensity')
0310        legend('Original Signal','Regressed baseline','Estimated baseline points')
0311        axis([min(X(:,nnss)) max(X(:,nnss)) min(Y(:,ns)) max(Y(:,ns))])
0312        grid on
0313        hold off
0314        setAllowAxesRotate(rotate3d(gcf),gca,false)
0315     end
0316     
0317     % apply the correction
0318     if preserveHeights
0319         K = 1 - b/max(Y(:,ns));
0320         Y(:,ns) = (Y(:,ns) - b) ./ K;
0321         %[YMax,locMax] = max(Y(:,ns));
0322         %K = 1 - b(locMax)/YMax;
0323         %Y(:,ns) = (Y(:,ns) - b) / K;
0324     else
0325         Y(:,ns) = (Y(:,ns) - b);
0326     end 
0327 
0328 end % if nargout>0 || (ns == plotId)
0329 end % for ns = 1:numSignals
0330 
0331 if nargout == 0 
0332     clear Y
0333 end
0334 
0335 
0336 
0337     
0338

Generated on Sun 14-Jun-2015 17:12:45 by m2html © 2005