Andrew Ng's Machine Learning

Published by Sukuna

Part 1

Exercise 1: WarmUp
Write a function that returns an identity matrix.
There is really not much to say about this one.

function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = [];

% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix 
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly. 
A = eye(5);
% ===========================================


end

Exercises 2 & 5: Computing the Cost Function

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% =================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.
    f = X*theta;          % predictions h_theta(x) for every training example
    f = f - y;            % residuals
    J = (f'*f)/(2*m);     % sum of squared residuals via the inner product

% ============================================================

end

Recall the expression for the cost function (written out below).

Since h_\theta(x^{(i)}) = \theta^T x^{(i)}, we can use this definition to compute the predicted value for each example and then subtract y.
Note that h_\theta(x^{(i)}) is homogeneous here (the intercept is absorbed into X through the column of ones); we will not discuss the inhomogeneous case for now.
How do we sum? The inner product of the residual vector with itself gives exactly the sum of squared differences over the training set.
What if the model is inhomogeneous? Then simply subtract the offset vector as well.
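The cost function from the course, together with its vectorized form (this is exactly what the code computes):

J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m}(X\theta - y)^T(X\theta - y)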

Exercises 3 & 6: Gradient Descent

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ================ YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.
    %
    
    hyp = X*theta;                  % current predictions
    dJ = X'*(hyp-y);                % matrix multiplication sums the error term over all examples for each parameter
    theta = theta - (alpha/m)*dJ;   % simultaneous update of every parameter
    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end
end

Andrew Ng's starter code includes a loop that records the value of J at every iteration; in practice you should always track how J evolves so you can tell when gradient descent should stop.
At each step, first compute the error vector h_\theta(x) - y; then multiply by X' so that each parameter accumulates the sum of its corresponding error terms; finally step each parameter by alpha/m times that gradient.
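A minimal usage sketch (the data here is synthetic and purely illustrative):

X = [ones(5,1), (1:5)'];   % 5 examples with a bias column
y = [2; 4; 6; 8; 10];      % targets
theta = zeros(2, 1);       % initial parameters
alpha = 0.01;
num_iters = 400;
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
plot(1:num_iters, J_history);   % J should decrease monotonically if alpha is well chosen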

Exercise 4: Feature Normalization

mu = mean(X);      % column-wise means
sigma = std(X);    % column-wise standard deviations
X_norm = (X - ones(size(X,1),1)*mu) ./ (ones(size(X,1),1)*sigma);

mean: computes the mean of each column of X; mean(X, dim) selects the dimension.
std: computes the standard deviation (not the range); note that its dimension argument comes third, as in std(X, 0, dim).
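A quick sanity check (toy data, purely illustrative): after normalization each column should have mean roughly 0 and standard deviation roughly 1.

X = [2104 3; 1600 3; 2400 3; 1416 2];
mu = mean(X);
sigma = std(X);
X_norm = (X - ones(size(X,1),1)*mu) ./ (ones(size(X,1),1)*sigma);
disp(mean(X_norm));   % approximately [0 0]
disp(std(X_norm));    % approximately [1 1]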

Exercise 7: The Normal Equation

function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression 
%   NORMALEQN(X,y) computes the closed-form solution to linear 
%   regression using the normal equations.

theta = zeros(size(X, 2), 1);

% ==================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%
theta = pinv(X' * X) * X' * y;


% ============================================================

end
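The closed-form solution being implemented is the normal equation:

\theta = (X^T X)^{-1} X^T y

Using pinv rather than inv keeps the computation well defined even when X^T X is singular, for example when features are linearly dependent.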

Part 2

Note: X is an m*(n+1) matrix whose first column is all ones, and \theta is an (n+1)*1 vector holding the parameters we want to learn.

Exercise 1: The Sigmoid Function
Given a matrix, return the result of applying the sigmoid to each element.

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly 
g = zeros(size(z));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
g = 1 ./ (1 + exp(-z));   % elementwise; the scalar 1 broadcasts over the matrix
% ============================================================
end
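For reference, the function computed elementwise is

g(z) = \frac{1}{1 + e^{-z}}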

Exercise 2: Unregularized Cost and Gradient
Returns both the cost J and the gradient grad.

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));
% ==================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
h = sigmoid(X * theta);
J = (-log(h.')*y - log(ones(1, m) - h.')*(ones(m, 1) - y)) / m;
grad = (X.' * (h - y)) /m;
% ============================================================

end

First compute the predictions using the sigmoid function from Exercise 1 (note: X*theta is the same linear expression as in the first assignment's linear regression). The .' operator takes the transpose, which lets us plug everything into the cost function to obtain J.
grad is the quantity subtracted at each descent step; note that it does not include the learning rate.
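The formulas being vectorized here, as given in the course:

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]

\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}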

Exercise 3: 0/1 Prediction
Here we need a threshold test on every element of the matrix.
Functions like exp and log, and comparison operators like >=, act elementwise on a matrix, so no explicit if statement is needed.

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic 
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a 
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ===================== YOUR CODE HERE =====================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters. 
%               You should set p to a vector of 0's and 1's
%
h = sigmoid(X * theta);
p = (h >= 0.5);
% ============================================================
end
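A small illustration of the elementwise comparison (the numbers are made up):

h = [0.3; 0.8; 0.5];
p = (h >= 0.5)   % -> [0; 1; 1], a logical vector of predictions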

Exercise 4: Regularized Cost Function

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
h = sigmoid(X * theta);
J = (-log(h.')*y - log(ones(1, m) - h.')*(ones(m, 1) - y)) / m +(lambda/(2*m)) * sum(theta(2:end).^2);
grad(1) = (X(:, 1).' * (h - y)) /m;
grad(2:end) = (X(:, 2:end).' * (h - y)) /m + (lambda/m) * theta(2:end);
% ============================================================

end

The key distinction here is between grad(1) and grad(2:end): the bias parameter \theta_0 is not regularized, while all the other parameters are. Everything else follows the formula directly.
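The regularized cost and gradient, as given in the course (note the regularization sum starts at j = 1, skipping the bias term):

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 1)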

Part 3

Multi-class Classification

X is an m*(n+1) matrix holding the m training examples, one per row.

Exercise 1: Regularized Logistic Regression Cost

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations. 
%
% Hint: When computing the gradient of the regularized cost function, 
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta; 
%           temp(1) = 0;   % because we don't add anything for j = 0  
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%
h = sigmoid(X*theta);
J = (-log(h.')*y - log(ones(1, m) - h.')*(ones(m, 1) - y)) / m +(lambda/(2*m)) * sum(theta(2:end).^2);
grad(1) = (X(:, 1).' * (h - y)) /m;
grad(2:end) = (X(:, 2:end).' * (h - y)) /m + (lambda/m) * theta(2:end);

% ============================================================

grad = grad(:);

end

Exercise 1 is just the regularized cost function again; the operations are essentially identical. A word on the data format: X is the training data matrix, each row vector is one example, and h comes out as a vector of probabilities between 0 and 1.

Exercise 2: One-vs-All Classification

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds 
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ==================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda. 
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%     
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
% 
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost 
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%
options = optimset('GradObj', 'on', 'MaxIter', 50);
initial_theta = zeros(size(X, 2), 1);
for c = 1:num_labels
 [all_theta(c, :)] = fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
end
% ============================================================
end

options = optimset('param1', value1, 'param2', value2, ...)   % sets the listed parameters; unset ones keep their defaults

Parameter    | Value                                | Description
Display      | 'off' / 'iter' / 'final' / 'notify'  | 'off' shows no output; 'iter' shows the result of every iteration; 'final' shows only the final result; 'notify' shows output only when the function fails to converge.
MaxFunEvals  | positive integer                     | maximum number of function evaluations
MaxIter      | positive integer                     | maximum number of iterations
TolFun       | positive scalar                      | termination tolerance on the function value
TolX         | positive scalar                      | termination tolerance on X

fmincg returns the optimized \theta (and can also return the cost); we will not go into how it works internally. For each class c it produces a parameter vector, which we copy into row c of all_theta; looping over all the classes builds the full \Theta matrix that we need for prediction.

Exercise 3: Prediction (Discretization)

Here all_theta is a k*(n+1) matrix, where k is the number of labels.

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples) 

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ==================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the 
%       max element, for more information see 'help max'. If your examples 
%       are in rows, then, you can use max(A, [], 2) to obtain the max 
%       for each row.
%       
[~, p] = max(X * all_theta.', [], 2);
% ============================================================
end

C = max(A,[],dim)

returns the maximum of A along dimension dim. For example, C = max(A,[],2) takes the maximum across each row (dimension 1 runs down the rows, dimension 2 across the columns).

max has two outputs: the first is the maximum value itself, the second is its index. Our call only needs the index, which it assigns to p; the unwanted first output is discarded by writing ~ in its place.
max here picks the most likely class: we simply take the maximum of each row.
X * all_theta.' is an m*k matrix whose entry (i, c) measures how strongly example i is predicted to belong to class c, so the row-wise argmax is the predicted label.
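A tiny illustration of max with an index output (the numbers are made up):

A = [0.2 0.9 0.1;
     0.7 0.1 0.3];
[val, idx] = max(A, [], 2);   % row-wise maximum
% val = [0.9; 0.7], idx = [2; 1] -> predicted class for each example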

Part 2: Neural Networks

Exercise 4: Evaluating the Neural Network (no backpropagation required)

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% ==================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a 
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%
X = [ones(size(X, 1), 1), X]; % Add the bias column to X (note: size(X, 1), not size(X))

X1 = sigmoid(X * Theta1.');

X1 = [ones(size(X1, 1), 1), X1]; % Add the bias column to the hidden-layer activations

[~, p] = max(X1 * Theta2.', [], 2);


% ============================================================


end

Pure computation: Theta1 and Theta2 are already-trained weight matrices, so we just evaluate the network directly. Note that each layer's output needs a bias unit before being fed forward, i.e., we prepend a column of all ones.
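Written out for a single example x, the forward pass being computed is:

a^{(1)} = [1;\, x], \quad z^{(2)} = \Theta^{(1)} a^{(1)}, \quad a^{(2)} = [1;\, g(z^{(2)})], \quad z^{(3)} = \Theta^{(2)} a^{(2)}, \quad h_\Theta(x) = g(z^{(3)})

The code vectorizes this over all m examples at once, with examples in rows, which is why it multiplies by Theta1.' and Theta2.' instead.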

