Constructing Distributed Hippocratic Video Databases for Privacy-Preserving Online Patient Training and Counseling

Digital video now plays an important role in supporting more profitable online patient training and counseling, and integration of patient training videos from multiple competitive organizations in the health care network will result in better offerings for patients. However, privacy concerns often prevent multiple competitive organizations from sharing and integrating their patient training videos. In addition, patients with infectious or chronic diseases may not want the online patient training organizations to identify who they are or even which video clips they are interested in. Thus, there is an urgent need to develop more effective techniques to protect both video content privacy and access privacy. In this paper, we have developed a new approach to construct a distributed Hippocratic video database system for supporting more profitable online patient training and counseling. First, a new database modeling approach is developed to support concept-oriented video database organization and assign a degree of privacy of the video content for each database level automatically. Second, a new algorithm is developed to protect the video content privacy at the level of individual video clip by filtering out the privacy-sensitive human objects automatically. In order to integrate the patient training videos from multiple competitive organizations for constructing a centralized video database indexing structure, a privacy-preserving video sharing scheme is developed to support privacy-preserving distributed classifier training and prevent the statistical inferences from the videos that are shared for cross-validation of video classifiers. Our experiments on large-scale video databases have also provided very convincing results.


I. INTRODUCTION
T O SAVE the huge cost for national health care plan, online patient training and counseling are becoming very attractive by using digital videos to educate patients on early detection and self-treatment of their life-threatening diseases [1]. Because increasing the amounts of available patient training videos and increasing the diversity of video content may result in better offerings for patient training, it is very attractive to integrate the patient training videos from multiple competitive organizations in the health care network. However, privacy regulations, consumer backlash, and other privacy concerns often prevent multiple competitive organizations from sharing their patient training videos [1]- [10]. In addition, patients with infectious or chronic diseases, such as human immunodeficiency virus syndrome (AIDS), severe acute respiratory syndrome, bird flu, hepatitis, and diabetes, may not want the professional patient trainers and organizations to identify who they are or even which video clips they are interested in because disclosing private disease information may seriously affect their employment opportunities. Such privacy concerns may prevent the patients from using online training systems for early detection and selftreatment of their life-threatening infectious and chronic diseases. Thus, there is a strong need of new techniques that are capable of protecting both access privacy and video content privacy. Unfortunately, no comprehensive framework is available today to address the following inter-related issues effectively.
1) Privacy-preserving video database indexing and retrieval: To integrate the patient training videos from multiple competitive organizations for supporting more profitable online patient training and counseling, there is an urgent need of constructing a centralized indexing of the distributed video content [6]- [10], [11], [12]. As mentioned in [13], next-generation database systems referred to as Hippocratic databases should include responsibilities for both the data privacy and the access privacy.
2) Privacy-preserving video sharing for distributed classifier training: Classifying the video clips into a set of semantic video concepts is one promising approach to achieve concept-oriented video database indexing and access [9], [14], [15]. In order to learn the accurate concept models (i.e., video classifiers) for enabling a centralized indexing of the distributed video content, it is very important to support privacy-preserving video sharing among multiple competitive organizations. Many techniques have been proposed recently to support privacy-preserving data sharing and mining [16]- [26]. By randomizing the original data or adopting secure multiparty computation, these techniques can protect the data privacy at the individual level and the algorithms are still able to recover aggregate information or to build data mining models. Data perturbation can protect individual data records, but it may also result in information loss as well as in privacy breaches due to the disclosure of perturbed data [16]- [20]. On the other hand, secure multiparty computation (SMC) approaches are too expensive to be useful for large-scale video database because of high communication costs [21]- [26]. Thus, these existing methods cannot directly be extended to enable privacy-preserving video sharing for distributed classifier training.
3) Video content privacy protection: The content privacy for the patient training videos consists of two major parts: a) privacy for the human objects who are shown in a video as the professional patient trainers or doctors; and b) privacy for the human objects who are shown in a video as the patients to illustrate the relevant clinic examples. When the patient training videos are released to the authorized patients or other parties, the video content privacy is disclosed [1], [2], [4]- [10].
To address these issues more effectively, we have developed a new system for supporting privacy-preserving online patient training and counseling and its major components are given in Fig. 1. This paper is organized as follows: Section II introduces our scheme for concept-oriented video database modeling by using concept ontology [14], [15]; Section III presents our privacypreserving video sharing scheme to enable privacy-preserving distributed classifier training; Section IV introduces our framework on privacy-preserving centralized video database indexing and retrieval with a relaxed security model; Section V gives our experimental results; We conclude this paper at Section VI.

II. CONCEPT-ORIENTED VIDEO DATABASE MODELING
The video clips are first partitioned into a set of video shots automatically [14]. The video shots are used as the basic units for video content representation and feature extraction. To protect video content privacy, human objects are further extracted from the video clips automatically. The major steps of our human object detection function are illustrated in Fig. 2. 1) Automatic image segmentation is first performed on each video frame to obtain the homogeneous image regions [27].
2) The human face regions are then located by using traditional face detection techniques [28], [29]. The detected face regions are taken as the object seeds of human object determination [27]. 3) The region relationship graph of object seeds (that is determined automatically in the image segmentation procedure) is then matched with the region constraint graph of human object (that is used for designing the human object generation function). If they are in good matching, the relevant image regions relevant are then aggregated to detect the human objects [27].
4) The detected human objects are then tracked among the video frames within the same video shot.
Obviously, our human object detection function may fail in obtaining the meaningful human objects in some cases. To address this issue, two approaches can be used to improve human object detection: a) human-system interaction can be involved for defining the regions of human objects interactively [14]; and b) detection results for all the video frames within the same video shot are integrated and the relevant confidence maps for the detection results are calculated to provide a valuable information for human object detection as shown in Fig. 3. The confidence region is generated by transforming the relevant confidences for our detection results into a binary image via thresholding. Our experimental results on automatic human object detection and tracking are shown in Fig. 4. After the human objects are extracted, all the video clips are decomposed into a set of video shots with the associated human objects. It is well accepted that the visual properties of the videos are also important for video retrieval [9], [14], thus, both the global visual features and the local visual features are extracted for characterizing various visual properties of the semantic video concepts more precisely. In this  paper, multiple feature subsets are extracted for video content representation: i) cumulative color histogram; ii) histogram of cumulative wavelet texture features; and iii) motion histogram.
A successful implementation of a distributed Hippocratic video database system for online patient training and counseling requires a well-defined database model for video indexing and privacy protection [1], [9]. Motivated by this observation, we have developed a concept-oriented framework for video database management that uses the notion of the concept ontology [14], [15]. The key idea of such concept-oriented video database organization is that the video clips are classified into a set of video concepts at different semantic levels [14], [15].
The concept ontology consists of two components: 1) semantic video concepts; and 2) their inter-concept contextual and logical relationships. The lower the level of a semantic video concept node, the narrower is its coverage of the subjects. The semantic video concepts at the first level of the concept ontology are named as atomic video concepts. Thus, the database nodes for the semantic video concepts at a lower semantic level can characterize more specific aspects of video content and they are easier to release the video content privacy and may have higher degree of privacy. To extract the text terms for semantic video concept interpretation, automatic entity extraction is first performed on large amounts of medical documents and medical experts are further involved to select the most meaningful video concepts interactively, so that these semantic video concepts are meaningful for domain experts and patient training.
The semantic similarity context (C i , C j ) between two semantic video concepts C i and C j is defined as where L(C i , C j ) is the length of the shortest path between the text terms for interpreting the semantic video concepts C i and C j in an one-direction IS-A taxonomy, D is the maximum depth of such one-direction IS-A taxonomy [30], and P (C i , C j ) is the co-occurrence probability of the semantic video concepts C j and C i in the available medical documents. From this definition, one can observe that the semantic video concepts with smaller value of L(·, ·) on the taxonomy and higher co-occurrence probability P (·, ·) will correspond to stronger inter-concept semantic similarity context (·, ·). Rather than using one single kernel function to characterize the diverse visual similarity contexts between the video clips, the visual similarity context between two video clips under each feature subset is characterized more precisely by using one particular basic kernel function. Finally, a mixture-of-kernels is used to integrate all these three basic kernel functions (i.e., three feature subsets) to characterize the diverse visual similarity contexts between the video clips more precisely where α i is the importance factor for the ith basic kernel function For two semantic video concepts C i and C j , their interconcept visual similarity context γ(C i , C j ) can be determined by performing canonical correlation analysis [31] on their video sets S i and S j : where θ and ϑ are the parameters for determining the optimal projection directions to maximize the correlations between two video sets S i and S j for C i and C j , κ(S i ) and κ(S j ) are the cumulative kernel functions for characterizing the visual correlations between the videos in the same video sets S i and S j κ(S i ) = where the visual correlation between the video clips is defined as their kernel-based visual similarity. The parameters θ and ϑ for determining the optimal projection directions are obtained automatically by solving the following eigenvalue equations: where the eigenvalues λ θ and λ ϑ follow the additional constraint λ θ = λ ϑ . The inter-concept visual similarity context γ(C i , C j ) is first normalized into the same interval as the inter-concept semantic similarity context (C i , C j ). The inter-concept semantic similarity context and the inter-concept visual context are further integrated to achieve more precise characterization of the interconcept similarity context ϕ(C i , C j ) between C i and C j where and η are the relative importance factors for the interconcept visual similarity context and the inter-concept semantic similarity context. One part of our domain-dependent concept ontology is given in Fig. 5.
The advantages of our approach for concept-oriented video database modeling and organization include: 1) by using the concept ontology for semantic video concept organization [14], [15], it is able to achieve concept-oriented video database indexing and retrieval; and 2) it is able to assign a degree of privacy of video content for each database level automatically. The lower-level video concepts on the concept ontology are used to describe more specific video content, thus, they are easier to release the privacy of video content and higher degrees of privacy should be assigned automatically.

III. PRIVACY-PRESERVING VIDEO CLASSIFIER TRAINING
To learn the classifiers collaboratively and assign the patient training video clips into the most relevant semantic video concepts automatically, we assume that there have M collaboration parties in the health care network as shown in Fig. 6. By treating these M collaboration parties as M horizontally partitions of the available video sources, we have developed a new scheme to enable privacy-preserving SVM classifier training. Because the patient training videos (which are available at each of these M collaboration parties in the collaboration network) are incomplete to characterize the diverse properties of the given semantic video concept precisely, it is impossible for any one of these M collaboration parties to independently learn a complete SVM classifier with high prediction accuracy. Thus, it is very important for each party to integrate the most significant videos from other parties to validate and improve its own weak SVM classifier and learn a complete SVM classifier collaboratively. For a given classifier training task (i.e., learning the SVM classifier for one certain video concept of interest), our algorithm takes the following steps: 1) one weak SVM video classifier is learned locally at each party by using its own training videos; and 2) these M weak SVM video classifiers are integrated as a strong classifier (i.e., complete SVM classifier) by sharing their support vectors (i.e., feature subsets that are extracted from the most significant video clips and are used to characterize the most significant and diversified properties of video content) for incremental SVM classifier training.
It is important to note that some common video clips may appear among these M collaboration parties, thus sharing the original support vectors may allow the dishonest parties in the collaboration network to integrate their own video clips and the shared support vectors to inference the privacy of other parties. Thus there is an urgent need to support privacy-preserving sharing of the support vectors for learning the complete SVM classifier collaboratively.
For a given video concept C j on the concept ontology, oneagainst-all rule is used to label the training videos Ω c j = {X l , Y l |l = 1, . . . , N} as positive videos or negative videos for learning the weak SVM classifier locally at each party. Each labeled video is a pair (X l , Y l ) that consists of three feature subsets X l and the semantic label Y l . It is important to note that the confidential training videos (i.e., original labeled training videos) are not shared with other parties for collaborative classifier training.
For each party on the collaboration network, one weak SVM classifier is first learned by using its own labeled videos. For the positive videos X l , that is, videos with Y l = +1, transformation Similarly, for the negative videos X l , that is, videos with is the function that maps X l into a higher dimensional space and the kernel function is defined as κ(X i , X j ) = Φ(X i ) T Φ(X j ). In our current implementation, we select radial basis function The margin between these two supporting planes is thus 2/ W 2 . The weak SVM classifier is then designed for maximizing the margin with the constraints f (X l ) = W Φ(X l ) + b ≥ +1 for the positive videos and f (X l ) = W Φ(X l ) + b ≤ −1 for the negative videos [32].
To incorporate the diversified videos from multiple parties for collaborative SVM classifier training, each party in the collaboration network has to share its support vectors with other parties. However, sharing the original support vectors with other parties is undesirable from privacy perspective, because the dishonest parties may leverage such original support vectors to inference the confidential information for other parties on the collaboration network, e.g., the dishonest parties may integrate such original support vectors and their own video clips to inference the privacy of other parties (i.e., whether other parties have similar video clips). There are two conflicting issues that must be addressed for achieving privacy-preserving SVM classifier training. On the one hand, sharing the original support vectors may leak confidential information of the original videos. On the other hand, the support vectors are used to interpret the underlying decision boundaries of the relevant week SVM classifier and characterize the principal properties of the original videos for the corresponding party, and thus sharing the original support vectors is critical for collaborative learning of the complete SVM classifier.
Based on this observation, we have developed a distributed framework to enable privacy-preserving SVM classifier training by generating and sharing synthetic support vectors. The synthetic support vectors are automatically generated from the original support vectors, and are used to approximate the decision boundaries (i.e., SVM margin boundaries) interpreted by the original support vectors [33]. Incorporating the synthetic support vectors for distributed SVM classifier training has at least two advantages: 1) using the synthetic values (i.e., the synthetic support vectors) to replace the original support vectors can effectively protect the video privacy because the shared synthetic support vectors are not extracted from the original videos; and 2) the synthetic support vectors can precisely preserve the underlying SVM decision boundaries that are interpreted by the original support vectors, and thus, they can be used to learn the complete SVM classifier.
The main problem for automatically generating such synthetic support vectors is to provide sufficient protection of the privacy for the individual value of each original support vector without damaging the important information (i.e., SVM margin boundaries) interpreted by the original support vectors. Based on this understanding, we have developed a novel framework to automatically generate the synthetic support vectors from the original support vectors by preserving the decision boundaries for the relevant weak SVM classifier. By the dual form for the weak SVM classifier can be defined as where X i is the ith video clip and κ(X, X i ) is the underlying kernel function; the feature subsets for the video clips with α i = 0 are called the support vectors.
To generate the synthetic support vectors from the available weak SVM classifier f (X), a synthetic SVM classifier ψ(Z) is used to approximate f (X) [i.e., approximate the decision boundaries of f (X)] where N z < N. To evaluate the approximation efficiency, two vector sets are defined as where ω represents the set for the original support vectors, and ω represents the set of the synthetic support vectors that are used to approximate the underlying SVM decision boundaries interpreted by the set of the original support vectors ω. Thus, the set of the synthetic support vectors ω can automatically be determined by minimizing The crucial point is that even, if Φ(·) is not given explicitly, (10) can be computed and minimized in terms of the kernel function [33]. In our current implementation, we have incorporated a novel iteration approach to generate the synthetic support vectors from the set of the original support vectors by minimizing the criterion function ω − ω 2 . Our iteration framework takes the following major steps.
1) The first approximation, that is, βΦ(Z), N z = 1, can be achieved by minimizing Rather than minimizing ω − βΦ(Z) 2 , we can maximize the following new criterion function Once the maximum of (12) is obtained, the value of β is determined by Because the RBF kernel is used in our current implementation, the relationship between the synthetic support vectors and the original support vectors can be obtained as and it devises an iteration relationship as From above equation, one can observe that the value for each synthetic support vector is approximated by using a linear combination of all the original support vectors in ω and the synthetic support vectors that are previously used to approximate the underlying SVM decision boundaries. This linear combination process provides two significant benefits: a) it can suppress the privacy-sensitive values for the original support vectors and limit the privacy disclosure risk effectively; and b) it can enable a negotiable approach for incremental synthetic support vector generation, that is, generating and sharing different versions of synthetic support vectors with different levels of approximation efficiency (i.e., with different numbers of synthetic support vectors) for different parties or the same party at different negotiation loops according to their individual negotiation agreements.
2) Once the first approximation of ω is available, the iteration relationship for achieving more accurate approximation of the underlying SVM decision boundaries at the next level can be defined as In the Gaussian RBF case, it is worth noting that the SVM decision boundaries that are approximated by using the synthetic support vectors can never be as good as the real decision boundaries interpreted by the original support vectors, i.e., no approximation with N z < N will lead to ω m +1 = 0. This observation also provides an effective solution for privacy protection by selecting the optimal number of the synthetic support vectors to achieve a good balance between the privacy disclosure risk, the generation and transmission cost, and the approximation efficiency.
Once the synthetic support vectors are available, the coefficient β is updated as follows: where Σ and Ξ are matrices, the components of which can be obtained by The above iteration procedure for generating the synthetic support vectors is terminated when one of the following criteria is meet: 1) a good balance (i.e., an optimal value of N z ) is reached between the privacy disclosure risk, the approximation efficiency, and the generation and transmission cost; and 2) ω m +1 falls below a specified threshold.
By performing the above iterations, we can automatically generate a set of synthetic support vectors ω = N z i=1 β i Φ(Z i ) to approximate the underlying SVM decision boundaries that are interpreted by a set of original support vectors ω = N i=1 α i Φ(X i ). These synthetic support vectors in ω are then shared with the collaboration parties according to their individual negotiation agreements. The complete SVM classifier can finally be achieved by incorporating the synthetic support vectors for incremental SVM classifier training.
In order to achieve an acceptable tradeoff between the approximation accuracy (i.e., data utility and benefit for data sharing), the generation and transmission cost, and the privacy disclosure risk, we have developed a computational scheme to enable automatic benefit/risk negotiation. The outputs of our automatic benefit-risk negotiation scheme is further incorporated to control the procedures for synthetic support vector generation and sharing. Thus, each party in the collaboration network is able to generate and share different versions of the synthetic support vectors with different levels of approximation efficiency for different parties or the same party at different negotiation loops according to their individual negotiation agreements.
We have developed a computational approach to achieve a good balance between the approximation efficiency, the generation and transmission cost, and the privacy disclosure risk. The approximation efficiency (i.e., data utility and benefit for data sharing) is defined as To illustrate the approximation efficiency, one synthetic dataset is used (i.e., using different numbers of synthetic support vectors to approximate the underlying SVM decision boundaries). This synthetic dataset consists of four spiral data groups found in pattern classification toolbox [34] and [35], and is generated by using the following steps: first, a set of synthetic data records are randomly generated from a uniform distribution of angles between 0 and 2π with radius between 0 and 1.5; second, the data records in the first and third quadrants are labeled as one class, and the rest data records are labeled as the other class; and finally, all these data records are transformed by linearly adding a radius-dependent angle to each data record, so that the added data records have larger angles than these original data records. As shown in Fig. 7, one can observe that the approximation efficiency can be improved step-wise by using more synthetic support vectors to approximate the underlying SVM margin boundaries interpreted by the original support vectors; this approximation procedure converges when an optimal number of synthetic support vectors is reached. By controlling the number of synthetic support vectors to be generated and to be shared with other parties, our negotiable framework for synthetic support vector generation and sharing is bi-directional, that is, each party has full control on its privacy disclosure risk and its individual benefit for data sharing.
From the experimental results shown in Fig. 7, one can observe that such approximation procedure converges to the underlying SVM decision boundaries interpreted by the original support vectors when the number of synthetic support vectors is close to 32, and the total number of original support vectors is 96. After this approximation procedure converges to the underlying SVM decision boundaries interpreted by the original support vectors, adding more synthetic support vectors does not result in significant improvements of the approximation efficiency. On the other hand, sharing more synthetic support vectors may also induce a higher cost for synthetic support vector generation and transmission. Thus the generation and transmission cost Υ(ω, ω ) is defined as CPU and I/O cost for synthetic support vector generation and transmission. Obviously, Υ(ω, ω ) is a monotonically increasing function of the number of synthetic support vectors, i.e., N z .
To quantify the privacy disclosure risk (ω, ω ), we have developed a computational approach to quantify three different types of privacy disclosure risk: 1) re-identification disclosure risk δ r (ω, ω ): the risk of disclosing the one-to-one relationship between the synthetic support vectors and the original support vectors; 2) linkage disclosure risk ρ(ω, ω ): the risk of reidentifying the values of the original support vectors by using the shared synthetic support vectors; and 3) confidentiality-interval inference risk φ(ω, ω ): the risk of disclosing the tight bounds of the interval values of the original support vectors where X i and Z i are the ith original support vector and the ith synthetic support vector, and ∆(ω, ω ) is the information loss incurred by the use of the synthetic support vectors to replace the original support vectors where N and N z are the total numbers of the original support vectors and the synthetic support vectors. Thus the privacy disclosure risk (ω, ω) is defined as where α 1 , α 2 , and α 3 are the weighting factors denoting the relative importance between δ r (ω, ω ), ρ(ω, ω ), and φ(ω, ω ), α 1 + α 2 + α 3 = 1. To enable customizable privacy modeling, each party can select different values for these weighting factors according to their individual concerns of various privacy disclosure risks. Based on the above descriptions, it is very important to determine the optimal number N optimal of the synthetic support vectors to be shared between two parties, and N optimal is determined by achieving a good tradeoff between the privacy disclosure risk (ω, ω ), the generation and transmission cost Υ(ω, ω ), and the approximation efficiency D(ω, ω ) where λ is a weighting factor and δ is the upper bound of the privacy disclosure risk accepted by the corresponding party. Because different parties may obtain different versions of the synthetic support vectors with different levels of approximation efficiency from other parties in the collaboration network according to their individual negotiation agreements, they may finally learn different versions of the complete SVM classifier with different prediction powers according to their individual contributions for the given collaboration task, that is, a party by sharing more high-utility synthetic support vectors with other parties may also obtain more high-utility synthetic support vectors from other parties. Thus, each party in the collaboration network can have full control on its privacy disclosure risks and its individual benefits for data sharing. To illustrate the empirical relationship between the misclassification rate and the number of synthetic support vectors to be shared, we have tested our algorithms on multiple machine learning datasets and our results are given in Fig. 8. One can observe that sharing more synthetic support vectors can result in a complete SVM classifier with low-classification error rate. We have also obtained the empirical relationship between the privacy disclosure risk and the number of synthetic support vectors to be shared as shown in Fig. 9. One can observe that the privacy disclosure risk sequentially decreases with the number of synthetic support vectors which are shared for approximating the underlying SVM decision boundaries interpreted by the original support vectors. When more synthetic support vectors are shared to approximate the underlying SVM decision boundaries, it becomes more difficult to identify the one-to-one relationships between the original support vectors and the synthetic support vectors. However, the privacy disclosure risk consists of three individual parts: re-identification risk, linkage disclosure risk, and confidentiality-interval inference risk. Sharing more synthetic support vectors may also increase the ability to predict the value intervals of the original support vectors (i.e., confidentiality-interval inference risk) and induce higher linkage disclosure risk [36]- [39], and thus, we have also obtained a slight increasing of the privacy disclosure risk when more synthetic support vectors are shared as shown in Fig. 9.
To assess the effectiveness of the complete SVM classifier, it is very important for each party to share some video clips for Fig. 9. Empirical relationship between the privacy disclosure risk (ω, ω ) (i.e., classifier performance) and the number of synthetic support vectors to be shared. cross-validation. After the human object detection function is available, it is then used to protect the content privacy at the individual video clip level. To filter out the privacy-sensitive human objects (i.e., doctors, professional patient trainers, patients in video), digital human models (i.e., virtual human objects) are used to replace the appearances of privacy-sensitive human objects in a video [1], [10]. Thus, the blurred video streams (i.e., video streams by using the virtual human objects to replace the appearances of privacy-sensitive human objects) are able to protect the privacy-sensitive information about who appear in the video. On the other hand, the blurred video streams are still able to provide necessary information about the real medical treatment procedure for one certain infectious disease and enable high-quality online patient training and counseling. Our experimental results on video content privacy protection are given in Figs. 10-13.

IV. PRIVACY-PRESERVING VIDEO DATABASE INDEXING AND RETRIEVAL
By using the underlying concept models (i.e., support vectors and SVM classifiers for video classification) for database node representation, our new framework for hierarchical video concept modeling and organization has provided an effective framework to enable concept-oriented video database indexing. After all the video clips are classified into the video concepts at different semantic levels, the concept ontology can further be incorporated to construct hierarchical video database indexing structure, where the nodes of semantic video concepts become the nodes of the video database at different semantic levels, upon which the root node of the video database can be constructed automatically.
The following techniques have been used to support conceptoriented video database indexing: 1) the support vectors for semantic video concept modeling are used to characterize the statistical property of the relevant database node (i.e., one certain video concept); 2) the concept ontology can be incorporated to determine the hierarchical structure for video database indexing (i.e., contextual relationships among the parent node and the children nodes); and 3) each database node is jointly described by multiple parameters such as keyword and support vectors.
To achieve a good balance between the access privacy and the access efficiency, we have developed a novel technique to assign the degree of privacy for each database level. The degree of privacy is defined as a monotonically increasing function of the sensitivity of video content, and thus the degree of privacy for the lth database level is defined as where L is the total number of database levels (i.e., depth of the database indexing structure from the root node to the leaf nodes), α is a constant to control the increasing rate of the degree of privacy, S(0) = 0 when l = 0 (i.e., database root node cannot release any privacy information), and S(L) = 1 when l = L (i.e., database leaf nodes may have the highest degree of privacy). From the database root node to the database leaf nodes, the degree of privacy may increase sequentially because the database nodes at a lower semantic level are used to characterize more specific aspects of video content with higher degree of privacy.
When the degree of privacy S(l) for the lth database level is higher than a threshold T δ , S(l) ≥ T δ , all the nodes at the lth database level are stored in a separate hash table, and the contextual relationship between the parent node and its children nodes is not indexed for the low-level database nodes with higher degree of privacy. For the high-level database nodes with low degree of privacy, it is very important to explicitly index the underlying contextual relationships between the parent node and its children nodes, so that higher access efficiency can be achieved for large-scale video database applications. Thus, the associated pointers can be used to index the higher-level database nodes when their degree of privacy S(l) is less than the threshold T δ , S(l) < T δ . Based on these observations, we have developed an effective technique to achieve a good balance between the access efficiency and the access privacy by selecting a suitable threshold T δ . In our current experiments, the threshold T δ is set as T δ = 0.35 and good results have been obtained.
When the degree of privacy S(l) for the lth database level is less than the threshold T δ , the contextual relationships between the parent node and its children nodes can be explicitly indexed and the following parameters are used to index the database node Q at the lth database level    Node representation 1 : (27) where Q i is the pointer associated to the ith children node of Q, N S is the number of children nodes of Q, [(Child 1 , Q 1 ), . . . , (Child i , Q i ), . . . , (Child N s , Q N s )] is the entries for the children nodes of Q, L i is the keyword for interpreting the semantics of the ith child node of Q, Θ i is the complete set of synthetic support vectors to characterize the statistical property and the decision boundaries for the ith child node of Q.
When the degree of privacy S(k) for the kth database level is higher than the threshold T δ , all the nodes at the kth database level are stored in a separate hash  (28) where Node i indicates the ith node at the (k + 1)th database level without indexing its contextual relationship with its parent node and children nodes.
After the privacy-preserving video database indexing structure is available, it is used to support more effective semi-private video retrieval over large-scale video databases. To prevent the participating video providers from tracking the patients' access interest, each video clip for online patient training is distributed on M video servers (i.e, servers for M participating video providers). To prevent the host video server from misusing the video clips shared from other competitive video providers, the video clips are encrypted by using cryptography techniques. To achieve a good balance between the access efficiency (i.e., it depends on the number of database nodes to be accessed) and the access privacy, we have proposed a relaxed security model to support low-cost semi-private video retrieval over large-scale video databases.
To reliably assure the access privacy, our distributed Hippocratic video database system is able to allow the patients to submit ν random queries to the centralized video database indexing structure. Instead of simply submitting one target query, the patients are allowed to submit ν − 1 randomly selected queries in addition to one target query. From the server's view, these ν random queries are independent, and the patients are able to reconstruct the target video clip from the query results obtained by these ν random queries. On the other hand, our distributed Hippocratic video database system is unable to identify which one of these ν random queries is the target query for the particular patient. In addition, these ν random queries are independently performed on the underlying centralized privacy-preserving video database indexing structure, and our Hippocratic video database can get the final result for each of these ν random queries from one of these M video servers (i.e., participating video providers). Thus, each video server can observe only the query sent to it and the corresponding video server can get no information on the particular video clip that the patients are really interested in.
In order to prevent the video providers in the health care network from tracking the traversal path of a particular query, the access redundancy is introduced when the degree of privacy for the lth database level is higher than the predefined threshold T δ . Instead of simply retrieving one target node and its children nodes, our private video retrieval scheme can request the database server η − 1 randomly selected nodes in addition to the target node, all these η database nodes are stored in the same hash table without the pointers associated to their parent nodes and children nodes. By introducing the access redundancy for access privacy protection, it is hard for the video server to identify which privacy-sensitive database node and particular video clips that the patients have accessed and finally got.
To achieve a good balance between the access redundancy (i.e., accessing more database nodes rather than only the target node and its children nodes) and the access privacy, we can define the disclosure of access privacy φ(η, ν) as where ∆ is the total number of database levels stored by using separate hash tables without using the pointers to identify their contextual relationships with their children nodes. When more database levels are stored by using separate hash tables (i.e., with large value of ∆), the disclosure of access privacy φ(η, ν) is reduced by introducing more access redundancy. In addition, submitting more random queries (i.e., with large value of ν) is also able to reduce the disclosure of access privacy. The access redundancy ϕ(η, ν) is defined as where c = 2.8 is a predefined constant. By quantifying the disclosure of access privacy and the access redundancy, the problem for enabling low-cost semi-private video retrieval reduces to choosing the appropriate values for η and ν, so that we can achieve a good balance between the access privacy and the access efficiency automatically. Our semi-private video retrieval technique may leak the patients' privacy when all these three conditions are reached simultaneously: 1) multiple queries for the same video concept are received from the same patient and these queries are recorded by the video providers; 2) these competitive video providers are collaborated (i.e., sharing the queries they received) to infer the patients' privacy; and 3) these competitive video providers know exactly who submitted the given query. When all these three conditions are reached simultaneously, these competitive video providers in the health care network can collaborate to infer the patients' query privacy. On the other hand, oblivious or history-independent database indexing structure can support private information retrieval [10]- [26], but it is too expensive to be useful for real applications. Thus, our future works will focus on developing more efficient oblivious database indexing structures which are able to protect the access privacy effectively while reducing the implementation cost significantly.

V. ALGORITHM EVALUATION
Our experimental algorithm evaluation focuses on: 1) evaluating the performance of our distributed framework for privacypreserving classifier training; 2) evaluating the performance of our privacy-preserving video database indexing structure; and 3) evaluating the performance of our private video retrieval framework.
The benchmark metric for the classifier evaluation includes precision ρ and recall . They are defined as where ϑ is the set of true positive samples that are related to the corresponding concept and are classified correctly, γ is the set of true negative samples that are irrelevant to the corresponding concept and are classified incorrectly, and ν is the set of false positive samples that are related to the corresponding concept but are misclassified. To evaluate our distributed framework for privacy preserving classifier training, the video set is partitioned into three individual groups and weak classifier training is first performed on these three individual data groups independently. The complete SVM classifier is then learned from the shared weak classifiers and the synthetic support vectors. One advantage of our distributed framework for privacy preserving classifier training is that it is able to control the privacy breaches by sharing different numbers of synthetic support vectors. Because these support vectors are used to approximate the decision boundaries of the complete SVM classifier for the given semantic video concept, sharing different number of support vectors can achieve different versions of the complete SVM classifier with different classification accuracy rates. Thus, a good balance between the disclosure of private information and the classification accuracy rate (i.e., depending on the quality of the complete SVM classifier) can be achieved effectively by sharing different resolutions of these weak classifiers. Based on this understanding, we have obtained the empirical relationships between the quality of the complete SVM classifier (i.e., precision of the complete SVM classifier) and the privacy disclosures as shown in Fig. 14. For three data sites, the total number of support vectors and the number of synthetic support vectors to be shared are given in Table I.
For validating the complete SVM classifier at the central site, each video provider has to share not only his/her weak classifier but also a limited number of blurred test videos. To prevent statistical inferences [36]- [39], we have also obtained the empirical relationships between the privacy disclosures and the number of blurred test videos to be shared as shown in Fig. 15. One can find that sharing more blurred test samples decreases the individual video provider's ability on controlling  the statistical inferences and results in the privacy breaches [36]- [39]. To evaluate our framework for privacy-preserving video database indexing and private video retrieval, we have also tested our distributed Hippocratic video database system with 400 h medical education videos and our experimental results are given in Fig. 16. One can find that our system can achieve a good balance between the access privacy and the access efficiency by selecting the suitable values for ν and λ.
We have also compared the differences on the transmission costs between our proposed framework and the perturbation and SMC approaches. The transmission cost is defined as Fig. 16. Empirical relationship between the access privacy (y-axis) and the access efficiency (x-axis) with ν = 5, η = 5, and ∆ = 3. the percentage between the number of shared videos and the total number of videos that are needed for achieving accurate classifier training. As shown in Fig. 17, one can find that our proposed framework can reduce the transmission cost significantly because only a limited number of the blurred test videos are needed to be shared for classifier validation.

VI. CONCLUSION
To enable privacy-preserving video sharing among multiple competitive video providers, we have developed a novel framework able to both protect the video content privacy and control the statistical inferences. By detecting the privacy-sensitive human objects automatically, our algorithm is able to effectively protect the video content privacy at the level of individual video clip. By determining the optimal size of blurred test videos for classifier validation, our proposed framework for privacy-preserving distributed classifier training is able to not only limit the privacy breaches but also improve the classifier's accuracy significantly. Our experiments in a specific domain of patient training videos have also provided very convincing results.