Updated on 2024/02/02

MASADA Tomonari
 
*Items subject to periodic update by Rikkyo University (The rest are reprinted from information registered on researchmap.)
Affiliation*
Graduate School of Artificial Intelligence and Science Master's Program in Artificial Intelligence and Science
College of Economics Department of Economics
Graduate School of Artificial Intelligence and Science Doctoral Program in Artificial Intelligence and Science
Title*
Professor
Degree
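Master of Arts and Sciences (The University of Tokyo) / Master of Science (The University of Tokyo) / Ph.D. in Information Science and Technology (The University of Tokyo) / Bachelor of Science (The University of Tokyo)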
Research Theme*
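  • My research centers on text mining with probabilistic models, in particular the analysis of large-scale corpora using topic models. I have worked on the analysis of scholarly information with models that extend latent Dirichlet allocation, and on sensor data analysis with Bayesian data modeling. Recently, I have been interested in analyzing text data with language models. [Biography] Entered Natural Sciences I at the University of Tokyo and completed master's degrees in both the Department of Information Science, Graduate School of Science, and the History and Philosophy of Science laboratory, Department of Multi-Disciplinary Sciences, Graduate School of Arts and Sciences. After working at an optical equipment manufacturer, obtained a doctorate at the Graduate School of Information Science and Technology, the University of Tokyo. Took up the current position after 13 years as a faculty member at Nagasaki University.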

  Research Interests

  • probabilistic models

  • text mining

  • machine learning

  • data mining

  Campus Career*

    • 4 2022 - Present 
      Graduate School of Artificial Intelligence and Science   Master's Program in Artificial Intelligence and Science   Professor
    • 4 2022 - Present 
      Graduate School of Artificial Intelligence and Science   Doctoral Program in Artificial Intelligence and Science   Professor
    • 4 2020 - Present 
      College of Economics   Department of Economics   Professor
    • 4 2020 - 3 2022 
      Graduate School of Artificial Intelligence and Science   Artificial Intelligence and Science   Professor
     

    Research Areas

    • Informatics / Theory of informatics

    • Informatics / Intelligent informatics

    • Informatics / Database

    Research History

    • 4 2020 - Present 
      Rikkyo University   Graduate School of Artificial Intelligence and Science   Professor

    • 4 2012 - 3 2020 
      Nagasaki University   Graduate school of Engineering   Associate Professor

      Country:Japan

    • 2008 - 2012 
      Nagasaki University   Faculty of Engineering

    • 2008 - 2012 
      Assistant Professor, Electrical and Electronic, Faculty of Engineering, Nagasaki University

    • 2007 - 2008 
      Nagasaki University   Faculty of Engineering, Department of Computer and Information Sciences

    • 2007 - 2008 
      Assistant Professor, Computer and Information Sciences, Faculty of Engineering, Nagasaki University

    • 10 1999 - 9 2001 
      Fuji Photo Optical Co., Ltd.   Optical Design Department   Engineering Staff

    • 1999 - 2001 
      Engineering Staff

    Education

    • - 2004 
      The University of Tokyo

      Country: Japan

    • - 1999 
      The University of Tokyo

      Country: Japan

    • - 1995 
      The University of Tokyo

      Country: Japan

    • - 1993 
      The University of Tokyo   Faculty of Science   Department of Information Science

      Country: Japan

    Awards

    • 2 2020  
      Kyushu Semiconductor and Electronics Innovation Council (SIIQ)  FY2019 2nd SIIQ Technology Grand Prize  Gold Award
       
      Tomonari Masada

    • 5 2018  
      Science and Engineering Institute  Best Oral Presentation  Document Modeling with Implicit Approximate Posterior Distributions
       
      Tomonari MASADA

    • 6 2011  
      INSTICC  Best Paper Award  DOCUMENTS AS A BAG OF MAXIMAL SUBSTRINGS - An Unsupervised Feature Extraction for Document Clustering

    • 2006  
      Information Processing Society of Japan (IPSJ) Best Paper Award

      Country:Japan

    • 2003  
      DEWS Excellent Presentation Award

      Country:Japan

    Papers

    • Myanmar Text-to-Speech System based on Tacotron-2.

      Yuzana Win, Tomonari Masada

      International Conference on Information and Communication Technology Convergence(ICTC)   578 - 583   2020

      Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      DOI: 10.1109/ICTC49870.2020.9289599

      Other Link: https://dblp.uni-trier.de/db/conf/ictc/ictc2020.html#WinM20

    • Myanmar Text-to-Speech System based on Tacotron (End-to-End Generative Model).

      Yuzana Win, Htoo Pyae Lwin, Tomonari Masada

      International Conference on Information and Communication Technology Convergence(ICTC)   572 - 577   2020

      Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      DOI: 10.1109/ICTC49870.2020.9289277

      Other Link: https://dblp.uni-trier.de/db/conf/ictc/ictc2020.html#WinLM20

    • Context-Dependent Token-Wise Variational Autoencoder for Topic Modeling. Peer-reviewed

      Tomonari Masada

      Current Trends in Web Engineering - ICWE 2019 International Workshops   35 - 47   2019

      Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      DOI: 10.1007/978-3-030-51253-8_6

      Other Link: https://dblp.uni-trier.de/db/conf/icwe/icwe2019w.html#Masada19

    • Difference between Similars: A Novel Method to Use Topic Models for Sensor Data Analysis. Peer-reviewed

      Tomonari Masada, Takumi Eguchi, Daisuke Hamaguchi

      2019 International Conference on Data Mining Workshops   391 - 398   2019

      Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      DOI: 10.1109/ICDMW.2019.00064

      Other Link: https://dblp.uni-trier.de/db/conf/icdm/icdm2019w.html#MasadaEH19

    • Mini-Batch Variational Inference for Time-Aware Topic Modeling. Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, August 28-31, 2018, Proceedings, Part II   156 - 164   2018

      Authorship:Lead author   Publisher:Springer  

      DOI: 10.1007/978-3-319-97310-4_18

    • LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10862   395 - 402   2018

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      This paper proposes a method of scoring sequences generated by recurrent neural network (RNN) for automatic Tanka composition. Our method gives sequences a score based on topic assignments provided by latent Dirichlet allocation (LDA). When many word tokens in a sequence are assigned to the same topic, we give the sequence a high score. While a scoring of sequences can also be achieved by using RNN output probabilities, the sequences having large probabilities are likely to share much the same subsequences and thus are doomed to be deprived of diversity. The experimental results, where we scored Japanese Tanka poems generated by RNN, show that the top-ranked sequences selected by our method were likely to contain a wider variety of subsequences than those selected by RNN output probabilities.

      DOI: 10.1007/978-3-319-93713-7_33

    • Document Modeling with Implicit Approximate Posterior Distributions. Peer-reviewed

      Tomonari Masada

      Proceedings of the International Conference on Data Processing and Applications, ICDPA 2018, Guangdong, China, May 12-14, 2018   45 - 48   2018

    • Adversarial Learning for Topic Models. Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Advanced Data Mining and Applications - 14th International Conference, ADMA 2018, Nanjing, China, November 16-18, 2018, Proceedings   292 - 302   2018

      Publisher:Springer  

      DOI: 10.1007/978-3-030-05090-0_25

    • Estimating Word probabilities with neural networks in latent dirichlet allocation Peer-reviewed

      Tomonari Masada

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10526   129 - 137   2017

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      This paper proposes a new method for estimating the word probabilities in latent Dirichlet allocation (LDA). LDA uses a Dirichlet distribution as the prior for the per-document topic discrete distributions. While another Dirichlet prior can be introduced for the per-topic word discrete distributions, point estimations may lead to a better evaluation result, e.g. in terms of test perplexity. This paper proposes a method for the point estimation of the per-topic word probabilities in LDA by using multilayer perceptron (MLP). Our point estimation is performed in an online manner by mini-batch gradient ascent. We compared our method to the baseline method using a perceptron with no hidden layers and also to the collapsed Gibbs sampling (CGS). The evaluation experiment showed that the test perplexity of CGS could not be improved in almost all cases. However, there certainly were situations where our method achieved a better perplexity than the baseline. We also discuss a usage of our method as word embedding.

      DOI: 10.1007/978-3-319-67274-8_12

    • Exploring OOV Words from Myanmar Text Using Maximal Substrings Peer-reviewed

      Yuzana Win, Tomonari Masada

      PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016   657 - 663   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper proposes a method for exploring out-of-vocabulary (OOV) words from Myanmar text by using maximal substrings. Our main purpose is to find OOV words that can be added into the Myanmar dictionary. The outcome of our method are new compound words that do not exist in the Myanmar dictionary. Our method consists of two steps. In the first step, we extract maximal substrings, i.e., the substrings whose number of occurrences are decreased only after appending a character before or after them, from Myanmar news articles. In the second step, we make the post processing of maximal substrings, because the results obtained by maximal substrings contain noisy characters. Our post-processing is threefold. First, we reduce the number of maximal substrings. Second, we remove maximal substrings whose prefixes and suffixes are meaningless characters. Third, we find OOV words that are the substrings consisting of the two words from the existing dictionary. Consequently, we obtain the substrings as candidates of new compound words that can be inserted into the existing Myanmar dictionary after being scrutinized by native speakers. We evaluate the accuracy of new compound words by using the subjective perspective. It is found that our results do seem promising. We appeal that new compound words obtained by our method are useful for expressing the words as a single unit of meaning that can be utilized in Myanmar text effectively.

      DOI: 10.1109/IIAI-AAI.2016.73

      Other Link: http://dblp.uni-trier.de/db/conf/iiaiaai/iiaiaai2016.html#conf/iiaiaai/WinM16

    • Extraction of Proper Names from Myanmar Text Using Latent Dirichlet Allocation Peer-reviewed

      Yuzana Win, Tomonari Masada

      2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI)   96 - 103   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper proposes a method for proper names extraction from Myanmar text by using latent Dirichlet allocation (LDA). Our method aims to extract proper names that provide important information on the contents of Myanmar text. Our method consists of two steps. In the first step, we extract topic words from Myanmar news articles by using LDA. In the second step, we make a post-processing, because the resulting topic words contain some noisy words. Our post-processing, first of all, eliminates the topic words whose prefixes are Myanmar digits and suffixes are noun and verb particles. We then remove the duplicate words and discard the topic words that are contained in the existing dictionary. Consequently, we obtain the words as candidate of proper names, namely personal names, geographical names, unique object names, organization names, single event names, and so on. The evaluation is performed both from the subjective and quantitative perspectives. From the subjective perspective, we compare the accuracy of proper names extracted by our method with those extracted by latent semantic indexing (LSI) and rule-based method. It is shown that both LSI and our method can improve the accuracy of those obtained by rule-based method. However, our method can provide more interesting proper names than LSI. From the quantitative perspective, we use the extracted proper names as additional features in K-means clustering. The experimental results show that the document clusters given by our method are better than those given by LSI and rule-based method in precision, recall and F-score.

      DOI: 10.1109/TAAI.2016.7880176

    • A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2016, PT IV   9789   232 - 245   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER INT PUBLISHING AG  

      This paper proposes a new inference for the latent Dirichlet allocation (LDA) [4]. Our proposal is an instance of the stochastic gradient variational Bayes (SGVB) [9,13]. SGVB is a general framework for devising posterior inferences for Bayesian probabilistic models. Our aim is to show the effectiveness of SGVB by presenting an example of SGVB-type inference for LDA, the best-known Bayesian model in text mining. The inference proposed in this paper is easy to implement from scratch. A special feature of the proposed inference is that the logistic normal distribution is used to approximate the true posterior. This is counterintuitive, because we obtain the Dirichlet distribution by taking the functional derivative when we lower bound the log evidence of LDA after applying a mean field approximation. However, our experiment showed that the proposed inference gave a better predictive performance in terms of test set perplexity than the inference using the Dirichlet distribution for posterior approximation. While the logistic normal is more complicated than the Dirichlet, SGVB makes the manipulation of the expectations with respect to the posterior relatively easy. The proposed inference was better even than the collapsed Gibbs sampling [6] for not all but many settings consulted in our experiment. It must be worthwhile future work to devise a new inference based on SGVB also for other Bayesian models.

      DOI: 10.1007/978-3-319-42089-9_17

    • A simple stochastic gradient variational bayes for the correlated topic model Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9932   424 - 428   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      This paper proposes a new inference for the correlated topic model (CTM) [3]. CTM is an extension of LDA [4] for modeling correlations among latent topics. The proposed inference is an instance of the stochastic gradient variational Bayes (SGVB) [7,8]. By constructing the inference network with the diagonal logistic normal distribution, we achieve a simple inference. Especially, there is no need to invert the covariance matrix explicitly. We performed a comparison with LDA in terms of predictive perplexity. The two inferences for LDA are considered: the collapsed Gibbs sampling (CGS) [5] and the collapsed variational Bayes with a zero-order Taylor expansion approximation (CVB0) [1]. While CVB0 for LDA gave the best result, the proposed inference achieved the perplexities comparable with those of CGS for LDA.

      DOI: 10.1007/978-3-319-45817-5_39

    • Heuristic Pretraining for Topic Models Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      CURRENT APPROACHES IN APPLIED ARTIFICIAL INTELLIGENCE   9101   351 - 360   2015

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides a heuristic pretraining for topic models. While we consider latent Dirichlet allocation (LDA) here, our pretraining can be applied to other topic models. Basically, we use collapsed Gibbs sampling (CGS) to update the latent variables. However, after every iteration of CGS, we regard the latent variables as observable and construct another LDA over them, which we call LDA over LDA (LoL). We then perform the following two types of updates: the update of the latent variables in LoL by CGS and the update of the latent variables in LDA based on the result of the preceding update of the latent variables in LoL. We perform one iteration of CGS for LDA and the above two types of updates alternately only for a small, earlier part of the inference. That is, the proposed method is used as a pretraining. The pretraining stage is followed by the usual iterations of CGS for LDA. The evaluation experiment shows that our pretraining can improve test set perplexity.

      DOI: 10.1007/978-3-319-19066-2_34

    • Traffic Speed Data Investigation with Hierarchical Modeling Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015   9446   123 - 134   2015

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER INT PUBLISHING AG  

      This paper presents a novel topic model for traffic speed analysis in the urban environment. Our topic model is special in that the parameters for encoding the following two domain-specific aspects of traffic speeds are introduced. First, traffic speeds are measured by the sensors each having a fixed location. Therefore, it is likely that similar measurements will be given by the sensors located close to each other. Second, traffic speeds show a 24-hour periodicity. Therefore, it is likely that similar measurements will be given at the same time point on different days. We model these two aspects with Gaussian process priors and make topic probabilities location-and time-dependent. In this manner, our model utilizes the metadata of the traffic speed data. We offer a slice sampling to achieve less approximation than variational Bayesian inferences. We present an experimental result where we use the traffic speed data provided by New York City.

      DOI: 10.1007/978-3-319-26135-5_10

    • Exploring Technical Phrase Frames from Research Paper Titles Peer-reviewed

      Yuzana Win, Tomonari Masada

      2015 IEEE 29TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS WAINA 2015   558 - 563   2015

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper proposes a method for exploring technical phrase frames by extracting word n-grams that match our information needs and interests from research paper titles. Technical phrase frames, the outcome of our method, are phrases with wildcards that may be substituted for any technical term. Our method, first of all, extracts word trigrams from research paper titles and constructs a co-occurrence graph of the trigrams. Even by simply applying PageRank algorithm to the co-occurrence graph, we obtain the trigrams that can be regarded as technical keyphrases at the higher ranks in terms of PageRank score. In contrast, our method assigns weights to the edges of the co-occurrence graph based on Jaccard similarity between trigrams and then apply weighted PageRank algorithm. Consequently, we obtain widely different but more interesting results. While the top-ranked trigrams obtained by unweighted PageRank have just a self-contained meaning, those obtained by our method are technical phrase frames, i.e., a word sequence that forms a complete technical phrase only after putting a technical word (or words) before or/and after it. We claim that our method is a useful tool for discovering important phraseological patterns, which can expand query keywords for improving information retrieval performance and can also work as candidate phrasings in technical writing to make our research papers attractive.

      DOI: 10.1109/WAINA.2015.37

    • ChronoSAGE: Diversifying Topic Modeling Chronologically Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      WEB-AGE INFORMATION MANAGEMENT, WAIM 2014   8485   476 - 479   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides an application of sparse additive generative models (SAGE) for temporal topic analysis. In our model, called ChronoSAGE, topic modeling results are diversified chronologically by using document timestamps. That is, word tokens are generated not only in a topic-specific manner, but also in a time-specific manner. We firstly compare ChronoSAGE with latent Dirichlet allocation (LDA) in terms of pointwise mutual information to show its practical effectiveness. We secondly give an example of time-differentiated topics, obtained by ChronoSAGE as word lists, to show its usefulness in trend detection.

      DOI: 10.1007/978-3-319-08010-9_51

    • A topic model for traffic speed data analysis Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8482 ( 2 )   68 - 77   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      We propose a probabilistic model for traffic speed data. Our model inherits two key features from latent Dirichlet allocation (LDA). Firstly, unlike e.g. stock market data, lack of data is often perceived for traffic speed data due to unexpected failure of sensors or networks. Therefore, we regard speed data not as a time series, but as an unordered multiset in the same way as LDA regards documents not as a sequence, but as a bag of words. This also enables us to analyze co-occurrence patterns of speed data regardless of their positions along the time axis. Secondly, we regard a daily set of speed data gathered from the same sensor as a document and model it not with a single distribution, but with a mixture of distributions as in LDA. While each such distribution is called topic in LDA, we call it patch to remove text-mining connotation and name our model Patchy. This approach enables us to model speed co-occurrence patterns effectively. However, speed data are non-negative real. Therefore, we use Gamma distributions in place of multinomial distributions. Due to these two features, Patchy can reveal context dependency of traffic speed data. For example, a 60 mph observed on Sunday can be assigned to a patch different from that to which a 60 mph on Wednesday is assigned. We evaluate this context dependency through a binary classification task, where test data are classified as either weekday data or not. We use real traffic speed data provided by New York City and compare Patchy with the baseline method, where a simpler data model is applied. © 2014 Springer International Publishing Switzerland.

      DOI: 10.1007/978-3-319-07467-2_8

    • Explaining Prices by Linking Data: A Pilot Study on Spatial Regression Analysis of Apartment Rents Peer-reviewed

      Bin Shen, Tomonari Masada

      2014 IEEE 3RD GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE)   188 - 189   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper reports a pilot study where we link different types of data for explaining prices. In this study, we link the apartment rent data with the publicly accessible location data of landmarks like supermarkets, hospitals, etc. We apply the regression analysis to find the most important factor determining the apartment rents. We claim that the results of this type of spatial data mining can enhance the user experience in the apartment search system, because we can indicate a rationale behind pricing as additional information to users and thus can make them more confident in their choices.

      DOI: 10.1109/GCCE.2014.7031088

    • Collaborator Recommendation for Isolated Researchers Peer-reviewed

      Tin Huynh, Atsuhiro Takasu, Tomonari Masada, Kiem Hoang

      2014 28TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (WAINA)   639 - 644   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      Successful research collaborations may facilitate major outcomes in science and their applications. Thus, identifying effective collaborators may be a key factor that affects success. However, it is very difficult to identify potential collaborators and it is particularly difficult for young researchers who have less knowledge about other researchers and experts in their research domain. This study introduces and defines the problem of collaborator recommendation for 'isolated' researchers who have no links with others in coauthor networks. Existing approaches such as link-based and content-based methods may not be suitable for isolated researchers because of their lack of links and content information. Thus, we propose a new approach that uses additional information as new features to make recommendations, i.e., the strength of the relationship between organizations, the importance rating, and the activity scores of researchers. We also propose a new method for evaluating the quality of collaborator recommendations. We performed experiments by crawling publications from the Microsoft Academic Search website. The metadata were extracted from these publications, including the year, authors, organizational affiliations of authors, citations, and references. The metadata from publications between 2001 and 2005 were used as the training data while those from 2006 to 2011 were used for validation. The experimental results demonstrated the effectiveness and efficiency of our proposed approach.

      DOI: 10.1109/WAINA.2014.105

    • Trimming prototypes of handwritten digit images with subset infinite relational model Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Electrical Engineering   240   129 - 134   2013

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      We propose a new probabilistic model for constructing efficient prototypes of handwritten digit images. We assume that all digit images are of the same size and obtain one color histogram for each pixel by counting the number of occurrences of each color over multiple images. For example, when we conduct the counting over the images of digit "5", we obtain a set of histograms as a prototype of digit "5". After normalizing each histogram to a probability distribution, we can classify an unknown digit image by multiplying probabilities of the colors appearing at each pixel of the unknown image. We regard this method as the baseline and compare it with a method using our probabilistic model called Multinomialized Subset Infinite Relational Model (MSIRM), which gives a prototype, where color histograms are clustered column- and row-wise. The number of clusters is adjusted flexibly with Chinese restaurant process. Further, MSIRM can detect irrelevant columns and rows. An experiment, comparing our method with the baseline and also with a method using Dirichlet process mixture, revealed that MSIRM could neatly detect irrelevant columns and rows at peripheral part of digit images. That is, MSIRM could "trim" irrelevant part. By utilizing this trimming, we could speed up classification of unknown images. © 2013 Springer Science+Business Media Dordrecht(Outside the USA).

      DOI: 10.1007/978-94-007-6738-6_16

    • A revised inference for correlated topic model Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7952 ( 2 )   445 - 454   2013

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      In this paper, we provide a revised inference for correlated topic model (CTM) [3]. CTM is proposed by Blei et al. for modeling correlations among latent topics more expressively than latent Dirichlet allocation (LDA) [2] and has been attracting attention of researchers. However, we have found that the variational inference of the original paper is unstable due to almost-singularity of the covariance matrix when the number of topics is large. This means that we may be reluctant to use CTM for analyzing a large document set, which may cover a rich diversity of topics. Therefore, we revise the inference and improve its quality. First, we modify the formula for updating the covariance matrix in a manner that enables us to recover the original inference by adjusting a parameter. Second, we regularize posterior parameters for reducing a side effect caused by the formula modification. While our method is based on a heuristic intuition, an experiment conducted on large document sets showed that it worked effectively in terms of perplexity. © 2013 Springer-Verlag Berlin Heidelberg.

      DOI: 10.1007/978-3-642-39068-5_54

    • Three-way nonparametric Bayesian clustering for handwritten digit image classification Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8228 ( 3 )   149 - 156   2013

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      This paper proposes a new approach for handwritten digit image classification using a nonparametric Bayesian probabilistic model, called multinomialized subset infinite relational model (MSIRM). MSIRM realizes a three-way clustering, i.e., a simultaneous clustering of digit images, pixel columns, and pixel rows, where the numbers of clusters are adjusted automatically with Chinese restaurant process (CRP). We obtain MSIRM as a modification of subset infinite relational model (SIRM) by Ishiguro et al. [4] While this modification is straightforward, our application of MSIRM to handwritten digit image classification leads to an impressive result. To represent a large number of training digit images in a compact form, we cluster the training images and then classify a test image to the class of the cluster most similar to the test image. By extending this line of thought, MSIRM clusters not only digit images but also pixel columns and pixel rows to obtain a more compact representation. With this three-way clustering, we achieved 2.95% and 5.38% test error rates for MNIST and USPS datasets, respectively. © Springer-Verlag 2013.

      DOI: 10.1007/978-3-642-42051-1_20

      Scopus

      researchmap

    • Clustering Documents with Maximal Substrings Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      ENTERPRISE INFORMATION SYSTEMS, ICEIS 2011 102   19 - 34   2012

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides experimental results showing that we can use maximal substrings as elementary building blocks of documents in place of the words extracted by a current state-of-the-art supervised word extraction. Maximal substrings are defined as the substrings whose number of occurrences decreases when even a single character is appended to the head or tail. The main feature of maximal substrings is that they can be extracted quite efficiently in an unsupervised manner. We extract maximal substrings from a document set and represent each document as a bag of maximal substrings. We also obtain a bag of words representation by using a state-of-the-art supervised word extraction over the same document set. We then apply the same document clustering method to both representations and obtain two clustering results for a comparison of their quality. We adopt a Bayesian document clustering based on Dirichlet compound multinomials for avoiding overfitting. Our experiment shows that the clustering quality achieved with maximal substrings is acceptable enough to use them in place of the words extracted by a supervised word extraction.

      DOI: 10.1007/978-3-642-29958-2_2

      researchmap

    • Extraction of topic evolutions from references in scientific articles and its GPU acceleration Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      ACM International Conference Proceeding Series   1522 - 1526   2012

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

      This paper provides a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics. Recent trends in text mining point to a high demand for exploiting metadata. Especially, exploitation of reference relationships among documents induced by hyperlinking Web pages, citing scientific articles, tumblring blog posts, retweeting tweets, etc., is put in the foreground of the effort for an effective mining. We focus on scholarly activities and propose a topic model for obtaining a corpus-wide view on how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) by introducing a corpus-wide topic transition probability matrix, which models reference relationships as transitions among topics. Our approximated variational inference updates LDA posteriors and topic transition posteriors alternately. The main issue is execution time amounting to O(MK^2), where K is the number of topics and M is that of links in citation network. Therefore, we accelerate the inference with Nvidia CUDA compatible GPUs. We compare the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F). We also present topic evolution examples our method gives. © 2012 ACM.

      DOI: 10.1145/2396761.2398465

      Scopus

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/cikm/cikm2012.html#conf/cikm/MasadaT12

    • Unsupervised segmentation of bibliographic elements with latent permutations Peer-reviewed

      Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6724 LNCS   254 - 267   2011

      More details

      Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      This paper introduces a novel approach for large-scale unsupervised segmentation of bibliographic elements. Our problem is to segment a word token sequence representing a citation into subsequences each corresponding to a different bibliographic element, e.g. authors, paper title, journal name, publication year, etc. Obviously, each bibliographic element should be represented by contiguous word tokens. We call this constraint contiguity constraint. Therefore, we should infer a sequence of assignments of word tokens to bibliographic elements so that this constraint is satisfied. Many HMM-based methods solve this problem by prescribing fixed transition patterns among bibliographic elements. In this paper, we use generalized Mallows models (GMM) in a Bayesian multi-topic model, effectively applied to document structure learning by Chen et al. [4], and infer a permutation of latent topics each of which can be interpreted as one among the bibliographic elements. According to the inferred permutation, we arrange the order of the draws from a multinomial distribution defined over topics. In this manner, we can obtain an ordered sequence of topic assignments satisfying contiguity constraint. We do not need to prescribe any transition patterns among bibliographic elements. We only need to specify the number of bibliographic elements. However, the method proposed by Chen et al. works for our problem only after introducing modification. The main contribution of this paper is to propose strategies to make their method work also for our problem. © 2011 Springer-Verlag.

      DOI: 10.1007/978-3-642-24396-7_20

      Scopus

      researchmap

    • Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I 6634   435 - 447   2011

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides a new approach to topical trend analysis. Our aim is to improve the generalization power of latent Dirichlet allocation (LDA) by using document timestamps. Many previous works model topical trends by making latent topic distributions time-dependent. We propose a straightforward approach by preparing a different word multinomial distribution for each time point. Since this approach increases the number of parameters, overfitting becomes a critical issue. Our contribution to this issue is two-fold. First, we propose an effective way of defining Dirichlet priors over the word multinomials. Second, we propose a special scheduling of variational Bayesian (VB) inference. Comprehensive experiments with six datasets prove that our approach can improve LDA and also Topics over Time, a well-known variant of LDA, in terms of test data perplexity in the framework of VB inference.

      DOI: 10.1007/978-3-642-20841-6_36

      researchmap

    • DOCUMENTS AS A BAG OF MAXIMAL SUBSTRINGS An Unsupervised Feature Extraction for Document Clustering Peer-reviewed

      Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

      ICEIS 2011: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1   5 - 13   2011

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION  

      This paper provides experimental results showing how we can use maximal substrings as elementary features in document clustering. We extract maximal substrings, i.e., the substrings each giving a smaller number of occurrences even after adding only one character at its head or tail, from the given document set and represent each document as a bag of maximal substrings after reducing the variety of maximal substrings by a simple frequency-based selection. This extraction can be done in an unsupervised manner. Our experiment aims to compare bag of maximal substrings representation with bag of words representation in document clustering. For clustering documents, we utilize Dirichlet compound multinomials, a Bayesian version of multinomial mixtures, and measure the results by F-score. Our experiment showed that maximal substrings were as effective as words extracted by a dictionary-based morphological analysis for Korean documents. For Chinese documents, maximal substrings were not so effective as words extracted by a supervised segmentation based on conditional random fields. However, one fourth of the clustering results given by bag of maximal substrings representation achieved F-scores better than the mean F-score given by bag of words representation. It can be said that the use of maximal substrings achieved an acceptable performance in document clustering.

      researchmap

    • Semi-supervised Bibliographic Element Segmentation with Latent Permutations Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      DIGITAL LIBRARIES: FOR CULTURAL HERITAGE, KNOWLEDGE DISSEMINATION, AND FUTURE CREATION 7008   60 - +   2011

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper proposes a semi-supervised bibliographic element segmentation. Our input data is a large scale set of bibliographic references each given as an unsegmented sequence of word tokens. Our problem is to segment each reference into bibliographic elements, e.g. authors, title, journal, pages, etc. We solve this problem with an LDA-like topic model by assigning each word token to a topic so that the word tokens assigned to the same topic refer to the same bibliographic element. Topic assignments should satisfy contiguity constraint, i.e., the constraint that the word tokens assigned to the same topic should be contiguous. Therefore, we proposed a topic model in our preceding work [8] based on the topic model devised by Chen et al. [3]. Our model extends LDA and realizes unsupervised topic assignments satisfying contiguity constraint. The main contribution of this paper is the proposal of a semi-supervised learning for our proposed model. We assume that at most one third of word tokens are already labeled. In addition, we assume that a few percent of the labels may be incorrect. The experiment showed that our semi-supervised learning improved the unsupervised learning by a large margin and achieved an over 90% segmentation accuracy.

      DOI: 10.1007/978-3-642-24826-9_11

      researchmap

    • Implementation of a programming environment with a multithread model for reconfigurable systems Peer-reviewed

      Keisuke Dohi, Yuichiro Shibata, Tsuyoshi Hamada, Tomonari Masada, Kiyoshi Oguri, Duncan A. Buell

      ACM SIGARCH Computer Architecture News 38 ( 4 ) 40 - 45   14 9 2010

      More details

      Publishing type:Research paper (scientific journal)   Publisher:Association for Computing Machinery (ACM)  

      DOI: 10.1145/1926367.1926375

      researchmap

    • Infinite Latent Process Decomposition Peer-reviewed

      Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

      2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW)   810 - 811   2010

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE COMPUTER SOC  

      This paper presents infinite latent process decomposition (iLPD), a new microarray analysis method, as an extension of latent process decomposition. Our method assumes an infinite number of latent processes. Further, our new collapsed variational Bayesian inference improves the inference proposed in [2] in the treatment of Dirichlet hyperparameters. We also give the results of the comparison experiment.

      researchmap

    • Modeling Topical Trends over Continuous Time with Priors Peer-reviewed

      Tomonari Masada, Daiji Fukagawa, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCES IN NEURAL NETWORKS - ISNN 2010, PT 2, PROCEEDINGS 6064   302 - +   2010

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose a new method for topical trend analysis. We model topical trends by per-topic Beta distributions as in Topics over Time (TOT), proposed as an extension of latent Dirichlet allocation (LDA). However, TOT is likely to overfit to timestamp data in extracting latent topics. Therefore, we apply prior distributions to Beta distributions in TOT. Since Beta distribution has no conjugate prior, we devise a trick, where we set one among the two parameters of each per-topic Beta distribution to one based on a Bernoulli trial and apply Gamma distribution as a conjugate prior. Consequently, we can marginalize out the parameters of Beta distributions and thus treat timestamp data in a Bayesian fashion. In the evaluation experiment, we compare our method with LDA and TOT in link detection task on TDT4 dataset. We use word predictive probabilities as term weights and estimate document similarities by using those weights in a TFIDF-like scheme. The results show that our method achieves a moderate fitting to timestamp data.

      DOI: 10.1007/978-3-642-13318-3_38

      researchmap

    • A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs - Towards cost effective, high performance N-body simulation Peer-reviewed

      Tsuyoshi Hamada, Keigo Nitadori, Khaled Benkrid, Yousuke Ohno, Gentaro Morimoto, Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri, Makoto Taiji

      Computer Science - Research and Development 24 ( 1-2 ) 21 - 31   9 2009

      More details

      Publishing type:Research paper (scientific journal)  

      Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications figure astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple O(N^2) algorithm was used for GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were based on more optimized O(N log N) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC clusters for N-body simulations. In this paper, we report a new method for efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows the realization of an N-body simulation on a GPU cluster at a much higher performance than that on general PC clusters. We practically performed a cosmological simulation with 562 million particles on a GPU cluster using 128 NVIDIA GeForce 8800GTS GPUs at an overall cost of $168,172. We obtained a sustained performance of 20.1 Tflops, which when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved cost/performance was hence a mere $19.8/Gflops, which shows the high competitiveness of GPGPUs. © 2009 Springer-Verlag.

      DOI: 10.1007/s00450-009-0089-1

      Scopus

      researchmap

    • Accelerating the Phase Only Correlation method using GPUs Peer-reviewed

      MATSUO Kentaro, MIYOSHI Masayuki, HAMADA Tsuyoshi, SHIBATA Yuichiro, MASADA Tomonari, OGURI Kiyoshi

      ITE Technical Report 33 ( 0 ) 201 - 206   2009

      More details

      Language:Japanese   Publisher:The Institute of Image Information and Television Engineers  

      The Phase Only Correlation (POC) method demonstrates high robustness and subpixel accuracy in pattern matching and image registration. However, it is computationally slow because it requires operations such as the 2D FFT. We propose a novel approach that accelerates the POC method using a GPU to address this computational cost. Using our GPU-based POC implementation, each POC calculation can be done within 2.36 seconds for 256×256 pixels, within 7.92 seconds for 512×512 pixels, and within 27.65 seconds for 1024×1024 pixels.

      DOI: 10.11485/itetr.33.6.0_201

      CiNii Article

      researchmap

      Other Link: http://hdl.handle.net/10069/22664

    • Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS 5446   556 - +   2009

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose a new probabilistic model, Bag of Timestamps (BoT), for chronological text mining. BoT is an extension of latent Dirichlet allocation (LDA), and has two remarkable features when compared with a previously proposed Topics over Time (ToT), which is also an extension of LDA. First, we can avoid overfitting to temporal data, because temporal data are modeled in a Bayesian manner similar to word frequencies. Second, BoT has a conditional probability where no functions requiring time-consuming computations appear. The experiments using newswire documents show that BoT achieves more moderate fitting to temporal data in shorter execution time than ToT.

      DOI: 10.1007/978-3-642-00672-2_51

      researchmap

    • Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices Peer-reviewed

      Tomonari Masada, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      NEXT-GENERATION APPLIED INTELLIGENCE, PROCEEDINGS 5579   491 - 500   2009

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We could obtain inference results comparable to sequential CVB inference.

      DOI: 10.1007/978-3-642-02568-6_50

      researchmap

    • Dynamic hyperparameter optimization for Bayesian topical trend analysis Peer-reviewed

      Tomonari Masada, Daiji Fukagawa, Atsuhiro Takasu, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      International Conference on Information and Knowledge Management, Proceedings   1831 - 1834   2009

      More details

      Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

      This paper presents a new Bayesian topical trend analysis. We regard the parameters of topic Dirichlet priors in latent Dirichlet allocation as a function of document timestamps and optimize the parameters by a gradient-based algorithm. Since our method gives similar hyperparameters to the documents having similar timestamps, topic assignment in collapsed Gibbs sampling is affected by timestamp similarities. We compute TFIDF-based document similarities by using a result of collapsed Gibbs sampling and evaluate our proposal by link detection task of Topic Detection and Tracking. Copyright 2009 ACM.

      DOI: 10.1145/1645953.1646242

      Scopus

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/cikm/cikm2009.html#conf/cikm/MasadaFTHSO09

    • Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation Peer-reviewed

      Tomonari Masada, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS 5678   253 - 264   2009

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides a new method for multi-topic Bayesian analysis for microarray data. Our method achieves a further maximization of lower bounds in a marginalized variational Bayesian inference (MVB) for Latent Process Decomposition (LPD), which is an effective probabilistic model for microarray data. In our method, hyperparameters in LPD are updated by empirical Bayes point estimation. The experiments based on microarray data of realistically large size show efficiency of our hyperparameter reestimation technique.

      DOI: 10.1007/978-3-642-03348-3_26

      researchmap

    • GPU acceleration of multiple topic extraction from images by LDA document model Peer-reviewed

      MASADA Tomonari, HAMADA Tsuyoshi, SHIBATA Yuichiro, OGURI Kiyoshi

      ITE Technical Report 32 ( 0 ) 1 - 6   2008

      More details

      Language:Japanese   Publisher:The Institute of Image Information and Television Engineers  

      In this paper, we propose a GPU acceleration of multi-topic extraction from images by using LDA (latent Dirichlet allocation). LDA was originally proposed as a probabilistic model for documents by Blei et al. Recently, LDA has also been applied to multimedia information other than documents. We provide the results of experiments where we apply LDA to Professor Wang's 10,000 test images and extract multiple visual topics. We adopt the collapsed variational Bayesian inference method for LDA and accelerate it by using Nvidia CUDA compatible GPU devices.

      DOI: 10.11485/itetr.32.54.0_1

      CiNii Article

      researchmap

    • A Sub-Petaflops High Performance Computing System using GPUs Peer-reviewed

      Hamada Tsuyoshi, Masada Tomonari, Shibata Yuichiro, Oguri Kiyoshi

      ITE Technical Report 32 ( 0 ) 17 - 19   2008

      More details

      Language:Japanese   Publisher:The Institute of Image Information and Television Engineers  

      DOI: 10.11485/itetr.32.54.0_17

      CiNii Article

      researchmap

    • Unmixed spectrum clustering for template composition in lung sound classification Peer-reviewed

      Tomonari Masada, Senya Kiyasu, Sueharu Miyahara

      ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS 5012   964 - 969   2008

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose a method for composing templates of lung sound classification. First, we obtain a sequence of power spectra by FFT for each given lung sound and compute a small number of component spectra by ICA for each of the overlapping sets of tens of consecutive power spectra. Second, we put component spectra obtained from various lung sounds into a single set and conduct clustering a large number of times. When component spectra belong to the same cluster in all clustering results, these spectra show robust similarity. Therefore, we can use such spectra to compose a template of lung sound classification.

      DOI: 10.1007/978-3-540-68125-0_100

      researchmap

    • Comparing LDA with pLSI as a dimensionality reduction method in document clustering Peer-reviewed

      Tomonari Masada, Senya Kiyasu, Sueharu Miyahara

      LARGE-SCALE KNOWLEDGE RESOURCES 4938   13 - 26   2008

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as a dimensionality reduction method and investigate their effectiveness in document clustering by using real-world document sets. For clustering of documents, we use a method based on multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for the evaluation of clustering results. Our experiment shows that the dimensionality reduction via LDA and pLSI results in document clusters of almost the same quality as those obtained by using original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment provides no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI at least for dimensionality reduction in document clustering.

      DOI: 10.1007/978-3-540-78159-2_2

      researchmap

    • A Distributed Data Placement Method Based on Term Weights for P2P Information Retrieval (co-authored)

      倉沢央, 若木宏美, 正田備也, 高須淳宏, 安達淳

      IPSJ Multimedia, Distributed, Cooperative, and Mobile Symposium (DICOMO2007)   7 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • Accuracy of Document Classification with Dirichlet Mixtures Peer-reviewed

      MASADA TOMONARI, TAKASU ATSUHIRO, ADACHI JUN

        48 ( SIG11(TOD34) ) 14 - 26   15 6 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan (IPSJ)  

      The naive Bayes classifier is a well-known method for document classification. However, the naive Bayes classifier gives a satisfying classification accuracy only after an appropriate tuning of the smoothing parameter. Moreover, we should find appropriate parameter values separately for different document sets. In this paper, we focus on an effective probabilistic framework for document classification, called Dirichlet mixtures, which requires no parameter tuning and provides satisfying classification accuracies with respect to various document sets. Many studies in the fields of image processing and natural language processing utilize Dirichlet mixtures. Especially, in the field of natural language processing, many experiments are conducted by using real document data sets. However, most studies use the perplexity as an evaluation measure. While the perplexity is a purely theoretical measure, the accuracy is popular for document classification in the field of information retrieval or of text mining. The accuracy is computed by comparing correct labels with predictions made by the classifier. In this paper, we conduct an evaluation experiment by using the 20 newsgroups data set and Korean Web newspaper articles, with the intention of using Dirichlet mixtures for multilingual applications. In the experiment, we compare the naive Bayes classifier with the classifier based on Dirichlet mixtures and clarify their qualitative and quantitative differences.

      CiNii Article

      researchmap

      Other Link: http://hdl.handle.net/10069/16317

    • A Distributed Placement Method for Indexes and Files in P2P Information Retrieval

      倉沢央, 正田備也, 高須淳宏, 安達淳

      IPSJ SIG Technical Reports 2007 ( 36 ) 147 - 154   5 4 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • Search Support through Comprehensive Presentation of Multiple Topics Using Topic-Oriented Term Clustering

      若木裕美, 正田備也, 高須淳宏, 安達淳

      IEICE 18th Data Engineering Workshop (DEWS 2007)   3 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • Detection of abnormal lung sounds through investigation of breathing cycle Peer-reviewed

      Senya Kiyasu, Kohsuke Yanagihara, Tomonari Masada, Sueharu Miyahara, Mikio Oka

      Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers 61 ( 12 ) 1769 - 1773   2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Inst. of Image Information and Television Engineers  

      The purpose of our research is to develop a method for recognizing abnormal lung sounds without the need of a medical specialist. Listening to the sounds of the human body is one of the most important methods of checking someone's health. However, identification of abnormal lung sounds is difficult for an untrained person. We differentiated true abnormal sounds from interfering noise by investigating the fact that lung sounds are generated periodically in relation to the breathing cycle.

      DOI: 10.3169/itej.61.1769

      Scopus

      researchmap

    • Using a Knowledge Base to Disambiguate Personal Name in Web Search Results Peer-reviewed

      Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      APPLIED COMPUTING 2007, VOL 1 AND 2   839 - +   2007

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ASSOC COMPUTING MACHINERY  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method.

      DOI: 10.1145/1244002.1244188

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/sac/sac2007.html#conf/sac/VuMTA07

    • Disambiguation of people in web search using a knowledge base Peer-reviewed

      Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      2007 IEEE International Conference on Research, Innovation and Vision for the Future, RIVF 2007   185 - 191   2007

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method. © 2007 IEEE.

      DOI: 10.1109/RIVF.2007.369155

      Scopus

      researchmap

    • Query Refinement based on Topical Term Clustering. Peer-reviewed

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications) - RIAO 2007, 8th International Conference, Carnegie Mellon University, Pittsburgh, PA, USA, May 30 - June 1, 2007. Proceedings, CD-ROM   2007

      More details

      Publisher:CID  

      researchmap

    • Citation data clustering for author name disambiguation. Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Proceedings of the 2nd International Conference on Scalable Information Systems, Infoscale 2007, Suzhou, China, June 6-8, 2007   62   2007

      More details

    • Using web directories for similarity measurement in personal name disambiguation Peer-reviewed

      Quang Minh Vu, Atsuhiro Takasu, Tomonari Masada, Jun Adachi

      Proceedings - 21st International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINAW'07 2   379 - 384   2007

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

      In this paper, we target the problem of personal name disambiguation in search results returned by personal name queries. Usually, a personal name refers to several people. Therefore, when a search engine returns a set of documents containing that name, they are often relevant to several individuals who share the name. Automatic differentiation of people in the resulting documents may help users search for the person of interest more easily. We propose a method that uses web directories to improve the similarity measurement in personal name disambiguation. We carried out experiments on real web documents in which we compared our method with the vector space model method and the named entity recognition method. The results show that our method has advantages over these previous methods. © 2007 IEEE.

      DOI: 10.1109/AINAW.2007.367

      Scopus

      researchmap

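    • Discovering Comprehensive Topics by Specificity-Oriented Term Clustering and Supporting Search Query Expansion

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IEICE 17th Data Engineering Workshop (DEWS 2006), 2C-i4   3 2006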

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • A new measure for query disambiguation using term co-occurrences Peer-reviewed

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2006, PROCEEDINGS 4224   904 - 911   2006

      More details

      Language:English   Publishing type:Research paper (scientific journal)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper explores techniques that discover terms to replace given query terms from a selected subset of documents. The Internet allows access to large numbers of documents archived in digital format. However, no user can be an expert in every field, and users have trouble finding the documents that suit their purposes when they cannot formulate queries that narrow the search to the context they have in mind. Accordingly, we propose a method for extracting terms from searched documents to replace user-provided query terms. Our results show that our method is successful in discovering terms that can be used to narrow the search.

      DOI: 10.1007/11875581_108

      researchmap

    • Link-Based Clustering for Finding Subrelevant Web Pages Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Proc. International Workshop on Web Document Analysis, 2005 (WDA2005)   9 2005

      More details

      Language:English  

      researchmap

    • A Keyword Presentation Method for Resolving Query Term Ambiguity

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IPSJ SIG Notes (Database Systems) 137 ( 137 ) 269 - 276   7 2005

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      CiNii Article

      researchmap

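    • Proposal and Evaluation of an Author Identification Method for Bibliographic Information Using Graphs Based on Coauthorship

      Kohei Suzuki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IPSJ SIG Notes (Database Systems) (Summer Database Workshop DBWS2005), 2005. ( 137 )   7 2005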

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      researchmap

    • Improving Web Search by Query Expansion with a Small Number of Terms. Peer-reviewed

      Tomonari Masada, Teruhito Kanazawa, Atsuhiro Takasu, Jun Adachi

      Proceedings of the Fifth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NTCIR-5, National Center of Sciences, Tokyo, Japan, December 6-9, 2005   2005

      More details

      Publisher:National Institute of Informatics (NII)  

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/ntcir/ntcir2005.html#conf/ntcir/MasadaKTA05

    • Decomposing the Web graph into parameterized connected components Peer-reviewed

      T Masada, A Takasu, J Adachi

      IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E87D ( 2 ) 380 - 388   2 2004

      More details

      Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

      We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.

      researchmap

      Other Link: http://dblp.uni-trier.de/db/journals/ieicet/ieicet87d.html#journals/ieicet/MasadaTA04

    • R2D2 at NTCIR-4 Web Retrieval Task. Peer-reviewed

      Teruhito Kanazawa, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Proceedings of the Fourth NTCIR Workshop on Research in Information Access Technologies Information Retrieval, Question Answering and Summarization, NTCIR-4, National Center of Sciences, Tokyo, Japan, June 2-4, 2004   2004

      More details

      Publisher:National Institute of Informatics (NII)  

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/ntcir/ntcir2004.html#conf/ntcir/KanazawaMTA04

    • Web page grouping based on parameterized connectivity Peer-reviewed

      T Masada, A Takasu, J Adachi

      DATABASE SYSTEMS FOR ADVANCED APPLICATIONS 2973   374 - 380   2004

      More details

      Language:English   Publishing type:Research paper (scientific journal)   Publisher:SPRINGER-VERLAG BERLIN  

      We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the Web, page grouping is expected to provide a general grasp of the Web for effective Web search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our method is a generalization of the decomposition into strongly connected components. Each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by a parameter, called the threshold parameter. We call the resulting groups parameterized connected components. The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our method.

      DOI: 10.1007/978-3-540-24571-1_34

      researchmap

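    • Effective Use of Web Information via Parameterized Connected Component Decomposition

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IPSJ SIG Notes (Database Systems) (Summer Database Workshop DBWS2003), 2003. ( 131(71 )   22 7 2003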

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      researchmap

    • Web Page Grouping by Parameterized Connected Component Decomposition

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IPSJ SIG on Database Systems, IPSJ SIG Notes 2002 ( 67, DB ) 297 - 304   7 2002

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      researchmap

    • A Package for Triangulations. Peer-reviewed

      Tsuyoshi Ono, Yoshiaki Kyoda, Tomonari Masada, Kazuyoshi Hayase, Tetsuo Shibuya, Motoki Nakade, Mary Inaba, Hiroshi Imai, Keiko Imai, David Avis

      Proceedings of the Twelfth Annual Symposium on Computational Geometry, Philadelphia, PA, USA, May 24-26, 1996   V-17 - V-18   1996

    • Enumeration of Regular Triangulations. Peer-reviewed

      Tomonari Masada, Hiroshi Imai, Keiko Imai

      Proceedings of the Twelfth Annual Symposium on Computational Geometry, Philadelphia, PA, USA, May 24-26, 1996   224 - 233   1996


    Misc.

    • Text Mining for "Zenkyoto Generation"

        ( 2020 ) 297 - 302   5 12 2020

      More details

      Language:Japanese  

      CiNii Article

      researchmap

    • Learning Tasks That Enhance Student Participation in Lecture Class

      NIWA Kazuhisa, MASADA Tomonari, FUKUZAWA Katsuhiko, MINE Mariko, YAMAJI Hiroki

      Journal of the Center for Educational Innovation Nagasaki University 5 ( 5 ) 19 - 24   3 2014

      More details

      Language:Japanese   Publisher:Nagasaki University  

      General education reform at Nagasaki University has required new pedagogies that enhance student participation in lecture class. The authors addressed this urgent issue by developing widely applicable methods in an interdisciplinary course titled "Information and Society." The course consisted of four lecture series on ICT applications, in which 72 students engaged in learning tasks that were designed to facilitate note-taking of key concepts and general reflection on the lecture content as well as the assessment of their comprehension level. The main instructor edited students' descriptions and put them onto the course site so that the whole class could share the learning and prepare for feedback sessions. Students also responded to questionnaires that were designed to inquire into their prior conceptualizations. Future directions using effective learning tasks in lecture class are discussed.

      CiNii Article

      researchmap

      Other Link: http://hdl.handle.net/10069/34322

    • Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

      Tomonari Masada

      International Journal of Organizational and Collective Intelligence 2 ( 2 ) 49 - 62   2011

      More details

    • An Automatic Optimization Technique of DMA Transfer and Data Allocation for Reconfigurable Machines

      SHIDA Sayaka, DOHI Keisuke, SHIBATA Yuichiro, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      The IEICE transactions on information and systems 92 ( 12 ) 2127 - 2136   1 12 2009

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      CiNii Article

      researchmap

    • Evaluation of circuit proliferation method that uses concept of pressure in PCA

      ARAKI Yuta, SHIBATA Yuichiro, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      IEICE technical report 109 ( 320 ) 19 - 24   26 11 2009

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      PCA is hardwired logic with self-reconfigurability that can dynamically modify the structure and append the functionality. However, an area management scheme in a distributed manner must be established in order to leverage the dynamic reconfigurability, which is still a challenging topic for PCA. This paper introduces a simple dynamic circuit construction method like cell proliferation, and proposes a new rule for proliferation. The evaluation results using random graphs show that the new rule can decrease the number of proliferation procedures compared to old rules.

      CiNii Article

      researchmap

    • FPGA implementation and accuracy evaluation of a power-supply voltage control circuit

      SOEJIMA Masato, SAKEMI Jyunya, SHIBATA Yuichiro, KUROKAWA Fujio, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      IEICE technical report 109 ( 198 ) 19 - 24   10 9 2009

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Demands for more steady and more efficient direct voltage power supplies have been increasing in the context of energy conservation measures. Digital DC-DC converters are especially gathering attention because they have a high degree of reliability and flexibility. This paper addresses an FPGA-based control mechanism for a DC-DC converter. This approach enables programmability for a wide range of control algorithms while higher operational speed is anticipated. In this paper, we discuss required arithmetic precision and trade-off relationship between control accuracy and hardware costs through prototype implementation of an FPGA-based DC-DC converter. Evaluation of the prototype system reveals that fixed point arithmetic with 10-bit fraction part is enough in terms of dynamic characteristics. Design of a counter module which uses multiple phase-shifted clock signals to increase the PWM resolution while keeping the system clock frequency low is also discussed.

      CiNii Article

      researchmap

    • A Memory Access Optimization Method for Reconfigurable Systems Based on a Multithread Programming Model

      DOHI Keisuke, SHIDA Sayaka, SHIBATA Yuichiro, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      IEICE technical report 109 ( 26 ) 61 - 66   7 5 2009

      More details

      Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Reconfigurable systems are known to be able to achieve higher performance than traditional microprocessor architectures in many application fields. However, in order to extract the full potential of reconfigurable systems, programmers often have to design and describe the best-suited code for their target architecture with specialized knowledge. The aim of this paper is to assist the users of reconfigurable systems by implementing a programming environment with a multithread model. The experimental results show that our translator automatically generates efficient performance-aware code segments including DMA transfer and shift registers for memory access optimization.

      CiNii Article

      researchmap

    • A Sub-Petaflops High Performance Computing System using GPUs

      HAMADA Tsuyoshi, MASADA Tomonari, SHIBATA Yuichiro, OGURI Kiyoshi

      IEICE technical report 108 ( 324 ) 17 - 19   21 11 2008

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      CiNii Article

      researchmap

    • GPU acceleration of multiple topic extraction from images by LDA document model

      MASADA Tomonari, HAMADA Tsuyoshi, SHIBATA Yuichiro, OGURI Kiyoshi

      IEICE technical report 108 ( 324 ) 1 - 6   21 11 2008

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      In this paper, we propose a GPU acceleration of multi-topic extraction from images using LDA (latent Dirichlet allocation). LDA was originally proposed as a probabilistic model for documents by Blei et al. Recently, LDA has been applied to multimedia information other than documents. We provide the results of experiments where we apply LDA to Professor Wang's 10,000 test images and extract multiple visual topics. We adopt the collapsed variational Bayesian inference method for LDA and accelerate it by using Nvidia CUDA compatible GPU devices.

      CiNii Article

      researchmap

    • Dimensionality Reduction via Latent Dirichlet Allocation for Document Clustering

      MASADA Tomonari, KIYASU Senya, MIYAHARA Sueharu

      IPSJ SIG Notes 2007 ( 65 ) 381 - 386   3 7 2007

      More details

      Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

      In this paper, we employ the latent Dirichlet allocation as a method for the dimensionality reduction of feature vectors and reveal its effectiveness in document clustering. In the evaluation experiment, we perform clustering on the document sets of Japanese and Korean Web news articles. We regard the categories assigned to each article as the ground truth of clustering evaluation. We compare the clustering results obtained by using the feature vectors whose entries are term frequencies with the results obtained by using the feature vectors whose dimensions are reduced by the latent Dirichlet allocation.

      CiNii Article

      researchmap

      Other Link: http://id.nii.ac.jp/1001/00018810/

    • Dimensionality Reduction via Latent Dirichlet Allocation for Document Clustering

      MASADA Tomonari, KIYASU Senya, MIYAHARA Sueharu

      IEICE technical report 107 ( 131 ) 381 - 386   2 7 2007

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      In this paper, we employ the latent Dirichlet allocation as a method for the dimensionality reduction of feature vectors and reveal its effectiveness in document clustering. In the evaluation experiment, we perform clustering on the document sets of Japanese and Korean Web news articles. We regard the categories assigned to each article as the ground truth of clustering evaluation. We compare the clustering results obtained by using the feature vectors whose entries are term frequencies with the results obtained by using the feature vectors whose dimensions are reduced by the latent Dirichlet allocation.

      CiNii Article

      researchmap

    • Personal Name Disambiguation in Web Search Using Knowledge Base (jointly worked)

      Quang Minh VU, Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      DBSJ Letters 5 ( 4 ) 53 - 56   2007

      More details

    • Name Disambiguation in Web Search Using Knowledge Base

      MINH VU Quang, MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IPSJ SIG Notes 2006 ( 78 ) 185 - 192   13 7 2006

      More details

      Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to discriminate documents related to different people, an effective method is required to measure document similarities and to find relevant documents of the same person. Some previous studies have used the cosine similarity method or have tried to extract common named entities for measuring similarities. We propose a new method which uses web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts as similarities between document pairs. Experimental results show that our proposed method outperforms the cosine similarity method and the common named entities method.

      CiNii Article

      researchmap

      Other Link: http://id.nii.ac.jp/1001/00018907/

    • Name Disambiguation in Web Search Using Knowledge Base

      VU Quang MINH, MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE technical report 106 ( 149 ) 143 - 148   6 7 2006

      More details

      Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to discriminate documents related to different people, an effective method is required to measure document similarities and to find relevant documents of the same person. Some previous studies have used the cosine similarity method or have tried to extract common named entities for measuring similarities. We propose a new method which uses web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts as similarities between document pairs. Experimental results show that our proposed method outperforms the cosine similarity method and the common named entities method.

      CiNii Article

      researchmap

    • Topic-oriented Term Extraction and Term Clustering for Query Disambiguation

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IPSJ Transactions on Databases 47 ( SIG19 ) 72 - 85   2006

      More details

    • Topic-oriented Term Extraction and Term Clustering for Query Focusing

      Hiromi WAKAKI, Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      IPSJ Transactions on Databases 47 ( SIG19 ) 72 - 85   2006

      More details

    • Query Ambiguity Indication Using Infrequent Term Cooccurrences

      WAKAKI Hiromi, MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE technical report. Data engineering 105 ( 172 ) 1 - 6   7 7 2005

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Conventional search engines are designed mainly for general keyword search. Therefore, in many cases, we can find no appropriate combination of query terms. In this paper, we present a query disambiguation method that uses infrequent term cooccurrences. This strategy comes from the following idea: terms appearing with a wide variety of terms cannot establish an independent topic. Based on this hypothesis, terms are weighted. The experimental results show that the terms ranked higher by our method can improve the average precision of Web search when added to the original query terms. Compared with other term ranking methods, our method gives higher ranks to terms that denote more particular and adequate things and refer to more specific concepts.

      CiNii Article

      researchmap

    • Improving Web Search Performance by Using Link Information

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      IPSJ Transactions on Databases 46 ( SIG8 ) 48 - 59   2005

      More details

    • Improving Web Search Performance with Hyperlink Information

      Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      IPSJ Transactions on Databases 46 ( SIG8 ) 48 - 59   2005

      More details

    • Decomposing the Web Graph into Parameterized Connected Components

      MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE Trans. Information and Systems 87 ( 2 ) 380 - 388   1 2 2004

      More details

      Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

      We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.

      CiNii Article

      researchmap

    • A New Notion of Connectivity and Its Application to Web Page Grouping

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      DBSJ Letters 2 ( 1 ) 3 - 6   2003

      More details

      Language:Japanese   Publisher:The Database Society of Japan  

      CiNii Article

      researchmap

    • A New Notion of Connectivity and its Application to Web Page Grouping

      Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

        2 ( 1 ) 3 - 6   2003

      More details

    • Enumerating triangulations in general dimensions

      H Imai, T Masada, F Takeuchi, K Imai

      INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS 12 ( 6 ) 455 - 480   12 2002

      More details

      Language:English   Publisher:WORLD SCIENTIFIC PUBL CO PTE LTD  

      We propose algorithms to enumerate (1) regular triangulations, (2) spanning regular triangulations, (3) equivalence classes of regular triangulations with respect to symmetry, and (4) all triangulations. All of the algorithms are for arbitrary points in general dimension. They work in output-size sensitive time with memory only of several times the size of a triangulation. For the enumeration of regular triangulations, we use the fact by Gel'fand, Zelevinskii and Kapranov that regular triangulations correspond to the vertices of the secondary polytope. We use reverse search technique by Avis and Fukuda, its extension for enumerating equivalence classes of objects, and a reformulation of a maximal independent set enumeration algorithm. The last approach can be extended for enumeration of dissections.

      DOI: 10.1142/S0218195902000980

      researchmap

    • Grouping Web pages based on parameterized connectivity

      MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE technical report. Data engineering 102 ( 208 ) 137 - 142   11 7 2002

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      The rapid growth of the amount of information on the WWW makes Web search methods based only on textual information more and more unrealistic. In recent years, many studies have provided attractive link-based retrieval methods. This paper proposes a method for link-based Web page grouping, which aims to reduce the complexity of subsequent text-based retrieval by enlarging the size of the units for that retrieval. This method also makes the granularity of groups controllable by adjusting one threshold parameter. This paper includes the results of preliminary experiments, which clarify the characteristics of the proposed grouping method.

      CiNii Article

      researchmap

    • Grouping Web Pages Based on Parameterized Connectivity

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      DBSJ Letters 1 ( 1 ) 47 - 50   2002

      More details

      Language:Japanese   Publisher:The Database Society of Japan  

      CiNii Article

      researchmap

    • Grouping Web Pages Based on Parameterized Connectivity

      Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

        1 ( 1 ) 47 - 50   2002

      More details


    Presentations

    • Documents as a Bag of Maximal Substrings: An Unsupervised Feature Extraction for Document Clustering

      13th International Conference on Enterprise Information Systems (ICEIS 2011)  2011 

      More details

    • Semi-supervised Bibliographic Element Segmentation with Latent Permutations

      International Conference on Asia-Pacific Digital Libraries (ICADL 2011)  2011 

      More details

    • Infinite Latent Process Decomposition

      IEEE International Conference on Bioinformatics & Biomedicine (BIBM 2010)  2010 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

      The 1st International Workshop on Web Intelligent Systems and Services (WISS 2010)  2010 

      More details

    • Application of a Weighting Scheme for Recognizing Semantic Change over Time

      2010 IEEK Summer Conference  2010 

      More details

    • Modeling Topical Trends over Continuous Time with Priors

      the seventh International Symposium on Neural Networks (ISNN 2010)  2010 

      More details

    • An Adaptive Weighting Scheme for Time-dependent Semantic Change Recognition

      2010 

      More details

    • Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA compatible devices

      IEA/AIE 2009  2009 

      More details

    • Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

      Proc. of the Joint Conference on Asia-Pacific Web Conference (APWeb) and Web-Age Information Management (WAIM)  2009 

      More details

    • Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation

      ADMA 2009  2009 

      More details

    • Dynamic Hyperparameter Optimization for Bayesian Topical Trend Analysis

      CIKM 2009  2009 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA compatible devices

      2009 

      More details

    • Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

      2009 

      More details

    • Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation

      2009 

      More details

    • Dynamic Hyperparameter Optimization for Bayesian Topical Trend Analysis

      2009 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Character Categorization via Latent Dirichlet Allocation for Kana Sequence Segmentation with Conditional Random Fields

      16th International Conference on Computers in Education (ICCE 2008)  2008 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification

      Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2008  2008 

      More details

    • Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering.

      3rd International Conference on Large-scale Knowledge Resources  2008 

      More details

    • Character Categorization via Latent Dirichlet Allocation for Kana Sequence Segmentation with Conditional Random Fields

      2008 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification

      2008 

      More details

    • Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering.

      2008 

      More details

    • Clustering Images with Multinomial Mixture Models.

      8th International Symposium on advanced Intelligent Systems (ISIS 2007)  2007 

      More details

    • Proposal of a Clustering Method for Author Name Disambiguation in Bibliographic Information

      18th Data Engineering Workshop  2007 

      More details

    • Clustering Images with Multinomial Mixture Models.

      2007 

      More details

    • Link-Based Clustering for Finding Subrelevant Web Pages

      Third International Workshop on Web Document Analysis  2005 

      More details

    • Link-Based Clustering for Finding Subrelevant Web Pages

      2005 

      More details

    • Web Page Grouping Based on Parameterized Connectivity

      The 9th International Conference on Database Systems for Advanced Applications  2004 

      More details

    • Web Page Grouping Based on Parameterized Connectivity

      2004 

      More details

    • Parameterized Connectivity and Its Application to Web Page Grouping

      2nd Forum on Information Technology (FIT2003)  2003 

      More details

    • Effective Use of Web Information via Parameterized Connected Component Decomposition

      Summer Database Workshop DBWS2003  2003 

      More details

    • Grouping Web Pages Based on Parameterized Connectivity

      14th Data Engineering Workshop  2003 

      More details

    • Grouping Web Pages Based on Parameterized Connectivity

      1st Forum on Information Technology  2002 

      More details

    • Web Page Grouping by Parameterized Connected Component Decomposition

      Summer Database Workshop DBWS2002  2002 

      More details

    • Enumeration of Regular Triangulations

      12th annual ACM Symposium on Computational Geometry  1996 

      More details

    • Enumeration of Regular Triangulations

      1996 

      More details


    Professional Memberships

    •  
      THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS

      More details

    •  
      INFORMATION PROCESSING SOCIETY OF JAPAN

      More details

    Works

    • Research on Bayesian topic models that use substring occurrence frequencies as document features

      2011
      -
      2012

      More details

    • Information Navigation Using Statistical Rhymes

      2010 - 2012

    • Linking Diverse Text Information with Multi-Topic Models Using External Knowledge

      2010 - 2011

    • Editorial Committee Member, IPSJ Transactions on Databases (TOD)

      2007 - 2011

    • Introducing Temporality into Inter-Document and Inter-Word Similarity with Multi-Topic Models Using Temporal Information of Text

      2009 - 2010

    • Analysis of Temporal Transitions of Noteworthy Topic Groups with Multi-Topic Models Using Temporal Information of Text

      2008 - 2009

    Research Projects

    • Topic models bridging between documents as members composing a corpus and documents as sequences composed by words

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research 

      4 2021 - 3 2024

      Grant number:21K12017

      Grant amount: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)

    • Research on the effectiveness of using RNN in topic models

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) 

      MASADA Tomonari

      4 2018 - 3 2021

      Grant number:18K11440

      Grant amount: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)

      Topic models, including LDA (latent Dirichlet allocation), can automatically extract semantically meaningful themes from a large corpus. However, text analysis using topic models often considers only word frequencies in a document and ignores the order in which words are arranged. This work aims to improve topic models by using an RNN (recurrent neural network) to model word order. Because several previous studies have already proposed methods for combining RNNs with topic models, we set out to propose a new one. As a result, we have proposed a new topic model using NNs (neural networks) that performs no VAE (variational autoencoder) inference. Instead, we maximize the objective given in the original LDA paper by training NNs in an amortized manner and obtaining posterior parameters as the output of the NNs. However, we currently use only an MLP (multilayer perceptron) and thus have not yet achieved our goal. We plan to replace the MLP with an RNN or a more recent NN architecture in the near future.
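
      For reference, the objective in the original LDA paper (Blei, Ng and Jordan, 2003) is presumably the variational lower bound on the log likelihood of a document. The notation below follows that paper. Writing the variational parameters as the output of a network $f_\psi$ is one reading of "amortized" and is an assumption, not the project's stated formulation.

```latex
\log p(\mathbf{w} \mid \alpha, \beta) \;\ge\; \mathcal{L}(\gamma, \phi; \alpha, \beta)
  = \mathbb{E}_{q}\!\left[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\right]
  - \mathbb{E}_{q}\!\left[\log q(\theta, \mathbf{z} \mid \gamma, \phi)\right],
\qquad (\gamma, \phi) = f_{\psi}(\mathbf{w}).
```

      Under this reading, training adjusts the network weights $\psi$ (together with $\alpha$ and $\beta$) to raise the bound summed over all documents, instead of optimizing $\gamma$ and $\phi$ separately for each document.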

    • Exploring Data Processing and Analysis Methods for Predicting Defect Occurrences in Semiconductor Production Lines

      Sony Semiconductor Manufacturing Corporation, Japan 

      4 2019 - 3 2020

      Authorship:Principal investigator  Grant type:Collaborative (industry/university)

      Grant amount: ¥2,730,000 (Direct Cost: ¥2,481,000, Indirect Cost: ¥249,000)

      We propose a novel method that uses the topics obtained by topic modeling for sensor data analysis. This paper describes a case study in which we perform an exploratory analysis of manufacturing sensor data, using latent Dirichlet allocation (LDA) as a tool to discover remarkable change patterns. Our target is a set of time-series data originating from sensors installed in a closed factory environment. Each sensor gives a different type of measurement of the same manufacturing process, which is operated repeatedly in a lot-by-lot manner. We first discretize the data based on the histogram of sensor measurements and construct a bag-of-words representation. We then apply LDA to discover change patterns across tens of thousands of lots. When LDA is applied to natural language documents, the resulting topics differ widely from each other because the documents are intrinsically diverse. In contrast, our data, which come from a repeatedly operated manufacturing process, show only limited diversity. As a result, LDA yields topics that closely resemble each other. Our main and unexpected finding is that the difference between similar topics is useful in discovering remarkable change patterns. We performed an experiment on data sets containing sensor measurements collected in the factory. The results revealed that subtle differences between very similar topics often correspond to an interesting change pattern in the sensor measurements.
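
      The discretization step can be sketched as follows. This is a minimal illustration, not the project's code: the equal-width binning, the `sensor_binK` word format, and the sample readings are all assumptions, since the abstract says only that the data are discretized "based on the histogram of sensor measurements".

```python
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, n_bins):
    """Interior bin edges of an equal-width histogram over `values` (assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]


def lot_to_bag(lot, edges_by_sensor):
    """Turn one lot's readings into word counts such as 'temp_bin2'."""
    bag = Counter()
    for sensor, readings in lot.items():
        edges = edges_by_sensor[sensor]
        for x in readings:
            # bisect_right gives the index of the bin that x falls into
            bag[f"{sensor}_bin{bisect_right(edges, x)}"] += 1
    return bag


# Hypothetical readings for two lots of the same process
lots = [
    {"temp": [20.0, 20.5, 21.0], "pressure": [1.0, 1.1, 1.0]},
    {"temp": [20.1, 24.0, 24.2], "pressure": [1.0, 1.4, 1.5]},
]

# Bin edges are computed per sensor over all lots, so words are comparable across lots
edges_by_sensor = {
    s: equal_width_edges([x for lot in lots for x in lot[s]], n_bins=3)
    for s in ("temp", "pressure")
}
bags = [lot_to_bag(lot, edges_by_sensor) for lot in lots]
```

      Each lot then plays the role of a document and each `sensor_binK` token the role of a word, so the resulting counts can be passed to any LDA implementation.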

    • A Study on Digital Library System for Experimental Information Extraction, Visualization and Recommendation

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) 

      Takasu Atsuhiro, Ohta Manabu, Maneeroj Saranya

      4 2015 - 3 2018

      Grant number:15H02789

      Grant amount: ¥15,860,000 (Direct Cost: ¥12,200,000, Indirect Cost: ¥3,660,000)

      Researchers need to survey research trends in related fields for various tasks, such as research planning, research trend analysis, and paper writing. Digital libraries play an important role in providing the full text of research papers, and full-text search is the main technology for retrieving them. This study focused on the experimental information contained in papers and developed sequence analysis models for extracting it. We also developed a recommender system that actively provides scholarly information.

    • Tiny data mining: reconstruction of large scale data with probability distributions as bases

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) 

      MASADA Tomonari

      4 2014 - 3 2017

      Grant number:26330256

      Grant amount: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)

      The aim of our research is to produce an efficient and effective summary of a large set of documents, such as news articles, academic papers, or novels. When the number of documents is very large, we can read only a small portion of them and may therefore miss documents containing the topics we care about. Our research thus aims to extract word lists from the given document set as a summary. For example, if one of the extracted word lists is "game, hit, pitcher, trade," we can tell that some documents discuss baseball. In this manner, the extracted word lists show what kinds of topics are discussed in the document set. Our research also provides a clue as to which documents are closely related to which word lists, so we can find the documents relevant to the word lists we choose. Although we adopt an existing method called topic modeling, we propose a new application and a new implementation of it.
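
      The two outputs described above, word lists per topic and documents related to a chosen word list, can be read off a fitted topic model roughly as follows. This is a sketch under stated assumptions: the vocabulary, the topic-word probabilities, and the document-topic proportions are hand-made hypothetical values, and the function names are illustrative rather than taken from the project.

```python
def top_words(topic_word, vocab, k):
    """For each topic, return the k words with the highest probability."""
    lists = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda j: -row[j])
        lists.append([vocab[j] for j in order[:k]])
    return lists


def docs_for_topic(doc_topic, topic, n):
    """Return the n document ids with the largest proportion of `topic`."""
    ranked = sorted(doc_topic, key=lambda d: -doc_topic[d][topic])
    return ranked[:n]


# Hypothetical fitted parameters (each row of topic_word sums to 1)
vocab = ["game", "hit", "pitcher", "trade", "vote", "party"]
topic_word = [
    [0.30, 0.25, 0.25, 0.15, 0.03, 0.02],  # a baseball-like topic
    [0.05, 0.02, 0.01, 0.12, 0.40, 0.40],  # a politics-like topic
]
doc_topic = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.2, 0.8],
    "doc_c": [0.6, 0.4],
}

word_lists = top_words(topic_word, vocab, k=4)
baseball_docs = docs_for_topic(doc_topic, topic=0, n=2)
```

      Here `word_lists[0]` is the summary a reader would inspect, and `baseball_docs` names the documents most closely tied to it.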

    • A Study on Information Alignment by Composite Generative Model

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) 

      TAKASU Atsuhiro, MASADA Tomonari, FUKAGAWA Daiji

      4 2011 - 3 2015

      Grant number:23300040

      Grant amount: ¥19,890,000 (Direct Cost: ¥15,300,000, Indirect Cost: ¥4,590,000)

      The purpose of this study is to develop topic models for analyzing information from various aspects. We first develop a topic model that handles time as well as text by attaching a timestamp to each document; the model generates text and timestamps simultaneously. Next, we extend the model to networked documents, where documents are linked to each other, as in citations among academic papers. We apply the models to researcher recommendation systems and show empirically that the features they extract are effective for recommendation.

    • Information Navigation using Statistical Rhymes

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B) 

      MASADA Tomonari

      2010 - 2011

      Grant number:22700150

      Grant amount: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)

      This project is based on the following assumption: words that co-occur with statistically significant frequency can serve as a guide in a useful information navigation system even when the co-occurrences are not based on semantic similarity or relatedness. We call such co-occurrences statistical rhymes. We have been trying to extract statistical rhymes with Bayesian probabilistic models. As a result, we proposed a new LDA (latent Dirichlet allocation)-like topic extraction method that segments the word token sequences appearing in bibliographic data, such as those found in the references sections of academic papers or the publications sections of researchers' Web sites. Our method splits each bibliographic record into segments, each corresponding to a different data field, e.g., authors, paper title, journal, pages, or publication year. Furthermore, we improved segmentation accuracy by making the inference semi-supervised.

    • Algorithms for sub-pixel analysis of remotely sensed hyperspectral images

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) 

      KIYASU Senya, MIYAHARA Sueharu, MASADA Tomonari

      2005 - 2007

      Grant number:17560376

      Grant amount: ¥3,740,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥240,000)

      In this research, we developed several algorithms for sub-pixel analysis of land cover in remotely sensed multispectral images. Several sub-pixel analysis techniques for remotely sensed images have been developed that estimate the proportions of land-cover components within a pixel. However, when the available training data do not correctly represent the spectral characteristics of the categories in the pixel, the estimates may contain large errors.
      We developed an algorithm that analyzes a hyperspectral image as follows. First, we provide a small set of initial training data and determine the pure pixels in the image. Next, component spectra are adaptively estimated for each mixed pixel from the surrounding pure pixels. Then the proportions of components in the mixed pixels are estimated from the determined component spectra. We confirmed the validity of the method by numerical simulation and applied it to remotely sensed multispectral images.
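
      The last two steps can be illustrated with a small sketch. It assumes a linear mixing model with exactly two components and takes the local component spectrum to be the mean of nearby pure pixels; the abstract states neither the mixing model nor the estimator, so these choices, the function names, and the sample spectra are hypothetical.

```python
def mean_spectrum(spectra):
    """Band-wise mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(band) / n for band in zip(*spectra)]


def mixing_proportion(pixel, comp_a, comp_b):
    """Least-squares share of comp_a in pixel = a*comp_a + (1-a)*comp_b, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(comp_a, comp_b)]
    resid = [x - b for x, b in zip(pixel, comp_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("component spectra are identical")
    share = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, share))


# Hypothetical three-band spectra of pure pixels surrounding one mixed pixel
pure_vegetation = [[0.10, 0.20, 0.80], [0.12, 0.18, 0.82]]
pure_soil = [[0.50, 0.50, 0.10], [0.48, 0.52, 0.12]]

# Local component spectra estimated from the surrounding pure pixels
veg = mean_spectrum(pure_vegetation)
soil = mean_spectrum(pure_soil)

# Synthetic mixed pixel: 30% vegetation, 70% soil
mixed = [0.3 * v + 0.7 * s for v, s in zip(veg, soil)]
share = mixing_proportion(mixed, veg, soil)
```

      Because the component spectra are recomputed from each mixed pixel's own neighborhood, the estimate adapts to local spectral variation instead of relying on a single global training spectrum per category.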
